分类目录归档：nginx

nginx 生成coredump的办法

网上很多说nginx生成coredump的办法在RHEL7 RHEL8都无法正确的生成core 文件, 这里给一下正确的步骤

#新建一个文件夹, 并确认nginx可以读写
$ mkdir /opt/itc/fssnginx/logs/cores/
$ sudo chown root:root /opt/itc/fssnginx/logs/cores/
$ sudo chmod 1777 /opt/itc/fssnginx/logs/cores/

#设置unlimited core file dump
$ ulimit -c unlimited
#也可以在系统中彻底修改
$vim /etc/security/limits.conf
* soft core unlimited

#设置系统级别的core file
$ echo "/opt/itc/fssnginx/logs/cores/core.%e.%p" | sudo tee /proc/sys/kernel/core_pattern

#允许suid dumpable
$ sudo sysctl -w fs.suid_dumpable=2

$ sysctl -p


#重启nginx
systemctl restart nginx

#编译nginx.conf,然后重启nginx 服务
working_directory  /opt/itc/fssnginx/logs/cores/;
worker_rlimit_core 500M;

测试,找到nginx workprocess的进程号,发送 SIGSEGV 给该work process进程

$ps aux|grep nginx
$kill -11 `$work_process_pid`
#检查core 文件
$ll /opt/itc/fssnginx/logs/cores/
-rw-------. 1 nobody nobody 65753088 Jun 22 17:54 core.nginx.5662
-rw-------. 1 nobody nobody 70475776 Jun 22 18:03 core.nginx.5702

释疑: 看起来跟内核变量有关

#旧版本RHEL6是这个设置:

kernel.core_pattern = core

#新版本OS 是这个: 会被通过管道发给这个可执行命令

kernel.core_pattern = |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e

参考文档:

https://docs.nginx.com/nginx/admin-guide/monitoring/debugging/

nginx resolver 和 /etc/resolv.conf 以及AAAA ipv6的关系

发表评论

Syntax:	`resolver address ... [valid=time] [ipv6=on\|off];`
Default:	—
Context:	`http`, `server`, `location`

http://nginx.org/en/docs/http/ngx_http_core_module.html#resolver

我们都知道nginx 有个resolver 能提供NS解析的功能,那么什么时候用系统自带配置,什么时候用resolver呢? 这里直接给结论:

proxy_pass 给一个域名, 用系统自带的resolv.conf

server {
    listen 80 ;
    server_name www.4os.org;
    resolver 114.114.114.114 ipv6=off;
    location / {
        proxy_pass http://www.qq.com;
    }
}

upstream server 里边跟域名, 用系统自带的resolv.conf

upstream backends {
    server www.qq.com;
}
server {
    listen 80 ;
    server_name www.sohu.com;
    resolver 114.114.114.114 ipv6=off;
    location / {
        proxy_pass http:// backends;
    }
}

proxy_pass 后边跟的是变量,比如你设置的$host, 用resolver

server {
    listen 80 ;
    server_name www.4os.org;
    resolver 114.114.114.114 ipv6=off;
    location / {
        set $ups “www.qq.com”;
        proxy_pass http://$ups;
    }
}

另外 nginx plus版本, upstream server 的域名如果带了resolver 参数,那么用resolver

resolver 10.0.0.2 valid=10s;

upstream backends {
    zone backends 64k;
    server backends.example.com:8080 resolve;
}

server {
    location / {
        proxy_pass http://backends;
    }
}

至于AAAA的IPV6结果,如果是用系统自带的解析器,那么nginx 会ipv4和ipv6一起解析, 如果你的nginx服务器没有V6地址,这会产生额外的一个upstream error,并next_upstream给V4地址

所以,如果要禁用V6解析, 你需要使用变量设置你的proxy_pass 回源,并明确定义resolver 的ipv6=off参数

以上结论基于nginx官方文档,并使用tcpdump监测dns解析获得

参考文档: https://www.nginx.com/blog/dns-service-discovery-nginx-plus/#domain-name-proxy_pass

nginx 与 TLS1.3

发表评论

TLS1.3支持了更优秀的SSL 新特性,可以有效降低https的协商时间,建议升级

本文使用了最新的nginx 1.14.1 (该版本修正了 1.14 H2 cpu/mem 攻击漏洞: low)

#wget https://www.openssl.org/source/openssl-1.1.1.tar.gz
#wget http://nginx.org/download/nginx-1.14.1.tar.gz

#spdy兼容补丁
#wget https://raw.githubusercontent.com/favortel/nginx_patch/master/nginx-1.14.0_spdy_h2.patch

#OKHTTP H2 头部动态压缩兼容补丁
#wget https://raw.githubusercontent.com/favortel/nginx_patch/master/fssnginx_1.14.0_dynamic_table_size.patch

#PCRE ZLIB等
#wget https://ftp.pcre.org/pub/pcre/pcre-8.42.tar.gz
#wget https://zlib.net/zlib-1.2.11.tar.gz

#tar -zxvf openssl-1.1.1.tar.gz
#tar -zxvf pcre-8.42.tar.gz
#tar -zxvf zlib-1.2.11.tar.gz
#tar -zxvf nginx-1.14.1.tar.gz

#cd nginx-1.14.1
#patch -p1 < ../fssnginx_1.14.0_dynamic_table_size.patch
#patch -p1 < ../nginx-1.14.0_spdy_h2.patch

./configure --prefix=/opt/itc/nginx --with-http_stub_status_module --with-http_realip_module --with-http_ssl_module --with-openssl=../openssl-1.1.1 --with-pcre=../pcre-8.42 --with-pcre-jit --with-zlib=../zlib-1.2.11 --with-http_v2_module --with-http_spdy_module
#make && make install

配置比较简单,就是ssl_protocols 增加TLSv1.3就好, ssl_ciphers 不用特意改动:

...
ssl_protocols               TLSv1 TLSv1.1 TLSv1.2 TLSv1.3;
ssl_ciphers                 ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256;
ssl_prefer_server_ciphers   on;
ssl_ecdh_curve              secp384r1;
...

比如本站,如果您是用chrome访问的是https协议,打开开发者工具-security,就能看到本站使用了TLS1.3了

proxy_cache behavier difference of nginx 1.6 and 1.7 above

发表评论

nginx 1.6 cache objects depend on proxy_cache_key,such as

nginx 1.6的缓存对象取决于proxy_cache_key的设置,比如

proxy_cache_key $uri$is_args$args;

nginx 1.7 cache objects depend on proxy_cache_key and VARY Header

nginx 1.7的缓存对象取决于proxy_cache_key 和服务器返回的VARY头的内容
Changes with nginx 1.7.7 28 Oct 2014

*) Change: now nginx takes into account the "Vary" header line in a backend response while caching.

If the upstream server reponsed this header: “Vary: Accept-Encoding”, cache object of the same url may save many copys depend on the different request headers: “Accept-Encoding”, in nginx 1.7 or above versions.

假如,你的upstream服务器返回了这样的header: “Vary: Accept-Encoding”,

相同一个url的缓存对象会基于请求header里边的”Accept-Encoding”的不同值保存为许多份不同的副本,这个特性起于nginx1.7 版本

In this case, purge cache may failed, cause you don’t how many cache objects with “Accept-Encoding” headers

在这种情况下,清理缓存通常会失败,因为你不知道有多少的缓存对象有着哪些对应的”Accept-Encoding” headers

You have 2 choices to fix this problem:

你有两种选择来对付这个问题:

1. Rewrite the reuqest header “Accept-Encoding” in the front proxy serve

在最前端的代理服务器重写请求的header: “Accept-Encoding”

set $compress "none"; if ($http_accept_encoding ~* "gzip|deflate|compress") {set $compress "gzip,deflate";} proxy_set_header Accept-Encoding $compress;

2. Ignore the response header “Vary” in the middle cache server

或者在中间层的cache服务器忽略掉服务器返回的header “Vary”
proxy_ignore_headers Vary;

在这里, “Vary: Accept-Encoding”, 只是一个非常常见的举例,实际上你的后端源站可能会返回特异的Vary Header.

nginx 1.14 与 okhttp H2不兼容的情况

发表评论

新上线 nginx 1.14的时候,发现许多使用okhttp的APP通讯异常了, 听包能发现服务器返回的很多小包被认为是非法数据忽略了

查了下nginx 官方资料,发现这是一个H2 动态压缩导致的不兼容情况, 需要使用补丁去绕开这个问题

问题的关键看起来是OKHTTP 库无法支持H2协议里边头部压缩的一个标准特性: Dynamic Table Size 更新,导致了客户端和服务端协商失败

测试证明nginx 1.13.6 引入了这个特性并会导致okhttp3.4.1版本(我测试的版本)H2通讯异常, 可以通过以下链接获取到这些信息,应用这个补丁可以解决这个问题

https://trac.nginx.org/nginx/changeset/fbb683496705f91db4dad32b3ec2ec4ed75115c0/nginx

https://trac.nginx.org/nginx/ticket/1397

另外,在1.15.3版本,修正了这个问题,但是1.14序列没有同步跟进

nginx的正则 *

发表评论

nginx 里边在server_name 或者 if的时候会用到一些简单的正则,实际上跟普通的正则会有些不同,简单举2个例子

1. server_name *.4os.org
它其实能匹配 test.www.4os.org, 它能匹配多级,这个需要注意下

2. 普通正则里边的 \s* 这里的* 表示匹配前面的子表达式零次或多次
而nginx 里边的* 一般都是表示存在的东西,这个是概念上的不同

nginx spdy patch for 1.14.0 1.13.12

发表评论

spdy 协议由于安卓碎片化的存在暂时还是需要保留一段时间的兼容性

准备升级到nginx1.14的时候发现 work process 会自动退出, 同时系统日志有 nginx segfault的信息

修改配置,抓取coredump信息,需要做以下内容
nginx 增加

worker_rlimit_core 5000M;
working_directory /path/to/cores/;
$> ulimit -c unlimited
$> mkdir /opt/coredump/ && chown nobody.nobody /opt/coredump/ # 先建目录，还要确认nginx用户可以写此目录
$> echo “/opt/coredump/core-%e-%p-%h-%t” > /proc/sys/kernel/core_pattern

拿到coredump文件后使用gdb分析

gdb /path/to/nginx /path/to/cores/nginx.core
backtrace full

发现问题指向了
src/http/ngx_http_spdy.c:ngx_http_spdy_state_read_data 的
buf->last = ngx_cpymem(buf->last, pos, size);

简单调试发现buf->last是个0, ngx_cpymem会因为内存越界导致coredump

而分析代码 + gdb 断点调试看到初始化r->request_body->buf的部分: ngx_http_spdy_init_request_body(r) 并未执行

打印r->request_body 内容发现这块被初始化了,对比nginx1.12.2和1.10.3版本发现旧版本则是未做初始化

翻了下调用的部分:ngx_http_request_body: ngx_http_read_client_request_body 可以看到在nginx 1.13.12版本开始会对r->request_body 做了初始化操作,这部分直接导致了SPDY 补丁的不兼容

所以答案就很简单了,修改下判断条件即可

新补丁放在了:https://github.com/favortel/nginx_patch/blob/master/nginx-1.14.0_spdy_h2.patch
参考文档:
https://toontong.github.io/blog/nginx-gdb-coredump-segfault.html
https://www.nginx.com/resources/wiki/start/topics/tutorials/debugging/
http://lxr.nginx.org/source/src/http/ngx_http_request_body.c
http://lxr.nginx.org/source/src/http/ngx_http_request_body.c?v=nginx-1.12.2

nginx add_header的一个陷阱

发表评论

Syntax:	`add_header name value [always];`
Default:	—
Context:	`http`, `server`, `location`, `if in location`

Adds the specified field to a response header provided that the response code equals 200, 201 (1.3.10), 204, 206, 301, 302, 303, 304, 307 (1.1.16, 1.0.13), or 308 (1.13.0). The value can contain variables.

There could be several add_header directives. These directives are inherited from the previous level if and only if there are no add_header directives defined on the current level.

add_header指令可以从上一级继承 当且仅当 当前级别没有add_header指令

If the always parameter is specified (1.7.5), the header field will be added regardless of the response code.

在全局配置加了个需要使用的header, 现在假如在某个location 或者 server 也需要add_header,则会覆盖掉全局配置的

http {… add_header http-global ‘123’;…}

server {… add_header server-conf ‘321’;…}

则这个server 是看不到 http-global: 123 这个header的,需要注意

其实,proxy_set_header 也是完全相同的情况

Syntax: proxy_set_header field value;

Default:

Syntax:	`proxy_set_header field value;`
Default:	proxy_set_header Host $proxy_host; proxy_set_header Connection close;
Context:	`http`, `server`, `location`

proxy_set_header Host $proxy_host;

proxy_set_header Connection close;

Context: http, server, location

Allows redefining or appending fields to the request header passed to the proxied server. The value can contain text, variables, and their combinations. These directives are inherited from the previous level if and only if there are no proxy_set_header directives defined on the current level. By default, only two fields are redefined:

nginx proxy模式下502 bad gateway 问题

并发测试的时候发现nginx 502 bad gateway 了,看了下日志发现很多upstream Cannot assign requested address的记录
connect() to 192.168.89.170:80 failed (99: Cannot assign requested address) while connecting to upstream

正常判断应该是端口不够用了
不过,我确实开启了: net.ipv4.tcp_tw_recycle = 1 和 net.ipv4.tcp_tw_reuse = 1两个参数
理论上应该可以把timewait 端口重用,查了下这个参数跟tcp_timestamps有关(http://blog.sina.com.cn/s/blog_781b0c850100znjd.html)

if (tmp_opt.saw_tstamp &&
tcp_death_row.sysctl_tw_recycle &&
(dst = inet_csk_route_req(sk, req)) != NULL &&
(peer = rt_get_peer((struct rtable *)dst)) != NULL &&
peer->v4daddr == saddr) {
if (get_seconds() < peer->tcp_ts_stamp + TCP_PAWS_MSL &&
(s32)(peer->tcp_ts – req->ts_recent) >
TCP_PAWS_WINDOW) {
NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_PAWSPASSIVEREJECTED);
goto drop_and_release;
}
}

tmp_opt.saw_tstamp：该socket支持tcp_timestamp
sysctl_tw_recycle：本机系统开启tcp_tw_recycle选项
TCP_PAWS_MSL：60s，该条件判断表示该源ip的上次tcp通讯发生在60s内
TCP_PAWS_WINDOW：1，该条件判断表示该源ip的上次tcp通讯的timestamp 大于本次tcp

因此: 应该在proxy端和后端都开启net.ipv4.tcp_timestamps=1

nginx with static libcurl

场景是这样子的: 这边有个nginx 模块 include curl/curl.h,而我的编译参数–with-openssl使用了最新的openssl 1.0.1g,编译出来的nginx直接segfault

去除这个模块或者去掉–with-openssl都能正常使用,推测是系统的libcurl(https)包含了libssl的依赖,与内嵌的openssl产生冲突

于是解决办法就是把libcurl也编译到nginx里边,绕开冲突和依赖

1. 静态编译libssl
cd openssl-1.0.1g
./config –prefix=/usr/src/redhat/BUILD/nginx-1.4.7/openssl-1.0.1g/.openssl no-shared no-threads
make
make install
make install LIBDIR=lib

2. 静态编译libcurl
cd curl-7.36.0
./configure –prefix=/usr/src/redhat/BUILD/nginx-1.4.7/curl-7.36.0/.curl –with-ssl=/usr/src/redhat/BUILD/nginx-1.4.7/openssl-1.0.1g/.openssl/lib/ –disable-ldap –disable-ldaps –without-libidn –enable-static=yes –enable-shared=no

#去除对librt.so的依赖,不介意可以不修改
sed -i /HAVE_CLOCK_GETTIME_MONOTONIC/d lib/curl_config.h

make
make install

3. 修改nginx的Makefile
#替换libcurl.so(lcurl)为静态编译的libcurl.a
sed -i ‘s#-lcurl#curl-7.36.0/.curl/lib/libcurl.a -Lopenssl-1.0.1g/.openssl/lib -lcrypto -lz#g’ objs/Makefile
make
make install

做完这步,就生成了包含libcurl和libssl的nginx了