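Requirements: monitoring Nginx with Prometheus mainly relies on the following three components:
- nginx-module-vts: Nginx virtual host traffic status module. This is the Nginx monitoring module; it can output its data in JSON format.
- nginx-vts-exporter: a simple server that scrapes Nginx vts stats and exports them via HTTP for Prometheus consumption. It collects the Nginx monitoring data and exposes a scrape endpoint to Prometheus; the default port is 9913.
- Prometheus: scrapes the Nginx data exposed by nginx-vts-exporter and stores it in its time-series database, where it can be queried and aggregated with PromQL.
I. Initialization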
yum install -y gcc gcc-c++ curl wget bzip2
yum install -y pcre pcre-devel openssl openssl-devel zlib zlib-devel
cd /opt
wget https://github.com/jemalloc/jemalloc/releases/download/5.1.0/jemalloc-5.1.0.tar.bz2
tar -jxvf jemalloc-5.1.0.tar.bz2
cd jemalloc-5.1.0
./configure --prefix=/usr/local/jemalloc
make -j 2 &>/dev/null && make install &>/dev/null
echo "/usr/local/jemalloc/lib" >/etc/ld.so.conf.d/jemalloc.conf
ldconfig
ln -s /usr/local/jemalloc/lib/libjemalloc.so /usr/lib64/libjemalloc.so
ln -s /usr/local/jemalloc/lib/libjemalloc.so.2 /usr/lib64/libjemalloc.so.2
II. Install nginx-module-vts
1. Installation script
cd /opt/
git clone https://github.com/vozlt/nginx-module-vts
wget https://openresty.org/download/openresty-1.15.8.2.tar.gz
tar -zxvf openresty-1.15.8.2.tar.gz
cd openresty-1.15.8.2
# -ljemalloc (lowercase L) links against the jemalloc library installed in the initialization step
./configure --prefix=/usr/local/openresty \
    --with-stream --with-threads \
    --with-http_ssl_module --with-http_v2_module \
    --with-http_realip_module --with-http_gzip_static_module \
    --with-http_stub_status_module \
    --user=www --group=www \
    --build="LiveOps build at $(date +%Y-%m-%d)" \
    --with-ld-opt="-ljemalloc" \
    --add-module=/opt/nginx-module-vts/
gmake
gmake install
2. Configure Nginx
http {
    vhost_traffic_status_zone;
    vhost_traffic_status_filter_by_host on;
    ...
    server {
        ...
        location /status {
            vhost_traffic_status_display;
            vhost_traffic_status_display_format html;
        }
    }
}
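3. Enable per-vhost filtering:
vhost_traffic_status_filter_by_host on;
With this directive enabled, when Nginx is configured with several server_name values, traffic is counted separately for each server_name. Otherwise all traffic is attributed to the first server_name by default.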
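4. Disable vhost_traffic_status in any server block whose traffic should not be counted. Example:
server { ... vhost_traffic_status off; ... }
If a server block has no properly configured server_name, or does not need monitoring, disable traffic statistics on that vhost. Otherwise monitoring entries for names such as "127.0.0.1" or the machine hostname will appear.
5. Once the vts module is installed, the monitoring data can be viewed through the Nginx status endpoint.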
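A quick way to confirm that the status endpoint responds is to request its JSON form with curl. The following is a minimal sketch, not part of the original setup: the function name check_vts_status is hypothetical, and the example URL assumes Nginx listens on 127.0.0.1:80 with the /status location configured above.

```shell
# Hypothetical helper: report whether a vts status URL answers with HTTP success.
# Assumption: the URL has the form http://<nginx-host>/status/format/json.
check_vts_status() {
    url="$1"
    # -f makes curl fail on HTTP errors, --max-time bounds the wait to 5 seconds.
    if curl -sf --max-time 5 -o /dev/null "$url"; then
        echo "OK: $url"
    else
        echo "FAIL: $url"
        return 1
    fi
}

# Usage (the address is an assumption, adjust to your Nginx host):
#   check_vts_status http://127.0.0.1/status/format/json
```

If the check fails, verify that vhost_traffic_status_zone is set in the http block and that Nginx was reloaded after the change.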
III. Install nginx-vts-exporter
wget https://github.com/hnlq715/nginx-vts-exporter/releases/download/v0.10.3/nginx-vts-exporter-0.10.3.linux-amd64.tar.gz
tar zxvf nginx-vts-exporter-0.10.3.linux-amd64.tar.gz
mkdir -p /usr/local/exporter
mv nginx-vts-exporter-0.10.3.linux-amd64 /usr/local/exporter/nginx-vts-exporter
cd /usr/local/exporter/nginx-vts-exporter
nohup ./nginx-vts-exporter -nginx.scrape_timeout 10 -nginx.scrape_uri http://114.67.116.119/status/format/json &
Startup log
[root@Prometheus nginx-vts-exporter]# tail -f nohup.out
2020/02/21 12:16:50 Starting nginx_vts_exporter (version=0.10.3, branch=HEAD, revision=8aa2881c7050d9b28f2312d7ce99d93458611d04)
2020/02/21 12:16:50 Build context (go=go1.10, user=root@56ca8763ee48, date=20180328-05:47:47)
2020/02/21 12:16:50 Starting Server at : :9913
2020/02/21 12:16:50 Metrics endpoint: /metrics
2020/02/21 12:16:50 Metrics namespace: nginx
2020/02/21 12:16:50 Scraping information from : http://114.67.116.119/status/format/json
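Note: running the exporter on the same machine as Nginx is recommended. If they are on different hosts, set scrape_uri to the address of the Nginx host.
The default port of nginx_vts_exporter is 9913, and it exposes its metrics endpoint at http://xxx:9913/metrics.
View it at: http://xxx:9913/metrics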
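The metrics page is long, so it can help to filter it down to the counters of interest. The sketch below assumes the exporter emits nginx_server_requests series with a code label, as used in the expressions later in this post; the function name show_total_requests is hypothetical.

```shell
# Hypothetical helper: read Prometheus exposition text on stdin and keep only
# the per-host total request counters.
show_total_requests() {
    # First drop comments and other metrics, then keep the code="total" series.
    grep '^nginx_server_requests{' | grep 'code="total"'
}

# Usage (xxx is the exporter host, as above):
#   curl -s http://xxx:9913/metrics | show_total_requests
```

An empty result usually means the exporter cannot reach the scrape_uri, so the startup log is the first place to look.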
IV. Add nginx-vts-exporter to prometheus.yml
Edit the prometheus.yml configuration file:
vim /usr/local/prometheus/prometheus.yml
# add this job under scrape_configs:
  - job_name: nginx
    static_configs:
      - targets: ['114.67.116.119:9913']
        labels:
          instance: nginx-test
Start the service:
nohup /usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml &
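Before starting or restarting Prometheus, the edited file can be validated with promtool, which ships in the Prometheus release archive. This is a sketch under the assumption that promtool sits next to the prometheus binary in /usr/local/prometheus; the function name check_prom_config is hypothetical.

```shell
# Hypothetical helper: validate a Prometheus config file with promtool.
# Assumption: promtool was unpacked to /usr/local/prometheus with the server binary.
check_prom_config() {
    promtool="${1:-/usr/local/prometheus/promtool}"
    config="${2:-/usr/local/prometheus/prometheus.yml}"
    if [ -x "$promtool" ]; then
        # "check config" parses the file and reports syntax or semantic errors.
        "$promtool" check config "$config"
    else
        echo "promtool not found at $promtool"
        return 1
    fi
}

# Usage with the default paths:
#   check_prom_config
```

A YAML indentation mistake in the nginx job is the most likely failure here, and promtool reports the offending line.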
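Graph view
V. Grafana dashboard
In Grafana, import dashboard template 2949 (https://grafana.com/dashboards/2949).
Common monitoring expressions
DomainName corresponds to server_name in the Nginx configuration. QPS and 2xx/3xx/4xx/5xx status codes can be monitored separately per server_name and per upstream, and the QPS and response time of each backend server can be monitored as well.
If you do not need to distinguish by server_name, replace $DomainName in the expressions with an asterisk; "*" stands for all hosts. A bare asterisk is not a valid regular expression, so use the exact matcher host="*" in that case.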
# 1. Nginx QPS:
sum(irate(nginx_server_requests{code="total",host=~"$DomainName"}[5m]))
sum(irate(nginx_server_requests{instance=~"$Instance", code!="total"}[5m])) by (code)
# 2. 4xx rate per ten thousand requests (5xx is analogous, with code="5xx"):
(sum(irate(nginx_server_requests{code="4xx",host=~"$DomainName"}[5m])) / sum(irate(nginx_server_requests{code="total",host=~"$DomainName"}[5m]))) * 10000
# 3. Upstream QPS (example: the QPS of group1):
sum(irate(nginx_upstream_requests{code="total",upstream="group1"}[5m]))
# 4. Response time of the upstream backend servers (example: the backend response time of group1):
nginx_upstream_responseMsec{upstream="group1"}
nginx_upstream_responseMsec{backend="192.168.x.xxx:8803",instance="nginx-web-1",job="nginx",upstream="UPSTREAM_NAME"}
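To make the scaling in expression 2 concrete, the same arithmetic can be reproduced outside PromQL: divide the 4xx rate by the total rate and multiply by 10000. The sketch below uses made-up sample rates, and the function name per_ten_thousand is hypothetical.

```shell
# Hypothetical helper: compute "part per ten thousand" from two request rates.
per_ten_thousand() {
    awk -v part="$1" -v total="$2" 'BEGIN {
        # Avoid division by zero when there is no traffic at all.
        if (total == 0) { print "0.00"; exit }
        printf "%.2f\n", part / total * 10000
    }'
}

# Sample values (assumed): 3 4xx responses per second out of 1200 requests per second.
per_ten_thousand 3 1200   # prints 25.00
```

A value of 25 therefore means 25 out of every 10000 requests returned a 4xx status, that is 0.25%.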