Preface

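I previously deployed Prometheus to monitor our MongoDB and Redis clusters, but its MongoDB coverage is incomplete: items such as whether each node is alive and the overall cluster state are missing. The other solutions I looked at did not fit my production environment either, so for now the only option is to probe the ports to tell whether a node is alive.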

Deploying blackbox_exporter

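I originally planned to deploy it with Docker as well, but could not find a good image, so I downloaded the official release instead and run it as a system service.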

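# Create the working directory and switch to it
mkdir -p /opt/install
cd /opt/install
# Download the release tarball (a prebuilt binary, not source code)
wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.21.0/blackbox_exporter-0.21.0.linux-amd64.tar.gz
# Extract it
tar zxvf blackbox_exporter-0.21.0.linux-amd64.tar.gz
# Rename the extracted directory so it matches the paths used in the unit file below
mv blackbox_exporter-0.21.0.linux-amd64 blackbox_exporter
# Create the systemd unit file
vim /lib/systemd/system/blackbox_exporter.service
# Add the following content (mind the paths)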

[Unit]
Description=blackbox_exporter

[Service]
User=root
Type=simple
ExecStart=/opt/install/blackbox_exporter/blackbox_exporter --config.file=/opt/install/blackbox_exporter/blackbox.yml
Restart=on-failure

[Install]
WantedBy=multi-user.target

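Next, check that the unit is usable

# Reload systemd so it picks up the new unit file
systemctl daemon-reload
# If this reports no errors, the unit file was parsed correctly
systemctl status blackbox_exporter
# Enable start on boot, then start the service
systemctl enable blackbox_exporter
systemctl start blackbox_exporter
# Check the running state
systemctl status blackbox_exporter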
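
Before touching Prometheus, it can help to confirm that the exporter answers probes on its own. The sketch below builds the same `/probe` request that Prometheus will later send and prints only the `probe_success` sample. The function names `probe_url` and `check_target` are made up for this example, and it assumes `curl` is installed and that the target is a plain `host:port` value that needs no URL encoding.

```shell
# Build the /probe URL for one target.
# Arguments: exporter address, module name, probe target (host:port).
probe_url() {
  printf 'http://%s/probe?module=%s&target=%s\n' "$1" "$2" "$3"
}

# Fetch the probe result and print only the probe_success sample line.
# A value of 1 means the probe succeeded, 0 means it failed.
check_target() {
  curl -fsS --max-time 10 "$(probe_url "$1" "$2" "$3")" | grep '^probe_success'
}
```

For example, `check_target 127.0.0.1:9115 tcp_connect 172.18.66.165:27017` run on the exporter host should print a `probe_success` line for that MongoDB port.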

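Once the exporter is running, update the Prometheus configuration.

Modifying the Prometheus configuration file

Open the existing Prometheus configuration file and add the following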

vim /opt/yaml/prometheus.yml

# Add this job under scrape_configs, keeping the same indentation as the existing jobs
  - job_name: 'port_status'
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets: ['172.18.66.165:28017','172.18.66.165:27018','172.18.66.165:27017']
        labels:
          instance: '172.18.66.165'
      - targets: ['172.18.66.166:28017','172.18.66.166:27018','172.18.66.166:27017']
        labels:
          instance: '172.18.66.166'
      - targets: ['172.18.66.171:28017','172.18.66.171:27018','172.18.66.171:27001']
        labels:
          instance: '172.18.66.171'
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      # Note: this rule overwrites the static instance labels above with the probed host:port
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 172.18.66.178:9115 # blackbox_exporter address; 9115 is its default port

With the Prometheus configuration updated, add an alerting rules file

# Create a new rules file and add the content below
vim /opt/rules/port.rules

groups:
- name: blackbox_network_stats
  rules:
  - alert: 'MongoDBPortProbeFailed'
    expr: probe_success == 0
    for: 60s
    labels:
      severity: high
      alertinfo: push_blackbox_alert
    annotations:
      summary: "Probe failed for {{ $labels.instance }}"
      description: "MongoDB port probe failed, please check whether the service is healthy!"
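
A syntax error in either file will keep Prometheus from starting, so it may be worth validating both before the restart. The sketch below wraps `promtool check config` and `promtool check rules`. The function name `validate_prometheus_files` is hypothetical, and it assumes `promtool` is on the host's PATH. If Prometheus only exists inside the container, the same two commands would have to be run there (for example through `docker exec`), using the paths as the container sees them.

```shell
# Validate the Prometheus config and the rules file before restarting.
# Arguments: path to prometheus.yml, path to the rules file.
# Assumption: promtool is installed on the host.
validate_prometheus_files() {
  config_file="$1"   # e.g. /opt/yaml/prometheus.yml
  rules_file="$2"    # e.g. /opt/rules/port.rules
  if ! command -v promtool >/dev/null 2>&1; then
    echo "promtool not found" >&2
    return 127
  fi
  # Both checks must pass for the function to return 0
  promtool check config "$config_file" && promtool check rules "$rules_file"
}
```

Typical usage would be `validate_prometheus_files /opt/yaml/prometheus.yml /opt/rules/port.rules`. A non-zero exit status means something needs fixing first.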

Then restart the Prometheus container

# Restart the container; note that your container name may differ from mine
docker restart prometheus-one
# Check the container's state
docker inspect prometheus-one
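
`docker inspect` prints a large JSON document, so a narrower check can be easier to read. The sketch below prints only the container's run state, and also shows a possible alternative to a full restart: sending SIGHUP, which Prometheus treats as a request to reload its configuration. The function names are invented for this example, and the reload variant assumes Prometheus is the main process of the container, as in the official image.

```shell
# Print only the run state of a container, e.g. "running" or "exited".
# Argument: container name.
container_state() {
  docker inspect --format '{{.State.Status}}' "$1"
}

# Ask Prometheus to reload its configuration without restarting the container.
# Assumption: Prometheus is the container's main process and receives the signal.
reload_prometheus() {
  docker kill --signal=HUP "$1"
}
```

For example, `container_state prometheus-one` should print `running` after a successful restart.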

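If everything is running normally, the web UI shows whether the rule has been loaded.

Both the probe job and the alert have been added successfully. You can also add a Grafana dashboard to make the data easier to view.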
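
The same information is also available from the command line through the Prometheus HTTP API, which can be handy on a host without a browser. The sketch below runs the alert expression `probe_success == 0` as an instant query and prints the JSON response. The function name `failed_probes` and the default address `http://127.0.0.1:9090` are assumptions for this example, so substitute the address of your own Prometheus.

```shell
# Query Prometheus for targets whose latest probe failed.
# Argument: Prometheus base URL (assumed default: http://127.0.0.1:9090).
failed_probes() {
  prom_url="${1:-http://127.0.0.1:9090}"
  # -G sends the URL-encoded query as GET parameters
  curl -fsS --max-time 10 -G "${prom_url}/api/v1/query" \
    --data-urlencode 'query=probe_success == 0'
}
```

An empty `result` array in the response means every probed target is currently reachable.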

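blackbox_exporter can do more than monitor ports: it can also use ping (ICMP) and HTTP probes to check whether a service or server is alive.

# Template configuration below; add the relevant parts to the Prometheus configuration file

  # Website monitoring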
  - job_name: 'http_status'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets: ['http://www.baidu.com']
        labels:
          instance: http_status
          group: web
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: 172.18.66.178:9115

  # Ping (ICMP) check
  - job_name: 'ping_status'
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets: ['192.168.31.62']
        labels:
          instance: 'ping_status'
          group: 'icmp'
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: 172.18.66.178:9115

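  # Port monitoring (job names must be unique, so rename this job if you keep the MongoDB 'port_status' job above)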
  - job_name: 'port_status'
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets: ['192.168.31.62:80']
        labels:
          instance: 'port_status'
          group: 'port'
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: 172.18.66.178:9115