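Prometheus monitoring falls into two categories:

  • White-box monitoring

  • Black-box monitoring

White-box monitoring: the day-to-day monitoring of host resource usage, container runtime state, and operational data from databases and middleware. These make up the infrastructure that supports the business and its services. White-box monitoring exposes their actual internal state, and watching these metrics makes it possible to anticipate problems and address potential sources of instability early.

Black-box monitoring: testing the externally visible behavior of a service from the user's point of view. Common black-box probes include HTTP, TCP, DNS, and ICMP probes, which check the reachability of sites and services, connectivity, certificate expiry time, and response latency.

Comparison: the biggest difference is that black-box monitoring is failure-oriented. When a failure occurs, black-box monitoring detects it quickly, whereas white-box monitoring focuses on proactively finding or predicting potential problems. A complete monitoring setup should find potential problems from the white-box side and quickly detect problems that have already occurred from the black-box side.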

Deploying Blackbox-Exporter in Kubernetes with Helm

Official repository: https://github.com/prometheus-community/helm-charts. It contains Helm charts for nearly all Prometheus-related components.

  • Add the repository

    $ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    $ helm repo update
  • Download and extract

    $ helm pull prometheus-community/prometheus-blackbox-exporter
    $ tar -zxvf prometheus-blackbox-exporter-*.tgz
    $ cd prometheus-blackbox-exporter
  • Edit values.yaml to adjust the blackbox_exporter configuration

      modules:
        http_2xx:  # HTTP probe module; every probe in Blackbox-Exporter is configured as a module
          prober: http
          timeout: 10s
          http:
            valid_http_versions: ["HTTP/1.1", "HTTP/2"]
            valid_status_codes: [200]  # Defaults to 2xx; an explicit status code is set here so it shows up clearly in Grafana panels.
            method: GET
            headers:
              Host: prometheus.example.com
              Accept-Language: en-US
              Origin: example.com
            preferred_ip_protocol: "ip4" # Preferred IP protocol
            no_follow_redirects: false # false means redirects are followed; set to true to stop following them
        http_post_2xx: # HTTP POST probe module
          prober: http
          timeout: 10s
          http:
            valid_http_versions: ["HTTP/1.1", "HTTP/2"]
            method: POST
            # headers and body for the POST request; both are optional
            headers:  # send the body as JSON
              Content-Type: application/json
            body: '{"text": "hello"}'
            preferred_ip_protocol: "ip4"
        tcp_connect:  # TCP probe module
          prober: tcp
          timeout: 10s
        dns_tcp:  # DNS-over-TCP probe module
          prober: dns
          dns:
            transport_protocol: "tcp"  # Default is udp
            preferred_ip_protocol: "ip4"  # Default is ip6
            query_name: "kubernetes.default.svc.cluster.local" # Domain name used to check the DNS server
            # query_type: "A"  # With kube-dns this must be set, because IPv6 is not supported


  • Deploy Blackbox-Exporter

    $ helm install blackbox-exporter -n monitor .
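Once the release is installed, each module configured in values.yaml is selected per request: the exporter's /probe endpoint takes a `module` and a `target` query parameter. The following is a minimal sketch of how such a URL is composed, useful when checking a module by hand. The helper name `probe_url` is hypothetical, and the exporter address is assumed to be the in-cluster Service address used in the scrape jobs in this article.

```python
from urllib.parse import urlencode


def probe_url(exporter: str, module: str, target: str) -> str:
    """Build the /probe URL asking Blackbox-Exporter to run `module` against `target`.

    `exporter` is a host:port string (an assumption; adjust to your Service).
    """
    # urlencode percent-escapes the target so URLs can be passed as a parameter.
    query = urlencode({"module": module, "target": target})
    return f"http://{exporter}/probe?{query}"
```

For example, `probe_url("prometheus-blackbox-exporter.monitor:9115", "http_2xx", "https://www.baidu.com")` yields a URL that can be fetched with any HTTP client from inside the cluster to see the raw probe metrics.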

Prometheus Operator configuration

  • HTTP monitoring (probing external domains)

          - job_name: 'blackbox_http_2xx'
            metrics_path: /probe
            params:
              module: [http_2xx]  # Look for a HTTP 200 response.
            static_configs:
              - targets:
                #- http://prometheus.io    # Target to probe with http.
                #- https://prometheus.io   # Target to probe with https.
                - https://www.baidu.com # Target to probe over HTTPS.
            relabel_configs:
              - source_labels: [__address__]
                target_label: __param_target
              - source_labels: [__param_target]
                target_label: instance
              - target_label: __address__
                replacement: prometheus-blackbox-exporter.monitor:9115  # The blackbox exporter's real hostname:port.
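  • Ping probe configuration

    On an internal network, ping (ICMP) can be used to check whether servers are alive. Taking the basic module configuration from earlier as the starting point, use the ping module in the Prometheus configuration file. Note that the values.yaml excerpt above does not define a module named ping, so one with prober: icmp has to be added under modules for this job to work: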

          - job_name: 'blackbox_ping_all'
            scrape_interval: 1m
            metrics_path: /probe
            params:
              module: [ping]
            static_configs:
              - targets:
                - 64.115.3.100
              labels:
                instance: test
            relabel_configs:
              - source_labels: [__address__]
                target_label: __param_target
              - target_label: __address__
                replacement: prometheus-blackbox-exporter.monitor:9115 # Blackbox-Exporter Service address and port
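The relabel_configs in the jobs above are what redirect the scrape from the listed target to the exporter. As a rough illustration, the three steps of the HTTP job can be simulated on a plain dictionary of labels. This is a simplified sketch, not Prometheus's actual relabeling engine (which also handles regexes, separators, and other actions); the function name `apply_blackbox_relabeling` is hypothetical.

```python
def apply_blackbox_relabeling(labels: dict, exporter_address: str) -> dict:
    """Mimic the three relabel_configs steps of the HTTP job on one label set."""
    out = dict(labels)
    # Step 1: copy the configured target into the `target` URL parameter.
    out["__param_target"] = out["__address__"]
    # Step 2: expose the probed target as the `instance` label on the results.
    out["instance"] = out["__param_target"]
    # Step 3: make Prometheus scrape the exporter instead of the target itself.
    out["__address__"] = exporter_address
    return out


result = apply_blackbox_relabeling(
    {"__address__": "https://www.baidu.com"},
    "prometheus-blackbox-exporter.monitor:9115",
)
```

After these steps `result["__address__"]` is the exporter, while `instance` and `__param_target` still carry the original target. The ping job skips the second step and sets `instance` through a static label instead.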

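For a Prometheus Operator deployed with Helm, edit values.yaml directly, add the jobs under prometheus.additionalScrapeConfigs (in the kube-prometheus-stack chart the full key is prometheus.prometheusSpec.additionalScrapeConfigs), and then upgrade the release.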


  • Update

    $ vim values.yaml
    $ helm upgrade prometheus -n monitor .
  • Open the Targets page of the Prometheus dashboard; the jobs defined above should be listed there
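A target shown as UP on the Targets page only means that Prometheus could scrape the exporter; whether the probe itself succeeded is reported by the `probe_success` metric (1 for success, 0 for failure). The sketch below parses a probe response in the Prometheus text format to read that value. The parser is deliberately simplified (it ignores timestamps and does not split labels), `parse_metrics` is a hypothetical helper, and the sample text and its values are illustrative rather than real output.

```python
def parse_metrics(text: str) -> dict:
    """Parse simple samples from Prometheus text exposition format into a dict."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated token on the line.
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


# Illustrative sample only; real responses contain many more metrics.
SAMPLE = """\
# TYPE probe_success gauge
probe_success 1
# TYPE probe_duration_seconds gauge
probe_duration_seconds 0.25
"""

metrics = parse_metrics(SAMPLE)
probe_ok = metrics["probe_success"] == 1.0
```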


The same approach can be used to add other types of black-box probes

  • DNS monitoring

    - job_name: "blackbox-k8s-service-dns"
      scrape_interval: 30s
      scrape_timeout: 10s
      metrics_path: /probe # The path is /probe, not /metrics
      params:
        module: [dns_tcp] # Use the DNS-over-TCP module
      static_configs:
      - targets:
        - kube-dns.kube-system:53  # Do not omit the port number
      relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: prometheus-blackbox-exporter.monitor:9115 # Blackbox-Exporter Service address and port
  • ICMP monitoring

    - job_name: node_status
      metrics_path: /probe
      params:
        module: [icmp]  # requires a module named icmp (prober: icmp) in the Blackbox-Exporter config
      static_configs:
        - targets: ['10.165.94.31']
          labels:
            instance: node_status
            group: 'node'
      relabel_configs:
        - source_labels: [__address__]
          target_label: __param_target
        - target_label: __address__
          replacement: prometheus-blackbox-exporter.monitor:9115 # Blackbox-Exporter Service address and port
  • TCP monitoring

    - job_name: 'prometheus_port_status'
      metrics_path: /probe
      params:
        module: [tcp_connect]
      static_configs:
        - targets: ['172.19.155.133:8765']
          labels:
            instance: 'port_status'
            group: 'tcp'
      relabel_configs:
        - source_labels: [__address__]
          target_label: __param_target
        - source_labels: [__param_target]
          target_label: instance
        - target_label: __address__
          replacement: prometheus-blackbox-exporter.monitor:9115 # Blackbox-Exporter Service address and port
  • SSL certificate expiry monitoring

    rule_files:
      - ssl_expiry.rules
    scrape_configs:
      - job_name: 'blackbox'
        metrics_path: /probe
        params:
          module: [http_2xx]  # Look for a HTTP 200 response.
        static_configs:
          - targets:
            - https://example.com  # Target to probe; the https:// scheme is needed so that certificate metrics are produced
        relabel_configs:
          - source_labels: [__address__]
            target_label: __param_target
          - source_labels: [__param_target]
            target_label: instance
          - target_label: __address__
            replacement: prometheus-blackbox-exporter.monitor:9115 # Blackbox-Exporter Service address and port

    ssl_expiry.rules

    groups: 
      - name: ssl_expiry.rules 
        rules: 
          - alert: SSLCertExpiringSoon 
            expr: probe_ssl_earliest_cert_expiry{job="blackbox"} - time() < 86400 * 30 
            for: 10m
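The alert expression compares two Unix timestamps: `probe_ssl_earliest_cert_expiry` is the expiry time of the earliest-expiring certificate in the chain, `time()` is the current time, and `86400 * 30` is 30 days in seconds. The arithmetic can be checked with a small sketch; the function names and the example timestamps below are illustrative assumptions, not part of Prometheus.

```python
SECONDS_PER_DAY = 86400


def days_until_expiry(cert_expiry_ts: float, now_ts: float) -> float:
    """Remaining certificate lifetime in days, given two Unix timestamps."""
    return (cert_expiry_ts - now_ts) / SECONDS_PER_DAY


def should_alert(cert_expiry_ts: float, now_ts: float, threshold_days: int = 30) -> bool:
    """Mirror the rule expression: expiry - time() < 86400 * threshold_days."""
    return cert_expiry_ts - now_ts < SECONDS_PER_DAY * threshold_days


now = 1_700_000_000  # arbitrary example timestamp
expiring_soon = now + 10 * SECONDS_PER_DAY  # certificate with 10 days left
healthy = now + 45 * SECONDS_PER_DAY        # certificate with 45 days left
```

With these values `should_alert(expiring_soon, now)` is true and `should_alert(healthy, now)` is false. In the actual rule, `for: 10m` additionally requires the condition to hold for 10 minutes before the alert fires.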

