Prometheus
Prometheus is typically paired with Grafana to display the monitoring data and with the Alertmanager component to send early-warning alerts.

Prometheus

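Prometheus is an open-source monitoring system written in Go, modeled on Google's BorgMon. It works by periodically scraping the state of monitored components over HTTP, so any component can be monitored as long as it exposes a suitable HTTP endpoint. The program that serves monitoring data on such an endpoint is called an exporter; to monitor a service, download the matching exporter.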
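To make the scrape model concrete: an exporter answers a plain HTTP GET with lines of the form `metric_name value`, preceded by `# HELP` and `# TYPE` comments. The helper below is a minimal sketch that pulls one unlabelled metric out of such a response; the function name `metric_value` is hypothetical and not part of any Prometheus tooling.

```shell
# Print the value of an unlabelled metric from Prometheus text-format
# data read on stdin. "# HELP" and "# TYPE" lines never match because
# their first field is "#".
metric_value() {
  local name="$1"
  awk -v name="$name" '$1 == name { print $2; exit }'
}
```

Assuming a Prometheus server is already listening on its default port, `curl -s http://localhost:9090/metrics | metric_value go_goroutines` would print the current goroutine count of the server itself.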

Install Prometheus

wget https://github.com/prometheus/prometheus/releases/download/v2.37.4/prometheus-2.37.4.linux-amd64.tar.gz
tar -zxvf prometheus-2.37.4.linux-amd64.tar.gz -C /app/service
mv /app/service/prometheus-2.37.4.linux-amd64 /app/service/prometheus
mkdir -pv /app/data/prometheus

Edit the configuration: vi /app/service/prometheus/prometheus.yml

# Global configuration
global:
  scrape_interval: 15s     # scrape targets every 15s
  evaluation_interval: 15s # evaluate alerting rules every 15s
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files: # alerting rule files
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs: # targets to monitor
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]
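Because YAML indentation mistakes are easy to make in this file, it can help to validate it before starting or reloading the server. The release tarball ships a `promtool` binary next to `prometheus`, and `promtool check config` parses a configuration file. The wrapper below is a sketch: the function name and the `PROMTOOL` override variable are assumptions, and the default paths simply follow the install location used above.

```shell
# Validate a Prometheus configuration file with promtool.
# PROMTOOL may be set to point at a different promtool binary.
check_prometheus_config() {
  local config="${1:-/app/service/prometheus/prometheus.yml}"
  local promtool="${PROMTOOL:-/app/service/prometheus/promtool}"
  if [ ! -x "$promtool" ]; then
    echo "promtool not found at $promtool" >&2
    return 127
  fi
  # Exit status is promtool's: zero when the file parses cleanly.
  "$promtool" check config "$config"
}
```

Calling `check_prometheus_config` with no arguments checks the file edited above.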

Start Prometheus

## Run in the foreground
/app/service/prometheus/prometheus --config.file=/app/service/prometheus/prometheus.yml --storage.tsdb.path=/app/data/prometheus
## Run in the background; "--web.enable-lifecycle" allows reloading the configuration via an HTTP request
## Prometheus logs to stderr, so redirect it into the log file as well
nohup /app/service/prometheus/prometheus --config.file=/app/service/prometheus/prometheus.yml --storage.tsdb.path=/app/data/prometheus --web.enable-lifecycle > /app/logs/prometheus.log 2>&1 &

ps -ef | grep prometheus
## Reload the configuration
curl -X POST localhost:9090/-/reload
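After a background start, the process showing up in `ps` does not yet mean the server can answer queries. Prometheus exposes a `/-/ready` endpoint that returns 200 once it is ready to serve traffic. The polling loop below is a sketch; the function name and the default retry count and delay are arbitrary assumptions.

```shell
# Poll the readiness endpoint until it answers successfully or the
# retries run out. Arguments: URL, number of attempts, delay in seconds.
wait_for_ready() {
  local url="${1:-http://localhost:9090/-/ready}"
  local retries="${2:-30}"
  local delay="${3:-1}"
  local i=0
  while [ "$i" -lt "$retries" ]; do
    # -f makes curl exit non-zero on HTTP errors such as 503 (not ready yet)
    if curl -fsS -o /dev/null --max-time 2 "$url" 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

A start script could then run `wait_for_ready || echo "Prometheus did not become ready" >&2` right after the `nohup` line.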

Systemd service

# Quote the heredoc delimiter so the shell does not expand $MAINPID; use > so a rerun does not append a second copy
cat << 'EOF' > /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io
After=network.target

[Service]
Type=simple
ExecStart=/app/service/prometheus/prometheus \
  --config.file=/app/service/prometheus/prometheus.yml \
  --storage.tsdb.path=/app/data/prometheus \
  --storage.tsdb.retention.time=15d \
  --web.enable-lifecycle
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

Reload systemd and enable the service at boot

## Reload unit files
systemctl daemon-reload
## Start the service
systemctl start prometheus
## Enable start on boot
systemctl enable prometheus

Node monitoring component (node_exporter)

wget https://github.com/prometheus/node_exporter/releases/download/v1.4.0/node_exporter-1.4.0.linux-amd64.tar.gz
tar -zxvf node_exporter-1.4.0.linux-amd64.tar.gz -C /app/service
mv /app/service/node_exporter-1.4.0.linux-amd64 /app/service/node_exporter
# Show the available options
/app/service/node_exporter/node_exporter -h
# Start in the background (the exporter logs to stderr)
nohup /app/service/node_exporter/node_exporter > /app/logs/node_exporter.log 2>&1 &
ps -ef | grep node_exporter

Add to the configuration: vi /app/service/prometheus/prometheus.yml

  - job_name: "node_exporter"
    static_configs:
      - targets: ["192.168.254.100:9100"]
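Every exporter in this guide is registered with the same three-line stanza under `scrape_configs:`, differing only in job name and target. As an illustration, the helper below prints such a stanza with the indentation the default prometheus.yml uses; the function name `scrape_job` is hypothetical.

```shell
# Print a static scrape job stanza, indented to sit under `scrape_configs:`
# (two spaces for the list item, matching the default prometheus.yml).
scrape_job() {
  local job="$1" target="$2"
  printf '  - job_name: "%s"\n' "$job"
  printf '    static_configs:\n'
  printf '      - targets: ["%s"]\n' "$target"
}
```

Appending with `scrape_job node_exporter 192.168.254.100:9100 >> /app/service/prometheus/prometheus.yml` only works while `scrape_configs` is the last top-level key in the file, as it is in the default configuration; otherwise paste the output in by hand. Either way, Prometheus picks up the change only after a reload.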

Systemd service

cat << 'EOF' > /etc/systemd/system/node_exporter.service
[Unit]
Description=node_exporter
Documentation=https://prometheus.io
After=network.target

[Service]
Type=simple
ExecStart=/app/service/node_exporter/node_exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

Reload systemd and enable the service at boot

## Reload unit files
systemctl daemon-reload
## Start the service
systemctl start node_exporter
## Enable start on boot
systemctl enable node_exporter

MySQL monitoring component (mysqld_exporter)

wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.15.0/mysqld_exporter-0.15.0.linux-amd64.tar.gz
tar -zxvf mysqld_exporter-0.15.0.linux-amd64.tar.gz -C /app/service
mv /app/service/mysqld_exporter-0.15.0.linux-amd64 /app/service/mysqld_exporter

Create a MySQL user and grant privileges

CREATE USER 'monitor'@'localhost' IDENTIFIED BY 'Monitor@123' WITH MAX_USER_CONNECTIONS 3;
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'monitor'@'localhost';

Create the my.cnf configuration file: vi /app/service/mysqld_exporter/my.cnf
[client]
host=127.0.0.1
port=3306
user=monitor
password=Monitor@123

Systemd service

cat << 'EOF' > /etc/systemd/system/mysqld_exporter.service
[Unit]
Description=mysqld_exporter
Documentation=https://prometheus.io
After=network.target

[Service]
Type=simple
ExecStart=/app/service/mysqld_exporter/mysqld_exporter \
--config.my-cnf /app/service/mysqld_exporter/my.cnf \
--web.listen-address=:9104 \
--log.level=info \
--log.format=logfmt
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

Reload systemd and enable the service at boot

## Reload unit files
systemctl daemon-reload
## Start the service
systemctl start mysqld_exporter
## Enable start on boot
systemctl enable mysqld_exporter

Add to the configuration: vi /app/service/prometheus/prometheus.yml
  - job_name: "mysqld_exporter"
    static_configs:
      - targets: ["192.168.254.100:9104"]

Grafana dashboard ID: 17320

Redis monitoring component (redis_exporter)

wget https://github.com/oliver006/redis_exporter/releases/download/v1.45.0/redis_exporter-v1.45.0.linux-amd64.tar.gz
tar -zxvf redis_exporter-v1.45.0.linux-amd64.tar.gz -C /app/service
mv /app/service/redis_exporter-v1.45.0.linux-amd64 /app/service/redis_exporter
# Show the available options
/app/service/redis_exporter/redis_exporter -h
# Start in the background, capturing stderr in the log file as well
nohup /app/service/redis_exporter/redis_exporter > /app/logs/redis_exporter.log 2>&1 &
ps -ef | grep redis_exporter

Add to the configuration: vi /app/service/prometheus/prometheus.yml

  - job_name: "redis_exporter"
    scrape_interval: 10s
    static_configs:
      - targets: ["192.168.254.100:9121"]

Alertmanager alerting component

Alertmanager receives the alerts that Prometheus sends. It supports many notification channels and makes it straightforward to deduplicate, group, and suppress noisy alerts.

Install Alertmanager

wget https://github.com/prometheus/alertmanager/releases/download/v0.24.0/alertmanager-0.24.0.linux-amd64.tar.gz
tar -zxvf alertmanager-0.24.0.linux-amd64.tar.gz -C /app/service
mv /app/service/alertmanager-0.24.0.linux-amd64 /app/service/alertmanager

Edit the configuration: vi /app/service/alertmanager/alertmanager.yml
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'web.hook'
receivers:
  - name: 'web.hook'
    webhook_configs:
      - url: 'http://127.0.0.1:5001/'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']

Check the configuration after editing
/app/service/alertmanager/amtool check-config /app/service/alertmanager/alertmanager.yml

Start Alertmanager
/app/service/alertmanager/alertmanager --config.file /app/service/alertmanager/alertmanager.yml
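Alertmanager only routes alerts; the alerts themselves are produced by rules that Prometheus evaluates. For anything to arrive here, prometheus.yml needs a rule file listed under `rule_files` and an Alertmanager address under `alerting` (the commented `alertmanager:9093` entry in the default file shows the place). The sketch below writes one illustrative rule; the function name, group name, and the 5-minute threshold are assumptions, while `up == 0` is the standard expression for a target that failed its scrape.

```shell
# Write a minimal alerting rule file to the given path. The heredoc
# delimiter is quoted so the shell leaves the {{ $labels.instance }}
# template for Prometheus to fill in.
write_instance_down_rule() {
  local path="$1"
  cat > "$path" << 'EOF'
groups:
  - name: availability
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} has been down for 5 minutes"
EOF
}
```

For example, `write_instance_down_rule /app/service/prometheus/first_rules.yml` creates a file that matches the commented `first_rules.yml` entry in the default configuration. The `severity: critical` label is what the inhibit rule above matches as its source.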

Grafana

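Grafana is a cross-platform, open-source tool for metrics analysis and visualization. It queries the collected data, displays it graphically, and can send timely notifications.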

Install Grafana

wget https://dl.grafana.com/enterprise/release/grafana-enterprise-9.0.0.linux-amd64.tar.gz
tar -zxvf grafana-enterprise-9.0.0.linux-amd64.tar.gz -C /app/service
mv /app/service/grafana-9.0.0 /app/service/grafana

Edit the configuration file as needed

vi /app/service/grafana/conf/defaults.ini

Start Grafana

nohup /app/service/grafana/bin/grafana-server --homepath=/app/service/grafana --config=/app/service/grafana/conf/defaults.ini &
ps -ef | grep grafana-server

Systemd service

# Quote the heredoc delimiter so the shell does not expand $MAINPID
cat << 'EOF' > /etc/systemd/system/grafana.service
[Unit]
Description=Grafana Server
Documentation=https://grafana.com/docs/
After=network.target

[Service]
Type=simple
ExecStart=/app/service/grafana/bin/grafana-server \
--homepath=/app/service/grafana \
--config=/app/service/grafana/conf/defaults.ini
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

Reload systemd and enable the service at boot

## Reload unit files
systemctl daemon-reload
## Start the service
systemctl start grafana
## Enable start on boot
systemctl enable grafana

Visualization

Open IP:3000 in a browser and log in with the initial username and password, both of which are admin.

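Data sources

In the Grafana left-hand toolbar, select Configuration and click Data sources beneath it to open the data source page.

Click the Add data source button and choose the first option, Prometheus.

Under HTTP, set URL to the Prometheus address and leave Access at the default Server (proxy) mode. When done, scroll to the bottom and click Save & test; a success message indicates that the data source was added.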
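The same data source can also be declared in a file instead of clicked together in the UI: Grafana reads YAML files from its provisioning directory at startup. The sketch below writes such a file; the function name and the file name `prometheus.yaml` are assumptions, and with the home path used in this guide the default directory would be /app/service/grafana/conf/provisioning/datasources. `access: proxy` corresponds to the Server access mode described above.

```shell
# Write a Grafana data source provisioning file into the given directory.
# Arguments: provisioning directory, Prometheus URL (defaults to localhost).
write_prometheus_datasource() {
  local dir="$1" url="${2:-http://localhost:9090}"
  mkdir -p "$dir"
  # Unquoted delimiter on purpose: $url is expanded by the shell.
  cat > "$dir/prometheus.yaml" << EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: $url
    isDefault: true
EOF
}
```

After running `write_prometheus_datasource /app/service/grafana/conf/provisioning/datasources`, restart Grafana so that it picks up the file.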

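Dashboards

See also: the official dashboard library.
From the Grafana left-hand menu, import an existing node_exporter dashboard.

Enter a dashboard ID (1860, 8919, or 16098) and click Load.

Select the Prometheus data source and click Import to get a complete dashboard of node_exporter metrics.

Then save it.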

Reset the password

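sqlite3 /app/service/grafana/data/grafana.db
-- List the tables in the database
.tables
-- Show the contents of the user table
select * from user;
-- Reset the admin password to the default (admin); the hash is only valid together with this salt
update user set password = '59acf18b94d7eb0694c61e60ce44c110c7a683ac6a8f09580d626f90f4a242000746579358d77dd9e570e83fa24faa88a8a6', salt = 'F3FAxVm33R' where login = 'admin';
-- Make a given user an administrator
update user set is_admin = 1 where login = 'xxxx';
-- Exit sqlite3
.exit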

Container deployment

docker run -d --name prometheus -p 9090:9090 prom/prometheus
docker run -d --name=grafana -p 3000:3000 -v /var/lib/grafana:/var/lib/grafana grafana/grafana-oss