hms/docker/alertmanager/config.yml
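iven 6457c53d9c feat(docker): PP-04 observability MVP: Alertmanager alert outlet + Grafana provisioning
PP-04 is confirmed: 11 alert rules are loaded in prometheus but there is no
alertmanager (alerts have no notification outlet), the grafana provisioning
directory is empty, and the exporter services are not deployed either
("fully configured, nothing running").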

The MVP wires up the alerting path and makes grafana usable (no dependency on exporters, based on app metrics):
- docker-compose.production.yml: add the alertmanager service + alertmanager_data volume
- prometheus.yml: add alerting pointing at alertmanager:9093
- alertmanager/config.yml: routing (SEV-1 critical notified immediately + grouping)
- grafana/provisioning/datasources: connect to prometheus automatically
- grafana/provisioning/dashboards: provider in place
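
For reference, the prometheus.yml change listed above would typically look something like the sketch below. It follows the standard Prometheus `alerting` block and is not copied from the committed file; only the `alertmanager:9093` target comes from the commit message, and it is assumed to resolve over the compose network.

```yaml
# prometheus.yml (sketch, not the committed file)
alerting:
  alertmanagers:
    - static_configs:
        # Target named in the commit message; assumed to be the compose service name
        - targets: ["alertmanager:9093"]
```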

TODO (before go-live): ① replace the alertmanager placeholder webhook with a real channel (DingTalk/WeCom/email)
② add the grafana dashboard JSON ③ deploy the postgres/redis/nginx exporters so prometheus has targets to scrape
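
As a rough illustration of item ③, the scrape jobs for the exporters might look like the sketch below. The service names (`postgres-exporter`, `redis-exporter`, `nginx-exporter`) are hypothetical and not from this repository; the ports are the usual defaults of the commonly used exporters and should be checked against whatever images are actually deployed.

```yaml
# prometheus.yml scrape jobs (sketch; service names are hypothetical)
scrape_configs:
  - job_name: "postgres"
    static_configs:
      - targets: ["postgres-exporter:9187"]  # assumed default postgres_exporter port
  - job_name: "redis"
    static_configs:
      - targets: ["redis-exporter:9121"]     # assumed default redis_exporter port
  - job_name: "nginx"
    static_configs:
      - targets: ["nginx-exporter:9113"]     # assumed default nginx-prometheus-exporter port
```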
2026-06-26 09:25:43 +08:00

35 lines
1.2 KiB
YAML

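# Alertmanager alert notification config
#
# ⚠️ TODO (required before go-live): replace receivers.default.webhook_configs with a real notification channel:
# - DingTalk robot: https://oapi.dingtalk.com/robot/send?access_token=XXX
# - WeCom group robot: https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=XXX
# - Email (SMTP): configure global.smtp_* + email_configs
#
# Currently a placeholder webhook pointing at an invalid endpoint: alertmanager starts, but alert POSTs fail and are logged.
# PP-04 MVP goal: wire up the prometheus → alertmanager path first; fill in the channel before go-live.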
global:
  resolve_timeout: 5m
# Routing: group by alertname + service; SEV-1 (critical) is matched first and notified immediately
route:
  receiver: "default"
  group_by: ["alertname", "service"]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    # SEV-1 critical alerts (DB down / 5xx spike / Redis unreachable): notify immediately, repeat every 5 minutes
    - matchers:
        - severity = "critical"
      receiver: "default"
      group_wait: 0s
      repeat_interval: 5m
receivers:
  - name: "default"
    # Placeholder: replace with a real webhook before go-live
    webhook_configs:
      - url: "http://placeholder.invalid/alert"
        send_resolved: true
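
One way the placeholder receiver could be swapped for the email option mentioned in the header comment is sketched below. The SMTP host, addresses, and password are made-up placeholders, not values from this repository; the field names (`smtp_smarthost`, `smtp_from`, `smtp_auth_username`, `smtp_auth_password`, `email_configs`) are standard Alertmanager settings.

```yaml
# alertmanager/config.yml (sketch: email instead of the placeholder webhook)
global:
  resolve_timeout: 5m
  smtp_smarthost: "smtp.example.com:587"          # hypothetical SMTP relay
  smtp_from: "alertmanager@example.com"           # hypothetical sender address
  smtp_auth_username: "alertmanager@example.com"  # hypothetical account
  smtp_auth_password: "CHANGE_ME"                 # inject from a secret, do not commit
receivers:
  - name: "default"
    email_configs:
      - to: "oncall@example.com"                  # hypothetical recipient
        send_resolved: true
```

The `route` section can stay as it is, since both the top-level route and the critical sub-route already point at the `default` receiver.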