
Monitoring

Set up observability for applications and infrastructure with metrics, logs, traces, and alerts.

Author: admin | Source: ClawHub | Version: 1.0.0 | Security check: passed | Downloads: 4,561 | Favorites: 4


## Complexity Levels

| Level | Tools | Setup Time | Best For |
|-------|-------|------------|----------|
| **Minimal** | UptimeRobot, Healthchecks.io | 15 min | Side projects, MVPs |
| **Standard** | Uptime Kuma, Sentry, basic Grafana | 1-2 hours | Small teams, startups |
| **Professional** | Prometheus, Grafana, Loki, Alertmanager | 1-2 days | Production systems |
| **Enterprise** | Datadog, New Relic, or full OSS stack | Ongoing | Large-scale operations |

## The Three Pillars

| Pillar | What It Answers | Tools |
|--------|-----------------|-------|
| **Metrics** | "How is the system performing?" | Prometheus, Grafana, Datadog |
| **Logs** | "What happened?" | Loki, ELK, CloudWatch |
| **Traces** | "Why is this request slow?" | Jaeger, Tempo, Sentry |

## Quick Start by Use Case

**"I just want to know if it's down"** → UptimeRobot (free) or Uptime Kuma (self-hosted). See `simple.md`.

**"I need to debug production errors"** → Sentry with your framework SDK. 5-minute setup. See `apm.md`.

**"I want real observability"** → Prometheus + Grafana + Loki. See `prometheus.md`.

**"I need to centralize logs"** → Loki for simple setups, ELK for complex queries. See `logs.md`.

## What to Monitor

### Applications (RED Method)

- **R**ate: requests per second
- **E**rrors: error rate by endpoint
- **D**uration: latency (p50, p95, p99)

### Infrastructure (USE Method)

- **U**tilization: CPU, memory, disk usage
- **S**aturation: queue depth, load average
- **E**rrors: hardware/system errors

## Alerting Principles

| Do | Don't |
|----|-------|
| Alert on symptoms (user impact) | Alert on causes (CPU high) |
| Include runbook link | Require investigation to understand |
| Set appropriate severity | Make everything P1 |
| Require action | Alert on "interesting" metrics |

**Alert fatigue kills monitoring.** If alerts are ignored, you have no monitoring.

For alert configuration, severities, and on-call setup, see `alerting.md`.
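The RED method and the "alert on symptoms" principle above can be illustrated with a small sketch. It is not taken from the skill's own files: the `Request` record, the function names, and the thresholds (5% error ratio, 1000 ms p95) are all assumptions chosen for illustration, and a real setup would normally let Prometheus or a similar tool compute these values.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class Request:
    """One observed request (hypothetical record shape)."""
    duration_ms: float
    status: int


def percentile(sorted_values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of an already sorted, non-empty sequence."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def red_summary(requests: Sequence[Request], window_seconds: float) -> Dict[str, float]:
    """Compute Rate, Errors, and Duration for one time window."""
    if not requests:
        raise ValueError("no requests observed in this window")
    durations = sorted(r.duration_ms for r in requests)
    # Assumption: only 5xx responses count as errors.
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "rate_per_s": len(requests) / window_seconds,
        "error_ratio": errors / len(requests),
        "p50_ms": percentile(durations, 50),
        "p95_ms": percentile(durations, 95),
        "p99_ms": percentile(durations, 99),
    }


def should_alert(summary: Dict[str, float],
                 max_error_ratio: float = 0.05,
                 max_p95_ms: float = 1000.0) -> bool:
    """Alert on user-visible symptoms (errors, slowness), not on causes."""
    return (summary["error_ratio"] > max_error_ratio
            or summary["p95_ms"] > max_p95_ms)
```

Calling `red_summary(window, 60)` on the requests seen in the last minute and passing the result to `should_alert` gives a single yes/no decision tied to user impact, which is easier to pair with a runbook than a raw CPU threshold.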
## Cost Comparison

| Solution | Monthly Cost (small) | Monthly Cost (medium) |
|----------|---------------------|----------------------|
| UptimeRobot | Free | $7 |
| Uptime Kuma | $5 (VPS) | $5 (VPS) |
| Sentry | Free / $26 | $80 |
| Grafana Cloud | Free tier | $50+ |
| Datadog | $15/host | $23/host + features |
| Self-hosted stack | $10-20 (VPS) | $50-100 (VPS) |

## Common Mistakes

- Starting with Prometheus/Grafana when Uptime Kuma would suffice
- No alerting (dashboards nobody watches)
- Too many alerts (alert fatigue → ignored)
- Missing runbooks (alert fires, nobody knows what to do)
- Not monitoring from outside (only internal checks)
- Storing logs forever (cost explodes)
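One of the mistakes listed above is relying only on internal checks. As a minimal sketch of what an outside-in probe does, the snippet below requests a URL from a separate machine and reports whether it answered with a 2xx status. The `probe` name, the `opener` parameter (added only so the function can be exercised without network access), and the example URL are assumptions for illustration; hosted tools such as UptimeRobot or Uptime Kuma cover the same ground with scheduling and notifications built in.

```python
import urllib.request
from typing import Callable


def probe(url: str,
          timeout: float = 5.0,
          opener: Callable = urllib.request.urlopen) -> bool:
    """Return True when the URL answers with a 2xx status within the timeout."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses,
        # so DNS failures, refused connections, and 4xx/5xx replies land here.
        return False
```

Run from a scheduler on a host outside the monitored network, for example `probe("https://example.com/health")` with a hypothetical health endpoint, this catches failures that internal checks cannot see, such as DNS or load balancer outages.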

Tags: skill, ai

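Install via chat

This skill can be installed through conversation on the following platforms:

OpenClaw, WorkBuddy, QClaw, Kimi, Claude

Option 1: Install SkillHub and the skill

Help me install SkillHub and the monitoring-1776420088 skill

Option 2: Set SkillHub as the preferred skill source

Set SkillHub as my preferred skill installation source, then help me install the monitoring-1776420088 skill

Install from the command line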

skillhub install monitoring-1776420088

Download the Zip package

Monitoring v1.0.0

File size: 11.15 KB | Published: 2026-4-17 19:59

Version history

v1.0.0 (latest), 2026-4-17 19:59
Initial release
