返回顶部
b

bgp-analysis

>-

作者: admin | 来源: ClawHub
源自
ClawHub
版本
V 0.1.1
安全检测
已通过
112
下载量
0
收藏
概述
安装方式
版本历史

bgp-analysis

# BGP Protocol Analysis Protocol-reasoning-driven analysis skill for BGP peering, path selection, and route propagation. Unlike device health checks that compare metrics against thresholds, BGP analysis requires interpreting protocol state machines, walking the best-path algorithm, and validating policy application across the control plane. Commands are labeled **[Cisco]**, **[JunOS]**, or **[EOS]** where syntax diverges. Unlabeled statements apply to all three vendors. ## When to Use - BGP peer reported down or stuck in a non-Established state - Suspected route leak — prefixes appearing in tables where they should not - Path selection not matching expectations after policy changes - Convergence too slow after planned maintenance or unplanned failover - Post-change verification of BGP configuration (new peers, policy updates, community changes) - Capacity planning for prefix table growth or session scaling - Investigating asymmetric routing caused by inconsistent BGP attributes ## Prerequisites - SSH or console access to the router (read-only privilege sufficient) - BGP process running on the device with at least one configured peer - Knowledge of expected peer topology: which neighbors should be up, expected prefix counts per peer, intended path selection outcomes - Awareness of configured routing policy: route-maps, prefix-lists, community filters, AS-path access lists, and local-preference assignments - For iBGP: understanding of the route-reflector or full-mesh topology ## Procedure Follow this diagnostic flow sequentially. Each step builds on the data from prior steps. The procedure moves from broad inventory to targeted diagnosis. ### Step 1: BGP Session Inventory Collect all peer states and compare against expected topology. **[Cisco]** ``` show bgp ipv4 unicast summary ``` **[JunOS]** ``` show bgp summary ``` **[EOS]** ``` show ip bgp summary ``` Record each neighbor: address, AS number, state, prefixes received, up/down time. Compare against expected topology — every configured peer should appear. A peer missing from output means it was never configured or was removed. Any peer not showing a numeric prefix count is not Established — proceed to Step 2 for that peer. ### Step 2: Peer State Diagnosis For any peer not in Established state, the BGP FSM state reveals the failure domain. This is the core diagnostic reasoning step. **[Cisco]** ``` show bgp ipv4 unicast neighbors [addr] | include state|last reset|error ``` **[JunOS]** ``` show bgp neighbor [addr] | match "State|Last Error|Last State" ``` **[EOS]** ``` show ip bgp neighbors [addr] | include state|last reset|error ``` Interpret the FSM state to isolate the failure: - **Idle** → BGP process not attempting connection. Causes: administratively shut down, no route to peer address, configured remote AS does not match, or maximum-prefix limit hit triggering teardown. - **Connect** → TCP SYN sent, waiting for response. Peer is unreachable at Layer 3 or a firewall is blocking TCP port 179. - **Active** → TCP connection attempt failed, retrying. Same causes as Connect but the router has cycled back. Check: ACLs blocking port 179, peer not configured for this neighbor, peer address unreachable. - **OpenSent** → TCP connected, OPEN message sent, no reply. Remote end accepted TCP but is not sending OPEN — typically remote BGP not configured for this neighbor or remote peer in admin shutdown. - **OpenConfirm** → OPEN received but parameters rejected. Check: AS number mismatch, capability negotiation failure (AFI/SAFI mismatch), hold timer negotiation failure, authentication (MD5/TCP-AO) mismatch. Check "last reset reason" and "last error" fields — they often provide the definitive cause. Reference: `references/state-machine.md` for full FSM detail. ### Step 3: Route Table Analysis For Established peers, verify prefix exchange matches expectations. **[Cisco]** ``` show bgp ipv4 unicast neighbors [addr] routes | include Total ``` **[JunOS]** ``` show route receive-protocol bgp [addr] table summary ``` **[EOS]** ``` show ip bgp neighbors [addr] received-routes | include Total ``` Compare received prefix count against baseline. Significant deviation indicates: - Drop >10% → upstream is withdrawing routes (maintenance, filter change, or failure) - Increase >10% → route leak or new prefixes originated upstream - Zero received → peer is Established but sending no routes (missing export policy on JunOS, or outbound filter on remote blocking everything) Check advertised prefix count similarly — confirm this router is sending the expected number of routes to each peer. ### Step 4: Path Selection Verification When traffic takes an unexpected path, walk the BGP best-path algorithm to identify which attribute is making the selection. **[Cisco]** ``` show bgp ipv4 unicast [prefix] bestpath ``` **[JunOS]** ``` show route [prefix] detail | match "AS path|Local|MED|Weight|preference" ``` **[EOS]** ``` show ip bgp [prefix] detail ``` The best-path algorithm evaluates in this order (first difference wins): 1. **Weight** (Cisco/EOS local, highest wins — JunOS does not use weight) 2. **Local Preference** (highest wins, default 100) 3. **Locally originated** (network/aggregate preferred over learned) 4. **AS Path length** (shortest wins) 5. **Origin** (IGP < EGP < Incomplete) 6. **MED** (lowest wins, compared only within same neighbor AS by default) 7. **eBGP over iBGP** (external preferred) 8. **IGP metric to next-hop** (lowest wins) 9. **Router ID** (lowest wins, tiebreaker) Identify which attribute selects the current best path. If unexpected, check the route-map or policy applying that attribute on ingress. ### Step 5: Route Filtering Validation Verify that route-maps, prefix-lists, and community filters apply as intended. **[Cisco]** ``` show bgp ipv4 unicast neighbors [addr] policy ``` **[JunOS]** ``` show policy [policy-name] | display detail ``` **[EOS]** ``` show route-map [name] ``` For suspected route leaks: examine the RIB for prefixes that should not be present. Check inbound filters on the peer that is the source. Common leak causes: missing or misordered prefix-list entry, regex error in AS-path filter, community match that is too broad. For missing routes: verify the outbound policy on the advertising peer is not filtering the prefix. On JunOS, a peer with no export policy sends nothing by default — this is the most common JunOS-specific omission. ### Step 6: Convergence Assessment Evaluate convergence behavior and route stability. **[Cisco]** ``` show bgp ipv4 unicast dampening dampened-paths ``` **[JunOS]** ``` show route damping suppressed ``` **[EOS]** ``` show ip bgp dampening dampened-paths ``` Check for dampened (suppressed) routes — these indicate persistent flapping. Review the BGP update activity: high update/withdrawal rates indicate churn. After a planned change, measure convergence time from the change event to the last BGP update. Compare against the target convergence window. ## Threshold Tables Operational parameter norms for BGP — these are protocol-level expectations, not device resource thresholds. | Parameter | Cisco Default | JunOS Default | EOS Default | Notes | |-----------|--------------|---------------|-------------|-------| | Hold Timer | 180s | 90s | 180s | Negotiated to lower value | | Keepalive Interval | 60s | 30s | 60s | Hold/3 by convention | | ConnectRetry Timer | 120s | Varies | 120s | Time between TCP attempts | | MRAI (eBGP) | 30s | 0s (immediate) | 30s | Minimum Route Advertisement Interval | | MRAI (iBGP) | 5s | 0s | 5s | Lower than eBGP for faster iBGP convergence | | Default Local Pref | 100 | 100 | 100 | Same across vendors | **Table Size Norms (IPv4 unicast):** | Deployment Type | Expected Prefixes | Warning | Critical | |----------------|-------------------|---------|----------| | Internet edge (full table) | ~950K | >1M | >1.1M | | Internet edge (partial) | 5K–100K | Varies | Per design | | Enterprise WAN | 100–10K | >2x baseline | >5x baseline | | Data center leaf | 50–5K | >2x baseline | >5x baseline | **Convergence Targets:** | Scenario | Target | Acceptable | Degraded | |----------|--------|------------|----------| | eBGP failover | < 90s | 90–180s | > 180s | | iBGP reconvergence | < 30s | 30–60s | > 60s | | Full table reload | < 5min | 5–10min | > 10min | ## Decision Trees ### Peer Not Established ``` Peer not in Established state ├── State: Idle │ ├── Admin shut? → Check config for "neighbor shutdown" / "deactivate" │ ├── No route to peer? → Check IGP/static route to peer address │ ├── Prefix-limit exceeded? → Check logs for max-prefix teardown │ └── AS mismatch? → Verify "remote-as" matches peer's local AS │ ├── State: Connect / Active │ ├── Peer reachable? → Ping/traceroute peer address │ │ ├── No → Fix Layer 3 reachability (IGP, static route) │ │ └── Yes → TCP port 179 blocked? │ │ ├── ACL on local device? → Check interface/control-plane ACL │ │ ├── ACL on remote device? → Check remote inbound ACL │ │ └── Firewall between peers? → Verify TCP/179 permitted both directions │ └── Peer configured? → Verify remote has this router as neighbor │ ├── State: OpenSent │ ├── Remote not sending OPEN → Peer may not be configured for this neighbor │ ├── TCP resets after connect → Check for TTL issues (eBGP multihop) │ └── Authentication? → Verify MD5/TCP-AO passwords match both sides │ ├── State: OpenConfirm │ ├── Capability mismatch? → Check AFI/SAFI (IPv4/IPv6/VPNv4) match │ ├── AS mismatch in OPEN? → Verify configured AS matches OPEN AS │ ├── Hold timer = 0 on one side? → Both peers must agree or both use 0 │ └── Check NOTIFICATION message → Decode error code/subcode │ └── Established but dropping ├── Hold timer expiry? → Keepalives not arriving (CPU, QoS, path issue) ├── NOTIFICATION received? → Decode error code for root cause ├── Route refresh storm? → Peer sending excessive route-refresh requests └── Max-prefix limit? → Peer sending more prefixes than limit allows ``` ### Unexpected Route Selection ``` Wrong path selected for prefix ├── Check Weight (Cisco/EOS only) │ └── Weight set via route-map? → Highest weight wins ├── Check Local Preference │ └── Local pref differs? → Set via inbound route-map; highest wins ├── Check AS Path Length │ ├── AS prepending applied? → Verify prepend count │ └── AS path differs? → Shortest wins ├── Check Origin │ └── IGP vs Incomplete? → IGP (network statement) preferred ├── Check MED │ ├── MED comparison enabled across AS? → "always-compare-med" │ └── MED set correctly? → Lowest wins within same neighbor AS ├── Check eBGP vs iBGP │ └── External path preferred over internal if equal above ├── Check IGP metric to next-hop │ └── Closest exit wins (hot-potato routing) └── All equal → Lowest Router ID wins (or oldest route if stable) ``` ## Report Template ``` BGP ANALYSIS REPORT ==================== Device: [hostname] Vendor: [Cisco | JunOS | EOS] Check Time: [timestamp] Performed By: [operator/agent] SESSION STATUS: - Total configured peers: [n] - Established: [n] | Not Established: [n] - Peers requiring attention: [list with FSM states] FINDINGS: 1. [Severity] [Category] — [Description] Peer: [neighbor address / AS] Observed: [state or metric] Expected: [normal state or value] Root Cause: [diagnosis from decision tree] Action: [recommended remediation] PATH ANALYSIS: - Prefix: [prefix under review] - Selected path via: [next-hop / AS path] - Selecting attribute: [which best-path attribute decided] - Expected path: [if different, what was expected and why] ROUTE TABLE SUMMARY: - IPv4 prefixes received: [total across all peers] - Baseline deviation: [% change from expected] RECOMMENDATIONS: - [Prioritized action list] NEXT CHECK: [based on severity — CRITICAL: 4hr, WARNING: 24hr, HEALTHY: 7d] ``` ## Troubleshooting ### Session Flapping Peer cycles between Established and Idle/Active repeatedly. Common causes: unstable underlying transport (IGP flap, link errors), aggressive hold timers on congested control planes, or MTU issues on the path causing fragmented keepalives to be dropped. Check `last reset reason` and correlate with interface or IGP events at the same timestamps. ### Route Oscillation The same prefix alternates between two or more paths. Caused by inconsistent MED comparison across route reflectors, or deterministic-MED not enabled when multiple exit points exist to the same neighbor AS. Enable `always-compare-med` and `deterministic-med` to stabilize. ### Memory Pressure from Full Table Full Internet table (~950K IPv4 prefixes) requires 1–2 GB of RIB memory depending on path diversity. Symptoms: slow convergence, peer resets during table reload. Mitigate with soft-reconfiguration inbound (trades memory for stability) or ORF (Outbound Route Filtering) to reduce inbound load. ### Community Stripping Routes arrive without expected communities. Check each transit AS in the path — many providers strip non-standard communities by default. Use large communities (RFC 8092) for end-to-end propagation across providers that strip standard communities. ### JunOS Default Export Policy JunOS sends no routes to a peer without an explicit export policy. If a peer shows Established with zero prefixes sent, add an export policy. This is the most common JunOS-specific BGP issue and does not occur on Cisco or EOS.

标签

skill ai

通过对话安装

该技能支持在以下平台通过对话安装:

OpenClaw WorkBuddy QClaw Kimi Claude

方式一:安装 SkillHub 和技能

帮我安装 SkillHub 和 bgp-analysis-1776106413 技能

方式二:设置 SkillHub 为优先技能安装源

设置 SkillHub 为我的优先技能安装源,然后帮我安装 bgp-analysis-1776106413 技能

通过命令行安装

skillhub install bgp-analysis-1776106413

下载 Zip 包

⬇ 下载 bgp-analysis v0.1.1

文件大小: 12.56 KB | 发布时间: 2026-4-14 14:37

v0.1.1 最新 2026-4-14 14:37
- Added CLI syntax and BGP state machine reference documents to the skill package.
- Updated SKILL.md metadata with openclaw safety, requirements, and tagging information.
- No changes made to analysis procedures or core documentation.

Archiver·手机版·闲社网·闲社论坛·羊毛社区· 多链控股集团有限公司 · 苏ICP备2025199260号-1

Powered by Discuz! X5.0   © 2024-2025 闲社网·线报更新论坛·羊毛分享社区·http://xianshe.com

p2p_official_large
返回顶部