SR OS peering telemetry dashboard

The full dashboard is available at telemetry.srexperts.net.

The peering telemetry dashboard visualizes key system and control plane performance metrics that reflect the health of the router, as well as Border Gateway Protocol (BGP) routing and peering information. Port and interface statistics are also included to give a complete example of what can be visualized. However, the monitored router forwards little traffic, so the graphs mostly show control plane statistics. This demonstration focuses on the internet peering and routing statistics that can be collected and shown over a long period of time. Other demonstrations, such as the SR OS Lab or the Nokia SR Linux Streaming Telemetry Lab, provide comprehensive dashboards for port, link, and traffic visualization.

gNMIc uses the following settings for the state path statistics subscriptions:

mode: stream
stream-mode: sample
sample-interval: 30s

SR OS supports sample intervals as low as 1s, as well as ON_CHANGE and TARGET_DEFINED subscriptions, but SAMPLE subscriptions at 30s intervals are sufficient for this demonstration.
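gNMIc takes the sample interval as a duration string such as 30s, whereas the gNMI specification encodes sample_interval in a SubscribeRequest as an unsigned integer number of nanoseconds. The following Python sketch shows roughly what the settings above translate to, using gnmi.proto field names in a plain dict. It is not gNMIc's own code: the helper names are hypothetical, only the ms, s, and m units are handled, and paths are kept as strings for readability, whereas gNMI actually transmits structured Path messages.

```python
import json
import re

# Nanoseconds per unit for the few duration suffixes this sketch accepts.
_UNITS_NS = {"ms": 1_000_000, "s": 1_000_000_000, "m": 60_000_000_000}


def duration_to_ns(duration: str) -> int:
    """Convert a simple duration such as '30s' or '500ms' to nanoseconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m)", duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    value, unit = match.groups()
    return int(value) * _UNITS_NS[unit]


def build_subscribe_request(paths, sample_interval="30s"):
    """Build a SubscribeRequest as a dict using gnmi.proto field names.

    mode: stream      -> SubscriptionList.mode = STREAM
    stream-mode: sample -> Subscription.mode = SAMPLE
    sample-interval   -> Subscription.sample_interval (nanoseconds)
    """
    interval_ns = duration_to_ns(sample_interval)
    return {
        "subscribe": {
            "mode": "STREAM",
            "subscription": [
                # Paths are plain strings here; gNMI uses structured Path messages.
                {"path": path, "mode": "SAMPLE", "sample_interval": interval_ns}
                for path in paths
            ],
        }
    }


request = build_subscribe_request(["/state/system/cpu[sample-period=60]"])
print(json.dumps(request, indent=2))
```

Converting the interval once, in one place, keeps the human-friendly duration in the configuration while the wire format stays in the integer nanoseconds that the specification requires.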

The following sections describe the dashboard in detail. The dashboard groups related peering telemetry information into rows and panels. In each panel, click a source in the legend to select it, or shift-click to select multiple sources.

System status

These panels show CPU, memory, and chassis information.

CPU utilization

This panel shows the historical and current CPU utilization over a 60s sample period from /state/system/cpu[sample-period=60].

There are regular spikes in CPU utilization a few times per day. These spikes are not caused by the gNMI subscriptions, whose CPU utilization is negligible, but by CLI scripts running show commands with large outputs that are collected daily. Consistently high CPU utilization or unexpected spikes may indicate churn or flapping and should be investigated.

Memory utilization

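This panel shows the memory utilization from /state/system/memory-pools/summary with colored thresholds. Total memory is available-memory + current-total-size. The free percentage is ((available-memory + current-total-size) - total-in-use) / (available-memory + current-total-size) * 100, and the used percentage shown in the panel is 100 minus the free percentage, which equals total-in-use / (available-memory + current-total-size) * 100.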

Memory utilization increases when more protocols and services are configured. Changes in memory utilization may indicate churn or flapping and should be investigated.
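A minimal Python sketch of the memory arithmetic, assuming the three leaves from /state/system/memory-pools/summary have already been read as byte counts: total memory is available-memory plus current-total-size, the used share is total-in-use divided by that total, and the free share is the remainder. The function name and sample values are hypothetical.

```python
def memory_percentages(available_memory: int, current_total_size: int, total_in_use: int):
    """Return (used_pct, free_pct) from the memory-pools summary leaves.

    Total memory = available-memory (not yet allocated to pools)
                 + current-total-size (already allocated to pools).
    """
    total = available_memory + current_total_size
    if total <= 0:
        raise ValueError("total memory must be positive")
    used_pct = 100 * total_in_use / total
    free_pct = 100 * (total - total_in_use) / total
    return used_pct, free_pct


# Hypothetical values in bytes, for illustration only.
used, free = memory_percentages(available_memory=6_000, current_total_size=4_000, total_in_use=2_500)
print(f"used {used:.1f}%  free {free:.1f}%")  # used 25.0%  free 75.0%
```

The two percentages always add up to 100, so a dashboard can show either one and derive the other.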

Temperature

This panel shows the temperatures of the installed Control Processor Modules (CPM), Input/Output Modules (IOM), and Media Dependent Adapters (MDA) from the following paths:

  • /state/card[slot-number=*]/hardware-data/temperature
  • /state/card[slot-number=*]/mda[mda-slot=*]/hardware-data/temperature
  • /state/chassis[chassis-class=router][chassis-number=1]/chassis-control-module[ccm-slot=*]/hardware-data/temperature

The temperatures should be monitored to ensure that the router operates in the environment it is designed for. Consistently high temperatures or spikes may indicate a facility cooling problem or a fan failure and should be investigated.

Fan speed

This panel shows the fan speed percentage from /state/chassis[chassis-class=router][chassis-number=1]/fan[fan-slot=*]/speed with colored thresholds for low, medium, high, and full speeds.

SR OS uses intelligent fan controls that respond to changes in temperature. Consistently high fan speeds or variations in fan speed may indicate a chassis cooling or airflow problem and should be investigated.

BGP statistics

These panels show BGP neighbor, route, and message information.

BGP neighbors

These panels show how many BGP neighbors are up, calculated by summing /state/router[router-name=Base]/bgp/convergence/family[family-type=*]/up-peers, and how many are down, calculated by subtracting the sum of up-peers from /state/router[router-name=Base]/bgp/peers. If any neighbors are down, the color of the number changes from green to red. The number of neighbor families is the sum of /state/router[router-name=Base]/bgp/convergence/family[family-type=*]/up-peers for the IPv4 and IPv6 address families.

If a BGP neighbor is down, the cause should be investigated.
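The panel arithmetic can be sketched in Python as follows, assuming the up-peers values have been collected into a dict keyed by family-type and the peers value into an integer. The function name, the family-type keys, and the counts are hypothetical. The sketch also assumes that each neighbor carries a single address family (for example, separate IPv4 and IPv6 sessions, as is common at IXPs); otherwise a neighbor is counted once per family and the down count comes out too low.

```python
def bgp_neighbor_panels(up_peers_by_family: dict, total_peers: int) -> dict:
    """Derive the BGP neighbor panel values.

    up_peers_by_family maps a family-type (hypothetical keys) to its up-peers value.
    Assumes one address family per neighbor; a multi-family neighbor would be
    counted once per family, which would understate the down count.
    """
    up = sum(up_peers_by_family.values())
    down = total_peers - up
    families = up_peers_by_family.get("ipv4", 0) + up_peers_by_family.get("ipv6", 0)
    return {
        "up": up,
        "down": down,
        "families": families,
        "color": "red" if down > 0 else "green",  # green only while every neighbor is up
    }


# Hypothetical sample values, for illustration only.
print(bgp_neighbor_panels({"ipv4": 12, "ipv6": 9, "vpn-ipv4": 1}, total_peers=23))
# {'up': 22, 'down': 1, 'families': 21, 'color': 'red'}
```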

RPKI records

This panel shows the number of IPv4 Resource Public Key Infrastructure (RPKI) records from /state/router[router-name=Base]/origin-validation/rpki-session[ip-address=x.x.x.x]/active-ipv4-records and the number of IPv6 RPKI records from /state/router[router-name=Base]/origin-validation/rpki-session[ip-address=x.x.x.x]/active-ipv6-records.

The number of RPKI records grows steadily over time as operators deploy Route Origin Validation (ROV). A sudden drop may indicate a problem with the validator sessions or software and should be investigated.
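The text does not define a "sudden drop" precisely. One hypothetical way to flag it is to compare each sample with the one before it and report any fall larger than a fixed fraction. The 10 percent threshold, the function name, and the sample values below are assumptions for illustration; the same check could be applied to other counts that normally grow steadily, such as BGP routes.

```python
from typing import Iterator, List, Tuple


def sudden_drops(samples: List[int], max_drop_fraction: float = 0.10) -> Iterator[Tuple[int, int, int]]:
    """Yield (index, previous, current) for each sample that fell by more than
    max_drop_fraction relative to the sample immediately before it."""
    for index in range(1, len(samples)):
        previous, current = samples[index - 1], samples[index]
        if previous > 0 and (previous - current) / previous > max_drop_fraction:
            yield index, previous, current


# Hypothetical active-ipv4-records samples, one per 30s interval.
records = [520_000, 520_150, 130_000, 521_000]
print(list(sudden_drops(records)))  # [(2, 520150, 130000)]
```

Comparing consecutive samples keeps the check independent of the long-term growth trend; only abrupt decreases are reported.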

BGP routes

This panel shows the number of BGP Routing Information Base (RIB) and Forwarding Information Base (FIB) routes for IPv4 from /state/router[router-name=Base]/bgp/statistics/routes-per-family/ipv4 and for IPv6 from /state/router[router-name=Base]/bgp/statistics/routes-per-family/ipv6.

The number of routes grows steadily over time as more are announced in the global internet routing table. Drops or spikes may indicate churn or flapping and should be investigated.

BGP paths

This panel shows the number of BGP paths from /state/router[router-name=Base]/bgp/statistics/bgp-paths.

The number of paths changes over time as routing information in the global internet routing table changes. Drops or spikes may indicate churn or flapping and should be investigated.

BGP messages

These panels show the number of BGP messages per neighbor in a 60s interval for the 10 most active neighbors from /state/router[router-name=Base]/bgp/neighbor[ip-address=*]/statistics/received/messages and /state/router[router-name=Base]/bgp/neighbor[ip-address=*]/statistics/sent/messages. The total number of messages is the sum of the messages from all neighbors. Received messages are shown as a positive number, and sent messages are shown as a negative number.

The number of BGP messages for a neighbor varies based on how much routing information the neighbor announces and how often the routing information changes. For example, a neighbor announcing a full routing table for transit will send many more messages than a neighbor announcing a few routes for private peering. This router only announces four stable routes and has one IBGP neighbor, so there are few sent messages. Persistent spikes may indicate churn or flapping and should be investigated.
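The received and sent message leaves are cumulative counters, so a per-interval value is the difference between two samples taken one interval apart. The following Python sketch shows that arithmetic together with the top 10 ranking and the sign convention (sent messages negative). It is independent of whatever query language the dashboard actually uses; the function name, the dict layout, and the sample values are hypothetical.

```python
from typing import Dict, List, Tuple

# neighbor address -> (received messages, sent messages), both cumulative
Counters = Dict[str, Tuple[int, int]]


def top_message_rates(before: Counters, after: Counters, n: int = 10) -> List[Tuple[str, int, int]]:
    """Return up to n neighbors as (address, received, -sent) for one interval,
    ordered by total messages (received + sent), most active first.

    Neighbors missing from either sample, or whose counters decreased
    (for example, after a reset), are skipped.
    """
    rows = []
    for address, (rx_after, tx_after) in after.items():
        if address not in before:
            continue
        rx_before, tx_before = before[address]
        rx, tx = rx_after - rx_before, tx_after - tx_before
        if rx < 0 or tx < 0:
            continue
        rows.append((address, rx, -tx))  # sent messages are plotted as negative values
    rows.sort(key=lambda row: row[1] - row[2], reverse=True)  # row[2] <= 0, so this is rx + tx
    return rows[:n]


# Hypothetical samples taken one interval apart.
before = {"192.0.2.1": (1_000, 400), "192.0.2.2": (250, 90), "2001:db8::1": (5_000, 410)}
after = {"192.0.2.1": (1_030, 402), "192.0.2.2": (251, 90), "2001:db8::1": (5_400, 411)}
print(top_message_rates(before, after, n=2))
# [('2001:db8::1', 400, -1), ('192.0.2.1', 30, -2)]
```

The total panel corresponds to summing the received and sent columns over all neighbors rather than only the top 10.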

BGP RIB and FIB statistics

These panels show BGP RIB and FIB statistics.

RIB families

This panel shows the BGP RIB address family percentages for IPv4 from /state/router[router-name=Base]/bgp/statistics/routes-per-family/ipv4/remote-routes and for IPv6 from /state/router[router-name=Base]/bgp/statistics/routes-per-family/ipv6/remote-routes.

The number of routes grows steadily over time as more are announced in the global internet routing table, and the percentage of IPv6 routes is growing as networks transition to IPv6. A large change may indicate a problem with one of the address families and should be investigated.
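The family percentages are each family's share of the combined route count. A minimal Python sketch, assuming the ipv4 and ipv6 remote-routes values have already been read; the same function applies to the remote-active-routes values used by the FIB families panel. The function name and sample counts are hypothetical.

```python
def family_percentages(ipv4_routes: int, ipv6_routes: int) -> dict:
    """Return each address family's share of the combined route count, in percent."""
    total = ipv4_routes + ipv6_routes
    if total == 0:
        return {"ipv4": 0.0, "ipv6": 0.0}
    return {"ipv4": 100 * ipv4_routes / total, "ipv6": 100 * ipv6_routes / total}


# Hypothetical remote-routes counts, for illustration only.
print(family_percentages(ipv4_routes=750, ipv6_routes=250))  # {'ipv4': 75.0, 'ipv6': 25.0}
```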

RIB top 10

These panels show the 10 BGP neighbors from which the most RIB routes are received, for IPv4 from /state/router[router-name=Base]/bgp/neighbor[ip-address=*]/statistics/family-prefix/ipv4/received and for IPv6 from /state/router[router-name=Base]/bgp/neighbor[ip-address=*]/statistics/family-prefix/ipv6/received.

These neighbors announce either a full routing table for transit or a large number of routes, for example from peering with the DE-CIX route servers. A change may indicate that a neighbor announcing a large number of routes was added or went down; such a change should be investigated.

FIB families

This panel shows the BGP FIB address family percentages for IPv4 from /state/router[router-name=Base]/bgp/statistics/routes-per-family/ipv4/remote-active-routes and for IPv6 from /state/router[router-name=Base]/bgp/statistics/routes-per-family/ipv6/remote-active-routes.

The number of routes grows steadily over time as more are announced in the global internet routing table, and the percentage of IPv6 routes is growing as networks transition to IPv6. A large change may indicate a problem with one of the address families and should be investigated.

The RIB and FIB address family percentages for IPv4 and IPv6 are similar because the best path for each prefix is selected from the RIB and installed in the FIB, so both tables contain a similar mix of IPv4 and IPv6 prefixes.

FIB top 10

These panels show the 10 BGP neighbors that contribute the most routes installed in the FIB, for IPv4 from /state/router[router-name=Base]/bgp/neighbor[ip-address=*]/statistics/family-prefix/ipv4/active and for IPv6 from /state/router[router-name=Base]/bgp/neighbor[ip-address=*]/statistics/family-prefix/ipv6/active.

These neighbors announce either a full routing table for transit or a large number of routes, for example from peering with the DE-CIX route servers. A change may indicate that a neighbor announcing a large number of routes was added or went down; such a change should be investigated.

The RIB and FIB top 10 percentages for IPv4 and IPv6 are noticeably different because only the best path for each prefix is installed in the FIB. A neighbor may announce a large number of routes that are not selected as best paths. In this case, a route policy prefers the transit routes from one neighbor over those from another.
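To see why the RIB and FIB rankings can diverge, the following Python sketch computes top-N shares from per-neighbor prefix counts, once for received (RIB) counts and once for active (FIB) counts. The function name, neighbor addresses, and counts are hypothetical; the counts model two transit neighbors that announce the same routes while a route policy prefers 192.0.2.1.

```python
def top_shares(prefixes_by_neighbor: dict, n: int = 10) -> list:
    """Return up to n (neighbor, percent) pairs, largest prefix count first.

    Percentages are relative to the sum over all neighbors, not only the top n.
    Ties keep their original order because Python's sort is stable.
    """
    total = sum(prefixes_by_neighbor.values())
    ranked = sorted(prefixes_by_neighbor.items(), key=lambda item: item[1], reverse=True)
    return [(neighbor, 100 * count / total if total else 0.0) for neighbor, count in ranked[:n]]


# Hypothetical counts: both transit neighbors announce many routes (RIB),
# but the policy makes most best paths come from 192.0.2.1 (FIB).
rib_received = {"192.0.2.1": 900, "192.0.2.2": 900, "192.0.2.3": 200}
fib_active = {"192.0.2.1": 850, "192.0.2.2": 50, "192.0.2.3": 100}
print(top_shares(rib_received))  # [('192.0.2.1', 45.0), ('192.0.2.2', 45.0), ('192.0.2.3', 10.0)]
print(top_shares(fib_active))    # [('192.0.2.1', 85.0), ('192.0.2.3', 10.0), ('192.0.2.2', 5.0)]
```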

Traffic statistics

These panels show traffic statistics.

Port traffic

This panel shows the aggregated physical port input and output byte counters in a 60s interval from /state/port[port-id=*]/statistics/in-octets and /state/port[port-id=*]/statistics/out-octets in one graph. These state paths correspond to the SNMP IF-MIB ifHCInOctets and ifHCOutOctets OIDs.

Input traffic is shown as a positive number, and output traffic is shown as a negative number. This router forwards very little traffic except for control plane traffic, as you can see in the graphs.
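The in-octets and out-octets leaves correspond to 64-bit high-capacity counters, so a per-interval value is the difference between two samples modulo 2^64. The following Python sketch aggregates those differences across all ports and applies the sign convention above. It is an illustration of the arithmetic only; the function names, dict layout, and sample values are hypothetical, and the dashboard's own query may handle wraps and resets differently.

```python
from typing import Dict, Tuple

COUNTER64_MODULO = 2 ** 64

# port-id -> (in-octets, out-octets), both cumulative
Samples = Dict[str, Tuple[int, int]]


def counter_delta(previous: int, current: int) -> int:
    """Increase of a 64-bit counter between two samples, tolerating one wrap.

    A counter reset (for example, after statistics are cleared) also looks like
    a wrap here and yields a very large delta; many tools treat any decrease
    as a reset instead.
    """
    return (current - previous) % COUNTER64_MODULO


def aggregate_port_octets(before: Samples, after: Samples) -> Tuple[int, int]:
    """Sum octet deltas over all ports present in both samples.

    Returns (input_octets, -output_octets) so that output is plotted as negative.
    """
    total_in = total_out = 0
    for port, (in_after, out_after) in after.items():
        if port not in before:
            continue  # no baseline yet for a port that appeared between samples
        in_before, out_before = before[port]
        total_in += counter_delta(in_before, in_after)
        total_out += counter_delta(out_before, out_after)
    return total_in, -total_out


# Hypothetical samples taken 60s apart; the in-octets counter of 1/1/2 wrapped.
before = {"1/1/1": (1_000, 500), "1/1/2": (COUNTER64_MODULO - 10, 0)}
after = {"1/1/1": (1_600, 800), "1/1/2": (30, 40)}
print(aggregate_port_octets(before, after))  # (640, -340)
```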

Interface traffic

This panel shows the interface input and output byte counters in a 60s interval from /state/router[router-name=Base]/interface[interface-name=*]/statistics/in-octets and /state/router[router-name=Base]/interface[interface-name=*]/statistics/out-octets in one graph. These state paths correspond to the SNMP IF-MIB ifHCInOctets and ifHCOutOctets OIDs.

Input traffic is shown as a positive number, and output traffic is shown as a negative number. This router forwards very little traffic except for control plane traffic, as you can see in the graphs.

CPM MAC filter

This panel shows the CPM MAC filter input packet counters in a 60s interval from /state/system/security/cpm-filter/mac-filter/entry[entry-id=*]/forwarded-packets and from /state/system/security/cpm-filter/mac-filter/entry[entry-id=*]/dropped-packets. The CPM MAC filter is configured to forward only ARP, IPv4, and IPv6 frame types and to drop all others, because these three frame types are the only ones that carry valid traffic on IXP peering ports.

Forwarded packets are shown as a positive number, and all dropped packets are shown as a negative number. Spikes indicate that the CPM received more frames of a particular type. The IPv6 spikes are due to CLI scripts running show commands with large outputs that are collected daily.

CPM IPv4 filter

This panel shows the CPM IPv4 filter input packet counters in a 60s interval from /state/system/security/cpm-filter/ip-filter/entry[entry-id=*]/forwarded-packets and from /state/system/security/cpm-filter/ip-filter/entry[entry-id=*]/dropped-packets. The CPM IPv4 filter is configured to forward several packet types for routing and management and to drop all other packet types that are invalid on IXP peering ports.

Forwarded packets are shown as a positive number, and dropped packets are shown as a negative number. Spikes indicate that the CPM received more packets of a particular IPv4 packet type. The graph also shows a constant, low rate of gRPC packets that carry the streaming telemetry to the gNMIc client.

CPM IPv6 filter

This panel shows the CPM IPv6 filter input packet counters in a 60s interval from /state/system/security/cpm-filter/ipv6-filter/entry[entry-id=*]/forwarded-packets and from /state/system/security/cpm-filter/ipv6-filter/entry[entry-id=*]/dropped-packets. The CPM IPv6 filter is configured to forward several packet types for routing and management and to drop all other packet types that are invalid on IXP peering ports.

Forwarded packets are shown as a positive number, and dropped packets are shown as a negative number. Spikes indicate that the CPM received more packets of a particular IPv6 packet type. These spikes are due to CLI scripts running show commands with large outputs that are collected daily.

Port statistics

These panels show port statistics.

Port packet rates

This panel shows individual physical port input and output packet rates in a 60s interval from the following paths:

  • /state/port[port-id=1/1/2]/ethernet/statistics/in-errors
  • /state/port[port-id=1/1/2]/ethernet/statistics/in-broadcast-packets
  • /state/port[port-id=1/1/2]/ethernet/statistics/in-multicast-packets
  • /state/port[port-id=1/1/2]/ethernet/statistics/in-unicast-packets
  • /state/port[port-id=1/1/2]/ethernet/statistics/out-discards
  • /state/port[port-id=1/1/2]/ethernet/statistics/out-errors
  • /state/port[port-id=1/1/2]/ethernet/statistics/out-broadcast-packets
  • /state/port[port-id=1/1/2]/ethernet/statistics/out-multicast-packets
  • /state/port[port-id=1/1/2]/ethernet/statistics/out-unicast-packets

These state paths correspond to the SNMP IF-MIB ifInDiscards, ifInErrors, ifHCInBroadcastPkts, ifHCInMulticastPkts, ifHCInUcastPkts, ifOutDiscards, ifOutErrors, ifHCOutBroadcastPkts, ifHCOutMulticastPkts, and ifHCOutUcastPkts OIDs.

Input packet rates are shown as a positive number, and output packet rates are shown as a negative number. This router forwards very little traffic except for control plane traffic, as you can see in the graphs.

Port packet sizes

This panel shows the number of packets in each size range on an individual physical port in a 60s interval, from the following paths:

  • /state/port[port-id=1/1/2]/ethernet/statistics/packet-size/octets-64
  • /state/port[port-id=1/1/2]/ethernet/statistics/packet-size/octets-65-to-127
  • /state/port[port-id=1/1/2]/ethernet/statistics/packet-size/octets-128-to-255
  • /state/port[port-id=1/1/2]/ethernet/statistics/packet-size/octets-256-to-511
  • /state/port[port-id=1/1/2]/ethernet/statistics/packet-size/octets-512-to-1023
  • /state/port[port-id=1/1/2]/ethernet/statistics/packet-size/octets-1024-to-1518
  • /state/port[port-id=1/1/2]/ethernet/statistics/packet-size/octets-1519-to-max

These state paths correspond to the SNMP HC-RMON-MIB etherStatsHighCapacityPkts64Octets, etherStatsHighCapacityPkts65to127Octets, etherStatsHighCapacityPkts128to255Octets, etherStatsHighCapacityPkts256to511Octets, etherStatsHighCapacityPkts512to1023Octets, and etherStatsHighCapacityPkts1024to1518Octets OIDs. The octets-1519-to-max counter has no HC-RMON-MIB equivalent, because RMON defines no bucket for frames longer than 1518 octets.

Input packet rates are shown as a positive number, and output packet rates are shown as a negative number. This router forwards very little traffic except for control plane traffic, which is mostly 64-byte packets.
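The bucket boundaries follow directly from the leaf names. The following Python sketch maps a frame length to its packet-size leaf and HC-RMON-MIB counterpart, assuming that frame length is measured as RMON measures it (excluding framing bits but including the FCS). The table and function names are hypothetical, and frames shorter than 64 octets are treated as falling outside these buckets.

```python
from typing import List, Optional, Tuple

# (inclusive upper bound in octets, state leaf, HC-RMON-MIB counterpart)
PACKET_SIZE_BUCKETS: List[Tuple[int, str, str]] = [
    (64, "octets-64", "etherStatsHighCapacityPkts64Octets"),
    (127, "octets-65-to-127", "etherStatsHighCapacityPkts65to127Octets"),
    (255, "octets-128-to-255", "etherStatsHighCapacityPkts128to255Octets"),
    (511, "octets-256-to-511", "etherStatsHighCapacityPkts256to511Octets"),
    (1023, "octets-512-to-1023", "etherStatsHighCapacityPkts512to1023Octets"),
    (1518, "octets-1024-to-1518", "etherStatsHighCapacityPkts1024to1518Octets"),
]


def packet_size_bucket(frame_octets: int) -> Optional[Tuple[str, Optional[str]]]:
    """Return (state leaf, HC-RMON-MIB object) for a frame length in octets.

    Returns None for frames shorter than 64 octets, which these buckets do not count.
    """
    if frame_octets < 64:
        return None
    for upper, leaf, rmon_object in PACKET_SIZE_BUCKETS:
        if frame_octets <= upper:
            return leaf, rmon_object
    return "octets-1519-to-max", None  # no HC-RMON-MIB equivalent


for length in (60, 64, 100, 1518, 9000):
    print(length, packet_size_bucket(length))
```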

DDM transmit output power

This panel shows the Digital Diagnostic Monitoring (DDM) transmit output power from /state/port[port-id=1/1/2]/transceiver/digital-diagnostic-monitoring/transmit-output-power/current with several thresholds based on the high-alarm, high-warning, low-alarm, and low-warning values.

Sudden changes, or a steady change in value over time, may indicate an optical problem or a failing optic and should be investigated.
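The colored thresholds amount to classifying each reading against the four threshold values. The following Python sketch does that classification, checking the alarm thresholds before the warning thresholds. The function name and threshold values are hypothetical, the readings are assumed to be in dBm, and the strict comparisons at the boundaries are an assumption; the dashboard may treat a value exactly equal to a threshold differently. The same function applies to the received optical power panel.

```python
def ddm_state(current: float, low_alarm: float, low_warning: float,
              high_warning: float, high_alarm: float) -> str:
    """Classify a DDM reading against its thresholds (alarms checked first)."""
    if current > high_alarm:
        return "high-alarm"
    if current < low_alarm:
        return "low-alarm"
    if current > high_warning:
        return "high-warning"
    if current < low_warning:
        return "low-warning"
    return "normal"


# Hypothetical thresholds in dBm, for illustration only.
thresholds = dict(low_alarm=-10.0, low_warning=-8.0, high_warning=2.0, high_alarm=3.0)
for reading in (-2.5, -9.0, -11.0, 2.5, 4.0):
    print(reading, ddm_state(reading, **thresholds))
```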

DDM received optical power

This panel shows the DDM received optical power from /state/port[port-id=1/1/2]/transceiver/digital-diagnostic-monitoring/received-optical-power/current with several thresholds based on the high-alarm, high-warning, low-alarm, and low-warning values.

Sudden changes, or a steady change in value over time, may indicate an optical problem or a failing optic and should be investigated.
