Fault Management REST/Kafka API

Overview

This Fault Management REST/Kafka API tutorial demonstrates the typical process for retrieving NSP Fault Management ( FM ) alarm data. It shows the request flow needed to access the alarm data via REST and to subscribe to the relevant Kafka fault management topic. In this use case, that topic is the alarm management topic NSP-FAULT.

The purpose is to demonstrate alarm samples from REST and Kafka for the model driven managed ( MDM ) source and the NFM-P ( classic ) source.

Whilst alarm data sequences can depend on configurable alarm settings, the tutorial also aims to demonstrate a typical alarm object life-cycle in terms of alarm-create / alarm-change / alarm-delete events on the NSP-FAULT topic. The intent is to inform potential users about the typical event flow required to maintain alarm data via the northbound API, for example alarm clearing and alarm correlation ( causes/impacts ).

Whilst alarm filters can be extremely specific and varied, the tutorial also aims to demonstrate a few basic alarm filters and one more specific filter.

Whilst the tutorial covers important aspects of the typical FM alarm management process, any development intended to access the NSP FM alarms should test and evaluate alarm data in a customer/lab environment. That evaluation should include the effects of any alarm policy settings, the general alarm settings and any alarm filters that an end user might require.

Prerequisite

The assumptions for this use case are:

  1. The user has retrieved a token to use the REST API services. The token is refreshed before it expires ( after 60 minutes ). See the Access & Authentication API Tutorial for more information.
  2. In this tutorial, Kafka samples are retrieved using a third-party client application called Offset Explorer, formerly called KafkaTool. This tutorial is based on NSP being used in a secure TLS-enabled environment. Any Kafka client application used to monitor alarms from the NSP-FAULT topic must have the necessary TLS certificate for the NSP gateway server ( SDN gateway ), which is typically imported into the relevant TLS keyStore/trustStore.

Alarm Settings

The NSP FM alarm settings can affect the alarm clearing/deletion behaviour within the GUI FM application, and also the sequence of events delivered through the alarm notification mechanism, which in this tutorial is the Kafka topic NSP-FAULT.

The settings can affect how alarms are deleted, for example whether to delete alarms on manual clearing or acknowledgement. It is even possible to specify that alarms are never deleted. The alarm settings are therefore important to the alarm object life-cycle.

The alarm settings are available from within the NSP Fault Management application ( global settings ), and also from the NFM-P GUI alarm settings. The NSP Fault Management application alarm settings apply to most source type domains ( MDM, NSP, NFM-T ), but not to NFM-P alarms. The NFM-P alarm settings must be configured from the NFM-P GUI alarm settings.

This tutorial uses the default alarm settings, which might be considered typical of a deployment: it is often desirable for alarms to be removed automatically from the current alarms list once a network fault has been restored and the original alarm has been cleared. In such a scenario, the originally created alarm is deleted within the FM application, and an alarm-delete event that references the original alarm object FDN ( full distinguished name ) is sent via the NSP-FAULT Kafka topic.

Alarm settings default

If enabled under the settings, manually clearing an alarm also sends an alarm-change event via the NSP-FAULT topic to indicate the severity change. This is an additional event that is not part of the typical life-cycle of alarms that clear automatically. For that reason, and due to the implementation within the FM component and NFM-P, it is mandated that clients maintaining alarms via the NSP-FAULT topic use the delete notification to represent the clearing of the fault condition. In normal usage the severity of an alarm can be changed to "cleared", and it is feasible for alarms to persist in the application with a severity of "cleared". Whether or not the alarm is deleted in this scenario depends on the alarm settings configuration, and is reflected in the NSP-FAULT notifications.

An API client can use the Fault Management REST API services to perform management operations on alarms, including acknowledging an alarm with or without an acknowledgement note. An API client can also unacknowledge an alarm, update an alarm with attribute changes, and delete an alarm.


Typical FM REST Use Case Methods

Dependencies:

  • User credentials are required to obtain the tokens needed to access the Fault Management application. Tokens retrieved from NSP expire after 60 minutes and should be refreshed to keep authorisation valid. These methods are included in the collection in their typical order of use. A repeat interval of 15 minutes is suggested for token refresh ( step 5 ) and Kafka subscription renewal ( step 6 ). To reduce load and limit resource use on the NSP server side, it is not advisable to refresh tokens or renew Kafka subscriptions too frequently.
  • See the Access & Authentication API Tutorial and the Managing a Persistent Subscription Tutorial for more information.
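The timing rules above can be sketched as a small scheduling helper. This is only an illustration: the function names are hypothetical and not part of any NSP API, and the constants simply encode the 60 minute token lifetime and the suggested 15 minute repeat interval mentioned in the dependency list.

```python
from datetime import datetime, timedelta, timezone

TOKEN_LIFETIME = timedelta(minutes=60)    # token expiry stated in this tutorial
REFRESH_INTERVAL = timedelta(minutes=15)  # suggested repeat interval (steps 5 and 6)


def is_refresh_due(last_refresh: datetime, now: datetime) -> bool:
    """True once the suggested repeat interval has elapsed since the last refresh."""
    return now >= last_refresh + REFRESH_INTERVAL


def is_expired(issued_at: datetime, now: datetime) -> bool:
    """True once a token that was never refreshed has reached its lifetime."""
    return now >= issued_at + TOKEN_LIFETIME


if __name__ == "__main__":
    issued = datetime.now(timezone.utc)
    print("refresh due:", is_refresh_due(issued, issued + timedelta(minutes=20)))
```

A client loop would typically call is_refresh_due before each batch of REST requests or Kafka polls, and run the token refresh and subscription renewal requests only when it returns True.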

The following Postman collection has sample requests/responses for the use case described below.

Get Postman Collection

Typical Use Case Methods (1-6 )


1) POST Initial Authentication - Get Token

This token is required for initial authentication and the subsequent REST requests below


2) GET alarms details ( all alarms )

This request returns all alarms from the FM application component. For the purpose of this tutorial, the sample responses in the included Postman collection contain alarms from MDM and NFM-P, some system alarms from NSP, and alarms from model driven managed multi-vendor devices.

There are no NFM-T alarms used in the context of this tutorial.

The three samples below are listed for convenience to demonstrate the available properties and data values, followed by some notable observations.

NFM-P sample

{
                "fdn": "fdn:model:fm:Alarm:11699",
                "objectFullName": "faultManager:network@10.10.10.4@router-1@ospf-v2@areaSite-0.0.0.1@interface-4|alarm-141-24-112",
                "sourceType": "nfmp",
                "sourceSystem": "fdn:realm:sam",
                "severity": "warning",
                "previousSeverity": "indeterminate",
                "originalSeverity": "warning",
                "highestSeverity": "warning",
                "probableCause": "OspfInterfaceDown",
                "alarmName": "OspfInterfaceDown",
                "specificProblem": "Not Applicable",
                "alarmType": "OspfInterfaceDown",
                "affectedObject": "network:10.10.10.4:router-1:ospf-v2:areaSite-0.0.0.1:interface-4",
                "affectedObjectType": "ospf.Interface",
                "affectedObjectName": "toSR2",
                "acknowledged": false,
                "wasAcknowledged": false,
                "acknowledgedBy": "N/A",
                "clearedBy": "N/A",
                "deletedBy": "N/A",
                "firstTimeDetected": 1680301104293,
                "lastTimeDetected": 1680301104293,
                "lastTimeSeverityChanged": 0,
                "lastTimeEscalated": null,
                "lastTimeDeEscalated": null,
                "lastTimeCleared": 0,
                "lastTimeAcknowledged": 0,
                "nodeTimeOffset": -1,
                "frequency": null,
                "numberOfOccurrences": 1,
                "numberOfOccurrencesSinceClear": 1,
                "numberOfOccurrencesSinceAck": 0,
                "serviceAffecting": false,
                "implicitlyCleared": true,
                "additionalText": "N/A",
                "neId": "10.10.10.4",
                "neName": "SR4",
                "userText": "N/A",
                "adminState": "unlocked",
                "impact": 1,
                "rootCause": true
            }

MDM sample

{
                "fdn": "fdn:model:fm:Alarm:13125",
                "objectFullName": "10.10.10.2:fm:Alarm:/port[port-id='1/1/19']/linkDown",
                "sourceType": "mdm",
                "sourceSystem": "fdn:app:mdm-ami-cmodel",
                "severity": "major",
                "previousSeverity": "indeterminate",
                "originalSeverity": "major",
                "highestSeverity": "major",
                "probableCause": "equipmentMalfunction",
                "alarmName": "LinkDown",
                "specificProblem": null,
                "alarmType": "processingErrorAlarm",
                "affectedObject": "10.10.10.2:equipment:Equipment:/port[port-id='1/1/19']",
                "affectedObjectType": "equipment.Equipment",
                "affectedObjectName": "port=1/1/19",
                "acknowledged": false,
                "wasAcknowledged": false,
                "acknowledgedBy": "N/A",
                "clearedBy": "N/A",
                "deletedBy": "N/A",
                "firstTimeDetected": 1680023147300,
                "lastTimeDetected": 1680023147300,
                "lastTimeSeverityChanged": null,
                "lastTimeEscalated": null,
                "lastTimeDeEscalated": null,
                "lastTimeCleared": null,
                "lastTimeAcknowledged": null,
                "nodeTimeOffset": -1,
                "frequency": null,
                "numberOfOccurrences": 1,
                "numberOfOccurrencesSinceClear": 0,
                "numberOfOccurrencesSinceAck": 0,
                "serviceAffecting": null,
                "implicitlyCleared": true,
                "additionalText": "Interface 1/1/19 is not operational",
                "neId": "10.10.10.2",
                "neName": "SR2",
                "userText": "N/A",
                "adminState": "unlocked",
                "impact": 0,
                "rootCause": true
            }

NSP sample

{
                "fdn": "fdn:model:fm:Alarm:150",
                "objectFullName": "nsp-mdt-nsp-mediator-8678f9b7b6-zvpgj:kubernetes-node",
                "sourceType": "nsp",
                "sourceSystem": "fdn:app:server",
                "severity": "major",
                "previousSeverity": "indeterminate",
                "originalSeverity": "major",
                "highestSeverity": "major",
                "probableCause": "systemFailed",
                "alarmName": "NspApplicationPodDown",
                "specificProblem": "N/A",
                "alarmType": "communicationsAlarm",
                "affectedObject": "nsp-mdt-nsp-mediator-8678f9b7b6-zvpgj:kubernetes-node",
                "affectedObjectType": "NmsSystem",
                "affectedObjectName": "nsp-mdt-nsp-mediator-8678f9b7b6-zvpgj:kubernetes-node",
                "acknowledged": false,
                "wasAcknowledged": false,
                "acknowledgedBy": "N/A",
                "clearedBy": "N/A",
                "deletedBy": "N/A",
                "firstTimeDetected": 1680285422890,
                "lastTimeDetected": 1680285482239,
                "lastTimeSeverityChanged": null,
                "lastTimeEscalated": null,
                "lastTimeDeEscalated": null,
                "lastTimeCleared": null,
                "lastTimeAcknowledged": null,
                "nodeTimeOffset": -1,
                "frequency": null,
                "numberOfOccurrences": 2,
                "numberOfOccurrencesSinceClear": 0,
                "numberOfOccurrencesSinceAck": 0,
                "serviceAffecting": false,
                "implicitlyCleared": false,
                "additionalText": "nsp-mdt-nsp-mediator-8678f9b7b6-zvpgj is not in running state",
                "neId": "nsp-mdt-nsp-mediator-8678f9b7b6-zvpgj:kubernetes-node",
                "neName": "nsp-mdt-nsp-mediator-8678f9b7b6-zvpgj:kubernetes-node",
                "userText": "N/A",
                "adminState": "unknown",
                "impact": 0,
                "rootCause": true
            }

REST Alarm Sample Observations

The REST samples above do not include impacted alarm object information. This data can be included by setting the flag includeRootCauseAndImpactDetails=true in the REST request. The flag and sample responses are documented on the main portal under fault-management-apis.

Three sample responses are included in the 'Get Alarms details' method in the Postman collection; the third sample includes the cause and impact details.

The rootCause value is initially null in a new alarm. It is updated once the correlation algorithms have run, and may become true or false depending on any correlation with other impacted alarms.

Correlation is an ongoing, dynamic process: the causes/impacts fields are updated after initial alarm creation, and the updates are notified through the NSP-FAULT Kafka topic.

The 'impact' count property is also updated during this process. It is therefore important to maintain such values in the user application through the event driven notifications.

The includeRootCauseAndImpactDetails flag can also be set to true in the advancedFilter to include impacted alarm object information in the NSP-FAULT topic events ( please refer to the NSP-FAULT Kafka samples in step 3 below ).

The 'fdn' property is the unique alarm object identifier used to represent alarm objects in the NSP FM component, and is common to all sourceTypes. It is the key naming attribute for Kafka NSP-FAULT topic consumers, used for alarm object identification, and is essential to key on for subsequent alarm life-cycle notifications. This FDN value ( eg: "fdn:model:fm:Alarm:150" ) is used in all alarm create/change/delete notifications.

Alarms from different sourceTypes can differ in their object naming format. Whilst the NSP-FAULT topic receives notifications at a common central point, values like objectFullName and affectedObject are not standardised between sourceTypes. For all types, however, the alarm object containment data is defined as a list of colon separated objects.

For example:

Alarm from sourceType mdm identifies a port object as:

"affectedObject": "10.10.10.2:equipment:Equipment:/port[port-id='1/1/19']"

Alarm from sourceType nfmp identifies a port object as:

"affectedObject": "network:10.10.10.5:shelf-1:cardSlot-1:card:daughterCardSlot-1:daughterCard:port-3"

Alarm from sourceType nfmt identifies a port object as:

"affectedObject": "network:192.168.100.77:shelf-2:slot-8:card:port-C1"
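As a hedged illustration of handling these colon separated values, the sketch below splits an affectedObject string into its containment elements. The function name is hypothetical. It assumes that a colon inside a square-bracket predicate (for instance an sdp-bind-id value such as '10:344' in an MDM path) belongs to the element and is not a separator; that rule is an assumption made for this example, not a documented NSP parsing rule.

```python
def split_containment(path: str) -> list[str]:
    """Split a colon separated containment path, ignoring colons inside [...]."""
    parts: list[str] = []
    current: list[str] = []
    depth = 0  # nesting level of square brackets
    for ch in path:
        if ch == "[":
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
        if ch == ":" and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


if __name__ == "__main__":
    for sample in (
        "10.10.10.2:equipment:Equipment:/port[port-id='1/1/19']",
        "network:192.168.100.77:shelf-2:slot-8:card:port-C1",
    ):
        print(split_containment(sample))
```

Because the element layout differs per sourceType, a client would normally branch on sourceType before interpreting the individual elements.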

The probableCause, alarmName and alarmType values also differ between sources, which reflects the different technologies that the FM application component is able to receive alarms from.

For example:

NFM-P sample

alarmName "LinkDown" has a probableCause of 'portLinkProblem' and alarmType 'communicationsAlarm'

MDM sample

alarmName "LinkDown" has a probableCause of 'equipmentMalfunction' and alarmType 'processingErrorAlarm'

NFM-P alarms use an object naming scheme that should be familiar to NFM-P JMS users: the objectFullName ends with an appended three-field problem code, ie: alarm-141-24-112

The Alarm Search tools included in NFM-P and NSP are a very useful resource for determining the possible alarm values used in the FM component. Those values are also visible through the REST and Kafka NSP-FAULT APIs.

For the vast majority of NFM-P / NSP and MD managed entities, the specificProblem value is not populated. There are specific exceptions: the value may be populated for alarms from NFM-P managed LTE and eNB entities, and some MD managed multi-vendor entities may also implement a specificProblem value.

3) POST Create Subscription NSP-FAULT topic ( all alarms with causes/impacts and alarm change event details)

This method creates the required Kafka notification topic subscription NSP-FAULT necessary for listening to alarms from the NSP FM application. No filter is specified in this sample, resulting in all alarms being recorded on the topic.

The following example shows a successful subscription response:

{
  "response": {
    "status": 0,
    "startRow": 0,
    "endRow": 0,
    "totalRows": 1,
    "data": {
      "subscriptionId": "27bceab8-9738-4557-9ac5-da465d25e9b4",
      "clientId": null,
      "topicId": "ns-eg-27bceab8-9738-4557-9ac5-da465d25e9b4",
      "timeOfSubscription": 1603902987820,
      "expiresAt": 1603906587820,
      "stage": "ACTIVE",
      "persisted": true
    },
    "errors": null
  }
}

Once the topicId instance is obtained, the user can consume the events from the NSP-FAULT topic in the same way as any other Kafka topic registered on the target NSP server. The process to determine the correct server and the active Kafka component resource depends on the type of NSP redundancy configured, and is beyond the scope of this tutorial. More information on managing a persistent subscription topic in a redundant set-up can be found in the Managing a Persistent Subscription tutorial.
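A minimal sketch of reading the subscription response is given here. The function name and the returned dictionary keys are hypothetical. It assumes that timeOfSubscription and expiresAt are epoch timestamps in milliseconds (consistent with their 13-digit values) and, as a cautious choice for this example, treats any stage other than "ACTIVE" as an error.

```python
import json
from datetime import datetime, timezone


def parse_subscription(body: str) -> dict:
    """Extract the fields a Kafka client needs from a subscription response body."""
    data = json.loads(body)["response"]["data"]
    if data["stage"] != "ACTIVE":
        # Assumption: only an ACTIVE subscription is usable by a consumer.
        raise ValueError(f"subscription not active: {data['stage']}")
    return {
        "subscription_id": data["subscriptionId"],
        "topic_id": data["topicId"],  # the Kafka topic name to consume from
        # Assumption: timestamps are epoch milliseconds.
        "expires_at": datetime.fromtimestamp(data["expiresAt"] / 1000, tz=timezone.utc),
        "lifetime_minutes": (data["expiresAt"] - data["timeOfSubscription"]) / 60000,
    }
```

Applied to the response shown above, lifetime_minutes evaluates to 60.0, and topic_id is the value that the Kafka client subscribes to.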

The Kafkatool client ( Offset Explorer ) used for the purpose of this tutorial is free for use under limited conditions ( evaluation/educational purposes ), but a license is required for commercial use. The attached Kafkatool Word document details the required configuration for a secure TLS Kafka connection.

In this tutorial a containerized NSP set-up is being used, and the keystore/trustStore used with Kafkatool has been copied from the NSP SDN server, where the nspos component and Kafka services are running and serve as the registration point of access. It is not advisable to use the keystore/trustStore from underlying components like NFM-P or NFM-T in a containerized set-up where the target resource API runs on the NSP SDN server. The certificate may work, but host verification may need to be disabled.

As mentioned previously, Kafka client applications used to monitor alarms from the NSP-FAULT topic must have the necessary TLS certificate for the NSP gateway server ( SDN gateway ). This is typically imported into the relevant TLS keyStore/trustStore on the client side.

The attached Kafkatool Word document defines parameters for the relevant NSP IP, TLS keystore/trustStore, passwords, Zookeeper host/port and bootstrap IP settings.

Once connected to the server, the Kafka client can consume the events stored in the subscription topicId, in this instance, with the id from the response above:

ns-eg-27bceab8-9738-4557-9ac5-da465d25e9b4

The alarm notifications below are categorised according to their sourceType, ie:

'nfmp'

'mdm'

'nsp'

and also their event types, ie:

nsp-fault:alarm-create

nsp-fault:alarm-change

nsp-fault:alarm-delete

Kafka NSP-FAULT Alarm create notification sample ( from NFM-P )

{
  "data": {
    "ietf-restconf:notification": {
      "eventTime": "2023-04-18T14:55:06Z",
      "nsp-fault:alarm-create": {
        "originalSeverity": "major",
        "neId": "10.10.10.5",
        "lastTimeAcknowledged": 0,
        "acknowledged": false,
        "userText": "N/A",
        "sourceSystem": "fdn:realm:sam",
        "additionalText": "N/A",
        "affectedObject": "network:10.10.10.5:shelf-1:cardSlot-1:card:daughterCardSlot-1:daughterCard:port-40",
        "lastTimeDeEscalated": null,
        "acknowledgedBy": "N/A",
        "lastTimeCleared": 0,
        "neName": "SR5",
        "frequency": null,
        "lastTimeEscalated": null,
        "probableCause": "inoperableEquipment",
        "firstTimeDetected": 1681829656248,
        "adminState": "unlocked",
        "rootCause": null,
        "numberOfOccurrencesSinceAck": 0,
        "nodeTimeOffset": -1,
        "objectId": "fdn:model:fm:Alarm:1321185",
        "severity": "major",
        "affectedObjectName": "Port 1/1/40",
        "clearedBy": "N/A",
        "serviceAffecting": false,
        "numberOfOccurrences": 1,
        "impact": 0,
        "implicitlyCleared": true,
        "alarmName": "EquipmentDown",
        "wasAcknowledged": false,
        "numberOfOccurrencesSinceClear": 1,
        "objectFullName": "faultManager:network@10.10.10.5@shelf-1@cardSlot-1@card@daughterCardSlot-1@daughterCard@port-40|alarm-10-3-8",
        "previousSeverity": "indeterminate",
        "highestSeverity": "major",
        "affectedObjectType": "equipment.PhysicalPort",
        "alarmType": "equipmentAlarm",
        "specificProblem": "Not Applicable",
        "sourceType": "nfmp",
        "lastTimeSeverityChanged": 0,
        "lastTimeDetected": 1681829656248,
        "rootCauseAndImpactDetails": {
          "status": "inProgress",
          "impacts": null,
          "rootCauses": null
        }
      }
    }
  }
}

Kafka NSP-FAULT Alarm change notification sample ( from NFM-P ) with includeAlarmDetailsOnChangeEvent=true

{
  "data": {
    "ietf-restconf:notification": {
      "eventTime": "2023-04-18T14:55:08Z",
      "nsp-fault:alarm-change": {
        "originalSeverity": "major",
        "neId": "10.10.10.5",
        "lastTimeAcknowledged": 0,
        "acknowledged": false,
        "userText": "N/A",
        "sourceSystem": "fdn:realm:sam",
        "additionalText": "N/A",
        "affectedObject": "network:10.10.10.5:shelf-1:cardSlot-1:card:daughterCardSlot-1:daughterCard:port-40",
        "lastTimeDeEscalated": null,
        "acknowledgedBy": "N/A",
        "lastTimeCleared": 0,
        "neName": "SR5",
        "frequency": null,
        "lastTimeEscalated": null,
        "probableCause": "inoperableEquipment",
        "firstTimeDetected": 1681829656248,
        "adminState": "unlocked",
        "rootCause": {
          "old-value": null,
          "new-value": true
        },
        "numberOfOccurrencesSinceAck": 0,
        "nodeTimeOffset": -1,
        "objectId": "fdn:model:fm:Alarm:1321185",
        "severity": "major",
        "affectedObjectName": "Port 1/1/40",
        "clearedBy": "N/A",
        "serviceAffecting": false,
        "numberOfOccurrences": 1,
        "impact": {
          "old-value": 0,
          "new-value": 1
        },
        "implicitlyCleared": true,
        "alarmName": "EquipmentDown",
        "wasAcknowledged": false,
        "numberOfOccurrencesSinceClear": 1,
        "objectFullName": "faultManager:network@10.10.10.5@shelf-1@cardSlot-1@card@daughterCardSlot-1@daughterCard@port-40|alarm-10-3-8",
        "previousSeverity": "indeterminate",
        "highestSeverity": "major",
        "affectedObjectType": "equipment.PhysicalPort",
        "alarmType": "equipmentAlarm",
        "specificProblem": "Not Applicable",
        "sourceType": "nfmp",
        "lastTimeSeverityChanged": 0,
        "lastTimeDetected": 1681829656248,
        "rootCauseAndImpactDetails": {
          "status": "causalityDetailsAvailable",
          "impacts": {
            "alarmFdn": "fdn:model:fm:Alarm:1321185",
            "impacts": [
              {
                "alarmFdn": "fdn:model:fm:Alarm:1321184",
                "impacts": []
              }
            ]
          },
          "rootCauses": [
            "fdn:model:fm:Alarm:1321185"
          ]
        }
      }
    }
  }
}

Kafka NSP-FAULT Alarm delete notification sample

{
  "data": {
    "ietf-restconf:notification": {
      "eventTime": "2023-04-18T15:03:32Z",
      "nsp-fault:alarm-delete": {
        "objectId": "fdn:model:fm:Alarm:1321185"
      }
    }
  }
}
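The create / change / delete cycle shown in the three samples above can be sketched as a small state table keyed on objectId. This is an illustrative sketch only: apply_event and the in-memory dictionary are hypothetical names, and it assumes that every alarm-change payload carries objectId and that a changed property is always wrapped as an "old-value" / "new-value" pair, as in the samples.

```python
import json

NOTIFICATION = "ietf-restconf:notification"
PREFIX = "nsp-fault:"


def _current(value):
    """Unwrap an old-value/new-value pair to its new value; pass others through."""
    if isinstance(value, dict) and "old-value" in value and "new-value" in value:
        return value["new-value"]
    return value


def apply_event(alarms: dict, record: str) -> str:
    """Apply one NSP-FAULT record (JSON text) to the alarm table; return the event type."""
    notification = json.loads(record)["data"][NOTIFICATION]
    for key, payload in notification.items():
        if not key.startswith(PREFIX):
            continue  # skip eventTime and any other non-event members
        event = key[len(PREFIX):]
        fdn = payload["objectId"]  # alarm FDN, the life-cycle key
        if event == "alarm-create":
            alarms[fdn] = dict(payload)
        elif event == "alarm-change":
            # Assumption: a change for an unknown alarm starts a partial entry.
            alarm = alarms.setdefault(fdn, {"objectId": fdn})
            alarm.update({name: _current(v) for name, v in payload.items()})
        elif event == "alarm-delete":
            alarms.pop(fdn, None)  # delete represents clearing of the fault
        return event
    raise ValueError("record contains no nsp-fault event")
```

A consumer would call apply_event once for each record value read from the subscription topic, after decoding the record bytes to text.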

Kafka NSP-FAULT Alarm create notification sample ( MDM )

{
  "data": {
    "ietf-restconf:notification": {
      "eventTime": "2023-04-18T16:09:02Z",
      "nsp-fault:alarm-create": {
        "originalSeverity": "warning",
        "neId": "10.10.10.60",
        "lastTimeAcknowledged": null,
        "acknowledged": false,
        "userText": "N/A",
        "sourceSystem": "fdn:app:mdm-ami-cmodel",
        "additionalText": "",
        "affectedObject": "10.10.10.60:equipment:NetworkElement:10.10.10.60",
        "lastTimeDeEscalated": null,
        "acknowledgedBy": "N/A",
        "lastTimeCleared": null,
        "neName": "SR60",
        "frequency": 1,
        "lastTimeEscalated": null,
        "probableCause": "receiveFailure",
        "firstTimeDetected": 1681834142556,
        "adminState": "unlocked",
        "rootCause": null,
        "numberOfOccurrencesSinceAck": 0,
        "nodeTimeOffset": -1,
        "objectId": "fdn:model:fm:Alarm:1322480",
        "severity": "warning",
        "affectedObjectName": "SR60",
        "clearedBy": "N/A",
        "serviceAffecting": false,
        "numberOfOccurrences": 1,
        "impact": 0,
        "implicitlyCleared": true,
        "alarmName": "PingConnectionProblem",
        "wasAcknowledged": false,
        "numberOfOccurrencesSinceClear": 0,
        "objectFullName": "10.10.10.60:fm:Alarm:PingConnectionProblem-communicationsAlarm-PingConnectionProblem",
        "previousSeverity": "indeterminate",
        "highestSeverity": "warning",
        "affectedObjectType": "necontrol.DiscoveredNe",
        "alarmType": "communicationsAlarm",
        "specificProblem": null,
        "sourceType": "mdm",
        "lastTimeSeverityChanged": null,
        "lastTimeDetected": 1681834142556,
        "rootCauseAndImpactDetails": {
          "status": "inProgress",
          "impacts": null,
          "rootCauses": null
        }
      }
    }
  }
}

Kafka NSP-FAULT Alarm change notification sample ( MDM ) with includeAlarmDetailsOnChangeEvent=true

{
  "data": {
    "ietf-restconf:notification": {
      "eventTime": "2023-04-18T14:34:01Z",
      "nsp-fault:alarm-change": {
        "originalSeverity": "major",
        "neId": "10.10.10.48",
        "lastTimeAcknowledged": null,
        "acknowledged": false,
        "userText": "N/A",
        "sourceSystem": "fdn:app:mdm-ami-cmodel",
        "additionalText": "Subscription failed for node:10.10.10.48 :: Stream cancelled with cause io exception:UNAVAILABLE: io exception:finishConnect(..) failed: Connection refused: /192.168.100.39:57400:finishConnect(..) failed: Connection refused:",
        "affectedObject": "10.10.10.48:equipment:NetworkElement:10.10.10.48",
        "lastTimeDeEscalated": null,
        "acknowledgedBy": "N/A",
        "lastTimeCleared": null,
        "neName": "VSR-NRC",
        "frequency": {
          "old-value": 4302,
          "new-value": 4303
        },
        "lastTimeEscalated": null,
        "probableCause": "configurationOrCustomizationError",
        "firstTimeDetected": 1680305722893,
        "adminState": "unlocked",
        "rootCause": true,
        "numberOfOccurrencesSinceAck": 0,
        "nodeTimeOffset": -1,
        "objectId": "fdn:model:fm:Alarm:14780",
        "severity": "major",
        "affectedObjectName": "VSR-NRC",
        "clearedBy": "N/A",
        "serviceAffecting": false,
        "numberOfOccurrences": {
          "old-value": 76139,
          "new-value": 76140
        },
        "impact": 0,
        "implicitlyCleared": false,
        "alarmName": "NodeConfigurationProblem",
        "wasAcknowledged": false,
        "numberOfOccurrencesSinceClear": 0,
        "objectFullName": "10.10.10.48:fm:Alarm:-NodeConfigurationProblem-GNMI-NOTIFICATION",
        "previousSeverity": "indeterminate",
        "highestSeverity": "major",
        "affectedObjectType": "necontrol.DiscoveredNe",
        "alarmType": "equipmentAlarm",
        "specificProblem": null,
        "sourceType": "mdm",
        "lastTimeSeverityChanged": null,
        "lastTimeDetected": {
          "old-value": 1681828440862,
          "new-value": 1681828440911
        }
      }
    }
  }
}

Kafka NSP-FAULT Alarm Sample Observations

The FM REST 'fdn' property used to store the alarm object FDN is called 'objectId' in Kafka NSP-FAULT notifications.

The rootCause value is null in all new alarm-create notifications, and is updated as the impacts/causes information is processed.

In the alarm-change update, both the old and the new value are specified, eg:

"rootCause": {
  "old-value": null,
  "new-value": true
}
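As a hedged sketch, the properties that actually changed can be picked out of an alarm-change payload by looking for this old-value / new-value shape. The function name is hypothetical, and the sketch assumes that unchanged properties are never wrapped in such a pair.

```python
def changed_properties(payload: dict) -> dict:
    """Map each changed property name to an (old, new) tuple."""
    return {
        name: (value["old-value"], value["new-value"])
        for name, value in payload.items()
        if isinstance(value, dict) and {"old-value", "new-value"} <= value.keys()
    }


if __name__ == "__main__":
    change = {
        "objectId": "fdn:model:fm:Alarm:1321185",
        "severity": "major",
        "rootCause": {"old-value": None, "new-value": True},
        "impact": {"old-value": 0, "new-value": 1},
    }
    print(changed_properties(change))
```

For the trimmed payload in the usage example, only rootCause and impact are reported; objectId and severity are plain values and are therefore treated as unchanged.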

The impacted alarm information is included in the alarm-change property "rootCauseAndImpactDetails"

for example:

"rootCauseAndImpactDetails": {
  "status": "causalityDetailsAvailable",
  "impacts": {
    "alarmFdn": "fdn:model:fm:Alarm:1321185",
    "impacts": [
      {
        "alarmFdn": "fdn:model:fm:Alarm:1321184",
        "impacts": []
      }
    ]
  },
  "rootCauses": [
    "fdn:model:fm:Alarm:1321185"
  ]
}

This data captures the hierarchical relationship between root cause and impacted alarms. The relationship consists of a tree of FDN identifiers ( the objectId ) used to identify the alarms.
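The tree can be walked with a few lines of code. The sketch below collects the FDNs of all impacted alarms beneath the top-level node; the function name is hypothetical, and it assumes that every node has the same "alarmFdn" / "impacts" shape as in the sample and that "impacts" may be null while correlation is still in progress.

```python
def impacted_fdns(details: dict) -> list[str]:
    """Return the FDNs of all impacted alarms, depth-first, excluding the top node."""
    root = details.get("impacts")
    if not root:
        return []  # e.g. status inProgress, where impacts is null
    found: list[str] = []
    # Reverse so that the first child is popped first (pre-order traversal).
    stack = list(reversed(root.get("impacts") or []))
    while stack:
        node = stack.pop()
        found.append(node["alarmFdn"])
        stack.extend(reversed(node.get("impacts") or []))
    return found


if __name__ == "__main__":
    sample = {
        "status": "causalityDetailsAvailable",
        "impacts": {
            "alarmFdn": "fdn:model:fm:Alarm:1321185",
            "impacts": [{"alarmFdn": "fdn:model:fm:Alarm:1321184", "impacts": []}],
        },
        "rootCauses": ["fdn:model:fm:Alarm:1321185"],
    }
    print(impacted_fdns(sample))
```

An explicit stack is used rather than recursion so that the depth of the correlation tree does not matter to the client.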

The impacts and rootCauses fields capture the relationships. There is also a status property. The inProgress value is the initial correlation state notified in the alarm-create event. The causalityDetailsAvailable value is sent in the alarm-change event, and is the end state that indicates that correlation is complete.

The possible values are as follows:

inProgress -> Correlation is not ready, still in progress
causalityDetailsAvailable -> Correlation is ready
noCausality -> There is no correlation associated with this alarm
error -> An unexpected exception puts an error in the status
causalityDetailsNotAvailable -> NFM-T alarm not ready for correlation


If an alarm does not have any impacts, the rootCauses property will specify its own alarm FDN. In this case, the status will also be “causalityDetailsAvailable”.



More information on causes/impacts can be found on the main portal under fault-management-apis.

The advanced filter in the POST subscription method in step 3 specifies the includeAlarmDetailsOnChangeEvent flag. This results in all properties being sent in alarm-change notification events. If the flag is not specified, or is set to false, the alarm-change event only includes the essential properties required to notify the changed properties. A sample of this can be found on the main portal under fault-management-apis.

Migrating to Kafka from JMS

JMS subscription topics use an attribute list that aligns with the property list found in the NSP-PACKET-ALL topic. However, it is the NSP-FAULT topic property set that is common to all source types that deliver information to the NSP FM application component, and FM alarm object properties are stored in accordance with that property set. NFM-P JMS users migrating to the NSP-FAULT topic for FM requirements benefit from the ability to manage alarms not only from NFM-P but also from model driven, optical, and multi-vendor sources. The payload in the NSP-FAULT topic is a somewhat reduced property list compared to the general and fault XML topics available via NFM-P JMS. The nuances of those differences need to be considered when migrating from JMS, and are beyond the scope of this tutorial.

However, there are 3 important distinctions:

a) the alarm object id 'fdn' or 'objectId' is the unique object used for alarm life-cycle operations, as opposed to the objectFullName

b) the causes/impacts information in NSP-FAULT can contain multiple causes, and is sent as a tree of objects

c) the rootCause value is used similarly to ALA_isCorr in JMS, but is not available at alarm creation time because the correlation algorithms that determine impacted alarms are still running. A Kafka NSP-FAULT propertyFilter that specified "rootCause='true'" would therefore filter out all new alarms ( with eventType alarm-create ), due to the default 'null' value at alarm creation time.

The samples above are typical of the create/change/delete alarm life-cycle that occurs without any additional user interventions.

The following NSP-FAULT Kafka samples are due to manual interventions, such as alarm severity change / acknowledgment etc.

The properties that are changing are indicated by the old and new values. The additional property detail values are useful mostly for filtering requirements, where specific change records can be narrowed to limit the number of notifications retrieved via the Kafka API. This can reduce the impact on network throughput, and reduce the time taken to pull alarm records from the Kafka consumer. It is important to remember that the Kafka topic implementation is a pull consumer model: the client regularly pulls the events from the topic, as opposed to the server pushing the events to the client as they occur in real time. A well designed property filter can reduce the total number of alarm objects stored in the topic subscription, and is a recommended approach in general to improving bandwidth utilisation and processing overhead.

For clients that do not wish to filter alarm-change events and only need to track the properties that are changing, it is advisable, for performance reasons, not to include the additional fields, and to leave the includeAlarmDetailsOnChangeEvent flag unspecified or set to false.
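
A minimal sketch of how a client might separate the changed properties from the unchanged detail values in an alarm-change event is shown here. It relies only on the old-value / new-value structure seen in the samples in this tutorial. The function name changed_properties is hypothetical.

```python
def changed_properties(message):
    """Return {property: (old, new)} for one alarm-change notification.

    Properties that are plain values (unchanged details) are ignored.
    """
    notification = message["data"]["ietf-restconf:notification"]
    change = notification.get("nsp-fault:alarm-change", {})
    return {
        name: (value["old-value"], value["new-value"])
        for name, value in change.items()
        if isinstance(value, dict)
        and "old-value" in value
        and "new-value" in value
    }
```

The same function works whether or not includeAlarmDetailsOnChangeEvent is set, since it only looks at properties that carry an old-value / new-value pair.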

Kafka NSP-FAULT Manual Alarm Severity change notification sample ( MDM ) with includeAlarmDetailsOnChangeEvent=true

Assign a new alarm severity from within the FM GUI component. The first sample shows a change from major to critical, and the second a change from major to minor.

{
  "data": {
    "ietf-restconf:notification": {
      "eventTime": "2023-04-24T16:35:23Z",
      "nsp-fault:alarm-change": {
        "originalSeverity": "major",
        "neId": "192.168.96.4",
        "lastTimeAcknowledged": null,
        "acknowledged": false,
        "userText": "N/A",
        "sourceSystem": "fdn:app:mdm",
        "additionalText": "The alarm is raised when a EPIPE SDP Binding has operational disabled state \u0026 administrative state of unlocked.",
        "affectedObject": "192.168.96.4:service:ServiceResourceBinding:/service[service-id\u003d\u0027344\u0027]/spoke-sdp[sdp-bind-id\u003d\u002710:344\u0027]",
        "lastTimeDeEscalated": null,
        "acknowledgedBy": "N/A",
        "lastTimeCleared": null,
        "neName": "SR_Green",
        "frequency": 0,
        "lastTimeEscalated": null,
        "probableCause": "configurationOrCustomizationError",
        "firstTimeDetected": 1682257105644,
        "adminState": "unknown",
        "rootCause": false,
        "numberOfOccurrencesSinceAck": 0,
        "nodeTimeOffset": -1,
        "objectId": "fdn:model:fm:Alarm:741370",
        "severity": {
          "old-value": "major",
          "new-value": "critical"
        },
        "affectedObjectName": "name\u003d192.168.96.4-circuit-10-344",
        "clearedBy": "N/A",
        "serviceAffecting": null,
        "numberOfOccurrences": 1,
        "impact": 0,
        "implicitlyCleared": true,
        "alarmName": "SdpBindingTunnelDown",
        "wasAcknowledged": false,
        "numberOfOccurrencesSinceClear": 0,
        "objectFullName": "fm.Alarm:SERVICE_ELINE_TUNNEL_BINDING_Alarms:SdpBindingTunnelDown:fdn:app:mdm-ami-cmodel:192.168.96.4:service:ServiceResourceBinding:/service[service-id\u003d\u0027344\u0027]/spoke-sdp[sdp-bind-id\u003d\u002710:344\u0027]",
        "previousSeverity": {
          "old-value": "indeterminate",
          "new-value": "major"
        },
        "highestSeverity": {
          "old-value": "major",
          "new-value": "critical"
        },
        "affectedObjectType": "service.ServiceResourceBinding",
        "alarmType": "communicationsAlarm",
        "specificProblem": null,
        "sourceType": "mdm",
        "lastTimeSeverityChanged": {
          "old-value": null,
          "new-value": 1682354123267
        },
        "lastTimeDetected": 1682257105644
      }
    }
  }
}



{
  "data": {
    "ietf-restconf:notification": {
      "eventTime": "2023-04-19T13:30:42Z",
      "nsp-fault:alarm-change": {
        "originalSeverity": "major",
        "neId": "10.10.10.2",
        "lastTimeAcknowledged": null,
        "acknowledged": false,
        "userText": "N/A",
        "sourceSystem": "fdn:app:mdm-ami-cmodel",
        "additionalText": "Interface 1/1/20 is not operational",
        "affectedObject": "10.10.10.2:equipment:Equipment:/port[port-id\u003d\u00271/1/20\u0027]",
        "lastTimeDeEscalated": null,
        "acknowledgedBy": "N/A",
        "lastTimeCleared": null,
        "neName": null,
        "frequency": 0,
        "lastTimeEscalated": null,
        "probableCause": "equipmentMalfunction",
        "firstTimeDetected": 1680023147300,
        "adminState": "unlocked",
        "rootCause": true,
        "numberOfOccurrencesSinceAck": 0,
        "nodeTimeOffset": -1,
        "objectId": "fdn:model:fm:Alarm:1351089",
        "severity": {
          "old-value": "major",
          "new-value": "minor"
        },
        "affectedObjectName": "port\u003d1/1/20",
        "clearedBy": "N/A",
        "serviceAffecting": null,
        "numberOfOccurrences": 1,
        "impact": 0,
        "implicitlyCleared": true,
        "alarmName": "LinkDown",
        "wasAcknowledged": false,
        "numberOfOccurrencesSinceClear": 0,
        "objectFullName": "10.10.10.2:fm:Alarm:/port[port-id\u003d\u00271/1/20\u0027]/linkDown",
        "previousSeverity": {
          "old-value": "indeterminate",
          "new-value": "major"
        },
        "highestSeverity": "major",
        "affectedObjectType": "equipment.Equipment",
        "alarmType": "processingErrorAlarm",
        "specificProblem": null,
        "sourceType": "mdm",
        "lastTimeSeverityChanged": {
          "old-value": null,
          "new-value": 1681911042608
        },
        "lastTimeDetected": 1680023147300
      }
    }
  }
}

Kafka NSP-FAULT Manual Alarm Acknowledgment change notification sample ( MDM ) with includeAlarmDetailsOnChangeEvent=true

Manually acknowledge an alarm from within the FM GUI component

{
  "data": {
    "ietf-restconf:notification": {
      "eventTime": "2023-04-19T14:55:15Z",
      "nsp-fault:alarm-change": {
        "originalSeverity": "major",
        "neId": "10.10.10.2",
        "lastTimeAcknowledged": {
          "old-value": null,
          "new-value": 1681916115382
        },
        "acknowledged": {
          "old-value": false,
          "new-value": true
        },
        "userText": "N/A",
        "sourceSystem": "fdn:app:mdm-ami-cmodel",
        "additionalText": "Interface 1/1/20 is not operational",
        "affectedObject": "10.10.10.2:equipment:Equipment:/port[port-id\u003d\u00271/1/20\u0027]",
        "lastTimeDeEscalated": null,
        "acknowledgedBy": {
          "old-value": "N/A",
          "new-value": "admin"
        },
        "lastTimeCleared": null,
        "neName": null,
        "frequency": 0,
        "lastTimeEscalated": null,
        "probableCause": "equipmentMalfunction",
        "firstTimeDetected": 1680023147300,
        "adminState": "unlocked",
        "rootCause": true,
        "numberOfOccurrencesSinceAck": 0,
        "nodeTimeOffset": -1,
        "objectId": "fdn:model:fm:Alarm:1351089",
        "severity": "minor",
        "affectedObjectName": "port\u003d1/1/20",
        "clearedBy": "N/A",
        "serviceAffecting": null,
        "numberOfOccurrences": 1,
        "impact": 0,
        "implicitlyCleared": true,
        "alarmName": "LinkDown",
        "wasAcknowledged": {
          "old-value": false,
          "new-value": true
        },
        "numberOfOccurrencesSinceClear": 0,
        "objectFullName": "10.10.10.2:fm:Alarm:/port[port-id\u003d\u00271/1/20\u0027]/linkDown",
        "previousSeverity": "major",
        "highestSeverity": "major",
        "affectedObjectType": "equipment.Equipment",
        "alarmType": "processingErrorAlarm",
        "specificProblem": null,
        "sourceType": "mdm",
        "lastTimeSeverityChanged": 1681911042608,
        "lastTimeDetected": 1680023147300
      }
    }
  }
}

Kafka NSP-FAULT Manual Alarm Clearing notification sample ( MDM ) with includeAlarmDetailsOnChangeEvent=true

Manually clear an alarm from within the FM GUI component

The FM GUI alarm settings checkbox to allow manual alarm deletion has been enabled, with the option "Allow manual alarm deletion when cleared or acknowledged" selected

In this scenario an alarm-change and an alarm-delete will be sent via the NSP-FAULT topic

{
  "data": {
    "ietf-restconf:notification": {
      "eventTime": "2023-04-19T15:36:13Z",
      "nsp-fault:alarm-change": {
        "originalSeverity": "major",
        "neId": "10.10.10.1",
        "lastTimeAcknowledged": null,
        "acknowledged": false,
        "userText": "N/A",
        "sourceSystem": "fdn:app:mdm-ami-cmodel",
        "additionalText": "Interface 1/2/1 is not operational",
        "affectedObject": "10.10.10.1:equipment:Equipment:/port[port-id\u003d\u00271/2/1\u0027]",
        "lastTimeDeEscalated": null,
        "acknowledgedBy": "N/A",
        "lastTimeCleared": {
          "old-value": null,
          "new-value": 1681918573118
        },
        "neName": null,
        "frequency": 0,
        "lastTimeEscalated": null,
        "probableCause": "equipmentMalfunction",
        "firstTimeDetected": 1680023135200,
        "adminState": "unlocked",
        "rootCause": true,
        "numberOfOccurrencesSinceAck": 0,
        "nodeTimeOffset": -1,
        "objectId": "fdn:model:fm:Alarm:1352405",
        "severity": {
          "old-value": "major",
          "new-value": "cleared"
        },
        "affectedObjectName": "port\u003d1/2/1",
        "clearedBy": {
          "old-value": "N/A",
          "new-value": "admin"
        },
        "serviceAffecting": null,
        "numberOfOccurrences": 1,
        "impact": 0,
        "implicitlyCleared": true,
        "alarmName": "LinkDown",
        "wasAcknowledged": false,
        "numberOfOccurrencesSinceClear": 0,
        "objectFullName": "10.10.10.1:fm:Alarm:/port[port-id\u003d\u00271/2/1\u0027]/linkDown",
        "previousSeverity": {
          "old-value": "indeterminate",
          "new-value": "major"
        },
        "highestSeverity": "major",
        "affectedObjectType": "equipment.Equipment",
        "alarmType": "processingErrorAlarm",
        "specificProblem": null,
        "sourceType": "mdm",
        "lastTimeSeverityChanged": {
          "old-value": null,
          "new-value": 1681918573118
        },
        "lastTimeDetected": 1680023135200
      }
    }
  }
}

{
  "data": {
    "ietf-restconf:notification": {
      "eventTime": "2023-04-19T15:36:13Z",
      "nsp-fault:alarm-delete": {
        "objectId": "fdn:model:fm:Alarm:1352405"
      }
    }
  }
}

Kafka NSP-FAULT Edit Custom Text alarm notification sample ( MDM ) with includeAlarmDetailsOnChangeEvent=true

Create a custom text entry for an alarm from within the FM GUI component

{
  "data": {
    "ietf-restconf:notification": {
      "eventTime": "2023-04-19T16:08:46Z",
      "nsp-fault:alarm-change": {
        "originalSeverity": "major",
        "neId": "10.10.10.60",
        "lastTimeAcknowledged": null,
        "acknowledged": false,
        "userText": {
          "old-value": "N/A",
          "new-value": "Alarm has been assigned to an engineer"
        },
        "sourceSystem": "fdn:app:mdm-ami-cmodel",
        "additionalText": "Request failed for node:192.168.100.60 :: timeout",
        "affectedObject": "10.10.10.60:equipment:NetworkElement:10.10.10.60",
        "lastTimeDeEscalated": null,
        "acknowledgedBy": "N/A",
        "lastTimeCleared": null,
        "neName": "SR60",
        "frequency": 2,
        "lastTimeEscalated": null,
        "probableCause": "configurationOrCustomizationError",
        "firstTimeDetected": 1681893229696,
        "adminState": "unlocked",
        "rootCause": true,
        "numberOfOccurrencesSinceAck": 0,
        "nodeTimeOffset": -1,
        "objectId": "fdn:model:fm:Alarm:1387951",
        "severity": "major",
        "affectedObjectName": "SR60",
        "clearedBy": "N/A",
        "serviceAffecting": false,
        "numberOfOccurrences": 2,
        "impact": 0,
        "implicitlyCleared": false,
        "alarmName": "NodeConfigurationProblem",
        "wasAcknowledged": false,
        "numberOfOccurrencesSinceClear": 0,
        "objectFullName": "10.10.10.60:fm:Alarm:-NodeConfigurationProblem-SNMP-EVENT",
        "previousSeverity": "indeterminate",
        "highestSeverity": "major",
        "affectedObjectType": "necontrol.DiscoveredNe",
        "alarmType": "equipmentAlarm",
        "specificProblem": null,
        "sourceType": "mdm",
        "lastTimeSeverityChanged": null,
        "lastTimeDetected": 1681915309771
      }
    }
  }
}

4) Create NSP-FAULT Subscription - alarm-create only - multiple advancedFilter flags

( optional method )

This sample includes a simple filter on eventType to capture only alarm creation objects ( new alarms ). This is an optional alternative to the previous step 3; a second NSP-FAULT subscription would be created if both requests were executed.

5) POST Authentication - Refresh Token

This method is used to renew the authentication token


6) POST Renew Subscription

This method is used to renew the Kafka subscription

Alarm Filtering Use Case Tutorial ( 1-4 )

When retrieving alarms from REST, you can further restrict the alarms you are interested in by including a filter in your request.

The following Postman collection includes the requests/responses and relevant filter examples:

Get Postman Collection


1) REST GET Alarm Details with simple filter sourceType 'mdm'

Retrieves all alarms ( and details ) with alarmFilter:

sourceType='mdm'


2) REST GET Alarm Details with sourceType - impact count - severity list - nested filter

Retrieves all alarms ( and details ) with alarmFilter:

((sourceType='mdm' or sourceType='nfmp') and ( impact<>0 )) and (severity='critical' or severity='major')

<> is not equal


3) REST GET Alarm Details with sourceType - severity list - rootCause - nested filter

Retrieves all alarms ( and details ) with alarmFilter:

((sourceType='mdm' or sourceType='nsp') and (severity='critical' or severity='major')) and ( rootCause='true' )
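
Nested filters such as samples 2 and 3 can be tedious to write by hand. The sketch below composes the sample 3 expression from lists of values. The helper names any_of and all_of are hypothetical, the output omits the optional spaces inside the parentheses, and values containing a quote character are not handled.

```python
def any_of(prop, values):
    """Join equality tests on one property with 'or', in parentheses."""
    return "(" + " or ".join(f"{prop}='{v}'" for v in values) + ")"


def all_of(*terms):
    """Join already-parenthesised sub-expressions with 'and'."""
    return "(" + " and ".join(terms) + ")"


# Same logic as sample 3: source type list, severity list, root cause only
alarm_filter = " and ".join([
    all_of(
        any_of("sourceType", ["mdm", "nsp"]),
        any_of("severity", ["critical", "major"]),
    ),
    "(rootCause='true')",
])
```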


4) REST GET Alarm Details with neId list - alarmName list - severity list - nested filter

Retrieves all alarms ( and details ) with alarmFilter:

((neId='10.10.10.2' or neId='10.10.10.4') and ( alarmName like 'LinkDown' or alarmName%2520like%2520'%2525Down%2525')) and ( severity='critical' or severity='major' )

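Note: Sample 4 demonstrates the use of URI encoding for the alarmName string, which allows the wildcard % character to be used in the string match rather than being treated as a reserved character as per the URI specification:

where

%2520 is the space character, encoded twice ( a space is %20, and its % is in turn encoded as %25 )

%2525 is the % character, encoded twice ( % is %25, and its % is in turn encoded as %25 )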

It would not be possible to write the statement as:

alarmName like '%Down%'
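
The encoded values quoted in the note correspond to applying percent-encoding twice. A minimal sketch using the Python standard library reproduces the encoded fragment from sample 4. The function name double_encode is hypothetical, and keeping the quote character unencoded is an assumption made to match the sample.

```python
from urllib.parse import quote, unquote


def double_encode(fragment):
    """Percent-encode a filter fragment twice, leaving ' unencoded."""
    once = quote(fragment, safe="'")   # space -> %20, % -> %25
    return quote(once, safe="'")       # %20 -> %2520, %25 -> %2525


plain = "alarmName like '%Down%'"
encoded = double_encode(plain)

# Decoding twice recovers the original fragment
decoded = unquote(unquote(encoded))
```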

Alarm Squelching and Kafka NSP-FAULT

The Fault Management application also employs a squelch feature to discard alarms based on the Affected Object and Site ID parameters of the alarm. Squelching an NE discards alarms with a Site ID that matches the squelched NE. Squelching a port discards alarms with an Affected Object parameter that matches the squelched port. Squelching a port also discards alarms for the following objects:

  • L2 Access Interfaces
  • L3 Access Interfaces
  • Routing Network Interfaces
  • Service endpoints of physical links

Alarms related to squelched objects will naturally be squelched in the Kafka alarm notification topic NSP-FAULT
