Co-authored by: Karthikeyan Murugesan
In the medical world, doctors diagnose a disease by recognizing the unique patterns of health indicators such as heartbeat and blood circulation. Applying the same analogy to the VNF SLA monitoring domain, we have proposed and implemented a PoC that demonstrates ANN (Artificial Neural Network) based recognition of metrics patterns, with a view to predicting an upcoming fault or breakdown.
This solution can address use cases including, but not limited to, the following:
- Recognize a hacker’s traffic pattern and redirect the traffic to a honeypot
- Recognize a CPU hogging pattern and stop a process or assign more vCPUs
- Recognize a memory hogging pattern and isolate the process for debugging
Consider a post-production VNF deployment scenario in which the service provider has completed the design, testing, and deployment of a network for an enterprise customer. The provider then faces the following complex challenges:
- Recognize patterns of recurrent faults
- Prevent known faults from recurring
- Plan ahead for preventive maintenance
For Telcos, the transition from physical network elements to virtual network elements presents a multifold challenge.
This challenge is most visible in VNF monitoring and troubleshooting, and it is compounded by the exponential network growth driven by IoT and next-generation mobile technologies.
HCL’s Network Management and Analytics Centre of Excellence has conceptualized, designed, and cultivated expertise to develop a cutting-edge VNF SLA monitoring solution titled ‘VNFs Fault Prevention by Recognizing Fault Patterns’.
The ‘VNFs Fault Prevention by Recognizing Fault Patterns’ solution comprises the following modules:
- User Traffic Profile Generator
- Traffic to Metric Pattern Mapper
- Fault Pattern Database
- Pattern Recognizer
The software modules of the ‘VNFs Fault Prevention by Recognizing Fault Patterns’ solution are described briefly below:
User Traffic Profile Generator
In this module, traffic samples of several user profiles are collected. The profiles are mapped to specific user actions and other environmental factors. This enables the pattern to be accurately mapped to a specific scenario. Traffic profiles will be qualified by the following parameters in order to uniquely identify and map them to a known environmental event:
- Persona of the user
- User’s intention or end goal
- User’s series of actions to achieve the result
- Other dependent workflows
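As a rough illustration, the four qualifying parameters above could be held in a small immutable record, so that two captures with the same persona, goal, actions, and dependent workflows map to the same known scenario. This is only a sketch: the class name `TrafficProfile`, its field names, and the sample values are hypothetical and are not taken from the PoC.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TrafficProfile:
    """Hypothetical record for one user traffic profile."""
    persona: str                               # persona of the user
    goal: str                                  # user's intention or end goal
    actions: Tuple[str, ...]                   # ordered series of user actions
    dependent_workflows: Tuple[str, ...] = ()  # other dependent workflows


# A frozen dataclass is hashable, so a profile can be used directly as the
# key that ties a captured traffic sample to a known scenario.
scenarios = {}
backup_job = TrafficProfile(
    persona="administrator",
    goal="nightly backup",
    actions=("login", "start_backup", "logout"),
)
scenarios[backup_job] = "scheduled maintenance window"

# The same four parameters always identify the same scenario.
same_job = TrafficProfile(
    persona="administrator",
    goal="nightly backup",
    actions=("login", "start_backup", "logout"),
)
# A different order of actions is treated as a different profile.
reordered = TrafficProfile(
    persona="administrator",
    goal="nightly backup",
    actions=("start_backup", "login", "logout"),
)
```

Keeping the actions as an ordered tuple is a deliberate choice in this sketch, since the order in which a user acts changes the traffic that is generated.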
Traffic to Metric Pattern Mapper
In this module, the user traffic profiles identified by the User Traffic Profile Generator are played out, and metrics from the VNFs are captured for the same duration. An ANN based module consumes the time series variation of these metrics and learns and records the unique metrics pattern. A crucial prerequisite of this module is a controlled environment in which only the selected user profiles are being played out.
To map patterns through this module, the following base metrics will be considered:
- CPU
- Memory
- Disk
- Interface
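Before an ANN can learn from the time series of these metrics, the samples usually have to be cut into fixed-size inputs. The sketch below shows one common way to do that, a sliding window over the four base metrics. The blog does not describe the ANN architecture or its input format, so the function name `make_windows`, the metric keys, and the window width are assumptions made for illustration only.

```python
from typing import Dict, List, Sequence

# Assumed keys for the four base metrics listed above.
METRICS = ("cpu", "memory", "disk", "interface")


def make_windows(
    samples: Sequence[Dict[str, float]], width: int, step: int = 1
) -> List[List[float]]:
    """Cut per-interval metric samples into flattened fixed-width windows.

    Each window covers `width` consecutive samples and is flattened to
    width * len(METRICS) values, a shape a feed-forward network could take
    as one input vector.
    """
    if width <= 0 or step <= 0:
        raise ValueError("width and step must be positive")
    windows = []
    for start in range(0, len(samples) - width + 1, step):
        window = samples[start:start + width]
        windows.append([sample[m] for sample in window for m in METRICS])
    return windows


# Five made-up samples, each holding utilization values between 0 and 1.
samples = [
    {"cpu": 0.1 * i, "memory": 0.5, "disk": 0.2, "interface": 0.3}
    for i in range(5)
]
windows = make_windows(samples, width=3)
```

With five samples and a width of three, this produces three overlapping windows of twelve values each. Overlapping windows let the model see a pattern regardless of where in the capture it starts.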
Fault Pattern Database (FPD)
This module stores the unique fault patterns along with all the constraints under which the patterns are valid. The various dimensions of the patterns include, but are not limited to, the following:
- Time of the day
- Duration
- Operating System
- Application Type
- RAM, CPU, Disk
The VNF fault pattern database will be independent of any implementation. It can also be shared as a package and plugged into relevant domains where faults need to be identified proactively and stopped before they bring down a critical system.
This facilitates knowledge sharing and collaboration across different service providers and helps prevent VNF faults before they occur.
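One way to picture an implementation-independent, shareable pattern entry is a plain record that carries the pattern together with its validity constraints and can be exported to a neutral format such as JSON. The sketch below is an assumption-laden illustration: the class `FaultPattern`, its fields (a subset of the dimensions listed above), and the helper functions are hypothetical and do not reflect the actual FPD schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class FaultPattern:
    """Hypothetical FPD entry: a metrics signature plus its constraints."""
    name: str
    signature: List[float]   # learned metrics pattern (illustrative values)
    start_hour: int          # time of day from which the pattern is valid
    end_hour: int            # time of day until which it is valid (exclusive)
    duration_s: int          # how long the pattern lasts, in seconds
    operating_system: str
    application_type: str

    def applies_to(self, hour: int, operating_system: str,
                   application_type: str) -> bool:
        """Return True if the pattern is valid under the given conditions."""
        return (self.start_hour <= hour < self.end_hour
                and self.operating_system == operating_system
                and self.application_type == application_type)


def export_patterns(patterns: List[FaultPattern]) -> str:
    """Serialize patterns to JSON so they can be shared as a package."""
    return json.dumps([asdict(p) for p in patterns], indent=2)


def import_patterns(text: str) -> List[FaultPattern]:
    """Rebuild patterns from a JSON package produced by export_patterns."""
    return [FaultPattern(**item) for item in json.loads(text)]


cpu_hog = FaultPattern(
    name="cpu_hog",
    signature=[0.2, 0.5, 0.9],
    start_hour=0,
    end_hour=6,
    duration_s=300,
    operating_system="linux",
    application_type="firewall",
)
package = export_patterns([cpu_hog])
restored = import_patterns(package)
```

Because the package is plain JSON, a provider receiving it needs no knowledge of how the pattern was originally learned, only of the agreed field names.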
Pattern Recognizer
The pattern recognizer module continuously monitors the developing patterns in the current metrics data and raises an alarm when a pattern matches one of the known fault patterns from the FPD. The faults predicted by the pattern recognizer can be emailed or logged. Additionally, an automated action, such as blocking a process or traffic, can be taken.
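The monitoring loop described above can be sketched as follows. Note that this sketch swaps the ANN for a much simpler stand-in, a Euclidean distance check against stored signatures, purely to show the control flow of continuous monitoring and alarming. The class name `PatternRecognizer`, the threshold, and the sample values are all hypothetical.

```python
import math
from collections import deque
from typing import Dict, Optional, Sequence


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class PatternRecognizer:
    """Matches the most recent metric values against known fault patterns."""

    def __init__(self, known_patterns: Dict[str, Sequence[float]],
                 threshold: float) -> None:
        lengths = {len(sig) for sig in known_patterns.values()}
        if len(lengths) != 1:
            raise ValueError("need at least one pattern, all of equal length")
        self.known = known_patterns
        self.threshold = threshold
        # Rolling buffer that holds only the most recent values.
        self.window = deque(maxlen=lengths.pop())

    def observe(self, value: float) -> Optional[str]:
        """Add one metric value; return the matched pattern name, if any."""
        self.window.append(value)
        if len(self.window) < self.window.maxlen:
            return None  # not enough data yet
        best_name, best_dist = None, self.threshold
        for name, signature in self.known.items():
            d = distance(self.window, signature)
            if d <= best_dist:
                best_name, best_dist = name, d
        return best_name


recognizer = PatternRecognizer({"cpu_hog": [0.2, 0.5, 0.9]}, threshold=0.1)
alarms = [recognizer.observe(v) for v in [0.1, 0.2, 0.5, 0.9, 0.3]]
# alarms == [None, None, None, "cpu_hog", None]
```

In a real deployment, a non-`None` result would trigger the email, log entry, or automated action mentioned above, and the distance check would be replaced by the trained ANN.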
In a nutshell, this blog proposes a solution for proactive VNF SLA monitoring across service providers, ensuring uninterrupted service in a rapidly growing network.