AIOps: Intelligent Operation and Maintenance for Networks

Intelligent operation and maintenance (AIOps): the term AIOps originally stood for “Algorithmic IT Operations”, that is, algorithm-driven IT operation and maintenance. As AI development boomed, the term was given a new meaning, “Artificial Intelligence for IT Operations”, that is, AI-driven IT operation and maintenance. Gartner defines AIOps as follows:

“AIOps platforms utilize big data, modern machine learning and other advanced analytics technologies to directly and indirectly enhance IT operations (monitoring, automation and service desk) functions with proactive, personal and dynamic insight. AIOps platforms enable the concurrent use of multiple data sources, data collection methods, analytical (real-time and deep) technologies, and presentation technologies.”

This article introduces a common task in intelligent operation and maintenance, alarm clustering, using an approach based on metric learning. It is a distance-based supervised clustering method: a distance-computation model is trained on a small training dataset, and alarms are then clustered using the distances it produces. The solution addresses several feature-processing challenges specific to network operation and maintenance, and achieves good clustering results with only a small amount of training data.

Business Background

Optical transport networks (OTN) on the live network generate a large number of alarms every day, most of which are derivative alarms, and this greatly increases the difficulty of alarm handling. The alarm volume on the live network is high and manual operation and maintenance is inefficient, so faults cannot be resolved, and the related alarms cleared, in a timely and effective manner. To reduce live-network operation and maintenance costs, an AI algorithm is needed to quickly and accurately group alarms that belong together and to find the root alarm of each group.

The alarm information collected on the live network includes the alarm name, the alarm topology information, and the occurrence and clearing times. We labeled several hundred alarms from a continuous time period as reference and training data. During labeling, whether two alarms belong to the same group is judged manually, mainly by referring to their topology information and timing. Topology information describes the network structure, including the network element on which the alarm is raised and the alarm path. The network structure can be abstracted as a graph in which network elements are vertices and paths are edges. Paths are special in that they are hierarchical: a higher-level path may correspond to one or more lower-level paths. Two alarms belong to the same group if they share a vertex or lie on associated edges of the topology graph, and occur close together in time.
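One reading of the grouping rule above, as a minimal Python sketch. The `Alarm` fields, the one-level `PATH_CHILDREN` hierarchy, and the 60-second time window are hypothetical placeholders; the article does not specify how path association or “close in time” were judged during labeling.

```python
from dataclasses import dataclass


# Hypothetical alarm record; field names are illustrative, not from the source data.
@dataclass(frozen=True)
class Alarm:
    name: str
    ne: str       # network element (a vertex in the topology graph)
    path: str     # path identifier (an edge in the topology graph)
    start: float  # occurrence time, in seconds


# Hypothetical one-level hierarchy: a higher-level path -> the lower-level paths it carries.
PATH_CHILDREN = {
    "ODU-1": {"OCH-1", "OCH-2"},
}


def paths_associated(p: str, q: str) -> bool:
    """Two paths are associated if they are equal or one directly carries the other."""
    return p == q or q in PATH_CHILDREN.get(p, set()) or p in PATH_CHILDREN.get(q, set())


def same_group(a: Alarm, b: Alarm, window: float = 60.0) -> bool:
    """Shared vertex or associated edge, and occurrence times within `window` seconds."""
    topo_related = a.ne == b.ne or paths_associated(a.path, b.path)
    return topo_related and abs(a.start - b.start) <= window


a = Alarm("LOS", "NE-A", "OCH-1", 0.0)
b = Alarm("AIS", "NE-B", "ODU-1", 12.0)
print(same_group(a, b))  # True: ODU-1 carries OCH-1, and the alarms are 12 s apart
```

A real topology would need the full multi-level path hierarchy (for example, a transitive closure over parent and child paths) rather than the single level assumed here.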

Figure 1: Alarm clustering example

This problem poses several difficult challenges.

Challenge 1: How should the data be processed?

Topology information and alarm names are strings. Unlike numbers, they cannot be directly compared, nor can a difference between them be computed.

In a real network environment, the possible string values cannot be enumerated exhaustively.

Faced with this, many people's first idea is one-hot encoding, but the second point rules it out. Suppose we one-hot encode 100 known string values and represent any other value with one extra bit, giving a 101-dimensional feature vector. If the real network environment contains 10,000 values, the remaining 9,900 values are all encoded as the same vector (0, 0, ..., 0, 1). Their one-hot representations are identical even though they should differ from one another, so one-hot encoding is unsuitable here.
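A small Python sketch of the collapse described above. It assumes a toy vocabulary of three made-up network-element names in place of the 100 known values; `one_hot_with_unknown` is a hypothetical helper, not from the article.

```python
def one_hot_with_unknown(value: str, vocabulary: list[str]) -> list[int]:
    """One-hot encode `value` over `vocabulary`, plus one trailing bit for unseen values."""
    vec = [0] * (len(vocabulary) + 1)
    if value in vocabulary:
        vec[vocabulary.index(value)] = 1
    else:
        vec[-1] = 1  # every unseen value lands in the same bucket
    return vec


# Toy vocabulary standing in for the "100 known values" (names are made up).
vocab = ["NE-001", "NE-002", "NE-003"]
print(one_hot_with_unknown("NE-002", vocab))  # [0, 1, 0, 0]
print(one_hot_with_unknown("NE-777", vocab))  # [0, 0, 0, 1]
print(one_hot_with_unknown("NE-888", vocab))  # [0, 0, 0, 1]: indistinguishable from NE-777
```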

Challenge 2: Model performance and generalization

The labeled reference set used for training contains only a few hundred alarms, far less data than the live network produces.

The scenarios covered by this dataset are only the tip of the iceberg compared with the whole field of network operation and maintenance.

To address the data-processing challenge, we introduce the idea of a “relative feature”. The string features of a single alarm are hard to handle, but when studying the relationship between two alarms there is a more reasonable approach: check whether the corresponding string fields of the two alarms are equal, which quantifies how similar the strings are. If two alarms share the same alarm name (or network element, etc.), they are likely closely related; if the values differ, the relationship is probably weaker. This way we do not need to know every possible string value, and strings never seen before can still be handled, avoiding the difficulty above. Because a strict equality test can lose information, several features can be extracted in different ways: for example, take a string's prefix and suffix, check whether each is equal, and use each result as a new dimension of the “relative feature” vector. Such relative features suit clustering well: the core of clustering is the similarity between data points, and relative features are precisely features for computing that similarity.
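A minimal sketch of relative-feature extraction for a pair of alarms. The field names (`name`, `ne`, `path`, `start`), the prefix/suffix length `k = 3`, and the extra time-difference dimension are assumptions for illustration; the article only states that equality, prefix, and suffix comparisons of string fields are used. Nothing here depends on a vocabulary, so unseen strings are handled exactly like known ones.

```python
def relative_features(a: dict, b: dict, fields=("name", "ne", "path"), k: int = 3) -> list[float]:
    """Relative-feature vector for a pair of alarms (hypothetical field names).

    For each string field: full equality, prefix equality, suffix equality (1.0 or 0.0).
    """
    feats = []
    for f in fields:
        x, y = a[f], b[f]
        feats.append(float(x == y))            # whole string equal?
        feats.append(float(x[:k] == y[:k]))    # first k characters equal?
        feats.append(float(x[-k:] == y[-k:]))  # last k characters equal?
    # Assumed extra dimension: absolute difference in occurrence time (seconds).
    feats.append(abs(a["start"] - b["start"]))
    return feats


a1 = {"name": "ETH_LOS", "ne": "NE-101", "path": "OCH-17", "start": 100.0}
a2 = {"name": "ETH_LOS", "ne": "NE-102", "path": "OCH-27", "start": 130.0}
print(relative_features(a1, a2))
# [1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 30.0]
```

Because every comparison is symmetric, swapping the two alarms gives the same vector, which suits a distance model.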

Next, consider how clustering is applied. Clustering has three key elements: similarity computation, the clustering algorithm, and feature processing. Because of the relative features described above, we need to compute similarity ourselves, which leads to the metric learning mentioned at the start. Metric learning learns, from data, a distance function tailored to a specific task. Here it models the similarity computation: the similarity between two alarms is expressed as a “distance”. The larger the distance, the greater the difference between the two alarms and the lower their similarity; the smaller the distance, the more likely they belong to the same group. We built a regression model that takes the relative features of two alarms as input and outputs the distance between them.
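Stated as a formula (our notation, not the authors'): for alarms $a_i$ and $a_j$ with relative-feature vector $\mathbf{r}(a_i, a_j)$, the regression model $f_\theta$ predicts the distance $\hat{d}_{ij}$, and it is fitted on the set $\mathcal{P}$ of labeled pairs against the expected distance $d_{ij}$. The squared-error loss is an assumption; it is a common default for regression models such as XGBoost.

```latex
\hat{d}_{ij} = f_\theta\big(\mathbf{r}(a_i, a_j)\big), \qquad
\theta^{*} = \arg\min_{\theta} \sum_{(i,j) \in \mathcal{P}}
  \Big( f_\theta\big(\mathbf{r}(a_i, a_j)\big) - d_{ij} \Big)^{2}
```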

For the training data, a pair of alarms labeled as belonging to the same group has an expected distance of 0, and a pair from different groups has an expected distance of 1. To make the learned distance reflect the continuity of time, we added one refinement: alarm groups are numbered in the chronological order in which they appear, and the difference between two alarms' group numbers, multiplied by a small weight, is added to the expected distance to give the final target.
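The target construction above can be sketched as follows, assuming each labeled alarm carries the chronological number of its group. The weight `w = 0.01` is a placeholder for the article's unspecified “small weight”, and `pair_target` and `build_pairs` are hypothetical names. Paired with each pair's relative-feature vector as input, these targets would form the regression training set.

```python
from itertools import combinations


def pair_target(group_i: int, group_j: int, w: float = 0.01) -> float:
    """Expected distance for a labeled pair of alarms.

    Same group -> 0. Different groups -> 1 plus a small penalty proportional to how far
    apart the groups are in chronological numbering (w is an assumed value).
    """
    if group_i == group_j:
        return 0.0
    return 1.0 + w * abs(group_i - group_j)


def build_pairs(group_ids: list[int], w: float = 0.01) -> list[tuple[tuple[int, int], float]]:
    """All alarm pairs (i, j) with i < j, each with its regression target.

    group_ids[k] is the chronological group number of alarm k.
    """
    return [((i, j), pair_target(group_ids[i], group_ids[j], w))
            for i, j in combinations(range(len(group_ids)), 2)]


# Four labeled alarms: two in group 0, one in group 1, one in group 3.
for pair, target in build_pairs([0, 0, 1, 3]):
    print(pair, round(target, 4))
```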

In the distance model, the relationship between the features and the distance is non-linear, so we chose XGBoost, a regression-tree-based model. It performs well, trains very quickly, is easy to use, and has been widely used in major machine learning competitions; in principle, it is also a good fit for this problem.

In the final clustering step, one issue needs attention: given a set of alarms, we cannot know the number of clusters in advance. It is therefore more reasonable to use an algorithm that does not require the number of clusters to be fixed beforehand. Because the distance model produces fairly accurate distances, we chose such an algorithm: hierarchical clustering with a distance threshold. At each step it merges the two closest entries (single alarms or groups), until the distance between the closest remaining entries exceeds the threshold. Intuitively, it repeatedly merges the two most similar alarms (or groups) until the remaining ones are so dissimilar that we no longer consider them one group, at which point clustering stops.
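A naive pure-Python sketch of threshold-based agglomerative clustering on a precomputed distance matrix (for example, one filled with the distance model's predictions). It is O(n³) and only meant to show the merge-and-stop logic; in practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` followed by `fcluster(..., criterion="distance")` would likely be used instead. The function name and signature are hypothetical.

```python
def agglomerative(dist: list[list[float]], threshold: float, linkage: str = "single") -> list[int]:
    """Threshold-based agglomerative clustering on a precomputed distance matrix.

    Repeatedly merges the two closest clusters and stops once the closest pair
    is farther apart than `threshold`. Returns one cluster label per item.
    """
    clusters = [[i] for i in range(len(dist))]

    def cluster_distance(a: list[int], b: list[int]) -> float:
        pair_d = [dist[i][j] for i in a for j in b]
        if linkage == "single":
            return min(pair_d)
        if linkage == "complete":
            return max(pair_d)
        if linkage == "average":
            return sum(pair_d) / len(pair_d)
        raise ValueError(f"unknown linkage: {linkage}")

    while len(clusters) > 1:
        # Closest pair of clusters under the chosen linkage.
        d, p, q = min(
            (cluster_distance(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        if d > threshold:
            break  # everything left is too dissimilar to be one alarm group
        clusters[p].extend(clusters[q])
        del clusters[q]  # q > p, so index p is unaffected

    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


D = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.2],
    [0.9, 0.9, 0.2, 0.0],
]
print(agglomerative(D, threshold=0.5))  # [0, 0, 1, 1]
```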

This clustering algorithm has two important parameters, the linkage method and the threshold:

Linkage: given the distances between individual points, the linkage determines how the distance between two clusters is computed. With a distance matrix there are three options: Complete, Average, and Single. Complete uses the distance between the farthest pair of points in the two clusters, Average uses the average pairwise distance, and Single uses the distance between the closest pair. In this problem, two alarms may be far apart (viewed in isolation, they do not look like one group) while each is very close to a third alarm (compared with that third alarm, they clearly belong to the same group). In that case they should still be merged, so Single linkage is more appropriate for this problem, and it also performed better in practice.

Threshold: clustering ends when the distances between all remaining points and clusters exceed this threshold. The smaller the threshold, the more clusters; the larger the threshold, the fewer clusters. Because the distance model is trained so that alarms in the same group have an expected distance of 0 and other pairs a distance of (at least) 1, the threshold should lie between 0 and 1; 0.5 is the theoretical choice, and it can be tuned as needed.
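Both parameters can be illustrated on a toy three-alarm example with made-up distances, in which A and C are far apart but each is close to B. The sketch compares Single and Complete linkage once A and B have merged, then counts clusters at several thresholds. For the threshold sweep it relies on the fact that single-linkage clustering cut at a threshold equals the connected components of the graph that joins every pair whose distance is at most the threshold, computed here with union-find. All names and values are illustrative.

```python
# Toy distances among alarms A=0, B=1, C=2 of one group (values are made up):
# A and C look unrelated on their own (0.9), but each is close to B.
DIST = [
    [0.0, 0.2, 0.9],
    [0.2, 0.0, 0.3],
    [0.9, 0.3, 0.0],
]


def cluster_distance(dist: list[list[float]], c1: list[int], c2: list[int], linkage: str) -> float:
    """Distance between two clusters (lists of indices) under Single or Complete linkage."""
    pair_d = [dist[i][j] for i in c1 for j in c2]
    if linkage == "single":
        return min(pair_d)
    if linkage == "complete":
        return max(pair_d)
    raise ValueError(f"unknown linkage: {linkage}")


def count_single_linkage_clusters(dist: list[list[float]], threshold: float) -> int:
    """Number of clusters single linkage yields at `threshold`, via union-find components."""
    n = len(dist)
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                parent[find(i)] = find(j)  # link the two components
    return len({find(i) for i in range(n)})


# Linkage: once A and B are merged, is C close enough (threshold 0.5) to join?
print(cluster_distance(DIST, [0, 1], [2], "single"))    # 0.3 -> C joins
print(cluster_distance(DIST, [0, 1], [2], "complete"))  # 0.9 -> C stays separate

# Threshold: smaller values leave more clusters.
for t in (0.1, 0.25, 0.5):
    print(t, count_single_linkage_clusters(DIST, t))
```

With Single linkage, C joins A and B at a threshold of 0.5 even though its direct distance to A is 0.9, which is the behaviour argued for above; the sweep prints 3, 2, and 1 clusters for thresholds 0.1, 0.25, and 0.5.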

The figure below shows our metric-learning-based clustering scheme:

In the end, we achieved an adjusted Rand index (ARI) of 0.9556 on the test set, a good result that is usable in practice.
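For reference, ARI compares a predicted clustering with the labeled groups, corrected for chance: 1.0 means identical partitions (up to relabeling), and values near 0 mean chance-level agreement. A minimal pure-Python sketch of the standard formula follows; in practice `sklearn.metrics.adjusted_rand_score` computes the same quantity. The function name here is hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true: list[int], labels_pred: list[int]) -> float:
    """Adjusted Rand index between two labelings of the same items."""
    n = len(labels_true)
    if n < 2 or n != len(labels_pred):
        raise ValueError("need two labelings of equal length with at least two items")
    # Pairs placed together in both labelings (contingency-table cells).
    sum_comb = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs placed together within each labeling separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Only when both labelings are all singletons or both a single cluster.
        return 1.0
    return (sum_comb - expected) / (max_index - expected)


print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0 (same grouping, new labels)
print(adjusted_rand_index([0, 0, 1, 2], [0, 0, 1, 1]))  # ≈ 0.571 (exactly 4/7)
```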
