Reducing False Positive Alerts With Contextual Anomaly Detection
Rob Malnati
Too many false positives!
Traditionally, monitoring alerts are produced by comparing metrics against thresholds to identify behavior outside the norm. This metrics-based approach to alert definitions often generates too many false positives. These waste human time and effort or, worse yet, erode confidence until ignoring alerts becomes general practice!
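The following minimal Python sketch makes this failure mode concrete. It shows traditional threshold alerting. The hourly request counts and the threshold of 14 are hypothetical values chosen only for illustration. They describe an ordinary daily traffic curve with nothing unusual in it.

```python
def static_threshold_alerts(samples, threshold):
    """Return the indices of samples that exceed a fixed threshold."""
    return [i for i, value in enumerate(samples) if value > threshold]


# Hypothetical hourly request counts (thousands) for one ordinary day;
# index = hour of day. The midday peak is normal traffic, not an incident.
hourly_requests = [2, 1, 1, 1, 2, 3, 5, 8, 10, 12, 14, 16,
                   17, 15, 13, 11, 9, 8, 7, 6, 5, 4, 3, 2]

alerts = static_threshold_alerts(hourly_requests, threshold=14)
print(alerts)  # [11, 12, 13]: three alerts, all on normal midday traffic
```

Every one of these alerts fires on the expected midday peak. Each one is a false positive that someone still has to look at.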
Efforts to improve alert quality typically lead to more granular alerts. This improves alerting for specific conditions. As dimensionality increases, however, it adds significant complexity to alert definitions and their maintenance. Machine learning approaches often crumble under the same “curse of dimensionality” that humans feel: when looking at hundreds of alerts, no person or machine can find the true anomalies. Dynamic threshold definitions that accommodate historically observed trends, such as time-of-day or seasonal variations, are helpful. However, they still limit us to looking for the problems we already know to expect.
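A dynamic threshold can be sketched by learning a separate baseline for each hour of the day. An alert fires only when a value is far above that hour's norm. This minimal illustration assumes a "mean plus k standard deviations" rule, with k = 3 as a hypothetical choice. The function names and sample history are invented for the example and are not part of any thatDot product.

```python
from collections import defaultdict
from statistics import mean, pstdev


def learn_hourly_baseline(history):
    """Build a per-hour baseline from (hour_of_day, value) pairs.

    Returns {hour: (mean, population standard deviation)}.
    """
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    return {hour: (mean(vals), pstdev(vals)) for hour, vals in by_hour.items()}


def dynamic_threshold_alert(baseline, hour, value, k=3.0):
    """Alert when value is more than k standard deviations above that hour's mean."""
    mu, sigma = baseline[hour]
    return value > mu + k * sigma


# Hypothetical history: busy around noon, quiet at 3 a.m.
history = [(12, 16), (12, 17), (12, 15), (12, 16),
           (3, 1), (3, 1), (3, 2), (3, 1)]
baseline = learn_hourly_baseline(history)

print(dynamic_threshold_alert(baseline, 12, 17))  # False: normal for noon
print(dynamic_threshold_alert(baseline, 3, 6))    # True: unusual for 3 a.m.
```

The sketch only models a numeric value against time of day. A never-before-seen file name, ISP, or IP address would pass through unnoticed. That is the "problems we know to expect" limitation.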
What we all want are high-confidence alerts that identify truly anomalous events as they occur in real time, from a system that learns and adapts to our data immediately.
A New Approach: Use Categorical Data
Categorical data is composed of the strings of information included in our logs and events: file names, IP addresses, HTTP status codes, geographical information, etc. Including categorical data in our monitoring analysis provides much richer context for evaluating application and network performance logs. As much as 80% of the information in our logs and events is categorical data, so why not include it in our monitoring? Doing so lets us reduce the false positives that often overwhelm the people monitoring these systems. It also lets us explain WHY an alert was generated.
Not Everything New Is Anomalous
The additional context gained from categorical dimensions of data brings two significant benefits. First, it rapidly identifies unique data, which our system marks as having a high “surprise” value. Second, it distinguishes anomalous data from data that is merely unique. thatDot Anomaly Detector learns a fingerprint for the data it observes, so that it can tell when “new” is actually just “normal”.
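Information theory offers a common way to quantify how “surprising” a categorical value is: the surprisal, -log2 P(value). The minimal sketch below estimates P from frequency counts. It assumes add-one (Laplace) smoothing so that a never-seen value gets a finite, high score. This is an illustrative stand-in only. The post does not describe how thatDot computes its Surprise score, and the class name and counts are hypothetical.

```python
import math
from collections import Counter


class SurpriseCounter:
    """Illustrative surprisal scorer for one categorical field (e.g. ISP).

    Surprisal is -log2 P(value). P is estimated from observed frequencies with
    add-one (Laplace) smoothing, treating "any unseen value" as one extra slot
    so that a first-time value gets a finite but high score.
    """

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def surprise(self, value):
        slots = len(self.counts) + 1  # seen values plus one "unseen" slot
        p = (self.counts[value] + 1) / (self.total + slots)
        return -math.log2(p)

    def observe(self, value):
        self.counts[value] += 1
        self.total += 1


sc = SurpriseCounter()
for _ in range(99):
    sc.observe("Spectrum")
sc.observe("OtherISP")  # hypothetical ISP, seen once

print(round(sc.surprise("Spectrum"), 2))  # 0.04: very common
print(round(sc.surprise("FUJIFILM"), 2))  # 6.69: never seen before
```

Surprisal alone only measures uniqueness. It scores every first-time value highly, whether or not new values are routine for that field. Context is needed to separate the unique from the anomalous.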
High cardinality is a normal, expected condition for many data types. User agents, IP addresses, and file names are all examples of data that can take many values. Shown below are two examples that illustrate the value of context for distinguishing unique data from anomalous data.
The above example shows the identification of a highly unique observation in a CDN log monitoring data set. The scatter plot of the data uses color to indicate the “surprise”, or uniqueness, of each observation. Its left-hand scale indicates thatDot’s anomaly score for each observation. The tree to the right is from the thatDot Exploration UI and shows the context of the observation. The observation has both high Surprise and Anomaly scores, because it is the first observation of the FUJIFILM ISP out of 800,294 observations.
In this second example, the scatter plot shows a yellow observation, indicating high “surprise” or uniqueness. However, this observation receives a low anomaly score from thatDot. The thatDot Exploration UI tree shows that observing a unique Server IP value under the Spectrum ISP is not anomalous, even though this IP is being seen for the first time. Previous data has taught the system that new client IP values are a usual occurrence for the Spectrum ISP.
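The Spectrum example can be approximated by conditioning novelty on a parent context. The sketch below is a hypothetical illustration, not thatDot's algorithm. For each parent (an ISP), it records how many observations occurred and how many distinct child values (IPs) appeared. It treats the smoothed fraction of first-time values as an estimate of P(new | parent). It then scores an observation as the surprisal of whichever event actually happened: a new value or a repeat. The class name, the "ip-N" placeholders, and the "QuietISP" context are invented for the example.

```python
import math
from collections import defaultdict


class ContextualNovelty:
    """Hypothetical sketch: score a child value (e.g. an IP) given its parent
    context (e.g. an ISP) by how often that parent has produced new values.

    Not thatDot's algorithm; a simple frequency-based illustration.
    """

    def __init__(self):
        self.seen = defaultdict(set)   # parent -> distinct child values observed
        self.total = defaultdict(int)  # parent -> number of observations

    def p_new(self, parent):
        # Each distinct child value was "new" exactly once, so
        # distinct / total is the observed rate of first-time values.
        # Add-one smoothing keeps the estimate strictly between 0 and 1.
        distinct = len(self.seen.get(parent, ()))
        return (distinct + 1) / (self.total.get(parent, 0) + 2)

    def score(self, parent, child):
        # Surprisal of the event that actually happened: "new value" or "repeat".
        p = self.p_new(parent)
        is_new = child not in self.seen.get(parent, ())
        return -math.log2(p if is_new else 1 - p)

    def observe(self, parent, child):
        self.seen[parent].add(child)
        self.total[parent] += 1


cn = ContextualNovelty()
for i in range(1000):
    cn.observe("Spectrum", f"ip-{i}")     # every observation brings a new IP
    cn.observe("QuietISP", "ip-static")   # hypothetical ISP: always the same IP

print(cn.score("Spectrum", "ip-new"))   # low: new IPs are routine here
print(cn.score("QuietISP", "ip-new"))   # high: this ISP never sees new IPs
```

A first-time IP under Spectrum scores near zero, because new IPs are routine in that context. The same event under QuietISP scores high. Applying the same idea with a single root context and ISPs as the children gives a never-before-seen ISP a high score. This happens once many observations have produced only a handful of distinct ISPs.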
Alerts With Fewer False Positives
Using the additional context that categorical data provides in anomaly detection can significantly improve the quality of our alerting. When we can confidently separate the real signal from the noise, users save the time they once spent chasing false positives. They can then use that time to build more automation into our support processes.
thatDot Anomaly Detector
thatDot Anomaly Detector is the first general-use application designed for finding anomalies in real time in data sets that include categorical data. It is available as an application for deployment in any cloud or data center. thatDot Anomaly Detector exposes an API that scores submitted observations for their “novelty”. This enables real-time anomaly detection with fewer false positives than traditional threshold-based metric analysis. Read more and access the Anomaly Detector free trial tier on AWS.