
Network Log Analysis Using Categorical Anomaly Detection

thatDot

The distributed nature of modern virtualized software architectures has added complexity to the networking stack, making it difficult to attribute behavior to any single service. Instrumenting services gives you insight into activity within each service, but doesn’t provide the entire picture. What’s missing is insight into the communication behavior between two logical hosts.

To explore this area, I found a dataset containing over 200 million network connection summary records from the open source Zeek network monitoring service. Each Zeek connection log record contains a number of fields, including the originating host, the responding host, and summary fields for connection state and connection history. A record converted to CSV looks like this:

1331902125.080000, CIp1er3EKU2WUebCDe, 192.168.202.94, 52307, 192.168.23.100, 445, tcp, -, 10.550000, 4803, 3174, SF, -, 0, ShADdaFf, 32, 6475, 27, 4590, (empty)

The numeric metrics in these records feed standard monitors such as bandwidth (bytes sent, bytes received). Analyzing only those metrics, however, ignores significant information encoded in the categorical elements of the log. This includes the hosts’ IP addresses and the abbreviations summarizing connection state (SF) and connection history (ShADdaFf). For connection state, the entire field maps to a single description; SF, for example, means normal establishment and termination. For connection history, each character maps to a different event within the TCP lifecycle. Capital letters indicate packets sent by the originating host and lowercase letters indicate packets sent by the responding host.
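To make the abbreviations concrete, here is a small Python sketch that decodes both fields using the code tables from Zeek’s conn.log documentation (descriptions paraphrased; the history table covers only the common letters). The names CONN_STATES, HISTORY_EVENTS, and describe_history are illustrative and are not part of Zeek or thatDot Novelty Detector.

```python
# Zeek conn_state codes, paraphrased from the conn.log documentation.
CONN_STATES = {
    "S0": "Connection attempt seen, no reply",
    "S1": "Connection established, not terminated",
    "SF": "Normal establishment and termination",
    "REJ": "Connection attempt rejected",
    "S2": "Established, originator close attempt seen, no reply from responder",
    "S3": "Established, responder close attempt seen, no reply from originator",
    "RSTO": "Established, originator aborted (sent RST)",
    "RSTR": "Responder sent RST",
    "RSTOS0": "Originator sent SYN then RST, no SYN-ACK seen",
    "RSTRH": "Responder sent SYN-ACK then RST, no SYN seen",
    "SH": "Originator sent SYN then FIN, no SYN-ACK seen (half open)",
    "SHR": "Responder sent SYN-ACK then FIN, no SYN seen",
    "OTH": "No SYN seen, midstream traffic",
}

# Zeek history letters (common subset).
# Upper case = sent by the originator, lower case = sent by the responder.
HISTORY_EVENTS = {
    "s": "SYN",
    "h": "SYN-ACK",
    "a": "ACK",
    "d": "data",
    "f": "FIN",
    "r": "RST",
    "c": "bad checksum",
    "g": "content gap",
    "t": "retransmitted payload",
    "w": "zero window",
    "i": "inconsistent packet",
    "q": "multi-flag packet",
}


def describe_history(history: str) -> list[str]:
    """Expand a Zeek history string into readable per-packet events."""
    steps = []
    for ch in history:
        if ch == "^":  # Zeek flipped the connection direction heuristically
            steps.append("direction flipped")
            continue
        event = HISTORY_EVENTS.get(ch.lower(), f"unknown '{ch}'")
        side = "orig" if ch.isupper() else "resp"
        steps.append(f"{side} {event}")
    return steps


print(CONN_STATES["SF"])
print(describe_history("ShADdaFf"))
```

Applied to the sample record, ShADdaFf decodes to a textbook exchange: originator SYN, responder SYN-ACK, originator ACK, data from both sides, a responder ACK, then a FIN from each side.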

Using thatDot Novelty Detector’s data transformation API, I was able to build a simple function to manipulate the raw logs into something more useful. The function is responsible for:

Mapping abbreviations to their corresponding definitions for easier understanding.
Separating the activity of the originating and responding hosts.
Creating the ordered data observation for submission to the API.

This function was then stored as a transformation that could be applied to all incoming data.

Data Transformation Map
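The three responsibilities listed above might look roughly like the following Python sketch. It is not thatDot’s transformation API. The column names assume the classic 20-field Zeek conn.log layout that the sample record matches, and the observation ordering (originating host, responding host, protocol/port, connection state, then each side’s history flags) is a hypothetical choice for illustration.

```python
import csv
import io

# Assumed column order: the classic Zeek conn.log layout (20 fields),
# which matches the sample record above.
CONN_FIELDS = [
    "ts", "uid", "orig_h", "orig_p", "resp_h", "resp_p", "proto", "service",
    "duration", "orig_bytes", "resp_bytes", "conn_state", "local_orig",
    "missed_bytes", "history", "orig_pkts", "orig_ip_bytes", "resp_pkts",
    "resp_ip_bytes", "tunnel_parents",
]

# Small subset of conn_state descriptions; unknown codes pass through unchanged.
CONN_STATES = {
    "S0": "Connection attempt seen, no reply",
    "SF": "Normal establishment and termination",
    "REJ": "Connection attempt rejected",
}


def to_observation(line: str) -> list[str]:
    """Hypothetical transformation: one CSV record -> ordered categorical observation."""
    row = next(csv.reader(io.StringIO(line), skipinitialspace=True))
    if len(row) != len(CONN_FIELDS):
        raise ValueError(f"expected {len(CONN_FIELDS)} fields, got {len(row)}")
    rec = dict(zip(CONN_FIELDS, row))

    # Separate each side's activity: upper case = originator, lower case = responder.
    # Non-letter flags such as '^' are dropped.
    history = rec["history"]
    orig_activity = "".join(c for c in history if c.isupper()) or "-"
    resp_activity = "".join(c for c in history if c.islower()) or "-"

    # Ordered from general (hosts) to specific (per-side activity).
    return [
        rec["orig_h"],
        rec["resp_h"],
        f'{rec["proto"]}/{rec["resp_p"]}',
        CONN_STATES.get(rec["conn_state"], rec["conn_state"]),
        f"orig:{orig_activity}",
        f"resp:{resp_activity}",
    ]


SAMPLE = ("1331902125.080000, CIp1er3EKU2WUebCDe, 192.168.202.94, 52307, "
          "192.168.23.100,445, tcp, -, 10.550000, 4803, 3174, SF, -, 0, "
          "ShADdaFf, 32, 6475, 27, 4590, (empty)")
print(to_observation(SAMPLE))
```

Using csv with skipinitialspace tolerates the inconsistent spacing after commas in exported records.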

With the transformation in place, I was able to ingest the records and build a tree that visualizes connection history, giving insight into a general fingerprint of conversation behavior. Once the system has learned this fingerprint, it begins to highlight connection paths that deviate from normal behavior.

Visualization Of Communication Patterns
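One way to picture such a tree is as a prefix tree: each observation is a path from the root, and each node counts how many observations passed through it. The minimal sketch below assumes observations are ordered lists of strings; the Node class and its methods are hypothetical, and the hosts in the usage lines are made up.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One value in the tree; count = observations whose path passes through it."""
    count: int = 0
    children: dict[str, Node] = field(default_factory=dict)

    def insert(self, observation: list[str]) -> None:
        # Walk (and extend) the path for this observation, bumping counts on the way.
        node = self
        node.count += 1
        for value in observation:
            child = node.children.get(value)
            if child is None:
                child = node.children[value] = Node()
            child.count += 1
            node = child

    def render(self, depth: int = 0) -> list[str]:
        # Indented text view, most frequent branches first.
        lines = []
        for value, child in sorted(self.children.items(), key=lambda kv: -kv[1].count):
            lines.append(f"{'  ' * depth}{value} ({child.count})")
            lines.extend(child.render(depth + 1))
        return lines


root = Node()
for obs in (["10.0.0.5", "10.0.0.9", "tcp/445", "S0"],
            ["10.0.0.5", "10.0.0.9", "tcp/445", "S0"],
            ["10.0.0.5", "10.0.0.9", "tcp/445", "SF"]):
    root.insert(obs)
print("\n".join(root.render()))
```

The branch counts are what make a fingerprint visible: here the S0 branch carries two of the three connections, so the SF branch is the thinner, less typical path.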

The principal reason for using thatDot’s Novelty Detector for this analysis, however, is to surface the “novel” data from amongst the volumes of “normal” data. This sampled scatter plot does a nice job of identifying the highly novel network conversations. The observations with the highest X-axis values are the most Novel, and may or may not also be Unique in the data. It is always interesting to see when Unique data, shown via the coloring, is NOT Novel. Differentiating such “false-positive” events is a significant benefit of including categorical data in our analysis.

Example Observation Detail Visualization
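The difference between Unique and Novel can be illustrated with a toy model. The sketch below is not thatDot’s scoring algorithm; it swaps in a simple, well-known technique: the surprisal (−log2 probability) of each value given its prefix, with a PPM-style escape estimate for values never seen before. Under that estimate, a new value at a position that always varies costs little, while a new value at a position that never varies costs a lot. The PrefixModel class, its methods, and the data are hypothetical.

```python
import math
from collections import Counter, defaultdict


class PrefixModel:
    """Counts which values follow each prefix of fixed-length observations."""

    def __init__(self) -> None:
        # prefix tuple -> Counter of the values observed right after it
        self.next_counts = defaultdict(Counter)

    def observe(self, observation: list[str]) -> None:
        for i, value in enumerate(observation):
            self.next_counts[tuple(observation[:i])][value] += 1

    def is_unique(self, observation: list[str]) -> bool:
        # Unique = this exact observation has never been seen before.
        counts = self.next_counts.get(tuple(observation[:-1]), Counter())
        return counts[observation[-1]] == 0

    def novelty(self, observation: list[str]) -> float:
        """Total surprisal in bits, using a PPM-style escape for unseen values."""
        bits = 0.0
        for i, value in enumerate(observation):
            counts = self.next_counts.get(tuple(observation[:i]))
            if not counts:
                break  # prefix never seen: no history to compare against
            n, d = sum(counts.values()), len(counts)
            seen = value in counts
            # Seen value: count / (n + d). Unseen: escape probability d / (n + d),
            # so positions with many distinct values make new values less surprising.
            p = (counts[value] if seen else d) / (n + d)
            bits -= math.log2(p)
            if not seen:
                break  # deeper levels have no history under this new value
        return bits


model = PrefixModel()
for _ in range(20):
    model.observe(["A", "B", "S0"])        # A to B: always the same state
for i in range(20):
    model.observe(["A", "C", f"port{i}"])  # A to C: last field varies every time

rare_state = ["A", "B", "SF"]   # unique, and breaks a stable pattern
new_port = ["A", "C", "port99"]  # unique, but A to C always varies
usual = ["A", "B", "S0"]        # seen 20 times
for obs in (rare_state, new_port, usual):
    print(obs, model.is_unique(obs), round(model.novelty(obs), 2))
```

Both unseen observations are Unique, but the new port on the A to C pair scores about 2.1 bits, close to the ordinary A to B observation (about 1.2), while the unexpected state on the A to B pair scores about 5.5 bits.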

From this scatter plot, we click through to one of the observations with a high novelty score, which leads us to the tree below. It shows that completing a handshake is abnormal for these two hosts; it is much more typical for their connections to time out.

Observation Detail Visualization

This same mechanism is useful for a range of use cases:

Real-time DDoS detection, such as TCP half-open (SYN flood) attacks.
Public-to-private host communications: determine which hosts are trying to connect and why (protocol, port, etc.)
New protocol use between known hosts
New hosts successfully communicating with known hosts

In summary, categorical anomaly detection turns out to be a useful tool for enriching existing telemetry data to aid in discovery, remediation, and automation.

thatDot Novelty Detector

thatDot Novelty Detector is the first general-use application designed for finding anomalies in real time in data sets that include categorical data. Available as an application for deployment in any cloud or data center, thatDot Novelty Detector exposes an API that scores submitted observations for their “novelty”, enabling real-time anomaly detection with fewer false positives than traditional threshold-based metric analysis.
