Network Log Analysis Using Categorical Anomaly Detection
Josh Cody
The distributed nature of modern virtualized software architectures has created added complexity in the networking stack, making it difficult to attribute behavior to any single service. Instrumenting services will give you insight into activity within the service, but doesn’t provide the entire picture. What’s missing is insight into the communication behaviors that happen between two logical hosts.
To better expose this area, I found a dataset containing over 200 million network connection summary records from the open source Zeek network monitoring service. Each Zeek log record contains a number of fields, including the originating host, the responding host, and summary fields for connection state and connection history. A record converted to CSV looks like this (emphasis mine):
1331902125.080000, CIp1er3EKU2WUebCDe, 192.168.202.94, 52307, 192.168.23.100, 445, tcp, -, 10.550000, 4803, 3174, SF, -, 0, ShADdaFf, 32, 6475, 27, 4590, (empty)
The metrics in those records inform standard monitors such as bandwidth (bytes received, bytes sent). Analyzing only the metrics, however, ignores significant information encoded in the categorical elements of the log. These include the hosts’ IP addresses and the summary abbreviations for connection state (SF) and connection history (ShADdaFf). For connection state, the entire field maps to a description. For connection history, each character maps to a different activity within the TCP lifecycle. Capital letters indicate activity from the originating host and lowercase letters indicate activity from the responding host.
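To make the categorical fields concrete, here is a minimal Python sketch that parses the sample record and decodes its connection state and history. It assumes the columns follow the standard Zeek conn.log order for a 20-field record; the function names are hypothetical, and the lookup tables cover only a subset of Zeek's codes with paraphrased descriptions.

```python
import csv
from io import StringIO

# Assumed column order for the 20-field conn.log record shown above.
CONN_FIELDS = [
    "ts", "uid", "id.orig_h", "id.orig_p", "id.resp_h", "id.resp_p",
    "proto", "service", "duration", "orig_bytes", "resp_bytes",
    "conn_state", "local_orig", "missed_bytes", "history",
    "orig_pkts", "orig_ip_bytes", "resp_pkts", "resp_ip_bytes",
    "tunnel_parents",
]

# Subset of Zeek connection states (descriptions paraphrased).
CONN_STATES = {
    "S0": "connection attempt seen, no reply",
    "S1": "connection established, not terminated",
    "SF": "normal establishment and termination",
    "REJ": "connection attempt rejected",
}

# Subset of Zeek history letters, keyed by the lowercase form.
HISTORY_EVENTS = {
    "s": "SYN", "h": "SYN-ACK", "a": "ACK",
    "d": "data", "f": "FIN", "r": "RST",
}


def parse_conn_record(line):
    """Split one CSV line into a dict keyed by conn.log field name."""
    values = next(csv.reader(StringIO(line), skipinitialspace=True))
    return dict(zip(CONN_FIELDS, (v.strip() for v in values)))


def decode_history(history):
    """Return (side, event) pairs; uppercase letters are the originator."""
    events = []
    for ch in history:
        if not ch.isalpha():
            continue  # skip non-letter markers such as '^'
        side = "originator" if ch.isupper() else "responder"
        events.append((side, HISTORY_EVENTS.get(ch.lower(), "other")))
    return events


SAMPLE = (
    "1331902125.080000, CIp1er3EKU2WUebCDe, 192.168.202.94, 52307, "
    "192.168.23.100,445, tcp, -, 10.550000, 4803, 3174, SF, -, 0, "
    "ShADdaFf, 32, 6475, 27, 4590, (empty)"
)

record = parse_conn_record(SAMPLE)
print(CONN_STATES[record["conn_state"]])
for side, event in decode_history(record["history"]):
    print(side, event)
```

Read this way, ShADdaFf describes a full conversation: the originator opens with a SYN, the responder answers, data flows in both directions, and both sides close with a FIN.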
Using thatDot Anomaly Detector’s data transformation API, I was able to build a simple function to manipulate the raw logs into something more useful. The function is responsible for:
- Mapping abbreviations to their corresponding definitions for easier understanding.
- Separating the activity for sending and receiving hosts.
- Creating the ordered data observation for submission to the API.
This function was then stored as a transformation that could be applied to all incoming data.
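The three responsibilities listed above can be sketched in plain Python. This is not the code stored in the Anomaly Detector, and it does not use the product's transformation API; the function names, the field order of the observation, and the shortened lookup tables are all assumptions made for illustration.

```python
# Hypothetical lookup tables (a subset of Zeek's codes, paraphrased).
CONN_STATES = {
    "S0": "connection attempt seen, no reply",
    "SF": "normal establishment and termination",
    "REJ": "connection attempt rejected",
}
HISTORY_EVENTS = {
    "s": "SYN", "h": "SYN-ACK", "a": "ACK",
    "d": "data", "f": "FIN", "r": "RST",
}


def split_history(history):
    """Separate history letters into originator and responder activity."""
    orig = [HISTORY_EVENTS.get(c.lower(), c) for c in history if c.isupper()]
    resp = [HISTORY_EVENTS.get(c, c) for c in history if c.islower()]
    return orig, resp


def to_observation(record):
    """Build an ordered list of categorical values from one parsed record.

    The ordering (hosts first, then protocol, port, state, activity) is an
    assumption; it goes from the most general value to the most specific.
    """
    orig, resp = split_history(record["history"])
    state = record["conn_state"]
    return [
        record["id.orig_h"],
        record["id.resp_h"],
        record["proto"],
        record["id.resp_p"],
        CONN_STATES.get(state, state),  # fall back to the raw code
        "originator: " + ", ".join(orig),
        "responder: " + ", ".join(resp),
    ]


# Fields taken from the sample record shown earlier.
sample_record = {
    "id.orig_h": "192.168.202.94",
    "id.resp_h": "192.168.23.100",
    "proto": "tcp",
    "id.resp_p": "445",
    "conn_state": "SF",
    "history": "ShADdaFf",
}
observation = to_observation(sample_record)
print(observation)
```

Keeping the observation ordered matters because each value is interpreted in the context of the values before it, which is what allows the records to be arranged into a tree.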
With the transformation in place, I was able to ingest the records and build a tree to visualize the connection history, ultimately giving us insight into a general fingerprint of conversation behavior. Once the system has recognized the fingerprint, it will begin to highlight connection paths that have deviated from normal behavior.
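As a rough intuition for how a tree of observations can expose deviations, the toy sketch below counts how often each path of values has been seen and measures how surprising a new path is. It is not thatDot's scoring algorithm; the class names, the smoothing rule, and the host addresses are hypothetical.

```python
import math


class Node:
    """One value in the tree, with a count of observations passing through."""

    def __init__(self):
        self.count = 0
        self.children = {}


class ObservationTree:
    def __init__(self):
        self.root = Node()

    def add(self, observation):
        """Record one ordered observation, incrementing counts along its path."""
        node = self.root
        node.count += 1
        for value in observation:
            node = node.children.setdefault(value, Node())
            node.count += 1

    def surprise(self, observation):
        """Return the largest per-step surprise, in bits, along the path.

        Each step is scored as -log2 of a crudely smoothed conditional
        frequency, so a value never seen in this context scores high.
        """
        node = self.root
        worst = 0.0
        for value in observation:
            child = node.children.get(value)
            seen = child.count if child else 0
            p = (seen + 1) / (node.count + 1)  # toy smoothing, an assumption
            worst = max(worst, -math.log2(p))
            if child is None:
                break  # nothing below an unseen value to compare against
            node = child
        return worst


tree = ObservationTree()
typical = ["10.0.0.1", "10.0.0.2", "connection attempt seen, no reply"]
for _ in range(99):
    tree.add(typical)

unusual = ["10.0.0.1", "10.0.0.2", "normal establishment and termination"]
typical_score = tree.surprise(typical)
unusual_score = tree.surprise(unusual)
print(typical_score, unusual_score)
```

In this toy setup the two hosts have only ever timed out, so a completed connection between them scores far higher than the familiar path even though both hosts are well known.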
The principal reason for using thatDot’s Anomaly Detector for this analysis, however, is to surface the “novel” data from among the volumes of “normal” data. This sampled scatter plot does a nice job of identifying the highly novel network conversations. The items highest on the X axis are the most Novel observations, which may or may not also be Unique in the data. It is always interesting to see when Unique data, shown via the coloring, is NOT Novel. Filtering out such “false-positive” events is a significant benefit of including categorical data in our analysis.
From this scatter plot we click through to one of the observations with a high novelty score, which leads us to the tree below. It shows that completing a handshake is abnormal for these two hosts. It is much more typical for these connections to time out.
This same mechanism is useful for a range of use cases:
- Real-time DDoS detection, such as TCP half-open (SYN flood) attacks.
- Communication between public and private hosts, to determine which hosts are trying to connect and how (protocol, port, etc.).
- New protocol use between known hosts.
- New hosts successfully communicating with known hosts.
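One way to picture these use cases is as different choices of which categorical fields go into the ordered observation. The sketch below is an illustration only: the use-case names, the field selections, and their order are assumptions, not settings taken from the Anomaly Detector.

```python
from ipaddress import ip_address


def scope(host):
    """Label an address as private or public using the standard library."""
    return "private" if ip_address(host).is_private else "public"


def observation_for(use_case, record):
    """Pick and order fields for one of the use cases (hypothetical names)."""
    orig, resp = record["id.orig_h"], record["id.resp_h"]
    if use_case == "half_open":
        # Per target host and port: did the connection ever get a reply?
        return [resp, record["id.resp_p"], record["conn_state"], record["history"]]
    if use_case == "public_private":
        # Which side is public, and over which protocol and port?
        return [scope(orig), scope(resp), record["proto"], record["id.resp_p"]]
    if use_case == "new_protocol":
        # Known host pair first, so a new protocol stands out beneath it.
        return [orig, resp, record["proto"], record["service"]]
    if use_case == "new_host":
        # Known responder and outcome first, new originator last.
        return [resp, record["conn_state"], orig]
    raise ValueError(f"unknown use case: {use_case}")


# Fields taken from the sample record shown earlier.
sample_record = {
    "id.orig_h": "192.168.202.94",
    "id.resp_h": "192.168.23.100",
    "id.resp_p": "445",
    "proto": "tcp",
    "service": "-",
    "conn_state": "SF",
    "history": "ShADdaFf",
}
for name in ("half_open", "public_private", "new_protocol", "new_host"):
    print(name, observation_for(name, sample_record))
```

Placing the well-known values first and the value of interest last means a new protocol or a new host appears as an unfamiliar branch under an otherwise familiar path.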
In summary, this turns out to be a useful tool for enriching existing telemetry data to aid in discovery, remediation, and automation.
thatDot Anomaly Detector
thatDot Anomaly Detector is the first general-use application designed for finding anomalies in real time in data sets that include categorical data. Available as an application for deployment in any cloud or data center, thatDot Anomaly Detector exposes an API that scores submitted observations for their “novelty”, enabling real-time anomaly detection with fewer false positives than traditional threshold-based metric analysis.