Data Exfiltration Detection in AWS CloudTrail Logs Using Categorical Data

thatDot avatar Rob Malnati

Use Categorical Data to Detect and Prevent Data Exfiltration from AWS CloudTrail logs

In our previous blog, Identifying stolen credential use in AWS CloudTrail logs with high confidence using categorical anomaly detection, we discussed the power of  graph machine learning to analyze the categorical data in AWS CloudTrail logs to identify novel or anomalous behaviors. We then compared thatDot Novelty Detector findings to the results of traditional statistical anomaly detection tools. This article builds on our previous analysis to further investigate the use of categorical anomaly detection to identify multi-stage exploit campaigns in AWS CloudTrail logs.  

About AWS CloudTrail and Data Exfiltration

AWS CloudTrail monitors calls to AWS services and delivers detailed logs, providing a complete audit of management calls, with optional inclusion of data calls. To detect attacks effectively, you will need both, but the resulting high volume of log events creates a glut of data, making it very difficult to detect a behavioral attack like data exfiltration.

Data exfiltration can be an exploit on its own, or as is increasingly seen, it can be one step in a larger ransomware campaign. Attackers recognize the obfuscating effect of high volume data movement, and use the complexity of the logs to hide their activity by moving slowly and carefully. This challenge requires an anomaly detection system capable of understanding the shared characteristics of ‘novel’ behaviors while ignoring the simpler sequential and time stamped information often relied upon by other methods.

Data Exfiltration from AWS — an Example

Monitoring data transfer activity for data theft is a true “needle in a haystack” problem. A storage layer like S3 is in a continuous state of change with data being ingested, exported, updated, duplicated, moved, and expired. With storage being a fundamental service, monitoring user or system behavior at any scale can be challenging. Rulesets designed to limit undesired activity lack the granularity required to avoid unintended interruptions in business functions, and more granular rules are incredibly difficult to manage. Administrators need a better way to highlight system and user behaviors that stand out from the norm.

In the case described below a hacker has initiated a series of steps to steal S3 data.

Preventing Exfiltration using Graph-based Techniques

If your security monitoring system is configured to monitor CloudTrail logs and uses thatDot Novelty Detector, you can detect data theft quickly. Novelty Detector generates an observation for each CloudTrail event. Each observation receives a novelty score indicating how relevant the observation is and how much it warrants your attention. High novelty doesn’t immediately trigger a security event. However, the system will identify what makes the observation novel, providing the unique insight necessary to speed manual or automated categorization.

As observations stream into thatDot Novelty Detector, it generates real time plots showing it learning both system and user behavior and learning what is ‘normal’ for the environment. The plots below provide a graphical representation of the observed behaviors with scores for normal behaviors gradually dropping over time to form a down-and-to-the-right curve.

thatDot Anomaly Detector Baseline

Observations that stand out from the norm receive high novelty scores and appear higher on the chart. Looking at a plot of the most recent data, we see Novelty Detector has detected a series of observations with particularly high scores.

thatDot Anomaly Detector Data Exfiltration Detection

The high scoring observations are for a particular user, raul, who initiated a series of API calls to the AWS S3 service:

  1. GetCallerIdentity – To validate S3 access for the credentials
  2. CreateSnapShot – To create a data snapshot of the target data using Account ID and Volume ID
  3. ModifySnapshotAttributes – To modify the snapshot permissions to allow export of data
  4. At this point the data snapshot is then “pulled” from an external account
  5. DeleteSnapshot – To delete the snapshot and cover their tracks

These activities present as novel because the user, raul, does not typically execute these API calls. Novelty Detector not only identifies the activity as novel, but provides the critical insight that raul’s execution of these actions is the most novel element.

Novelty Detector retains the CloudTrail event ID and event time, enabling analysts to navigate to the actual CloudTrail events to investigate the details. Most importantly, it forwards the observation and score to a security monitoring system for action.

Identifying Data Exfiltration in AWS Cloudtrail logs

One of the most compelling aspects of Novelty Detector’s identification of raul’s data exfiltration is the multi-stage view of all of the activities associated with this exploit. In our first blog, we showed how thatDot Novelty Detector was able to identify indicators of stolen credential use. When combined with our data exfiltration results, we are able to see the entire sequence of events: Stage 1, initial credential validity testing and broader probing of associated permissions, and Stage 2, the series of calls that indicate data exfiltration. Identifying multi-stage campaigns in this way provides the audit trail needed to trace the entire intrusion and thus understand the related consequences.

When we see this pattern and the multiple novel events generated by raul, it’s obvious that raul’s credentials are being used in multiple, highly unusual, operations. In response, for this one user, we can immediately force a logout and password reset while also immediately identifying impacted data. This remediation limits the disruption only to the impacted user and provides our analysts a comprehensive view of the breach impact.


thatDot’s categorical analysis delivers the capability to generate high value, high confidence alerts, so analysts can quickly find true anomalies, judge them for maliciousness, easily trace the comprehensive impact of an exploit,  and act accordingly. By looking beyond just numbers, thatDot filters out the noise, and finds anomalies in the richer set of categorical data including values like usernames, hostnames, file paths, URLs, process names and more. These benefits mean that a security analyst will focus on high-value, high likelihood, alerts, leading to a 10x increase in productivity. Good security analysts are hard to find, and their limited time and the burnout common in the industry make it crucial to use their time wisely.

If you’d like to see this in action, or learn more from the team that is pioneering categorical data analysis at thatDot.