The Future of Modern Threat Hunting is Streaming Graph

Rob Malnati

Towards a new model of threat hunting

The continuous expansion of threat vectors and attack techniques requires a modern threat hunting architecture capable of large-scale operations, real-time deep/complex event processing to identify Indicators of Behavior (IoBs), and programmable automation to make the best use of scarce SOC expertise. Central to the evolution from after-the-fact Indicators of Compromise (IoCs) to IoBs is the need to embrace an event-driven architecture.

Many industry initiatives aim to codify the intersection points between data sources, analysis systems, and remediation solutions. These efforts are centered around two characteristics that align with thatDot software in significant ways.

A focus on behavior analysis – The evolution from Indicators of Compromise (IoCs) to Indicators of Behavior (IoBs) has been driven by the desire to move from seeking static artifacts of a completed attack (a file hash or an IP address) to understanding how an attack happens. This change in perspective creates the opportunity to find attacks earlier, and with more flexibility.

Use of graph data modeling – Representing behavior and relationships is a natural fit for graph data modeling techniques. Graph data structures are terrific at expressing the relationships between entities, which simplifies analysis and infrastructure, so much so that STIX Indicators and the Kestrel language assume the use of graph systems for their operation.

An image showing an example of a STIX 2 graph that indicates the relationships between vulnerabilities, threat actors, and indicators relative to a campaign.

Image source: available here.

New Standards Reduce Friction

The cybersecurity industry is active on many fronts, defining standards to smooth the friction that exists between data sources, analysis engines, SIEMs, and automated response systems. These standards include:

STIX™ Indicators – Indicators convey specific observable patterns combined with contextual information intended to represent artifacts and/or behaviors of interest within a cyber security context. [Read more here.]

Kestrel – Kestrel threat hunting language provides an abstraction for threat hunters to focus on the high-value and composable threat hypothesis development instead of specific realization of hypothesis testing with heterogeneous data sources, threat intelligence, and public or proprietary analytics. [Read more here.]

CACAO – defines the schema and taxonomy for collaborative automated course of action operations (CACAO) security playbooks and how these playbooks can be created, documented, and shared in a structured and standardized way across organizational boundaries and technological solutions. [Read more here.]
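To make the Indicator definition more concrete, here is a minimal sketch of what a behavior-oriented STIX 2.1 Indicator might look like when built in Python. The property names follow the STIX 2.1 Indicator object, but the name, timestamps, address, and the pattern itself are invented for illustration and do not describe a real threat.

```python
import json
import uuid

# Hypothetical values throughout: the name, address, and timestamps are
# made up for this example.
indicator = {
    "type": "indicator",
    "spec_version": "2.1",
    "id": f"indicator--{uuid.uuid4()}",
    "created": "2023-01-01T00:00:00.000Z",
    "modified": "2023-01-01T00:00:00.000Z",
    "name": "Encoded PowerShell followed by outbound connection",
    "pattern_type": "stix",
    # Two observations, in order, within five minutes of each other.
    # This is what makes the indicator a behavior rather than a single artifact.
    "pattern": (
        "([process:command_line MATCHES '.*powershell.*-enc.*'] "
        "FOLLOWEDBY [network-traffic:dst_ref.value = '198.51.100.7']) "
        "WITHIN 300 SECONDS"
    ),
    "valid_from": "2023-01-01T00:00:00Z",
}

print(json.dumps(indicator, indent=2))
```

The `FOLLOWEDBY` operator and `WITHIN` qualifier express ordering and timing between observations, which is the kind of relationship a graph model represents naturally.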

These standards fit well with thatDot’s approach to a modern threat hunting stack, one powered by thatDot’s Quine streaming graph to detect and instantly alert on known patterns and that uses thatDot Novelty Detector to identify new emerging threat behaviors in real time.

Highly Scalable IoB Pattern Recognition

The evolution from a reactive IoC threat hunting model to a real-time IoB-based approach requires a new set of technical capabilities along with the tools to deliver them. Fortunately, the advent of IoB threat hunting, new standards, and ground-breaking streaming graph technology are all emerging to meet the need.

As shown below, thatDot’s open source Quine streaming graph aligns closely with the requirement to ingest multiple data streams, natively process IoBs encoded in a graph data model, and generate events that invoke predefined remediation actions. The workflow looks as follows:

  1. Event sources are ingested from any common event stream queue, including Apache Kafka, AWS Kinesis, AWS SQS, or Apache Pulsar/DataStax Astra Streaming.
  2. STIX-defined IoBs are loaded into Quine using Kestrel graph objects via API, or entered manually, as Quine standing queries.
  3. Quine standing queries continuously analyze newly arriving events for matches against IoB pattern definitions. Partial matches are identified and stored for any desired period of time to accommodate threat behaviors that occur incrementally over longer time frames.
  4. Upon a full IoB pattern match, Quine generates a new event that is associated with a pre-defined CACAO Playbook action, for use by SOAR or analysts.
A flow diagram showing events ingested into Quine, which is using STIX IoB definitions to detect known attack vectors.
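The four steps above can be sketched as a toy pipeline in plain Python. This is not Quine's API or implementation: the `BehaviorPattern` and `Matcher` classes, the event fields, and the playbook identifier are all hypothetical, and a simple list stands in for the event stream. The sketch only illustrates the idea of evaluating each event once, keeping partial matches for a retention window, and emitting an alert tied to a playbook on a full match.

```python
from dataclasses import dataclass, field


@dataclass
class BehaviorPattern:
    """A toy IoB: event types that must all be seen for the same host."""
    name: str
    required: frozenset
    playbook_id: str    # hypothetical reference to a CACAO playbook
    retention_s: float  # how long a partial match is kept


@dataclass
class Matcher:
    pattern: BehaviorPattern
    # host -> {event_type: timestamp}; this is the stored partial-match state
    partial: dict = field(default_factory=dict)

    def on_event(self, event):
        """Process one event; return an alert on a full match, else None."""
        if event["type"] not in self.pattern.required:
            return None
        seen = self.partial.setdefault(event["host"], {})
        seen[event["type"]] = event["ts"]
        # Expire observations that fall outside the retention window.
        cutoff = max(seen.values()) - self.pattern.retention_s
        for etype in [t for t, ts in seen.items() if ts < cutoff]:
            del seen[etype]
        if set(seen) == self.pattern.required:
            del self.partial[event["host"]]
            return {
                "pattern": self.pattern.name,
                "host": event["host"],
                "playbook_id": self.pattern.playbook_id,
            }
        return None


PATTERN = BehaviorPattern(
    name="account-takeover",
    required=frozenset(
        {"failed_login_burst", "new_admin_account", "outbound_transfer"}
    ),
    playbook_id="playbook--example-id",
    retention_s=3600,
)

# Arrival order, not timestamp order: the transfer event shows up before
# the account-creation event that preceded it.
events = [
    {"ts": 0, "host": "web-01", "type": "failed_login_burst"},
    {"ts": 5, "host": "db-02", "type": "new_admin_account"},
    {"ts": 90, "host": "web-01", "type": "outbound_transfer"},
    {"ts": 40, "host": "web-01", "type": "new_admin_account"},
]

matcher = Matcher(PATTERN)
alerts = [a for e in events if (a := matcher.on_event(e)) is not None]
print(alerts)
```

Only `web-01` completes the pattern, so it produces the single alert, while the lone `db-02` event stays in `matcher.partial` waiting for further evidence.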

The Problems Quine Solves

Quine solves some hard problems in this role. Let’s take a look at a few of the major points:

Multiple Event Sources

Modern threat detection requires data – lots of data – usually from multiple sources. This brings with it a number of interesting data engineering challenges, especially when we want to materialize that data into a single view and execute analysis in a timely and cost-effective manner.

Combining threat intelligence, EDR, XDR, and cloud logs is an increasingly common requirement for building a baseline of behavior models against which real-time data is assessed for known and new threats. thatDot’s Quine streaming graph is a new and powerful software tool for resolving many of the data engineering challenges associated with handling volumes of data from multiple sources.

Scale For Costs – Scale graph event processing from 1,000s to 1,000,000s of events per second on commodity cloud VMs, more efficiently than nested joins.

Out-of-Order Data Arrival – Quine standing queries evaluate each event as it arrives and store partial results until the data that completes the match arrives.

Entity Resolution – Graph data models are known for leveraging the additional context gained by understanding the relationships between events, which helps determine when records from different sources refer to the same entity.
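One common way to resolve entities across sources is to derive a node identifier deterministically from the entity's canonical value, so that every source lands on the same node regardless of arrival order. The sketch below assumes that approach; the `node_id` and `upsert` helpers, the normalization rule, and the property names are hypothetical and are not taken from Quine.

```python
import hashlib

graph = {}  # node id -> merged properties


def node_id(kind: str, value: str) -> str:
    """Derive a stable id from a normalized (kind, value) pair."""
    canonical = f"{kind}:{value.strip().lower()}"
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def upsert(kind: str, value: str, **props) -> str:
    """Create the node if needed and merge in the new properties."""
    nid = node_id(kind, value)
    graph.setdefault(nid, {"kind": kind}).update(props)
    return nid


# An EDR feed and a cloud log name the same host with different formatting.
from_edr = upsert("host", "WEB-01.example.com ", edr_agent="installed")
from_cloud = upsert("host", "web-01.example.com", cloud_region="us-east-1")

print(from_edr == from_cloud, graph[from_edr])
```

Because the identifier depends only on the normalized value, neither source needs to look up what the other has already written, and it does not matter which record arrives first.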

Finding Threat Behaviors

IoBs are patterns of behavior expressed as actions taken by users or systems. Identifying the end-to-end pattern of an IoB across events generated by disparate systems aligns well with the Quine graph data model.

Quine evaluates every newly arriving event for a partial or full match against defined IoB patterns. This incremental approach to evaluating data is paired with a highly efficient mechanism for persisting partial matches. The result is a threat detection solution that tracks millions or billions of suspect actions until there is a complete pattern match, at which point an event is generated to serve as an alert or to trigger an automated workflow.

Incremental Evaluation Of Events For IoB Patterns Across Event Sources

Diagram showing multiple streams flowing into Quine via ingest queries. Quine populates the graph and waits for late arriving data, which then triggers a standing query. Results are emitted by Quine when a standing query match is made.

Image source: Quine Streaming Graph White Paper (PDF)

Automated Responses

CACAO provides a graph-based data model. As such, CACAO implementations should implement protections against graph queries that can potentially consume a significant amount of resources and prevent the implementation from functioning in a normal way.

Identifying Novel Behaviors

Of course, the most difficult part of threat hunting is identifying new threat vectors as close as possible to the time they first appear. This is especially difficult since attackers intentionally work to obscure their illicit behavior in large volumes of events. Systems that rely on traditional anomaly detection have largely failed to detect sophisticated attacks without also generating a significant number of false positives, forcing reliance upon manual human evaluation based on intuition and increasingly scarce security expertise.

thatDot Novelty Detector brings a fresh approach to the problem of detecting illicit behavior. Novelty Detector is a new graph AI technique built on the Quine streaming graph. As such, Novelty Detector natively uses categorical data in events, such as IP addresses, file names, file paths, and API call types, to understand the context of user and system actions. This rich context is used to evaluate behaviors via information theory analysis to identify novel behaviors in real time, with a very low incidence of false positives.
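The general idea of scoring categorical observations with information theory can be illustrated with surprisal: the rarer a value is in its context, the more bits of "surprise" it carries. The following is a deliberately simple sketch of that idea and is not Novelty Detector's algorithm; the `SurprisalScorer` class, the smoothing scheme, and the example context and values are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class SurprisalScorer:
    """Score a categorical value by how unexpected it is in its context."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # additive smoothing weight
        self.counts = defaultdict(Counter)  # context -> value counts

    def score(self, context, value) -> float:
        """Surprisal in bits of `value` given `context`; does not learn it."""
        seen = self.counts[context]
        total = sum(seen.values())
        vocab = len(seen) + 1  # one extra slot reserved for unseen values
        p = (seen[value] + self.alpha) / (total + self.alpha * vocab)
        return -math.log2(p)

    def observe(self, context, value) -> float:
        """Score the observation, then add it to the learned counts."""
        s = self.score(context, value)
        self.counts[context][value] += 1
        return s


scorer = SurprisalScorer()
context = ("svc-backup", "process")  # hypothetical (account, field) context

# The service account routinely launches the same process.
for _ in range(50):
    scorer.observe(context, "rsync")

routine = scorer.score(context, "rsync")
unusual = scorer.score(context, "powershell")
print(f"rsync: {routine:.2f} bits, powershell: {unusual:.2f} bits")
```

A value the account has never produced scores far higher than its routine behavior, and because the score is conditioned on context, the same value could be unremarkable for a different account.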

Once a novel behavior is evaluated, it can then be encoded as a new IoB and fed into an operating Quine streaming graph system for immediate use on newly arriving data, or applied to previous data if desired.

A flow diagram showing events ingested into Quine, which is using STIX IoB definitions to detect known attack vectors, then passed into Novelty Detector to find new, unknown vectors, which are fed back into STIX IoBs.

Separately, Quine streaming graph and Novelty Detector software offer unique capabilities for organizations and service providers: real-time processing of categorical data to find known IoB patterns (Quine) and emerging new threat patterns (Novelty Detector).

When combined as a single platform that uses industry standards for IoB definitions and intersystem communications, the result is a comprehensive modern threat hunting and remediation stack.

Quine Enterprise Delivers Scalable Threat Hunting

Quine is available in both open source and enterprise editions. Novelty Detector is available either in the AWS Marketplace or under license as part of thatDot Streaming Graph.

Quine Enterprise offers large organizations and managed security service providers (MSSPs) both the clustered, resilient version of Quine and Novelty Detector. It is meant for production applications where resilience, query performance, and throughput matter. Resilient clustering includes support for hot spares and distribution across multiple availability zones.

We recently shared reproducible tests demonstrating both scale (Quine easily processed one million 4-node graph events/second) and resilience in the face of node failure. You can read about the tests here.

Try It Yourself

If you want to try it on your own, here are some resources to help:

  1. Download Quine – JAR file | Docker Image | GitHub
  2. Check out the Ingest Data into Quine blog series, which covers everything from ingesting from Kafka to ingesting CSV data
  3. Password Spraying Attack Detection – this recipe provides an example of detecting brute force attack patterns in authentication logs

Header image adapted from photo by Lianhao Qu on Unsplash.