Novelty Detector Features

thatDot Real-time Novelty Detector for Categorical Data is a general-purpose application, deployable in public or private data centers, that finds anomalies and insights in streaming or batch data, at scale, in real time.

Ingest & Transform

Categorical Data
Streaming Or API Ingest
Transform Data On Ingest
Back Pressured
High-Volume
High-Cardinality
Parallel Operations For Multiple Streams

Analytics

Behavioral Fingerprinting
No Data Labeling
"New” Is Not Always “Novel”
Every Result Includes An Explanation
Expert Customization
Model Integrity Management

Real-time Scoring

Streaming Results
Bulk Batch Results
Immediate Answers For:
Novelty Score
Information Content
Uniqueness
Probability
Most Novel Component…

Easy Management

Runs In The Cloud
Runs On-Premise
Interactive Docs
Data Visualization
Live Exploration Detail
Operational Metrics

Ingest and Transform

Categorical Data

Typical anomaly detection is a heavily mathematical process that requires exclusively numeric data. Categorical values are typically ignored or “one-hot” encoded; however, such encoding significantly degrades processing performance due to the compounding impact of added dimensionality. Using categorical data natively builds a more complete data context and produces superior insight.

Streaming Or API Ingest

Ingest data as individual or bulk observations, as files, or as streams from platforms such as Kafka or Kinesis. Bulk score responses are returned in a single payload, with responses in the same order as the submitted data.
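
For illustration, a minimal sketch of submitting one observation over HTTP; the endpoint path, context name, and payload shape are assumptions for this example, not the documented API (see the interactive docs at :8080/docs):

    // Hypothetical single-observation submission (Node 18+ ES module).
    // The endpoint path and context name ("ctx") are illustrative only.
    const observation = ["10.0.0.5", "alice", "login", "success"];

    const response = await fetch("http://localhost:8080/api/v1/novelty/ctx/observe", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(observation),
    });
    console.log(await response.json()); // the scores described under "Real-time Scoring"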

Transform Data On Ingest

Define data transformations as JavaScript to select specific data from your ingested logs, decompose strings into separate data elements, build new strings from separate data elements, or re-order data elements to answer different questions.
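
As a sketch of such a transform (the input shape and field names are assumptions for this example), selecting three fields from a JSON log line and re-ordering them:

    // Hypothetical ingest transform written in JavaScript. The input
    // shape and field names (user, srcIp, action) are assumptions.
    function transform(record) {
      const log = JSON.parse(record);
      // Re-order so the question becomes: "is this action, from this
      // address, novel for this user?"
      return [log.user, log.srcIp, log.action];
    }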

Back Pressured

Data ingestion rates are automatically managed to ensure system operation during volume spikes. Use of Kafka or Kinesis stream queuing is recommended to buffer event streams.

High Volume

Plan for 10,000-15,000 observations per second on a typical VM with 8 cores and 16 GB of RAM.

See the Configuration Guide for more complete details.
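
As a back-of-the-envelope check using the midpoint of the range above:

    // Rough daily capacity at 12,500 observations/second (midpoint of
    // the 10,000-15,000 range quoted above).
    const perSecond = 12500;
    const perDay = perSecond * 86400; // seconds in a day
    console.log(perDay.toLocaleString()); // "1,080,000,000" observations/day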

High Cardinality

Built on our patent-pending graph interpreter, the system handles massive cardinality, accommodating highly dimensional data without impacting real-time response performance.

Parallel Operations For Multiple Streams

Ingested observations can be modeled and scored in multiple contexts simultaneously, answering multiple questions about the data efficiently.
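
For example (endpoint and context names are hypothetical), the same event could be scored in two contexts that order its fields differently:

    // Hypothetical: score one event in two contexts asking different questions.
    const event = { user: "alice", host: "web-01", action: "login" };
    const contexts = {
      userBehavior: [event.user, event.action, event.host], // normal for alice?
      hostBehavior: [event.host, event.action, event.user], // normal on web-01?
    };

    await Promise.all(
      Object.entries(contexts).map(([name, observation]) =>
        fetch(`http://localhost:8080/api/v1/novelty/${name}/observe`, {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify(observation),
        })
      )
    );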

Analytics

Behavioral Fingerprinting

thatDot incorporates all data observations to learn a map of behavior from the data, providing a deep understanding of how systems, files, and users interact. This behavioral fingerprint enables a nuanced determination of what is anomalous in the data while also explaining why an observation is or is not seen as novel.

No Data Labeling

No pre-training with labeled data is required: the system learns dynamically from the data as it is observed, and new data values are incorporated automatically on the fly.

New Is Not Always Novel

Incorporating both metrics and categorical data, thatDot Anomaly Detector uses the structure created from all observations to gain a deep understanding of data context. This rich context allows the system to determine whether an observation is anomalous or merely unique. Sometimes unique data values are normal behavior; knowing this allows Anomaly Detector to differentiate between anomalies and unique data values, eliminating false positives.

Every Result Includes An Explanation

Each observation processed returns a payload of scores that provide insights about the data. thatDot’s proprietary Novelty Score indicates how anomalous any single observation is, and Most Novel Component indicates which aspect of the observation contributed most to the Novelty Score. Most Novel Component is a powerful tool for investigations, showing analysts where to look in the data for cause and effect.

Expert Customization

Anomaly Detector is not a “black box” solution. Customize your use of Anomaly Detector to leverage your expert understanding of your data: which data you use and how it is ordered determine the bias you introduce and the insights produced.

Model Integrity Management

Anomaly Detector offers three API-driven modes of model management to limit or remove the influence of highly novel observations. Users may (1) remove individual observations or groups of observations, (2) define a rolling window of observations, or (3) delete entire contexts from the system.
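
A sketch of what the three modes could look like as API calls; the paths and parameters here are hypothetical placeholders (the real endpoints are in the interactive docs at :8080/docs):

    // Hypothetical sketches of the three model-management modes.
    const base = "http://localhost:8080/api/v1/novelty/ctx";
    const headers = { "Content-Type": "application/json" };

    // 1. Remove the influence of specific observations.
    await fetch(`${base}/remove`, {
      method: "POST", headers,
      body: JSON.stringify([["10.0.0.5", "alice", "login", "failure"]]),
    });

    // 2. Keep only a rolling window of recent observations.
    await fetch(`${base}/window`, {
      method: "PUT", headers,
      body: JSON.stringify({ size: 100000 }),
    });

    // 3. Delete the entire context from the system.
    await fetch(base, { method: "DELETE" });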

Real-time Scoring

Streaming Results

An API response is generated for each observation submitted to the API. The response includes the submitted observation data and a series of scores generated by Anomaly Detector. For more complete details, see the usage guide under the heading “Step 3: Interpret the Results”.
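
As a sketch of consuming a streaming result, with field names assumed from the response values described below:

    // Hypothetical handler for one scoring response; the field names
    // mirror the "Response Value" sections below but are assumptions.
    function handleResult(result) {
      const { observation, score, mostNovelComponent } = result;
      if (score >= 0.99) {
        console.warn(`Novel observation (score ${score}):`, observation,
                     `most novel component: ${observation[mostNovelComponent]}`);
      }
    }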

Bulk Batch Results

When a bulk batch file is submitted to the API, a single response file is returned. The response contains one record per submitted observation, pairing each observation with its API response. Response records are in the same order as the originally submitted observation data.
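
Because responses preserve submission order, each result can be matched back to its input by position. A minimal sketch:

    // Pair each submitted observation with its response by position,
    // relying on the order-preserving guarantee described above.
    function pairResults(submitted, responses) {
      return submitted.map((obs, i) => ({ observation: obs, result: responses[i] }));
    }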

Response Value – Novelty Score

This score is the overall measure of how novel a particular observation is. The value is always between 0 and 1, where zero is entirely normal and not anomalous, and one is highly novel and clearly anomalous. The score is the result of a complex analysis of the observation and other contextual data. Unlike the information-theoretic fields below, this score is weighted primarily by the novelty of individual components of the observation. Depending on the dataset and corresponding observation structure (see “Step 2” in the usage guide), real-world datasets will often see this score distributed with exponentially fewer results at higher scores. Practically, this often means that 0.99 is a reasonable threshold for finding only the most anomalous results, and 0.999 is likely to return half as many results. But to reiterate, the actual values and results will depend on the data and observation structure.
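
In code, the rule-of-thumb threshold above is a simple filter (the score field name is an assumption):

    // Keep only the most anomalous results, using the 0.99 rule of
    // thumb described above; "score" is an assumed field name.
    function mostAnomalous(results, threshold = 0.99) {
      return results.filter((r) => r.score >= threshold);
    }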

Response Value – Most Novel Component

Indicates which component in the list from the observation field was the most novel. The value is a zero-based index into that list.
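
Because the value is a zero-based index, looking up the most novel component is a direct array access:

    // mostNovelComponent indexes into the submitted observation (zero-based).
    const observation = ["10.0.0.5", "alice", "login", "failure"];
    const mostNovelComponent = 3; // as returned in the response
    console.log(observation[mostNovelComponent]); // "failure"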

Response Value – Info Content

The “Information Content”, “Shannon Information”, or “self-information” contained in this entire observation, given all prior observations. This value is measured in bits and answers the question: on average, how many yes/no questions would I need to ask to identify this observation, given this and all previous observations made to the system?
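
Self-information is a standard information-theoretic quantity: for an observation with probability p, it is -log2(p) bits. A quick illustration:

    // Shannon self-information in bits: I(x) = -log2(P(x)).
    const selfInformationBits = (p) => -Math.log2(p);
    console.log(selfInformationBits(0.5));   // 1 bit: one yes/no question
    console.log(selfInformationBits(1 / 8)); // 3 bits: three yes/no questions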

Response Value – Uniqueness

A value between 0 and 1 that indicates how unique this entire observation is, given all previously observed data. A value of 1 means that this observation has never been seen before (in its entirety). Values approaching 0 indicate that this observation is extremely common.

Response Value – Probability

This field represents the probability of seeing this entire observation (exactly), given all data previously observed when the observation was made.
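
If Probability and Info Content describe the same observation, the standard identity p = 2^(-bits) should relate them; this is an inference from the definitions above, not a documented guarantee:

    // Inferred consistency between the two fields (not a documented
    // guarantee): probability is two raised to the negative info content.
    const probabilityFromInfoContent = (bits) => 2 ** -bits;
    console.log(probabilityFromInfoContent(3)); // 0.125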

Easy Management

Runs In The Cloud

Novelty Detector is available as a .jar or a container. If you subscribe through the AWS Marketplace, you can instantiate it in minutes using the CloudFormation template. See the Quick Start page for details.

Runs On-Premise

Deploy on-premise or in your cloud VPC; the Anomaly Detector application is available as a .jar or a container. On-premise deployment ensures the privacy of your data and has none of the security exposures associated with using a SaaS service.

Interactive Docs

Every instance includes interactive documentation at port :8080/docs. The interactive docs let you view sample code for each API method and submit sample calls to see actual response payloads.

Data Visualization

View distribution plots of API responses for visual insight into score distribution and rapid identification of your most anomalous observations. Plots combine sequence, Novelty Score, Uniqueness Score, and score-distribution views, and display different ranges of observations, including long-term history, recent observations, and high-scoring events.

Live Exploration Detail

Discover why an observation is scored as novel, exposing key context for root-cause analysis. Clicking on any data element expands the tree to show the range of values observed in the data set. Observation details show the relational context of each data element in the observation and a count of the number of times that value has been observed in the context of the data element preceding it.

Operational Metrics

An operational metrics dashboard monitors critical aspects of the system to provide insight into system performance. Log detail is configurable for external system monitoring.