thatDot Anomaly Detector Features
thatDot Real-time Anomaly Detector for Categorical Data is a general-purpose application, deployable in public or private data centers, that finds anomalies and insights in streaming or batch data, at scale and in real time.
Ingest & Transform
Typical anomaly detection is a heavily mathematical process that requires exclusively numeric data. Categorical values are usually ignored or “one-hot” encoded; however, such encoding significantly degrades processing performance due to the compounding impact of dimensionality. Natively using categorical data enables the development of a more complete data context and produces superior insight.
Streaming Or API Ingest
Ingest data as individual or bulk observations, as files, or as streams from platforms such as Kafka or Kinesis. Bulk score responses are returned in a single payload, with the responses in the same order as the submitted data.
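As a sketch of what a bulk submission might look like (the payload shape and values below are illustrative assumptions, not the documented API; see the interactive docs on a running instance for the actual method signatures), each observation can be represented as an ordered list of categorical values:

```python
import json

# Hypothetical payload sketch: each observation is an ordered list of
# categorical values, and a bulk submission is a JSON array of observations.
observations = [
    ["10.0.0.1", "GET", "/login", "200"],
    ["10.0.0.2", "POST", "/admin", "403"],
]

payload = json.dumps(observations)

# Bulk score responses come back in a single payload, in the same order as
# the submitted data, so response[i] always corresponds to observations[i].
```

A real client would POST this payload to the instance's REST endpoint; the ordering guarantee means results can be matched back to inputs by position alone.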
Transform Data On Ingest
Data ingestion rates are automatically managed to ensure system operation during volume spikes. Use of Kafka or Kinesis stream queuing is recommended to buffer event streams.
Built on our patent-pending graph interpreter, the system handles massive cardinality, accommodating highly dimensional data without impacting real-time response performance.
Parallel Operations For Multiple Streams
Ingested observations can be modeled and scored for multiple contexts simultaneously, answering multiple questions about the data efficiently.
thatDot incorporates all data observations to learn a map of behavior from the data, providing a deep understanding of how systems, files, and/or users interact. This behavioral fingerprint of the data enables a nuanced determination of what is anomalous while also explaining “why” an observation is seen as novel or not.
No Data Labeling
No pre-training with labeled data is required; the system learns dynamically from the data as it is observed, and new data values are incorporated automatically on the fly.
New Is Not Always Novel
Incorporating both metrics and categorical data, thatDot Anomaly Detector uses the structure created from all observations to build a deep understanding of data context. This rich context allows the system to determine whether an observation is anomalous or merely unique. Sometimes unique data values are normal behavior, and knowing this allows Anomaly Detector to differentiate between anomalies and unique data values, eliminating false positives.
Every Result Includes An Explanation
Each observation processed returns a payload of scores that provide insights about the data. thatDot’s proprietary Novelty Score indicates the anomalous nature of any single observation, and Most Novel Component indicates what aspect of the observation contributed most to the Novelty Score. Most Novel Component is a powerful tool for investigations, showing analysts where to look in the data for cause and effect.
Anomaly Detector is not a “black box” solution. Define your custom use of Anomaly Detector to leverage your expert understanding of your data to answer questions. What data you use and how it is ordered dictates any bias you wish to introduce and the insights produced.
An API response is generated for each observation submitted to the API. The response includes the submitted observation data and a series of scores generated by Anomaly Detector. For more complete details, see the usage guide under the heading “Step 3: Interpret the Results”.
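A minimal sketch of consuming such a response follows; the key names here are illustrative assumptions, not the documented field names:

```python
# Hypothetical response shape: the submitted observation echoed back,
# plus the scores generated by Anomaly Detector.
response = {
    "observation": ["10.0.0.1", "GET", "/login", "200"],
    "score": 0.994,            # Novelty Score, between 0 and 1
    "mostNovelComponent": 2,   # zero-based index into the observation list
}

# Most Novel Component points at the component that contributed most to
# the Novelty Score -- here, the third value in the observation.
most_novel_value = response["observation"][response["mostNovelComponent"]]
# most_novel_value == "/login"
```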
Bulk Batch Results
When a bulk batch file is submitted to the API, a single response file is returned. The response includes a record for each submitted observation, containing both the submitted observation and its API response. Response records are in the same order as the originally submitted observation data.
Response Value – Novelty Score
The score is the total calculation of how novel the particular observation is. The value is always between 0 and 1, where zero is entirely normal and not anomalous, and one is highly novel and clearly anomalous. The score is the result of a complex analysis of the observation and other contextual data. In contrast to the next field, this score is weighted primarily by the novelty of individual components of the observation. Depending on the dataset and corresponding observation structure (see Step 2), real-world datasets will often see this score weighted with exponentially fewer results at higher scores. Practically, this often means that 0.99 is a reasonable threshold for finding only the most anomalous results, and 0.999 is likely to return half as many results. But to reiterate, the actual values and results will depend on the data and observation structure.
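The thresholding described above can be sketched as a simple filter; the 0.99 starting point comes from the text, and the right value should be tuned per dataset:

```python
def flag_anomalies(scores, threshold=0.99):
    """Return (index, score) pairs for observations whose Novelty Score
    meets the threshold. Because results thin out exponentially at higher
    scores, raising the threshold sharply narrows the result set."""
    return [(i, s) for i, s in enumerate(scores) if s >= threshold]

scores = [0.12, 0.995, 0.42, 0.9991, 0.87]
flag_anomalies(scores)         # → [(1, 0.995), (3, 0.9991)]
flag_anomalies(scores, 0.999)  # → [(3, 0.9991)]
```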
Response Value – Most Novel Component
Which component in the list from the observation field was the most novel. This value is the zero-based index into that list.
Response Value – Info Content
The “Information Content”, “Shannon Information”, or “self-information” contained in this entire observation, given all prior observations. This value is measured in bits, and is an answer to the question: On average, how many “yes/no” questions would I need to ask to identify this observation, given this and all previous observations made to the system.
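As a reference point, self-information is the standard information-theoretic quantity I(x) = −log₂ p(x); a two-line sketch of the definition (not the system's internal computation):

```python
import math

def self_information_bits(p):
    """Self-information of an event with probability p, in bits: the
    average number of yes/no questions needed to identify the event."""
    return -math.log2(p)

self_information_bits(0.5)       # 1.0 bit  (one fair coin flip)
self_information_bits(1 / 1024)  # 10.0 bits
```

Rarer observations therefore carry higher Info Content: an observation seen half the time contributes one bit, while a one-in-1024 observation contributes ten.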
Response Value – Uniqueness
A value between 0 and 1 which indicates how unique this entire observation is, given all previously observed data. A value of 1 means that this observation has never been seen before (in its entirety). Values approaching 0 indicate that this observation is incredibly common.
Response Value – Probability
This field represents the probability of seeing this entire observation (exactly) given all previous data when the observation was made.
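Since Info Content is measured in bits, Probability and Info Content are, by the standard definition of self-information, two views of the same quantity; a hedged sanity check one could apply (an illustration of that identity, not a documented guarantee of the API):

```python
def probability_from_info_bits(info_bits):
    """Invert self-information: p = 2**(-I). Useful as a rough
    consistency check between the Probability and Info Content fields."""
    return 2.0 ** (-info_bits)

probability_from_info_bits(10)  # 0.0009765625, i.e. 1/1024
```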
Runs In The Cloud
Anomaly Detector is available as a .jar or container. If you subscribe through the AWS Marketplace, you can instantiate it in minutes using the CloudFormation template. See the Quick Start page for details.
Deploy on-premises or in your cloud VPC; the Anomaly Detector application is available as a .jar or container. On-premises deployment ensures the privacy of your data and has none of the security exposures associated with using a SaaS service.
Every instance includes interactive documentation at port :8080/docs. The interactive docs let you view sample code for each method and submit sample calls to see actual response payloads.
View distribution plots of API responses for visual insight into score distribution and rapid identification of your most anomalous observations. Plots combine Sequence, Novelty Scores, Uniqueness Scores, and score distribution, and display different ranges of observations, including long-term history, recent observations, and high-scoring events.
Live Exploration Detail
Discover “why” an observation is scored as novel, exposing key contextual understanding for root-cause analysis. Clicking on any data element allows you to expand the tree to see the range of values observed in the data set. Observation details show the relational context of each data element in the observation and a count of the number of times that value has been observed in the context of the preceding data element.
An operational metrics dashboard monitors critical aspects of the system to provide insight into system performance. Log detail is configurable for external system monitoring.