Categorical data-driven apps made easy with Meroxa + thatDot

By Rob Malnati, thatDot

Using Categorical Data for Real-time Fraud Detection

Most fraud detection is based on numeric data. Why? Because it’s easier. Categorical data is hard to analyze, and analyzing it in real time has been virtually impossible. Yet behavioral and profile data can provide the information needed to detect an anomaly. And we’re not talking about just scoring the categorical data to make the models easier. With Meroxa Turbine and thatDot Novelty Detector, accessing and analyzing categorical data just got a lot easier.

Turbine is Meroxa’s real-time data application framework, which makes it easy to turn your data pipelines into data applications. The vision for the Meroxa Data Platform and Turbine is to empower software engineers to build and deploy data apps: data processing applications that manipulate, enrich, and analyze data to solve problems and derive value for the business.

An appealing aspect of the Turbine framework is that it enables the use of highly specialized tools such as thatDot’s Novelty Detector product. Novelty Detector is a real-time anomaly detection tool that uses categorical data to help you find anomalies in your data that you may not have otherwise been able to find while greatly reducing false positives.

Together, these two tools can help you build a data infrastructure that handles large volumes of data and quickly identifies anomalies. This can be a valuable addition to any software stack, as it can help you and your customers avoid costly mistakes and find and fix problems sooner.

In this blog we’ll outline a simple Turbine Data App that leverages Novelty Detector to highlight novel, noteworthy or otherwise interesting user activities in real-time.

Diagram: the Meroxa data pipeline streaming data through thatDot’s Novelty Detector. Data flows through Turbine, is passed to thatDot for processing, and returns to the Meroxa data pipeline.


  1. Sign up for a Meroxa account and install the latest Meroxa CLI.
  2. Set up your Novelty Detector environment and obtain credentials.
  3. Clone the example to your local machine: `git clone`

Since this example uses Go, you will need to have Go installed.

How the Turbine + Novelty Detector app works

The `novelty` Turbine app takes user activity data (e.g. user A carried out action B at time T) from a PostgreSQL database and streams it in real time to the Novelty Detector server. The Novelty Detector server scores each “observation” for novelty and adds anomaly metadata, and the result is then written back to the PostgreSQL database.
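
The flow described above can be sketched with only the Go standard library. This is a simplified illustration, not the Turbine or Novelty Detector API: `Activity`, `Scorer`, and `enrich` are hypothetical names, and the `Scorer` function stands in for the network call to the Novelty Detector server.

```go
package main

// Activity is a hypothetical row of user activity data.
type Activity struct {
	User   string
	Action string
	Score  float64 // filled in after scoring
}

// Scorer stands in for the call to the Novelty Detector server.
// It is an assumption made for this sketch, not a real client type.
type Scorer func(observation []string) (float64, error)

// enrich scores each activity and returns copies that carry the score.
// Records that cannot be scored are skipped.
func enrich(records []Activity, score Scorer) []Activity {
	out := make([]Activity, 0, len(records))
	for _, r := range records {
		s, err := score([]string{r.Action, r.User})
		if err != nil {
			continue
		}
		r.Score = s // r is a copy, so the input slice is left untouched
		out = append(out, r)
	}
	return out
}
```

In the real app, the enriched records would be written to the destination database rather than returned in a slice.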

Here’s an example Novelty Detector response payload:

  {
    "observation": [...],
    "score": 0.36231689108923804,
    "totalObsScore": 0.36231689108923804,
    "sequence": 3,
    "probability": 0.6666666666666666,
    "uniqueness": 0.9943363088569088,
    "infoContent": 0.5849625007211563,
    "mostNovelComponent": {
      "index": 2,
      "value": "observation",
      "novelty": 0.5849625007211563
    }
  }

A full explanation of each field of the payload can be found in the Novelty Detector Usage Guide, but a few of the more interesting payload elements are worth noting:

  • `observation` – the observation originally passed into Novelty Detector, included for reference.
  • `score` – the overall measure of how novel the particular observation is. The value is always between 0 and 1, where zero is entirely normal and not anomalous, and one is highly novel and clearly anomalous.
  • `mostNovelComponent` – an object consisting of `index`, `value`, and `novelty`. The `index` and `value` identify the most novel component of the observation, and `novelty` indicates how novel that component is.
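
As a minimal sketch, the fields above could be decoded in Go like this. The JSON keys come from the example payload; the type and function names (`Result`, `Component`, `parseResult`, `isAnomalous`) are hypothetical, treating `observation` as a list of strings is an assumption, and the threshold is something you would choose yourself.

```go
package main

import "encoding/json"

// Component mirrors the "mostNovelComponent" object in the payload.
type Component struct {
	Index   int     `json:"index"`
	Value   string  `json:"value"`
	Novelty float64 `json:"novelty"`
}

// Result holds the payload fields discussed above. Decoding "observation"
// as []string is an assumption made for this sketch.
type Result struct {
	Observation        []string  `json:"observation"`
	Score              float64   `json:"score"`
	MostNovelComponent Component `json:"mostNovelComponent"`
}

// parseResult decodes a single response payload.
func parseResult(body []byte) (Result, error) {
	var r Result
	err := json.Unmarshal(body, &r)
	return r, err
}

// isAnomalous reports whether the score meets a caller-chosen threshold.
// The threshold is hypothetical; pick one that suits your data.
func isAnomalous(r Result, threshold float64) bool {
	return r.Score >= threshold
}
```

Fields not listed in the struct, such as `sequence` or `uniqueness`, are simply ignored by `json.Unmarshal`.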

A key aspect of Novelty Detector, and one of the reasons it pairs so well with Turbine, is its simplicity of operation: once you have connected Turbine to Novelty Detector, it starts scoring observations without requiring any other configuration or setup.

Code for the Turbine `novelty` app

The core of the Data App looks much like any typical Turbine app, but there are a couple of sections worth digging into.

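// formatObservation turns a record into the ordered list of categorical
// values sent to Novelty Detector. It returns nil if the timestamp cannot
// be bucketed.
func formatObservation(r turbine.Record) []string {
	// Unchecked type assertions: these panic if a field is missing or has
	// an unexpected type.
	country := r.Payload.Get("country").(string)
	city := r.Payload.Get("city").(string)
	email := r.Payload.Get("email").(string)
	userID := r.Payload.Get("user_id").(float64)
	tsFloat := r.Payload.Get("timestamp").(float64)

	// Bucket the timestamp into a coarse time-of-day category.
	tod, err := timeOfDay(fmt.Sprint(int(tsFloat)))
	if err != nil {
		log.Printf("error in formatObservation: %s", err.Error())
		return nil
	}
	log.Printf("tod: %+v", tod)

	obs := []string{tod, country, city, email, fmt.Sprint(userID)}

	log.Printf("obs: %+v", obs)

	return obs
}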

Here we’re formatting the observation as an array of categorical data, starting with the value with the lowest cardinality (or the most significant).
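
If you are unsure which field has the lowest cardinality in your own data, one option is to count distinct values over a sample of records. The helper below is a sketch of that idea, not part of the example app; `fieldsByCardinality` is a hypothetical name, and it assumes each sample row is a map from field name to string value.

```go
package main

import "sort"

// fieldsByCardinality returns field names ordered from the fewest to the
// most distinct values seen across the sample rows. Ties are broken by
// field name so the result is deterministic.
func fieldsByCardinality(rows []map[string]string) []string {
	distinct := map[string]map[string]struct{}{}
	for _, row := range rows {
		for field, value := range row {
			if distinct[field] == nil {
				distinct[field] = map[string]struct{}{}
			}
			distinct[field][value] = struct{}{}
		}
	}

	fields := make([]string, 0, len(distinct))
	for field := range distinct {
		fields = append(fields, field)
	}
	sort.Slice(fields, func(i, j int) bool {
		ci, cj := len(distinct[fields[i]]), len(distinct[fields[j]])
		if ci != cj {
			return ci < cj
		}
		return fields[i] < fields[j]
	})
	return fields
}
```

The resulting order is only a starting point: the text above also allows ordering by significance, which a distinct count alone cannot tell you.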

A particularly interesting optimization is the bucketing of time data in the form of the `timeOfDay` function.

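// timeOfDay maps a Unix timestamp (seconds, as a decimal string) to a
// coarse bucket: "morning", "afternoon", "evening", or "night".
func timeOfDay(t string) (string, error) {
	intTime, err := strconv.ParseInt(t, 10, 64)
	if err != nil {
		return "", err
	}

	// time.Unix returns a Time in the local time zone, so the buckets
	// follow the local clock of the machine running the app.
	ts := time.Unix(intTime, 0)

	// Hours (24-hour clock) at which each later bucket begins.
	splitAfternoon := 12
	splitEvening := 17
	splitNight := 21

	if ts.Hour() < splitAfternoon {
		return "morning", nil
	}

	if ts.Hour() >= splitAfternoon && ts.Hour() < splitEvening {
		return "afternoon", nil
	}

	if ts.Hour() >= splitEvening && ts.Hour() < splitNight {
		return "evening", nil
	}

	return "night", nil
}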

The function takes a Unix timestamp value and maps it to morning, afternoon, evening, or night, based on the hour in the local time zone of the machine running the app.
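
Because the bucket depends on the time zone used to read the hour, you may want to make that zone explicit so the same timestamp always lands in the same bucket. The variant below is a sketch under that assumption; `timeOfDayIn` and `timeOfDayUTC` are hypothetical names that do not appear in the example app, and the split hours are copied from `timeOfDay`.

```go
package main

import "time"

// timeOfDayIn buckets a Unix timestamp (in seconds) by the hour of day
// in the given location, using the same split hours as timeOfDay.
func timeOfDayIn(unixSeconds int64, loc *time.Location) string {
	hour := time.Unix(unixSeconds, 0).In(loc).Hour()
	switch {
	case hour < 12:
		return "morning"
	case hour < 17:
		return "afternoon"
	case hour < 21:
		return "evening"
	default:
		return "night"
	}
}

// timeOfDayUTC is a convenience wrapper that always buckets in UTC.
func timeOfDayUTC(unixSeconds int64) string {
	return timeOfDayIn(unixSeconds, time.UTC)
}
```

Whether UTC or each user's own time zone is the better choice depends on what "unusual time of day" should mean for your data.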

Try Turbine + Novelty Detector Yourself or Learn More

You can find the full example for this data app on GitHub. We can’t wait to see what you build 🚀