Quine adds Real-time ETL for Kafka-based Event Streams
Kafka is the tool of choice for data engineers building streaming data pipelines. Adding Quine to a Kafka-centric pipeline is a natural way to introduce streaming analytics to the mix: putting business logic directly into the event pipeline lets you surface high-value insights in real time. Quine also lets you process categorical data, which makes up the vast majority of the data your business generates yet is often overlooked or discarded.
Simple Streaming Pipeline for ETL
Consider this straightforward, minimum viable streaming pipeline.
In this simple pipeline, Vector produces events (`dummy_log` lines) once per second and streams them into a Kafka topic (`demo-logs`), where a Quine ingest stream transforms the log events into a streaming graph.
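The Quine end of the pipeline is just an ingest stream pointed at that topic. As a preview, here is a minimal sketch of registering one through Quine's REST API; the stream name `kafka-demo` and the Cypher query (which simply materializes each event as a node keyed with `idFrom($that)`) are illustrative assumptions, not the final configuration:

```bash
# Register a Kafka ingest stream named "kafka-demo" with a running Quine instance.
# Assumes Quine's API on localhost:8080 and the Kafka broker on localhost:9092.
curl -X POST http://localhost:8080/api/v1/ingest/kafka-demo \
  -H "Content-Type: application/json" \
  -d '{
    "type": "KafkaIngest",
    "topics": ["demo-logs"],
    "bootstrapServers": "localhost:9092",
    "format": {
      "type": "CypherJson",
      "query": "MATCH (event) WHERE id(event) = idFrom($that) SET event = $that"
    }
  }'
```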
Setting up Vector
Start by installing Vector in your environment. My examples use macOS and may need slight modifications elsewhere. I installed Vector with `brew install vector`, which includes a sample `vector.toml` config in `/opt/homebrew/etc/vector`. I extended that sample config to build our pipeline.
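The extended portion of the config looks roughly like this; a minimal sketch, assuming Kafka on `localhost:9092` (the source and sink names are my own):

```toml
# Source: emit one synthetic JSON log line per second.
[sources.dummy_logs]
type = "demo_logs"
format = "json"
interval = 1.0

# Sink: publish each event to the demo-logs Kafka topic.
[sinks.demo_logs_kafka]
type = "kafka"
inputs = ["dummy_logs"]
bootstrap_servers = "localhost:9092"
topic = "demo-logs"
encoding.codec = "json"
```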
Run Vector to get a feel for the events it emits.
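To inspect events on stdout before Kafka is even running, you can temporarily wire the source into a console sink, a small addition along these lines (the sink name is my own):

```toml
# Temporary sink: print each event to stdout so you can inspect it.
[sinks.console_preview]
type = "console"
inputs = ["dummy_logs"]
encoding.codec = "json"
```

Then start Vector with `vector --config /opt/homebrew/etc/vector/vector.toml` and watch the JSON events print once per second.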
Local Kafka Instance to use with Quine
Kafka is the next step in the pipeline. I set up a single-node Kafka cluster in Docker. There are plenty of examples on the internet showing how to run Kafka in Docker; pick an approach that fits your environment. My cluster uses a docker-compose file that launches Zookeeper and Kafka containers from version 7.1.1 of the Confluent Platform images.
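Here is a minimal single-node sketch using the Confluent `cp-zookeeper` and `cp-kafka` images; the dual-listener setup is the common quickstart pattern for reaching the broker both from other containers and from the host, so adjust it for your network:

```yaml
version: "3.8"
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.1.1
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000

  kafka:
    image: confluentinc/cp-kafka:7.1.1
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      # Internal listener for containers, host listener for localhost clients.
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      # Single broker, so internal topics need a replication factor of 1.
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
```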
Start the Kafka cluster and create a topic called `demo-logs`.
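With the compose file in place, something like the following brings the cluster up and creates the topic (the `kafka-topics` CLI ships inside the `cp-kafka` image):

```bash
# Bring up the single-node cluster in the background.
docker compose up -d

# Create the demo-logs topic on the broker.
docker compose exec kafka kafka-topics --create \
  --topic demo-logs \
  --bootstrap-server localhost:9092 \
  --partitions 1 --replication-factor 1
```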