<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
    <title>thatDot</title>
    <subtitle>thatDot builds Quine Streaming Graph and Novelty Detector: real-time graph analysis of high-volume event streams.</subtitle>
    <link rel="self" type="application/atom+xml" href="https://www.thatdot.com/atom.xml"/>
    <link rel="alternate" type="text/html" href="https://www.thatdot.com"/>
    <generator uri="https://www.getzola.org/">Zola</generator>
    <updated>2026-05-20T00:00:00+00:00</updated>
    <id>https://www.thatdot.com/atom.xml</id>
    <entry xml:lang="en">
        <title>Quine 2.0 Released!</title>
        <published>2026-05-20T00:00:00+00:00</published>
        <updated>2026-05-20T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/quine-2-0-released/"/>
        <id>https://www.thatdot.com/blog/quine-2-0-released/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/quine-2-0-released/">&lt;p&gt;&lt;img src=&quot;&#x2F;img&#x2F;2026&#x2F;05&#x2F;Screenshot-2026-05-20-at-16.22.18.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;We are thrilled to announce the release of Quine 2.0! With major updates to key enterprise integrations, A.I. capabilities, and UI&#x2F;UX enhancements, Quine continues to push beyond what&#x27;s possible anywhere else for understanding high-volume data in real-time.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-quine-2-0-thesis&quot;&gt;&lt;strong&gt;The Quine 2.0 Thesis&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Your A.I. tools need reliable reasoning to be useful in the enterprise. Hallucinations rob the confidence from an otherwise transformative A.I. initiative. Quine is the reliable reasoning engine which anchors those A.I. agents with an always-up-to-date, perfectly reliable, and completely traceable context graph for enterprise agent workflows. Our largest customers are using &quot;Reverse-RAG&quot; with Quine to turn real-time event streams into 100% confident conclusions at scale for critical business processes. Quine is used throughout the Fortune 500 at the world&#x27;s leading banks, cybersecurity companies, and many U.S. government agencies and allied partners to ensure employees and agents always have the right context, at the right time, to make the right decision, when it matters most.&lt;&#x2F;p&gt;
&lt;p&gt;One of the world&#x27;s leading global banks is using Quine to process hundreds of thousands of infrastructure events per second, routing employees to the right resources in real time, before the underlying data shifts and the recommendation becomes stale. The goal is moving from &quot;siloed data and teams&quot; to &quot;real-time accessible intelligence that fuels automation and makes proactive recommendations.&quot;&lt;&#x2F;p&gt;
&lt;p&gt;Whether the challenge is dynamically provisioning IT infrastructure at scale, adversaries on the global stage, threat actors in the cyber domain, fraudsters, or adversarial A.I., Quine helps you see, understand, and react faster than the situation can evolve. It plugs into high-volume data streams and &quot;connects the dots&quot; to build a live streaming graph from millions of events every second. Data agents efficiently monitor that graph to find critical patterns immediately and run reliable algorithms to feed the perfect context for reliable action—all without an army of consultants or forward deployed engineers.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-s-new-in-quine-enterprise-2-0&quot;&gt;&lt;strong&gt;What&#x27;s New in Quine Enterprise 2.0&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Identity, Access, and Audit&lt;&#x2F;strong&gt; - Governance and access controls dominate enterprise system deployment concerns. Quine Enterprise 2.0 integrates with many kinds of existing enterprise RBAC systems and provides a tailored experience with permission-based UI rendering for each user&#x27;s level of access. OIDC authorization, session management, SP 800-53 audit logging and many of the important-but-boring internal details are all done and in place with the latest version of Quine Enterprise.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;API V2: the Integration Foundation&lt;&#x2F;strong&gt; - Version 2 of the REST API supports RBAC and standardizes on the Google API style guide. You can now transform ingested data with JavaScript before passing it to a Cypher ingest function (no extra micro-service to transform streaming data in front of Quine!). Did some bad data make it into your stream? Handle it intelligently, without disruption, with the new dead-letter queue.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Operating at Scale with Ease&lt;&#x2F;strong&gt; - With a new UI and dashboard, it&#x27;s easy to see the status of your Quine cluster at a glance—if you have the proper permissions. Need to add a new ingest stream or standing query? Just use the simplified multiple-choice template on the &quot;Streams&quot; page to set it up in seconds. Cluster stats and status are now neatly visualized from the dashboard.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Optimized for A.I. Coding Tools&lt;&#x2F;strong&gt; - If you&#x27;re using A.I. tools to work with Quine, just point your agent at &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.thatdot.com&#x2F;&quot;&gt;docs.thatdot.com&lt;&#x2F;a&gt; or the OpenAPI spec built into the docs. Our world-class human-written and human-readable documentation has also been tuned for LLM consumption. Give it a try and you can one-shot your next Quine recipe in a single prompt!&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;You can &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.thatdot.com&#x2F;quine-enterprise&#x2F;reference&#x2F;release-notes&#x2F;#release-200&quot;&gt;see the full list of changes here&lt;&#x2F;a&gt;. If you&#x27;re upgrading a running Quine instance from a 1.x version, &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.thatdot.com&#x2F;quine-enterprise&#x2F;reference&#x2F;upgrade&#x2F;quine-2.0.0&#x2F;&quot;&gt;see the migration guide here&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;deep-history&quot;&gt;Deep History&lt;&#x2F;h2&gt;
&lt;p&gt;In 2022 we released Quine 1.0 as open source software at &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;&quot;&gt;https:&#x2F;&#x2F;quine.io&lt;&#x2F;a&gt;. A revolutionary advance for graphs+streaming, it was the result of many years of high-level R&amp;amp;D funded by DARPA, CrowdStrike, and other large organizations confronting challenges on the cutting edge of what&#x27;s possible. Since that release, Quine&#x27;s user base has expanded dramatically to include many of the world&#x27;s largest banks and governments, who use Quine for critical applications inside their high-volume enterprise data pipelines. Operating at that scale also requires building the surrounding infrastructure to manage such large deployments and massive data volumes. The &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;&quot;&gt;Open Source version of Quine&lt;&#x2F;a&gt; has many new features and quality-of-life improvements, but many of the key capabilities in the 2.0 release are in support of our enterprise users.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;around-the-corner&quot;&gt;&lt;strong&gt;Around the Corner&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;The Quine 2.0 release sets a foundation for many more exciting capabilities right around the corner. With deeper A.I. integration, a revolutionary new indexing capability, automated query writing, and more integration sources on the way, the 2.0 release is not just a major milestone; it&#x27;s the starting gun for the next leg of this exciting race.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>The Secret Ingredient in the Alphabet Soup of Cybersecurity</title>
        <published>2025-03-04T00:00:00+00:00</published>
        <updated>2025-03-04T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/the-secret-ingredient-in-the-alphabet-soup-of-cybersecurity/"/>
        <id>https://www.thatdot.com/blog/the-secret-ingredient-in-the-alphabet-soup-of-cybersecurity/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/the-secret-ingredient-in-the-alphabet-soup-of-cybersecurity/">&lt;p&gt;&lt;img src=&quot;&#x2F;img&#x2F;2025&#x2F;03&#x2F;ABC-SecretIngredient-.png&quot; alt=&quot;Alphabet soup of cypher security&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;This is the first in a series of blogs exploring how the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;http:&#x2F;&#x2F;quine.io&quot;&gt;Quine&lt;&#x2F;a&gt; Streaming Graph analytics engine is the secret ingredient in the Alphabet Soup of cybersecurity, enabling faster, more accurate detection of complex threats without compromising on the type or volume of data analyzed, the fidelity of alerts or response time.&lt;&#x2F;p&gt;
&lt;h4 id=&quot;the-dilemma-of-data-in-cybersecurity&quot;&gt;&lt;strong&gt;The Dilemma of Data in Cybersecurity&lt;&#x2F;strong&gt;&lt;&#x2F;h4&gt;
&lt;p&gt;As we all know, the letter combinations in cybersecurity continue to grow, sometimes falling out of view, floating just under the surface, and others rising to the top.  These letter combinations include network protection (NDR, NTA, ID&#x2F;PS), endpoint (XDR, FIM, EPP, EDR, HIPS), or cloud (CWPP, CSPM, and CNAPP). Despite their diversity, these solutions all face a shared challenge: the amount of data they need to analyze and how to go about it.&lt;&#x2F;p&gt;
&lt;p&gt;Including the correct information in the analysis process is a delicate balance - like that right balance of herbs and spices in our favorite meal.  It is no simple task to determine which data to analyze and how to do it efficiently without the risk of false positives&#x2F;negatives. The current approach is to look at it in subsets and cohorts, but never holistically.  In some cases, this decision is warranted; the data is irrelevant - that ingredient simply does not go into our meal.  However, this process frequently results in context being left on the proverbial cutting board, and the data so watered down it is useless.&lt;&#x2F;p&gt;
&lt;h4 id=&quot;a-new-paradigm-data-analysis-without-compromise&quot;&gt;&lt;strong&gt;A New Paradigm: Data Analysis Without Compromise&lt;&#x2F;strong&gt;&lt;&#x2F;h4&gt;
&lt;p&gt;Imagine the following what-ifs: What if we didn’t have to exclude relevant data? What if we did not need to leave relevant data on the cutting board? What if we could analyze &lt;strong&gt;all&lt;&#x2F;strong&gt; data for any time - past, present, or future?&lt;&#x2F;p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;With thatDot’s Quine Streaming Graph, you can continuously analyze real-time and historical data at scale to identify complex patterns and enable your solution to trigger an action within milliseconds.&lt;&#x2F;strong&gt;&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;p&gt;This enables product owners to reenvision current features and approaches—moving from periodic batch processing to real-time analytics. For cybersecurity vendors, this changes the game. Instead of relying on batch processing or overlooking key data for speed, you &lt;em&gt;&lt;strong&gt;can&lt;&#x2F;strong&gt;&lt;&#x2F;em&gt; achieve &lt;strong&gt;instant notifications&lt;&#x2F;strong&gt; to trigger mitigation and containment routines.&lt;&#x2F;p&gt;
&lt;h4 id=&quot;what-s-next&quot;&gt;&lt;strong&gt;What’s Next?&lt;&#x2F;strong&gt;&lt;&#x2F;h4&gt;
&lt;p&gt;We intend to explore several ways of adding thatDot to cybersecurity solutions to see what we can cook up. Today, each of the following capabilities is either not done adequately or only viable with lots of development time, custom code, and homegrown analysis pipelines:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Identifying attack paths&lt;&#x2F;li&gt;
&lt;li&gt;Triggering immediate response&lt;&#x2F;li&gt;
&lt;li&gt;Continuous enrichment of event data&lt;&#x2F;li&gt;
&lt;li&gt;Identifying even the most latent of patterns&lt;&#x2F;li&gt;
&lt;li&gt;Real-time as well as “point-in-time” visibility&lt;&#x2F;li&gt;
&lt;li&gt;Context-Aware Threat Intelligence&lt;&#x2F;li&gt;
&lt;li&gt;Real-time MITRE TTP Awareness&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;These are problems that can be solved in other ways; however, how long does it take to develop each successive detection pipeline? Other than time to market, what else are you giving up? Are you avoiding questions that are too complex, or inspecting only a tiny sliver of time? Or just settling for pseudo and “near” real-time analysis? Let’s explore Quine, drop some of those qualifiers, and make something great!&lt;&#x2F;p&gt;
&lt;h4 id=&quot;learn-more&quot;&gt;&lt;strong&gt;Learn More&lt;&#x2F;strong&gt;&lt;&#x2F;h4&gt;
&lt;p&gt;Check out these resources:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;are-you-ready-for-low-and-slow-authentication-attacks&#x2F;&quot;&gt;Are You Ready for Low and Slow Auth Attacks&lt;&#x2F;a&gt; Blog Post&lt;&#x2F;li&gt;
&lt;li&gt;Quine for &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;use-cases&#x2F;&quot;&gt;cybersecurity&lt;&#x2F;a&gt; and &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;use-cases&#x2F;financial-fraud-detection&#x2F;&quot;&gt;fraud&lt;&#x2F;a&gt; use cases&lt;&#x2F;li&gt;
&lt;li&gt;Download &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.quine.io&#x2F;download&quot;&gt;Quine&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Stream Processing World Meets Streaming Graph at Current 2024</title>
        <published>2024-09-24T00:00:00+00:00</published>
        <updated>2024-09-24T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/events/stream-processing-world-meets-streaming-graph-at-current-2024/"/>
        <id>https://www.thatdot.com/events/stream-processing-world-meets-streaming-graph-at-current-2024/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/events/stream-processing-world-meets-streaming-graph-at-current-2024/">&lt;p&gt;The thatDot team had a great time last week at Confluent’s big conference, Current 2024. Attendees and exhibitors alike loved the thatDot Frisbees. Our apologies to anyone who may have been hit. We spoke with multiple attendees and learned about the challenges the stream processing community faces when trying to do their jobs with KSQLDB or Flink in their data pipelines. We helped them understand how thatDot would fit into their architecture and how our approach to stream processing - &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;products&#x2F;quine&#x2F;&quot;&gt;thatDot Streaming Graph&lt;&#x2F;a&gt; - can scale to meet their demands while solving deeper problems that challenge the key-value&#x2F;relational models of other stream processors.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;&#x2F;img&#x2F;2024&#x2F;09&#x2F;IMG_2514.jpg&quot; alt=&quot;Ryan Wright explaining to multiple Current attendees how thatDot Streaming Graph event stream processor integrates with pipeline software like Kafka. At a small booth at Current, with Integration Options label on a diagram on the booth screen.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;h2 id=&quot;will-flink-fail-at-practical-stream-processing-like-streams-and-ksqldb-did&quot;&gt;Will Flink fail at practical stream processing like Streams and KSQLDB did?&lt;&#x2F;h2&gt;
&lt;p&gt;Some attendees mentioned that over the years Confluent has introduced two other technologies intended to provide stream processing in Kafka pipelines. The first technology introduced for event stream processing was Kafka Streams, but people were not satisfied with its capabilities. Then KSQLDB was the way to go. But that didn’t work quite as advertised, either.&lt;&#x2F;p&gt;
&lt;p&gt;Now Flink is the new technology touted to solve the problems of previous stream processing engines. Yet, Flink practitioners know that the complexity inherent in Flink operations necessitates a high level of expertise to run it. Out-of-memory errors due to making time windows too wide, or trying to join too many things across data streams were common problems people reported. We heard a repeated question from attendees about how long it will likely be until Confluent starts looking for the next stream processing technology.&lt;&#x2F;p&gt;
&lt;p&gt;As an industry, we keep doing the same thing over and over again and expecting a different result–the definition of insanity. By continuing to use the same relational key&#x2F;value type of mindset to process data with every stream processor, we continue to run into the same problems.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;&#x2F;img&#x2F;2024&#x2F;09&#x2F;IMG_2128.jpg&quot; alt=&quot;thatDot booth at Current conference showing blue frisbees, a paper stack that says, &amp;quot;thatDot is Categorically different,&amp;quot; and a screen that says, &amp;quot;Multiple data streams with lots of joins on huge categorical datasets is super hard in most stream processors. For us, it&amp;#39;s Tuesday.&amp;quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;h2 id=&quot;advantages-of-stream-processors-and-graph-data-models&quot;&gt;Advantages of stream processors and graph data models&lt;&#x2F;h2&gt;
&lt;p&gt;For years, people have understood the power of graph data to connect the dots and see the big contextual picture. But graph databases have the same problem as any other database: the data is no longer real-time streaming. It’s at rest. That inherently makes data analysis too slow for some of the most important and urgent actions a stream processor is needed for, like catching cybersecurity intrusions or stopping a fraudulent transaction.&lt;&#x2F;p&gt;
&lt;p&gt;Ryan Wright always says, &quot;Answers now are always better than answers later.&quot; Graph analysis that can work with modern data volumes at stream processor speed is a paradigm shift. You get the answers to deep questions fast. Instead of finding out months or even days later that you were breached, or your company was ripped off, you can stop problems before they cost you.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;advantages-of-thatdot-streaming-graph&quot;&gt;Advantages of thatDot Streaming Graph&lt;&#x2F;h2&gt;
&lt;p&gt;Some of the advantages that Current attendees told us they found most compelling about thatDot Streaming Graph included:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Simplicity at scale&lt;&#x2F;strong&gt; - Flink has to manage current state and do complex logic for fault tolerance such as checkpoints&#x2F;save points. Streaming graph doesn’t require any of that. Dynamic graph technologies don’t require state management, and high availability in our stream processor is more automatic. thatDot is far simpler to use, even at &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;scaling-quine-streaming-graph-to-process-1-million-events-sec&#x2F;&quot;&gt;high scale&lt;&#x2F;a&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Unlimited joins&lt;&#x2F;strong&gt; - For a relational key&#x2F;value data model like Flink uses, multiple joins are difficult and memory intensive. For thatDot’s graph data model, unlimited joins in stream processing are the normal way we do things.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Categorical analysis&lt;&#x2F;strong&gt; - Most analytic tools can only analyze numbers. This means if you want to analyze people, places, events, locations, etc., you have to convert that data into wide, sparse numeric data, rest it in a database, analyze it, then turn it back into &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;whats-the-difference-between-categorical-and-numerical-data&#x2F;&quot;&gt;categorical data&lt;&#x2F;a&gt; to get a final, understandable, if a bit muddy, answer. Having to rest the data before analysis slows response time hugely, and even then, your answer is likely to be unclear and inaccurate. thatDot analyzes categorical data directly, right in the stream processor.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Time unbound analysis&lt;&#x2F;strong&gt; - An event stream processor taking unbounded data streams and chopping them into little time-bounded chunks in order to analyze them has always been a workaround in our opinion. thatDot analyzes the whole data stream as it flows by, with no time window limitations. Even data stored in a file or in a database can be joined with current data. Important points from six months ago can be joined with data from six milliseconds ago to complete a picture and answer an analytical question.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h2 id=&quot;ryan-s-talk-on-streaming-entity-resolution-for-kafka&quot;&gt;Ryan&#x27;s talk on &quot;Streaming Entity Resolution for Kafka&quot;&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;img src=&quot;&#x2F;img&#x2F;2024&#x2F;09&#x2F;IMG_2505.jpg&quot; alt=&quot;Ryan Wright speaking at Current 2024 stream processing conference&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Ryan Wright, our founder and CTO, did a very cool presentation at Current on entity resolution in stream processing which caught a lot of attention, especially from data engineers and anyone working toward real-time master data management. That’s live on the Current site now. Be sure to check it out: &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;current.confluent.io&#x2F;2024-sessions&#x2F;streaming-entity-resolution-for-kafka-with-quine&quot;&gt;https:&#x2F;&#x2F;current.confluent.io&#x2F;2024-sessions&#x2F;streaming-entity-resolution-for-kafka-with-quine&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;p&gt;To learn more, check out the &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;products&#x2F;quine&#x2F;&quot;&gt;thatDot Streaming Graph product page.&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Or, try the &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;getting-started&#x2F;&quot;&gt;Streaming Graph or Novelty free trial&lt;&#x2F;a&gt; for yourself.&lt;&#x2F;p&gt;
&lt;p&gt;Get the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;info.thatdot.com&#x2F;enhance-data-security-with-thatdot&quot;&gt;handouts we gave to Current attendees&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;And be sure to catch the thatDot team at Current 2025!&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;&#x2F;img&#x2F;2024&#x2F;09&#x2F;IMG_2524.jpg&quot; alt=&quot;Foreground is five people with Ryan Wright, CTO of thatDot at center. Current background with Austin, Texas landmarks.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Streaming Graph Get Started</title>
        <published>2024-07-23T00:00:00+00:00</published>
        <updated>2024-07-23T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/streaming-graph-get-started/"/>
        <id>https://www.thatdot.com/blog/streaming-graph-get-started/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/streaming-graph-get-started/">&lt;p&gt;It&#x27;s been said that graphs are everywhere. Graph-based data models provide a flexible and intuitive way to represent complex relationships and interconnectedness in data. They are particularly well-suited for scenarios where relationships and patterns are important, but until recently, they have been confined to a handful of use cases – databases, chip design, information theory, AI – that all have one thing in common: the data in question is stored first and then processed, usually as a batch job.&lt;&#x2F;p&gt;
&lt;p&gt;In other words, the data in these use cases is at rest. However, what about data in motion, data in event-driven use cases that is constantly changing and being transmitted? As event-driven applications and operational intelligence scenarios, such as real-time monitoring, situational intelligence, and fraud detection, continue to expand rapidly, graph data models and the primary query language used for them, Cypher, are proving much more versatile than SQL and the best tools for the task.&lt;&#x2F;p&gt;
&lt;p&gt;Consider the challenge of extracting insights from a complex event stream. The stream may have high volume and velocity, require the correlation of events by context from multiple sources, contain meaningful event patterns, and have a short timeframe to identify, detect, and take action. Addressing these challenges requires efficient data processing, scalable infrastructure, and effective event modeling techniques in graph solutions and Cypher.&lt;&#x2F;p&gt;
&lt;p&gt;Graph databases are useful for batch processing a portion of a complex event stream to provide macro-level insights and metrics to understand events but not take action. The same concepts (patterns and algorithms) used in graph databases when the event stream is at rest can be applied to an event stream while in motion using a streaming graph like Quine, often directly reusing the Cypher written in the database. Here’s how Cypher addresses the challenges found in complex event streams:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Pattern Matching:&lt;&#x2F;strong&gt; Cypher excels at pattern matching, allowing you to detect patterns (sub-graphs) in the event stream. This is particularly useful for identifying sequences of events or detecting specific patterns, allowing you to efficiently filter and process relevant events based on their relationships and properties.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Event Correlation:&lt;&#x2F;strong&gt; You can define relationships between events and other entities, such as users, devices, or locations. This enables you to correlate events based on common attributes or shared relationships, often with high cardinality and a mix of categorical and numerical data, to identify patterns, anomalies, or complex dependencies.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Time-based Queries:&lt;&#x2F;strong&gt; Cypher provides temporal capabilities, allowing you to query and analyze events based on their timestamps or time intervals. You can filter events based on specific time ranges, compare temporal values, and perform time-based aggregations. This enables you to process time-dependent patterns, detect trends, and perform time window-based computations on the event stream.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Real-time Insights:&lt;&#x2F;strong&gt; You can continuously execute Cypher queries on an incoming event stream, allowing for dynamic analysis and near real-time decision-making. This enables you to monitor, detect patterns, and trigger actions based on the evolving stream of events.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h2 id=&quot;event-pattern-detection&quot;&gt;Event Pattern &lt;strong&gt;Detection&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Specifying a pattern (sub-graph) to &lt;strong&gt;&lt;code&gt;MATCH&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; can identify specific sequences or combinations of events of interest. For example, when observing the efficiency of cache nodes in a CDN, Cypher can easily identify when a series of 10 cache misses occurs and send an alert to the NOC to trigger an investigation.&lt;&#x2F;p&gt;
&lt;p&gt;The Cypher required to detect a &lt;strong&gt;&lt;code&gt;MISS&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; event only needs to identify the node types and relationships as a pattern.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (server1:server)&amp;lt;-[:TARGETED]-(event1 {cache_class:&amp;quot;MISS&amp;quot;})-[:REQUESTED]-&amp;gt;(asset)&amp;lt;-[:REQUESTED]-(event2 {cache_class:&amp;quot;MISS&amp;quot;})-[:TARGETED]-&amp;gt;(server2:server)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;RETURN DISTINCT id(event1) AS event1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Then, additional Cypher processes the event to take action, recording it as a metric or sending an alert if the metric constraint is exceeded. This technique is demonstrated in the CDN Observability recipe.&lt;&#x2F;p&gt;
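&lt;p&gt;As a rough illustration of the alerting step described above, the following Python sketch counts cache misses per server and flags a server once it reaches 10 misses in a row. It is not the CDN Observability recipe&#x27;s actual logic: the function name &lt;code&gt;servers_to_alert&lt;&#x2F;code&gt;, the tuple-based event format, and the reading of the series of 10 misses as consecutive misses are all assumptions made for this example.&lt;&#x2F;p&gt;

```python
from collections import defaultdict

# Threshold taken from the text; treating the misses as consecutive is an assumption.
MISS_THRESHOLD = 10


def servers_to_alert(events, threshold=MISS_THRESHOLD):
    """Return servers that reached `threshold` consecutive cache misses.

    `events` is an iterable of (server, cache_class) tuples in arrival order.
    This event shape is hypothetical and chosen only for the illustration.
    """
    streak = defaultdict(int)  # current run of misses per server
    alerted = []               # servers that crossed the threshold, in order
    for server, cache_class in events:
        if cache_class == "MISS":
            streak[server] += 1
            # Alert exactly once, at the moment the run reaches the threshold.
            if streak[server] == threshold:
                alerted.append(server)
        else:
            streak[server] = 0  # any non-miss resets the run
    return alerted


if __name__ == "__main__":
    stream = [("cache-a", "MISS")] * 10 + [("cache-b", "HIT")]
    print(servers_to_alert(stream))
```

&lt;p&gt;In Quine itself, this follow-up logic would be written in Cypher, as described above, rather than in a separate script; the sketch only shows the counting idea.&lt;&#x2F;p&gt;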
&lt;p&gt;Cypher can respond to changes in the event stream in real time, allowing organizations to reduce the risk associated with a condition&#x27;s duration before it is analyzed and addressed. For example, the Financial Risk Calculation recipe models market changes in real-time so that organizations can provide adequate coverage for risk exposure while ensuring their regulatory compliance minimally affects their asset allocation. As basic patterns are matched, results are passed to business logic written in Cypher to generate an adjusted trading value, correlate (roll-up) trading events across the network, and trigger an alert when the trading system is out of compliance. When a pattern match query detects an investment pattern, it triggers an output query to process the StandingQueryResult.&lt;&#x2F;p&gt;
&lt;p&gt;For example, consider the result returned from an investment pattern in Cypher:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (investment:investment)&amp;lt;-[:HOLDS]-(desk:desk)&amp;lt;-[:HAS]-(institution:institution)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;RETURN DISTINCT id(investment) AS id&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That result triggers business logic in Cypher that generates a new property with a value based on the node&#x27;s &lt;code&gt;investment.class&lt;&#x2F;code&gt; property:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;SET investment.adjustedValue = CASE&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  WHEN investment.class = &amp;#39;1&amp;#39; THEN investment.value&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  WHEN investment.class = &amp;#39;2a&amp;#39; THEN investment.value * .85&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  WHEN investment.class = &amp;#39;2b&amp;#39; AND investment.type = 9 THEN investment.value * .75&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  WHEN investment.class = &amp;#39;2b&amp;#39; AND investment.type = 10 THEN investment.value * .5&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;END&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
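&lt;p&gt;To make the branching in the &lt;code&gt;CASE&lt;&#x2F;code&gt; expression above easier to follow, here is a minimal Python sketch of the same rules. The function name &lt;code&gt;adjusted_value&lt;&#x2F;code&gt; and its parameters are hypothetical and exist only for this illustration. The multipliers are copied from the Cypher, and the sketch assumes, as the Cypher comparisons suggest, that the class is a string and the type is an integer.&lt;&#x2F;p&gt;

```python
def adjusted_value(investment_class, investment_type, value):
    """Mirror the CASE expression: return the adjusted value, or None if no branch matches."""
    if investment_class == "1":
        return value                 # class 1 keeps its full value
    if investment_class == "2a":
        return value * 0.85          # class 2a is scaled to 85%
    if investment_class == "2b" and investment_type == 9:
        return value * 0.75          # class 2b, type 9 is scaled to 75%
    if investment_class == "2b" and investment_type == 10:
        return value * 0.5           # class 2b, type 10 is scaled to 50%
    return None                      # a CASE with no ELSE yields null in Cypher


if __name__ == "__main__":
    samples = [("1", 1, 1000.0), ("2a", 3, 1000.0), ("2b", 9, 1000.0), ("2b", 10, 1000.0)]
    for args in samples:
        print(args, adjusted_value(*args))
```

&lt;p&gt;Note that any investment outside these four branches ends up with no adjusted value, which is worth keeping in mind when the roll-up later sums the property.&lt;&#x2F;p&gt;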
&lt;p&gt;The investment events are then correlated through a roll-up function for each investment type.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;UNWIND [[&amp;quot;1&amp;quot;,&amp;quot;adjustedValue1&amp;quot;], [&amp;quot;2a&amp;quot;,&amp;quot;adjustedValue2a&amp;quot;], [&amp;quot;2b&amp;quot;,&amp;quot;adjustedValue2b&amp;quot;]] AS stuff&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WITH institution,investment,desk,stuff&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WHERE investment.class = stuff[0]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;CALL float.add(institution,stuff[1],investment.adjustedValue) YIELD result AS institutionAdjustedValueRollupByClass&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;CALL float.add(institution,&amp;quot;totalAdjustedValue&amp;quot;,investment.adjustedValue) YIELD result AS institutionAdjustedValueRollup&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;CALL float.add(desk,stuff[1],investment.adjustedValue) YIELD result AS deskAdjustedValueRollupByClass&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;CALL float.add(desk,&amp;quot;totalAdjustedValue&amp;quot;,investment.adjustedValue) YIELD result AS deskAdjustedValueRollup&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;SET institution.percentAdjustedValue2 = ((institution.adjustedValue2a + institution.adjustedValue2b)&#x2F;institution.totalAdjustedValue) * 100,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;institution.percentAdjustedValue2b = (institution.adjustedValue2b&#x2F;institution.totalAdjustedValue) * 100&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;h2 id=&quot;temporal-analysis&quot;&gt;&lt;strong&gt;Temporal&lt;&#x2F;strong&gt; Analysis&lt;&#x2F;h2&gt;
&lt;p&gt;With Cypher, you can express temporal conditions, such as events occurring within a specific time window, events happening before or after certain events, or events falling into a particular time range. This enables temporal analysis of event streams, including trend analysis, time-based aggregations, and windowed computations. For example, the temporal locality recipe looks for emails sent or received by &lt;strong&gt;&lt;code&gt;cto@company.com&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; within a four to six-minute sliding window. The pattern query matches each individual &lt;strong&gt;&lt;code&gt;(sender)-[:SENT_MSG]-&amp;gt;(message)-[:RECEIVED_MSG]-&amp;gt;(receiver)&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; pattern containing the CTO’s email address.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (n)-[:SENT_MSG]-&amp;gt;(m)-[:RECEIVED_MSG]-&amp;gt;(r)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WHERE n.email=&amp;quot;cto@company.com&amp;quot; OR r.email=&amp;quot;cto@company.com&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;RETURN id(n) as ctoId, id(m) as ctoMsgId, m.time as mTime, id(r) as recId&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The output query then calculates the duration between the emails to generate a sub-graph containing messages that went to or from the CTO within the time window.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (n)-[:SENT_MSG]-&amp;gt;(m)-[:RECEIVED_MSG]-&amp;gt;(r), (thisMsg)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WHERE id(n) = $that.data.ctoId&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;AND id(r) = $that.data.recId&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;AND id(thisMsg) = $that.data.ctoMsgId&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;AND id(m) &amp;lt;&amp;gt; id(thisMsg)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;AND duration(&amp;quot;PT6M&amp;quot;) &amp;gt; duration.between(m.time,thisMsg.time) &amp;gt; duration(&amp;quot;PT4M&amp;quot;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;CREATE (m)-[:IN_WINDOW]-&amp;gt;(thisMsg)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;CREATE (m)&amp;lt;-[:IN_WINDOW]-(thisMsg)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WITH n, m, r, &amp;quot;http:&#x2F;&#x2F;localhost:8080&#x2F;#MATCH&amp;quot; + text.urlencode(&amp;#39; (n)-[:SENT_MSG]-&amp;gt;(m)-[:RECEIVED_MSG]-&amp;gt;(r) WHERE strId(n)=&amp;quot;&amp;#39; + strId(n) + &amp;#39;&amp;quot; AND strId(r)=&amp;quot;&amp;#39; + strId(r) + &amp;#39;&amp;quot; AND strId(m)=&amp;quot;&amp;#39; + strId(m) + &amp;#39;&amp;quot; RETURN n, r, m&amp;#39;) AS URL&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;RETURN URL&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;h2 id=&quot;conclusion&quot;&gt;&lt;strong&gt;Conclusion&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Cypher is a powerful and expressive query language well-suited for processing complex event streams. Quine streaming graph enables Cypher developers to leverage graph techniques early when processing a complex event stream to aggregate and shape events, detect patterns for alerting and early feedback, and perform event normalization before entering the data warehouse.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;learn-more-and-try-quine&quot;&gt;&lt;strong&gt;Learn more and Try Quine&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;If you want to try Quine using your own data, here are some resources to help:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Learn more about Quine by visiting the Quine open source project.&lt;&#x2F;li&gt;
&lt;li&gt;Download Quine - JAR file | Docker Image | GitHub&lt;&#x2F;li&gt;
&lt;li&gt;Check out the Financial Risk Calculation recipe to see how Cypher is used for real-time rollups.&lt;&#x2F;li&gt;
&lt;li&gt;Check out demos and other videos at our YouTube channel.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Streaming Graph for Real-Time Risk Analysis at Data Connect in Columbus 2024</title>
        <published>2024-07-23T00:00:00+00:00</published>
        <updated>2024-07-23T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/streaming-graph-real-time-risk-analysis-data-connect/"/>
        <id>https://www.thatdot.com/blog/streaming-graph-real-time-risk-analysis-data-connect/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/streaming-graph-real-time-risk-analysis-data-connect/">&lt;p&gt;After more than 25 years in the data management and analysis industry, I had a brand new experience. I attended a technical conference. No, that wasn’t the new thing. At many conferences, I’ve been surrounded by data scientists, business analysts, data engineers, mathematicians, developers, startup founders, CTOs, architects, and PhD students, made network connections, and listened to giants in the field, like the Chief of Information Management of the United Nations at this one. But, uniquely, at this one conference, &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.dataconnectconf.com&#x2F;dataconnect&#x2F;conference&quot;&gt;Data Connect&lt;&#x2F;a&gt;, organized by Women in Analytics, 9 out of 10 of those leaders in the field were women, and all the speakers were women or gender minorities.&lt;&#x2F;p&gt;
&lt;p&gt;It was a soul-filling feeling. Sometimes, it can feel isolating to be a woman in a technical field, but for 2 days, I was surrounded by smart, capable women encouraging each other and talking shop. I got a copy of &lt;em&gt;Low Code AI&lt;&#x2F;em&gt; signed by Dr. Gwendolyn Stripling, who was one of the coolest people we met there, and &lt;em&gt;Unmasking AI&lt;&#x2F;em&gt; by Joy Buolamwini, both of whom gave brilliant presentations.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;risky-real-time-risk-analysis-presentation&quot;&gt;&lt;strong&gt;Risky real-time risk analysis presentation&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;For my presentation, I talked about a way to do powerful risk analysis in real time. Not too surprisingly, the method used &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;products&#x2F;quine&#x2F;&quot;&gt;thatDot Streaming Graph&lt;&#x2F;a&gt;. What was surprising is that I went out of my comfort zone for this deeply technical audience; I did a live demo of the risk analysis recipe. Live demos are always a bit nerve-wracking at conferences, and having never done one before with thatDot tech, well … talk about risky.&lt;&#x2F;p&gt;
&lt;p&gt;The presentation defined risk analysis and pointed out the failure of Washington Mutual, the largest bank failure in US history, and Silicon Valley Bank last year, the second largest bank failure in US history. Both were due largely to poor risk management. Those are just two of the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.fdic.gov&#x2F;resources&#x2F;resolutions&#x2F;bank-failures&#x2F;in-brief&#x2F;&quot;&gt;over 550 banks that have failed&lt;&#x2F;a&gt; since the turn of the century. Between a relaxation of government oversight and less-than-ideal risk calculation, we’re lucky our economy is still functioning.&lt;&#x2F;p&gt;
&lt;p&gt;Since the government regulations aren’t my area, I focused on the problems with current risk analysis methods, mainly that they’re batch and often take 24 or more hours to complete. Even longer if you and your bank HQ aren’t in the same time zone. Since many trades or investments have a regulated time during which a bank can decide to accept the risk and approve the trade or not, usually 24 hours, slow batch processing can expose them far too much.&lt;&#x2F;p&gt;
&lt;p&gt;Most financial institutions are shifting to graph analysis for the kind of entity-based categorical analysis needed, but graph databases don’t scale well to the levels large banks require. Event stream processors scale just fine and are real-time by nature, but they have difficulty with this kind of deep graph analysis. So, you need something with powerful graph analytics at event stream processor speeds to get to real-time risk analysis. The risk analysis recipe uses simulated data, but does a good job of showing the speed of analysis and how it could be done.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;&#x2F;img&#x2F;2024&#x2F;07&#x2F;image-1-1.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;The presentation was well-received with one person coming up and telling me they thought it was the best presentation of the conference. Wow. Now, that’s a heck of a compliment considering the caliber of presenters.&lt;&#x2F;p&gt;
&lt;p&gt;I’m looking forward to going to Data Connect again next year, and if you want to learn more about data analysis and data management, don’t miss it.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>The Power of Real-Time Entity Resolution with Ryan Wright</title>
        <published>2024-07-08T00:00:00+00:00</published>
        <updated>2024-07-08T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/events/the-power-of-real-time-entity-resolution/"/>
        <id>https://www.thatdot.com/events/the-power-of-real-time-entity-resolution/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/events/the-power-of-real-time-entity-resolution/">&lt;p&gt;Ever wondered why duplicate records keep slipping through your data streams? This September, thatDot&#x27;s Founder and CEO Ryan Wright will be addressing this critical issue at Current.&lt;&#x2F;p&gt;
&lt;p&gt;Data inconsistencies in Kafka streams, such as misspelled company names, users registering with different email addresses, or multiple bank accounts linked to the same person, can present significant challenges. These issues not only hinder the adoption of streaming data technologies but also impact organizations across the spectrum, from major banks to small startups.&lt;&#x2F;p&gt;
&lt;p&gt;Recent advancements in open-source streaming graph tools, like thatDot&#x27;s Streaming Graph powered by Quine Open Source, have made it easier to clean and resolve data in real-time. These tools offer powerful entity resolution at scale, even as data flows in motion and potentially out of order.&lt;&#x2F;p&gt;
&lt;p&gt;Event Details&lt;&#x2F;p&gt;
&lt;p&gt;Title: Streaming Entity Resolution for Kafka with Quine&lt;&#x2F;p&gt;
&lt;p&gt;Date: September 17, 2024&lt;&#x2F;p&gt;
&lt;p&gt;Time: Tue Sep 17, 1:30 PM - 1:40 PM CDT   (10 Min)&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;current.confluent.io&#x2F;registration&quot;&gt;Register for the Event&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;&#x2F;img&#x2F;2024&#x2F;07&#x2F;Current-2024-_-I-Am-Speaking-_-Light.jpg&quot; alt=&quot;Current 2024 _ I Am Speaking _ Light&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Join us on Tuesday, September 17, from 1:30 PM to 1:40 PM CDT in Breakroom 5 for an illuminating lightning talk with Ryan Wright, as we delve into two cutting-edge approaches to real-time entity resolution using the Quine Open Source &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;products&#x2F;quine&#x2F;&quot;&gt;streaming graph&lt;&#x2F;a&gt;:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Viewing Your Stream as a Graph&lt;&#x2F;strong&gt;: Leveraging event-triggered &quot;standing queries&quot; for real-time entity resolution.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;AI-Powered Entity Resolution&lt;&#x2F;strong&gt;: Using historical stream data to enable AI-driven resolution with graph neural networks.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;In both scenarios, you&#x27;ll witness how a Kafka stream filled with messy data can be transformed into a clean, entity-resolved output.&lt;&#x2F;p&gt;
&lt;p&gt;Don&#x27;t miss this opportunity to learn how to enhance your data streams and drive better insights. Stay tuned for more updates on this event.&lt;&#x2F;p&gt;
&lt;p&gt;For more details on the event and speakers, visit &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;current.confluent.io&#x2F;speakers&quot;&gt;Current 2024 Speakers&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Cypher all the things!</title>
        <published>2024-07-03T00:00:00+00:00</published>
        <updated>2024-07-03T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/cypher-all-the-things/"/>
        <id>https://www.thatdot.com/blog/cypher-all-the-things/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/cypher-all-the-things/">&lt;p&gt;It&#x27;s been said that graphs are &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;neo4j.com&#x2F;blog&#x2F;graphs-are-everywhere-possibilities&#x2F;&quot;&gt;everywhere&lt;&#x2F;a&gt;. Graph-based data models provide a flexible and intuitive way to represent complex relationships and interconnectedness in data. They are particularly well-suited for scenarios where relationships and patterns are important, but until recently, they have been confined to a handful of use cases – databases, chip design, information theory, AI – that all have one thing in common: the data in question is stored first and then processed, usually as a batch job. In other words, the data in these use cases is &lt;em&gt;at rest.&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;p&gt;However, what about data in &lt;em&gt;motion,&lt;&#x2F;em&gt; data in event-driven use cases that is constantly changing and being transmitted? As event-driven applications and operational intelligence scenarios, such as real-time monitoring, situational intelligence, and fraud detection, continue to expand rapidly, graph data models and the primary query language used for them, Cypher, are proving much more versatile than SQL and the best tools for the task.&lt;&#x2F;p&gt;
&lt;p&gt;Consider the challenge of extracting insights from a complex event stream. The stream may have high volume and velocity, require the correlation of events by context from multiple sources, contain meaningful event patterns, and have a short timeframe to identify, detect, and take action. Addressing these challenges requires efficient data processing, scalable infrastructure, and effective event modeling techniques in graph solutions and Cypher.&lt;&#x2F;p&gt;
&lt;p&gt;Graph databases are useful for batch processing a portion of a complex event stream to provide macro-level insights and metrics to understand events but not take action. The same concepts (patterns and algorithms) used in graph databases when the event stream is at rest can be applied to an event stream while in motion using a streaming graph like Quine, often directly reusing the Cypher written in the database.&lt;&#x2F;p&gt;
&lt;p&gt;Here’s how Cypher addresses the challenges found in complex event streams:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Pattern Matching&lt;&#x2F;strong&gt;: Cypher excels at pattern matching, allowing you to detect patterns (sub-graphs) in the event stream. This is particularly useful for identifying sequences of events or detecting specific patterns, allowing you to efficiently filter and process relevant events based on their relationships and properties.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Event Correlation&lt;&#x2F;strong&gt;: You can define relationships between events and other entities, such as users, devices, or locations. This enables you to correlate events based on common attributes or shared relationships, often with high cardinality and a mix of categorical and numerical data, to identify patterns, anomalies, or complex dependencies.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Time-based Queries&lt;&#x2F;strong&gt;: Cypher provides temporal capabilities, allowing you to query and analyze events based on their timestamps or time intervals. You can filter events based on specific time ranges, compare temporal values, and perform time-based aggregations. This enables you to process time-dependent patterns, detect trends, and perform time window-based computations on the event stream.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Real-time Insights&lt;&#x2F;strong&gt;: You can continuously execute Cypher queries on an incoming event stream, allowing for dynamic analysis and near real-time decision-making. This enables you to monitor, detect patterns, and trigger actions based on the evolving stream of events.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h2 id=&quot;event-pattern-detection&quot;&gt;Event Pattern Detection&lt;&#x2F;h2&gt;
&lt;p&gt;Specifying a pattern (sub-graph) to MATCH can identify specific sequences of events or combinations of events of interest. For example, when observing the efficiency of cache nodes in a CDN, Cypher can easily identify when a series of 10 cache misses occurs and send an alert to the NOC to trigger an investigation.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;648c759b667959beb0f5c6e4_image5.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;The Cypher required to detect a MISS event only needs to identify the node types and relationships as a pattern.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (server1:server)&amp;lt;-[:TARGETED]-(event1 {cache_class:&amp;quot;MISS&amp;quot;})-[:REQUESTED]-&amp;gt;(asset)&amp;lt;-[:REQUESTED]-(event2 {cache_class:&amp;quot;MISS&amp;quot;})-[:TARGETED]-&amp;gt;(server2:server)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;RETURN DISTINCT id(event1) AS event1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Then, additional Cypher processes the event to take action, recording it as a metric or sending an alert if the metric constraint is exceeded. This technique is demonstrated in the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;cdn&#x2F;&quot;&gt;CDN Observability&lt;&#x2F;a&gt; recipe.&lt;&#x2F;p&gt;
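&lt;p&gt;A minimal sketch of that follow-up step, assuming it runs as a Cypher output query on each standing query result: it increments a per-server counter and returns a row only once the counter reaches 10. The &lt;code&gt;missCount&lt;&#x2F;code&gt; property and the &lt;code&gt;serverId&lt;&#x2F;code&gt; alias are hypothetical names chosen for illustration, and the sketch assumes Quine&#x27;s &lt;code&gt;int.add&lt;&#x2F;code&gt; procedure takes the same arguments as the &lt;code&gt;float.add&lt;&#x2F;code&gt; calls used later in this post. It is not the query from the recipe, and it counts total misses per server, which is a simplification of detecting a consecutive series.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&#x2F;&#x2F; Illustrative sketch only; missCount is a hypothetical property name&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (event1)-[:TARGETED]-&amp;gt;(server:server)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WHERE id(event1) = $that.data.event1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&#x2F;&#x2F; Atomically bump the per-server miss counter (assumes int.add mirrors float.add)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;CALL int.add(server,&amp;quot;missCount&amp;quot;,1) YIELD result AS missCount&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&#x2F;&#x2F; Pass a row downstream only once 10 misses have accumulated&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WITH server, missCount&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WHERE missCount &amp;gt;= 10&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;RETURN id(server) AS serverId, missCount&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Returning a row only above the threshold keeps the alerting destination quiet until the metric constraint is actually exceeded.&lt;&#x2F;p&gt;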
&lt;h2 id=&quot;graph-based-event-correlation&quot;&gt;Graph-Based Event Correlation&lt;&#x2F;h2&gt;
&lt;p&gt;Cypher can respond to changes in the event stream in real time, allowing organizations to shorten the time a risky condition persists before it is analyzed and addressed.&lt;&#x2F;p&gt;
&lt;p&gt;For example, the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;finance&#x2F;&quot;&gt;Financial Risk Calculation&lt;&#x2F;a&gt; recipe models market changes in real time so that organizations can provide adequate coverage for risk exposure while ensuring their regulatory compliance minimally affects their asset allocation.&lt;&#x2F;p&gt;
&lt;p&gt;As basic patterns are matched, results are passed to business logic written in Cypher to generate an adjusted trading value, correlate (roll up) trading events across the network, and trigger an alert when the trading system is out of compliance.&lt;&#x2F;p&gt;
&lt;p&gt;When a &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;components&#x2F;standing-queries&#x2F;#pattern-match-query&quot;&gt;pattern match query&lt;&#x2F;a&gt; detects an investment pattern, it triggers an &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;components&#x2F;standing-queries&#x2F;#pattern-match-query&quot;&gt;output query&lt;&#x2F;a&gt; to process the &lt;code&gt;StandingQueryResult&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;648c7a0e7380e975b55c0c4b_image2.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;For example, the result returned from an investment pattern in Cypher:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (investment:investment)&amp;lt;-[:HOLDS]-(desk:desk)&amp;lt;-[:HAS]-(institution:institution)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;RETURN DISTINCT id(investment) AS id&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That result triggers business logic in Cypher to generate a new property with a value based on the node&#x27;s &lt;code&gt;investment.class&lt;&#x2F;code&gt; property:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;SET investment.adjustedValue = CASE&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    WHEN investment.class = &amp;quot;1&amp;quot; THEN investment.value&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    WHEN investment.class = &amp;quot;2a&amp;quot; THEN investment.value * .85&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    WHEN investment.class = &amp;quot;2b&amp;quot; AND investment.type = 9 THEN investment.value * .75&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    WHEN investment.class = &amp;quot;2b&amp;quot; AND investment.type = 10 THEN investment.value * .5&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;END&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
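&lt;p&gt;As a quick sanity check of the adjustment logic, the same CASE expression can be evaluated against a literal map instead of a node. The sample values below (class 2a, type 9, value 1000.0) are made up for illustration and do not come from the recipe data.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&#x2F;&#x2F; Hypothetical sample investment, not taken from the recipe data&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WITH {class: &amp;quot;2a&amp;quot;, type: 9, value: 1000.0} AS investment&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;RETURN CASE&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    WHEN investment.class = &amp;quot;1&amp;quot; THEN investment.value&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    WHEN investment.class = &amp;quot;2a&amp;quot; THEN investment.value * .85&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    WHEN investment.class = &amp;quot;2b&amp;quot; AND investment.type = 9 THEN investment.value * .75&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    WHEN investment.class = &amp;quot;2b&amp;quot; AND investment.type = 10 THEN investment.value * .5&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;END AS adjustedValue&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;A class 2a investment keeps 85% of its value, so this should come out to roughly 850. Because the CASE has no ELSE branch, any class and type combination not listed evaluates to null.&lt;&#x2F;p&gt;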
&lt;p&gt;The investment events are then correlated through a roll-up function for each investment type.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;UNWIND [[&amp;quot;1&amp;quot;,&amp;quot;adjustedValue1&amp;quot;], [&amp;quot;2a&amp;quot;,&amp;quot;adjustedValue2a&amp;quot;], [&amp;quot;2b&amp;quot;,&amp;quot;adjustedValue2b&amp;quot;]] AS stuff&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WITH institution,investment,desk,stuff&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WHERE investment.class = stuff[0]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;CALL float.add(institution,stuff[1],investment.adjustedValue) YIELD result AS institutionAdjustedValueRollupByClass&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;CALL float.add(institution,&amp;quot;totalAdjustedValue&amp;quot;,investment.adjustedValue) YIELD result AS institutionAdjustedValueRollup&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;CALL float.add(desk,stuff[1],investment.adjustedValue) YIELD result AS deskAdjustedValueRollupByClass&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;CALL float.add(desk,&amp;quot;totalAdjustedValue&amp;quot;,investment.adjustedValue) YIELD result AS deskAdjustedValueRollup&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;SET institution.percentAdjustedValue2 = ((institution.adjustedValue2a + institution.adjustedValue2b)&#x2F;institution.totalAdjustedValue) * 100,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;institution.percentAdjustedValue2b = (institution.adjustedValue2b&#x2F;institution.totalAdjustedValue) * 100&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;h2 id=&quot;temporal-analysis&quot;&gt;Temporal Analysis&lt;&#x2F;h2&gt;
&lt;p&gt;With Cypher, you can express temporal conditions, such as events occurring within a specific time window, events happening before or after certain events, or events falling into a particular time range. This enables temporal analysis of event streams, including trend analysis, time-based aggregations, and windowed computations.&lt;&#x2F;p&gt;
&lt;p&gt;For example, the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;duration&#x2F;&quot;&gt;temporal locality&lt;&#x2F;a&gt; recipe looks for emails sent or received by cto@company.com within a four to six-minute sliding window.&lt;&#x2F;p&gt;
&lt;p&gt;The pattern query matches each individual &lt;code&gt;(sender)-[:SENT_MSG]-&amp;gt;(message)-[:RECEIVED_MSG]-&amp;gt;(receiver)&lt;&#x2F;code&gt; pattern containing the CTO’s email address.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;648c7af53fa58727d47b5583_image1.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (n)-[:SENT_MSG]-&amp;gt;(m)-[:RECEIVED_MSG]-&amp;gt;(r)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WHERE n.email=&amp;quot;cto@company.com&amp;quot; OR r.email=&amp;quot;cto@company.com&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;RETURN id(n) as ctoId, id(m) as ctoMsgId, m.time as mTime, id(r) as recId&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The output query then calculates the duration between the emails to generate a sub-graph containing messages that went to or from the CTO within the time window.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;648c7b36769ab4d61458240a_image3.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (n)-[:SENT_MSG]-&amp;gt;(m)-[:RECEIVED_MSG]-&amp;gt;(r), (thisMsg)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WHERE id(n) = $that.data.ctoId&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  AND id(r) = $that.data.recId&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  AND id(thisMsg) = $that.data.ctoMsgId&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  AND id(m) &amp;lt;&amp;gt; id(thisMsg)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  AND duration(&amp;quot;PT6M&amp;quot;) &amp;gt; duration.between(m.time,thisMsg.time) &amp;gt; duration(&amp;quot;PT4M&amp;quot;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;CREATE (m)-[:IN_WINDOW]-&amp;gt;(thisMsg)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;CREATE (m)&amp;lt;-[:IN_WINDOW]-(thisMsg)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WITH n, m, r, &amp;quot;http:&#x2F;&#x2F;localhost:8080&#x2F;#MATCH&amp;quot; + text.urlencode(&amp;#39; (n)-[:SENT_MSG]-&amp;gt;(m)-[:RECEIVED_MSG]-&amp;gt;(r) WHERE strId(n)=&amp;quot;&amp;#39; + strId(n) + &amp;#39;&amp;quot; AND strId(r)=&amp;quot;&amp;#39; + strId(r) + &amp;#39;&amp;quot; AND strId(m)=&amp;quot;&amp;#39; + strId(m) + &amp;#39;&amp;quot; RETURN n, r, m&amp;#39;) AS URL&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;RETURN URL&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;&#x2F;h2&gt;
&lt;p&gt;Cypher is a powerful and expressive query language well-suited for processing complex event streams. Quine streaming graph enables Cypher developers to leverage graph techniques early when processing a complex event stream to aggregate and shape events, detect patterns for alerting and early feedback, and perform event normalization before entering the data warehouse.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;learn-more-and-try-quine&quot;&gt;Learn more and Try Quine&lt;&#x2F;h2&gt;
&lt;p&gt;If you want to try Quine using your own data, here are some resources to help:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Learn more about Quine by visiting the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;&quot;&gt;Quine open source project&lt;&#x2F;a&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;Download Quine - &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;download&quot;&gt;JAR file&lt;&#x2F;a&gt; | &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;hub.docker.com&#x2F;r&#x2F;thatdot&#x2F;quine&quot;&gt;Docker Image&lt;&#x2F;a&gt; | &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;thatdot&#x2F;quine&quot;&gt;GitHub&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Check out the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;finance&#x2F;&quot;&gt;Financial Risk Calculation recipe&lt;&#x2F;a&gt; to see how Cypher is used for real-time rollups.&lt;&#x2F;li&gt;
&lt;li&gt;Check out demos and other videos at our &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.youtube.com&#x2F;@thatdot&quot;&gt;YouTube channel&lt;&#x2F;a&gt;.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>thatDot CEO Explains Streaming Graph to Cybersecurity Thought Leader</title>
        <published>2024-07-02T00:00:00+00:00</published>
        <updated>2024-07-02T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/thatdot-streaming-graph-to-cybersecurity-thought-leader/"/>
        <id>https://www.thatdot.com/blog/thatdot-streaming-graph-to-cybersecurity-thought-leader/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/thatdot-streaming-graph-to-cybersecurity-thought-leader/">&lt;p&gt;&lt;strong&gt;Briefing Room on-demand webinar on the thatDot YouTube channel: &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=1SuvjfE0OcU&quot;&gt;The Unreasonable Effectiveness of Streaming Graph&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;thatDot founder and CEO Ryan Wright discussed the power of thatDot Streaming Graph and Novelty to detect the most well-hidden threats with the Bloor Group&#x27;s Eric Kavanagh and Mark Lynd, who was ranked the #1 global thought leader in cybersecurity by Thinkers360. With high-profile data breaches hitting the headlines every other day, the current approach to cyber defense is clearly a losing battle. Low-and-slow attacks like advanced persistent threats hide in mountains of data and steal whatever they want, and many cyber professionals are throwing up their hands and admitting defeat. DARPA funded thatDot technology development specifically to turn the tables on those threats.&lt;&#x2F;p&gt;
&lt;p&gt;This webinar provides what you need to know to change the game to one where the attacker must be perfect to have a chance. Just one step out of line will get them caught. To quote Mark Lynd, &quot;This is the holy grail&quot; and &quot;It takes us from reactive to proactive cybersecurity.&quot;&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;&#x2F;img&#x2F;2024&#x2F;07&#x2F;Briefing-Room-Cybersecurity-Mark-Lynd-Ryan-Wright2.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Traditional graph data models offer depth but lack the immediacy required to outpace cybercriminals, as well as the scale and processing speed needed to keep up with the massive flows of information that cyber professionals need to evaluate. With insightful questions from Mark to guide him, Ryan goes deep on the power of this technology in the cybersecurity space. He provides potent demonstrations of points like:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;The power of graph for relationship analytics.&lt;&#x2F;li&gt;
&lt;li&gt;Scale and speed from direct graph analysis of categorical data, providing real-time threat detection.&lt;&#x2F;li&gt;
&lt;li&gt;Shifting left so that cybersecurity analysis is done on data pipelines in real time.&lt;&#x2F;li&gt;
&lt;li&gt;Reducing false positives with context awareness for anomaly detection.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;This powerful tech is useful for many things, from &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;use-cases&#x2F;stateful-digital-twin&#x2F;&quot;&gt;digital twins&lt;&#x2F;a&gt; to &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;use-cases&#x2F;financial-fraud-detection&#x2F;&quot;&gt;fraud detection&lt;&#x2F;a&gt;, but is particularly powerful in the threat detection and anomaly detection space for cybersecurity. Watch this exceptional video on the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.youtube.com&#x2F;@thatdot&quot;&gt;thatDot YouTube channel&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;Learn for yourself how to bring graph-driven reasoning into the real-time nature of event-driven processing in the cybersecurity stream.&lt;&#x2F;p&gt;
&lt;p&gt;Watch &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=1SuvjfE0OcU&quot;&gt;The Unreasonable Effectiveness of Streaming Graph&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Streaming Graph Processing on Categorical Data Enables Real-time Risk Calculation</title>
        <published>2024-07-01T00:00:00+00:00</published>
        <updated>2024-07-01T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/events/streaming-graph-processing-on-categorical-data-enables-real-time-risk-calculation/"/>
        <id>https://www.thatdot.com/events/streaming-graph-processing-on-categorical-data-enables-real-time-risk-calculation/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/events/streaming-graph-processing-on-categorical-data-enables-real-time-risk-calculation/">&lt;p&gt;The failure of Silicon Valley Bank in 2023 exemplifies the severe consequences of not accurately assessing risk in a timely manner. Although nearly every financial institution prioritizes risk minimization, their methods for calculating risk often rely on detailed analysis of categorical data and relationships. Most existing algorithms, however, only handle static, numeric data. This requires transforming the data, typically through methods like one-hot encoding, into numerical formats that are bulky, sparse, and slow to process. After analysis, the data often needs to be converted back to its original categories, adding to the inefficiency. Current state-of-the-art solutions take hours to deliver insights.&lt;&#x2F;p&gt;
&lt;p&gt;If we could perform this analysis earlier in the process, using the original categorical data as it streams in without modification, we could reduce the mean time to insight to seconds, potentially saving financial institutions significant amounts of money. This approach could also enable new capabilities, such as using graph NLP on streaming data to identify novel behaviors and detect anomalies like cyber-attacks before they impact systems. The combination of fast, in-line data processing engines like Flink or ksqlDB with graph algorithms and categorical analysis is exceptionally powerful. Join us to learn about a new open-source streaming intelligence system that revolutionizes risk analysis and other fast categorical data processing.&lt;&#x2F;p&gt;
&lt;p&gt;Event Details:&lt;&#x2F;p&gt;
&lt;p&gt;Title: Streaming Graph Processing on Categorical Data Enables Real-time Risk Calculation&lt;&#x2F;p&gt;
&lt;p&gt;Date: July 12, 2024&lt;&#x2F;p&gt;
&lt;p&gt;Time: 10:45am - 11:20am ET&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.dataconnectconf.com&#x2F;dataconnect&#x2F;register&quot;&gt;Register for the Event&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;&#x2F;img&#x2F;2024&#x2F;07&#x2F;2024-DCC-Speaker-Cards-wTalk-Titles-53-1.png&quot; alt=&quot;thatDot&amp;#39;s Paige Roberts speaking at Data Connect on July 12, 2024&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-you-should-attend&quot;&gt;Why You Should Attend&lt;&#x2F;h2&gt;
&lt;p&gt;Attending Paige&#x27;s talk at Data Connect 2024 is a must for anyone serious about staying at the forefront of data science and risk management. Paige will unveil groundbreaking techniques for real-time risk analysis, demonstrating how to cut mean time to insight from hours to seconds. This shift can save financial institutions substantial costs and enhance their ability to detect anomalies, including cyber-attacks, before they cause harm. Paige will explore the synergy of in-line data processing engines like Flink or ksqlDB with advanced graph algorithms and categorical analysis. By attending, you&#x27;ll gain invaluable insights into innovative data processing methods that can revolutionize your organization&#x27;s approach to risk and data management. Don&#x27;t miss this opportunity to learn from a leading expert and enhance your strategic capabilities in the evolving data landscape.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Akka to Pekko Migration for thatDot and Quine</title>
        <published>2024-06-20T00:00:00+00:00</published>
        <updated>2024-06-20T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/news/akka-to-pekko-migration-for-thatdot-and-quine/"/>
        <id>https://www.thatdot.com/news/akka-to-pekko-migration-for-thatdot-and-quine/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/news/akka-to-pekko-migration-for-thatdot-and-quine/">&lt;p&gt;You don’t know what you’ve got till it’s gone. Musicians have sung this lament about relationships and the beauty of nature. It turns out to be true about open source software licenses as well.&lt;&#x2F;p&gt;
&lt;p&gt;On September 7, 2022, Lightbend announced that they were changing the license for Akka from the open source Apache License 2.0 to the commercial Business Source License 1.1. This had major implications for Akka users. Operators of closed source services built using Akka were faced with a primarily financial dilemma about the cost of licensing compared to the cost of re-implementation.&lt;&#x2F;p&gt;
&lt;p&gt;Authors of open source software depending on Akka had to re-evaluate their ability to remain open source themselves. At &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;thatdot.com&#x2F;&quot;&gt;thatDot&lt;&#x2F;a&gt;, we found ourselves facing both of these challenges.&lt;&#x2F;p&gt;
&lt;p&gt;thatDot publishes a streaming graph, &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;&quot;&gt;Quine&lt;&#x2F;a&gt;, under an open source license. We also host SaaS services like Novelty for AWS that are closed source products built on top of Quine. To continue using new versions of Akka, we would have to re-evaluate Quine’s licensing model and incur the cost of purchasing licenses from Lightbend for our SaaS services.&lt;&#x2F;p&gt;
&lt;p&gt;Our immediate solution, like that adopted by many others, was to simply continue using the last version of Akka available under an open source license, version 2.6. This was a time-limited workaround though, since version 2.6 would eventually stop receiving security fixes.&lt;&#x2F;p&gt;
&lt;p&gt;It also prevented us from using libraries that themselves required later versions of Akka for their own security fixes, or additional functionality. We needed an open source alternative.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-we-did&quot;&gt;&lt;strong&gt;What we did&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Pekko is an open source fork of Akka hosted by the Apache Software Foundation. It provided us with a path forward that kept a core component connected to an active community without requiring extensive rewriting of our own code. It also gained support from important connector libraries built on top of Akka, which released Pekko-backed versions.&lt;&#x2F;p&gt;
&lt;p&gt;Our migration required two main activities. The first was the modification of our own code that used Akka directly. The second was the replacement of all dependencies with the Pekko versions. The latter proved to be the more difficult one.&lt;&#x2F;p&gt;
&lt;p&gt;Modifying our direct dependency on Akka was refreshingly straightforward. We had to replace all imports of akka packages with imports of org.apache.pekko packages, and the akka section of our config files with a pekko section. The bulk of this was accomplished with search and replace using regular expressions.&lt;&#x2F;p&gt;
&lt;p&gt;The remaining pieces were found using simple (case-insensitive) searches for “akka”, and manually reviewing and editing the code or comments. For example, comments describing use of an Akka feature were modified, while those referring to discussions in Akka community forums to justify a decision were left unchanged.&lt;&#x2F;p&gt;
&lt;p&gt;While this was slightly tedious, it wasn’t hard to work through.&lt;&#x2F;p&gt;
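&lt;p&gt;As a minimal sketch of that kind of regex-driven replacement, the shell function below rewrites Akka imports and top-level &lt;code&gt;akka&lt;&#x2F;code&gt; config keys in text read from standard input. The function name and the exact patterns are illustrative assumptions rather than the commands we actually ran, and they intentionally leave comments and other mentions of “akka” for manual review.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;# Hypothetical helper (an assumption, not taken from the actual migration):&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;# reads text on stdin and writes the rewritten text to stdout.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;rewrite_akka_to_pekko() {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  # 1st expression: an import of an akka package becomes an org.apache.pekko import.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  # 2nd expression: a config key at the start of a line (akka { or akka.) becomes pekko.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  sed -E \&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    -e &amp;#39;s&#x2F;^([[:space:]]*import[[:space:]]+)akka\.&#x2F;\1org.apache.pekko.&#x2F;&amp;#39; \&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    -e &amp;#39;s&#x2F;^akka([[:space:]]*\{|\.)&#x2F;pekko\1&#x2F;&amp;#39;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Piping a source file through &lt;code&gt;rewrite_akka_to_pekko&lt;&#x2F;code&gt; and diffing the result against the original previews the change before anything is overwritten.&lt;&#x2F;p&gt;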
&lt;h2 id=&quot;an-unexpected-challenge&quot;&gt;&lt;strong&gt;An unexpected challenge&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;The real challenge was replacing libraries to remove all indirect dependencies on Akka. Replacing dependencies also required us to unwind the delicate set of indirect dependencies we had pinned to work around vulnerabilities.&lt;&#x2F;p&gt;
&lt;p&gt;Migrating dependencies from Akka to Pekko can be done in 3 ways:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Swapping in a drop-in replacement&lt;&#x2F;li&gt;
&lt;li&gt;Forking the library and replacing its usage of Akka with Pekko&lt;&#x2F;li&gt;
&lt;li&gt;Re-implementing the feature, possibly on a similar library with a Pekko version&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;In most cases, the community had Pekko equivalents that just worked after changing our build definition and import statements. In others, a Pekko version was not available, so we needed to use an alternative. These required us to make non-trivial changes to our code to re-implement the functionality.&lt;&#x2F;p&gt;
&lt;p&gt;The community adoption of Pekko made our migration feasible. We only had to drop two libraries that didn’t have Pekko versions, and only lost one feature, Pulsar support. The Pulsar library we were using, pulsar4s, has since added a Pekko version.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;benefits&quot;&gt;&lt;strong&gt;Benefits&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Migrating to Pekko:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Allowed us to continue offering Quine with the same license.&lt;&#x2F;li&gt;
&lt;li&gt;Reduced the maintenance burden of overriding and testing indirect dependencies to avoid security problems.&lt;&#x2F;li&gt;
&lt;li&gt;Avoided extra costs for running our SaaS products.&lt;&#x2F;li&gt;
&lt;li&gt;Opened up our ability to continue leveraging new libraries and releases from the community.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Microservice Hell: The State of the Art in Streaming Services</title>
        <published>2024-06-19T00:00:00+00:00</published>
        <updated>2024-06-19T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/microservice-hell-the-state-of-the-art-in-streaming-services/"/>
        <id>https://www.thatdot.com/blog/microservice-hell-the-state-of-the-art-in-streaming-services/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/microservice-hell-the-state-of-the-art-in-streaming-services/">&lt;h2 id=&quot;the-state-of-the-art&quot;&gt;The State of the Art&lt;&#x2F;h2&gt;
&lt;p&gt;Data lives in many different places. Some of it could live in Apache Kafka, for example, while other important related data could come from server-sent events on a server somewhere. Some of your data may even live in a text file that you need to stream in.&lt;&#x2F;p&gt;
&lt;p&gt;Let’s quickly emulate the state of the art and see what it’s like to retrieve some data from Kafka.&lt;&#x2F;p&gt;
&lt;p&gt;Here is a docker-compose.yaml file that we can use to stand up Kafka:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;version: &amp;#39;2&amp;#39;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;services:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  zookeeper:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    image: confluentinc&#x2F;cp-zookeeper:latest&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    environment:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      ZOOKEEPER_CLIENT_PORT: 2181&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      ZOOKEEPER_TICK_TIME: 2000&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    ports:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      - &amp;quot;2181:2181&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  kafka:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    image: confluentinc&#x2F;cp-kafka:latest&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    ports:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      - &amp;quot;9092:9092&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    environment:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      KAFKA_BROKER_ID: 1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT:&#x2F;&#x2F;localhost:9092&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    depends_on:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      - zookeeper&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;After running &lt;code&gt;docker-compose up -d&lt;&#x2F;code&gt; to stand up Kafka, we can test and interact with it using a few bash commands. Let’s publish some data to our Kafka instance.&lt;&#x2F;p&gt;
&lt;p&gt;First, let’s create a topic:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;docker exec -it kafka-kafka-1 kafka-topics \&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;--create \&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;--bootstrap-server localhost:9092 \&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;--replication-factor 1 \&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;--partitions 1 \&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;--topic my-family&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This command creates the &lt;code&gt;my-family&lt;&#x2F;code&gt; topic. Let’s fill it in with some members of my family as an example.&lt;&#x2F;p&gt;
&lt;p&gt;We can use the following bash command to begin publishing to the topic:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;docker exec -it kafka-kafka-1 kafka-console-producer \&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;--topic my-family \&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;--bootstrap-server localhost:9092&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;And we can use this command to subscribe to the topic, and watch as data streams into Kafka:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;docker exec -it kafka-kafka-1 kafka-console-consumer \&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;--topic my-family \&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;--from-beginning \&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;--bootstrap-server localhost:9092&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Here’s what it looks like after inputting 7 different members of my family:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;660f173dbea3b027ec5f11d2_dPB4UVrK9Upf0-fpXLjrECU_rLvyV_xHrbfpPIalRS2O6_AkVMJCNnMN-didhGwmd9be2r_jxznqf5Vq0DRvkzTKo72Lfio6MYXcHADahfFMib8FNMSH9mRX5jwuV3JtPZfemDVmXMP4xnAae0Cw24Q.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;In the above example, I submitted 7 strings of data, and I can see each being emitted from Kafka in my subscriber terminal.&lt;&#x2F;p&gt;
&lt;p&gt;In order to work with this data though, we should work at a higher level of abstraction than the command line interface. We can’t transform the data very easily here. To build a data pipeline, we’ll need to harness a programming language so we can work this logic in.&lt;&#x2F;p&gt;
&lt;p&gt;Let’s create a Scala service that prints the strings coming in from our &lt;code&gt;my-family&lt;&#x2F;code&gt; topic, similar to what the &lt;code&gt;kafka-console-consumer&lt;&#x2F;code&gt; was doing.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;660f17644a23394a0761d767_JYh62ew5W7d5TK2bnC-BgI-m4D0BgOd5XqV1asG2uiR5TQVU_gcfiTxMyr5_Qr-dgl3iNYZvPGC6E8z9PFRj7xnTUTOyi1H04Blqs49ZAZpeb7pkLyqFlfuhvgRYpJxf9XsgYnyHcgLg2sJSbaOG8cs.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;import scala.concurrent.ExecutionContext&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;import org.apache.pekko.actor.ActorSystem&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;import org.apache.pekko.kafka.scaladsl.Consumer&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;import org.apache.pekko.kafka.{ConsumerSettings, Subscriptions}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;import org.apache.pekko.stream.scaladsl.Sink&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;import org.apache.kafka.clients.consumer.ConsumerConfig&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;import org.apache.kafka.common.serialization.StringDeserializer&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;import org.apache.kafka.clients.consumer.ConsumerRecord&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;import org.apache.pekko.Done&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;import org.apache.pekko.stream.scaladsl.Source&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;import scala.concurrent.Future&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;object KafkaTestMain extends App {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  implicit val actorSystem: ActorSystem = ActorSystem()&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  implicit val ec: ExecutionContext = actorSystem.dispatcher&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  val bootstrapServers = &amp;quot;localhost:9092&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  val topic = &amp;quot;my-family&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  val consumerSettings: ConsumerSettings[String, String] =&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    ConsumerSettings(actorSystem, new StringDeserializer, new StringDeserializer)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      .withBootstrapServers(bootstrapServers)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      .withGroupId(&amp;quot;group1&amp;quot;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      .withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, &amp;quot;earliest&amp;quot;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  val source: Source[ConsumerRecord[String, String], Consumer.Control] =&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    Consumer.plainSource(consumerSettings, Subscriptions.topics(topic))&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  val done: Future[Done] = source&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    .map(record =&amp;gt; record.value())&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    .runWith(Sink.foreach(println))&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  done.onComplete { case _ =&amp;gt;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    actorSystem.terminate()&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This small Scala application subscribes to our Kafka &lt;code&gt;my-family&lt;&#x2F;code&gt; topic and just prints the strings emitted.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;660f178f5b4cb5e7a5c36925_jk2KrCX2ppGipMmkVwu85VzMUrQ1OV0wc72-K43IlA0YKNaJSNlBlH1IOXdBySj6oKsBSeLCuKIdb84rSEyDLEA6-jg_QejyxXIYL5EwWfgp2xivYNmlkY6Zu5TW3JZFmNngvdPUV7OESnsZeR4Nv2o.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;This small example shows that we can create bespoke software that subscribes to data streams and works with the emitted data. But as I mentioned at the beginning of this post, our data can live in many different locations. If my use case demanded it, I would need to write more logic to stream in data from &lt;em&gt;&lt;strong&gt;many&lt;&#x2F;strong&gt;&lt;&#x2F;em&gt; different sources. The above example is only the initial step; many more steps are required to make data from even this single source ready for production.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;scalability-resilience-maintainability&quot;&gt;Scalability, Resilience, Maintainability&lt;&#x2F;h2&gt;
&lt;p&gt;Because we are streaming data in, and not batch processing our data, our service must be up 24&#x2F;7. This requirement means the following:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;The microservice must be able to scale with the amount of data being ingested, tackling challenges like service discovery, network latency, and load balancing.&lt;&#x2F;li&gt;
&lt;li&gt;It must be resilient, capable of handling network outages, able to self-heal if a fault occurs, and able to restore from timely backups.&lt;&#x2F;li&gt;
&lt;li&gt;It must be able to rein in complexity, and allow developers to add new features based on stakeholder needs, with a careful eye on dependency management and rich documentation.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Applying all of this to our example of streaming data in from a single Kafka topic would require following best practices when implementing event-driven architecture patterns. Think techniques like &lt;em&gt;event-sourcing&lt;&#x2F;em&gt;, the &lt;em&gt;saga&lt;&#x2F;em&gt; pattern, and &lt;em&gt;CQRS&lt;&#x2F;em&gt; (Command Query Responsibility Segregation).&lt;&#x2F;p&gt;
&lt;p&gt;Applying proven patterns to our service will result in a scalable, resilient, and maintainable piece of software, but repeating this work for every additional data source is tedious and error-prone.&lt;&#x2F;p&gt;
&lt;p&gt;What we need is a new state of the art, allowing us to ingest data from event sources in a scalable, resilient, and easily maintainable way.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;enter-thatdot-streaming-graph-the-newstate-of-the-art&quot;&gt;Enter thatDot Streaming Graph, the New State of the Art&lt;&#x2F;h2&gt;
&lt;p&gt;Let’s do the exact same thing as above, but using thatDot Streaming Graph, the world’s first streaming graph data processor. We’ll point Streaming Graph at our Kafka, ingest data from our &lt;code&gt;my-family&lt;&#x2F;code&gt; topic, and view our results. This time though, we won’t need to write any bespoke programs to handle how to ingest data.&lt;&#x2F;p&gt;
&lt;p&gt;First, grab your own copy of Quine (the open source version of thatDot Streaming Graph) by clicking &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;download&#x2F;&quot;&gt;here&lt;&#x2F;a&gt; and downloading the JAR file. As of this blog post, I’m downloading v1.6.2 of Quine.&lt;&#x2F;p&gt;
&lt;p&gt;To start it, run the following command:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;java -Dquine.store.type=in-memory -jar Downloads&#x2F;quine-1.6.2.jar&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This will start Quine with an in-memory persistor, meaning it will not save any data on disk. This is great for testing, since it doesn’t leave anything on disk that has to be cleaned up later. Navigate to &lt;strong&gt;&lt;code&gt;http:&#x2F;&#x2F;127.0.0.1:8080&#x2F;&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; and you’ll see the Quine Exploration UI.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;660f17c549640cbc605031d2_HNmkCD0BAoTvYjziEkJ0VpivBlMVOyJ5ZtaZJBL385blgqQ7jWjFHAxju8r-3RIQvvIP3DkV7csI9fUb2ctmFg5MrrpInsSHz8FnnZUqZw7cmEWEnDBdT4YZt3-14VCyZJlfrzRieas7o6UnJqwRsPk.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;The docs are interactive, so you can enter the code directly on the page that explains how it works. Quine can ingest data from multiple different sources simultaneously. Just like in the previous example, let’s create an Ingest Stream, populating our graph with data from Kafka.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;660f17dc818a9fb9f01db445_2RZY85A8Ffhvjj_yRxuPVtOurApHMjpQAQy1-qCe8zWWw4Kw9zZVBF6UWlj4Jx_EaRebtIiicMVqlcipyee_LY7yFRmUKyEHpRncr8cr6JBnmWxQ7XYMV5EWX0A1B-4tzZuMTvgqQnJ0ai5Gnc--tPU.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;We’ll use the following JSON to create an ingest stream from the &lt;code&gt;my-family&lt;&#x2F;code&gt; topic.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;{&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;type&amp;quot;: &amp;quot;KafkaIngest&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;format&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;type&amp;quot;: &amp;quot;CypherRaw&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;query&amp;quot;: &amp;quot;WITH text.utf8Decode($that) as name MATCH (n) WHERE id(n) = idFrom(name) SET n:Person, n.name=name&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  },&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;topics&amp;quot;: [&amp;quot;my-family&amp;quot;],&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;autoOffsetReset&amp;quot;: &amp;quot;earliest&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;bootstrapServers&amp;quot;: &amp;quot;localhost:9092&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;660f17f013206f60f1598788_dtTNsI9WNzRWoJCol6RCLk4DUo3KJbeeDlcv8cR7vKArc3tBKQ4sqEdaMnBZwRC0Ohb1J8IvlLIvsT6EMgrsNuhdXEWZoUYEBjOlug0V2YPcj8Ckj4mmOsW8W0vKDaTj_7w2_smtxollPUZV-I6f_tE.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;What does this JSON do? First, we instruct Quine to create an Ingest Stream of type &lt;code&gt;KafkaIngest&lt;&#x2F;code&gt;. It subscribes to the &lt;code&gt;my-family&lt;&#x2F;code&gt; topic and uses the &lt;code&gt;autoOffsetReset&lt;&#x2F;code&gt; option of &lt;code&gt;earliest&lt;&#x2F;code&gt; to begin reading data from the beginning of the topic. A Cypher query then receives the raw bytes coming in from Kafka, decodes them as UTF-8 strings, and creates a Person node on the graph for each name in the incoming data stream.&lt;&#x2F;p&gt;
&lt;p&gt;Pasting in that JSON, giving the Ingest Stream a name (in this case &lt;code&gt;family-ingest-stream&lt;&#x2F;code&gt;), and then clicking &lt;strong&gt;Send API Request&lt;&#x2F;strong&gt;, should result in a 200 return code, meaning we successfully created an ingest stream.&lt;&#x2F;p&gt;
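&lt;p&gt;The same request can be scripted instead of sent from the interactive docs. What follows is a minimal Python sketch, not an official client. It assumes the ingest endpoint is &lt;code&gt;POST &#x2F;api&#x2F;v1&#x2F;ingest&#x2F;{name}&lt;&#x2F;code&gt; (confirm the path in the interactive docs of your Quine version), and the helper names &lt;code&gt;build_ingest_request&lt;&#x2F;code&gt; and &lt;code&gt;create_ingest&lt;&#x2F;code&gt; are made up for illustration.&lt;&#x2F;p&gt;

```python
import json
import urllib.request

# Ingest definition copied from the JSON shown above.
INGEST_DEFINITION = {
    "type": "KafkaIngest",
    "format": {
        "type": "CypherRaw",
        "query": (
            "WITH text.utf8Decode($that) as name "
            "MATCH (n) WHERE id(n) = idFrom(name) "
            "SET n:Person, n.name=name"
        ),
    },
    "topics": ["my-family"],
    "autoOffsetReset": "earliest",
    "bootstrapServers": "localhost:9092",
}


def build_ingest_request(base_url, name, definition):
    """Build (but do not send) the request that registers a named ingest stream."""
    # Assumed path: POST /api/v1/ingest/{name}
    url = f"{base_url.rstrip('/')}/api/v1/ingest/{name}"
    body = json.dumps(definition).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_ingest(base_url, name, definition):
    """Send the request and return the HTTP status code."""
    request = build_ingest_request(base_url, name, definition)
    with urllib.request.urlopen(request) as response:
        return response.status
```

&lt;p&gt;With Quine running locally, calling &lt;code&gt;create_ingest(&quot;http:&#x2F;&#x2F;127.0.0.1:8080&quot;, &quot;family-ingest-stream&quot;, INGEST_DEFINITION)&lt;&#x2F;code&gt; should return the same success status as the interactive docs, provided the assumed path matches your version.&lt;&#x2F;p&gt;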
&lt;p&gt;If we list the Ingest Streams, we can see our named ingest, along with the &lt;code&gt;ingestedCount&lt;&#x2F;code&gt; of 7 records.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;660f1805014b2b49eb05bb3b_p-ObjD0UC1mCf_60sqE_NbqgwZOOCCH43u246Cpl-brdsOhLgqJGnA8ydicmu_pcAe1tmkPw0nD-1cpV6NR4Sa1XZ--3aSOEcXq3er973Oz3P6akOpTxQuD4-YZRsqqd13Obwm_xbZ9D8164EsQ9n5Q.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Returning to the Exploration UI, enter the basic &lt;code&gt;MATCH (n) RETURN n&lt;&#x2F;code&gt; Cypher query to see the results of our ingest.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;660f1824e964743d7d71afe3_5pvT5Pz5-aA1QyLyq-wGfmtZhMbB6kdbCoEYx83r0XnaT1TXz_jPQbQQ1o1UmqZn-WhRiOQTTk_2dcHoDWJ0mtuXjan4NdPDZqEI34-2_jSun-nDdT3kY4AGFYhaZWHViO_YC0EL5V6FN_Dv9L8cWfo.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;We didn’t have to write up a custom microservice to load in data from this source. We just pointed Quine at our source, gave it a few parameters, and told it to start the ingest, transforming our data using Cypher.&lt;&#x2F;p&gt;
&lt;p&gt;If we wanted to consume data from a file, we would similarly instruct Quine to ingest data from that file and pass in a Cypher query to transform the data.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;enhancing-the-robustness-of-the-streaming-graph-in-production-environments&quot;&gt;Enhancing the Robustness of the Streaming Graph in Production Environments&lt;&#x2F;h2&gt;
&lt;p&gt;Just as with our custom microservice, a production deployment must focus on scalability, resilience, and maintainability.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;optimizing-for-scalability&quot;&gt;Optimizing for Scalability&lt;&#x2F;h3&gt;
&lt;p&gt;thatDot Streaming Graph, the commercial version of Quine, is engineered for horizontal expansion.&lt;&#x2F;p&gt;
&lt;p&gt;Another pivotal feature is its implementation of backpressure, which dynamically regulates data processing speeds in alignment with the consuming service&#x27;s capacity. This ensures Streaming Graph avoids overwhelming downstream services, no matter how high the system scales. We don&#x27;t need to do anything to make that happen.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;ensuring-resilience&quot;&gt;Ensuring Resilience&lt;&#x2F;h3&gt;
&lt;p&gt;Streaming Graph&#x27;s resilience is significantly bolstered by the backpressure mechanisms, safeguarding against data loss during unexpected downtimes by adjusting the data flow based on the consumer&#x27;s current state. If the consumer can’t consume for a while, Streaming Graph pauses until it is ready.&lt;&#x2F;p&gt;
&lt;p&gt;Streaming Graph is also designed to be self-healing. In the event of a cluster member failure, the system automatically delegates a hot-spare to jump in and assume the role of the downed member, ensuring uninterrupted data processing. This resilience is further enhanced by leveraging durable data storage solutions such as Cassandra or ClickHouse.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;maximizing-maintainability&quot;&gt;Maximizing Maintainability&lt;&#x2F;h3&gt;
&lt;p&gt;thatDot Streaming Graph can ingest data from multiple sources simultaneously, removing the need to create multiple bespoke services. You can ingest data from:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Apache Kafka&lt;&#x2F;li&gt;
&lt;li&gt;AWS Kinesis&lt;&#x2F;li&gt;
&lt;li&gt;S3&lt;&#x2F;li&gt;
&lt;li&gt;Server-Sent Events (SSE)&lt;&#x2F;li&gt;
&lt;li&gt;WebSockets&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;components&#x2F;ingest-sources&#x2F;&quot;&gt;And more&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;This also provides extra functionality, since you can resolve duplicates and find relationships between multiple data streams in real time.&lt;&#x2F;p&gt;
&lt;p&gt;Streaming Graph and Quine use the Cypher query language to transform all data sources, allowing a consistent experience when ingesting data sourced from different locations.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;elevating-your-data-architecture-discover-the-power-of-thatdot-streaming-graph&quot;&gt;Elevating Your Data Architecture: Discover the Power of thatDot Streaming Graph&lt;&#x2F;h2&gt;
&lt;p&gt;In software development, when complexity starts getting too high, refactoring to use a higher-level abstraction can be a winning strategy. The current state of the art with microservices and data pipelines is incredibly complex, demanding a higher level of abstraction. thatDot Streaming Graph &lt;em&gt;&lt;strong&gt;is&lt;&#x2F;strong&gt;&lt;&#x2F;em&gt; that higher level of abstraction. It can handle the problems of scalability and resilience automatically, so developers can focus on the logic that matters, transforming high-volume data into high-value insight.&lt;&#x2F;p&gt;
&lt;p&gt;Click &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;download&#x2F;&quot;&gt;here&lt;&#x2F;a&gt; to download your own open source copy of Quine, and click &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;that.re&#x2F;chat&quot;&gt;here&lt;&#x2F;a&gt; to jump into our Discord to join other like-minded developers looking to solve the challenges of high-volume data pipelines at scale.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Novelty Demo</title>
        <published>2024-06-19T00:00:00+00:00</published>
        <updated>2024-06-19T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/how-to/novelty-detector-demo/"/>
        <id>https://www.thatdot.com/how-to/novelty-detector-demo/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/how-to/novelty-detector-demo/">&lt;h2 id=&quot;novelty-tutorial&quot;&gt;Novelty Tutorial&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;div class=&quot;video-embed&quot;&gt;
  &lt;iframe src=&quot;https:&#x2F;&#x2F;www.youtube-nocookie.com&#x2F;embed&#x2F;JuvAjtTmLa8&quot; title=&quot;YouTube video&quot;
    frameborder=&quot;0&quot; loading=&quot;lazy&quot;
    allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot;
    allowfullscreen&gt;&lt;&#x2F;iframe&gt;
&lt;&#x2F;div&gt;

This &lt;strong&gt;12 min&lt;&#x2F;strong&gt; video demonstration walks through a Jupyter notebook powered scenario illustrating how to use thatDot Novelty to analyze CDN logs for anomalous activity.&lt;&#x2F;p&gt;
&lt;p&gt;Click &lt;strong&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;that.re&#x2F;insider-threat&quot;&gt;here&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; to download the CDN dataset for this example.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;drive.google.com&#x2F;file&#x2F;d&#x2F;1dLeoBAGPgxxTu-K40C6j5UtVpoaPl1QH&#x2F;view?usp=sharing&quot;&gt;Download the Jupyter notebook&lt;&#x2F;a&gt; and try the demo yourself with an &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.thatdot.com&#x2F;novelty&#x2F;using-novelty&#x2F;aws-quick-start&#x2F;aws-quickstart.html&quot;&gt;AWS instance&lt;&#x2F;a&gt; of thatDot Novelty.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;demo-summary&quot;&gt;Demo Summary&lt;&#x2F;h2&gt;
&lt;h3 id=&quot;novelty-score-endpoints&quot;&gt;Novelty Score Endpoints&lt;&#x2F;h3&gt;
&lt;p&gt;The demo interacts with thatDot Novelty through its interactive REST API. You can &lt;em&gt;stream&lt;&#x2F;em&gt; &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.thatdot.com&#x2F;novelty&#x2F;getting-started&#x2F;novelty-main-concepts.html#terminology&quot;&gt;observations&lt;&#x2F;a&gt; into thatDot Novelty using one of two API endpoints:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Single observation: &lt;code&gt;POST &#x2F;api&#x2F;v1&#x2F;novelty&#x2F;{context}&#x2F;observe&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Bulk observations: &lt;code&gt;POST &#x2F;api&#x2F;v1&#x2F;novelty&#x2F;{context}&#x2F;observe&#x2F;bulk&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;After streaming in a batch of observations, you can rescore observations given the context of the entirety of the dataset using Novelty&#x27;s &lt;em&gt;read-only&lt;&#x2F;em&gt; scoring endpoints:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Single observation: &lt;code&gt;POST &#x2F;api&#x2F;v1&#x2F;novelty&#x2F;{context}&#x2F;read&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Bulk observation: &lt;code&gt;POST &#x2F;api&#x2F;v1&#x2F;novelty&#x2F;{context}&#x2F;read&#x2F;bulk&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
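&lt;p&gt;The four endpoints above differ only in their final path segments, which a small helper can make explicit. The following Python sketch builds the requests without sending them. The function names are hypothetical, and the payload shape (a list of strings for one observation, a list of such lists for bulk) is an assumption based on the result description in this post, so check it against the Novelty API docs.&lt;&#x2F;p&gt;

```python
import json
import urllib.parse
import urllib.request


def novelty_path(context, bulk=False, read_only=False):
    """Return the scoring endpoint path for a context.

    read_only=False selects the "observe" endpoints, which stream data in;
    read_only=True selects the "read" endpoints, which only score it.
    """
    action = "read" if read_only else "observe"
    suffix = "/bulk" if bulk else ""
    # Percent-encode the context name so it stays a single path segment.
    safe_context = urllib.parse.quote(context, safe="")
    return f"/api/v1/novelty/{safe_context}/{action}{suffix}"


def build_novelty_request(base_url, context, payload, bulk=False, read_only=False):
    """Build (but do not send) a POST request for one of the four endpoints."""
    url = base_url.rstrip("/") + novelty_path(context, bulk, read_only)
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

&lt;p&gt;Passing the returned request to &lt;code&gt;urllib.request.urlopen&lt;&#x2F;code&gt; would send it to whichever Novelty instance &lt;code&gt;base_url&lt;&#x2F;code&gt; points at.&lt;&#x2F;p&gt;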
&lt;h3 id=&quot;novelty-score-results&quot;&gt;Novelty Score Results&lt;&#x2F;h3&gt;
&lt;p&gt;thatDot Novelty&#x27;s Score Results response returns the observation score, along with additional useful information. Here is some of that data:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;observation&lt;&#x2F;strong&gt;: The observation that was streamed in to generate the result. A list of string observation components&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;score&lt;&#x2F;strong&gt;: A score between 0 and 1 representing the most novel component of this observation, where 1 is highly novel and 0 is not novel at all. The &lt;code&gt;mostNovelComponent&lt;&#x2F;code&gt; field contains more detail about which component led to this result&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;mostNovelComponent&lt;&#x2F;strong&gt;: which component of the observation was the most novel&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;sequence&lt;&#x2F;strong&gt;: sequence number assigned to uniquely identify this observation as made within this context.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;uniqueness&lt;&#x2F;strong&gt;: scaled measure of uniqueness for the observation as a whole; ranges between 0 (no uniqueness) and 1 (totally unique)&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
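&lt;p&gt;Because &lt;code&gt;score&lt;&#x2F;code&gt; and &lt;code&gt;uniqueness&lt;&#x2F;code&gt; measure different things, it can help to filter on them separately. The sketch below assumes each result has already been parsed into a Python dict with the field names listed above. The function names and the threshold values are hypothetical and would need tuning for a real dataset.&lt;&#x2F;p&gt;

```python
def most_novel(results, min_score=0.9):
    """Return results whose score is at least min_score, most novel first."""
    flagged = [r for r in results if r["score"] >= min_score]
    return sorted(flagged, key=lambda r: r["score"], reverse=True)


def unique_but_not_novel(results, min_uniqueness=0.9, max_score=0.5):
    """Return results that are highly unique yet scored as ordinary."""
    return [
        r
        for r in results
        if r["uniqueness"] >= min_uniqueness and max_score >= r["score"]
    ]
```

&lt;p&gt;The second helper picks out observations that have rarely or never been seen but still score low.&lt;&#x2F;p&gt;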
&lt;h3 id=&quot;important-points&quot;&gt;Important Points&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;Unique does not mean novel. Sometimes, completely unique and unseen observations can be normal, as described in the Demo when showing the normalcy of having completely unique IP addresses in a certain scenario&lt;&#x2F;li&gt;
&lt;li&gt;thatDot Novelty does not require training, but does take a bit of time depending on the use case to adapt to the data&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Release Announcement for thatDot Streaming Graph 1.6.1 with ClickHouse Persistor</title>
        <published>2024-06-19T00:00:00+00:00</published>
        <updated>2024-06-19T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/news/release-announcement-for-thatdot-streaming-graph-with-clickhouse-persister/"/>
        <id>https://www.thatdot.com/news/release-announcement-for-thatdot-streaming-graph-with-clickhouse-persister/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/news/release-announcement-for-thatdot-streaming-graph-with-clickhouse-persister/">&lt;p&gt;A new version of thatDot Streaming Graph has just been released with the brand new ability to persist data directly in ClickHouse! The new ability to have multiple namespaces is another huge advantage in multi-tenant situations.&lt;&#x2F;p&gt;
&lt;p&gt;With new v1.6.1 enhancements, you can:&lt;&#x2F;p&gt;
&lt;p&gt;▪ Persist data in ClickHouse.&lt;br &#x2F;&gt;
▪ No longer see hot spares in ingest monitor payloads.&lt;br &#x2F;&gt;
▪ Manage different teams or customers with namespace management, with a single thatDot Streaming Graph instance that has independent graph interpreters.&lt;br &#x2F;&gt;
▪ Integrate with Kafka more robustly with added support for arbitrary Kafka properties on WriteToKafka Standing Query output via kafkaProperties field.&lt;br &#x2F;&gt;
▪ Use multiple value standing queries more easily with the ability to trigger an event on arbitrarily-keyed property changes by using a RETURN properties(n) rather than having to list all properties manually.&lt;br &#x2F;&gt;
▪ Get simpler, more standard names for the NumberIteratorIngest fields startAtOffset and maximumPerSecond. If you were previously using the names startAt or throttlePerSecond, you will need to update your recipes and API calls.&lt;br &#x2F;&gt;
▪ Improve favicon support on all platforms.&lt;br &#x2F;&gt;
▪ Execute simple text queries in the single-line query bar on the Explore UI with SHIFT-ENTER.&lt;&#x2F;p&gt;
&lt;p&gt;Fixes:&lt;br &#x2F;&gt;
Some error messages normally encountered during Cypher query compilation were lost when the previous version was migrated to Scala 2.13. Those error messages have been reintroduced.&lt;&#x2F;p&gt;
&lt;p&gt;Be sure to check out thatDot Streaming Graph for more information. If you already use it, update your local copy to v1.6.1 using your preferred method.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;download&#x2F;&quot;&gt;Download - Streaming Graph for Data Pipelines.&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Can Streaming Graphs Clean Up the Data Pipeline Mess?</title>
        <published>2024-06-18T00:00:00+00:00</published>
        <updated>2024-06-18T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/news/can-streaming-graphs-clean-up-the-data-pipeline-mess/"/>
        <id>https://www.thatdot.com/news/can-streaming-graphs-clean-up-the-data-pipeline-mess/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/news/can-streaming-graphs-clean-up-the-data-pipeline-mess/">&lt;p&gt;In this article on Datanami, Alex Woodie discusses the problems with current event stream processing data pipelines, and the advantages a graph paradigm could bring to the table, with thatDot technology spotlighted. He talks about how thatDot&#x27;s Ryan Wright found himself having to rebuild data pipeline infrastructure multiple times, and how brittle and difficult to maintain it could be.&lt;&#x2F;p&gt;
&lt;p&gt;“The more data pipelines you build, the more they start looking like the same thing,” Wright says. “And you have to start wondering: How do we solve the higher-level question so we don’t have to keep rebuilding the same pipelines over and over again?”&lt;&#x2F;p&gt;
&lt;p&gt;Learn more by reading the article &quot;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.datanami.com&#x2F;2022&#x2F;03&#x2F;28&#x2F;can-streaming-graphs-clean-up-the-data-pipeline-mess&#x2F;&quot;&gt;Can Streaming Graphs Clean Up the Data Pipeline Mess&lt;&#x2F;a&gt;?&quot; on Datanami.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Optimize Digital Twins to Real Time</title>
        <published>2024-06-18T00:00:00+00:00</published>
        <updated>2024-06-18T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/news/optimize-digital-twins-to-real-time/"/>
        <id>https://www.thatdot.com/news/optimize-digital-twins-to-real-time/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/news/optimize-digital-twins-to-real-time/">&lt;p&gt;On RTInsights, thatDot&#x27;s Rob Malnati writes about digital twins. This article provides a solid foundation as to what digital twins are, what they&#x27;re used for, and how streaming graph technology can make them more effective.&lt;&#x2F;p&gt;
&lt;p&gt;&quot;As our world becomes increasingly connected, digital twins abstract and model almost everything to improve business operations, reduce risk, and enhance decision-making for better outcomes.&quot;&lt;&#x2F;p&gt;
&lt;p&gt;&quot;For digital twins to be truly useful, they must be able to drive actions – for example, issue alerts or power down equipment – the instant an issue emerges, perhaps even beforehand.&quot;&lt;&#x2F;p&gt;
&lt;p&gt;Read more on the original article &quot;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.rtinsights.com&#x2F;optimizing-digital-twins-to-real-time&#x2F;&quot;&gt;Optimizing Digital Twins to Real Time&lt;&#x2F;a&gt;&quot; on RTInsights.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Stop Querying Your Data</title>
        <published>2024-06-18T00:00:00+00:00</published>
        <updated>2024-06-18T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/news/stop-querying-your-data/"/>
        <id>https://www.thatdot.com/news/stop-querying-your-data/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/news/stop-querying-your-data/">&lt;p&gt;At the 2023 Knowledge Graph Conference in New York, Ryan Wright, CEO and Founder of thatDot, gave a presentation entitled: Streaming Graphs: Because We Cannot Afford to Query Anymore. Quine streaming graph can process millions of complex, multi-hop graph events per second. But what design decisions and tradeoffs went into making this possible? And why does it matter to data engineers and their day-to-day? Learn how Quine integrates with your event streaming pipeline (including Apache Kafka, Kinesis, RedPanda, and Apache Pulsar) and uses standing queries to generate results the instant a pattern in the data stream emerges.&lt;&#x2F;p&gt;
&lt;p&gt;How does this mean you don&#x27;t have to keep querying your data?&lt;&#x2F;p&gt;
&lt;p&gt;Check out the &quot;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=bivjz9LeUgg&quot;&gt;Streaming Graphs: Because We Cannot Afford to Query Anymore&lt;&#x2F;a&gt;&quot; presentation recording to learn more.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Streaming graph analytics: ThatDot’s open-source framework Quine is gaining interest</title>
        <published>2024-06-18T00:00:00+00:00</published>
        <updated>2024-06-18T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/news/streaming-graph-analytics-thatdots-open-source-framework-quine-is-gaining-interest/"/>
        <id>https://www.thatdot.com/news/streaming-graph-analytics-thatdots-open-source-framework-quine-is-gaining-interest/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/news/streaming-graph-analytics-thatdots-open-source-framework-quine-is-gaining-interest/">&lt;p&gt;What is streaming graph analytics, and what does it do? In this article on VentureBeat, George Anadiotis discusses the power of Quine, the increasing interest in the concept of streaming graph, and the influx of thatDot funding from &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;use-cases&#x2F;&quot;&gt;cybersecurity&lt;&#x2F;a&gt; leader CrowdStrike.&lt;&#x2F;p&gt;
&lt;p&gt;&quot;What do you get when you combine two of the most up-and-coming paradigms in data processing — streaming and graphs? Likely a potential game-changer, at least that’s what is being hinted at by the likes of &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;venturebeat.com&#x2F;2017&#x2F;04&#x2F;06&#x2F;ai-weekly-google-tpu-darpa-bixby-and-more&#x2F;&quot;&gt;DARPA&lt;&#x2F;a&gt; and now CrowdStrike’s &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.crowdstrike.com&#x2F;falcon-fund&#x2F;&quot;&gt;Falcon Fund,&lt;&#x2F;a&gt; which are betting on &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;&quot;&gt;ThatDot&lt;&#x2F;a&gt; and its open-source framework &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;http:&#x2F;&#x2F;quine.io&#x2F;&quot;&gt;Quine&lt;&#x2F;a&gt;.&quot;&lt;&#x2F;p&gt;
&lt;p&gt;To learn more, read &quot;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;venturebeat.com&#x2F;data-infrastructure&#x2F;streaming-graph-analytics-thatdots-open-source-framework-quine-is-gaining-interest&#x2F;&quot;&gt;Streaming graph analytics: ThatDot’s open-source framework Quine is gaining interest&lt;&#x2F;a&gt;&quot; on VentureBeat.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Understanding Batch VS Streaming Data</title>
        <published>2024-06-18T00:00:00+00:00</published>
        <updated>2024-06-18T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/news/understanding-batch-vs-streaming-data/"/>
        <id>https://www.thatdot.com/news/understanding-batch-vs-streaming-data/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/news/understanding-batch-vs-streaming-data/">&lt;p&gt;In this article on InsideBigData, thatDot&#x27;s Rob Malnati discusses the evolution of data architectures from batch toward a more real-time representation of the world. Often, this new way of dealing with data is essential as data processing demands change.&lt;&#x2F;p&gt;
&lt;p&gt;&quot;Batch processing is, and will remain, enormously useful for many everyday tasks. However, for all its utility,  batch processing is at odds with how the world works. Whether you are talking about financial transactions, social media feeds, or clicks on news sites, data is being generated continuously. It streams past. And once it is gone, your ability to act on it at the moment is also gone.&quot;&lt;&#x2F;p&gt;
&lt;p&gt;Read more at &quot;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;insidebigdata.com&#x2F;2022&#x2F;10&#x2F;08&#x2F;understanding-batch-vs-streaming-data-processing-as-enterprises-go-real-time&#x2F;&quot;&gt;Understanding Batch vs. Streaming Data Processing As Enterprises Go Real Time&lt;&#x2F;a&gt;&quot; on InsideBigData.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Understanding the Scale Limitations of Graph Databases</title>
        <published>2024-06-18T00:00:00+00:00</published>
        <updated>2024-06-18T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/news/understanding-the-limitations-of-graph-databases/"/>
        <id>https://www.thatdot.com/news/understanding-the-limitations-of-graph-databases/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/news/understanding-the-limitations-of-graph-databases/">&lt;p&gt;In this article on eWeek, thatDot&#x27;s Rob Malnati discusses why it&#x27;s difficult or even impossible to analyze really large datasets using graph databases. The difficulty is compounded by the modern need to respond to everything in real time.&lt;&#x2F;p&gt;
&lt;p&gt;&quot;Much has changed since the emergence of the most recent generation of graph databases from a decade ago. Enterprises are dealing with previously unimaginable volumes of data to potentially query. That data enters and streams through the enterprise in a variety of channels, and enterprises want action on that information in real time.&quot;&lt;&#x2F;p&gt;
&lt;p&gt;To learn more, read &quot;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.eweek.com&#x2F;cloud&#x2F;scale-limitations-of-graph-databases&#x2F;&quot;&gt;Understanding the Scale Limitations of Graph Databases&lt;&#x2F;a&gt;&quot; on eWeek.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>What is Categorical Data?</title>
        <published>2024-06-18T00:00:00+00:00</published>
        <updated>2024-06-18T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/news/what-is-categorical-data-type/"/>
        <id>https://www.thatdot.com/news/what-is-categorical-data-type/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/news/what-is-categorical-data-type/">&lt;p&gt;On Datanami, thatDot founder and CEO Ryan Wright helps define the nature of &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;what-is-categorical-data&#x2F;&quot;&gt;categorical data&lt;&#x2F;a&gt;. This essential data type makes up about three quarters of all data an enterprise needs to analyze, yet is often not analyzed.&lt;&#x2F;p&gt;
&lt;p&gt;&quot;Why would enterprises ignore an entire class of data? Especially when it is essential to high-priority use cases like personalization, customer 360, fraud detection and prevention, network performance monitoring, and supply chain management?&quot;&lt;&#x2F;p&gt;
&lt;p&gt;To learn more, read &quot;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.datanami.com&#x2F;2022&#x2F;07&#x2F;25&#x2F;what-is-categorical-data&#x2F;&quot;&gt;What is Categorical Data?&lt;&#x2F;a&gt;&quot; on datanami.com.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Quine Aims to Simplify Event Processing on Data in Motion</title>
        <published>2024-06-17T00:00:00+00:00</published>
        <updated>2024-06-17T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/news/quine-aims-to-simplify-event-processing-on-data-in-motion/"/>
        <id>https://www.thatdot.com/news/quine-aims-to-simplify-event-processing-on-data-in-motion/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/news/quine-aims-to-simplify-event-processing-on-data-in-motion/">&lt;p&gt;On InfoQ, Sergio De Simone talks about the advantages of the streaming graph style of data processing, and of Quine open source software in particular.&lt;&#x2F;p&gt;
&lt;p&gt;&quot;What sets Quine apart from other &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;quines-real-time-temporal-event-sequencing-produces-new-insights&#x2F;&quot;&gt;stream processing solutions&lt;&#x2F;a&gt;, says thatDot, is a set of &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;assets.website-files.com&#x2F;61f0aecf55af2560f76f6a75&#x2F;620fd58ba117ef2365c2ab07_Quine_StreamingGraph_WP1.1.pdf&quot;&gt;three design choices&lt;&#x2F;a&gt; that lie at its foundations: a graph-structured data model, an asynchronous actor-based graph computational model, and standing queries.&quot;&lt;&#x2F;p&gt;
&lt;p&gt;Check out &quot;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.infoq.com&#x2F;news&#x2F;2022&#x2F;03&#x2F;quine-event-stream-processing&#x2F;&quot;&gt;Quine Aims to Simplify Event Processing on Data in Motion&lt;&#x2F;a&gt;&quot; on InfoQ to learn more.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>ThatDot accelerates streaming data analytics with open source Quine</title>
        <published>2024-06-17T00:00:00+00:00</published>
        <updated>2024-06-17T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/news/thatdot-accelerates-streaming-data-analytics-with-open-source-quine/"/>
        <id>https://www.thatdot.com/news/thatdot-accelerates-streaming-data-analytics-with-open-source-quine/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/news/thatdot-accelerates-streaming-data-analytics-with-open-source-quine/">&lt;p&gt;On VentureBeat, Shubham Sharma writes about thatDot&#x27;s announcement of open source software Quine for streaming graph complex event processing. He discusses, among other things, the power of Quine to reduce the burden on developers of event stream processing data pipelines.&lt;&#x2F;p&gt;
&lt;p&gt;&quot;It can eliminate batch processing, multi-level joins, and other time-consuming and outdated processes that drag down and stall analysis on streaming data. This way, data pipeline engineering teams can easily interpret high-volume event data streams, innovate and ship products faster and use the emerging Graph AI tools driving the next wave in machine learning.&quot;&lt;&#x2F;p&gt;
&lt;p&gt;Read more about how &quot;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;venturebeat.com&#x2F;enterprise&#x2F;thatdot-open-sources-quine-to-accelerate-streaming-data-analysis&#x2F;&quot;&gt;ThatDot accelerates streaming data analytics with open source Quine&lt;&#x2F;a&gt;&quot; on VentureBeat.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>thatDot launches Quine, a streaming graph engine</title>
        <published>2024-06-17T00:00:00+00:00</published>
        <updated>2024-06-17T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/news/thatdot-launches-quine-a-streaming-graph-engine/"/>
        <id>https://www.thatdot.com/news/thatdot-launches-quine-a-streaming-graph-engine/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/news/thatdot-launches-quine-a-streaming-graph-engine/">&lt;p&gt;On TechCrunch.com, Frederic Lardinois talks about the launch of Quine, an open source streaming graph complex event processing engine.&lt;&#x2F;p&gt;
&lt;p&gt;“We’ve developed the streaming graph to really target the kind of the problem in the industry right now — the rock and hard place that we all sit between,” Quine’s creator and thatDot CEO and co-founder &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.linkedin.com&#x2F;in&#x2F;wrightryan&#x2F;&quot;&gt;Ryan Wright&lt;&#x2F;a&gt; told me. “On one side, there’s huge volumes of data. For the last 10 years, big data has just become de rigueur, it’s a normal ordinary thing now and only getting bigger. But the other side of that is how do you interpret all that data?”&lt;&#x2F;p&gt;
&lt;p&gt;Read more about &quot;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;techcrunch.com&#x2F;2022&#x2F;02&#x2F;23&#x2F;thatdot-launches-quine-a-streaming-graph-engine&#x2F;&quot;&gt;thatDot launches Quine, a streaming graph engine&lt;&#x2F;a&gt;&quot; on TechCrunch.com.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>thatDot Launches Streaming Graph Platform</title>
        <published>2024-06-17T00:00:00+00:00</published>
        <updated>2024-06-17T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/news/thatdot-launches-streaming-graph-platform/"/>
        <id>https://www.thatdot.com/news/thatdot-launches-streaming-graph-platform/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/news/thatdot-launches-streaming-graph-platform/">&lt;p&gt;“Enterprise data engineering teams are confined to the limitations and tradeoffs of the previous generation of event processing frameworks like Flink. They spend enormous time and effort building complicated event-driven architectures that only work on small time-windows of in-memory data and miss out on the bigger picture,” said Ryan Wright, the creator of Quine and founder&#x2F;CEO of thatDot. “Quine can transform months of tedious data engineering into an afternoon’s work enabling data pipeline engineers to easily interpret high-volume event data streams, innovate and ship products  faster, and to use the emerging Graph AI tools driving the next wave in machine learning.”&lt;&#x2F;p&gt;
&lt;p&gt;Writing for DBTA Magazine, Stephanie Simone discusses the debut of Quine, open source software for graph-based complex event stream processing. She talks about how it combines graph data with streaming data processing technologies in a developer-friendly way.&lt;&#x2F;p&gt;
&lt;p&gt;Read more about &quot;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.dbta.com&#x2F;Editorial&#x2F;News-Flashes&#x2F;thatDot-Launches-Streaming-Graph-Platform-151570.aspx&quot;&gt;thatDot Launches Streaming Graph Platform&lt;&#x2F;a&gt;&quot; on the online DBTA magazine.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Authentication Fraud</title>
        <published>2024-06-17T00:00:00+00:00</published>
        <updated>2024-06-17T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/use-cases/authentication-fraud/"/>
        <id>https://www.thatdot.com/use-cases/authentication-fraud/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/use-cases/authentication-fraud/">&lt;h2 id=&quot;the-problem&quot;&gt;The Problem&lt;&#x2F;h2&gt;
&lt;p&gt;Metered attacks that generate low-volume login attempts, from diverse IPs and across extended time frames, are designed to avoid the &quot;3 strikes in 24 hours&quot; business rules in authentication applications and the more complex analysis of log analytics &#x2F; SIEM platforms. Batch solutions, by definition, cannot react until after a compromise has occurred, while conventional real-time solutions impose time windows: any data falling outside these rolling windows, no matter how important, is simply not processed. Either way, important patterns are missed and attempts succeed before you can stop them.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-solution&quot;&gt;The Solution&lt;&#x2F;h2&gt;
&lt;p&gt;Quine changes the status quo by continuously assessing newly arriving events for their match to all known attack patterns, including the identification and tracking of partial behavior matches across any time frame, and across billions or trillions of users&#x2F;devices&#x2F;applications, until a behavior pattern is fully observed. Once an attack pattern is fully detected, events are generated immediately to trigger an investigation alert or an automated remediation workflow.&lt;&#x2F;p&gt;
&lt;p&gt;Quine&#x27;s continuous analysis of event streams means there are no time windows to manage, and thus no windows for attackers to engineer their attacks around. And Quine provides this extended time frame of analysis &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;use-quine-graph-etl-to-reduce-siem-storage-costs&#x2F;&quot;&gt;without incurring the cost of SIEM solutions&lt;&#x2F;a&gt;, sifting through data from multiple sources to find and store only the patterns that matter – in this case, the ones that indicate a low and slow attack is underway.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;key-value-take-away&quot;&gt;Key Value Take Away&lt;&#x2F;h2&gt;
&lt;ul&gt;
&lt;li&gt;Continuously track behavior patterns across billions&#x2F;trillions of devices, users, and applications&lt;&#x2F;li&gt;
&lt;li&gt;Provide analysts with a complete record of historical actions by user, device, or application&lt;&#x2F;li&gt;
&lt;li&gt;Operate on one domain&#x2F;customer, or across domains&#x2F;customers&lt;&#x2F;li&gt;
&lt;li&gt;Cost effective vs. log analysis &#x2F; SIEM data store quotas&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Financial Fraud Detection</title>
        <published>2024-06-17T00:00:00+00:00</published>
        <updated>2024-06-17T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/use-cases/financial-fraud-detection/"/>
        <id>https://www.thatdot.com/use-cases/financial-fraud-detection/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/use-cases/financial-fraud-detection/">&lt;h2 id=&quot;the-problem&quot;&gt;The Problem&lt;&#x2F;h2&gt;
&lt;p&gt;Financial fraud detection requires monitoring billions of transactions, devices and users in real-time for suspect behaviors without false positives that alienate customers when service is denied in the middle of a foreign vacation or late night business event.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-solution&quot;&gt;The Solution&lt;&#x2F;h2&gt;
&lt;p&gt;What is needed is a system that can do four things:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;detect complex patterns of behavior&lt;&#x2F;li&gt;
&lt;li&gt;combine multiple sources and scale up to millions of events&#x2F;sec&lt;&#x2F;li&gt;
&lt;li&gt;take the appropriate, user-specified action when patterns are detected&lt;&#x2F;li&gt;
&lt;li&gt;do all of this in real time&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;Quine can monitor device and user behavior over extended time periods to detect both expected exploit behaviors and novel threat actions. By including categorical data such as store names, item types or sizes, geo locations, device versions, and day of the week, Quine understands the full context of behavior, eliminating false positives. Additionally, Quine alerts provide a comprehensive view of past and current behavior for a device or user as supporting data for investigations.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;key-value-take-away&quot;&gt;Key Value Take Away&lt;&#x2F;h2&gt;
&lt;ul&gt;
&lt;li&gt;Behavior modeling for billions&#x2F;trillions of users, devices and transactions&lt;&#x2F;li&gt;
&lt;li&gt;High-confidence risk scoring by leveraging the rich behavior context provided by categorical data analysis&lt;&#x2F;li&gt;
&lt;li&gt;Human-understandable alert information to support analyst investigations&lt;&#x2F;li&gt;
&lt;li&gt;Cost effective at scale with on-premises licensing&lt;&#x2F;li&gt;
&lt;li&gt;Integrates with existing &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;quine-streaming-graph-is-a-natural-fit-for-kafka-pipelines&#x2F;&quot;&gt;Apache Kafka,&lt;&#x2F;a&gt; AWS Kinesis, data lake, and &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;real-time-graph-analytics-for-kafka-streams-with-quine&#x2F;&quot;&gt;API event&lt;&#x2F;a&gt; sources.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Graph AI</title>
        <published>2024-06-17T00:00:00+00:00</published>
        <updated>2024-06-17T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/use-cases/graph-ai/"/>
        <id>https://www.thatdot.com/use-cases/graph-ai/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/use-cases/graph-ai/">&lt;h2 id=&quot;the-problem&quot;&gt;The Problem&lt;&#x2F;h2&gt;
&lt;p&gt;Recent AI research is generating a growing number of graph AI techniques that take advantage of graph data relationships and the rich context they provide. However, production graph data pipelines lack the performance needed to deploy these new tools at scale.&lt;&#x2F;p&gt;
&lt;p&gt;Graph AI development promises significant advances in applying AI to a range of use cases, thanks to the rich data context available from a graph data model. Moving graph AI techniques from the lab to production scale, however, is a significant challenge due to the limited scaling performance of graph databases.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-solution&quot;&gt;The Solution&lt;&#x2F;h2&gt;
&lt;p&gt;Quine streaming graph provides a single platform for both (1) the development of graph AI techniques and (2) the production deployment of your algorithms on high-volume data streams. Quine even supports data ingestion and transformation of multiple data and event sources as part of the solution, allowing data scientists to define these data operations in the lab and then migrate them &quot;as is&quot; to production scale platforms run by operations.&lt;&#x2F;p&gt;
&lt;p&gt;Quine supports graph AI development in multiple ways:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Construct your AI logic as Cypher queries and apply them via REST API.&lt;&#x2F;li&gt;
&lt;li&gt;Apply externally built algorithms as User Defined Functions. &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;thatdot&#x2F;quine&#x2F;blob&#x2F;main&#x2F;quine-core&#x2F;src&#x2F;main&#x2F;scala&#x2F;com&#x2F;thatdot&#x2F;quine&#x2F;graph&#x2F;cypher&#x2F;Proc.scala#L61&quot;&gt;Example&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Create custom low-level messaging primitives and node behavior on Quine. &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;thatdot&#x2F;quine&#x2F;blob&#x2F;main&#x2F;quine-cypher&#x2F;src&#x2F;main&#x2F;scala&#x2F;com&#x2F;thatdot&#x2F;quine&#x2F;compiler&#x2F;cypher&#x2F;Procedures.scala#L380-L423&quot;&gt;Example&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Use Quine standing queries as event-based triggers to update values on other nodes. A set of related nodes updating each other can perform the computation for, and maintain the intermediate results of, algorithms on a graph.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h2 id=&quot;key-value-take-away&quot;&gt;Key Value Take Away&lt;&#x2F;h2&gt;
&lt;ul&gt;
&lt;li&gt;A single platform to define ETL operations in the lab and production&lt;&#x2F;li&gt;
&lt;li&gt;A single platform to define, test and deploy graph AI techniques&lt;&#x2F;li&gt;
&lt;li&gt;Build native graph AI techniques as primitives or using Quine&#x27;s powerful standing query capabilities&lt;&#x2F;li&gt;
&lt;li&gt;Import externally built user defined functions for use via Quine&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Log Analysis</title>
        <published>2024-06-17T00:00:00+00:00</published>
        <updated>2024-06-17T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/use-cases/log-analysis/"/>
        <id>https://www.thatdot.com/use-cases/log-analysis/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/use-cases/log-analysis/">&lt;h2 id=&quot;the-problem&quot;&gt;The Problem&lt;&#x2F;h2&gt;
&lt;p&gt;Monitoring systems composed of multiple services is typically done either by monitoring each service individually using its logs, or on an end-to-end basis that lacks visibility into the individual performance characteristics of each service. Root cause analysis is usually based on the instinct and past experience of operations personnel, making automated remediation next to impossible for many use cases.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-solution&quot;&gt;The Solution&lt;&#x2F;h2&gt;
&lt;p&gt;With thatDot&#x27;s streaming graph, logs and events from servers, operating systems, databases, applications, and clients are ingested in real time and assembled into a graph data model. The graph data model natively connects events with unlimited categorical classifications and calculated metrics to identify &quot;alerts that matter&quot; and instantly associate them with servers, VMs, containers, code versions, subnets, etc. This real-time comprehensive view of the inter-relationships between services allows rapid assessment of root causes for operations investigations or automated remediation workflows.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;key-value-take-away&quot;&gt;Key Value Take Away&lt;&#x2F;h2&gt;
&lt;ul&gt;
&lt;li&gt;Identify issues that matter, in real-time and at scale&lt;&#x2F;li&gt;
&lt;li&gt;Graph data modeling eliminates the complexity of deeply nested joins&lt;&#x2F;li&gt;
&lt;li&gt;NOC technicians can easily pivot data to understand issue impacts and root causes&lt;&#x2F;li&gt;
&lt;li&gt;Automatic handling of out-of-order data arrival&lt;&#x2F;li&gt;
&lt;li&gt;Entity resolution between log and event sources&lt;&#x2F;li&gt;
&lt;li&gt;Integrates with existing &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;quine-streaming-graph-is-a-natural-fit-for-kafka-pipelines&#x2F;&quot;&gt;Apache Kafka&lt;&#x2F;a&gt;, AWS Kinesis, data lake, and &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;real-time-graph-analytics-for-kafka-streams-with-quine&#x2F;&quot;&gt;API event sources&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Real-time Blockchain Fraud Detection</title>
        <published>2024-06-17T00:00:00+00:00</published>
        <updated>2024-06-17T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/use-cases/real-time-blockchain-fraud-detection/"/>
        <id>https://www.thatdot.com/use-cases/real-time-blockchain-fraud-detection/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/use-cases/real-time-blockchain-fraud-detection/">&lt;h2 id=&quot;the-problem&quot;&gt;The Problem&lt;&#x2F;h2&gt;
&lt;p&gt;Real-time linking of transactions, accounts, wallets, and blocks within and across blockchains is not possible with current solutions. Instead, the user must either rely on batch processing, which means results are out of date, or perform recursive lookups across table joins, which means unacceptable latency.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-solution&quot;&gt;The Solution&lt;&#x2F;h2&gt;
&lt;p&gt;Graph data structures are ideal for modeling the relationships described in blockchain events. Flows of cryptocurrency between accounts and wallets are ideal inputs for graph data modeling. Accounts, addresses, time references, devices, assets, transaction details, etc. are all examples of categorical data connected by relationships and are therefore ideal to be represented as the nodes, edges, and properties provided in a graph data model. Most importantly, relationships between entities are first-class citizens in the data model, so the costs and complexity associated with table joins are entirely eliminated.&lt;&#x2F;p&gt;
&lt;p&gt;Quine easily ingests event feeds from multiple sources and creates a single unified view of activity on the blockchain(s). When fraudulent activity is reported, or when evidence of fraud emerges, Quine&#x27;s standing queries instantly recognize the patterns and trigger an alert. Now the client is not only detecting the fraudulent behavior in real time but blocking transactions before they can complete. Of course, many fraud alerts are delivered well after a transaction occurs, and the Quine streaming graph can instantly provide a complete list of past transactions for a wallet or block to aid investigations or to block future transactions with related parties.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;629534657541e766bcd6b242_kMFQWtuGZrQWbY5t4dt2tRowf_Zb5D7so9nTg1w-1rf8KcQJODpLX9Uq88DoCcG451Ih3HhiIva8JAH8MDouAy3_Y-6hfcRwUdr7z9GAa69Dnto7QKkgFhOn51o893ChBPm-dEF_efGpj4hJwQ.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Graph is the ideal data model for blockchain relationship tracing.&lt;&#x2F;p&gt;
&lt;p&gt;In the above screenshot from Quine&#x27;s Exploration UI, you can see how Quine makes it easy to trace every account that has interacted with an account engaged in fraudulent activity.&lt;&#x2F;p&gt;
&lt;p&gt;Watch thatDot&#x27;s founder, Ryan Wright, demonstrate using Quine to tag fraudulent accounts and track transactions:
&lt;div class=&quot;video-embed&quot;&gt;
  &lt;iframe src=&quot;https:&#x2F;&#x2F;www.youtube-nocookie.com&#x2F;embed&#x2F;Z8pXVof9BfE&quot; title=&quot;YouTube video&quot;
    frameborder=&quot;0&quot; loading=&quot;lazy&quot;
    allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot;
    allowfullscreen&gt;&lt;&#x2F;iframe&gt;
&lt;&#x2F;div&gt;

‍&lt;&#x2F;p&gt;
&lt;h2 id=&quot;key-value-take-away&quot;&gt;Key Value Take Away&lt;&#x2F;h2&gt;
&lt;ul&gt;
&lt;li&gt;Sub-5 ms access to complete trace history&lt;&#x2F;li&gt;
&lt;li&gt;Adapt to new blockchains rapidly with streaming ETL built in&lt;&#x2F;li&gt;
&lt;li&gt;Real-time materialization of wallet, block and transaction state&lt;&#x2F;li&gt;
&lt;li&gt;On-premises software to deploy in your data center or cloud of choice&lt;&#x2F;li&gt;
&lt;li&gt;Integrates with existing &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;quine-streaming-graph-is-a-natural-fit-for-kafka-pipelines&#x2F;&quot;&gt;Apache Kafka,&lt;&#x2F;a&gt; AWS Kinesis, data lake, and &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;real-time-graph-analytics-for-kafka-streams-with-quine&#x2F;&quot;&gt;API event&lt;&#x2F;a&gt; sources&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Stateful Digital Twin</title>
        <published>2024-06-17T00:00:00+00:00</published>
        <updated>2024-06-17T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/use-cases/stateful-digital-twin/"/>
        <id>https://www.thatdot.com/use-cases/stateful-digital-twin/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/use-cases/stateful-digital-twin/">&lt;h2 id=&quot;the-problem&quot;&gt;The Problem&lt;&#x2F;h2&gt;
&lt;p&gt;While digital twins and the emerging subcategory of asset graphs promise operators greater visibility into the relationships between IT assets and equipment under management, current approaches are more like snapshots of a point in the past. Events take place in real time, meaning the digital twin is almost always out of date, limiting its utility. Lack of visibility translates into delayed reactions to threats or failure modes. Digital twins are out of step with enterprises increasingly moving to real time.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-solution&quot;&gt;The Solution&lt;&#x2F;h2&gt;
&lt;p&gt;Quine streaming graph generates digital twins and asset graphs that are stateful and event-driven. Stateful digital twins reflect activities in the environments they model in real time. Using Quine streaming graph, you can easily construct a digital twin that can ingest high volumes of event data in order to detect complex and subtle threats or changes in conditions the instant data arrives, triggering alerts and remedial action in real time.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;key-value-take-away&quot;&gt;Key Value Take Away&lt;&#x2F;h2&gt;
&lt;ul&gt;
&lt;li&gt;Use graph ETL to ingest cloud or system events from multiple systems.&lt;&#x2F;li&gt;
&lt;li&gt;Automatically construct a digital twin that includes system elements and the relationships between them.&lt;&#x2F;li&gt;
&lt;li&gt;Continuously update the digital twin to reflect ongoing changes in real-time.&lt;&#x2F;li&gt;
&lt;li&gt;Trigger events and alerts when specific conditions are identified in the twin.&lt;&#x2F;li&gt;
&lt;li&gt;Query for any past state of the digital twin.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Streaming Graph ETL</title>
        <published>2024-06-17T00:00:00+00:00</published>
        <updated>2024-06-17T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/use-cases/streaming-graph-etl/"/>
        <id>https://www.thatdot.com/use-cases/streaming-graph-etl/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/use-cases/streaming-graph-etl/">&lt;h2 id=&quot;the-problem&quot;&gt;The Problem&lt;&#x2F;h2&gt;
&lt;p&gt;Most ETL tools use the batch processing paradigm to find high-value patterns in large volumes of data. Whether the specific business application is fraud detection, cyber security, network observability, e-commerce or ad targeting, batch processing translates into delay. Even if you are processing data in small batches, you are missing opportunities to react to events as they happen and shape outcomes in ways beneficial to your business.&lt;&#x2F;p&gt;
&lt;p&gt;A great example is insider trading. The cost of detecting someone who is about to execute an insider trade is much less than the cost of trying to unwind that trade later when batch processing picks it up. Even if the batch process runs every five minutes, that just means you&#x27;ll find them sooner, not stop them. Ultimately, choosing batch over streaming means costly reversal of transactions rather than stopping them in real time.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-solution&quot;&gt;The Solution&lt;&#x2F;h2&gt;
&lt;p&gt;Streaming ETL using Quine means not just knowing about events but acting on them as they occur. Use Quine&#x27;s ingest queries to materialize event data as a graph, gaining a graph’s ability to express and query for complex relationships between seemingly unrelated data. Then use Quine’s standing queries to monitor for key patterns (e.g., those indicating that a fraudulent transaction or cyber attack is underway) and take action when those patterns emerge.&lt;&#x2F;p&gt;
&lt;p&gt;Quine’s graph ETL also makes it straightforward to process categorical data — everything from email addresses and model numbers to IP addresses and process IDs — that other systems ignore or try to encode.&lt;&#x2F;p&gt;
&lt;p&gt;Use Quine Enterprise to scale your graph ETL to millions of events per second.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;key-value-take-away&quot;&gt;Key Value Take Away&lt;&#x2F;h2&gt;
&lt;ul&gt;
&lt;li&gt;Use standing queries to detect patterns as they occur and take action&lt;&#x2F;li&gt;
&lt;li&gt;Join data from multiple sources at scale&lt;&#x2F;li&gt;
&lt;li&gt;Resolve entities across sources&lt;&#x2F;li&gt;
&lt;li&gt;Mitigate out-of-order data arrival&lt;&#x2F;li&gt;
&lt;li&gt;De-duplicate data&lt;&#x2F;li&gt;
&lt;li&gt;Generate new events from data as it streams, in real-time&lt;&#x2F;li&gt;
&lt;li&gt;Integrates with existing &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;quine-streaming-graph-is-a-natural-fit-for-kafka-pipelines&#x2F;&quot;&gt;Apache Kafka,&lt;&#x2F;a&gt; AWS Kinesis, data lake, and &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;real-time-graph-analytics-for-kafka-streams-with-quine&#x2F;&quot;&gt;API event&lt;&#x2F;a&gt; sources.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Video Observability for Root Cause Analysis</title>
        <published>2024-06-17T00:00:00+00:00</published>
        <updated>2024-06-17T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/use-cases/video-observability-for-root-cause-analysis/"/>
        <id>https://www.thatdot.com/use-cases/video-observability-for-root-cause-analysis/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/use-cases/video-observability-for-root-cause-analysis/">&lt;h2 id=&quot;the-problem&quot;&gt;The Problem&lt;&#x2F;h2&gt;
&lt;p&gt;Real-time video observability that can solve Quality of Experience (QoE) issues while live broadcast events are still playing requires the simultaneous monitoring of millions of data points. Video sessions flow across multiple systems including origins, CDNs, manifest services, and players provided by multiple vendors. Relational database approaches to performing this complex log analysis at production scale run into cost constraints that prohibit comprehensive real-time operations for all but the highest value broadcast events.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-solution&quot;&gt;The Solution&lt;&#x2F;h2&gt;
&lt;p&gt;Quine streaming graph ingests logs and events from clients, CDNs, origins, etc. in real-time and materializes the data into a graph. The graph data model natively connects chunk QoE metrics with unlimited categorical classifications and calculated metrics to identify &quot;alerts that matter to your audience&quot; and instantly associate them with &lt;em&gt;ASN&lt;&#x2F;em&gt;, &lt;em&gt;Geo&lt;&#x2F;em&gt;, &lt;em&gt;client type&lt;&#x2F;em&gt;, &lt;em&gt;asset names&lt;&#x2F;em&gt;, &lt;em&gt;encoding formats&lt;&#x2F;em&gt;, &lt;em&gt;CDN cache server&lt;&#x2F;em&gt;, &lt;em&gt;origin server&lt;&#x2F;em&gt;, etc. This real-time comprehensive view of the inter-relationships between services allows rapid assessment of root causes while live video streams are still playing.&lt;&#x2F;p&gt;
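&lt;p&gt;To make the idea of pivoting QoE issues across categorical attributes concrete, here is a minimal plain-Python sketch. It is not Quine code or the Quine API: the session records, the field names (cdn, asn, rebuffered), and the function rank_suspects are all hypothetical, chosen only to show how linking each session to its attributes lets you rank which attribute value is most associated with a problem.&lt;&#x2F;p&gt;

```python
from collections import defaultdict


def rank_suspects(sessions, dimensions):
    """Rank (dimension, value) pairs by the share of sessions with a QoE issue."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for session in sessions:
        for dimension in dimensions:
            key = (dimension, session[dimension])
            totals[key] += 1
            if session["rebuffered"]:
                failures[key] += 1
    # Highest failure share first; ties are broken by the key itself.
    ranked = [(failures[key] / totals[key], key) for key in totals]
    return sorted(ranked, reverse=True)


# Hypothetical sessions: every session served by cdn-a rebuffered.
sessions = [
    {"cdn": "cdn-a", "asn": "AS100", "rebuffered": True},
    {"cdn": "cdn-a", "asn": "AS200", "rebuffered": True},
    {"cdn": "cdn-b", "asn": "AS100", "rebuffered": False},
    {"cdn": "cdn-b", "asn": "AS200", "rebuffered": False},
]

for share, (dimension, value) in rank_suspects(sessions, ["cdn", "asn"]):
    print(dimension, value, share)
```

&lt;p&gt;In this toy data the CDN dimension separates good sessions from bad ones cleanly, while the ASN dimension does not, which is the kind of pivot a NOC technician would use to narrow down a root cause.&lt;&#x2F;p&gt;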
&lt;h2 id=&quot;key-value-take-away&quot;&gt;Key Value Take Away&lt;&#x2F;h2&gt;
&lt;ul&gt;
&lt;li&gt;Identify the QoE impacting issues that matter, in real-time and at scale&lt;&#x2F;li&gt;
&lt;li&gt;Graph data modeling eliminates the complexity of deeply nested joins&lt;&#x2F;li&gt;
&lt;li&gt;NOC technicians can easily pivot data to understand issue impacts and root causes&lt;&#x2F;li&gt;
&lt;li&gt;Automatic handling of out-of-order data arrival&lt;&#x2F;li&gt;
&lt;li&gt;Entity resolution between log and event sources&lt;&#x2F;li&gt;
&lt;li&gt;Integrates with existing &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;quine-streaming-graph-is-a-natural-fit-for-kafka-pipelines&#x2F;&quot;&gt;Apache Kafka,&lt;&#x2F;a&gt; AWS Kinesis, data lake, and &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;real-time-graph-analytics-for-kafka-streams-with-quine&#x2F;&quot;&gt;API event&lt;&#x2F;a&gt; sources.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Novelty Technology</title>
        <published>2024-06-14T00:00:00+00:00</published>
        <updated>2024-06-14T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/novelty-documentation/novelty-technology/"/>
        <id>https://www.thatdot.com/novelty-documentation/novelty-technology/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/novelty-documentation/novelty-technology/">&lt;h2 id=&quot;introduction-a-new-approach-to-anomaly-detection&quot;&gt;Introduction: a New Approach to Anomaly Detection&lt;&#x2F;h2&gt;
&lt;p&gt;Anomaly detection is a technique for finding important data. Decades of research have been spent on creating tools for anomaly detection with &lt;em&gt;numeric&lt;&#x2F;em&gt; data. But most data produced in the real world is not numbers: it is user names, identifiers, log statements, email addresses, URLs, access credentials, service names, file paths, timestamps, IP addresses, API paths, and a seemingly endless list of valuable data that is not a number. Non-numeric data is called “categorical data” and it has been mostly ignored by data analysis tools. So how could you find important categorical data with existing anomaly detection tools if they only work with numeric data?&lt;&#x2F;p&gt;
&lt;h2 id=&quot;not-one-hot&quot;&gt;Not One-Hot&lt;&#x2F;h2&gt;
&lt;p&gt;The state of the art for using anomaly detection (or most other artificial intelligence techniques) on categorical data is to begin by converting the categorical data into numbers. There are standard ways of converting categorical data into numbers; the most common by far is “one-hot encoding.” &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;towardsdatascience.com&#x2F;beyond-one-hot-17-ways-of-transforming-categorical-features-into-numeric-features-57f54f199ea4&quot;&gt;Here is a list of 16 more&lt;&#x2F;a&gt;. These techniques are all cumbersome and lossy in one form or another. Each of those 17 transformations requires a data scientist to bake in some interpretation, which future data may not align with. And critically, the standard approach, one-hot encoding, requires that you know the cardinality of the categorical data ahead of time, and that it remains very low. Each new value of the categorical data requires another dimension to be added to the matrix computations.&lt;&#x2F;p&gt;
&lt;p&gt;Adding dimensions leads to a highly-complex feature space that is ruinous for anomaly detection! In addition to larger matrices requiring more computation time, achieving useful results typically becomes impossible because of what has become known as “the curse of dimensionality”: as the number of dimensions used for anomaly detection increases, every data point starts looking like an anomaly in some way and the results are useless.&lt;&#x2F;p&gt;
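&lt;p&gt;A small sketch may help show why one-hot encoding scales poorly with cardinality. The function one_hot_encode and the user names below are illustrative assumptions, written in plain Python without any library; the point is only that each previously unseen value widens every row by one column.&lt;&#x2F;p&gt;

```python
def one_hot_encode(values):
    """One-hot encode a list of categorical values.

    Returns the sorted vocabulary and one row per input value.
    """
    vocabulary = sorted(set(values))
    index = {value: i for i, value in enumerate(vocabulary)}
    rows = []
    for value in values:
        row = [0] * len(vocabulary)
        row[index[value]] = 1
        rows.append(row)
    return vocabulary, rows


# Hypothetical user names seen in a log stream.
seen = ["alice", "bob", "alice"]
vocabulary, rows = one_hot_encode(seen)
print(len(vocabulary), rows)

# One new user arrives: every row, old and new, needs another column.
vocabulary, rows = one_hot_encode(seen + ["carol"])
print(len(vocabulary), rows)
```

&lt;p&gt;With fields such as IP addresses or file paths, where new values keep arriving, the width of this matrix grows without bound, which is the dimensionality problem described above.&lt;&#x2F;p&gt;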
&lt;h2 id=&quot;the-standard-for-a-new-technique&quot;&gt;The Standard For a New Technique&lt;&#x2F;h2&gt;
&lt;p&gt;When we set out to create a new way to detect important data directly using categorical data, we set our sights rather high. The standards we wanted to meet for this new technique were:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Streaming &#x2F; Online &#x2F; Real-Time.&lt;&#x2F;strong&gt; Answers now are &lt;em&gt;much&lt;&#x2F;em&gt; better than answers in 15 minutes, or an hour, or tomorrow, or next week. To be useful in the greatest number of applications, we should be able to provide results immediately. Since the streaming use case is a superset of the batch use case, any tool that can provide streaming results can also provide batch results; but not the other way around.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Unsupervised.&lt;&#x2F;strong&gt; Humans should not have to manually label the data to indicate what is and isn’t anomalous. Supervised AI techniques require manually creating training sets. That process is extremely laborious and time consuming—and often leads to bias encoded into the system. Instead, this system should train itself automatically from the data.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;High cardinality.&lt;&#x2F;strong&gt; One of the greatest challenges with categorical data is that it often has extremely high cardinality. A system designed to handle this should not cause a data or analysis explosion when some fields have an extremely high number of values. Instead, the system should support constant-time processing and storage, incorporating new values and dimensions in the data on-the-fly, while still delivering real-time results.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Distinguish unique from anomalous.&lt;&#x2F;strong&gt; Sometimes “new” is actually “normal.” When a data set regularly includes new unseen values, an anomaly detection system should take that into account. Instead of producing false alarms because data is unique, the system should learn from the context and the rest of the data seen so far to understand when unique data is actually typical.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Learn behavioral patterns.&lt;&#x2F;strong&gt; Human behavior is complex, and system behavior can often compound to be even more complex. The ideal anomaly detection system would be able to learn idiosyncratic behaviors, applicable only in specific situations, and incorporate that learning into the final evaluation of data.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Rank results in a total order.&lt;&#x2F;strong&gt; It is helpful to be told “yes” or “no” for whether data is anomalous. But in many real-world environments, we would also like to know &lt;em&gt;how anomalous&lt;&#x2F;em&gt; it is and how that compares to another anomaly we might be looking at. A strength of existing numerical methods is that they often give a total ranking of results. A new anomaly detection system should preserve this total ordering of anomaly scores but deliver those results immediately—with scores that are still totally ordered as the stream continues. In practice, if I’m already looking at a rather anomalous event, I want to be told if new data is even more anomalous.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Give explanations.&lt;&#x2F;strong&gt; It’s great to reach a conclusion and learn that a piece of data is very novel and important. But it is much &lt;em&gt;better&lt;&#x2F;em&gt; to also understand &lt;em&gt;WHY&lt;&#x2F;em&gt; that datum is important. A system for categorical anomaly detection should embrace the goals of “&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.darpa.mil&#x2F;program&#x2F;explainable-artificial-intelligence&quot;&gt;Explainable AI&lt;&#x2F;a&gt;” so that the people involved can understand its conclusions.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Theoretically sound.&lt;&#x2F;strong&gt; The Achilles heel of most AI systems today is that &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.wired.com&#x2F;story&#x2F;artificial-intelligence-confronts-reproducibility-crisis&#x2F;&quot;&gt;they are empirically tested but often lack a solid theoretical justification for the results they produce&lt;&#x2F;a&gt;. The individual steps in modern methods are often well-established, but evaluating the fully-composed technique is limited to empirical measurement instead of theoretical grounding. In contrast, we believe that new techniques are more powerful if they are theoretically sound all the way to their core, and the results reflect that. A system that is theoretically sound produces more reliable results—especially when faced with unfamiliar data. Since anomaly detection is entirely about finding and explaining unfamiliar data, the soundness of the approach is of critical importance.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;theoretical-grounding&quot;&gt;Theoretical Grounding&lt;&#x2F;h2&gt;
&lt;h3 id=&quot;terminology-novel-vs-anomalous&quot;&gt;Terminology: Novel vs. Anomalous&lt;&#x2F;h3&gt;
&lt;p&gt;We describe the system overall as an “anomaly detection system” because that is its most common use, and the name most well-known in the industry. But what we actually compute is a continuous score describing &lt;em&gt;how novel&lt;&#x2F;em&gt; each piece of data is. More than just swapping words, this distinction between “novel” and “anomalous” is an important one. Novelty is an objective feature we can ascribe to the data, and evaluate on a continuous spectrum. Applying this system to a data set, a user can use the novelty scores to decide whether or not data is anomalous. This reflects a separate application-specific process of translating the continuous-valued novelty score into a binary-valued anomaly decision.&lt;&#x2F;p&gt;
&lt;p&gt;Considering an example application in cyber-security, “anomalous” is often the term used to describe what an analyst would probably prefer to call “suspicious.” These terms get used interchangeably in practice, but with subtle subjective distinctions. When approaching the theoretical justification for a new technique, we think it is important to deliberately use the term “novelty” as an objective feature of the data which happens before interpreting the results as “anomalous” or “suspicious.” Thus, “novelty” is the best term for the theoretical determination arrived at by this system.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;probabilistic-graphical-models&quot;&gt;Probabilistic Graphical Models&lt;&#x2F;h3&gt;
&lt;p&gt;A collection of data can be described with a graphical model. This is a representation of the data built by structuring nodes and edges in relationships that represent the core features of the data. When that graph is structured with historical facts and probabilistic information about the data being examined, it can provide a wealth of statistical information about the dataset as a whole, and each individual data observation.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;information-theory&quot;&gt;Information Theory&lt;&#x2F;h3&gt;
&lt;p&gt;That statistical information provides a probabilistic view of the data which allows us to measure the Information Content (also known as “Shannon Information” or “Self-Information”) represented in the data. Information Content is the basis for the novelty scores returned from &lt;em&gt;thatDot Novelty Detector&lt;&#x2F;em&gt;. This approach runs parallel to traditional AI techniques where Information Content is often the primary measure used in the cost function (or “loss function”) of many machine learning methods. In fact, to help machine learning researchers and engineers use this system for other purposes beyond just anomaly detection, we even return the Information Content in the result payload for each observation. Inside &lt;em&gt;Novelty Detector&lt;&#x2F;em&gt;, the Information Content is computed and then combined with other information from a complex graph built by the system.&lt;&#x2F;p&gt;
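&lt;p&gt;For readers unfamiliar with the term, the Information Content of an observation x with probability P(x) is defined as I(x) = -log2 P(x), measured in bits, so rarer observations carry more information. The sketch below estimates that quantity over a stream of categorical values. It is an illustration only and not the Novelty Detector algorithm: the class name StreamingSelfInformation, the add-one smoothing, and the example values are all assumptions made for this example.&lt;&#x2F;p&gt;

```python
import math
from collections import Counter


class StreamingSelfInformation:
    """Scores each value by -log2 of its estimated probability so far."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def observe(self, value):
        # Add-one smoothing (an assumption of this sketch) reserves one
        # extra slot of probability mass for values never seen before.
        distinct_plus_unseen = len(self.counts) + 1
        probability = (self.counts[value] + 1) / (self.total + distinct_plus_unseen)
        # Update the model only after scoring, so each value is scored
        # against the data that arrived before it.
        self.counts[value] += 1
        self.total += 1
        return -math.log2(probability)


model = StreamingSelfInformation()
# Hypothetical stream of API call types: many GETs, then one DELETE.
for call in ["GET"] * 8:
    model.observe(call)
print("GET bits:", model.observe("GET"))
print("DELETE bits:", model.observe("DELETE"))
```

&lt;p&gt;The frequently seen value scores well under one bit while the never-before-seen value scores several bits. Each score costs a constant amount of work per observation, and no matrix has to be resized when a new value appears.&lt;&#x2F;p&gt;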
&lt;h3 id=&quot;a-dynamic-graph&quot;&gt;A Dynamic Graph&lt;&#x2F;h3&gt;
&lt;p&gt;&lt;em&gt;thatDot Novelty Detector&lt;&#x2F;em&gt; builds a dynamic graphical model of the dataset in real-time. This is generally a challenging problem, often calling for a graph database or other complex tools. But those tools aren’t built for streaming data and end up crushed under the load of a high volume of streaming input data and the voluminous computation that results. To overcome this challenge, we built &lt;em&gt;Novelty Detector&lt;&#x2F;em&gt; on top of &lt;em&gt;Quine&lt;&#x2F;em&gt;. &lt;em&gt;Quine&lt;&#x2F;em&gt; is a streaming graph interpreter, capable of high-volume data processing and storage. Using &lt;em&gt;Quine&lt;&#x2F;em&gt;, we construct and maintain the graphical model that represents the incoming data. &lt;em&gt;Quine&lt;&#x2F;em&gt; records the historical facts and computes all the necessary probabilities and information measures needed to produce a score for each incoming data item. That graph interpreter can compare scores across the entire data context and explain what about the data was so novel. All of this is accomplished in real-time so that streaming results are scored immediately as they flow through the system.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;&#x2F;h2&gt;
&lt;p&gt;The final result is a high-throughput, low-latency, high-cardinality, categorical data analysis tool capable of scoring the novelty of all incoming data in real-time. Streaming data dynamically updates a probabilistic graphical model to compute information content assessed holistically with the data context to provide a novelty score useful for finding anomalies and explaining the result. All together, &lt;em&gt;thatDot Novelty Detector&lt;&#x2F;em&gt; represents a breakthrough in the field of anomaly detection, with wide-ranging applications across industries. You can &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;aws.amazon.com&#x2F;marketplace&#x2F;pp&#x2F;prodview-jo6mbktt5ptzm#pdp-pricing&quot;&gt;try &lt;em&gt;thatDot Novelty Detector&lt;&#x2F;em&gt; for free on AWS right away&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Advanced Persistent Threat (APT) Detection</title>
        <published>2024-06-14T00:00:00+00:00</published>
        <updated>2024-06-14T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/use-cases/advanced-persistent-threat-apt-detection/"/>
        <id>https://www.thatdot.com/use-cases/advanced-persistent-threat-apt-detection/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/use-cases/advanced-persistent-threat-apt-detection/">&lt;h2 id=&quot;the-problem&quot;&gt;The Problem&lt;&#x2F;h2&gt;
&lt;p&gt;Discovering advanced persistent threats (APT) is, by design, akin to finding a needle in a haystack.&lt;&#x2F;p&gt;
&lt;p&gt;The threat actors behind APTs combine multiple tactics, techniques, and procedures (TTP) over extended periods of time to compromise and maintain access to their targets.&lt;&#x2F;p&gt;
&lt;p&gt;The IBM Cost of a Data Breach Report 2021 reported an average attacker dwell time of &lt;strong&gt;212&lt;&#x2F;strong&gt; days.&lt;&#x2F;p&gt;
&lt;p&gt;Legacy security solutions rely on time-batched loads of data that are filtered for Indicators of Compromise (IoCs). APTs evade them by executing incremental actions spread across numerous systems, at rates that exceed batch analysis size and time boundaries. APT detection requires a new approach.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-solution&quot;&gt;The Solution&lt;&#x2F;h2&gt;
&lt;p&gt;Matured within DARPA&#x27;s Transparent Computing program specifically for the detection of APTs, &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;products&#x2F;quine&#x2F;&quot;&gt;Quine&lt;&#x2F;a&gt; and &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;products&#x2F;novelty&#x2F;&quot;&gt;Novelty Detector&lt;&#x2F;a&gt; work together to efficiently cover all aspects of advanced persistent threat detection.&lt;&#x2F;p&gt;
&lt;p&gt;Quine’s graph data model uses &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;what-is-categorical-data&#x2F;&quot;&gt;categorical data&lt;&#x2F;a&gt; other systems ignore and excels at correlating individual events occurring in their billions&#x2F;trillions across devices, software and services over any time period to find the behavior patterns (Indicators of Behavior or IoBs) that represent malicious activity.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;&#x2F;img&#x2F;2024&#x2F;06&#x2F;image-1-convert.io_.webp&quot; alt=&quot;DARPA Logo for APT&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;When patterns are detected, Novelty Detector can then apply its categorical anomaly detection techniques to identify when a string of related actions represents a novel&#x2F;anomalous behavior, greatly reducing false positives.&lt;&#x2F;p&gt;
&lt;p&gt;Quine Enterprise provides commercial support and licensing for clustered Quine and Novelty Detector. You can add real-time behavior-based APT detection to your stack at scale and with confidence.&lt;&#x2F;p&gt;
&lt;p&gt;thatDot&#x27;s core technology underpinning Quine and Novelty Detector was developed in partnership with DARPA. Read more about thatDot&#x27;s origin and some examples of using Novelty Detector to detect &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;identifying-data-exfiltration-in-aws-cloudtrail-logs-using-categorical-anomaly-detection&#x2F;&quot;&gt;data exfiltration&lt;&#x2F;a&gt; and &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;identifying-stolen-credential-use-in-aws-cloudtrail-logs-with-high-confidence-using-categorical-anomaly-detection&#x2F;&quot;&gt;credential theft&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;key-value-take-away&quot;&gt;Key Value Take Away&lt;&#x2F;h2&gt;
&lt;ul&gt;
&lt;li&gt;Quine + Novelty Detector detect both known and emerging behavioral patterns in a single workflow.&lt;&#x2F;li&gt;
&lt;li&gt;Joins multiple data sets to enable real-time identification of attack behaviors across domains&lt;&#x2F;li&gt;
&lt;li&gt;Identify behaviors over extended time periods using incremental streaming analysis (not batch)&lt;&#x2F;li&gt;
&lt;li&gt;Native support for &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;what-is-categorical-data&#x2F;&quot;&gt;categorical data&lt;&#x2F;a&gt; simplifies operations and provides human-readable alerts for analysts&lt;&#x2F;li&gt;
&lt;li&gt;STIX Compliant, real-time detection of Indicators of Behavior (IoBs) and generation of STIX message events&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Real-Time IoB Threat Hunting</title>
        <published>2024-06-14T00:00:00+00:00</published>
        <updated>2024-06-14T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/use-cases/iob-real-time-threat-hunting/"/>
        <id>https://www.thatdot.com/use-cases/iob-real-time-threat-hunting/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/use-cases/iob-real-time-threat-hunting/">&lt;h2 id=&quot;the-problem&quot;&gt;The Problem&lt;&#x2F;h2&gt;
&lt;p&gt;Modern threat detection requires data – lots of data – typically from multiple sources. This brings with it a number of interesting data engineering challenges, especially when we want to materialize that data into a single view and execute analysis in a timely and cost-effective manner. Finding indicators of behavior (IoBs) in real time amplifies already significant challenges: processing enough of the right kind of data from multiple sources in a timely fashion is beyond the capability of most systems.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-solution&quot;&gt;The Solution&lt;&#x2F;h2&gt;
&lt;p&gt;Quine + Novelty Detector together cover all aspects of real-time, automated, behavior-based threat hunting: Quine is used to detect known patterns (STIX) and emit scripted playbook responses (CACAO), while Novelty Detector uses patented categorical anomaly detection techniques to identify emerging threat patterns that are eventually fed back into Quine as new IoB patterns.&lt;&#x2F;p&gt;
&lt;p&gt;Quine Enterprise provides commercial support and licensing for both clustered Quine and Novelty Detector, meaning you can easily add real-time, behavior-based threat hunting to your stack.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;key-value-take-away&quot;&gt;Key Value Take Away&lt;&#x2F;h2&gt;
&lt;ul&gt;
&lt;li&gt;Quine + Novelty Detector detect both known and emerging behavioral patterns in a single workflow.&lt;&#x2F;li&gt;
&lt;li&gt;STIX Compliant, real-time detection of Indicators of Behavior (IoBs) and generation of STIX message events&lt;&#x2F;li&gt;
&lt;li&gt;Joins multiple data sets to enable real-time identification of IoBs across domains&lt;&#x2F;li&gt;
&lt;li&gt;Automate STIX indicator additions via API, as well as CACAO Playbook event triggers for remediation&lt;&#x2F;li&gt;
&lt;li&gt;Native support for categorical data simplifies operations and provides human-readable alerts for analysts&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Real-time AWS CloudTrail Threat Detection</title>
        <published>2024-06-14T00:00:00+00:00</published>
        <updated>2024-06-14T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/use-cases/real-time-aws-cloudtrail-threat-detection/"/>
        <id>https://www.thatdot.com/use-cases/real-time-aws-cloudtrail-threat-detection/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/use-cases/real-time-aws-cloudtrail-threat-detection/">&lt;h2 id=&quot;the-problem&quot;&gt;The Problem&lt;&#x2F;h2&gt;
&lt;p&gt;AWS CloudTrail logs are full of untapped information that can help reduce risk and improve event response times, especially when analyzed in context and in real time. A thatDot cyber security customer seeking to expand their offerings to include threat detection monitoring of AWS CloudTrail logs faced three challenges. They needed to:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Reliably identify hard-to-detect insider and external threats using Indicators of Behavior (IoB) analysis&lt;&#x2F;li&gt;
&lt;li&gt;Generate highly informative alerts that low-tech customers could understand and act on&lt;&#x2F;li&gt;
&lt;li&gt;Shorten development cycles on new products&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Typical use cases for their new product would include identifying both existing employees misusing credentials to access restricted resources and outsiders using valid but compromised credentials. This combines two of the toughest cyber-security challenges in the industry.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-solution&quot;&gt;The Solution&lt;&#x2F;h2&gt;
&lt;h3 id=&quot;finding-new-emerging-threat-behaviors-in-real-time-as-attacks-are-happening&quot;&gt;Finding New Emerging Threat Behaviors, In Real-time (as attacks are happening)&lt;&#x2F;h3&gt;
&lt;p&gt;The team at thatDot solved the client&#x27;s threat detection problem with the first modern threat-hunting stack to combine real-time identification of unknown or emerging threats (Novelty Detector) with an event processing system that can instantly identify known patterns and act on them (Quine Enterprise).&lt;&#x2F;p&gt;
&lt;p&gt;Novelty Detector is a new graph AI technique built on the Quine streaming graph that uses categorical data from events (e.g. IP addresses, file names, file paths, API call types) in order to understand the context within which user and system actions take place. This rich context is used to evaluate behaviors in order to identify novel behaviors in real time, with a notably low incidence of false positives.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;641cd3f62f96d38aba5766c0_ND%20Screen.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Novelty Detector results displayed as a graph, making them easy to understand and act on. (From &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;stop-insider-threats-with-automated-behavioral-anomaly-detection&#x2F;&quot;&gt;VAST use case.&lt;&#x2F;a&gt;)&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;6401235ff583820243c5bd79_625859eeb21445254c569248_EliminateFalsePositives-1024x408.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Novelty Detector separates truly novel events from those that are unique but not a threat.&lt;&#x2F;p&gt;
&lt;p&gt;When it comes to instantly identifying and acting on known threats, including ones previously detected by Novelty Detector and classified, the client used Quine streaming graph. They used standing queries to monitor for patterns of behavior in the graph indicative of malicious behavior. And because Quine is not limited by time windows, they were able to build a threat detection system that monitored for a broader range of threat behaviors than traditional complex event processing systems and XDRs allow.&lt;&#x2F;p&gt;
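&lt;p&gt;The difference between time-windowed matching and matching that is not limited by time windows can be sketched in a few lines of plain Python. This is not Quine&#x27;s standing query API; the class SequenceWatcher, the user names, the dates, and the three-step pattern are hypothetical and exist only to show state being held per entity with no expiry, so that steps of a pattern can arrive months apart and still complete a match.&lt;&#x2F;p&gt;

```python
class SequenceWatcher:
    """Fires when an entity completes an ordered sequence of actions.

    State is kept per entity with no expiry, so the steps may be days
    or months apart (unlike a fixed time window).
    """

    def __init__(self, steps):
        self.steps = list(steps)
        self.progress = {}  # entity -> index of the next expected step

    def observe(self, entity, action):
        position = self.progress.get(entity, 0)
        if action != self.steps[position]:
            return False  # unrelated action: keep the progress made so far
        position += 1
        if position == len(self.steps):
            self.progress.pop(entity, None)  # reset after a full match
            return True
        self.progress[entity] = position
        return False


# Hypothetical pattern and events; the steps are spread over five months.
watcher = SequenceWatcher(["CreateAccessKey", "StopLogging", "GetObject"])
events = [
    ("2024-01-03", "mallory", "CreateAccessKey"),
    ("2024-01-04", "alice", "GetObject"),
    ("2024-03-19", "mallory", "StopLogging"),
    ("2024-06-02", "mallory", "GetObject"),
]
alerts = [(day, who) for day, who, action in events if watcher.observe(who, action)]
print(alerts)
```

&lt;p&gt;A system that only examined one batch or window at a time would see each of these events in isolation. Keeping the partial match alive between events is what allows the final step to raise an alert.&lt;&#x2F;p&gt;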
&lt;p&gt;Quine is ideal for SaaS businesses. Quine Enterprise can ingest millions of events&#x2F;second from multiple streams, combine them into a single graph view, detect patterns for known threat indicators, and act instantly to emit contextually rich alerts.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;human-readable-results&quot;&gt;Human-Readable Results&lt;&#x2F;h3&gt;
&lt;p&gt;Both Quine and Novelty Detector are based on the same knowledge graph technology, which makes use of categorical data. This means the data structures they create and output -- node objects, their properties, and the relationships between those objects -- are expressed in a familiar human-readable format (&lt;em&gt;subject, predicate, object&lt;&#x2F;em&gt;), so results are easy to understand and immediately contextualized.&lt;&#x2F;p&gt;
&lt;p&gt;Knowing who did what when, whether or not they had the privileges to do so, how long they had those privileges, and similar contextual information -- all quite easy to generate with Quine and Novelty Detector -- means SOC&#x2F;NOC analysts don&#x27;t need to spend exorbitant amounts of time researching alerts.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;fast-time-to-market&quot;&gt;Fast Time To Market&lt;&#x2F;h3&gt;
&lt;p&gt;Quine Enterprise with Novelty Detector made development fast and straightforward. With both unknown and known threats covered, the client was able to quickly launch a threat detection product to round out their growing portfolio of cyber security products.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;key-value-take-away&quot;&gt;Key Value Take Away&lt;&#x2F;h2&gt;
&lt;ul&gt;
&lt;li&gt;Fewer false positives using a shallow learning method that processes categorical data.&lt;&#x2F;li&gt;
&lt;li&gt;Profiles behavior (IoBs) instead of finding indicators of compromise (IoCs).&lt;&#x2F;li&gt;
&lt;li&gt;Contextually rich alerts in a human-friendly form make it easier for analysts to research and resolve.&lt;&#x2F;li&gt;
&lt;li&gt;Real-time processing of data means none of the delays of batch processing.&lt;&#x2F;li&gt;
&lt;li&gt;Scales to millions of events per second, making it suitable for fast-growing SaaS providers.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Webinar: Approach Zero False Positive Cyber Alerts</title>
        <published>2024-06-10T00:00:00+00:00</published>
        <updated>2024-06-10T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/events/webinar-approach-zero-false-positive-cyber-alerts/"/>
        <id>https://www.thatdot.com/events/webinar-approach-zero-false-positive-cyber-alerts/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/events/webinar-approach-zero-false-positive-cyber-alerts/">&lt;p&gt;We would like to invite you to an exclusive webinar featuring thatDot’s CEO, Ryan Wright, and The Bloor Group CEO, Eric Kavanagh, along with Top 5 Global Cybersecurity Thought Leader, Mark Lynd, as they discuss &quot;The Unreasonable Effectiveness of Streaming Graph.&quot; This insightful discussion is a must-attend for anyone serious about cybersecurity, threat detection, and deep, real-time analytics.&lt;&#x2F;p&gt;
&lt;p&gt;Event Details:&lt;&#x2F;p&gt;
&lt;p&gt;Title: The Unreasonable Effectiveness of Streaming Graph&lt;&#x2F;p&gt;
&lt;p&gt;Date: June 11, 2024&lt;&#x2F;p&gt;
&lt;p&gt;Time: 12 pm ET&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;us02web.zoom.us&#x2F;webinar&#x2F;register&#x2F;5917162475186&#x2F;WN_2PdI66VJSRereOVZ6bcUTQ#&#x2F;registration&quot;&gt;Register for Webinar&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;&#x2F;img&#x2F;2024&#x2F;06&#x2F;june-11-12-noon-et-1.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-you-should-attend&quot;&gt;Why You Should Attend&lt;&#x2F;h2&gt;
&lt;p&gt;Cybersecurity experts face immense challenges with frequent data breaches dominating the headlines. Traditional anomaly detectors often fall short in identifying and neutralizing threats in real-time. They require constant human tweaking of threat signatures and sensitivity levels to avoid exhausting professionals with mountains of false positive alerts. What if you could build contextual awareness into the application?&lt;&#x2F;p&gt;
&lt;p&gt;Join us to discover how thatDot Novelty, powered by the open-source technology Quine, is revolutionizing real-time threat detection and response. This cutting-edge technology combines event stream processing speed with a built-in AI that learns the contextual fingerprint of your data environment, and pinpoints problems automatically. Developed in a DARPA project, thatDot Novelty and thatDot Streaming Graph provide unparalleled capabilities in:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Advanced Persistent Threat Detection&lt;&#x2F;li&gt;
&lt;li&gt;Insider Threat Detection&lt;&#x2F;li&gt;
&lt;li&gt;Attack Graph Analysis&lt;&#x2F;li&gt;
&lt;li&gt;Digital Twins&lt;&#x2F;li&gt;
&lt;li&gt;And many more critical use cases&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Don’t miss this opportunity to learn from industry experts and gain a competitive edge in cybersecurity. Secure your spot today and be part of the future of threat detection and data analytics.&lt;&#x2F;p&gt;
&lt;p&gt;We look forward to your participation.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>4 Advantages to Streaming Analytics in Graph Form</title>
        <published>2024-06-06T00:00:00+00:00</published>
        <updated>2024-06-06T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/4-advantages-to-streaming-analytics-in-graph-form/"/>
        <id>https://www.thatdot.com/blog/4-advantages-to-streaming-analytics-in-graph-form/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/4-advantages-to-streaming-analytics-in-graph-form/">&lt;p&gt;Existing software has forced people to choose between asking deep questions or getting their answers fast. Graph databases can do some pretty amazing things, but they’re not known for their analytical speed. If you want powerful fraud detection or cybersecurity threat detection, graph data analysis is a good choice. But if you need it fast, to turn that into fraud prevention and cybersecurity threat protection, graph databases are not a good choice. Streaming analytics makes sense to get the immediate level of speed that you need. But most event stream processing frameworks analyze data as if it were a standard row and column coming in one message&#x2F;row at a time. That means they’re not ideal for things like fraud detection and cybersecurity that require complex relationship, pattern, and anomaly type analysis.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;lh7-us.googleusercontent.com&#x2F;docsz&#x2F;AD_4nXdUkazHHJHjj5vrjSPrmdzWiY0VnmNqhW5hUDHymrmDQdKjweBcE_QFXq2J228PT-8NiEq8uinR0dSkcYQjEswiu-A8EoNe3ahGu34GqJwnDe4zJeEkbqu6JcEsePaQNlzTf9g2wt32VBT0K2le-zxqxxAf?key=43XuY4Zt-GQbPU1bWWdqIg&quot; alt=&quot;Example in cybersecurity of being forced to choose between deep slow graph database analytics and fast shallow streaming analytics when what you need is fast deep analytics.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;&lt;em&gt;Example in cybersecurity of being forced to choose between deep slow graph database analytics and fast shallow streaming analytics when what you need is fast deep analytics.&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;p&gt;You could do streaming analytics or you could do graph analytics, but not both. What you need to solve a lot of tough problems is event stream processing that sees the flowing data as a graph. That’s what thatDot does. It’s unique in the industry as far as I know. Being a pioneer can be a major problem because folks don’t know where you fit or what to compare you to. In particular, since there aren’t other options out there, why is this something you might need? Previous technology is like the guy who lost something, and looks where there’s adequate light, even though he knows it wasn’t lost there. It doesn’t really work, but making do with what you have is the only option.&lt;&#x2F;p&gt;
&lt;p&gt;Event stream processing in graph form shines the light where you need it, in fast deep streaming analytics. Here are four advantages of doing graph analysis in an event stream that come immediately to mind:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Shift analysis left for smart filtering.&lt;&#x2F;strong&gt; Because the data can be transformed and analyzed while it’s still flowing, you can intelligently filter out data you don’t want, even resolve duplicates from multiple data streams before dropping the masses of data into expensive databases. In the case of IoT, it can shift left all the way to the edge, and only push sensor notifications that matter. Instead of 1000 useless identical readings and three different readings, only the data that matter are sent on for analysis or action.&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Ask deep relationship questions on the fly.&lt;&#x2F;strong&gt; The questions most event processors can ask do not dig deep into relationships and patterns simply because that’s not how they look at data. Finding entities and their properties, then analyzing the relationships between them to surface important patterns on the fly, is what streaming graph excels at.&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Analyze categorical data without turning it into numeric data first.&lt;&#x2F;strong&gt; A lot of important data is categorical, such as IP addresses, names, and location information. State of the art without graph streaming analytics is to first convert that data into wide, sparse numerical information, then do analysis on that bloated numeric data, then turn it back into categorical data to return an answer. (All our tools only work on numbers, so let’s only look there.) thatDot Streaming Graph lets you analyze categorical data as categorical data, immediately. No delay while you land and fiddle with the data so your tools can analyze it.&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Reduce mean time to value (MTTV).&lt;&#x2F;strong&gt; Get actionable answers immediately, with sub-millisecond latency. Landing the data in a graph database, cleaning and preparing it, then finally running analyses and visualizing the results all takes time.&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
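&lt;p&gt;To make the smart-filtering point concrete, here is a minimal Python sketch of dropping repeated sensor readings before they reach storage. It is a plain illustration of the idea, not thatDot code: the &lt;code&gt;Reading&lt;&#x2F;code&gt; shape and the &lt;code&gt;changed_readings&lt;&#x2F;code&gt; function are hypothetical names made up for this example.&lt;&#x2F;p&gt;

```python
from collections import namedtuple

# Hypothetical reading shape, used only for this illustration.
Reading = namedtuple("Reading", ["sensor_id", "value"])


def changed_readings(readings):
    """Yield a reading only when its value differs from the previous
    value seen for the same sensor."""
    last_value = {}  # sensor_id -> most recent value forwarded or dropped
    for reading in readings:
        seen_before = reading.sensor_id in last_value
        if seen_before and last_value[reading.sensor_id] == reading.value:
            continue  # identical to the previous reading: drop it
        last_value[reading.sensor_id] = reading.value
        yield reading


if __name__ == "__main__":
    # 1000 identical readings followed by three that differ.
    stream = [Reading("t1", 20.0)] * 1000 + [
        Reading("t1", 20.5),
        Reading("t1", 21.0),
        Reading("t1", 20.0),
    ]
    kept = list(changed_readings(stream))
    print(len(stream), "readings in,", len(kept), "forwarded")
```

&lt;p&gt;Because the filter is a generator, it processes one reading at a time and only remembers the last value per sensor, which is the property that lets this kind of logic run in the stream or at the edge.&lt;&#x2F;p&gt;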
&lt;p&gt;&lt;strong&gt;Doing graph analysis right in your event stream reduces the time to get answers from hours to seconds, or from seconds to milliseconds.&lt;&#x2F;strong&gt; This can mean the difference between fraud detection and fraud prevention, between finding out you were breached and catching a cybercriminal in the act. To learn more, check out &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;&quot;&gt;&lt;strong&gt;https:&#x2F;&#x2F;www.thatdot.com&#x2F;&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt;. You’d be surprised at what is possible now.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>The Future of Modern Threat Hunting is Streaming Graph</title>
        <published>2023-11-30T00:00:00+00:00</published>
        <updated>2023-11-30T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/the-future-of-modern-threat-hunting-is-streaming-graph/"/>
        <id>https://www.thatdot.com/blog/the-future-of-modern-threat-hunting-is-streaming-graph/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/the-future-of-modern-threat-hunting-is-streaming-graph/">&lt;h2 id=&quot;towards-a-new-model-of-threat-hunting&quot;&gt;Towards a new model of threat hunting&lt;&#x2F;h2&gt;
&lt;p&gt;The continuous expansion of threat vectors and attack techniques requires a modern threat hunting architecture capable of large scale operations, real-time deep&#x2F;complex event processing to identify Indicators of Behavior (IoB), and programmable automation to best leverage scarce SOC expertise. Central to the evolution from after-the-fact Indicators of Compromise (IoCs) to IoBs is the need to embrace an event driven architecture.&lt;&#x2F;p&gt;
&lt;p&gt;Many industry initiatives aim to codify the intersection points between data sources, analysis systems, and remediation solutions. These efforts are centered around two characteristics that align with thatDot software in significant ways.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;A focus on behavior analysis&lt;&#x2F;strong&gt; - The evolution from the use of Indicators of Compromise (IoC) to Indicators of Behavior (IoB) has been driven by the desire to evolve from seeking static definitions of a completed attack (file# or an IP), to an understanding of how an attack happens. This change in perspective creates the opportunity to find attacks earlier, and with more flexibility.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Use of graph data modeling&lt;&#x2F;strong&gt; - Representing behavior and relationships is a natural fit for graph data modeling techniques. Graph data structures are terrific at expressing the relationships between entities, which simplifies analysis and infrastructure, so much so that STIX Indicators and the Kestrel protocol assume the use of graph systems for their operation.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;63869f2305932a1712aa6e42_-qh2uW4zYe4gjIPg81vEaC-DbigrEYztP2TY2gg01FNLf3if9fFGxhTyhucRk1rAxW1p6DDEPXiW5t0P6ye5SRdqBc9-t3Kp750dtqrF4S6X0xWWz3O4u6vM-TxQCeg1DqP2ubuR3R6H3rItf-hcl3OqF7Dkniw3pmgZOXQ1-9siCAbhxPv9bl7vv4eGtw.png&quot; alt=&quot;An image showing an example of a STIX 2 graph that indicates the relationships between vulnerabilities, threat actors, and indicators relative to a campaign.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Image source: &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;oasis-open.github.io&#x2F;cti-documentation&#x2F;stix&#x2F;intro#a-look-at-the-structure&quot;&gt;available here&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;new-standards-reduce-friction&quot;&gt;New Standards Reduce Friction&lt;&#x2F;h2&gt;
&lt;p&gt;The cybersecurity industry is active on many fronts defining standards to smooth the frictions that exist between data sources, analysis engines, SIEMs, and automated response systems. These standards include:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;STIX™ Indicators -&lt;&#x2F;strong&gt; Indicators convey specific observable patterns combined with contextual information intended to represent artifacts and&#x2F;or behaviors of interest within a cyber security context. [&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;oasis-open.github.io&#x2F;cti-documentation&#x2F;stix&#x2F;intro&quot;&gt;Read more here.&lt;&#x2F;a&gt;]&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Kestrel -&lt;&#x2F;strong&gt; Kestrel threat hunting language provides an abstraction for threat hunters to focus on the high-value and composable threat hypothesis development instead of specific realization of hypothesis testing with heterogeneous data sources, threat intelligence, and public or proprietary analytics. [&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;kestrel.readthedocs.io&#x2F;en&#x2F;stable&#x2F;&quot;&gt;Read more here.&lt;&#x2F;a&gt;]&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;CACAO&lt;&#x2F;strong&gt; - defines the schema and taxonomy for collaborative automated course of action operations (CACAO) security playbooks and how these playbooks can be created, documented, and shared in a structured and standardized way across organizational boundaries and technological solutions. [&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.oasis-open.org&#x2F;committees&#x2F;tc_home.php?wg_abbrev=cacao&quot;&gt;Read more here.&lt;&#x2F;a&gt;]&lt;&#x2F;p&gt;
&lt;p&gt;These standards fit well with thatDot’s approach to a modern threat hunting stack, one powered by thatDot’s Quine streaming graph to detect and instantly alert on known patterns and that uses thatDot Novelty Detector to identify new emerging threat behaviors in real time.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;highly-scalable-iob-pattern-recognition&quot;&gt;Highly Scalable IoB Pattern Recognition&lt;&#x2F;h2&gt;
&lt;p&gt;The evolution from a reactive IoC threat hunting model to a real-time IoB-based approach requires a new set of technical capabilities along with the tools to deliver them. Fortunately, the advent of IoB threat hunting, new standards, and ground-breaking streaming graph technology are all emerging to meet the need.&lt;&#x2F;p&gt;
&lt;p&gt;As shown below, thatDot’s open source Quine streaming graph aligns with the requirement to ingest multiple data streams and natively process IoBs encoded in a graph data model, then generate events that invoke predefined remediation actions. The workflow looks as follows:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Event sources are ingested from any common event stream queue, including Apache Kafka, AWS Kinesis, AWS SQS, or Apache Pulsar&#x2F;DataStax Astra Streaming.&lt;&#x2F;li&gt;
&lt;li&gt;STIX-defined IoBs are loaded into Quine using Kestrel graph objects via API, or entered manually, as Quine &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;components&#x2F;writing-standing-queries.html&quot;&gt;standing queries&lt;&#x2F;a&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;Quine standing queries continuously analyze newly arriving events for matches against IoB pattern definitions. Partial matches are identified and stored for any desired period of time to accommodate threat behaviors that occur incrementally over longer time frames.&lt;&#x2F;li&gt;
&lt;li&gt;Upon a full IoB pattern match, Quine generates a new event that is associated with a pre-defined CACAO Playbook action, for use by SOAR or analysts.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;6386a168d6797e41b152d4e0_Quine_streaming_graph_for_IoB_detection.png&quot; alt=&quot;A flow diagram showing events ingested into Quine, which is using STIX IoB definitions to detect known attack vectors.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-problems-quine-solves&quot;&gt;The Problems Quine Solves&lt;&#x2F;h2&gt;
&lt;p&gt;Quine solves some hard problems in this role. Let’s take a look at a few of the major points:&lt;&#x2F;p&gt;
&lt;h3 id=&quot;multiple-event-sources&quot;&gt;Multiple Event Sources&lt;&#x2F;h3&gt;
&lt;p&gt;Modern threat detection requires data – lots of data – usually from multiple sources. This brings with it a number of interesting data engineering challenges, especially when we want to materialize that data into a single view and execute analysis in a timely and cost-effective manner.&lt;&#x2F;p&gt;
&lt;p&gt;Combining threat intelligence, EDR, XDR, and cloud logs is an increasingly common requirement for building a baseline of behavior models against which real-time data is assessed for known and new threats. thatDot’s Quine streaming graph is a new and powerful software tool for resolving many of the data engineering challenges associated with handling volumes of data from multiple sources.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Scale For Costs&lt;&#x2F;strong&gt; - Scale graph event processing from 1,000s to 1,000,000s of events per second on commodity cloud VMs, more efficiently than nested joins.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Out-of-Order Data Arrival&lt;&#x2F;strong&gt; - Quine standing queries evaluate each event as it arrives and store partial results until the data that completes the match arrives.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Entity Resolution&lt;&#x2F;strong&gt; - Graph data models are known for leveraging the additional context gained by understanding the relationships between event data.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;finding-threat-behaviors&quot;&gt;Finding Threat Behaviors&lt;&#x2F;h3&gt;
&lt;p&gt;IoBs are patterns of behavior expressed as actions taken by users or systems. Identifying the end-to-end pattern of an IoB across events generated by disparate systems aligns perfectly with the Quine graph data model.&lt;&#x2F;p&gt;
&lt;p&gt;Quine evaluates every single newly arriving event for partial or full match against defined IoB patterns. This incremental approach to evaluating data is paired with a highly efficient mechanism for persisting partial matches. The result is a threat detection solution that tracks millions or billions of suspect actions until there is a complete pattern match, at which point an event is generated to serve as an alert or to trigger an automated workflow.&lt;&#x2F;p&gt;
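&lt;p&gt;The incremental idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Quine implements standing queries: the &lt;code&gt;PartialMatchTracker&lt;&#x2F;code&gt; class, the event type names, and the simplification that a pattern is just a set of event types per entity are all hypothetical.&lt;&#x2F;p&gt;

```python
from collections import defaultdict

# Hypothetical pattern: the event types that together make up one behavior.
PATTERN = frozenset({"failed_login", "successful_login", "privilege_change"})


class PartialMatchTracker:
    """Keep partial matches per entity and report when a pattern completes."""

    def __init__(self, pattern):
        self.pattern = frozenset(pattern)
        self.seen = defaultdict(set)  # entity -> pattern event types seen so far

    def observe(self, entity, event_type):
        """Record one event; return True only when it completes the pattern."""
        if event_type not in self.pattern:
            return False  # irrelevant to this pattern, nothing to store
        before = len(self.seen[entity])
        self.seen[entity].add(event_type)
        completed = self.seen[entity] == self.pattern
        # Only report the event that newly completes the match.
        return completed and len(self.seen[entity]) != before


if __name__ == "__main__":
    tracker = PartialMatchTracker(PATTERN)
    # Events arrive out of order and interleaved across two accounts.
    events = [
        ("alice", "privilege_change"),
        ("bob", "failed_login"),
        ("alice", "failed_login"),
        ("alice", "file_read"),
        ("alice", "successful_login"),
    ]
    for entity, event_type in events:
        if tracker.observe(entity, event_type):
            print("pattern complete for", entity)
```

&lt;p&gt;Each event is examined once, and arrival order does not matter: the partial state for an entity simply waits until the last missing piece shows up. A real IoB pattern would also constrain the relationships between events, which is what the graph model adds.&lt;&#x2F;p&gt;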
&lt;p&gt;Incremental Evaluation Of Events For IoB Patterns Across Event Sources&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;63869f23ab1fa05dfe26e68f_XR1UlObRghyzxfJCyMAOM0_9ImbxHWiQmEFoUD2MRePu5iat8jWsZORmZ8LXQhUTlivHmxTW1ppLQDv0Kuyvg7_dX6UUBqAldfotLA1eJNgAajBSeFMPY9bRfXlnzUQk9I68JiF_Q4u_fZfPElQSPwvstoD31SjPf6zqbbXoUfegA8LSu9LGMtNVGRSwAw.png&quot; alt=&quot;Diagram showing multiple streams flowing into Quine via ingest queries. Quine populates the graph and waits for late arriving data, which then triggers a standing query. Results are emitted by Quine when a standing query match is made.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Image source: &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;assets.website-files.com&#x2F;61f0aecf55af2560f76f6a75&#x2F;620fd58ba117ef2365c2ab07_Quine_StreamingGraph_WP1.1.pdf&quot;&gt;Quine Streaming Graph White Paper&lt;&#x2F;a&gt; (PDF)&lt;&#x2F;p&gt;
&lt;h3 id=&quot;automated-responses&quot;&gt;Automated Responses&lt;&#x2F;h3&gt;
&lt;p&gt;CACAO provides a graph-based data model. As such, CACAO implementations should implement protections against graph queries that can potentially consume a significant amount of resources and prevent the implementation from functioning in a normal way.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;identifying-novel-new-behaviors&quot;&gt;Identifying Novel New Behaviors&lt;&#x2F;h2&gt;
&lt;p&gt;Of course, the most difficult part of threat hunting is identifying new threat vectors as near to the time when they first appear as possible. This is especially difficult since attackers are intentionally working to obscure their illicit behavior in large volumes of events. Approaches built on traditional anomaly detection have largely failed to detect sophisticated attacks without also generating a significant number of false positives, forcing reliance on manual human evaluation based on intuition and increasingly scarce security expertise.&lt;&#x2F;p&gt;
&lt;p&gt;thatDot Novelty Detector brings a fresh approach to the problem of detecting illicit behavior. Novelty Detector is a new graph AI technique built on the Quine streaming graph. As such, Novelty Detector natively uses categorical data in events, such as IP addresses, file names, file paths, and API call types, to fully understand the context of user and system actions. This rich context is used to evaluate behaviors via Information Theory analysis to identify novel new behaviors in real time, with an incredibly low incidence of false positives.&lt;&#x2F;p&gt;
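&lt;p&gt;Novelty Detector’s actual scoring method is not described in this post. As a generic illustration of what information-theoretic scoring of categorical data can look like, the Python sketch below computes the surprisal (negative log probability, in bits) of each value from running counts. The &lt;code&gt;SurprisalScorer&lt;&#x2F;code&gt; class and the add-one smoothing are assumptions chosen for this example, and the sketch ignores the graph context that the product uses.&lt;&#x2F;p&gt;

```python
import math
from collections import Counter


class SurprisalScorer:
    """Score categorical observations by how unexpected they are."""

    def __init__(self):
        self.counts = Counter()  # value -> times observed so far
        self.total = 0

    def score(self, value):
        """Return the surprisal of value in bits, then learn from it.

        Add-one smoothing reserves probability for unseen values,
        so the score is always finite.
        """
        distinct = len(self.counts) + 1  # +1 slot for "never seen"
        probability = (self.counts[value] + 1) / (self.total + distinct)
        self.counts[value] += 1
        self.total += 1
        return -math.log2(probability)


if __name__ == "__main__":
    scorer = SurprisalScorer()
    for _ in range(99):
        scorer.score("10.0.0.5")  # build up a history of one familiar address
    print("familiar address:", round(scorer.score("10.0.0.5"), 2), "bits")
    print("new address:", round(scorer.score("203.0.113.9"), 2), "bits")
```

&lt;p&gt;No conversion of the addresses into numeric features is needed: the values are counted as they are, and a rarely seen value scores many more bits than a familiar one.&lt;&#x2F;p&gt;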
&lt;p&gt;Once a new novel behavior is evaluated, it can then be encoded as a new IoB and fed into an operating Quine streaming graph system for immediate use on newly arriving data, or applied to previous data if desired.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;63879c28bce3ce07d0af98bc_2022-11%20thatDot%20Modern%20Threat%20Hunting.png&quot; alt=&quot;A flow diagram showing events ingested into Quine, which is using STIX IoB definitions to detect known attack vectors, then passed into Novelty Detector to find new, unknown vectors, which are fed back into STIX IoBs&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Separately, Quine streaming graph and Novelty Detector software offer unique capabilities for organizations and service providers: real-time processing of categorical data to find known IoB patterns (Quine) and emerging new threat patterns (Novelty Detector).&lt;&#x2F;p&gt;
&lt;p&gt;When combined as a single platform that uses industry standards for IoB definitions and intersystem communications, the result is a comprehensive modern threat hunting and remediation stack.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;thatdot-streaming-graph-delivers-scalable-threat-hunting&quot;&gt;thatDot Streaming Graph Delivers Scalable Threat Hunting&lt;&#x2F;h2&gt;
&lt;p&gt;Quine is available in both open source and enterprise (thatDot Streaming Graph) editions. However, Novelty Detector is available either in the AWS marketplace or under license as part of &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;products&#x2F;quine&#x2F;&quot;&gt;thatDot Streaming Graph&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;Streaming Graph offers large organizations and managed security service providers (MSSPs) both the clustered, resilient version of Quine and Novelty Detector. It is meant for production applications where resilience, query performance, and throughput matter. Resilient clustering includes support for hot spares and distribution across multiple availability zones.&lt;&#x2F;p&gt;
&lt;p&gt;We recently shared reproducible tests demonstrating both scale (thatDot Streaming Graph easily processed one million 4-node graph events&#x2F;second) and resilience in the face of node failure. You can &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;scaling-quine-streaming-graph-to-process-1-million-events-sec&#x2F;&quot;&gt;read about the tests here&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;try-it-yourself&quot;&gt;Try It Yourself&lt;&#x2F;h2&gt;
&lt;p&gt;If you want to try it on your own, here are some resources to help:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Download Quine - &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;download&quot;&gt;JAR file&lt;&#x2F;a&gt; | &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;hub.docker.com&#x2F;r&#x2F;thatdot&#x2F;quine&quot;&gt;Docker Image&lt;&#x2F;a&gt; | &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;thatdot&#x2F;quine&quot;&gt;Github&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Check out the Ingest Data into Quine blog series covering everything from &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;components&#x2F;ingest-sources&#x2F;kafka.html&quot;&gt;ingest from Kafka&lt;&#x2F;a&gt; to ingesting .CSV data&lt;&#x2F;li&gt;
&lt;li&gt;Password Spraying Attack Detection - this recipe provides an example of detecting brute force attack patterns in authentication logs&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;Header image adapted from photo by &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;unsplash.com&#x2F;@lianhao?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText&quot;&gt;Lianhao Qu&lt;&#x2F;a&gt; on &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;unsplash.com&#x2F;s&#x2F;photos&#x2F;surveillance?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText&quot;&gt;Unsplash.&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Monitoring Quine Streaming Graph using Grafana + InfluxDB</title>
        <published>2023-06-06T00:00:00+00:00</published>
        <updated>2023-06-06T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/monitoring-quine-streaming-graph-using-grafana-influxdb/"/>
        <id>https://www.thatdot.com/blog/monitoring-quine-streaming-graph-using-grafana-influxdb/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/monitoring-quine-streaming-graph-using-grafana-influxdb/">&lt;h2 id=&quot;monitoring-data-in-motion&quot;&gt;Monitoring Data in Motion&lt;&#x2F;h2&gt;
&lt;p&gt;There has been a significant increase in the popularity of event streaming and stream processing applications&#x2F;technologies within the data engineering community. With the accelerating growth of big data, IoT, and cloud computing, more organizations are facing the challenge of extracting actionable insights earlier in the event pipeline. For historical reasons, operational tools for monitoring, alerting, and diagnosing system issues are oriented toward data at rest. That doesn&#x27;t mean they can&#x27;t be just as useful for monitoring data in motion. It just means adjusting your monitoring regime to a streaming mindset.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;647a4b1958502a1a736ef0c6_Blueprint-2_-Multimodal-Data-Processing-1.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;From &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;a16z.com&#x2F;2020&#x2F;10&#x2F;15&#x2F;emerging-architectures-for-modern-data-infrastructure&#x2F;#section--15&quot;&gt;Emerging Architectures for Modern Data Infrastructure - Andreessen Horowitz&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;NOTE&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Darker boxes indicate new or meaningfully changed categories since v1 of the architecture in 2020; lighter colored boxes indicate categories that have largely remained unchanged. Gray boxes are considered less relevant to this blueprint.&lt;&#x2F;p&gt;
&lt;p&gt;A good example of a next-gen streaming infrastructure element is Quine. Quine is an event streaming technology designed to process graph-shaped event streams and produce high-value events in real time.&lt;&#x2F;p&gt;
&lt;p&gt;In this blog post, we&#x27;ll guide you through setting up Grafana backed by InfluxDB to monitor a Quine instance. We&#x27;ll show you how to configure Quine to send data to InfluxDB, create a dashboard in Grafana to visualize this data, and use Grafana&#x27;s powerful features to detect issues and anomalies in real time. By the end of this post, you&#x27;ll have a solid understanding of how to monitor event stream pipelines using Grafana and InfluxDB, and you&#x27;ll be equipped with the tools and knowledge needed to keep Quine running smoothly.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;setting-up-grafana-and-influxdb&quot;&gt;Setting up Grafana and InfluxDB&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;grafana.com&quot;&gt;Grafana&lt;&#x2F;a&gt; is a tool that helps you visualize and understand operational metrics data. It lets you create visual dashboards to monitor and analyze data from sources across your data infrastructure. DevOps teams use Grafana metrics dashboards to make informed decisions.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;647a4e919822bf9fe29b11ad_Quine%2BGrafana.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;The observability subsystem for Quine is built for Grafana integration.&lt;&#x2F;p&gt;
&lt;p&gt;Above is an example of my typical development and testing environment when working on a &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&quot;&gt;recipe&lt;&#x2F;a&gt;. The event sources and output sinks change depending on the scenario, but most of the time, I run Quine on my local host, configured to push metrics to InfluxDB and visualize the observations in Grafana. Using Docker containers makes it easy to configure and clean up my environment quickly.&lt;&#x2F;p&gt;
&lt;p&gt;We need to do a little pre-work before launching the Docker containers. This is how I set up my environment using &lt;code&gt;docker-compose&lt;&#x2F;code&gt;. You may do things differently based on how Docker is installed on your host.&lt;&#x2F;p&gt;
&lt;p&gt;I like to keep &lt;code&gt;docker-compose.yaml&lt;&#x2F;code&gt; files arranged inside their own directories in a &lt;code&gt;docker&lt;&#x2F;code&gt; directory that lives in &lt;code&gt;$HOME&lt;&#x2F;code&gt;. This helps me keep things organized and makes sharing configs between my macOS laptop and Ubuntu servers easy.&lt;&#x2F;p&gt;
&lt;p&gt;I created a &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine-recipe-public.s3.us-west-2.amazonaws.com&#x2F;quine-grafana-docker.zip&quot;&gt;zip file&lt;&#x2F;a&gt; of my config to download and use with the blog post.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;cd $HOME&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;wget https:&#x2F;&#x2F;quine-recipe-public.s3.us-west-2.amazonaws.com&#x2F;quine-grafana-docker.zip&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;unzip quine-grafana-docker.zip&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Archive:  quine-grafana-docker.zip&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  inflating: docker&#x2F;cassandra&#x2F;docker-compose.yaml&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  inflating: docker&#x2F;grafana&#x2F;docker-compose.yaml&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   creating: docker&#x2F;grafana&#x2F;grafana-provisioning&#x2F;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   creating: docker&#x2F;grafana&#x2F;grafana-provisioning&#x2F;datasources&#x2F;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  inflating: docker&#x2F;grafana&#x2F;grafana-provisioning&#x2F;datasources&#x2F;datasource.yml&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   creating: docker&#x2F;grafana&#x2F;grafana-provisioning&#x2F;dashboards&#x2F;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  inflating: docker&#x2F;grafana&#x2F;grafana-provisioning&#x2F;dashboards&#x2F;quine.json&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  inflating: docker&#x2F;grafana&#x2F;grafana-provisioning&#x2F;dashboards&#x2F;dashboard.yaml&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;strong&gt;NOTE&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;I included a &lt;code&gt;docker-compose&lt;&#x2F;code&gt; file for Cassandra in the zip archive. I won&#x27;t cover the Cassandra config in this article. The file is included as a reference if you choose to separate your persistent storage from the application to keep from competing for server resources. See the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;components&#x2F;persistors&#x2F;cassandra-setup&#x2F;&quot;&gt;Cassandra Persistor&lt;&#x2F;a&gt; docs for a sample configuration file.&lt;&#x2F;p&gt;
&lt;p&gt;You now have this directory structure in your &lt;code&gt;$HOME&lt;&#x2F;code&gt; dir.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;docker&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;├── cassandra&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;│   └── docker-compose.yaml&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;└── grafana&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    ├── docker-compose.yaml&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    └── grafana-provisioning&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        ├── dashboards&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        │   ├── dashboard.yaml&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        │   └── quine.json&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        └── datasources&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            └── datasource.yml&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;With Docker configured and the contents of &lt;code&gt;quine-grafana-docker.zip&lt;&#x2F;code&gt; extracted on your virtualization host, it&#x27;s time to start the containers so that they are ready to receive data from Quine.&lt;&#x2F;p&gt;
&lt;p&gt;Change into the &lt;code&gt;grafana&lt;&#x2F;code&gt; directory and start the InfluxDB&#x2F;Grafana stack:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;docker compose up -d&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;You should see something similar to this appear in your terminal window:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;[+] Running 18&#x2F;18&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt; ⠿ grafana Pulled                                            8.7s&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   ⠿ f56be85fc22e Pull complete                              2.8s&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   ⠿ 9efeca377709 Pull complete                              3.0s&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   ⠿ b4608283f0dd Pull complete                              3.5s&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   ⠿ 94ba646ecfcd Pull complete                              3.9s&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   ⠿ 6730f2b3d4cf Pull complete                              4.1s&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   ⠿ 871e090050be Pull complete                              4.4s&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   ⠿ 03d60ad4c029 Pull complete                              5.7s&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   ⠿ baaa3e79bf5c Pull complete                              7.6s&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   ⠿ 01c0c058d3df Pull complete                              7.7s&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt; ⠿ influxdb Pulled                                           9.6s&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   ⠿ 918547b94326 Pull complete                              7.4s&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   ⠿ 5d79063a01c5 Pull complete                              7.7s&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   ⠿ a8e9798c2a3f Pull complete                              7.8s&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   ⠿ e8074b4fc936 Pull complete                              8.5s&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   ⠿ a913b4722330 Pull complete                              8.5s&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   ⠿ 9c8265b2cf7a Pull complete                              8.6s&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   ⠿ 9037f1aeb9df Pull complete                              8.6s&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;[+] Running 4&#x2F;4&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt; ⠿ Volume &amp;quot;grafana_grafana-storage&amp;quot;   Created                0.0s&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt; ⠿ Volume &amp;quot;grafana_influxdb-storage&amp;quot;  Created                0.0s&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt; ⠿ Container grafana-influxdb-1       Started                0.5s&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt; ⠿ Container grafana-grafana-1        Started                0.7s&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Verify that the containers are running:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;docker ps --format &amp;quot;table {{.Names}}\t{{.Status}}\t{{.Ports}}&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;NAMES                 STATUS           PORTS&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;grafana-grafana-1     Up 4 seconds     0.0.0.0:3000-&amp;gt;3000&#x2F;tcp&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;grafana-influxdb-1    Up 4 seconds     0.0.0.0:8086-&amp;gt;8086&#x2F;tcp&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Congratulations! 🎉 InfluxDB and Grafana are running in separate containers and listening on their default ports.&lt;&#x2F;p&gt;
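&lt;p&gt;If you would rather script the check, a minimal Python sketch like the following tests whether the two published ports accept TCP connections. It assumes the containers run on &lt;code&gt;localhost&lt;&#x2F;code&gt; with the ports shown in the &lt;code&gt;docker ps&lt;&#x2F;code&gt; output above (3000 for Grafana, 8086 for InfluxDB). The function names are illustrative and not part of any Quine or Docker tooling.&lt;&#x2F;p&gt;

```python
import socket

# Assumed defaults, taken from the docker ps output: Grafana on 3000, InfluxDB on 8086.
DEFAULT_PORTS = {"grafana": 3000, "influxdb": 8086}


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_stack(host="localhost", ports=None):
    """Map each service name to whether its port accepts connections."""
    ports = DEFAULT_PORTS if ports is None else ports
    return {name: port_is_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, is_up in check_stack().items():
        print(f"{name}: {'listening' if is_up else 'not reachable'}")
```

&lt;p&gt;An open port only shows that something is listening. It does not confirm that Grafana or InfluxDB has finished starting.&lt;&#x2F;p&gt;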
&lt;h2 id=&quot;configuring-quine-to-send-metrics-data&quot;&gt;Configuring Quine to Send Metrics Data&lt;&#x2F;h2&gt;
&lt;p&gt;Enable metrics reporting in Quine with configuration parameters, passed either as Java system properties with &lt;code&gt;-D&lt;&#x2F;code&gt; or in a &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;reference&#x2F;config&#x2F;configuration&#x2F;&quot;&gt;Quine configuration&lt;&#x2F;a&gt; file. Quine can report metrics to &lt;code&gt;jmx&lt;&#x2F;code&gt;, &lt;code&gt;csv&lt;&#x2F;code&gt;, &lt;code&gt;influxdb&lt;&#x2F;code&gt;, and &lt;code&gt;slf4j&lt;&#x2F;code&gt; for analysis. The &lt;code&gt;jmx&lt;&#x2F;code&gt; metrics reporter is enabled by default.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;java \&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    -Xmx12G -Xms12G \&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    -Dquine.metrics-reporters.1.type=influxdb \&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    -Dquine.metrics-reporters.1.database=db0 \&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    -Dquine.metrics-reporters.1.period=30s \&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    -Dquine.metrics-reporters.1.host={container_host} \&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    -jar quine-1.5.4.jar \&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    -r wikipedia --force-config&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Note two things when passing configuration as system properties:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;-D&lt;&#x2F;code&gt; parameters must come before &lt;code&gt;-jar&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;When launching Quine with a recipe (&lt;code&gt;-r&lt;&#x2F;code&gt;) you also have to pass &lt;code&gt;--force-config&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
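&lt;p&gt;The two rules above can be captured in a small launcher script. This is a sketch rather than part of Quine: &lt;code&gt;quine_command&lt;&#x2F;code&gt; is a hypothetical helper, and &lt;code&gt;localhost&lt;&#x2F;code&gt; stands in for &lt;code&gt;{container_host}&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;

```python
import shlex


def quine_command(jar, properties, recipe=None, heap="12G"):
    """Build a java argument list with every -D property placed before -jar."""
    args = ["java", f"-Xmx{heap}", f"-Xms{heap}"]
    # System properties must precede -jar; anything after the jar file
    # is handed to the application as a program argument instead.
    args += [f"-D{key}={value}" for key, value in properties.items()]
    args += ["-jar", jar]
    if recipe is not None:
        # The article pairs a recipe launch (-r) with --force-config.
        args += ["-r", recipe, "--force-config"]
    return args


reporter = {
    "quine.metrics-reporters.1.type": "influxdb",
    "quine.metrics-reporters.1.database": "db0",
    "quine.metrics-reporters.1.period": "30s",
    "quine.metrics-reporters.1.host": "localhost",  # assumed container host
}
command = quine_command("quine-1.5.4.jar", reporter, recipe="wikipedia")
print(shlex.join(command))
```

&lt;p&gt;Building the argument list in one place makes it hard to put a &lt;code&gt;-D&lt;&#x2F;code&gt; flag after &lt;code&gt;-jar&lt;&#x2F;code&gt; by accident. The list can be passed to &lt;code&gt;subprocess.run&lt;&#x2F;code&gt; as is.&lt;&#x2F;p&gt;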
&lt;p&gt;Alternatively, you can store the same settings in a configuration file. Create a &lt;code&gt;metrics.conf&lt;&#x2F;code&gt; file containing the HOCON configuration from the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;reference&#x2F;config&#x2F;configuration&#x2F;&quot;&gt;documentation&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;quine {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  # where metrics collected by the application should be reported&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  metrics-reporters = [&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      # Report metrics to an influxdb (version 1) database&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      type = influxdb&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      # required by influxdb - the interval at which new records will&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      # be written to the database&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      period = 30&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      # Connection information for the influxdb database&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      database = db0&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      scheme = http&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      host = {container_host}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      port = 8086&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      # Authentication information for the influxdb database. Both&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      # fields may be omitted&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      # user = admin&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      # password = admin&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  ]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Then launch Quine, passing the configuration file on the command line.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;java -Dconfig.file=metrics.conf -jar quine-1.5.4.jar -r wikipedia --force-config&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;h2 id=&quot;quine-metrics&quot;&gt;Quine Metrics&lt;&#x2F;h2&gt;
&lt;p&gt;Quine reports three classes of metrics: counters, timers, and gauges.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;TIP!&lt;&#x2F;strong&gt;&lt;br &#x2F;&gt;
When queried, the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;reference&#x2F;rest-api&#x2F;#&#x2F;paths&#x2F;api-v1-admin-metrics&#x2F;get&quot;&gt;metrics summary&lt;&#x2F;a&gt; API endpoint reports the same metrics as a metrics reporter.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;counters&quot;&gt;Counters&lt;&#x2F;h3&gt;
&lt;p&gt;Quine uses counters to accumulate the number of times that events occur. Counters can return either a value or a histogram.&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;node.edge-counts.*&lt;&#x2F;code&gt;: Histogram-style summaries of edges per node&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;node.property-counts.*&lt;&#x2F;code&gt;: Histogram-style summaries of properties per node&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;shard.*.sleep-counters&lt;&#x2F;code&gt;: Count the lifecycle state of nodes managed by a shard&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;timers&quot;&gt;Timers&lt;&#x2F;h3&gt;
&lt;p&gt;Quine reports the elapsed time in milliseconds it takes to perform persistor operations.&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;persistor.get-journal&lt;&#x2F;code&gt;: Time taken to read and deserialize a single node&#x27;s relevant journal&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;persistor.persist-event&lt;&#x2F;code&gt;: Time taken to serialize and persist one message&#x27;s worth of on-node events&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;persistor.get-latest-snapshot&lt;&#x2F;code&gt;: Time taken to read (but not deserialize) a single node snapshot&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;gauges&quot;&gt;Gauges&lt;&#x2F;h3&gt;
&lt;p&gt;Gauges report the value of a metric at the moment it is sampled.&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;memory.heap.*&lt;&#x2F;code&gt;: JVM heap usage&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;memory.total&lt;&#x2F;code&gt;: JVM combined memory usage&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;shared.valve.ingest&lt;&#x2F;code&gt;: Number of current requests to slow ingest for another part of Quine to catch up&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;dgn-reg.count&lt;&#x2F;code&gt;: Number of in-memory registered DomainGraphNodes&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
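&lt;p&gt;To see which metrics fall into each class on a running instance, you can query the metrics summary endpoint mentioned in the tip above. The sketch below makes two assumptions: that Quine&#x27;s web server is reachable at &lt;code&gt;http:&#x2F;&#x2F;localhost:8080&lt;&#x2F;code&gt;, and that the JSON response groups entries under &lt;code&gt;counters&lt;&#x2F;code&gt;, &lt;code&gt;timers&lt;&#x2F;code&gt;, and &lt;code&gt;gauges&lt;&#x2F;code&gt; keys, with each entry carrying a &lt;code&gt;name&lt;&#x2F;code&gt;. Check the REST API docs for the exact shape. The sample values are made up.&lt;&#x2F;p&gt;

```python
import json
from urllib.request import urlopen

# Assumption: Quine's web server listens on local port 8080.
METRICS_URL = "http://localhost:8080/api/v1/admin/metrics"


def fetch_metrics(url=METRICS_URL):
    """GET the metrics summary and decode the JSON body."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def names_by_class(report):
    """Return the sorted metric names found under each metric class."""
    return {
        kind: sorted(entry["name"] for entry in report.get(kind, []))
        for kind in ("counters", "timers", "gauges")
    }


# Hypothetical sample in the assumed response shape; a real report carries more fields.
sample = {
    "counters": [{"name": "node.edge-counts.1-7", "count": 12}],
    "timers": [{"name": "persistor.persist-event", "mean": 0.8}],
    "gauges": [
        {"name": "memory.heap.used", "value": 1024.0},
        {"name": "dgn-reg.count", "value": 3.0},
    ],
}
print(names_by_class(sample))
```

&lt;p&gt;Against a live instance, call &lt;code&gt;names_by_class(fetch_metrics())&lt;&#x2F;code&gt; instead of passing the sample.&lt;&#x2F;p&gt;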
&lt;h2 id=&quot;create-a-dashboard-in-grafana&quot;&gt;Create a Dashboard in Grafana&lt;&#x2F;h2&gt;
&lt;p&gt;A dashboard in Grafana contains a series of panels that provide an at-a-glance view of how Quine is performing.&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Log into Grafana. The username and password for the container are admin:admin.&lt;&#x2F;li&gt;
&lt;li&gt;Decide whether to change the default password or skip that step.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;If you launched Grafana using the &lt;code&gt;docker-compose&lt;&#x2F;code&gt; files from the &lt;code&gt;quine-grafana-docker.zip&lt;&#x2F;code&gt; file that I provided, you will see a dashboard called &quot;Quine - Monitor a Recipe&quot; in the lower left-hand corner of the Dashboards card. Click on that dashboard to open it. Initially, the dashboard will be empty. It will fill in as you run a recipe.&lt;&#x2F;p&gt;
&lt;p&gt;Let&#x27;s start Quine with the Wikipedia recipe and the &lt;strong&gt;&lt;code&gt;metrics.conf&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; file from above to get familiar with each visualization.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;java -Dconfig.file=metrics.conf -jar quine-1.5.4.jar -r wikipedia --force-config&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Metrics will populate the dashboard after about 30 seconds once Quine is running. You may need to reload your browser to have Grafana pull all of the metrics from InfluxDB. Also, be sure to set the time range in the upper right corner of the dashboard to &quot;Last 15 minutes&quot; to ensure that you have a current time range selected to visualize.&lt;&#x2F;p&gt;
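&lt;p&gt;If the dashboard stays empty, it can help to confirm that metrics are reaching InfluxDB at all. The sketch below assumes the container runs InfluxDB 1.x on &lt;code&gt;localhost:8086&lt;&#x2F;code&gt; without authentication and that Quine writes to the &lt;code&gt;db0&lt;&#x2F;code&gt; database configured earlier. It calls the InfluxDB 1.x &lt;code&gt;&#x2F;query&lt;&#x2F;code&gt; HTTP endpoint with &lt;code&gt;SHOW MEASUREMENTS&lt;&#x2F;code&gt;. The helper names are illustrative.&lt;&#x2F;p&gt;

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def measurements_url(host="localhost", port=8086, database="db0"):
    """Build an InfluxDB 1.x /query URL that lists the measurements in a database."""
    query = urlencode({"db": database, "q": "SHOW MEASUREMENTS"})
    return f"http://{host}:{port}/query?{query}"


def measurement_names(payload):
    """Extract measurement names from a decoded InfluxDB 1.x query response."""
    names = []
    for result in payload.get("results", []):
        # A result without a "series" key means the database holds no measurements yet.
        for series in result.get("series", []):
            names.extend(row[0] for row in series.get("values", []))
    return names


def list_measurements(**kwargs):
    """Query InfluxDB and return the measurement names it currently holds."""
    with urlopen(measurements_url(**kwargs), timeout=10) as response:
        return measurement_names(json.load(response))
```

&lt;p&gt;With the stack running, &lt;code&gt;list_measurements()&lt;&#x2F;code&gt; should return a non-empty list once Quine has reported at least one period of metrics. An empty list suggests checking the reporter&#x27;s &lt;code&gt;host&lt;&#x2F;code&gt; and &lt;code&gt;database&lt;&#x2F;code&gt; settings.&lt;&#x2F;p&gt;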
&lt;p&gt;Your dashboard will begin to populate like this:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;647a54b43c232b1061ef656c_wikipedia-dashboard.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;A Grafana dashboard view for Quine running the Wikipedia ingest recipe.&lt;&#x2F;p&gt;
&lt;p&gt;Hover over each graph in the dashboard to expose a &quot;three-dot&quot; menu in the upper right hand corner of the panel. Click on the menu and select &quot;edit&quot; to review how each visualization is configured. Some visualizations use the query builder, and some are written directly as an InfluxDB query.&lt;&#x2F;p&gt;
&lt;p&gt;Please modify the dashboard to match your environment and satisfy your needs.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-i-ve-learned-monitoring-quine&quot;&gt;What I&#x27;ve Learned Monitoring Quine&lt;&#x2F;h2&gt;
&lt;p&gt;Monitoring a streaming graph is similar to monitoring any other database, with a few additional key metrics to watch.&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Quine is backpressured, which means that the performance of the persistence subsystem affects the flow of events in the graph.&lt;&#x2F;li&gt;
&lt;li&gt;Java garbage collection impacts backpressure. It is normal for Quine ingest rates to fluctuate as Java manages the heap. Keep an eye on when your heap consumption approaches the max memory configured for Java. I&#x27;ve found the best performance when launching Quine with a 12G (&lt;code&gt;-Xmx12G -Xms12G&lt;&#x2F;code&gt;) memory allocation pool.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
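&lt;p&gt;One way to watch for heap pressure is to compare heap usage samples against the configured maximum. The sketch below is illustrative only: the 90% threshold is an assumption rather than a Quine recommendation, and the samples are made-up values standing in for a &lt;code&gt;memory.heap.*&lt;&#x2F;code&gt; gauge read back from InfluxDB.&lt;&#x2F;p&gt;

```python
GIB = 1024 ** 3


def heap_utilization(used_bytes, max_bytes):
    """Return heap usage as a fraction of the configured maximum (-Xmx)."""
    return used_bytes / max_bytes


def heap_warnings(samples, max_bytes, threshold=0.9):
    """Return the (timestamp, fraction) samples at or above the threshold."""
    return [
        (timestamp, round(heap_utilization(used, max_bytes), 3))
        for timestamp, used in samples
        if heap_utilization(used, max_bytes) >= threshold
    ]


# Hypothetical (seconds, heap-used bytes) samples against a 12G heap.
samples = [(0, 6 * GIB), (30, 10 * GIB), (60, 11 * GIB), (90, 7 * GIB)]
print(heap_warnings(samples, 12 * GIB))
```

&lt;p&gt;The same comparison can be expressed as a Grafana alert rule. The Python version is just a compact way to show the arithmetic.&lt;&#x2F;p&gt;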
&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;&#x2F;h2&gt;
&lt;p&gt;The metrics dashboard built into the Exploration UI is good for understanding how Quine is currently operating. However, monitoring the performance of a recipe or solution over time requires a DevOps tool like Grafana. This blog will get you up and running with a sample dashboard that replicates all of the gauges in the Exploration UI that you can modify to suit your needs.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Calculate Risk and Optimize Asset Allocation in Real Time</title>
        <published>2023-05-31T00:00:00+00:00</published>
        <updated>2023-05-31T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/calculate-risk-and-optimize-asset-allocation-in-real-time/"/>
        <id>https://www.thatdot.com/blog/calculate-risk-and-optimize-asset-allocation-in-real-time/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/calculate-risk-and-optimize-asset-allocation-in-real-time/">&lt;h2 id=&quot;the-hidden-cost-of-batch-processing-for-financial-institutions&quot;&gt;The Hidden Cost of Batch Processing for Financial Institutions&lt;&#x2F;h2&gt;
&lt;p&gt;The recent failures at financial institutions like First Republic Bank, Signature Bank, and even Silicon Valley Bank have brought issues of regulatory compliance and capital management to the forefront for both industry members and the wider public alike.&lt;&#x2F;p&gt;
&lt;p&gt;One thing these events have exposed is that the financial industry largely relies on batch processing to manage mandated operational risk capital requirements, an approach that is ill-suited to the direction both the market and compliance are heading. Operationally, batch processing is time-consuming, costly, and often must take place in constrained time windows between market close and open.&lt;&#x2F;p&gt;
&lt;p&gt;The knock-on financial effects of batch processing&#x27;s operational limitations are more significant: institutions are slow to react to changing market conditions, which can lead to over- or under-allocation of certain classes of funds.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;real-time-risk-calculation-and-asset-allocation&quot;&gt;Real-time Risk Calculation and Asset Allocation&lt;&#x2F;h2&gt;
&lt;p&gt;Using Quine streaming graph, financial institutions can respond to market changes in real time, providing adequate coverage for risk exposure while ensuring compliance minimally affects asset allocation.&lt;&#x2F;p&gt;
&lt;p&gt;At a high level, Quine accomplishes this by doing what it does best: combining multiple feeds in real-time to build hierarchical models of elements like markets, trading entities, risk classes, and asset values, that adjust in real time to changing market conditions.&lt;&#x2F;p&gt;
&lt;p&gt;At a specific level, we have created a Quine recipe that demonstrates, in the context of regulatory monitoring requirements like the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.bis.org&#x2F;basel_framework&#x2F;standard&#x2F;LCR.htm&quot;&gt;Basel III Liquidity Coverage Ratio (LCR)&lt;&#x2F;a&gt;, &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.bis.org&#x2F;bcbs&#x2F;publ&#x2F;d295.htm&quot;&gt;Net Stable Funding Ratio (NSFR)&lt;&#x2F;a&gt;, and liquidity risk monitoring tools as described in &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.bis.org&#x2F;bcbs&#x2F;basel3.htm&quot;&gt;https:&#x2F;&#x2F;www.bis.org&#x2F;bcbs&#x2F;basel3.htm&lt;&#x2F;a&gt;, the following:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Calculating risk while taking into account complex interdependencies and rules.&lt;&#x2F;li&gt;
&lt;li&gt;Constantly recomputing liquidity-indexed risk to determine capital requirements relative to market conditions.&lt;&#x2F;li&gt;
&lt;li&gt;Normalizing multiple sources to calculate the relative value of assets, then rolling up the results to determine near-real-time liquidity in the event that liquidation is necessary.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
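&lt;p&gt;For readers unfamiliar with the LCR, the ratio itself is simple to state: the stock of high-quality liquid assets (HQLA) divided by total net cash outflows over the next 30 calendar days, with a minimum requirement of 100%. Net outflows are expected outflows minus expected inflows, where inflows are capped at 75% of outflows. The Python sketch below works through that arithmetic with made-up figures. It is not taken from the recipe, and the function names are illustrative.&lt;&#x2F;p&gt;

```python
def net_cash_outflows(outflows, inflows, inflow_cap=0.75):
    """Expected 30-day outflows minus inflows, with inflows capped at 75% of outflows."""
    return outflows - min(inflows, inflow_cap * outflows)


def liquidity_coverage_ratio(hqla, outflows, inflows):
    """LCR = stock of HQLA / total net cash outflows over the next 30 days."""
    return hqla / net_cash_outflows(outflows, inflows)


# Hypothetical figures, in millions.
lcr = liquidity_coverage_ratio(hqla=120.0, outflows=200.0, inflows=120.0)
print(f"LCR: {lcr:.0%}")
```

&lt;p&gt;A batch process computes this ratio once per run. A streaming approach recomputes it whenever an input changes.&lt;&#x2F;p&gt;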
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;64765b19bdb520e88f2967a6_3bb29783.png&quot; alt=&quot;A graph that shows the hierarchical nature of the graph for this particular problem domain.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;A sample view from the graph this recipe generates.&lt;&#x2F;p&gt;
&lt;p&gt;The recipe can be found &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;finance&#x2F;&quot;&gt;here&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;quine-developer-site-2-0&quot;&gt;Quine Developer Site 2.0&lt;&#x2F;h2&gt;
&lt;p&gt;As part of our continued focus on improving the Quine developer experience, we’ve made significant changes to the Quine.io site.&lt;&#x2F;p&gt;
&lt;p&gt;The most notable change is a total restructuring of the recipe pages to interleave code and contextual or documentary information. Recipe documentation now includes a full walkthrough of a recipe and an explanation of how the recipe works so that recipes can also act as training material.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;64765f2127f0f94621dde110_Recipe%20Screen%20Shot%20-%20Example.png&quot; alt=&quot;Recipe page example showing sidebar navigation, sections that include Scenario, How it Works, and a breakdown of standing and ingest queries, and a link to the full recipe.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;The new, more structured recipe page.&lt;&#x2F;p&gt;
&lt;p&gt;Other changes include:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Improved developer journey by separating tutorials (&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;getting-started&#x2F;&quot;&gt;getting started&lt;&#x2F;a&gt;), &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;docs&#x2F;&quot;&gt;technical docs&lt;&#x2F;a&gt;, and &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;&quot;&gt;recipe docs&lt;&#x2F;a&gt; into their own sections of the site&lt;&#x2F;li&gt;
&lt;li&gt;Full release notes and release history included in the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;download&#x2F;&quot;&gt;downloads&lt;&#x2F;a&gt; page&lt;&#x2F;li&gt;
&lt;li&gt;Direct links to Quine blog posts, events, and self-service demos are now on the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;info&#x2F;&quot;&gt;info&lt;&#x2F;a&gt; page&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;You can still download and easily get started with Quine (hint, hint) and we’d love to &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;that.re&#x2F;quine-slack&quot;&gt;hear your feedback&lt;&#x2F;a&gt; and add features you think might help you build great things with Quine.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Create a Quine Icon Library with Python</title>
        <published>2023-04-25T00:00:00+00:00</published>
        <updated>2023-04-25T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/create-a-quine-icon-library-with-python/"/>
        <id>https://www.thatdot.com/blog/create-a-quine-icon-library-with-python/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/create-a-quine-icon-library-with-python/">&lt;p&gt;Have you ever wanted to add flair to a graph visualization but are unsure which icons Quine supports? In this blog, we explore a Python script that fetches valid icon names from the web, configures the Exploration UI, then creates a graph of icon nodes for reference. The script uses several popular Python libraries, including Requests, BeautifulSoup, and Halo, along with the &lt;code&gt;&#x2F;query-ui&lt;&#x2F;code&gt; and &lt;code&gt;&#x2F;query&#x2F;cypher&lt;&#x2F;code&gt; API endpoints.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;environment&quot;&gt;Environment&lt;&#x2F;h2&gt;
&lt;p&gt;Before we start, we need to ensure that the prerequisites below are in place. The Python libraries (&lt;code&gt;requests&lt;&#x2F;code&gt;, &lt;code&gt;beautifulsoup4&lt;&#x2F;code&gt;, &lt;code&gt;log_symbols&lt;&#x2F;code&gt;, and &lt;code&gt;halo&lt;&#x2F;code&gt;) can be installed using &lt;code&gt;pip&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;thatdot&#x2F;quine&#x2F;releases&#x2F;latest&quot;&gt;Quine&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Python 3&lt;&#x2F;li&gt;
&lt;li&gt;Requests library (&lt;code&gt;pip install requests&lt;&#x2F;code&gt;)&lt;&#x2F;li&gt;
&lt;li&gt;BeautifulSoup library (&lt;code&gt;pip install beautifulsoup4&lt;&#x2F;code&gt;)&lt;&#x2F;li&gt;
&lt;li&gt;Optional Halo library for operation visuals  (&lt;code&gt;pip install log-symbols halo&lt;&#x2F;code&gt;)&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Start Quine so that it is ready to run the script.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;code&gt;java -jar quine-1.5.3.jar&lt;&#x2F;code&gt;&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-script&quot;&gt;The Script&lt;&#x2F;h2&gt;
&lt;p&gt;The script begins by importing the required libraries:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;import requests&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;import json&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;from halo import Halo&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;from log_symbols import LogSymbols&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;from bs4 import BeautifulSoup&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;h2 id=&quot;build-a-list-of-icon-names&quot;&gt;Build a list of icon names&lt;&#x2F;h2&gt;
&lt;p&gt;We use the &lt;code&gt;requests&lt;&#x2F;code&gt; library to GET the webpage referenced in the Replace Node Appearances API documentation. Quine supports version 2.0.0 of the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;ionic.io&#x2F;ionicons&#x2F;v2&#x2F;cheatsheet.html&quot;&gt;Ionicons&lt;&#x2F;a&gt; icon set from the Ionic Framework. The link contains a list of 733 icons supported by Quine. A &lt;code&gt;try...except&lt;&#x2F;code&gt; block handles any errors that might occur during the request. If the request is successful, the script saves the HTML content of the page.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;try:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    url = &amp;quot;https:&#x2F;&#x2F;ionic.io&#x2F;ionicons&#x2F;v2&#x2F;cheatsheet.html&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
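&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    response = requests.get(url)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    # Treat HTTP error statuses (4xx&#x2F;5xx) as failures too&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    response.raise_for_status()&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;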
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    html = response.content&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    print(LogSymbols.SUCCESS.value, &amp;quot;GET Icon Cheatsheet&amp;quot;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;except requests.exceptions.RequestException as e:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    raise SystemExit(e)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Next, we use BeautifulSoup to parse the HTML content of the page and extract all of the icon names. The &lt;code&gt;soup.select&lt;&#x2F;code&gt; method takes a CSS selector; &lt;code&gt;input.name&lt;&#x2F;code&gt; matches every &lt;code&gt;&amp;lt;input&amp;gt;&lt;&#x2F;code&gt; element with the class &lt;code&gt;name&lt;&#x2F;code&gt; and returns them as a list. We loop over that list later to read the &lt;code&gt;value&lt;&#x2F;code&gt; attribute of each tag. We output &lt;code&gt;len(all_icons)&lt;&#x2F;code&gt; to verify that we identified all of the icons.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;soup = BeautifulSoup(html, &amp;quot;html.parser&amp;quot;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;all_icons = soup.select(&amp;quot;input.name&amp;quot;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;print(LogSymbols.SUCCESS.value, &amp;quot;Extract Icon Names:&amp;quot;, len(all_icons))&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;h2 id=&quot;create-node-appearances&quot;&gt;Create Node Appearances&lt;&#x2F;h2&gt;
&lt;p&gt;Now that we have the icon names, we can use them to create node appearances for the Quine Exploration UI. We&#x27;ll use the &lt;code&gt;json&lt;&#x2F;code&gt; package to format the &lt;code&gt;nodeAppearances&lt;&#x2F;code&gt; data as JSON, and &lt;code&gt;requests&lt;&#x2F;code&gt; to replace the current &lt;code&gt;nodeAppearances&lt;&#x2F;code&gt; with a PUT to the &lt;code&gt;&#x2F;query-ui&#x2F;node-appearances&lt;&#x2F;code&gt; endpoint. We wrap the API call in &lt;code&gt;try...except&lt;&#x2F;code&gt; as before to handle any errors. Each entry in &lt;code&gt;nodeAppearances&lt;&#x2F;code&gt; sets four fields:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;predicate&lt;&#x2F;code&gt;: a filter that selects which nodes this style applies to&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;size&lt;&#x2F;code&gt;: the size of the icon in pixels&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;icon&lt;&#x2F;code&gt;: the name of the icon&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;label&lt;&#x2F;code&gt;: the text displayed on the node (here, the value of its &lt;code&gt;name&lt;&#x2F;code&gt; property)&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Note:&lt;&#x2F;strong&gt;&lt;&#x2F;em&gt; &lt;em&gt;Cypher does not allow dash (&lt;code&gt;-&lt;&#x2F;code&gt;) characters in node labels. We get around this by replacing all of the dashes with underscores in the node labels.&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;nodeAppearances = [&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;predicate&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;propertyKeys&amp;quot;: [],&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;knownValues&amp;quot;: {},&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;dbLabel&amp;quot;: icon_name[&amp;quot;value&amp;quot;].replace(&amp;quot;-&amp;quot;, &amp;quot;_&amp;quot;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        },&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;size&amp;quot;:40.0,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;icon&amp;quot;: icon_name[&amp;quot;value&amp;quot;],&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;label&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;key&amp;quot;: &amp;quot;name&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;type&amp;quot;: &amp;quot;Property&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    } for icon_name in all_icons]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;json_data = json.dumps(nodeAppearances)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;try:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    headers = {&amp;quot;Content-type&amp;quot;: &amp;quot;application&#x2F;json&amp;quot;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    response = requests.put(&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;http:&#x2F;&#x2F;localhost:8080&#x2F;api&#x2F;v1&#x2F;query-ui&#x2F;node-appearances&amp;quot;, data=json_data, headers=headers)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;except requests.exceptions.RequestException as e:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    raise SystemExit(e)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;print(LogSymbols.SUCCESS.value, &amp;quot;PUT Node Appearances&amp;quot;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;h2 id=&quot;create-icon-nodes&quot;&gt;Create Icon Nodes&lt;&#x2F;h2&gt;
&lt;p&gt;Finally, our script creates icon nodes by sending a series of POST requests to the Quine &lt;code&gt;&#x2F;query&#x2F;cypher&lt;&#x2F;code&gt; endpoint. For each icon name, a Cypher query creates the corresponding icon node and connects it to the appropriate group node. We use &lt;code&gt;Halo&lt;&#x2F;code&gt; to create a spinner while we POST the icon data to Quine.&lt;&#x2F;p&gt;
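&lt;p&gt;To make the query text concrete, here is roughly what the script would send for a three-part icon name such as &lt;code&gt;ion-android-add&lt;&#x2F;code&gt; (an illustrative name, assumed to be present in the cheatsheet). It is reformatted across lines for readability; the script itself builds it as a single line. This is a sketch of the expansion, not output captured from a run:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (a), (b), (c)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WHERE id(a) = idFrom(&amp;quot;ion&amp;quot;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  AND id(b) = idFrom(&amp;quot;android&amp;quot;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  AND id(c) = idFrom(&amp;quot;ion-android-add&amp;quot;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;SET a:ion, a.name = &amp;quot;ion&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;SET b:android, b.name = &amp;quot;android&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;SET c:ion_android_add, c.name = &amp;quot;ion-android-add&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;CREATE (a)&amp;lt;-[:GROUP]-(b)&amp;lt;-[:GROUP]-(c)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Two-part names such as &lt;code&gt;ion-alert&lt;&#x2F;code&gt; skip the middle node &lt;code&gt;b&lt;&#x2F;code&gt; and attach the icon node directly to &lt;code&gt;ion&lt;&#x2F;code&gt;. The Python that assembles both forms follows:&lt;&#x2F;p&gt;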
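&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;# Spinner setup is not shown in this excerpt; the text and style here are assumed&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;quineSpinner = Halo(text=&amp;#39;POST Icon Nodes&amp;#39;, spinner=&amp;#39;dots&amp;#39;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;try:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;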
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    quineSpinner.start()&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    for icon_name in all_icons:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        group = icon_name[&amp;quot;value&amp;quot;].split(&amp;#39;-&amp;#39;,2)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        query_text = (&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            f&amp;#39;MATCH (a), (b), (c) &amp;#39;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            f&amp;#39;WHERE id(a) = idFrom(&amp;quot;{group[0]}&amp;quot;) &amp;#39;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            f&amp;#39;  AND id(b) = idFrom(&amp;quot;{group[1]}&amp;quot;) &amp;#39;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            f&amp;#39;  AND id(c) = idFrom(&amp;quot;{icon_name[&amp;quot;value&amp;quot;]}&amp;quot;) &amp;#39;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            f&amp;#39;SET a:{group[0]}, a.name = &amp;quot;{group[0]}&amp;quot; &amp;#39;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            f&amp;#39;SET b:{group[1]}, b.name = &amp;quot;{group[1]}&amp;quot; &amp;#39;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            f&amp;#39;SET c:{icon_name[&amp;quot;value&amp;quot;].replace(&amp;quot;-&amp;quot;, &amp;quot;_&amp;quot;)}, c.name = &amp;quot;{icon_name[&amp;quot;value&amp;quot;]}&amp;quot; &amp;#39;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            f&amp;#39;CREATE (a)&amp;lt;-[:GROUP]-(b)&amp;lt;-[:GROUP]-(c)&amp;#39;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          ) if len(group) == 3 else (&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            f&amp;#39;MATCH (a), (c) &amp;#39;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            f&amp;#39;WHERE id(a) = idFrom(&amp;quot;{group[0]}&amp;quot;) &amp;#39;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            f&amp;#39; AND id(c) = idFrom(&amp;quot;{icon_name[&amp;quot;value&amp;quot;]}&amp;quot;) &amp;#39;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            f&amp;#39;SET a:{group[0]}, a.name = &amp;quot;{group[0]}&amp;quot; &amp;#39;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            f&amp;#39;SET c:{icon_name[&amp;quot;value&amp;quot;].replace(&amp;quot;-&amp;quot;, &amp;quot;_&amp;quot;)}, c.name = &amp;quot;{icon_name[&amp;quot;value&amp;quot;]}&amp;quot; &amp;#39;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            f&amp;#39;CREATE (a)&amp;lt;-[:GROUP]-(c)&amp;#39;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          )&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        quineSpinner.text = query_text&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        headers = {&amp;#39;Content-type&amp;#39;: &amp;#39;text&#x2F;plain&amp;#39;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        # print(query_text)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        response = requests.post(&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;#39;http:&#x2F;&#x2F;localhost:8080&#x2F;api&#x2F;v1&#x2F;query&#x2F;cypher&amp;#39;, data=query_text, headers=headers)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    quineSpinner.succeed(&amp;#39;POST Icon Nodes&amp;#39;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;except requests.exceptions.Timeout as timeout:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    quineSpinner.fail(&amp;#39;Request Timeout: &amp;#39; + str(timeout))&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;except requests.exceptions.RequestException as e:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    raise SystemExit(e)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;h2 id=&quot;running-the-script&quot;&gt;Running the script&lt;&#x2F;h2&gt;
&lt;p&gt;At this point, we are ready to run the script and visualize the icons supported in Quine.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;code&gt;python3 iconLibrary.py&lt;&#x2F;code&gt;&lt;&#x2F;p&gt;
&lt;p&gt;The script updates the console as it moves through the blocks of code that we described above:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;✔ GET Icon Cheatsheet&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;✔ Extract Icon Names: 733&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;✔ PUT Node Appearances&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;✔ POST Icon Nodes&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Navigate to Quine in your browser and load all of the nodes that we just created into the Exploration UI. There are multiple ways to load all of the nodes in the UI; for this example, we use &lt;code&gt;MATCH (n) RETURN n&lt;&#x2F;code&gt;. The Exploration UI will warn that you are about to render 787 nodes, which is correct: the count covers all of the icons plus the grouping nodes generated by the script. Hit the &lt;strong&gt;OK&lt;&#x2F;strong&gt; button to view the graph.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Note:&lt;&#x2F;strong&gt;&lt;&#x2F;em&gt; &lt;em&gt;If you already had Quine open in a browser before running the script, you will need to refresh your browser window to load the new &lt;code&gt;nodeAppearances&lt;&#x2F;code&gt; submitted by the script in order for the nodes to render correctly.&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
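&lt;p&gt;If rendering every node at once is more than you need, a narrower query can pull in a single group. The sketch below assumes the cheatsheet contains icons named &lt;code&gt;ion-android-...&lt;&#x2F;code&gt;, so that the script created a group node at &lt;code&gt;idFrom(&amp;quot;android&amp;quot;)&lt;&#x2F;code&gt;; substitute any other group name that appears in your graph. The variable names &lt;code&gt;icon&lt;&#x2F;code&gt; and &lt;code&gt;grp&lt;&#x2F;code&gt; are only illustrative:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (icon)-[:GROUP]-&amp;gt;(grp)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WHERE id(grp) = idFrom(&amp;quot;android&amp;quot;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;RETURN icon, grp&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Because the query is anchored on a node ID computed with &lt;code&gt;idFrom()&lt;&#x2F;code&gt;, it should return only that group node and the icons attached to it rather than the whole graph.&lt;&#x2F;p&gt;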
&lt;p&gt;In our case, the nodes are jumbled when they are first rendered. Click the play button in the top nav to have Quine organize the graph. Our result produced the graph visualization of all supported icons below:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;6441bc06f12f50ba7123b8d9_All%20the%20icons%20visualized%20in%20Quine%27s%20Exploration%20UI%2C%20forming%20graph%20based%20on%20categories.png&quot; alt=&quot;Icons visualized in Exploration UI, grouped by category.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;&#x2F;h2&gt;
&lt;p&gt;There you have it, a graph visualization using all of the icons Quine supports!&lt;&#x2F;p&gt;
&lt;p&gt;This script can generate the &lt;code&gt;nodeAppearances&lt;&#x2F;code&gt; graph and serve as a starting point if you are looking to automate fetching non-streaming data from websites to enrich streaming data stored in Quine.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;6441b87b3cd59d2a7c662ff4_Quine%20RECENT%20NODES%20APPEARANCE%20API%20screenshot.png&quot; alt=&quot;RECENT NODES APPEARANCE API screen shot from Quine. Uses the Stoplight framework.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;If you want to learn more about Quine or explore using other API libraries with Quine, check out the interactive REST API documentation available via the document icon in the left nav bar. The interactive documentation is a great place to submit API requests. Code samples in popular languages are quickly mocked up in the docs for use when experimenting with small projects like this yourself.&lt;&#x2F;p&gt;
&lt;p&gt;You can download this script and try it for yourself in this GitHub Repo.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Dynamic Duo: Quine &amp; Novelty Detector for Insider Threats</title>
        <published>2023-04-18T00:00:00+00:00</published>
        <updated>2023-04-18T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/dynamic-duo-quine-novelty-detector-for-insider-threats/"/>
        <id>https://www.thatdot.com/blog/dynamic-duo-quine-novelty-detector-for-insider-threats/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/dynamic-duo-quine-novelty-detector-for-insider-threats/">&lt;h2 id=&quot;adding-quine-to-the-insider-threat-detection-proof-of-concept&quot;&gt;Adding Quine to the Insider Threat Detection Proof of Concept&lt;&#x2F;h2&gt;
&lt;p&gt;A lot has changed since we first &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;stop-insider-threats-with-automated-behavioral-anomaly-detection&#x2F;&quot;&gt;posted&lt;&#x2F;a&gt; the Stop Insider Threats With Automated Behavioral Anomaly Detection blog post. Most significantly, thatDot released Quine, our streaming graph, as an &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;news&#x2F;announcing-open-source-release-of-quine-streaming-graph&#x2F;&quot;&gt;open&lt;&#x2F;a&gt; source project just as the industry is recognizing the value of real-time ETL and complex event processing in service of business requirements. This is especially true in finance and cybersecurity, where minutes (seconds or even milliseconds) can mean the difference between disaster, survival or success.&lt;&#x2F;p&gt;
&lt;p&gt;Our goal, at the time, was to show how anomaly detection on &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;whats-the-difference-between-categorical-and-numerical-data&#x2F;&quot;&gt;categorical data&lt;&#x2F;a&gt; could be used to resolve complex challenges utilizing an industry-recognized standard benchmark dataset, which happened to be static. The approach we used then was to pre-process (batch) the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.osti.gov&#x2F;biblio&#x2F;1001546&quot;&gt;VAST Insider Threat challenge dataset&lt;&#x2F;a&gt; with Python, then ingest that processed stream of data with thatDot&#x27;s Novelty Detector to identify the bad actor.&lt;&#x2F;p&gt;
&lt;p&gt;With a new tool in our kit, we decided to see what it would take to update the workflow by replacing the Python pre-processing with Quine, placed in front of Novelty Detector in our pipeline.&lt;&#x2F;p&gt;
&lt;p&gt;This involved:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Defining the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;getting-started&#x2F;ingest-streams-tutorial.html&quot;&gt;ingest queries&lt;&#x2F;a&gt; required to consume and shape the VAST datasets; and&lt;&#x2F;li&gt;
&lt;li&gt;Developing a &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;getting-started&#x2F;standing-queries-tutorial.html&quot;&gt;standing query&lt;&#x2F;a&gt; to output the data to Novelty Detector for anomaly detection.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;Data from the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;that.re&#x2F;insider-threat&quot;&gt;dataset&lt;&#x2F;a&gt; is broken into three files:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Employee to office and source IP address mapping in employeeData.csv&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;ingestStreams:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  - type: FileIngest&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    path: employeeData.csv&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    parallelism: 61&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    format:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      type: CypherCsv&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      headers: true&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      query: &amp;gt;-&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        MATCH (employee), (ipAddress), (office)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        WHERE id(employee) = idFrom(&amp;quot;employee&amp;quot;, $that.EmployeeID)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          AND id(ipAddress) = idFrom(&amp;quot;ipAddress&amp;quot;,$that.IP)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          AND id(office) = idFrom(&amp;quot;office&amp;quot;,$that.Office)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        SET employee.id = $that.EmployeeID,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            employee:employee&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        SET ipAddress.ip = $that.IP,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            ipAddress:ipAddress&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        SET office.office = $that.Office,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            office:office&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        CREATE (ipAddress)&amp;lt;-[:USES_IP]-(employee)-[:SHARES_OFFICE]-&amp;gt;(office)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;ul&gt;
&lt;li&gt;Proximity reader data from door badge scanners in proxLog.csv&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  - type: FileIngest&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    path: proxLog.csv&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    format:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      type: CypherCsv&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      headers: true&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      query: &amp;gt;-&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        MATCH (employee), (badgeStatus)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        WHERE id(employee) = idFrom(&amp;quot;employee&amp;quot;, $that.ID)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          AND id(badgeStatus) = idFrom(&amp;quot;badgeStatus&amp;quot;,$that.ID,$that.Datetime,$that.Type,$that.ID)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        SET employee.id = $that.ID,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            employee:employee&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        SET badgeStatus.type = $that.Type,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            badgeStatus.employee = $that.ID,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            badgeStatus.datetime = $that.Datetime,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            badgeStatus:badgeStatus&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        CREATE (employee)-[:BADGED]-&amp;gt;(badgeStatus)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;ul&gt;
&lt;li&gt;Network traffic in IPLog3.5.csv&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  - type: FileIngest&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    path: IPLog3.5.csv&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    format:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      type: CypherCsv&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      headers: true&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      query: &amp;gt;-&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        MATCH (ipAddress), (request)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        WHERE id(ipAddress) = idFrom(&amp;quot;ipAddress&amp;quot;,$that.SourceIP)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          AND id(request) = idFrom(&amp;quot;request&amp;quot;, $that.SourceIP,$that.AccessTime, $that.DestIP, $that.Socket)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        SET request.reqSize = $that.ReqSize,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            request.respSize = $that.RespSize,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            request.datetime = $that.AccessTime,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            request.dst = $that.DestIP,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            request.dstport = $that.Socket,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            request:request&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        SET ipAddress.ip = $that.SourceIP,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            ipAddress:ipAddress&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        CREATE (ipAddress)-[:MADE_REQUEST]-&amp;gt;(request)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;These ingests form a basic structure that looks like:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;643d98cfe9eaca518244cf78_image2.png&quot; alt=&quot;A snapshot of the graph created by ingest streams showing Employee 51 connected by the Badged edge to a door reader event node and node IP address by USES_IP edge, which is connected to a Request node by a Made_request edge.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;The ingest streams combine to create the essential graph structure.&lt;&#x2F;p&gt;
&lt;p&gt;Because we feed &lt;code&gt;idFrom()&lt;&#x2F;code&gt; data that is deterministic &lt;em&gt;&lt;strong&gt;and&lt;&#x2F;strong&gt;&lt;&#x2F;em&gt; descriptive, we have an intuitive schema for identifying nodes, and we can query for them very efficiently (with sub-millisecond latency).&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;643d9746c3cb33f1bfac7156_sP0xVcIhRAPjRE_TdBJnDUD9nAOak6ape6YSh5tZfx01-Yfq5A_puEj2EmnLYImWCTqWhI2QRXwPhz_mTdOi8DdXy6b4dUtBTNebNL4FEbfc_4re4MZsowqCd6wulNywVVoI2aBlB7895JKEtkC9Ano.png&quot; alt=&quot;The same basic graph as above but this time showing a very efficient query for node properties. &quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;A quick query efficiently displays relevant properties from connected nodes.&lt;&#x2F;p&gt;
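&lt;p&gt;As a rough illustration of that kind of lookup (not the exact query in the screenshot), the sketch below recomputes an employee&#x27;s node ID with &lt;code&gt;idFrom()&lt;&#x2F;code&gt; and returns properties from the connected IP address node. It assumes that the employee ID 51 from the earlier snapshot was ingested from the CSV as the string &lt;code&gt;&amp;quot;51&amp;quot;&lt;&#x2F;code&gt;; if your ingest produces a number, pass &lt;code&gt;51&lt;&#x2F;code&gt; instead. The returned column names are also just illustrative:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (employee)-[:USES_IP]-&amp;gt;(ipAddress)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WHERE id(employee) = idFrom(&amp;quot;employee&amp;quot;, &amp;quot;51&amp;quot;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;RETURN employee.id AS employee, ipAddress.ip AS ip&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The arguments to &lt;code&gt;idFrom()&lt;&#x2F;code&gt; must match the ones used in the ingest query exactly, in the same order, or the lookup will resolve to a different node.&lt;&#x2F;p&gt;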
&lt;h2 id=&quot;moving-from-batch-to-real-time-monitoring&quot;&gt;Moving from Batch to Real Time Monitoring&lt;&#x2F;h2&gt;
&lt;p&gt;While this is certainly an improvement from our previous workflow, it is still highly manual (i.e., having to explicitly query for the data we’re looking for). The promise of a Quine to Novelty Detector workflow is automation with real-time results.&lt;&#x2F;p&gt;
&lt;p&gt;By ingesting the data in chronological order (as presented in the source files), we can match each network event to the most recent associated proximity badge event &lt;em&gt;in real-time&lt;&#x2F;em&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;This is accomplished via standing query matches like:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;standingQueries:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   - pattern:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;       query: &amp;gt;-&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;         MATCH (request)&amp;lt;-[:MADE_REQUEST]-(ipAddress)&amp;lt;-[:USES_IP]-(employee)-[:BADGED]-&amp;gt;(badgeStatus)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;         RETURN DISTINCT id(request) AS requestid&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;       type: Cypher&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;     outputs:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;       print-output:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;         type: CypherQuery&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;         query: &amp;gt;-&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          MATCH (request)&amp;lt;-[:MADE_REQUEST]-(ipAddress)&amp;lt;-[:USES_IP]-(employee)-[:BADGED]-&amp;gt;(badgeStatus) &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          WHERE id(request) = $that.data.requestid &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            AND badgeStatus.datetime&amp;lt;=request.datetime &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          WITH max(badgeStatus.datetime) AS date, request, ipAddress &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          MATCH (request)&amp;lt;-[:MADE_REQUEST]-(ipAddress)&amp;lt;-[:USES_IP]-(employee)-[:BADGED]-&amp;gt;(badgeStatus) &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          WHERE badgeStatus.datetime=date &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          RETURN badgeStatus.type AS status,ipAddress.ip AS src,request.dstport AS port,request.dst AS dst&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The question remains, “How do we share the standing query matches from Quine to Novelty Detector?” This can be done in a number of ways (all via &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;getting-started&#x2F;standing-queries-tutorial.html&quot;&gt;standing query outputs&lt;&#x2F;a&gt;) including, but not limited to:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Writing results to a file that Novelty Detector ingests;&lt;&#x2F;li&gt;
&lt;li&gt;Emitting webhooks from Quine to Novelty Detector; or&lt;&#x2F;li&gt;
&lt;li&gt;Publishing results to a Kafka topic to be ingested by Novelty Detector.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;Although the first two choices will work, they are severely suboptimal. Consider a simple example of a single employee’s data:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;643d9746e3c6681e053e30c5_GFPDHl9JixoSG9Hxw2aNpJusiFk9G4KwNwuWGBp5ELsjV2sMov-6gj2AvcVFuLijmpfJwGuT1OG06S6s218RZrCNDqO8146HRarMHPd8WIDRGKC4B-GMvW3hPiZIl1lx1fGfAFd-XQIsHHinOAEijp8.png&quot; alt=&quot;A graph showing employee&amp;#39;s data that renders as thousands of nodes connected to four main clusters. &quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Visualizing data from a single employee.&lt;&#x2F;p&gt;
&lt;p&gt;Writing the aggregate 115,434 matches would be done one record at a time (on each standing query match) to the filesystem.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;andThen:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    type: WriteToFile&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    path: behaviors.jsonl&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Using webhooks suffers from the same issue as writing to a file, and adds latency from the HTTP transactions.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;andThen:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    type: PostToEndpoint&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    url: http:&#x2F;&#x2F;localhost:8080&#x2F;api&#x2F;v1&#x2F;novelty&#x2F;behaviors&#x2F;observe?transformation=behaviors&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Ultimately, we settled on the third option as it most closely resembles production environments, and is the most performant.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;andThen:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    type: WriteToKafka&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    bootstrapServers: localhost:9092&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    topic: vast&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    format: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        type: JSON&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The big question - did it work?&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;643d97468a60032d1c22f1f8_m_YcpaiiEFndZkxzvkIrohryxSExNiPOKvg6xSbrmnuYBgfRtq5rDfJv2QKKo2ZU8lI0IpEIVPxSxumE0t1ChR1bmY-oqru9J5xassbpv0vlkpOROp7fOuFJfqkx-YJBIluVMTlEj9RfIa5cqT8etOM.png&quot; alt=&quot;A scatter graph of Novelty Detector results showing the anomalous behavior connected to a compromised facility.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Results from the Novelty Detector UI.&lt;&#x2F;p&gt;
&lt;p&gt;Absolutely.&lt;&#x2F;p&gt;
&lt;p&gt;The anomalous activity has been identified.&lt;&#x2F;p&gt;
&lt;p&gt;Was it worthwhile?&lt;&#x2F;p&gt;
&lt;p&gt;Sure, but…&lt;&#x2F;p&gt;
&lt;h2 id=&quot;it-don-t-mean-a-thing-if-it-ain-t-got-that-real-time-swing&quot;&gt;It Don’t Mean a Thing If It Ain’t Got That Real-Time Swing&lt;&#x2F;h2&gt;
&lt;p&gt;Although we were able to accomplish the same results with Quine in a single step, this was still a batch processing-based exercise. The true value of a Quine to Novelty Detector pipeline is in the melding of complex event stream processing in Quine with shallow learning (no training data) techniques in Novelty Detector, providing an efficient solution for detecting persistent threats and unwanted behaviors in your network. This pattern, moving from batch processing (which requires heavy lifting and grooming of datasets) to real-time stream processing, is one where Quine and Novelty Detector thrive.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;try-it-yourself&quot;&gt;Try it Yourself&lt;&#x2F;h2&gt;
&lt;p&gt;If you&#x27;d like to try the VAST test case yourself, you can run &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.thatdot.com&#x2F;novelty&#x2F;using-novelty&#x2F;aws-quick-start&#x2F;aws-quickstart.html&quot;&gt;Novelty Detector on AWS&lt;&#x2F;a&gt; with a generous free usage tier. Instructions for configuring Novelty Detector are available here.&lt;&#x2F;p&gt;
&lt;p&gt;And the open source version of Quine is available for &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;download&quot;&gt;download here&lt;&#x2F;a&gt;. If you are interested, there is also an enterprise version that offers clustering for horizontal scaling and resilience.&lt;&#x2F;p&gt;
&lt;p&gt;And if you&#x27;d prefer a demo or have additional questions, check out the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;that.re&#x2F;quine-slack&quot;&gt;Quine community Slack&lt;&#x2F;a&gt; or &lt;a href=&quot;mailto:info@thatdot.com&quot;&gt;send us an email&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>idFrom(): the simple function that’s key to Quine streaming graph</title>
        <published>2023-04-06T00:00:00+00:00</published>
        <updated>2023-04-06T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/idfrom-the-simple-function-thats-key-to-quine-streaming-graph/"/>
        <id>https://www.thatdot.com/blog/idfrom-the-simple-function-thats-key-to-quine-streaming-graph/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/idfrom-the-simple-function-thats-key-to-quine-streaming-graph/">&lt;h2 id=&quot;a-simple-concept-at-the-core-of-a-new-way-of-processing-data&quot;&gt;A simple concept at the core of a new way of processing data&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;em&gt;What’s a streaming graph?&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;p&gt;When we first released Quine streaming graph last year, we had to answer this question a lot. After all, a “streaming graph” had never existed before.&lt;&#x2F;p&gt;
&lt;p&gt;As interest grew, we got pretty good at answering, usually something like this: Quine is a real-time event processor like Flink or ksqlDB. It consumes data from sources like Kafka and Kinesis, queries for complex patterns in event streams, and pushes results to the next hop in the streaming architecture the instant a match is made. However, unlike those venerable systems, Quine uses a graph data structure.&lt;&#x2F;p&gt;
&lt;p&gt;Hence, streaming graph.&lt;&#x2F;p&gt;
&lt;p&gt;That seemed to work and, engineers being a curious lot, led inevitably to a second question: “How’s it different from a graph database?”&lt;&#x2F;p&gt;
&lt;p&gt;That’s a fun question to answer, because it means we get to talk about &lt;code&gt;idFrom()&lt;&#x2F;code&gt;. And explaining &lt;code&gt;idFrom()&lt;&#x2F;code&gt; allows us to begin to unpack all the interesting architectural properties that make Quine uniquely well-suited for real-time complex event processing.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;6430a2e99b3f2e4f162ba663_5b02dc6003dbb4a98e49eb8918055b49.jpeg&quot; alt=&quot;A photo of the character of David from the film Prometheus contemplating a scientific discovery. &quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;&quot;Big things have small beginnings.&quot; -- David from the film &lt;em&gt;Prometheus (2012)&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;h2 id=&quot;event-driven-what-if-we-stopped-querying-databases&quot;&gt;Event-driven: what if we stopped querying databases?&lt;&#x2F;h2&gt;
&lt;p&gt;Unlike a graph database, which relies on an index to query for the existence of data in the graph, Quine uses &lt;code&gt;idFrom()&lt;&#x2F;code&gt;, a custom &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;reference&#x2F;cypher&#x2F;cypher-language.html&quot;&gt;Cypher function&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;code&gt;idFrom()&lt;&#x2F;code&gt; generates a unique node ID from a set of user-provided arguments – most commonly taken from the data in the event stream itself – which is then used in lieu of an index to locate and operate on a node and its properties. (We will get to the why in a bit but it will help first to look at how you use &lt;code&gt;idFrom()&lt;&#x2F;code&gt;.)&lt;&#x2F;p&gt;
&lt;p&gt;Say you want to analyze an event stream of edits from Wikipedia to keep an eye out for edits made by specific authors to specific articles in specific databases.&lt;&#x2F;p&gt;
&lt;p&gt;The &lt;em&gt;JSON&lt;&#x2F;em&gt; record (a pared-back version of the actual Wikipedia event feed used in the Wikipedia API recipe featured in our docs &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;getting-started&#x2F;ingest-streams-tutorial.html&quot;&gt;example here&lt;&#x2F;a&gt;) might look like this:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;{&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   &amp;quot;$schema&amp;quot;: &amp;quot;&#x2F;mediawiki&#x2F;revision&#x2F;create&#x2F;1.1.0&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   &amp;quot;database&amp;quot;: &amp;quot;wikidatawiki&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   &amp;quot;page_id&amp;quot;: 83996749,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   &amp;quot;rev_id&amp;quot;: 1869025669,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   &amp;quot;rev_timestamp&amp;quot;: &amp;quot;2023-04-05T18:18:23Z&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   &amp;quot;performer&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;       &amp;quot;user_is_bot&amp;quot;: true,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;       &amp;quot;user_id&amp;quot;: 6135162&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   },&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   &amp;quot;rev_parent_id&amp;quot;: 1869025663&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;To create the nodes in a continuous stream of records, you would use &lt;code&gt;MATCH&lt;&#x2F;code&gt; to declare the node names, then call the &lt;code&gt;idFrom()&lt;&#x2F;code&gt; function to generate unique node IDs based on the values in the JSON itself.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (revNode),(pageNode),(dbNode),(userNode), (parentNode)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WHERE id(revNode) = idFrom(&amp;quot;revision&amp;quot;, $that.rev_id) &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  AND id(pageNode) = idFrom(&amp;quot;page&amp;quot;, $that.page_id) &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  AND id(dbNode) = idFrom(&amp;quot;db&amp;quot;, $that.database)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  AND id(userNode) = idFrom(&amp;quot;id&amp;quot;, $that.performer.user_id)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  AND id(parentNode) = idFrom(&amp;quot;revision&amp;quot;, $that.rev_parent_id)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;For now, we can skip adding properties to nodes but it helps our discussion to complete this simple graph by adding relationships between the nodes:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;CREATE (revNode)-[:IN]-&amp;gt;(dbNode),&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;       (revNode)-[:TO]-&amp;gt;(pageNode),&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;       (userNode)-[:MADE]-&amp;gt;(revNode),&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;       (parentNode)-[:NEXT]-&amp;gt;(revNode)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Now, as each event streams in, Quine will create and connect nodes, forming the desired subgraph that looks like this:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;64304fc12ca3479c4692c5b1_AyTAjRRWChqJL3VU4y88MZoPs4-5YkkRis_32tyoNID1jwDbd6GT98fcTYCbv60GezHeu1G2QM8tfihuYToKK8SrvnKhwe5X9KB1fj6lVnLvK1OgHqcqoGvc92rQQ-UT5TIPa9TE3Bs0kxYl5pO4o4w.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;You can see the same subgraph with the node IDs no longer concealed by the node labels:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;64304fc18d4e760772134f4d_Y0ZCPWIxXKmp2ebLma49xdA--Y4AikbOG_cGHI8SQBXFU5v569iyZ98u9L6xKvvMN9SUsMUiNwjd9pHB16jSCSDPUKdmGhdLzCRYRVsfi9A_plt2aXGU2Mu9LhnuKqDv98m9XSnvuBa-Wnem923KwJU.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Note the things you didn’t have to do to create this graph:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Query to find out whether a node already exists before operating on it&lt;&#x2F;li&gt;
&lt;li&gt;Consult a schema&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;Quine eliminates the need to check to see if the node exists before completing an operation.&lt;&#x2F;p&gt;
&lt;p&gt;The deterministic nature of node IDs created using &lt;code&gt;idFrom()&lt;&#x2F;code&gt; means a value or combination of values passed to the function will always result in the same ID.&lt;&#x2F;p&gt;
&lt;p&gt;It will either create a new node based on the value or, if that node already exists, update it.&lt;&#x2F;p&gt;
&lt;p&gt;In the latter case, because Quine is an event-sourced system, when Quine updates a node, it doesn’t need to look up if the node already exists. Quine appends the update to the existing node, preserving historical versions that can be retrieved using &lt;code&gt;idFrom()&lt;&#x2F;code&gt; with the at.time&lt;&#x2F;p&gt;
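&lt;p&gt;The create-or-update behavior described above can be sketched outside of Quine. The Python below is only an illustration of the idea, not Quine&#x27;s implementation: the &lt;code&gt;id_from&lt;&#x2F;code&gt; and &lt;code&gt;upsert&lt;&#x2F;code&gt; helpers are hypothetical names, and a name-based UUID stands in for whatever hashing &lt;code&gt;idFrom()&lt;&#x2F;code&gt; actually uses.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;python&quot;&gt;
```python
import uuid


def id_from(*values):
    """Hypothetical stand-in for idFrom(): same ordered values, same ID."""
    # Assumption: a name-based UUID (uuid5) is used purely for illustration.
    key = "|".join(repr(v) for v in values)
    return uuid.uuid5(uuid.NAMESPACE_OID, key)


graph = {}  # node ID -> properties


def upsert(node_id, **props):
    """Create the node if it is absent, otherwise merge the new properties."""
    # The caller never asks whether the node exists first.
    graph.setdefault(node_id, {}).update(props)


first = id_from("revision", 1869025669)
second = id_from("revision", 1869025669)

upsert(first, database="wikidatawiki")
upsert(second, rev_timestamp="2023-04-05T18:18:23Z")

print(first == second)  # True: identical inputs produce the identical ID
print(len(graph))       # 1: both writes landed on the same node
```
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Because the ID is a pure function of the values, two independent events that mention the same revision address the same node without any lookup in between.&lt;&#x2F;p&gt;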
&lt;h2 id=&quot;idfrom-and-crud-operations-why-quine-is-so-dang-fast&quot;&gt;&lt;code&gt;idFrom()&lt;&#x2F;code&gt; and CRUD operations: why Quine is so dang fast&lt;&#x2F;h2&gt;
&lt;p&gt;Inasmuch as Quine uses a hash of a value to generate a node ID that is then used for CRUD operations, it bears a superficial similarity to NoSQL key-value stores. As long as you know either the ID or the value, it is dead simple to retrieve data from the graph.&lt;&#x2F;p&gt;
&lt;p&gt;However, because of Quine’s in-memory graph structure, it is far more efficient and performant operating on patterns, ranges (e.g. time-ordered), or otherwise related data than key-value databases.&lt;&#x2F;p&gt;
&lt;p&gt;Using the node ID to anchor the query, you specify the edges to traverse to find connected data.&lt;&#x2F;p&gt;
&lt;p&gt;This might be a query to retrieve a node’s properties using node ID (in this case, for &lt;code&gt;revNode&lt;&#x2F;code&gt;):&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (n) WHERE strId(n) = &amp;quot;8b290926-271c-3497-b5d6-e30fcf934a73&amp;quot; RETURN id(n), properties(n)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Which delivers these results:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;64304fc24ee6bc19661d5172_2CG-zeJuf2_F3PMA9xB0YDD3Z5sVfwvvOYhizA_YLmuizljaR3Zbhf92GENg5hmMVBtTpW1rpcTfrd_jgXhsNhoByQka-zQzba73_7Yw9HLzMB6TN3RjzoUKEONeWqmwY97vY1Po3peGYHG8zScq6js.png&quot; alt=&quot;Screen shot of a node ID and associated properties in json format.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;If you don’t know a node’s ID, you can query for it using the node’s properties and the &lt;code&gt;strId()&lt;&#x2F;code&gt; function:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (userNode:user {user_is_bot: true}) RETURN DISTINCT strId(userNode)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;64304fc2737e745309e4e929_J7LhJgqKH_YGchMijoO7Z7KldlHX_yjZ-Wwo0WetHObCI-d955Uub59TGLYqR4iKS1QGCj_4x5n3vdTOoVLYNMyYTjNHZZOabPFOrlRV_zS-fvvdwPax83UoMQvamiIuP0Zkl-XlMM8td2UIPbk8Fsc.png&quot; alt=&quot;Screen shot of a node strID() results -- the node ID as a string.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;But what about more complex queries – for example, a query that must retrieve multiple related objects? Key-value stores are famously inefficient in this scenario. But this is precisely where Quine’s architectural choices come in. Using an in-memory graph structure means you can query for any node in a subgraph, follow its edges, and produce one or more values.&lt;&#x2F;p&gt;
&lt;p&gt;For example, say you want to find all revisions where a bot made an update to the &lt;code&gt;wikidatawiki&lt;&#x2F;code&gt; database:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (userNode:user {user_is_bot:true})-[:MADE]-&amp;gt;(revNode:revision)-[:TO]-&amp;gt;(pageNode:page)-[:IN]-&amp;gt;(dbNode:db {database : &amp;quot;wikidatawiki&amp;quot;})  &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;RETURN DISTINCT id(revNode) as id, id(userNode) as id2&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;64304fc2bb63406b1a919b23_9njta8v0AD_pAkFeOhMqEk3sQkA9q3k4jBfmfeYQ3G8uf3dl0Q9Fq_ImR5-iZuQSoEWRJAVDWgWakbr6qhA9hgnuCm3SOZZDb1HupF7LqLDFvIWEbZ81WPf5l20nXiBCc4GOQ5ZtBlWHJqNegYJBBuo.png&quot; alt=&quot;The results of the query -- two node IDs returned.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Either way, it starts with setting the node ID with &lt;code&gt;idFrom()&lt;&#x2F;code&gt;. And &lt;code&gt;idFrom()&lt;&#x2F;code&gt; makes Quine very, very fast.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;standing-queries-and-querying-data-from-the-future-with-idfrom&quot;&gt;Standing queries and querying data from the future with &lt;code&gt;idFrom()&lt;&#x2F;code&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Standing queries persist inside the graph, monitoring the stream for specific patterns. Once set, they propagate throughout the graph and keep watching for matches without you ever having to issue the query again.&lt;&#x2F;p&gt;
&lt;p&gt;Once matches are found, standing queries trigger actions using those results (e.g. report results, execute code, transform other data in the graph, publish data to another source).&lt;&#x2F;p&gt;
&lt;p&gt;To do this, every standing query must have two parts, the &lt;code&gt;pattern&lt;&#x2F;code&gt; portion (what sub-graph you are matching for in the event stream) and the &lt;strong&gt;&lt;code&gt;outputs&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; portion (the action you wish to take).&lt;&#x2F;p&gt;
&lt;p&gt;Adapted from the recipe used in &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;getting-started&#x2F;standing-queries-tutorial.html&quot;&gt;Getting Started&lt;&#x2F;a&gt;, here’s a standing query that monitors for non-bot revisions to the &lt;code&gt;enwiki&lt;&#x2F;code&gt; database and outputs these events to the terminal:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;standingQueries:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  - pattern:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      query: |-&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        MATCH (userNode:user {user_is_bot: false})-[:MADE]-&amp;gt;(revNode:revision {database: &amp;quot;enwiki&amp;quot;})&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        RETURN DISTINCT id(revNode) as id&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      type: Cypher&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    outputs:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      print-output:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        type: CypherQuery&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        query: |-&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          MATCH (n)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          WHERE id(n) = $that.data.id&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          RETURN properties(n)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        andThen:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          type: PrintToStandardOut&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;6430af91e2d0579a2bd80902_Standing%20Query%20Output.png&quot; alt=&quot;Results (JSON) printing to the console.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Standing query matches printing to console.&lt;&#x2F;p&gt;
&lt;p&gt;Because standing queries persist in the graph, incrementally updating partial results as new data arrives, you are not just querying the past and present state, you are setting up queries for data yet to arrive.&lt;&#x2F;p&gt;
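&lt;p&gt;One way to picture a query that waits for data yet to arrive is a toy matcher that remembers partial state and re-checks only what an incoming event touched. The Python below is a rough sketch of that idea under heavy simplification; it is not how Quine implements standing queries, and every name in it (&lt;code&gt;set_props&lt;&#x2F;code&gt;, &lt;code&gt;add_edge&lt;&#x2F;code&gt;, &lt;code&gt;check&lt;&#x2F;code&gt;) is hypothetical. The pattern mirrors the recipe above: a non-bot user that &lt;code&gt;MADE&lt;&#x2F;code&gt; a revision in &lt;code&gt;enwiki&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;python&quot;&gt;
```python
nodes = {}     # node id -> properties seen so far
edges = set()  # (source id, label, target id)
emitted = []   # revision ids already reported, in arrival order


def check(user, rev):
    """Report the revision once the whole pattern is satisfied, and only once."""
    if (nodes.get(user, {}).get("user_is_bot") is False
            and nodes.get(rev, {}).get("database") == "enwiki"
            and (user, "MADE", rev) in edges
            and rev not in emitted):
        emitted.append(rev)


def set_props(node, **props):
    """Record new properties, then re-check only the edges touching this node."""
    nodes.setdefault(node, {}).update(props)
    for src, label, dst in list(edges):
        if label == "MADE" and node in (src, dst):
            check(src, dst)


def add_edge(src, label, dst):
    """Record a new edge, then check the one candidate match it could complete."""
    edges.add((src, label, dst))
    if label == "MADE":
        check(src, dst)


# The pattern exists before any data does; the pieces arrive out of order.
add_edge("u1", "MADE", "r1")
set_props("r1", database="enwiki")
print(emitted)  # []: the user property has not arrived yet
set_props("u1", user_is_bot=False)
print(emitted)  # ['r1']: the last missing piece completed the match
```
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Nothing re-scans the whole graph when an event lands: each update only revisits the partial matches it could complete, which is the intuition behind a query that is set up once and answered later.&lt;&#x2F;p&gt;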
&lt;p&gt;And while &lt;code&gt;idFrom()&lt;&#x2F;code&gt; is a key part of what makes standing queries possible, to really understand what makes Quine function so efficiently as a stream processor, we’ll need to dive into the actor-based, graph-shaped compute model. But that’s for a different post.&lt;&#x2F;p&gt;
&lt;p&gt;Instead, I’ll leave you with a clever use of &lt;code&gt;idFrom()&lt;&#x2F;code&gt; employed by developers at a SaaS company that uses Quine.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;6430a43f8a585e1ceca8d1b5_76ff5b19de8c4c94353a55d9005d53e4.png&quot; alt=&quot;Color wheel, Color, Pie chart&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;h2 id=&quot;partitioning-key-spaces-for-a-saas-application-using-idfrom&quot;&gt;Partitioning Key Spaces for a SaaS application using &lt;code&gt;idFrom()&lt;&#x2F;code&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Since you can generate a node ID by passing an arbitrary combination of values to &lt;code&gt;idFrom()&lt;&#x2F;code&gt;, some Quine users with SaaS or internal multi-tenant applications have employed it to partition graphs by customer namespace or similar property.&lt;&#x2F;p&gt;
&lt;p&gt;Sticking with the Wikipedia example, you could create distinct sub-graphs corresponding to each of the database types by adding &lt;code&gt;$that.database&lt;&#x2F;code&gt; as an additional value determining each node ID:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (revNode),(pageNode),(dbNode),(userNode),(parentNode)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WHERE id(revNode) = idFrom(&amp;quot;revision&amp;quot;, $that.rev_id, $that.database)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  AND id(pageNode) = idFrom(&amp;quot;page&amp;quot;, $that.page_id, $that.database)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  AND id(dbNode) = idFrom(&amp;quot;db&amp;quot;, $that.database)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  AND id(userNode) = idFrom(&amp;quot;id&amp;quot;, $that.performer.user_id, $that.database)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  AND id(parentNode) = idFrom(&amp;quot;revision&amp;quot;, $that.rev_parent_id, $that.database)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This creates a series of subgraphs partitioned by database and would allow you to be certain that if you query for data related to a specific database, you won’t inadvertently return data from others.&lt;&#x2F;p&gt;
&lt;p&gt;And while the chance of key collision exists, it is &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Universally_unique_identifier#Collisions&quot;&gt;vanishingly small&lt;&#x2F;a&gt;, making this approach suitable for use in multi-tenant SaaS applications.&lt;&#x2F;p&gt;
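&lt;p&gt;The effect of the compound key can be illustrated with a short sketch. As before, this is Python rather than Cypher, &lt;code&gt;id_from&lt;&#x2F;code&gt; is a hypothetical stand-in built on a name-based UUID rather than Quine&#x27;s own hashing, and the second event is invented for the comparison.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;python&quot;&gt;
```python
import uuid


def id_from(*values):
    """Hypothetical stand-in for idFrom(): same ordered values, same ID."""
    return uuid.uuid5(uuid.NAMESPACE_OID, "|".join(repr(v) for v in values))


# The same numeric user_id observed in two different databases (tenants).
event_a = {"database": "enwiki", "user_id": 6135162}
event_b = {"database": "wikidatawiki", "user_id": 6135162}

# Without the database in the key, both events resolve to one shared node.
shared_a = id_from("id", event_a["user_id"])
shared_b = id_from("id", event_b["user_id"])

# With the database appended, each tenant gets its own node.
scoped_a = id_from("id", event_a["user_id"], event_a["database"])
scoped_b = id_from("id", event_b["user_id"], event_b["database"])

print(shared_a == shared_b)  # True
print(scoped_a == scoped_b)  # False
```
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The separation holds only if every ingest and every query builds the ID with the same values in the same order.&lt;&#x2F;p&gt;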
&lt;p&gt;At any rate, this accomplished what the company wanted: a partitioned graph for data separation, with standing and ad hoc queries that work the same across the entire graph. The only real cost is the discipline of always using the compound key.&lt;&#x2F;p&gt;
&lt;p&gt;Pretty clever.&lt;&#x2F;p&gt;
&lt;p&gt;If any of this inspires you or piques your interest and you want to try Quine yourself, check out the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;getting-started&#x2F;&quot;&gt;Getting Started&lt;&#x2F;a&gt; docs.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Using Indicators of Behavior (IoB) Analysis for IoT data</title>
        <published>2023-02-22T00:00:00+00:00</published>
        <updated>2023-02-22T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/using-indicators-of-behavior-iob-analysis-for-iot-data/"/>
        <id>https://www.thatdot.com/blog/using-indicators-of-behavior-iob-analysis-for-iot-data/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/using-indicators-of-behavior-iob-analysis-for-iot-data/">&lt;p&gt;Data analysis based on Indicators of Behavior (or IoBs) has emerged as the new standard in real-time cybersecurity threat hunting. As data science practices and tooling have evolved to enable IoB analysis, we are finding that the identification of system and&#x2F;or user behavior patterns, especially in real-time, extends beyond the cybersecurity domain to finance, e-commerce, and in particular, Internet of Things (IoT) use cases.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-shift-to-critical-iot-data&quot;&gt;&lt;strong&gt;The Shift to Critical IoT Data&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Self-driving cars, medical devices, and security monitoring are just a few examples of how IoT solutions are being applied to high-stakes or critical use cases. As we leverage IoT for higher-value use cases, it becomes apparent that they benefit enormously from real-time data analysis.&lt;&#x2F;p&gt;
&lt;p&gt;Data from last week, yesterday, last night, or even an hour ago, doesn’t help achieve satisfactory outcomes when you’re using data to assist in navigating a car traveling 100 km&#x2F;hr or monitoring an at-risk patient outside a closely monitored hospital setting. As IoT devices handle more critical operations we need more effective tools for monitoring, securing, and interpreting this data in real time.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;iot-data-challenges&quot;&gt;&lt;strong&gt;IoT Data Challenges&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;A hallmark characteristic of IoT devices is their significant resource limitations: everything from processing power and storage to connectivity has historically been in short supply.&lt;&#x2F;p&gt;
&lt;p&gt;The variability of natural environments, difficulty in upgrading firmware remotely, and intermittent communications compound the difficulty of managing networks of IoT devices in the way we’ve come to expect we can manage other large networks of connected devices.&lt;&#x2F;p&gt;
&lt;p&gt;Fortunately, thanks to LTE&#x2F;5G and Moore’s law, IoT devices large and small increasingly have the resources to generate, process, and transmit more substantive data.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-high-cost-of-slow-data&quot;&gt;&lt;strong&gt;The High Cost of Slow Data&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;But this ramp up in the capacity of large numbers of connected devices just relocates the problem of processing data to the other end of the IoT network: to the data processing infrastructure.&lt;&#x2F;p&gt;
&lt;p&gt;As data floods in, systems designed for intermittent trickles either can’t scale, cost too much, or, in an effort to solve both scale and cost, process data only in batches.&lt;&#x2F;p&gt;
&lt;p&gt;When you add together the cost of proprietary device management software (licensing can cost as much as $1M per 100,000 sensors) and data processing and storage infrastructure (e.g. &lt;strong&gt;SIEMs like Splunk are notoriously expensive&lt;&#x2F;strong&gt;), it is difficult to make a business case that supports the capital investment necessary to adopt and deploy IoT technology.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;lh7-us.googleusercontent.com&#x2F;docsz&#x2F;AD_4nXeHPbWd4LIjn0DTxFVR41iJ0zhi4okQMpczkFr_nO6P6Ms7n_zXuOfINCtlJsoBffZO32EqcEmeS_tH17JT7jZta1J2XZRfuHXrV97tv_1Nfw7RXHl_ET_Z9I-7Upgoy8MU78Sxr8RWGtzQGkuGCP4k7ufF?key=EowYsb8TwwGYQOu0_Ucwcw&quot; alt=&quot;A diagram of two smart meter scenarios: one shows a leaking faucet where data is processed in batches and leaks accrue, while the other shows a smart faucet with additional event data, like weather and temporal&#x2F;behavioral data, that allows one to anticipate leaks and prevent waste and loss.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Batch processing limited event data means opportunity costs.&lt;&#x2F;p&gt;
&lt;p&gt;And while batch processing is fine for many tasks, finding and acting on sensor information when it can make a positive business impact doesn’t work if that data is processed only after it is too late to take action. Which raises the question: if use cases like preventative maintenance were the primary selling points of IoT, what’s the point?&lt;&#x2F;p&gt;
&lt;h2 id=&quot;iot-is-finally-ready-for-iob&quot;&gt;&lt;strong&gt;IoT Is Finally Ready for IoB&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;With more robust IoT devices collecting and transmitting richer data sets, there is now an opportunity to apply real-time IoB detection and the threat detection and operational analysis it can deliver. With IoBs, security teams can watch for patterns of system and user behavior that indicate a notable event (cyber attack, upsell opportunity, or churn risk) is in progress, rather than waiting to find evidence after the fact.&lt;&#x2F;p&gt;
&lt;p&gt;When watching for IoBs, teams can build predictive models that limit negative impacts and capitalize on opportunities. As we look at increasingly complex IoT data, we can apply this next generation of thinking to build state-of-the-art analysis that improves the confidence and timeliness of IoT data analysis.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;applying-iob-analysis-to-iot-data&quot;&gt;&lt;strong&gt;Applying IoB Analysis to IoT Data&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;As I described in my recent blog on &lt;strong&gt;the use of IoBs in modern Cyber Security Threat Hunting&lt;&#x2F;strong&gt;, IoBs draw upon the context of system and user behavior to more reliably identify attack threats. IoBs can just as easily be applied to the behavior of an “about to fail” disc brake or a “trending towards failure” heart valve. Importantly, behavior analysis produces &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.techrepublic.com&#x2F;article&#x2F;cybersecurity-pros-should-switch-from-indicators-of-compromise-to-indicators-of-behavior&#x2F;&quot;&gt;&lt;strong&gt;significantly fewer false positives&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; than traditional analysis because of the added context that comes from analyzing the categorical data elements used to describe behaviors.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;comparing-legacy-batch-processing-of-metrics-with-real-time-iob-analysis&quot;&gt;&lt;strong&gt;Comparing Legacy Batch Processing of Metrics with Real-time IoB Analysis&lt;&#x2F;strong&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;&lt;strong&gt;Monitoring Use Cases&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;1. Delivery Truck Monitoring&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Traditional Metrics Example (batch):&lt;&#x2F;strong&gt;
&lt;ul&gt;
&lt;li&gt;Hourly or daily metrics on vehicle speed, idle time, and brake application.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;IoB Analytics Example (real time):&lt;&#x2F;strong&gt;
&lt;ul&gt;
&lt;li&gt;By driver, identify patterns of rapid acceleration, turns, and braking indicating aggressive driving or an accident.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;&lt;strong&gt;2. Utility Meter Monitoring&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Traditional Metrics Example (batch):&lt;&#x2F;strong&gt;
&lt;ul&gt;
&lt;li&gt;Daily or even monthly batch analysis of usage to identify potential leaks or resource theft.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;IoB Analytics Example (real time):&lt;&#x2F;strong&gt;
&lt;ul&gt;
&lt;li&gt;Real-time inputs for predictive modeling of shortages to avoid brownouts or identify leaks.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;&lt;strong&gt;3. Device Security Monitoring&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Traditional Metrics Example (batch):&lt;&#x2F;strong&gt;
&lt;ul&gt;
&lt;li&gt;Periodic metrics that report login attempts, source IP, time of day, bytes transferred.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;IoB Analytics Example (real time):&lt;&#x2F;strong&gt;
&lt;ul&gt;
&lt;li&gt;Identify patterns of login success and failures, based on source IPs at different times of day over an extended time period.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;&lt;strong&gt;4. Medical Device Monitoring&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Traditional Metrics Example (batch):&lt;&#x2F;strong&gt;
&lt;ul&gt;
&lt;li&gt;Periodic metrics on patient temperature, glucose, heart rate, respiration.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;IoB Analytics Example (real time):&lt;&#x2F;strong&gt;
&lt;ul&gt;
&lt;li&gt;Identify patterns of vital signs at different hours of the day to alert on deviations from expected patterns.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h2 id=&quot;iob-analysis-requires-new-tools&quot;&gt;&lt;strong&gt;IoB Analysis Requires New Tools&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;The evolution from reactive batch processing of data to a real-time IoB-based approach requires a new set of technical capabilities along with the tools to deliver them. At minimum, a system must be able to:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Easily combine streams of real-time and historical data.&lt;&#x2F;li&gt;
&lt;li&gt;Process both &lt;strong&gt;categorical and numerical data.&lt;&#x2F;strong&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Recognize and act on known patterns within this combination of categorical and numerical data, which typically requires a graph database or system that uses graph’s connected data structures.&lt;&#x2F;li&gt;
&lt;li&gt;Continuously analyze the constant flows of data for emerging behaviors, whether they signal threats or opportunities.&lt;&#x2F;li&gt;
&lt;li&gt;Update the system with newly identified threats and expand the set of IoBs being monitored.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;Streaming graph technology has emerged to meet this need: thatDot’s Quine streaming graph and Novelty Detector. The open source Quine streaming graph meets the requirement to ingest multiple data streams of both categorical and numerical data.&lt;&#x2F;p&gt;
&lt;p&gt;In fact, because it is a graph data processor, Quine is the only real-time event processor that works natively with categorical data, which makes it much easier to express IoBs as patterns. Because of Quine’s unique architecture, it can monitor streams for IoBs, detect their fingerprint patterns the instant they occur, and take immediate action, often in the form of predefined business rules or remediations. The workflow looks as follows:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Event sources are ingested from any common event stream queue, including Apache Kafka, AWS Kinesis, AWS SQS, or Apache Pulsar&#x2F;DataStax Astra Streaming.&lt;&#x2F;li&gt;
&lt;li&gt;IoBs are defined in Quine as &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;components&#x2F;writing-standing-queries.html&quot;&gt;&lt;strong&gt;standing queries, the watchdog queries that monitor the streams for important patterns.&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Quine analyzes newly arriving events for matches against IoB pattern definitions. Partial matches are identified and stored for any desired period of time to watch for behaviors that occur incrementally over longer time frames.&lt;&#x2F;li&gt;
&lt;li&gt;When Quine detects a full IoB pattern match, it generates a new event that is associated with a pre-defined set of business rules: sending alerts to a SOC or passing an upsell offer to a user are a few examples.&lt;&#x2F;li&gt;
&lt;li&gt;All data is also passed through &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.thatdot.com&#x2F;novelty&#x2F;&quot;&gt;Novelty Detector&lt;&#x2F;a&gt;, a real-time shallow learning algorithm that identifies novel and notable behaviors that can be turned into IoBs if deemed necessary.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
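&lt;p&gt;As a rough sketch of step 2, the pattern portion of a standing query for the device security use case might look something like the following. The node roles, the &lt;code&gt;FROM&lt;&#x2F;code&gt; and &lt;code&gt;TARGETS&lt;&#x2F;code&gt; edge labels, and the &lt;code&gt;result&lt;&#x2F;code&gt; property are hypothetical names that an ingest query would need to create; they are not part of any published recipe. The pattern matches a source IP that has produced both a failed and a successful login against a device, no matter how far apart in time the two events arrive.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&#x2F;&#x2F; Hypothetical schema: login events point at their source IP and target device&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (fail)-[:FROM]-&amp;gt;(ip)&amp;lt;-[:FROM]-(success)-[:TARGETS]-&amp;gt;(device)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WHERE fail.result = &amp;quot;FAILURE&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  AND success.result = &amp;quot;SUCCESS&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;RETURN DISTINCT id(ip) AS ipId&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Until both halves of the pattern exist, the partial match described in step 3 is simply held; the business rule in step 4 fires only once the full pattern is present.&lt;&#x2F;p&gt;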
&lt;p&gt;The data flow looks like this:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;lh7-us.googleusercontent.com&#x2F;docsz&#x2F;AD_4nXc6o79SnxjfUPyZUZe-zASocI2jTFX5V5346a58Fg7Hwu8Wgm9PauttndLc9pQv8MNml_pwB5FIAgFUU79VJg8UhfT4b-OuCXwNE_Es7uWfi9JVFXwO-IBohFO21Bvs83Rb9relFG5itvkVUB8PDlkCJMb-?key=EowYsb8TwwGYQOu0_Ucwcw&quot; alt=&quot;Real time identification of Known Indicator of behavior in event streams&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;The modern IoB workflow using Quine Enterprise.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;streaming-graph-delivers-end-to-end-iobs-for-iot&quot;&gt;&lt;strong&gt;Streaming Graph Delivers End-To-End IoBs for IoT&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Quine is available in both open source and enterprise editions. However, Novelty Detector is available either in the AWS marketplace or under license as part of &lt;strong&gt;thatDot Streaming Graph&lt;&#x2F;strong&gt;. Streaming Graph offers large organizations and connected device manufacturers both the clustered, resilient version of Quine and Novelty Detector. It is meant for production applications where resilience, query performance, and throughput matter. Resilient clustering includes support for hot spares and distribution across multiple availability zones. We recently shared reproducible tests demonstrating both scale (thatDot Streaming Graph easily processed one million 4-node graph events&#x2F;second) and resilience in the face of node failure. You can &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;scaling-quine-streaming-graph-to-process-1-million-events-sec&#x2F;&quot;&gt;&lt;strong&gt;read about the tests here&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;try-it-yourself&quot;&gt;&lt;strong&gt;Try It Yourself&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;If you want to try it on your own, here are some resources to help:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Download Quine – &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;download&quot;&gt;&lt;strong&gt;JAR file&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; | &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;hub.docker.com&#x2F;r&#x2F;thatdot&#x2F;quine&quot;&gt;&lt;strong&gt;Docker Image&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; | &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;thatdot&#x2F;quine&quot;&gt;&lt;strong&gt;GitHub&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Check out the &lt;strong&gt;Ingest Data into Quine&lt;&#x2F;strong&gt; blog series covering everything from &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;components&#x2F;ingest-sources&#x2F;kafka.html&quot;&gt;&lt;strong&gt;ingest from Kafka&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; to ingesting CSV data.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;network-log-analysis-using-categorical-anomaly-detection&#x2F;&quot;&gt;&lt;strong&gt;Novelty Detector for Log Analysis&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; – learn more about Novelty Detector in this blog showing how to use it to analyze system logs to detect novel anomalous behavior.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;Original photo for header by &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;unsplash.com&#x2F;@sanderweeteling?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText&quot;&gt;&lt;strong&gt;Sander Weeteling&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; on &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;unsplash.com&#x2F;s&#x2F;photos&#x2F;networks?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText&quot;&gt;&lt;strong&gt;Unsplash&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Quine’s Real-time Temporal Event Sequencing Produces New Insights</title>
        <published>2023-02-08T00:00:00+00:00</published>
        <updated>2023-02-08T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/quines-real-time-temporal-event-sequencing-produces-new-insights/"/>
        <id>https://www.thatdot.com/blog/quines-real-time-temporal-event-sequencing-produces-new-insights/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/quines-real-time-temporal-event-sequencing-produces-new-insights/">&lt;p&gt;One of the fundamental advantages of Quine’s architecture compared to other complex event stream processing technologies, like Flink and ksqlDB, is that it is not constrained by time windows. We demonstrated the value of this capability in the “Are You Ready for Low and Slow Auth Attacks?” blog, where we showed how you can use Quine to identify password spraying attacks that take place over extended periods, defeating legacy detection mechanisms constrained by time windowing.&lt;&#x2F;p&gt;
&lt;p&gt;But what about cases where the sequence of events is critical to detecting and investigating interesting incidents?  For example, when performing root cause analysis (RCA) for performance issues in a NOC or security incidents in a SOC, the temporal ordering of events is often as important as the events themselves.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;event-sequencing-in-real-time-a-streaming-graph-strength&quot;&gt;Event Sequencing in Real Time: A Streaming Graph Strength&lt;&#x2F;h2&gt;
&lt;p&gt;Event sequencing can provide key information for accurate and timely detection and analysis, even in the most complex cases where causality and temporal ordering are difficult to ascertain. The key is architecting a graph structure that can most effectively answer your questions and produce insights.&lt;&#x2F;p&gt;
&lt;p&gt;In the case of a streaming graph solution like Quine, this means modeling the graph so queries can effectively traverse nodes and edges natively, which is always more efficient than path matching based on node properties like timestamps.  This is because the relations between nodes (edges) are persisted in the nodes themselves.&lt;&#x2F;p&gt;
&lt;p&gt;We like using an event sequencing technique to explicitly identify order based on a pattern match detected by one of Quine’s most powerful features, the standing query. (&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;getting-started&#x2F;standing-queries-tutorial.html&quot;&gt;Standing queries&lt;&#x2F;a&gt; monitor streams for specified patterns, maintain partial matches, and execute user-specified actions the instant a full match is made.)&lt;&#x2F;p&gt;
&lt;p&gt;We demonstrate this technique in the APT (Advanced Persistent Threat) Detection recipe (&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;apt-detection&quot;&gt;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;apt-detection&lt;&#x2F;a&gt;) to create sequence edges as Quine ingests EDR (Endpoint Detection and Response) and network traffic logs while monitoring for an Indicator of Behavior (IoB) that matches malicious data exfiltration patterns.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;63e2c9735c16f470cae4c958_nboM1LFzWS5gFaUnLglquonuBXq3U8DRFPTSYFenp55GJK9I6algi_etgsf0Iox8Zjddh8RabNBPpVJMmo44tjvbVrnqokRMc9ff5VUzqh-PFrxlqszsA-Eo84zzUkcmQFFTprhZDrzHHPVmciDsmzs.png&quot; alt=&quot;A screenshot from the APT Detection recipe that reads as follows: SCENARIO:   Using a standing query, the recipe monitors for covert interprocess   communication using a file to pass data. When that pattern is matched, with a   network SEND event, we have our smoking gun and a URL is logged linking to   the Quine Exploration UI with the full activity and context for investigation.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Our approach to this technique has four key components.&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Model a behavioral pattern as a subgraph&lt;&#x2F;li&gt;
&lt;li&gt;Develop Cypher to match the subgraph in the event stream&lt;&#x2F;li&gt;
&lt;li&gt;Encode the event sequence into the graph&lt;&#x2F;li&gt;
&lt;li&gt;Emit an alert containing a &lt;code&gt;linkURL&lt;&#x2F;code&gt; to the subgraph inside the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;getting-started&#x2F;exploration-ui.html&quot;&gt;Quine Exploration UI&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;Our concern is not the timeframe of events (how quickly they happen). Rather, our focus is locating a specific sequence of events in order – &lt;strong&gt;&lt;code&gt;WRITE-&amp;gt;READ-&amp;gt;SEND-&amp;gt;DELETE&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; – regardless of the time interval across which the events occurred.&lt;&#x2F;p&gt;
&lt;p&gt;A subgraph like the one below can model the data exfiltration event from the APT Detection recipe.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;63e2c973fa0cfa5c15a3fcc2_3Sv5pdUDG1Zbiv15Y_YDdfDGBjLw5JpWpOuf6O0ypB44-FdpaXwgf9pmzB8XeN_qmucIOwzdhtrDhS9z8ExZVr-SN3SWtQXKIGs4U-tSQ_ohyhwqKT9WjSGozh4_BNcF96yvGmJ2BGCGX1jAfyuyKi0.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;An initial data subgraph produced by the APT Detection recipe.&lt;&#x2F;p&gt;
&lt;p&gt;Based on the model, we develop Cypher to match the subgraph in the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;components&#x2F;writing-standing-queries.html#match-query&quot;&gt;pattern match section&lt;&#x2F;a&gt; of a Quine standing query:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (e1)-[:EVENT]-&amp;gt;(f)&amp;lt;-[:EVENT]-(e2), &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      (f)&amp;lt;-[:EVENT]-(e3)&amp;lt;-[:EVENT]-(p2)-[:EVENT]-&amp;gt;(e4)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WHERE e1.type = &amp;quot;WRITE&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  AND e2.type = &amp;quot;READ&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  AND e3.type = &amp;quot;DELETE&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  AND e4.type = &amp;quot;SEND&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;RETURN DISTINCT id(f) as fileId&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Next, augment the subgraph in the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;components&#x2F;writing-standing-queries.html#output-action&quot;&gt;standing query output&lt;&#x2F;a&gt; to overlay sequencing with the &lt;code&gt;CREATE&lt;&#x2F;code&gt; clause, adding &lt;code&gt;NEXT&lt;&#x2F;code&gt; edges between the key nodes:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (p1)-[:EVENT]-&amp;gt;(e1)-[:EVENT]-&amp;gt;(f)&amp;lt;-[:EVENT]-(e2)&amp;lt;-[:EVENT]-(p2),&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;(f)&amp;lt;-[:EVENT]-(e3)&amp;lt;-[:EVENT]-(p2)-[:EVENT]-&amp;gt;(e4)-[:EVENT]-&amp;gt;(ip)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  WHERE id(f) = $that.data.fileId&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  AND e1.type = &amp;quot;WRITE&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  AND e2.type = &amp;quot;READ&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  AND e3.type = &amp;quot;DELETE&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  AND e4.type = &amp;quot;SEND&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  AND e1.time &amp;lt; e2.time&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  AND e2.time &amp;lt; e3.time&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  AND e2.time &amp;lt; e4.time&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;CREATE (e1)-[:NEXT]-&amp;gt;(e2)-[:NEXT]-&amp;gt;(e4)-[:NEXT]-&amp;gt;(e3)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The transformed subgraph in Quine becomes this.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;63e2c97319c122eff3cbf0a2_wLo5Q7jFhqhC3nVRWsdWXCA_QnEm-0Rbo7MD6FiRHLuKUj7upFheeV4lu07BYqeGA-urxkRdbQ6mf9SJ8PUfBxkbIZW084_9NHmHQ8j_MPqNSHWabxj6f9IjZTO_GyESO7AmbnfDYG_raENbMit_fjo.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;There are three important things to note here:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;The synthetic &lt;code&gt;NEXT&lt;&#x2F;code&gt; edges only exist after the standing query match creates them.&lt;&#x2F;li&gt;
&lt;li&gt;The &lt;code&gt;NEXT&lt;&#x2F;code&gt; edge labels enable us to efficiently traverse the &lt;code&gt;WRITE-&amp;gt;READ-&amp;gt;SEND-&amp;gt;DELETE&lt;&#x2F;code&gt; path with a simple Cypher query.&lt;&#x2F;li&gt;
&lt;li&gt;Temporal sequencing is even more difficult when dealing with multiple input sources.  Imagine matching the &lt;code&gt;WRITE-&amp;gt;READ-&amp;gt;SEND-&amp;gt;DELETE&lt;&#x2F;code&gt; pattern where the write, read and delete events come from one source and the send from another. Quine makes it easy to &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;ingesting-from-multiple-data-sources-into-quine-streaming-graph&#x2F;&quot;&gt;combine multiple event sources&lt;&#x2F;a&gt;.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;Once Quine identifies the event, we can explore the graph further with queries like the following:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (n)-[:NEXT*]-&amp;gt;(m)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WHERE strId(n)=&amp;quot;20b2059e-19c5-3ab6-b465-fe3593c45bc8&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;RETURN DISTINCT collect(m),n&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;63e2c9735b7e5c211ecaebac_XEV-_mdX3Jm07X3g_JdUKdvuQC78W7cfjNXX7QWtUyRtI5p2BKm2BMLHhFD8IdopqimOZn9n1gEfeFe_NHhP3PXULsTVqLCSqqOBEYBma6OXFyCR5mIs8yDBdunkc8l2Awvl5yMqaM1oj9quWI9KT4Q.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;The final output -- the &lt;strong&gt;&lt;code&gt;WRITE --&amp;gt; READ --&amp;gt; SEND --&amp;gt; DELETE&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; subgraph.&lt;&#x2F;p&gt;
&lt;p&gt;As an alternative, use the &lt;code&gt;&#x2F;api&#x2F;v1&#x2F;query&#x2F;cypher&#x2F;nodes&lt;&#x2F;code&gt; API endpoint to build a dictionary of malicious file names. The endpoint returns the matching nodes as JSON, for example:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;[&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;id&amp;quot;: &amp;quot;f00ae947-3dd5-3c92-a84f-118b401c80f1&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;hostIndex&amp;quot;: 0,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;label&amp;quot;: &amp;quot;ID: f00ae947-3dd5-3c92-a84f-118b401c80f1&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;properties&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;data&amp;quot;: &amp;quot;&#x2F;tmp&#x2F;miscellaneous.data&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
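&lt;p&gt;As a hedged sketch, a query along these lines could be sent to that endpoint to return the file node at the center of each sequenced match. It assumes the &lt;code&gt;NEXT&lt;&#x2F;code&gt; edges created by the standing query output above and the &lt;code&gt;EVENT&lt;&#x2F;code&gt; edges from the recipe. Because it is not anchored to a node ID, it is best suited to a small demo graph.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&#x2F;&#x2F; Follow the sequenced WRITE, READ, SEND, DELETE chain, then return the written file&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (e1)-[:NEXT]-&amp;gt;(e2)-[:NEXT]-&amp;gt;(e4)-[:NEXT]-&amp;gt;(e3),&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      (e1)-[:EVENT]-&amp;gt;(f)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WHERE e1.type = &amp;quot;WRITE&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  AND e3.type = &amp;quot;DELETE&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;RETURN DISTINCT f&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;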
&lt;p&gt;You can even use &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;reference&#x2F;rest-api.html#&#x2F;schemas&#x2F;com.thatdot.quine.routes.QuickQuery&quot;&gt;quick queries&lt;&#x2F;a&gt; to follow the &lt;code&gt;NEXT&lt;&#x2F;code&gt; edges in the exploration UI to find actions that occurred earlier in the event timeline.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;63e2c97310b1157df2aa5d12_n--AlF6sgP7fW2FS_Q7nRhAKdyxDFKhClMsjL3VxdH4XJPq2GgVqRURGvqjQ-adswUXhi75mVvk0A6DDRdvc9LJDT-LCRH6_bTjpO8qxNoZRCejNFzD5HmP8Bme0jMfn_xl2hJdsqAEedsYLcncXsUw.png&quot; alt=&quot;A diagram of an expanded graph showing nodes related to the core WRITE--&amp;gt;READ--&amp;gt;SEND--&amp;gt;DELETE results, allowing root cause analysis.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Results using quick queries to explore the final subgraph for root cause analysis.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;event-sequencing-benefits-from-planning&quot;&gt;Event Sequencing Benefits From Planning&lt;&#x2F;h2&gt;
&lt;p&gt;Processing event streams using a streaming graph like Quine requires adjusting how you think about your data. For example, when the recipe we used in this post was first developed, it was focused on a single concern: finding a specific subgraph. This required only a simple plan for creating node IDs.&lt;&#x2F;p&gt;
&lt;p&gt;In the original use case, the choice was to create nodes using &lt;code&gt;idFrom()&lt;&#x2F;code&gt; in its most basic form, &lt;code&gt;id(event) = idFrom($that)&lt;&#x2F;code&gt;, which was completely reasonable at the time. Now, asking a more complex question, such as &quot;Show me any process that interacts with a file named &lt;code&gt;&#x2F;tmp&#x2F;miscellaneous.data&lt;&#x2F;code&gt;&quot;, is more difficult because the node ID namespace plan did not include using individual node parameters. This is something to keep in mind when you plan your streaming event graph!&lt;&#x2F;p&gt;
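&lt;p&gt;One way to plan for that kind of question, sketched here under assumptions rather than taken from the recipe, is to derive each node ID from a namespace string plus the specific field that identifies the node. The field names &lt;code&gt;pid&lt;&#x2F;code&gt;, &lt;code&gt;event_type&lt;&#x2F;code&gt;, &lt;code&gt;time&lt;&#x2F;code&gt;, and &lt;code&gt;object&lt;&#x2F;code&gt;, and the &lt;code&gt;&amp;quot;process&amp;quot;&lt;&#x2F;code&gt;, &lt;code&gt;&amp;quot;event&amp;quot;&lt;&#x2F;code&gt;, and &lt;code&gt;&amp;quot;file&amp;quot;&lt;&#x2F;code&gt; prefixes are illustrative choices for a hypothetical ingest query:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&#x2F;&#x2F; Hypothetical ingest query: each node ID comes from a namespace plus a key field&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (proc), (event), (file)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WHERE id(proc) = idFrom(&amp;quot;process&amp;quot;, $that.pid)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  AND id(event) = idFrom(&amp;quot;event&amp;quot;, $that)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  AND id(file) = idFrom(&amp;quot;file&amp;quot;, $that.object)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;SET event.type = $that.event_type,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    event.time = $that.time,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    file.data = $that.object&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;CREATE (proc)-[:EVENT]-&amp;gt;(event)-[:EVENT]-&amp;gt;(file)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;With IDs planned this way, the file node can be located directly from its name, and the processes that touched it are two hops away:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&#x2F;&#x2F; Anchor on the file node by recomputing its ID from the file name&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (proc)-[:EVENT]-&amp;gt;(event)-[:EVENT]-&amp;gt;(file)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WHERE id(file) = idFrom(&amp;quot;file&amp;quot;, &amp;quot;&#x2F;tmp&#x2F;miscellaneous.data&amp;quot;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;RETURN DISTINCT proc&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;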
&lt;p&gt;Temporal data doesn&#x27;t always need to be tied to timestamps. Instead, you can use temporal categories – &lt;em&gt;morning&#x2F;afternoon&#x2F;night, before&#x2F;after&lt;&#x2F;em&gt;, etc.  Many use cases, like our data exfiltration scenario, are built from understanding the sequence of events as a subgraph. What temporal use cases do you have that could benefit from detection using graph analysis, and how long does it take to detect those patterns today?&lt;&#x2F;p&gt;
&lt;h2 id=&quot;try-for-yourself&quot;&gt;&lt;strong&gt;Try for Yourself&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;If you want to try Quine using your own data, here are some resources to help:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Download Quine &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;download&quot;&gt;JAR&lt;&#x2F;a&gt; | &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;hub.docker.com&#x2F;r&#x2F;thatdot&#x2F;quine&quot;&gt;Docker Image&lt;&#x2F;a&gt; | &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;thatdot&#x2F;quine&quot;&gt;GitHub&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Start learning about Quine now by visiting the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;&quot;&gt;Quine open source project&lt;&#x2F;a&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;Check out the Ingest Data into Quine blog series covering everything from &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;components&#x2F;ingest-sources&#x2F;kafka.html&quot;&gt;ingest from Kafka&lt;&#x2F;a&gt; to ingesting CSV data.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;apt-detection&quot;&gt;APT Detection recipe&lt;&#x2F;a&gt; - this recipe, referenced above, demonstrates the ability of streaming graphs to process event data without time windows.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Graph Neural Networks for Quine</title>
        <published>2023-01-21T00:00:00+00:00</published>
        <updated>2023-01-21T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/graph-neural-networks-for-quine/"/>
        <id>https://www.thatdot.com/blog/graph-neural-networks-for-quine/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/graph-neural-networks-for-quine/">&lt;h2 id=&quot;introduction&quot;&gt;&lt;strong&gt;Introduction&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;em&gt;“Which records are the most similar to this one?”&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;p&gt;That’s a straightforward question, but it hides some very thorny problems! Similar in what way? How do we measure similarity? How do we incorporate different attributes? How do we weigh different values? How do we incorporate similar relationships to other similar records?&lt;&#x2F;p&gt;
&lt;p&gt;Believe it or not, solving the similarity problem is more manageable than answering those questions. This is a perfect job for machine learning! New research in the last few years has brought natural language processing (NLP) tools to graphs as Graph Neural Networks (GNNs).&lt;&#x2F;p&gt;
&lt;p&gt;Graph A.I. is starting to leave the research lab and enable critical new use cases in the industry. From cybersecurity to fin-tech, to social networks, to ad placement, applications for graph A.I. are sweeping across industries.&lt;&#x2F;p&gt;
&lt;p&gt;While demonstrations on small datasets have proven the effectiveness and value of graph A.I. techniques, operationalizing graph A.I. in large graphs or with streaming data has been a significant obstacle!&lt;&#x2F;p&gt;
&lt;p&gt;This changes with the 1.5 release of Quine, which includes built-in functionality for many of the essential graph A.I. algorithms. This post shows how to quickly solve tricky questions like “similarity” using Quine’s new random walk graph algorithm feature.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;take-a-walk&quot;&gt;&lt;strong&gt;Take a Walk&lt;&#x2F;strong&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;Random walks are often the central connection between graph-structured data and machine learning applications. A random walk is a string of data produced by starting at a graph node, following one of its edges randomly to reach another node, then following one of its edges randomly to reach another node, and so on. Walks let us translate the possibly-infinite dimensions of graph data into linear strings we can feed to graph neural networks.&lt;&#x2F;p&gt;
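&lt;p&gt;To make the idea concrete, here is a minimal Python sketch of an unbiased random walk over a small in-memory graph. It is only an illustration of the technique, not Quine’s implementation; the adjacency-list &lt;code&gt;toy_graph&lt;&#x2F;code&gt; and the &lt;code&gt;random_walk&lt;&#x2F;code&gt; helper are hypothetical names made up for this example.&lt;&#x2F;p&gt;

```python
import random


def random_walk(graph, start, length, rng):
    """Walk up to `length` steps from `start`, picking a neighbor uniformly at each step."""
    walk = [start]
    current = start
    for _ in range(length):
        neighbors = graph.get(current, [])
        if not neighbors:
            break  # dead end: this node has no edges to follow
        current = rng.choice(neighbors)
        walk.append(current)
    return walk


# Hypothetical toy graph as an adjacency list (undirected edges are listed both ways).
toy_graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
    "e": [],
}

rng = random.Random(42)  # seeded so the same walk is produced on every run
walk = random_walk(toy_graph, "a", 10, rng)
print(",".join(walk))
```

&lt;p&gt;Each printed line is one “sentence” of node IDs: the linear string that a downstream model consumes.&lt;&#x2F;p&gt;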
&lt;p&gt;Generating one random walk in Quine 1.5 can be done by calling a function in a Cypher query:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (n) CALL random.walk(n, 10) YIELD walk RETURN id(n), walk LIMIT 1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Or through an API call (if you know the ID of a node on which to start):&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;curl --request GET --url http:&#x2F;&#x2F;localhost:8080&#x2F;api&#x2F;v1&#x2F;algorithm&#x2F;walk&#x2F;b93eabe7-38d2-30fa-8f96-097d75eb1f50&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;h3 id=&quot;take-all-the-walks&quot;&gt;&lt;strong&gt;Take All the Walks&lt;&#x2F;strong&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;The previous examples start from one node and return one random walk. But graph A.I. usually requires building many walks from every node in the graph. To support this, Quine includes an API that will generate all random walks for an entire graph—regardless of how large the graph is.&lt;&#x2F;p&gt;
&lt;p&gt;Quine is a “&lt;em&gt;streaming graph&lt;&#x2F;em&gt;” that operates on continuous data streams. With an API call, you can direct Quine to stream all the random walks from every node in the graph to a file stored locally or in an S3 bucket.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;curl --request PUT --url http:&#x2F;&#x2F;localhost:8080&#x2F;api&#x2F;v1&#x2F;algorithm&#x2F;walk --header &amp;#39;Content-Type: application&#x2F;json&amp;#39; --data &amp;#39;{ &amp;quot;bucketName&amp;quot;: &amp;quot;your-s3-bucket-name&amp;quot;, &amp;quot;type&amp;quot;: &amp;quot;S3Bucket&amp;quot; }&amp;#39;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;h3 id=&quot;remember-when&quot;&gt;&lt;strong&gt;Remember When&lt;&#x2F;strong&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;Most use cases for Quine include continuously-running data ingests that constantly modify the graph. To correctly generate a set of random walks, you need the graph to hold still while collecting random walk data. That’s hard to do in a stream that is constantly being updated with every new record that streams in.&lt;&#x2F;p&gt;
&lt;p&gt;Fortunately, it is straightforward to query a fixed graph in Quine using the historical query functionality. Include a historical timestamp with the &lt;code&gt;at-time&lt;&#x2F;code&gt; parameter in your query to generate random walks from the graph as it was at that fixed historical moment. &lt;strong&gt;The rest of the graph can keep changing, but the GNN will see walks from a consistent and fixed view of the graph.&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;curl --request PUT --url &amp;quot;http:&#x2F;&#x2F;localhost:8080&#x2F;api&#x2F;v1&#x2F;algorithm&#x2F;walk?at-time=$(date +%s000)&amp;quot; --header &amp;#39;Content-Type: application&#x2F;json&amp;#39; --data &amp;#39;{ &amp;quot;bucketName&amp;quot;: &amp;quot;your-s3-bucket-name&amp;quot;, &amp;quot;type&amp;quot;: &amp;quot;S3Bucket&amp;quot; }&amp;#39;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;h2 id=&quot;graph-neural-networks&quot;&gt;&lt;strong&gt;Graph Neural Networks&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Graph neural networks (GNNs) are a way to combine the power of graph-structured data and cutting-edge machine learning techniques. Research on GNNs is continuing rapidly, but several foundational techniques have become critical to modern applications of artificial intelligence on graph data.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;node2vec&quot;&gt;&lt;strong&gt;Node2Vec&lt;&#x2F;strong&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;The Node2Vec algorithm is an important technique for applying some revolutionary machine learning techniques to graph data. &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;cs.stanford.edu&#x2F;~jure&#x2F;pubs&#x2F;node2vec-kdd16.pdf&quot;&gt;The initial Node2Vec paper&lt;&#x2F;a&gt; by Grover and Leskovec appeared in 2016 and demonstrated how random walks on a graph behave like sentences in a corpus of natural language text. So by generating random walks, you can apply NLP (natural language processing) techniques and neural networks to graph data.&lt;&#x2F;p&gt;
&lt;p&gt;The NLP technique used by Node2Vec is Word2Vec. In 2013, &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;arxiv.org&#x2F;pdf&#x2F;1301.3781.pdf&quot;&gt;Mikolov et al. published&lt;&#x2F;a&gt; a landmark result showing that words from natural language can be “embedded” into a high-dimensional vector space. (Imagine plotting a point on a two-dimensional X-Y plane, but instead of 2 dimensions X and Y, put the dot in a space with dozens, hundreds, or even thousands of dimensions!) The main takeaway from this work is that word meanings can be learned by a computer and used mathematically. For instance, after training on English text, you can take the word “king”, subtract “man”, add “woman” and arrive at the word “queen”. Word2Vec works incredibly well for natural language; Node2Vec then took it a step further and applied this technique to graph data.&lt;&#x2F;p&gt;
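&lt;p&gt;The “king” minus “man” plus “woman” arithmetic can be sketched with a few hand-made vectors. The two-dimensional values below are invented for illustration (one axis loosely standing for royalty, the other for gender) and are not learned embeddings; a real Word2Vec model learns many more dimensions from text.&lt;&#x2F;p&gt;

```python
import math

# Hand-made 2-D vectors for illustration only: (royalty, gender). Not learned embeddings.
vectors = {
    "king": (1.0, 1.0),
    "queen": (1.0, -1.0),
    "man": (0.0, 1.0),
    "woman": (0.0, -1.0),
    "prince": (0.8, 1.0),
    "princess": (0.8, -1.0),
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def analogy(positive, negative):
    """Return the word closest to sum(positive) - sum(negative), skipping the input words."""
    target = [0.0, 0.0]
    for word in positive:
        target = [t + x for t, x in zip(target, vectors[word])]
    for word in negative:
        target = [t - x for t, x in zip(target, vectors[word])]
    candidates = [w for w in vectors if w not in positive and w not in negative]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


result = analogy(positive=["king", "woman"], negative=["man"])
print(result)
```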
&lt;p&gt;Random walks on a graph are the link that allows us to use Word2Vec for Node2Vec. A random walk generates a string of data from the graph. That string of data is analogous to a sentence in natural language. Producing many random walks gives us data structured like a set of documents containing many sentences; this is the “corpus” that trains the NLP model. So if we can produce random walks from a graph, we can train a neural network to learn the meaning of the data in that graph.&lt;&#x2F;p&gt;
&lt;p&gt;The random walk APIs in Quine 1.5+ allow a user to tune the random walks as described in the Node2Vec paper. The &lt;code&gt;return&lt;&#x2F;code&gt; parameter determines how likely a walk is to return one step back where it came from (to “backtrack” to the previous node). The &lt;code&gt;in-out&lt;&#x2F;code&gt; parameter determines whether a walk is more likely to explore the local region (“neighborhood”) around a node or travel far afield to explore corners of the graph far away. These parameters can tune the walks to learn different features of the graph and address different goals.&lt;&#x2F;p&gt;
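&lt;p&gt;As a rough sketch of how these two parameters shape a walk, the function below computes the unnormalized step weights defined in the Node2Vec paper: dividing by the return parameter for a backtrack, and by the in-out parameter for a step that moves farther from the previous node. The helper name, argument names, and toy graph are hypothetical, and Quine’s internal implementation may differ in its details.&lt;&#x2F;p&gt;

```python
def step_weights(graph, previous, current, return_param, in_out_param):
    """Unnormalized Node2Vec weights for each neighbor of `current`."""
    weights = {}
    for candidate in graph[current]:
        if candidate == previous:
            # Stepping straight back to where the walk came from.
            weights[candidate] = 1.0 / return_param
        elif candidate in graph[previous]:
            # Staying in the neighborhood shared with the previous node.
            weights[candidate] = 1.0
        else:
            # Moving farther away from the previous node.
            weights[candidate] = 1.0 / in_out_param
    return weights


# Hypothetical toy graph as an adjacency list.
toy_graph = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b"],
    "d": ["b"],
}

# The walk arrived at "b" from "a": "a" is a backtrack, "c" is shared, "d" is farther out.
weights = step_weights(toy_graph, "a", "b", return_param=2.0, in_out_param=0.5)
print(weights)
```

&lt;p&gt;A walker would then sample the next node in proportion to these weights, for example with &lt;code&gt;random.choices&lt;&#x2F;code&gt;. With the values above, backtracking is discouraged and exploring outward is favored.&lt;&#x2F;p&gt;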
&lt;h3 id=&quot;graphsage&quot;&gt;&lt;strong&gt;GraphSAGE&lt;&#x2F;strong&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;Whereas Node2Vec used random walks to generate a list of node IDs as the “corpus” to train Word2Vec, a richer approach was developed in the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;arxiv.org&#x2F;pdf&#x2F;1706.02216.pdf&quot;&gt;GraphSAGE&lt;&#x2F;a&gt; algorithm. The primary development in GraphSAGE is to include the ability to explore the local area of each node in a walk, then aggregate features from each of the nearby nodes back to the starting node. These features then get concatenated together and used in the learning process for representing nodes in a graph. This provides two big improvements vs. Node2Vec: 1.) it provides a lot more information for each node (Node2Vec uses only the node’s ID), and 2.) it learns a function which can be used to embed unseen nodes—making it much more useful in practical situations where we need to compare new nodes not present in the original training set.&lt;&#x2F;p&gt;
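&lt;p&gt;The aggregate-then-concatenate step can be sketched in a few lines of Python. This assumes a simple mean over neighbor features, which is one of several aggregators described in the GraphSAGE paper, and it leaves out the neighbor sampling, learned weights, and nonlinearity of the full algorithm. The toy graph, feature values, and function names are hypothetical.&lt;&#x2F;p&gt;

```python
def mean_aggregate(features, neighbors):
    """Average the feature vectors of the given neighbor nodes."""
    dim = len(next(iter(features.values())))
    if not neighbors:
        return [0.0] * dim
    return [sum(features[n][i] for n in neighbors) / len(neighbors) for i in range(dim)]


def sage_input(features, graph, node):
    """Concatenate a node's own features with the mean of its neighbors' features."""
    return list(features[node]) + mean_aggregate(features, graph[node])


# Hypothetical toy data: two numeric features per node.
toy_graph = {"m1": ["p1", "p2"], "p1": ["m1"], "p2": ["m1"]}
toy_features = {"m1": [1.0, 0.0], "p1": [0.0, 2.0], "p2": [4.0, 6.0]}

combined = sage_input(toy_features, toy_graph, "m1")
print(combined)
```

&lt;p&gt;Because the result depends only on features and not on a fixed node ID, the same function can be applied to a node that did not exist when the model was trained.&lt;&#x2F;p&gt;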
&lt;p&gt;Embedding unseen nodes is important for streaming data workflows. It means that we can train a neural network on a consistent view of the graph at one moment in time, and then use that trained network to interpret new data (nodes) which stream in after the training. We can retrain the network any time we like, but we don’t have to skip over streaming data arriving in real-time. Every node can be embedded in real-time!&lt;&#x2F;p&gt;
&lt;p&gt;Quine’s random walk generation includes the ability to define an aggregation query for each node encountered in a random walk. This can be used to explore the local neighborhood and&#x2F;or aggregate multiple properties which get automatically folded into the walk.&lt;&#x2F;p&gt;
&lt;p&gt;For example, the following Cypher query fragment is passed to the &lt;code&gt;query&lt;&#x2F;code&gt; API parameter to collect different properties from the nodes visited during a walk, based on each node’s type:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;RETURN CASE &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  WHEN &amp;quot;Movie&amp;quot; IN labels(thisNode) THEN thisNode.languages + [id(thisNode)] + thisNode.countries&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  WHEN &amp;quot;Genre&amp;quot; IN labels(thisNode) THEN [id(thisNode), thisNode.genre]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  WHEN &amp;quot;Person&amp;quot; IN labels(thisNode) THEN [thisNode.born.year, id(thisNode), split(thisNode.bornIn, &amp;quot; &amp;quot;)[-1]]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  WHEN &amp;quot;Role&amp;quot; IN labels(thisNode) THEN id(thisNode)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  WHEN &amp;quot;User&amp;quot; IN labels(thisNode) THEN id(thisNode)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  WHEN &amp;quot;Rating&amp;quot; IN labels(thisNode) THEN thisNode.rating&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  ELSE id(thisNode)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;END&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;h2 id=&quot;gnn-tutorial&quot;&gt;&lt;strong&gt;GNN Tutorial&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Streaming data and graph neural network techniques come together in Quine 1.5 to easily solve some powerful graph A.I. use cases. Let’s use an existing recipe to demonstrate how we can enrich it with graph neural networks.&lt;&#x2F;p&gt;
&lt;p&gt;Let’s use the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;imdb-movie-data&quot;&gt;movieData recipe from here&lt;&#x2F;a&gt; and enrich it using a GNN to compute similarity for movies in that dataset.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;63d0205c6a7d1237cc558b55_Using%20Random%20Walk%20with%20Quinepng.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Workflow for using Quine to create the similarity data file.&lt;&#x2F;p&gt;
&lt;p&gt;To get started, download the two data files to your working directory:&lt;br &#x2F;&gt;
&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine-recipe-public.s3.us-west-2.amazonaws.com&#x2F;movieData.csv&quot;&gt;https:&#x2F;&#x2F;quine-recipe-public.s3.us-west-2.amazonaws.com&#x2F;movieData.csv&lt;&#x2F;a&gt;&lt;br &#x2F;&gt;
‍&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine-recipe-public.s3.us-west-2.amazonaws.com&#x2F;ratingData.csv&quot;&gt;https:&#x2F;&#x2F;quine-recipe-public.s3.us-west-2.amazonaws.com&#x2F;ratingData.csv&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;download&quot;&gt;Download the latest Quine executable&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;Run the movieData recipe to ingest the movie data and build a graph:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;java -jar quine-1.5.0.jar -r movieData --recipe-value movie_file=movieData.csv --recipe-value rating_file=ratingData.csv&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Let that run to ingest all the data and build the movieData graph. When complete, you should see the ingest counters stop changing and an output that looks like this:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;INGEST-4 status is completed and ingested 74090&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;INGEST-1 status is completed and ingested 74090&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;INGEST-2 status is completed and ingested 74090&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;INGEST-3 status is completed and ingested 74090&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;INGEST-5 status is completed and ingested 100005&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;We’ll use a Node2Vec approach for this example. Let’s start by generating random walks:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;curl --request PUT --url &amp;quot;http:&#x2F;&#x2F;localhost:8080&#x2F;api&#x2F;v1&#x2F;algorithm&#x2F;walk?count=20&amp;amp;seed=foo&amp;amp;at-time=$(date +%s000)&amp;quot; --header &amp;#39;Content-Type: application&#x2F;json&amp;#39; --data &amp;#39;{ &amp;quot;type&amp;quot;: &amp;quot;LocalFile&amp;quot; }&amp;#39;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;NOTE:&lt;&#x2F;strong&gt;&lt;&#x2F;em&gt; &lt;em&gt;This bash command computes the current timestamp (in seconds) inline with &lt;code&gt;date +%s&lt;&#x2F;code&gt; and then appends &lt;code&gt;000&lt;&#x2F;code&gt; to make this a millisecond timestamp in the past.&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
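&lt;p&gt;If you are scripting this step in Python rather than bash, the same millisecond timestamp and query string can be built as in the sketch below. The &lt;code&gt;walk_url&lt;&#x2F;code&gt; helper is a hypothetical name; the parameter names &lt;code&gt;count&lt;&#x2F;code&gt;, &lt;code&gt;seed&lt;&#x2F;code&gt;, and &lt;code&gt;at-time&lt;&#x2F;code&gt; are taken from the curl command in this post. The sketch only builds the URL and does not contact Quine.&lt;&#x2F;p&gt;

```python
import time
from urllib.parse import urlencode


def walk_url(base, count, seed, at_time_ms):
    """Build the walk URL with the same query parameters as the curl command."""
    query = urlencode({"count": count, "seed": seed, "at-time": at_time_ms})
    return f"{base}?{query}"


# Whole seconds since the epoch with three zeros appended: the same value as $(date +%s000).
now_ms = int(time.time()) * 1000

url = walk_url("http://localhost:8080/api/v1/algorithm/walk", 20, "foo", now_ms)
print(url)
```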
&lt;p&gt;This API call generates a string of node IDs by “walking” from every node in the graph, then saves a file to the local machine in the same working directory named something like: &lt;code&gt;graph-walk-1674342158000-10x20-q0-1.0x1.0-foo.csv&lt;&#x2F;code&gt;&lt;&#x2F;p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;NOTE:&lt;&#x2F;strong&gt;&lt;&#x2F;em&gt; &lt;em&gt;The file name is automatically derived from the parameters passed into the API call. If the API call includes a timestamp (in milliseconds), the filename will be the same if you issue the same API call again. If the API call includes at-time and a seed parameter, then the file contents will also be the same each time. But if the file already exists, trying to generate it again will return an error.&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;p&gt;The first line of the file looks like this:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;00006819-5fb1-310b-a4b6-e3bfbf600aa4,00006819-5fb1-310b-a4b6-e3bfbf600aa4,5f97b729-7435-3a44-823e-2e7217885d27,c9e28d23-5d1f-353a-b619-d1ec849d2583,c79ad95d-901d-3484-9f07-ae79c8efca39,ac799ac2-a21f-3b83-a827-31aace7c4211,c79ad95d-901d-3484-9f07-ae79c8efca39,0e415310-62ea-3bf6-b674-2ff5a9e84cee,985084e1-a9d5-353b-8080-175d2447929f,7a3c184f-451b-36ff-b005-feca7d8ee73b,985084e1-a9d5-353b-8080-175d2447929f,59cf0b79-23a1-36b8-96f6-ff6a4aa61e43&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The line is a series of node IDs generated from a random walk starting at node &lt;code&gt;00006819-5fb1-310b-a4b6-e3bfbf600aa4&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;NOTE:&lt;&#x2F;strong&gt;&lt;&#x2F;em&gt; &lt;em&gt;The first value of each line of the file output of the random walk call will always be the ID of the starting node, even if the walk returns other values. If a node has no edges to walk to, you’ll see a much shorter line: just the same node ID repeated twice.&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
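&lt;p&gt;When inspecting the output, a small helper can separate the starting node from the rest of the walk and flag nodes that had no edges to follow. This is a sketch that assumes the comma-separated layout described in this post; the function names and the shortened sample IDs are hypothetical.&lt;&#x2F;p&gt;

```python
def parse_walk_line(line):
    """Split one CSV line into (starting node ID, remaining walk values)."""
    values = line.strip().split(",")
    return values[0], values[1:]


def is_isolated(line):
    """True when the line is just the starting node ID repeated twice."""
    start, rest = parse_walk_line(line)
    return rest == [start]


# Hypothetical sample lines with shortened IDs.
sample = ["n1,n1,n2,n3\n", "n9,n9\n"]
isolated = [parse_walk_line(line)[0] for line in sample if is_isolated(line)]
print(isolated)
```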
&lt;p&gt;The graph has 164,777 nodes. We instructed Quine to generate 20 random walks for each node. So the output file has 3,295,540 rows.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;$ wc -l graph-walk-1674342158000-10x20-q0-1.0x1.0-foo.csv&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt; 3295540 graph-walk-1674342158000-10x20-q0-1.0x1.0-foo.csv&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;h3 id=&quot;graph-embedding&quot;&gt;&lt;strong&gt;Graph Embedding&lt;&#x2F;strong&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;With the random walks generated, we can use the output to train a neural network and load similarity data back into Quine (as a new ingest stream).&lt;&#x2F;p&gt;
&lt;p&gt;The following Python code demonstrates how to train the neural net, create the embeddings, and compute similarity for each node in the graph:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine-recipe-public.s3.us-west-2.amazonaws.com&#x2F;graph_embedding.py&quot;&gt;Download This Code&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;import sys, json, requests&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;from gensim.models.word2vec import Word2Vec&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;from gensim.models.callbacks import CallbackAny2Vec&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;from multiprocessing import cpu_count&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;from datetime import datetime&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;file = sys.argv[1]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;print(f&amp;quot;Reading in training data from: {file}&amp;quot;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;line_count = 0&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;word_count = 0&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;data = []&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;with open(file, &amp;quot;r&amp;quot;) as f:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;	for line in f:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;		l = line.strip().split(&amp;quot;,&amp;quot;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;		data.append(l)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;		line_count += 1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;		word_count += len(l)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;print(f&amp;quot;Read in: {line_count} sentences with: {word_count} total words&amp;quot;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;class EpochLogger(CallbackAny2Vec):&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;	&amp;#39;&amp;#39;&amp;#39;Callback to log information about training&amp;#39;&amp;#39;&amp;#39;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;	def __init__(self):&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;		self.epoch = 1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;	def on_epoch_begin(self, model):&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;		print(f&amp;quot;Iteration #{self.epoch} training started at: {datetime.now().strftime(&amp;#39;%Y&#x2F;%m&#x2F;%d %H:%M:%S&amp;#39;)}&amp;quot;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;	def on_epoch_end(self, model):&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;		print(f&amp;quot;Iteration #{self.epoch} completed at: {datetime.now().strftime(&amp;#39;%Y&#x2F;%m&#x2F;%d %H:%M:%S&amp;#39;)}&amp;quot;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;		self.epoch += 1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;logger = EpochLogger()&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;print(&amp;quot;Preparing dictionary and beginning model training...&amp;quot;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;model = Word2Vec(data, vector_size=16, window=5, min_count=0, sg=1, workers=cpu_count(), callbacks=[logger])&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;print(&amp;quot;Training Complete.&amp;quot;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;ks = sorted(list(set([d[0] for d in data])))&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;print(&amp;quot;Computing and saving similarities...&amp;quot;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;with open(file + &amp;quot;-similarities.json&amp;quot;, &amp;quot;w&amp;quot;) as fd:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;	for k in ks:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;		d = { &amp;quot;target&amp;quot;: k, &amp;quot;similarNodes&amp;quot;: [{&amp;quot;id&amp;quot;: x[0], &amp;quot;similarity&amp;quot;: x[1]} for x in model.wv.most_similar(k)] }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;		fd.write(json.dumps(d)+&amp;quot;\n&amp;quot;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;print(f&amp;quot;Similarity save completed at: {datetime.now().strftime(&amp;#39;%Y&#x2F;%m&#x2F;%d %H:%M:%S&amp;#39;)}&amp;quot;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;print(&amp;quot;Saving the learned model...&amp;quot;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;model.save(file + &amp;quot;.wvmodel&amp;quot;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;print(f&amp;quot;Model save completed at: {datetime.now().strftime(&amp;#39;%Y&#x2F;%m&#x2F;%d %H:%M:%S&amp;#39;)}&amp;quot;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;### Ingest similarities and create edges:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;print(f&amp;quot;Beginning async ingest of similarity data from: {file}-similarities.json&amp;quot;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;ingest_payload = {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;type&amp;quot;: &amp;quot;FileIngest&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;path&amp;quot;: file + &amp;quot;-similarities.json&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;format&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;type&amp;quot;: &amp;quot;CypherJson&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;query&amp;quot;: &amp;quot;UNWIND $that.similarNodes AS s MATCH (a), (b) WHERE id(a) = $that.target AND id(b) = s.id CREATE (a)-[:is_similar_to]-&amp;gt;(b)&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;url = &amp;quot;http:&#x2F;&#x2F;localhost:8080&#x2F;api&#x2F;v1&#x2F;ingest&#x2F;similarities&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;headers = {&amp;quot;Content-Type&amp;quot;: &amp;quot;application&#x2F;json&amp;quot;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;response = requests.request(&amp;quot;POST&amp;quot;, url, json=ingest_payload, headers=headers)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;if response.status_code != 200:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;	print(response.status_code)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Run this code with a command like (replace with your file name):&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;python graph_embedding.py graph-walk-1674342158000-10x20-q0-1.0x1.0-foo.csv&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;As it runs, the script prints progress messages for reading the walks, training the model, and saving the similarities.&lt;&#x2F;p&gt;
&lt;p&gt;Now we have similarity data saved to the file &lt;code&gt;graph-walk-1674342158000-10x20-q0-1.0x1.0-foo.csv-similarities.json&lt;&#x2F;code&gt;, which has single lines that pretty-print to look like this:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;{&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;target&amp;quot;: &amp;quot;00006819-5fb1-310b-a4b6-e3bfbf600aa4&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;similarNodes&amp;quot;: [&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;id&amp;quot;: &amp;quot;b221ceb5-9a0a-3c93-b0d0-870ab56ff924&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;similarity&amp;quot;: 0.9671398997306824&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    },&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;id&amp;quot;: &amp;quot;97290657-1a1e-34c1-8bd1-3db0aefaf13b&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;similarity&amp;quot;: 0.957493245601654&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    },&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;id&amp;quot;: &amp;quot;2fa7e55a-840e-37ec-bb30-8736e71d037b&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;similarity&amp;quot;: 0.9573205709457397&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    },&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    ...&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  ]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The data in this file tells us, for each &lt;code&gt;target&lt;&#x2F;code&gt; node, how similar it is to other nodes elsewhere in the graph, each identified by its &lt;code&gt;id&lt;&#x2F;code&gt;. We’ve saved the ten most similar node IDs for each node in the graph to a file.&lt;&#x2F;p&gt;
&lt;p&gt;Each pair of &lt;code&gt;target&lt;&#x2F;code&gt; and &lt;code&gt;similarNodes.id&lt;&#x2F;code&gt; generated in the &lt;code&gt;...similarities.json&lt;&#x2F;code&gt; file is a new edge we can create and name &lt;code&gt;is_similar_to&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
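&lt;p&gt;As a minimal sketch of that mapping, the Python below turns one line of the similarities file into &lt;code&gt;is_similar_to&lt;&#x2F;code&gt; edge tuples. The helper name &lt;code&gt;edges_from_line&lt;&#x2F;code&gt; and the tuple layout are illustrative assumptions and are not taken from &lt;code&gt;graph_embedding.py&lt;&#x2F;code&gt;; only the &lt;code&gt;target&lt;&#x2F;code&gt;, &lt;code&gt;similarNodes&lt;&#x2F;code&gt;, &lt;code&gt;id&lt;&#x2F;code&gt;, and &lt;code&gt;similarity&lt;&#x2F;code&gt; field names come from the file shown above.&lt;&#x2F;p&gt;

```python
import json


def edges_from_line(line):
    """Turn one JSON line of the similarities file into edge tuples.

    Hypothetical helper: each tuple is
    (source_id, "is_similar_to", similar_node_id, similarity_score).
    """
    record = json.loads(line)
    return [
        (record["target"], "is_similar_to", node["id"], node["similarity"])
        for node in record["similarNodes"]
    ]


# One record in the same shape as the pretty-printed example,
# shortened to two neighbors for brevity.
sample_line = json.dumps({
    "target": "00006819-5fb1-310b-a4b6-e3bfbf600aa4",
    "similarNodes": [
        {"id": "b221ceb5-9a0a-3c93-b0d0-870ab56ff924", "similarity": 0.9671398997306824},
        {"id": "97290657-1a1e-34c1-8bd1-3db0aefaf13b", "similarity": 0.957493245601654},
    ],
})

for edge in edges_from_line(sample_line):
    print(edge)
```

&lt;p&gt;Each tuple corresponds to one candidate &lt;code&gt;is_similar_to&lt;&#x2F;code&gt; edge from the &lt;code&gt;target&lt;&#x2F;code&gt; node to a similar node, with the score carried along in case you want to store it as an edge or node property.&lt;&#x2F;p&gt;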
&lt;p&gt;After saving the learned model, our Python script made an API call creating an ingest stream named &lt;code&gt;similarities&lt;&#x2F;code&gt; to load the data into Quine. Once it’s complete we can visualize the data.&lt;&#x2F;p&gt;
&lt;p&gt;Check the status of the similarities ingest stream with an API call like:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;curl --request GET --url http:&#x2F;&#x2F;localhost:8080&#x2F;api&#x2F;v1&#x2F;ingest&#x2F;similarities&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;{&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;name&amp;quot;: &amp;quot;similarities&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;status&amp;quot;: &amp;quot;Completed&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  ...&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;h3 id=&quot;seeing-similarities&quot;&gt;&lt;strong&gt;Seeing Similarities&lt;&#x2F;strong&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;Our graph now has edges showing the results for the question we began with:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;em&gt;“Which records are the most similar to this one?”&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Let’s see which movies are similar to each other:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (n: Movie)-[:is_similar_to]-&amp;gt;(m: Movie)-[:is_similar_to]-&amp;gt;(n) RETURN n, m LIMIT 100&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;And we get insightful results:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;63d01ff5cd9052216be17b43_results%20from%20tutorial.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Visualizing similarities returned by Quine.&lt;&#x2F;p&gt;
&lt;p&gt;This graph is a lot of fun to explore! Looking at a few of the connections, we see that the GNN learned many useful similarities:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;The Hobbit trilogy appears in a connected triangle near the top.&lt;&#x2F;li&gt;
&lt;li&gt;The George Carlin (and Bill Hicks) pentagram is left of center.&lt;&#x2F;li&gt;
&lt;li&gt;Many sequels are related to their predecessors.&lt;&#x2F;li&gt;
&lt;li&gt;A pair of Bond movies appears just below the center (more would appear if we didn’t limit results).&lt;&#x2F;li&gt;
&lt;li&gt;Movies in the same genre are much more likely to be similar.&lt;&#x2F;li&gt;
&lt;li&gt;…and many more! Your version might be a little different based on the randomness of the GNN training. Take a look and see what you find.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Similar movies will have similar genres, similar actors in similar roles, similar users providing similar ratings, similarities of similarities, and so on. The similarities also include the dataset’s other kinds of graph relationships. The entire graph is incorporated into the graph neural network learning process through random walk production.&lt;&#x2F;p&gt;
&lt;p&gt;Want to explore more kinds of similarities? Try this query to explore similar actors:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (n: Person)-[:is_similar_to]-&amp;gt;(m: Person)-[:is_similar_to]-&amp;gt;(n) RETURN n, m LIMIT 100&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;h2 id=&quot;conclusion&quot;&gt;&lt;strong&gt;Conclusion&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Machine learning on graphs is an effective new technique applicable to all sorts of datasets, even many that don’t obviously call for a graph. Representing JSON or CSV data as a graph lets us draw the connections that neural networks need to reach conclusions that would otherwise be impossible to detect or would require enormous manual analysis.&lt;&#x2F;p&gt;
&lt;p&gt;This is only the beginning! In a future post, we’ll look at how to apply these graph neural network techniques to your live streaming data to constantly embed new nodes which weren’t part of the training set.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Categorical Data: An Untapped Source of Real-Time Insights</title>
        <published>2023-01-12T00:00:00+00:00</published>
        <updated>2023-01-12T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/categorical-data-an-untapped-source-of-real-time-insights/"/>
        <id>https://www.thatdot.com/blog/categorical-data-an-untapped-source-of-real-time-insights/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/categorical-data-an-untapped-source-of-real-time-insights/">&lt;p&gt;As we enter a period of uncertainty, it is important to get more value from existing investments, including event stream processing infrastructure. Categorical data represents a largely ignored and untapped resource capable of providing significant business impact.&lt;&#x2F;p&gt;
&lt;p&gt;This post explores what categorical data is, why enterprises have overlooked it, and, most importantly, the value it can unlock with minimal investment.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;categorical-data-contains-important-insights&quot;&gt;Categorical data contains important insights&lt;&#x2F;h2&gt;
&lt;p&gt;There are two principal types of data: &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;whats-the-difference-between-categorical-and-numerical-data&#x2F;&quot;&gt;categorical and numerical&lt;&#x2F;a&gt;. Numerical data, as the name implies, refers to numbers or metrics (e.g., temperatures, counts, scores, or ratings).&lt;&#x2F;p&gt;
&lt;p&gt;Categorical data is everything else – colors, product models, addresses (IP and terrestrial), telephone numbers. What is more, categorical data can express relationships between objects – an individual and their favorite color, or the education distribution by postal code, are two examples.&lt;&#x2F;p&gt;
&lt;p&gt;Categorical data is vast and expressive, describing attributes of the real world which, for enterprises, can provide the holy grail of insights: understanding and even predicting behavior.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Example: Eliminating False Positive Security Alerts.&lt;&#x2F;strong&gt; Counting how often an employee accesses a high-value service is easily described with numerical values: UserID, Service#, CountofAccessAttempts. Categorical data, on the other hand, provides rich context that can be used to identify attackers with much higher confidence, and that categorical data is already in our logs: UserID, UserAgent, DeviceOS, ServerIP, FilePath, TimeofDay. This more complete context can eliminate false positives, creating significant ROI and happier analysts!&lt;&#x2F;p&gt;
&lt;h2 id=&quot;so-why-is-categorical-data-ignored&quot;&gt;So why is categorical data ignored?&lt;&#x2F;h2&gt;
&lt;p&gt;The simple answer is that today’s tools are not really designed to work with categorical data, particularly for real-time event processing. That has started to change in recent years with the emergence of graph databases like Neo4j and JanusGraph; however, current graph databases can’t scale to handle real-time data volumes.&lt;&#x2F;p&gt;
&lt;p&gt;Instead, enterprises must resort to encoding categorical data into numerical values so it can be processed with current event stream processing systems, a computationally expensive operation that obfuscates the very relationships between objects that make categorical data so valuable.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;quine-streaming-graph-for-real-time-behavioral-insights&quot;&gt;Quine streaming graph for real-time behavioral insights&lt;&#x2F;h2&gt;
&lt;p&gt;Quine was &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;products&#x2F;quine&#x2F;&quot;&gt;built specifically to provide enterprises&lt;&#x2F;a&gt; with the ability to process huge volumes of categorical data in real time. Technically speaking, Quine can ingest &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;scaling-quine-streaming-graph-to-process-1-million-events-sec&#x2F;&quot;&gt;millions of events per second&lt;&#x2F;a&gt;, render them as a graph that reveals connections between data, and produce actionable insights, all with sub-millisecond latency.&lt;&#x2F;p&gt;
&lt;p&gt;Developers and data scientists query Quine using Cypher, the emerging standard for graph database query languages. Cypher makes it easy to create and detect complex patterns that indicate behaviors. Quine makes it possible to &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;what-is-the-difference-between-batch-and-stream-processing&#x2F;&quot;&gt;detect those patterns as they are emerging&lt;&#x2F;a&gt;, rather than later, after an event has taken place.&lt;&#x2F;p&gt;
&lt;p&gt;Practically, this is the difference between anticipating and intervening before a customer abandons a shopping cart or an intruder compromises a key system and learning about it later, when you query your data warehouse. Even then, because tools require categorical data to be encoded, predictive analytics may not succeed in detecting an issue, let alone providing enough information to defend against future occurrences.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;drop-in-solution-for-unlocking-categorical-data&quot;&gt;Drop-in solution for unlocking categorical data&lt;&#x2F;h2&gt;
&lt;p&gt;Quine is not just a high-performance graph database. Quine is also a complex event processor designed to consume from and publish to Apache Kafka, AWS Kinesis and SNS&#x2F;SQS, and Pulsar. Quine is designed to work with your &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;core-concepts&#x2F;streaming-systems.html&quot;&gt;existing ETL pipeline&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;This minimizes the cost to leverage existing infrastructure to unlock the value contained in unused categorical data.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;63bf4a4d8d463f6a69b68bd8_--m9pgFJ6gHELMlfiL2BGb207Qbinc-KeUL1J3UaBW_l9LOnBsjmnA_N8tDxPLRQCfxDxeyih0dbKt4wfzYPiXx_G_yfssJ4-tkBcTrH-XY_yhhygs4e9lHxGnKlSSFF_26zPodlS-GkQntLnE9Ta5PmozqRV7BgtOSfHbR3UvZ_mZFHU2rT4WI7JwRYcg.png&quot; alt=&quot;A diagram of Quine ingesting data from streaming event processors (Pulsar, Kafka, Amazon Kinesis and SNS), materializing a graph, persisting data to pluggable storage, and outputting matches to the same event processors.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Quine ingests data, creates a graph, and publishes results to event processors.&lt;&#x2F;p&gt;
&lt;p&gt;Quine uses &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;getting-started&#x2F;ingest-streams-tutorial.html&quot;&gt;ingest queries&lt;&#x2F;a&gt; to eliminate data silos and combine multiple event sources into a complete graph view.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;getting-started&#x2F;standing-queries-tutorial.html&quot;&gt;Standing queries&lt;&#x2F;a&gt; – queries that persist in the graph keeping a lookout for the patterns that matter most to your business – publish results to the next hop in the event stream processing infrastructure the instant a match is made.&lt;&#x2F;p&gt;
&lt;p&gt;What this means in practical terms is that Quine can feed new, categorical data-derived insights into existing workflows, feeding dashboards, alerting analysts, and improving the quality of data that gets stored in company data lakes.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;novelty-detector-anomaly-detection-using-categorical-data&quot;&gt;Novelty Detector: Anomaly Detection using Categorical Data&lt;&#x2F;h2&gt;
&lt;p&gt;Quine streaming graph is built to drop into your existing infrastructure to look for known patterns across event streams. But what if you don’t know what to look for? How do you deal with new threats or learn new ways to improve customer experience?&lt;&#x2F;p&gt;
&lt;p&gt;Built on Quine, &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;products&#x2F;novelty&#x2F;&quot;&gt;Novelty Detector&lt;&#x2F;a&gt; is unlike previous generations of anomaly detectors. Novelty Detector combines a shallow learning algorithm developed by thatDot with streaming graph’s ability to process categorical data in real time, streaming out only the truly unique and anomalous events.&lt;&#x2F;p&gt;
&lt;p&gt;Because Novelty Detector is a self-training algorithm, it delivers results fast and with no need for data science resources. Data engineers can simply direct data from existing event feeds into Novelty Detector and it will learn what is normal, what is unexpected but not important, and what is truly unique and requires further action.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;63bf4a4d3fac7c0a8ea4c0ee_Q3xBCDyFPvIXc-XQo1on1znPVa7YnA4tp25SHlqhDKyOXq6okGZEaP1llNVaOhRyVxQvEEBiv98uvAo5l_0NK-UAIhvlDSB2-_WiqzV3IoIl_RMrP4lm4hdj0lbBoHGF52NTHMzyatoVupm9uKYG2hiRpqBq-xggQwQW2ry_A-xH9fgtjhLauqyNgKWj8Q.png&quot; alt=&quot;An example of a scatter plot from Novelty Detector.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Novelty Detector identifies true anomalies using categorical data.&lt;&#x2F;p&gt;
&lt;p&gt;And just like Quine itself, when a truly unique and anomalous event is detected, Novelty Detector pushes the results to the next hop in the workflow.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;delivering-business-value-categorical-data&quot;&gt;Delivering Business Value with Categorical Data&lt;&#x2F;h2&gt;
&lt;p&gt;Streaming graph is ideal when one or more of the following criteria describe the problem you are trying to solve:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;You need to get answers and take actions on those answers sooner rather than later. Therefore, the delay built into batch processing of data has too high a cost.&lt;&#x2F;li&gt;
&lt;li&gt;You have high volumes of real-time event data that exceed the capacity of traditional graph databases.&lt;&#x2F;li&gt;
&lt;li&gt;The answers to your questions are spread across multiple sources that need to be combined to create a complete picture.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;Use cases span a wide range of verticals, but the examples below should give you a good sense of the breadth of Quine streaming graph’s uses:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Ethereum blockchain fraud detection &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=Z8pXVof9BfE&quot;&gt;demo video&lt;&#x2F;a&gt; (Quine)&lt;&#x2F;li&gt;
&lt;li&gt;Streaming network monitoring to improve end-user experience (Quine)&lt;&#x2F;li&gt;
&lt;li&gt;Stopping &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;identifying-data-exfiltration-in-aws-cloudtrail-logs-using-categorical-anomaly-detection&#x2F;&quot;&gt;data exfiltration&lt;&#x2F;a&gt; before it happens (Novelty Detector)&lt;&#x2F;li&gt;
&lt;li&gt;Pre-processing log files to &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;use-quine-graph-etl-to-reduce-siem-storage-costs&#x2F;&quot;&gt;reduce SIEM and data lake storage costs&lt;&#x2F;a&gt; or &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;kafka-data-deduping-made-easy-using-quines-idfrom-function&#x2F;&quot;&gt;de-dupe and data cleanse&lt;&#x2F;a&gt; (Quine)&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h2 id=&quot;return-on-investment&quot;&gt;Return on investment&lt;&#x2F;h2&gt;
&lt;p&gt;Unlocking a new source of actionable insight needs to make business sense, which means it should have a measurable impact on outcomes. On the cost side, Quine has demonstrated the ability to scale well past existing graph solutions at extremely reasonable prices – processing 425K events&#x2F;second for $13&#x2F;hour on AWS.&lt;&#x2F;p&gt;
&lt;p&gt;Quine is easy to use and is distributed under either an &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;&quot;&gt;open source&lt;&#x2F;a&gt; or a commercial license that includes support, clustering for horizontal scale and resilience, and access to Novelty Detector.&lt;&#x2F;p&gt;
&lt;p&gt;We are happy to help you with your use case, or feel free to &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;&quot;&gt;explore Quine open source&lt;&#x2F;a&gt; yourself. Either way, you’ll find Quine the fastest, easiest way to unlock the potential of categorical data and deliver incremental business value without massive capital investment.&lt;&#x2F;p&gt;
&lt;p&gt;Banner photo credit: Photo by &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;unsplash.com&#x2F;@dekubaum?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText&quot;&gt;Dennis Kummer&lt;&#x2F;a&gt; on &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;unsplash.com&#x2F;photos&#x2F;52gEprMkp7M?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText&quot;&gt;Unsplash&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Quine Streaming Graph: A Year in Open Source</title>
        <published>2022-12-20T00:00:00+00:00</published>
        <updated>2022-12-20T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/news/quine-streaming-graph-a-year-in-open-source/"/>
        <id>https://www.thatdot.com/news/quine-streaming-graph-a-year-in-open-source/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/news/quine-streaming-graph-a-year-in-open-source/">&lt;h2 id=&quot;becoming-developer-focused&quot;&gt;&lt;strong&gt;Becoming Developer Focused&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Since February, we have spent much of our time – whether coding, writing docs, or talking to people – iteratively improving the developer experience: who is Quine built for, what jobs do they tell us they need to get done, what isn’t working for them currently, and what resources do they need to be successful.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;639cf9273c2d585d2d63e40c_Blue%20Pastel%20Creative%20Fun%20Blogging%20Blog%20Banner-5.png&quot; alt=&quot;A graphic of people learning together -- one in person and another online with the text &amp;quot;A year spent learning from the developer community.&amp;quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;As more people started hearing about Quine, mainly through shared blogs, word of mouth and events, we received a lot of feedback that challenged very basic things like how we even talk about Quine.&lt;&#x2F;p&gt;
&lt;p&gt;For example, we heard early that people didn’t really understand what a streaming graph was or where Quine fits in an event stream pipeline. Was it a graph database? What did it even do? All the important foundational stuff.&lt;&#x2F;p&gt;
&lt;p&gt;We realized that we needed to simplify everything: we needed to create or update web pages, documentation and blog posts that were simple, clear, and developer-focused.&lt;&#x2F;p&gt;
&lt;p&gt;Documentation underwent a transformation, with significant focus on the Getting Started and Concepts sections.&lt;&#x2F;p&gt;
&lt;p&gt;Of the 32 blog posts published this year, 23 were either 100% or primarily focused on how engineers and architects can use or better understand the utility of the open source version of Quine.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;highlights-include&quot;&gt;Highlights include:&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;A six-part series on &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;real-time-graph-analytics-for-kafka-streams-with-quine&#x2F;&quot;&gt;ingesting data&lt;&#x2F;a&gt; into Quine&lt;&#x2F;li&gt;
&lt;li&gt;A deep dive into &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;kafka-data-deduping-made-easy-using-quines-idfrom-function&#x2F;&quot;&gt;idFrom&lt;&#x2F;a&gt; and its implications for event streaming (idFrom is Quine’s &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;components&#x2F;id-provider.html#idfrom-&quot;&gt;strategy for maintaining state&lt;&#x2F;a&gt; for a potentially infinite stream of data; Quine deterministically generates known IDs from data)&lt;&#x2F;li&gt;
&lt;li&gt;A post explaining how to use Quine’s &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;time-series-streaming-graph-and-other-quine-1-2-0-highlights&#x2F;&quot;&gt;time series-like function&lt;&#x2F;a&gt;: reify.time&lt;&#x2F;li&gt;
&lt;li&gt;Data de-duping event streams with &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;real-time-graph-analytics-for-kafka-streams-with-quine&#x2F;&quot;&gt;Kafka and Quine&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Two popular blogs explaining the difference between &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;whats-the-difference-between-categorical-and-numerical-data&#x2F;&quot;&gt;categorical and numerical&lt;&#x2F;a&gt; data and why it matters in event processing&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h2 id=&quot;strengthening-the-community-s-voice&quot;&gt;Strengthening the Community&#x27;s Voice&lt;&#x2F;h2&gt;
&lt;p&gt;We also hired a dedicated developer relations director – Michael Algietti – whose job is to engage with the community to understand how best we can serve it and guide it toward self-sustainment.&lt;&#x2F;p&gt;
&lt;p&gt;Hiring Michael doesn’t mean we’ve delegated the community to one person and forgotten about it. Far from it.&lt;&#x2F;p&gt;
&lt;p&gt;Primarily through the Quine Slack and events, everyone in the company has supported open source users, answering questions and providing feedback for PRs.&lt;&#x2F;p&gt;
&lt;p&gt;As 2022 ends and 2023 begins, that focus is turning into something special: the first glimmers of a self-sustaining community are beginning to appear. Community members have submitted code, developers are building production systems using Quine OSS, and people are far more likely to tell us what we can do to improve Quine than they were even four months ago.&lt;&#x2F;p&gt;
&lt;p&gt;In 2023 we will continue to learn along with you to improve Quine. We can’t wait to see how you use streaming graph techniques in your modern data pipelines.&lt;&#x2F;p&gt;
&lt;p&gt;Happy Holidays from the team at thatDot!&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Quine 1.4.0: Scale, Stability, Supernode Mitigation</title>
        <published>2022-11-22T00:00:00+00:00</published>
        <updated>2022-11-22T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/news/quine-1-4-0-scale-stability-supernode-mitigation/"/>
        <id>https://www.thatdot.com/news/quine-1-4-0-scale-stability-supernode-mitigation/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/news/quine-1-4-0-scale-stability-supernode-mitigation/">&lt;h2 id=&quot;a-major-release-fast-on-the-heels-of-a-major-milestone&quot;&gt;A Major Release Fast on the Heels of A Major Milestone&lt;&#x2F;h2&gt;
&lt;p&gt;Today marks the release of Quine 1.4.0 with significant improvements made to resource utilization and developer experience. This release impacts both the open source and enterprise versions of Quine and is not backwards compatible with previous versions.&lt;&#x2F;p&gt;
&lt;p&gt;The development of Quine 1.4.0 and our recent landmark achievement in which Quine Enterprise &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;scaling-quine-streaming-graph-to-process-1-million-events-sec&#x2F;&quot;&gt;processed one million graph events per second&lt;&#x2F;a&gt; are deeply intertwined events. We ran those tests using an early 1.4.0 release candidate and we incorporated learning and bug fixes from those tests into the final released version.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;6362b92a9f5a435235e96ec5_under%20construction.jpg&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Quine 1.4.0 contains much foundational work for the next leap forward in terms of performance and stability.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;highlights-of-changes-impacting-both-quine-community-and-quine-enterprise&quot;&gt;Highlights of Changes impacting both Quine Community and Quine Enterprise&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;thatdot&#x2F;quine&#x2F;releases&quot;&gt;(Full release notes for Quine 1.4.0)&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Domain Graph Nodes (DGNs)&lt;&#x2F;strong&gt; - Sometimes the highest-impact feature isn’t very glamorous but sets the table for more ambitious and high-visibility improvements. Such is the case with DGNs, which not only contribute immediate performance and resource utilization improvements but also lay the foundation for coming breakthroughs in supernode mitigation. (Supernodes, the bane of graph data models, are nodes with too many edges, which hurts memory usage and performance.)&lt;&#x2F;p&gt;
&lt;p&gt;With the cluster of PRs associated with DGN, we rewrote the serialization, persistence, and message passing system used by &lt;code&gt;DistinctId&lt;&#x2F;code&gt; standing queries. Instead of using substantial memory on every node that stores a component of a relevant standing query, these partial query objects are stored in a new top-level persistent entity called &lt;code&gt;DomainGraphNode&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;em&gt;Impact: reduces memory footprint, prepares the way for supernode mitigation, does not support graphs created in Quine 1.3.2 or earlier.&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;6362d778c159b6de06255b44_Evolution%20of%20a%20Supernode.png&quot; alt=&quot;Four views of supernodes developing in a Quine streaming graph.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Evolution of supernodes during ingest visualized in Quine&#x27;s &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;getting-started&#x2F;exploration-ui.html&quot;&gt;Exploration UI.&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Documentation Improvements –&lt;&#x2F;strong&gt; as part of our never-ending quest to make docs.quine.io as easy and clear to use as possible, we made significant improvements throughout, but most notably for:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;getting-started&#x2F;&quot;&gt;Getting Started&lt;&#x2F;a&gt; - reorganized and simplified all elements of onboarding Quine and Quine basics&lt;&#x2F;li&gt;
&lt;li&gt;Core Concepts - added &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;core-concepts&#x2F;streaming-systems.html&quot;&gt;Streaming Systems&lt;&#x2F;a&gt; with details on Apache Kafka and Kinesis&lt;&#x2F;li&gt;
&lt;li&gt;Components - expanded on the properties of components, and most importantly on the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;components&#x2F;id-provider.html&quot;&gt;idFrom function&lt;&#x2F;a&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;Recipe Documentation - expanded and reorganized across docs site.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;&lt;em&gt;Impact: better organization and simpler explanations of some of Quine’s notable core concepts.&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Usability and performance improvements to &lt;code&gt;reify.time&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; -  With Quine 1.4.0, the &lt;strong&gt;&lt;code&gt;reify.time&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; function now yields only the finest-granularity reified period node whereas &lt;code&gt;reify.time&lt;&#x2F;code&gt; previously returned an array with all time nodes.&lt;&#x2F;p&gt;
&lt;p&gt;With this adjustment, &lt;code&gt;reify.time&lt;&#x2F;code&gt; makes it easier to write well-behaved queries that would otherwise tend to turn large period time nodes into supernodes. For example, the time node for &quot;2022&quot; would tend to become a supernode. That is avoided by changing &lt;code&gt;reify.time&lt;&#x2F;code&gt; such that the Cypher query calling it does not inadvertently operate on every period in the hierarchy, and instead, only operates on the time node for the smallest period.&lt;&#x2F;p&gt;
&lt;p&gt;Existing Cypher scripts will be syntactically compatible. However, the behavior will change, specifically impacting the &lt;strong&gt;&lt;code&gt;YIELD&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; clause following &lt;strong&gt;&lt;code&gt;CALL reify.time&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;. Previous behavior was that the &lt;strong&gt;&lt;code&gt;YIELD&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; clause would emit every time node in the period hierarchy created by &lt;strong&gt;&lt;code&gt;reify.time&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;. The new behavior is that the &lt;strong&gt;&lt;code&gt;YIELD&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; clause emits only a single time node, the one for the smallest period.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;em&gt;Impact: better query ergonomics and a lower likelihood of creating a supernode. &lt;strong&gt;&lt;code&gt;reify.time&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; queries written for Quine 1.3.2 or earlier will behave differently and may need to be updated.&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
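&lt;p&gt;A minimal sketch of the 1.4.0 behavior, assuming the &lt;code&gt;reify.time(timestamp, periods)&lt;&#x2F;code&gt; call shape and the &lt;code&gt;node&lt;&#x2F;code&gt; yield name; the &lt;code&gt;event&lt;&#x2F;code&gt; node, its &lt;code&gt;idFrom&lt;&#x2F;code&gt; arguments, and the &lt;code&gt;AT&lt;&#x2F;code&gt; edge label are hypothetical and only illustrate the pattern:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;&#x2F;&#x2F; Hypothetical event node; the id arguments are illustrative only
MATCH (event) WHERE id(event) = idFrom(&quot;event&quot;, 42)
&#x2F;&#x2F; Assumed call shape: reify.time(timestamp, periods) YIELD node
CALL reify.time(datetime(), [&quot;year&quot;, &quot;month&quot;, &quot;day&quot;]) YIELD node AS timeNode
&#x2F;&#x2F; As of 1.4.0 the rest of the query runs once, for the &quot;day&quot; node only
CREATE (event)-[:AT]-&gt;(timeNode)
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Under Quine 1.3.2 and earlier, the same &lt;code&gt;CREATE&lt;&#x2F;code&gt; would have run for every yielded period node, also attaching the event to the &quot;month&quot; and &quot;year&quot; nodes, which is how large-period nodes tended to become supernodes.&lt;&#x2F;p&gt;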
&lt;p&gt;&lt;strong&gt;Added &lt;code&gt;text.urlencode&lt;&#x2F;code&gt; and &lt;code&gt;text.urldecode&lt;&#x2F;code&gt; Cypher functions&lt;&#x2F;strong&gt; – These handy functions are especially useful for standing query results. They perform URL encoding or decoding of a string used as an HTTP POST body or URL component. This is particularly useful to create the #-string query linking to a specific view in the Quine Exploration UI.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;em&gt;Impact: standing query outputs are now more readily usable in other, downstream applications and for demos.&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
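&lt;p&gt;As a rough sketch of that use, assuming an Exploration UI served at &lt;code&gt;localhost:8080&lt;&#x2F;code&gt; (a placeholder address) and a hypothetical &lt;code&gt;name&lt;&#x2F;code&gt; property, a query can build a link whose #-string carries an encoded Cypher query:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;&#x2F;&#x2F; The host, port, and property name are placeholders for your own deployment
WITH &quot;MATCH (n) WHERE n.name = &#x27;alice&#x27; RETURN n&quot; AS uiQuery
&#x2F;&#x2F; Encode the query so it is safe to place after the # in a URL
RETURN &quot;http:&#x2F;&#x2F;localhost:8080#&quot; + text.urlencode(uiQuery) AS link
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Passing the encoded portion through &lt;code&gt;text.urldecode&lt;&#x2F;code&gt; should recover the original query string.&lt;&#x2F;p&gt;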
&lt;p&gt;&lt;strong&gt;Added atomic &quot;count&quot; return value to &lt;code&gt;incrementCounter&lt;&#x2F;code&gt; procedure&lt;&#x2F;strong&gt; - The core functionality of &lt;code&gt;incrementCounter&lt;&#x2F;code&gt; takes advantage of Quine’s unique computational model to keep a perfectly-synchronized counter on a single node. The old VOID version (i.e., returning nothing) ensured the counter was updated correctly but provided no way to access the uniquely incremented counter value.&lt;&#x2F;p&gt;
&lt;p&gt;We added a &quot;count&quot; yielded value to the procedure that can be used via the standard syntax for Cypher procedures (see https:&#x2F;&#x2F;s3.amazonaws.com&#x2F;artifacts.opencypher.org&#x2F;openCypher9.pdf, page 122). For example, &lt;strong&gt;&lt;code&gt;CALL incrementCounter(myNode, &quot;counter&quot;) YIELD count AS updatedCount&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; will increment a property named &quot;counter&quot; on the node referred to as &lt;strong&gt;&lt;code&gt;myNode&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;, then add a variable to the query context called &lt;strong&gt;&lt;code&gt;updatedCount&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;, containing the new value of that property.&lt;&#x2F;p&gt;
&lt;p&gt;Of particular note is that multiple query executions running in parallel will get unique values returned from the procedure&#x27;s yielded &quot;count&quot;.&lt;&#x2F;p&gt;
&lt;p&gt;An example of how to use the new version:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;code&gt;MATCH (n) WHERE id(n) = idFrom(-1) CALL incrementCounter(n, &#x27;prop&#x27;) YIELD count RETURN id(n), count&lt;&#x2F;code&gt;&lt;&#x2F;p&gt;
&lt;p&gt;This increments the &quot;prop&quot; counter on the node with id &lt;strong&gt;&lt;code&gt;idFrom(-1)&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; and returns the &lt;strong&gt;&lt;code&gt;count&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;. Multiple invocations of this query, even in parallel, will each return a unique &lt;code&gt;count&lt;&#x2F;code&gt; value.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;em&gt;Impact: counters are more flexible, and can be read under high parallelism.&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Standing queries can now use &lt;code&gt;idFrom&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; - Previously, ingest and ad hoc queries could use idFrom, Quine’s hash-based high-performance alternative to indices used for node lookup and retrieval. Standing queries can now use &lt;code&gt;idFrom&lt;&#x2F;code&gt;-based ID constraints, provided that all arguments to the &lt;code&gt;idFrom&lt;&#x2F;code&gt; are literal values.&lt;&#x2F;p&gt;
&lt;p&gt;An example of a standing query using &lt;strong&gt;&lt;code&gt;idFrom&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;MATCH (n) WHERE id(n) = idFrom(&#x27;my&#x27;, &#x27;special&#x27;, 1, &#x27;node&#x27;) RETURN DISTINCT id(n) AS specialNodeId&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; will match on exactly 1 node: the node with the id &lt;strong&gt;&lt;code&gt;idFrom(&#x27;my&#x27;, &#x27;special&#x27;, 1, &#x27;node&#x27;)&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;em&gt;Impact: brings standing queries, whether using MultipleValues or DistinctId, closer to the full range of functionality available in ad-hoc queries.&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Added support for decoding steps during ingest (base64, zlib, and gzip)&lt;&#x2F;strong&gt; – This one is pretty simple and &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;components&#x2F;ingest-sources&#x2F;ingest-sources.html#record-decoding&quot;&gt;self-explanatory&lt;&#x2F;a&gt;, but we draw your attention to it as it means you can ingest compressed event data from Kafka and Kinesis.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;em&gt;Impact: better performance; supports gzip- and zlib-compressed and base64-encoded input.&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;6362b6a9a6e5d76236830b7d_Quine%20Enterprise.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;h2 id=&quot;enterprise-focused-improvements&quot;&gt;Enterprise-focused Improvements&lt;&#x2F;h2&gt;
&lt;p&gt;Improvements to Quine Enterprise focused on improved cluster management and querying cluster state.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Added Cypher function &lt;code&gt;clusterPosition()&lt;&#x2F;code&gt; to get the executing member&#x27;s position –&lt;&#x2F;strong&gt; You can now get the index of the executing cluster member. When combined with &lt;strong&gt;&lt;code&gt;locIdFrom(clusterPosition(), &quot;prop1&quot;, &quot;prop2&quot;)&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;, the node will have a unique hash on the current host.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Cluster-position aware &lt;code&gt;locIdFrom&lt;&#x2F;code&gt; function –&lt;&#x2F;strong&gt; The &lt;code&gt;locIdFrom&lt;&#x2F;code&gt; function now accepts Quine cluster position integers as its first argument rather than a partition. To map a partition to a position integer, the new &lt;strong&gt;&lt;code&gt;kafkaHash&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; function may be used. For example, &lt;strong&gt;&lt;code&gt;locIdFrom(kafkaHash(&quot;india&quot;), &quot;West Bengal&quot;, 12345)&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;. Note: QuineIds allocated by Quine 1.3.2 and earlier may have inconsistent mappings in Quine 1.4.0 and later.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Extended support for bloom filter-optimized persistence to clusters&lt;&#x2F;strong&gt;  – Previously Quine used a bloom filter to help decide if a node already exists. When unsure, Quine would have to query the persister. The bloom filter was disabled for Quine Enterprise because in the case of a cluster, when a hot spare joined the cluster, it would have to rebuild its bloom filter (taking minutes), thus causing the cluster performance to degrade while waiting for the host to join. This change allows a host to join the cluster, build its bloom filter in the background while always hitting the persister early on. Once the bloom filter is loaded, then the optimization can be utilized. It effectively allows a host to join fast, keep the cluster healthy, and the cost is that the new host will be a bit slower until the bloom filter is available.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;improvements-resulting-from-one-million-events-second-testing&quot;&gt;Improvements resulting from One Million Events&#x2F;Second testing&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;632fcc0a24771a2089714ba0_Pasted%20image%2020220923174204.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Scaling to 1M+ events&#x2F;second and demonstrating recovery from various failure scenarios&lt;&#x2F;p&gt;
&lt;p&gt;If you are interested in bug fixes and improvements yielded while processing high volume event streams with Quine, here’s a quick list pulled from release notes:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Enriched logging in edge cases involving shard resolution&lt;&#x2F;li&gt;
&lt;li&gt;Simplified node wakeup protocol: edge cases involving simultaneous request to sleep and wake should now be more efficient&lt;&#x2F;li&gt;
&lt;li&gt;Cassandra persister batched writes now respect configured timeout and consistency options&lt;&#x2F;li&gt;
&lt;li&gt;Singleton-snapshot and journals may now be enabled at the same time&lt;&#x2F;li&gt;
&lt;li&gt;Improved shutdown behavior in failsafe case&lt;&#x2F;li&gt;
&lt;li&gt;Node edge and property counts will now be correctly reflected in the metrics dashboard&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Enterprise:&lt;&#x2F;strong&gt; Improved cluster stability when cluster members experience temporary disconnections&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h2 id=&quot;getting-started&quot;&gt;&lt;strong&gt;Getting Started&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;If you want to try Quine using your own data, here are some resources to help:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Download Quine 1.4.0 - &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;thatdot&#x2F;quine&#x2F;releases&#x2F;download&#x2F;v1.4.0&#x2F;quine-1.4.0.jar&quot;&gt;JAR file&lt;&#x2F;a&gt; (263MB) | &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;hub.docker.com&#x2F;layers&#x2F;thatdot&#x2F;quine&#x2F;1.4.0&#x2F;images&#x2F;sha256-672646c6184f3fcc529d7fb8939c1c20553308c344f7915a5622b75d968823c3&quot;&gt;Docker Image&lt;&#x2F;a&gt; | &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;thatdot&#x2F;quine&quot;&gt;Github&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Start learning about Quine now by visiting the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;&quot;&gt;Quine open source project&lt;&#x2F;a&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;Check out the Ingest Data into Quine blog series covering everything from &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;components&#x2F;ingest-sources&#x2F;kafka.html&quot;&gt;ingest from Kafka&lt;&#x2F;a&gt; to ingesting .CSV data.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;cdn-cache-efficiency-by-segment&quot;&gt;CDN Cache Efficiency Recipe&lt;&#x2F;a&gt; - this recipe provides more ingest pattern examples&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;And if you require 24x7 support or have a high-volume use case and would like to try Quine Enterprise, please contact us. You can also read more about &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;products&#x2F;quine&#x2F;&quot;&gt;Streaming Graph here&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;hr &#x2F;&gt;
&lt;p&gt;Header image: Photo by &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;unsplash.com&#x2F;@ninaluong?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText&quot;&gt;Nina Luong&lt;&#x2F;a&gt; on &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;unsplash.com&#x2F;s&#x2F;photos&#x2F;staircase?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText&quot;&gt;Unsplash&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Additional image: Photo by &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;unsplash.com&#x2F;@rgaleriacom?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText&quot;&gt;Ricardo Gomez Angel&lt;&#x2F;a&gt; on &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;unsplash.com&#x2F;s&#x2F;photos&#x2F;toy-construction-workers?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText&quot;&gt;Unsplash&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Why Digital Twins Need to Go Real Time</title>
        <published>2022-11-15T00:00:00+00:00</published>
        <updated>2022-11-15T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/why-digital-twins-need-to-go-real-time/"/>
        <id>https://www.thatdot.com/blog/why-digital-twins-need-to-go-real-time/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/why-digital-twins-need-to-go-real-time/">&lt;p&gt;&lt;em&gt;(This post is modified from a version that ran in&lt;&#x2F;em&gt; &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.rtinsights.com&#x2F;optimizing-digital-twins-to-real-time&#x2F;&quot;&gt;&lt;em&gt;&lt;strong&gt;RTInsights&lt;&#x2F;strong&gt;&lt;&#x2F;em&gt;&lt;&#x2F;a&gt; &lt;em&gt;on Oct 13, 2022.)&lt;&#x2F;em&gt; Quine streaming graph was built to analyze event streams in real time and drive event pipelines. Now our users are coupling it with digital twins and asset graphs to create accurate, up-to-the-second views of their infrastructure.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-potential-of-real-time-digital-twins&quot;&gt;&lt;strong&gt;The potential of real-time digital twins&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;When things go wrong, our instinct is to retrace our process to pinpoint the problem. This process can be slow and frustrating. So, imagine if AI could step in and flag the misstep with data in real time. What if you could pinpoint the exact place: a blockage in a manufacturing process, a small interruption in a logistics chain that could bring deliveries to a halt, or the failure of a piece of critical cloud infrastructure, for example?&lt;&#x2F;p&gt;
&lt;p&gt;What if you could combine real-time alerting and diagnostics with your digital twin? As our world becomes increasingly connected, digital twins are being used to abstract and model almost everything to improve business operations, reduce risk, and enhance decision-making for better outcomes. They provide greater context to challenges by creating clear relationships and streamlining workflows as a virtual representation of the real world, including physical objects, processes, relationships, and behaviors.&lt;&#x2F;p&gt;
&lt;p&gt;Even though they serve a valuable function as part of the enterprise technology toolkit, digital twins are not a technology per se but a specification for the structure and use of real-time data. But this concept is still relatively new, and as data is moving rapidly, how do we maximize the outcomes and role of digital twins?&lt;&#x2F;p&gt;
&lt;h2 id=&quot;seeing-digital-doubles-an-introduction&quot;&gt;&lt;strong&gt;Seeing digital doubles: An introduction&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;IBM defines digital twins as “a virtual representation of an object or system that spans its lifecycle, is updated from real-time data, and uses simulation, machine learning, and reasoning to help decision-making.” In short, they can connect digital and physical items through data.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;lh7-us.googleusercontent.com&#x2F;docsz&#x2F;AD_4nXfe4dUAWHYTRodsPdMtjJ2Iu3WvjvHB1PD_DWCaLc13XmOa2of_txdm2VOVLmHrkvbM7Q2SaVUAG4uIYgtV933yG0rCxrPW7a3THPCFJZxc9uP-cXxUlvKpPoQWP_RQ60t8D3yj4e9YHAgxPpv7Q-Uh4psN?key=3kurZADcCprK-t-VsgeSzg&quot; alt=&quot;Digital Twin example&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;According to McKinsey, “by 2025, smart workflows and seamless interactions among humans and machines will likely be as standard as the corporate balance sheet, and most employees will use data to optimize nearly every aspect of their work.” Realistically, that’s only a couple of years away, meaning we have a short time to get this right. Currently, digital twins are helping revolutionize engineering, security, eCommerce, supply chain issues, and manufacturing to ensure better outcomes across industries.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;embrace-and-enhance&quot;&gt;&lt;strong&gt;Embrace and enhance&lt;&#x2F;strong&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;Until recently, digital twins were used to simulate real-world processes rather than interact with the world in real time. Either synthetically generated or previously captured data was run (and rerun) in controlled scenarios. As a design and diagnostic aid in product lifecycle management, digital twins have proven enormously helpful (think of NASA engineers during the Apollo 13 mission for an early use of a twin).&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;lh7-us.googleusercontent.com&#x2F;docsz&#x2F;AD_4nXfnDExGzZkwr9sX4rE5H-hOdC2Y0dPJKkoFspOAREaF7TGJ24ZaP0jHkh4GVVDbWWruo7qgEM8B2cPkmohg_PcBPGtNT4Avof-oQjhxeyLuMLLnyd74tJvygXSYyn-_2YvyO-DIyDDJFYu4gu8yy_pFPxrJ?key=3kurZADcCprK-t-VsgeSzg&quot; alt=&quot;Digital twins for industrial processes.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Digital twins for industrial processes.&lt;&#x2F;p&gt;
&lt;p&gt;But as enterprises increasingly feel the pressure to replace offline batch processing of event data with real-time event processing, digital twins will need to go real time to remain relevant and valuable. This means moving the digital twin out of legacy databases, and in particular, graph databases, and into systems capable of processing potentially vast amounts of data, usually arriving via data pipeline software like Apache &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;quine-streaming-graph-is-a-natural-fit-for-kafka-pipelines&#x2F;&quot;&gt;&lt;strong&gt;Kafka&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; or Spark, the instant it arrives.&lt;&#x2F;p&gt;
&lt;p&gt;Such systems, &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;products&#x2F;quine&#x2F;&quot;&gt;&lt;strong&gt;streaming graphs&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt;, have evolved in recent years and combine the complex event processing capabilities of Flink and ksqlDB with the powerful data structures popularized by Neo4j, a traditional &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Graph_database&quot;&gt;&lt;strong&gt;graph database&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt;. Going real time, in this case, means more than just processing data as it streams in, though. For digital twins to be truly useful, they must be able to drive actions (for example, issue alerts or power down equipment) the instant an issue emerges, perhaps even beforehand.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;build-a-real-time-streaming-asset-graph&quot;&gt;&lt;strong&gt;Build a real-time, streaming asset graph&lt;&#x2F;strong&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;New streaming graph systems have evolved to embed query logic and compute resources in line with data flows. They act like nets stretched across the data stream to capture interesting patterns as they race past and trigger workflows where instant matches occur. If our goal is to streamline and make our data processing more precise, &lt;strong&gt;digital twins need to be graph and real time&lt;&#x2F;strong&gt;. The best way to do this is by embracing and engaging with streaming graph, which combines event stream processing with the ability to query graph data.&lt;&#x2F;p&gt;
&lt;p&gt;The volume of events being handled in the physical world must be translated into tangible and usable data. By making digital twins real time and graph, we can take the training wheels off this area of AI&#x2F;ML and allow it to run at its highest potential for maximum business impact. thatDot makes the only technology that combines event processing with graph (&lt;strong&gt;graph data pipelines&lt;&#x2F;strong&gt; is a simple way to describe it) and, as such, is tailor-made for digital twins.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;build-your-own-digital-twin-with-thatdot&quot;&gt;&lt;strong&gt;Build Your Own Digital Twin with thatDot&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;thatDot software is available in both open source Quine and commercial &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;products&#x2F;quine&#x2F;&quot;&gt;&lt;strong&gt;Streaming Graph&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt;. You can try it yourself. Learn how to ingest your own data and build a streaming graph that can detect all sorts of changes and problems in real time.&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Try &lt;strong&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;getting-started&#x2F;&quot;&gt;Streaming Graph&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; free for yourself.&lt;&#x2F;li&gt;
&lt;li&gt;Learn more about &lt;strong&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;products&#x2F;quine&#x2F;&quot;&gt;thatDot Streaming Graph&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;Join the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;that.re&#x2F;chat&quot;&gt;&lt;strong&gt;Quine Discord Community&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; and get help from thatDot engineers and community members.&lt;&#x2F;li&gt;
&lt;li&gt;Check out the &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;ingesting-from-multiple-data-sources-into-quine-streaming-graph&#x2F;&quot;&gt;&lt;strong&gt;Ingest Data into Quine&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; blog series covering everything from &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;components&#x2F;ingest-sources&#x2F;kafka.html&quot;&gt;&lt;strong&gt;ingest from Kafka&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; to ingesting .CSV data&lt;&#x2F;li&gt;
&lt;li&gt;Download open source Quine – &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;download&quot;&gt;&lt;strong&gt;JAR file&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; | &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;hub.docker.com&#x2F;r&#x2F;thatdot&#x2F;quine&quot;&gt;&lt;strong&gt;Docker Image&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; | &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;thatdot&#x2F;quine&quot;&gt;&lt;strong&gt;Github&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>What is the difference between batch and stream processing?</title>
        <published>2022-10-25T00:00:00+00:00</published>
        <updated>2022-10-25T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/what-is-the-difference-between-batch-and-stream-processing/"/>
        <id>https://www.thatdot.com/blog/what-is-the-difference-between-batch-and-stream-processing/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/what-is-the-difference-between-batch-and-stream-processing/">&lt;h2 id=&quot;is-it-better-to-fix-a-problem-now-or-later&quot;&gt;&lt;strong&gt;Is it better to fix a problem now or later?&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;The typical answer when someone describes the difference between &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.techopedia.com&#x2F;definition&#x2F;5417&#x2F;batch-processing&quot;&gt;&lt;strong&gt;batch processing&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; and &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.techopedia.com&#x2F;definition&#x2F;33914&#x2F;stream-processing&quot;&gt;&lt;strong&gt;stream processing&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; is that batch data is collected, stored for a period of time, and processed and put to use at regular intervals (e.g. payroll, bank statements) while streaming data is processed and put to use as close to the instant it is generated (think of alerts from sensor data).&lt;&#x2F;p&gt;
&lt;p&gt;While accurate, this answer fails to capture why the difference is important and why companies are moving decisively toward stream processing architectures. We experience the world as a constant stream of events. We make decisions by comparing this stream of information to our experiences and memories.&lt;&#x2F;p&gt;
&lt;p&gt;We perceive and react to threats or recognize and seize opportunities. And often reacting in a timely fashion is rewarding – we avoid the snake bite or grab the best seat at the movie theater. Stream processing more closely reflects this very human mode of experience.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;lh7-us.googleusercontent.com&#x2F;docsz&#x2F;AD_4nXcKcTLjCGQ7BuuhMMdFtpeDrcMmxYHqiiaBU-qgbTkby-mU9luaEdWq2OlfgN0d0TjEa8SyMe2W6PTx-d15ep5UUCXIFgkKJiDYOzsGSvqu98mxA5pw5S4vAspcFTXpZUJhmr9rD8SYhFVz4w60FXFNVdpv?key=4GmsSBMLhuN1Avwfi2QGdQ&quot; alt=&quot;Batch processing vs Stream Processing example &quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Enterprises ingest as many streams of information as they can handle, look for patterns in the data that represent threats or opportunities as it flows past, and when said patterns emerge, they act. The cost of not acting could be a data breach or a lost revenue opportunity. Batch processing still works well when you need to process huge amounts of data and the results can be delivered at regular intervals. But if recent trends hold, more of these jobs will move to streaming because companies can’t accept the hidden cost of batch any longer and remain competitive.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;counting-the-cost-of-not-acting&quot;&gt;&lt;strong&gt;Counting the Cost of Not Acting&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;A great example is insider trading. The cost of detecting someone who is about to execute an insider trade is now much less than the cost of trying to unwind that trade later when batch processing picks it up. Even if the batch process runs every five minutes, that just means you’ll find them sooner, not stop them.&lt;&#x2F;p&gt;
&lt;p&gt;Ultimately, stream vs. batch will show up in the balance sheet and the stock price. The one potential argument against streaming is that it might not handle the same volume of data as cost-effectively as batch does. However, with the advent of systems like Kafka, Flink, and their cloud analogues, such cases are getting rare.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;quine-stream-graph-for-etl-pipelines&quot;&gt;&lt;strong&gt;Quine Stream Graph for ETL Pipelines&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;We built Quine not just to detect emerging patterns of interest in high volumes of data but to act on the results with sub-millisecond latency. Practically speaking, this means finding evidence of a password spray attack or streaming CDN service interruptions when they are technical issues and before they become business issues.&lt;&#x2F;p&gt;
&lt;p&gt;Quine consumes event data from one or more streams originating in Kafka, Kinesis, or data lakes, and uses a graph data structure to materialize the often complex relationships between events that evidence important system or user behavior. Quine uses &lt;strong&gt;standing queries&lt;&#x2F;strong&gt; to trigger actions like sending alerts or updating machine learning models the instant such patterns become apparent.&lt;&#x2F;p&gt;
&lt;p&gt;Far from acting as a passive filter, Quine actually drives the workflow. And Quine scales to meet the needs of modern enterprises, as shown by this test of Quine’s ability to process and alert on one million events&#x2F;second.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;when-to-use-quine&quot;&gt;&lt;strong&gt;When to Use Quine&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Batch processing is great for jobs where response time doesn’t matter. And batch processing tools have been around for a long time, so you have your choice. But for jobs where the cost of not knowing, and therefore not acting, is unacceptable, Quine is ideal. For use cases like &lt;strong&gt;financial fraud detection, video observability&lt;&#x2F;strong&gt;, and &lt;strong&gt;manufacturing process management&lt;&#x2F;strong&gt; using a digital twin, Quine streaming graph is really the only choice.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;getting-started&quot;&gt;&lt;strong&gt;Getting Started&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;If you want to try Quine using your own data, here are some resources to help:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Download Quine – &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;download&quot;&gt;&lt;strong&gt;JAR file&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; | &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;hub.docker.com&#x2F;r&#x2F;thatdot&#x2F;quine&quot;&gt;&lt;strong&gt;Docker Image&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; | &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;thatdot&#x2F;quine&quot;&gt;&lt;strong&gt;Github&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Start learning about Quine now by visiting the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;&quot;&gt;&lt;strong&gt;Quine open source project&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;Check out the &lt;strong&gt;Ingest Data into Quine&lt;&#x2F;strong&gt; blog series covering everything from &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;components&#x2F;ingest-sources&#x2F;kafka.html&quot;&gt;&lt;strong&gt;ingest from Kafka&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; to ingesting .CSV data.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;cdn-cache-efficiency-by-segment&quot;&gt;&lt;strong&gt;CDN Cache Efficiency Recipe&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; – this recipe provides more ingest pattern examples&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;And if you require 24x7 support or have a high-volume use case and would like to try Quine Enterprise, please contact us. You can also read more about &lt;strong&gt;Quine Enterprise here&lt;&#x2F;strong&gt;. Special thanks for the image used in this post to &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;unsplash.com&#x2F;@da_sikka_x?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText&quot;&gt;&lt;strong&gt;Amritanshu Sikdar&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; on &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;unsplash.com&#x2F;s&#x2F;photos&#x2F;water-droples?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText&quot;&gt;&lt;strong&gt;Unsplash.&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>See Quine in Action: 3 Live Demos Showing Graph ETL Use Cases</title>
        <published>2022-10-20T00:00:00+00:00</published>
        <updated>2022-10-20T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/news/see-quine-in-action-3-live-demos-showing-graph-etl-use-cases/"/>
        <id>https://www.thatdot.com/news/see-quine-in-action-3-live-demos-showing-graph-etl-use-cases/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/news/see-quine-in-action-3-live-demos-showing-graph-etl-use-cases/">&lt;h2 id=&quot;nothing-beats-a-live-demo&quot;&gt;Nothing Beats a Live Demo&lt;&#x2F;h2&gt;
&lt;p&gt;Over the last three weeks, we&#x27;ve been fortunate enough to deliver presentations at events hosted by DataStax (makers of Apache Cassandra), Confluent (makers of Apache Kafka), and the PDX Video Tech Meetup (sponsored by AWS Elemental). Each video includes a live demo showing Quine in action and includes ways for you to follow along and go further. Enjoy!&lt;&#x2F;p&gt;
&lt;h2 id=&quot;datastax-hands-on-workshop-password-spray-detection&quot;&gt;DataStax Hands-On Workshop: Password Spray Detection&lt;&#x2F;h2&gt;
&lt;p&gt;I joined the team at DataStax to demonstrate Quine graph ETL in action using the Password Spray Detection recipe. In this hands-on workshop, we cover how to use Quine with AstraDB, DataStax&#x27;s Cassandra-as-a-Service DB. If you recall, we used the Cassandra persistor for our performance tests where Quine broke 1 million events&#x2F;second (read the &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;scaling-quine-streaming-graph-to-process-1-million-events-sec&#x2F;&quot;&gt;blog describing the reproducible tests here&lt;&#x2F;a&gt;).
&lt;div class=&quot;video-embed&quot;&gt;
  &lt;iframe src=&quot;https:&#x2F;&#x2F;www.youtube-nocookie.com&#x2F;embed&#x2F;IHgNmhPA7mA&quot; title=&quot;YouTube video&quot;
    frameborder=&quot;0&quot; loading=&quot;lazy&quot;
    allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot;
    allowfullscreen&gt;&lt;&#x2F;iframe&gt;
&lt;&#x2F;div&gt;

You can access the GitHub repo with the recipe and the excellent and comprehensive &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;datastaxdevs&#x2F;workshop-streaming-graph-quine&#x2F;blob&#x2F;main&#x2F;README.md&quot;&gt;README&lt;&#x2F;a&gt; here: &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;datastaxdevs&#x2F;workshop-streaming-graph-quine&quot;&gt;https:&#x2F;&#x2F;github.com&#x2F;datastaxdevs&#x2F;workshop-streaming-graph-quine&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;h2 id=&quot;confluent-current22-demo-advanced-persistent-threat-use-case-with-apache-kafka&quot;&gt;Confluent Current22 Demo: Advanced Persistent Threat Use Case with Apache Kafka&lt;&#x2F;h2&gt;
&lt;p&gt;Ryan Wright (@rrwright) delivered a bite-sized demonstration of how to use Quine to detect APT attacks. At 10 minutes, this demo packs a lot of useful information into a lightning talk format and points to many of the features that make streaming graph ETL an essential tool for cybersecurity solutions.
&lt;div class=&quot;video-embed&quot;&gt;
  &lt;iframe src=&quot;https:&#x2F;&#x2F;www.youtube-nocookie.com&#x2F;embed&#x2F;DKENFhSWzAI&quot; title=&quot;YouTube video&quot;
    frameborder=&quot;0&quot; loading=&quot;lazy&quot;
    allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot;
    allowfullscreen&gt;&lt;&#x2F;iframe&gt;
&lt;&#x2F;div&gt;

Slides and Kafka resources &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;current22&quot;&gt;are available here&lt;&#x2F;a&gt;, including more info on how to add Quine graph ETL to Apache Kafka data pipelines.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;pdx-vid-tech-meetup-real-time-video-cdn-root-cause-analysis&quot;&gt;PDX Vid Tech Meetup: Real-time Video CDN Root Cause Analysis&lt;&#x2F;h2&gt;
&lt;p&gt;Rob Malnati (@robmalnati) and Allan Konar (@7evenbridges) presented a live demonstration using Quine to ingest, transform, sessionize, and analyze log data from CloudFront, AWS Elemental, and Mux client APIs. Video QoE&#x2F;S issues were identified in real time and root cause analysis notifications were generated automatically. This example uses AWS Kinesis for the event stream feed.
&lt;div class=&quot;video-embed&quot;&gt;
  &lt;iframe src=&quot;https:&#x2F;&#x2F;www.youtube-nocookie.com&#x2F;embed&#x2F;sbNCkd32ntA&quot; title=&quot;YouTube video&quot;
    frameborder=&quot;0&quot; loading=&quot;lazy&quot;
    allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot;
    allowfullscreen&gt;&lt;&#x2F;iframe&gt;
&lt;&#x2F;div&gt;

We will be publishing the recipe for this soon, but a related recipe, &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;cdn-cache-efficiency-by-segment&quot;&gt;CDN Cache Efficiency&lt;&#x2F;a&gt;, is available to try now. You can also read about Kinesis integration here.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;download-and-try&quot;&gt;Download and Try&lt;&#x2F;h2&gt;
&lt;p&gt;If you want to try it on your own logs, here are some resources to help:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;getting-started&quot;&gt;Getting Started Guide&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Download Quine - &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;download&quot;&gt;JAR file&lt;&#x2F;a&gt; | &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;hub.docker.com&#x2F;r&#x2F;thatdot&#x2F;quine&quot;&gt;Docker Image&lt;&#x2F;a&gt; | &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;thatdot&#x2F;quine&quot;&gt;GitHub&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Check out the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog?*=Ingest&quot;&gt;Ingest Data into Quine&lt;&#x2F;a&gt; blog series covering everything from &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;components&#x2F;ingest-sources&#x2F;kafka.html&quot;&gt;ingest from Kafka&lt;&#x2F;a&gt; to ingesting .CSV data&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;And please don&#x27;t hesitate to &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;that.re&#x2F;quine-slack&quot;&gt;sign up for the Quine community Slack&lt;&#x2F;a&gt;. There are lively discussions, and it is a great place to get fast answers to pressing questions. You can find me there or on Twitter (@michaelaglietti).&lt;&#x2F;p&gt;
&lt;p&gt;Thanks!&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Great Quine Community Events for October 2022</title>
        <published>2022-10-03T00:00:00+00:00</published>
        <updated>2022-10-03T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/events/great-quine-community-events-for-october-2022/"/>
        <id>https://www.thatdot.com/events/great-quine-community-events-for-october-2022/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/events/great-quine-community-events-for-october-2022/">&lt;h2 id=&quot;october-2022-events-meet-the-quine-team-learn-about-streaming-graph&quot;&gt;&lt;strong&gt;October 2022 Events: Meet the Quine Team, Learn About Streaming Graph&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Our thatDot team is focused on creating sessions to educate and inform key audiences about streaming graph and Quine while helping developers get the most out of their data.&lt;&#x2F;p&gt;
&lt;p&gt;If you will be at one of these events (either online or in-person) and would like to schedule a meeting, or if you’re interested in the topics below but won’t be able to attend, reach out at info@thatdot.com. We can connect you with our team of experts who are presenting afterward.&lt;&#x2F;p&gt;
&lt;p&gt;Our team will be at the following events in October (and November):&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;633b6276e7e657750e8caefe_Current2022%20.png&quot; alt=&quot;Image showing info from Current22 website: Oct 4-5, Austin Texas. The Next Generation of Kafka Summit.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;2022.currentevent.io&#x2F;website&#x2F;39543&#x2F;speakers&#x2F;&quot;&gt;&lt;strong&gt;Current: The Next Generation of Kafka Summit&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt;&lt;strong&gt;,&lt;&#x2F;strong&gt; Ryan Wright, CEO and Founder&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Austin, TX, October 4-5&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Join the first-ever data streaming industry event at Current 2022: The Next Generation of Kafka Summit. You’ll be able to immerse yourself in all things real-time data with peers, industry analysts, expert speakers, and more. Current captures the fast-moving data streaming movement, bringing this broad community together for learning, sharing, and networking to help you unlock the value from data.&lt;&#x2F;p&gt;
&lt;p&gt;Session:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;2022.currentevent.io&#x2F;website&#x2F;39543&#x2F;agenda&#x2F;&quot;&gt;Build a Streaming Graph Pipeline on Kafka with Quine&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;633b6212de0526a9fae2e3c3_8cUJgE_hYYDGFNMoWP43isAdvoJghN2s9CVKOAChy0vvMg7a-YL1dkZ1iTeFK-5DuEK9DQ-zLIO25tIRWZOxNAbufpnUXuySaEsQOwJA006wmwGWPTKvSg1rgMmw32j84DQH2YZ6bhrR8h9tcPJsYljzvlvIUbcOBLhsBeK4bH47qB2dyVHc3Kp38zej.png&quot; alt=&quot;A modified movie poster from Analyze This, showing one of the stars (De Niro) contemplating a Quine logo with his analyst (Crystal) in background.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;‍&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.meetup.com&#x2F;pdx-video-tech&#x2F;events&#x2F;288533649&#x2F;&quot;&gt;&lt;strong&gt;PDX Video Tech Meetup&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt;&lt;strong&gt;,&lt;&#x2F;strong&gt; Rob Malnati, COO, and Allan Konar, Solution Architect&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Portland, OR, October 5&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Sponsored by AWS Elemental, this networking event will be held downtown Portland at Lucky Labrador Beer Hall and will bring local industry peers together for video tech talks.&lt;&#x2F;p&gt;
&lt;p&gt;Session:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Analyze This! - Real-time Video Root Cause Analysis Using CMCD (&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;that.re&#x2F;pdxvid&quot;&gt;read more&lt;&#x2F;a&gt;)&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;
&lt;div class=&quot;video-embed&quot;&gt;
  &lt;iframe src=&quot;https:&#x2F;&#x2F;www.youtube-nocookie.com&#x2F;embed&#x2F;RKGicPJbEOU&quot; title=&quot;YouTube video&quot;
    frameborder=&quot;0&quot; loading=&quot;lazy&quot;
    allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot;
    allowfullscreen&gt;&lt;&#x2F;iframe&gt;
&lt;&#x2F;div&gt;

&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;that.re&#x2F;astraworkshop&quot;&gt;&lt;strong&gt;DataStax Workshop,&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; Michael Aglietti, Director of Developer Relations&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Virtual, Wednesday, October 19, 8:00 am PT (16:00 GMT, 17:00 CET, 20:30 IST)&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;DataStax events are great venues for networking with colleagues, learning from real-world DataStax and Apache Cassandra™ use cases, and discovering new approaches to thrive in the new decade of open-source, scale-out, and cloud-native data in-person and online.&lt;&#x2F;p&gt;
&lt;p&gt;Workshop Session:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;http:&#x2F;&#x2F;that.re&#x2F;astraworkshop&quot;&gt;Real-time Graph ETL for Modern Data Pipelines with Quine and Cassandra&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;633b647444b34466e2492f2e_reactive%20summit.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;events.linuxfoundation.org&#x2F;reactive-summit&#x2F;&quot;&gt;&lt;strong&gt;Reactive Summit 2022,&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; Ryan Wright, CEO and Founder&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Detroit, MI, October 25&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Reactive Summit is where application architects and developers learn and collaborate on the latest Reactive patterns and projects for building distributed systems using Serverless, Cloud Native Design, Reactive programming, Reactive systems, Reactive Streams, event-sourcing, microservices, and more.&lt;&#x2F;p&gt;
&lt;p&gt;Session:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;reactivesummit2022.sched.com&#x2F;event&#x2F;1B6Wi&quot;&gt;Streaming Graphs, Because We Can&#x27;t Afford to Query Any More&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;and-a-teaser-of-what-is-coming-in-november&quot;&gt;And a teaser of what is coming in November:&lt;&#x2F;h3&gt;
&lt;p&gt;&lt;strong&gt;DataStax Cassandra Day Workshops&lt;&#x2F;strong&gt; (sessions take place the same day in two locations):&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Seattle, Ryan Wright, CEO and Founder&lt;&#x2F;li&gt;
&lt;li&gt;Houston, Michael Aglietti, Director, Developer Relations&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;&lt;strong&gt;Seattle, WA, and Houston, TX, November 10&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;DataStax events are great venues for networking with colleagues, learning from real-world DataStax and Apache Cassandra™ use cases, and discovering new approaches to thrive in the new decade of open-source, scale-out, and cloud-native data in-person and online.&lt;&#x2F;p&gt;
&lt;p&gt;We’ll have specific session info soon so watch this space!&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-you-should-attend&quot;&gt;Why You Should Attend&lt;&#x2F;h2&gt;
&lt;p&gt;Cybersecurity experts face immense challenges with frequent data breaches dominating the headlines. Traditional anomaly detectors often fall short in identifying and neutralizing threats in real-time. They require constant human tweaking of threat signatures and sensitivity levels to avoid exhausting professionals with mountains of false positive alerts. What if you could build contextual awareness into the application?&lt;&#x2F;p&gt;
&lt;p&gt;Join us to discover how thatDot Novelty, powered by the open-source technology Quine, is revolutionizing real-time threat detection and response. This cutting-edge technology combines event stream processing speed with a built-in AI that learns the contextual fingerprint of your data environment, and pinpoints problems automatically. Developed in a DARPA project, thatDot Novelty and thatDot Streaming Graph provide unparalleled capabilities in:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Advanced Persistent Threat Detection&lt;&#x2F;li&gt;
&lt;li&gt;Insider Threat Detection&lt;&#x2F;li&gt;
&lt;li&gt;Attack Graph Analysis&lt;&#x2F;li&gt;
&lt;li&gt;Digital Twins&lt;&#x2F;li&gt;
&lt;li&gt;And many more critical use cases&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Don’t miss this opportunity to learn from industry experts and gain a competitive edge in cybersecurity. Secure your spot today and be part of the future of threat detection and data analytics.&lt;&#x2F;p&gt;
&lt;p&gt;We look forward to your participation.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Scaling Quine Streaming Graph to Process 1 Million Events&#x2F;Second</title>
        <published>2022-09-27T00:00:00+00:00</published>
        <updated>2022-09-27T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/scaling-quine-streaming-graph-to-process-1-million-events-sec/"/>
        <id>https://www.thatdot.com/blog/scaling-quine-streaming-graph-to-process-1-million-events-sec/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/scaling-quine-streaming-graph-to-process-1-million-events-sec/">&lt;p&gt;&lt;strong&gt;Note&lt;&#x2F;strong&gt;: If you want to reproduce this test, we have published the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;thatdot&#x2F;1m-scripts&quot;&gt;test details on Github&lt;&#x2F;a&gt; so that you can understand and run it yourself.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;solving-the-unsolvable-graph-that-scales-past-1-million-events-second&quot;&gt;Solving the Unsolvable: Graph that Scales Past 1 Million Events&#x2F;Second&lt;&#x2F;h2&gt;
&lt;p&gt;This is not a blog post about benchmarking Quine streaming graph. This is a post with an operational focus that explains how Quine solves the previously unsolvable: scaling graph data processing past a million events per second. In conventional terms, that means millions of simultaneous writes and multi-node graph traversals per second -- an unprecedented achievement.&lt;&#x2F;p&gt;
&lt;p&gt;The tests this post covers also demonstrate Quine Enterprise&#x27;s resilience in the face of common failure scenarios.&lt;&#x2F;p&gt;
&lt;p&gt;Most importantly, this blog is about the new use cases for graph that this performance makes possible. Finding relationships within categorical data is graph&#x27;s strong point. Doing so at scale, as Quine now makes possible, has significant implications for cybersecurity, fraud detection, observability, logistics, e-commerce, and really any use case that is well suited to graph and must process high-velocity data in real time.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;632fcc0a24771a2089714ba0_Pasted%20image%2020220923174204.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Scaling to 1M+ events&#x2F;second and demonstrating recovery from various failure scenarios&lt;&#x2F;p&gt;
&lt;h2 id=&quot;tl-dr&quot;&gt;tl;dr&lt;&#x2F;h2&gt;
&lt;p&gt;Our tests delivered the following results:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;1M events&#x2F;second processed for a 2-hour period
&lt;ul&gt;
&lt;li&gt;1M+ writes per second&lt;&#x2F;li&gt;
&lt;li&gt;1M 4-node graph traversals (reads) per second&lt;&#x2F;li&gt;
&lt;li&gt;21K results (4-node pattern matches) emitted per second&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;190 commodity hosts plus 1 hot spare running Quine Enterprise
&lt;ul&gt;
&lt;li&gt;We found this 190-host configuration to be significantly cheaper than the 140-host cluster initially tested.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;48 storage hosts using Apache Cassandra persistor&lt;&#x2F;li&gt;
&lt;li&gt;3 hosts for Apache Kafka&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h2 id=&quot;what-is-quine&quot;&gt;What is Quine?&lt;&#x2F;h2&gt;
&lt;p&gt;For those of you new to Quine, the simplest way to describe it is “real-time graph ETL”.&lt;&#x2F;p&gt;
&lt;p&gt;Quine streaming graph combines the graph data structure and persistence of graph databases (e.g. Neo4J) with the streaming properties of systems like Flink. Drop Quine into a streaming system &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;quine-streaming-graph-is-a-natural-fit-for-kafka-pipelines&#x2F;&quot;&gt;between two Apache Kafka&lt;&#x2F;a&gt; or Kinesis instances and start materializing and querying your real-time events as a graph.&lt;&#x2F;p&gt;
&lt;p&gt;There’s a lot more to Quine of course, so if you are interested in how it works – asynchronous actor model, caching strategies, &quot;all nodes exist,&quot; and more – &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.thatdot.com&#x2F;quine-enterprise&#x2F;core-concepts&#x2F;&quot;&gt;check out the docs&lt;&#x2F;a&gt;. In particular, see &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.thatdot.com&#x2F;quine-enterprise&#x2F;core-concepts&#x2F;streaming-graph-vs-database&#x2F;&quot;&gt;the comparison to standard database systems&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;quine-operational-profiling&quot;&gt;Quine Operational Profiling&lt;&#x2F;h2&gt;
&lt;p&gt;The goal of this test is to demonstrate a high volume of sustained ingest that is resilient to cluster node failure in both Quine and the persistor, using commodity infrastructure, and to share performance results along with details of the test for those interested in either reproducing the results or running Quine in production.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;infrastructure-used&quot;&gt;Infrastructure Used&lt;&#x2F;h3&gt;
&lt;h4 id=&quot;quine-cluster&quot;&gt;Quine Cluster&lt;&#x2F;h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Number of Hosts:&lt;&#x2F;strong&gt; 191&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Host Type:&lt;&#x2F;strong&gt; &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;pcr.cloud-mercato.com&#x2F;providers&#x2F;google&#x2F;flavors&#x2F;n2-custom-8-16.cascadelake&quot;&gt;n2-custom-8-16384&lt;&#x2F;a&gt;
&lt;ul&gt;
&lt;li&gt;8 vCPU, 16GB Intel Cascade Lake Max&lt;&#x2F;li&gt;
&lt;li&gt;JVM heap set to 12GB&lt;&#x2F;li&gt;
&lt;li&gt;190 cluster size with 1 hot spare&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Cost:&lt;&#x2F;strong&gt; $28.73&#x2F;hour&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h4 id=&quot;cassandra-persistor-cluster&quot;&gt;Cassandra Persistor Cluster&lt;&#x2F;h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Number of Hosts:&lt;&#x2F;strong&gt; 48&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Host Type:&lt;&#x2F;strong&gt; &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;pcr.cloud-mercato.com&#x2F;providers&#x2F;google&#x2F;flavors&#x2F;n2d-custom-16-128.amdrome&quot;&gt;n2d-custom-16-131072&lt;&#x2F;a&gt;
&lt;ul&gt;
&lt;li&gt;16 vCPU, 128GB AMD Rome&lt;&#x2F;li&gt;
&lt;li&gt;1 x 375 GB local SSD each&lt;&#x2F;li&gt;
&lt;li&gt;TTL: 15 minutes on snapshots and journals tables (to control disk costs in testing)&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Cost:&lt;&#x2F;strong&gt; $21.07&#x2F;hour&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h4 id=&quot;kafka&quot;&gt;Kafka&lt;&#x2F;h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Number of Hosts:&lt;&#x2F;strong&gt; 3&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Host Type:&lt;&#x2F;strong&gt; &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;pcr.cloud-mercato.com&#x2F;providers&#x2F;google&#x2F;flavors&#x2F;n2-standard-4.cascadelake&quot;&gt;n2-standard-4&lt;&#x2F;a&gt;
&lt;ul&gt;
&lt;li&gt;4 vCPU, 16GB RAM&lt;&#x2F;li&gt;
&lt;li&gt;Preloaded with 8 billion events (sufficient for a sustained 2-hour ingest at 1 million events per second)&lt;&#x2F;li&gt;
&lt;li&gt;420 partitions&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Estimated Cost:&lt;&#x2F;strong&gt; Part of the data pipeline, not estimated&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;infrastructure-update&quot;&gt;Infrastructure Update&lt;&#x2F;h3&gt;
&lt;p&gt;Our initial testing provisioned 141 &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;pcr.cloud-mercato.com&#x2F;providers&#x2F;google&#x2F;flavors&#x2F;c2-standard-30.cascadelake&quot;&gt;c2-standard-30&lt;&#x2F;a&gt; hosts. However, as we proceeded with further testing, we made an important discovery. By deploying a higher number of smaller &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;pcr.cloud-mercato.com&#x2F;providers&#x2F;google&#x2F;flavors&#x2F;n2-custom-8-16.cascadelake&quot;&gt;n2-custom-8-16384&lt;&#x2F;a&gt; hosts, we achieved the same overall performance while significantly reducing our monthly costs.&lt;&#x2F;p&gt;
&lt;p&gt;Using 191 smaller hosts proved to be a more cost-effective solution compared to the initial setup with 141 larger hosts. This adjustment allows us to maintain optimal performance while ensuring budget efficiency.&lt;&#x2F;p&gt;
&lt;p&gt;Regarding the Cassandra persistor layer’s settings, we set a TTL of 15 minutes and replication factor of 1 in order to manage quota limits and spending on cloud infrastructure. This does not fit every possible use case, but it is fairly common. Other scenarios which are more data-storage oriented will often increase the replication factor and&#x2F;or TTL. In those variations, maintaining the 1 million events&#x2F;sec processing rate would require increasing the number of Cassandra hosts or disk storage, both of which are budgetary concerns more than technical concerns.&lt;&#x2F;p&gt;
&lt;p&gt;This cluster configuration was meant to demonstrate that high-volume graph processing is possible. In a later post we&#x27;ll describe how to optimize the cluster to achieve these results and minimize infrastructure costs.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;the-test&quot;&gt;The Test&lt;&#x2F;h3&gt;
&lt;p&gt;The plan is set out below, with each action labeled and the results explained. Events are clearly marked by sequence # on the Grafana screen grabs below the table.&lt;&#x2F;p&gt;
&lt;p&gt;A few notes on the test:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;A script is used to generate events.&lt;&#x2F;li&gt;
&lt;li&gt;Host failures are manually triggered.&lt;&#x2F;li&gt;
&lt;li&gt;We used Grafana for the results (and screenshots).&lt;&#x2F;li&gt;
&lt;li&gt;We pre-loaded Kafka with enough events to sustain &lt;strong&gt;one million events&#x2F;second&lt;&#x2F;strong&gt; for two hours.&lt;&#x2F;li&gt;
&lt;li&gt;A Cassandra cluster is used for persistent data storage. The Cassandra cluster is intentionally not over-provisioned to accommodate compaction (over-provisioning is a common strategy) so that the effects of database maintenance on the ingest rate can be demonstrated.&lt;&#x2F;li&gt;
&lt;li&gt;The cluster is run in a Kubernetes environment.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;sequence-actions-expected-results-and-actual-results-overview&quot;&gt;Sequence Actions, Expected Results, and Actual Results Overview&lt;&#x2F;h3&gt;
&lt;h4 id=&quot;sequence-1&quot;&gt;Sequence 1&lt;&#x2F;h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Action:&lt;&#x2F;strong&gt; Start the Quine cluster and begin ingest from Kafka.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Expected Result:&lt;&#x2F;strong&gt; The ingest rate increases and settles at or above 1 million events per second.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Actual Result:&lt;&#x2F;strong&gt; Observed.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h4 id=&quot;sequence-2&quot;&gt;Sequence 2&lt;&#x2F;h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Action:&lt;&#x2F;strong&gt; Let Quine run for 40 minutes to establish a stable baseline.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Expected Result:&lt;&#x2F;strong&gt; Quine does not fail and maintains a baseline ingest rate at or above 1 million events per second.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Actual Result:&lt;&#x2F;strong&gt; Observed.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h4 id=&quot;sequence-3&quot;&gt;Sequence 3&lt;&#x2F;h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Action:&lt;&#x2F;strong&gt; Kill a Quine host.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Expected Result:&lt;&#x2F;strong&gt; Quine ingest is not significantly impacted. The hot spare steps in to recover quickly, and Kubernetes replaces the killed host, which becomes a new hot spare.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Actual Result:&lt;&#x2F;strong&gt; Observed at 17:47. The hot spare recovered quickly, and the ingest rate was not impacted.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h4 id=&quot;sequence-4&quot;&gt;Sequence 4&lt;&#x2F;h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Action:&lt;&#x2F;strong&gt; Perform Cassandra persistor maintenance.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Expected Result:&lt;&#x2F;strong&gt; Cassandra regularly performs maintenance. Quine experiences this as increased latency and should backpressure the ingest to maintain stability during database maintenance.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Actual Result:&lt;&#x2F;strong&gt; From 17:55 - 18:15, the ingest rate is reduced as a corresponding increase in latency is measured above 1ms across all nodes from the Cassandra persistor.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h4 id=&quot;sequence-5&quot;&gt;Sequence 5&lt;&#x2F;h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Action:&lt;&#x2F;strong&gt; Kill two Quine hosts.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Expected Result:&lt;&#x2F;strong&gt; Observe the following sequence: hot spare recovers one host, while the whole cluster suspends ingest due to being degraded. Kubernetes replaces killed hosts, the first replaced host recovers the cluster, and the second replaced host becomes the new hot spare.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Actual Result:&lt;&#x2F;strong&gt; Observed from 18:18 - 18:25. Due to Kubernetes, the impact was not visible. However, the expected sequence was confirmed in the logs.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h4 id=&quot;sequence-6&quot;&gt;Sequence 6&lt;&#x2F;h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Action:&lt;&#x2F;strong&gt; Stop and resume a Quine host for about 1 minute to inject high latency.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Expected Result:&lt;&#x2F;strong&gt; Quine detects the host is no longer available, boots it from the cluster, and the hot spare steps in to recover. When the rejected host resumes, it learns it was removed from the cluster, shuts down, is restarted by Kubernetes, and becomes the new hot spare.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Actual Result:&lt;&#x2F;strong&gt; Observed from 18:41 - 18:46. No impact on ingest rate as the back-pressured ingest was for a single host in the cluster, and the recovery happened quickly.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h4 id=&quot;sequence-7&quot;&gt;Sequence 7&lt;&#x2F;h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Action:&lt;&#x2F;strong&gt; Stop and resume a Cassandra persistor host for about 1 minute to inject high latency.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Expected Result:&lt;&#x2F;strong&gt; Quine backpressures ingest until the Cassandra persistor has recovered.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Actual Result:&lt;&#x2F;strong&gt; Observed from 18:47 - 18:54. Due to replication factor = 1, ingest was impacted until Cassandra persistor recovered. Ingest then resumed to &amp;gt; 1M events per second.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h4 id=&quot;sequence-8&quot;&gt;Sequence 8&lt;&#x2F;h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Action:&lt;&#x2F;strong&gt; Kill a Cassandra persistor host.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Expected Result:&lt;&#x2F;strong&gt; Quine suspends ingest until Cassandra persistor recovers with a new host.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Actual Result:&lt;&#x2F;strong&gt; Observed from 18:54 - 19:10. The host was recovered quickly due to Kubernetes, and ingest briefly recovered to 1M events per second by 18:58 (only a few minutes).&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h4 id=&quot;sequence-9&quot;&gt;Sequence 9&lt;&#x2F;h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Action:&lt;&#x2F;strong&gt; Perform Cassandra persistor maintenance.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Expected Result:&lt;&#x2F;strong&gt; Cassandra regularly performs maintenance. Quine experiences this as increased latency and should backpressure the ingest to maintain stability during database maintenance.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Actual Result:&lt;&#x2F;strong&gt; From 17:55 - 18:15, the ingest rate is reduced as a corresponding increase in latency is measured above 1ms across all nodes from the Cassandra persistor.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h4 id=&quot;sequence-10&quot;&gt;Sequence 10&lt;&#x2F;h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Action:&lt;&#x2F;strong&gt; Let Quine consume the remaining Kafka stream.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Expected Result:&lt;&#x2F;strong&gt; Observe the Quine hosts drop to zero events per second (not all at once).&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Actual Result:&lt;&#x2F;strong&gt; Observed from 19:10 - 19:35. Around the time Cassandra persistor latency was returning to 1ms and ingest returned to 1M events per second, the pre-loaded ingest stream began to run out on some hosts. Over the following 20 minutes, hosts exhausted their partitions in the stream.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;the-results&quot;&gt;The Results&lt;&#x2F;h3&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;633480dddfc3666ae2fee81f_Overall%20Ingest%20Rate.png&quot; alt=&quot;A diagram showing sustained event processing of 1 million events per second.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Figure 1: Overall Ingest Rate&lt;&#x2F;p&gt;
&lt;p&gt;As you can see from the overall ingest rate results:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;#1 shows an initial peak of 1.25M events&#x2F;sec&lt;&#x2F;li&gt;
&lt;li&gt;#2 Quine settles into a steady ingest rate &amp;gt; 1 million events&#x2F;sec&lt;&#x2F;li&gt;
&lt;li&gt;#3 Quine recovers nicely after a single node is killed&lt;&#x2F;li&gt;
&lt;li&gt;Quine settles into a steady ingest rate &amp;gt; 1 million events&#x2F;sec&lt;&#x2F;li&gt;
&lt;li&gt;#s 4 and 9 show Cassandra maintenance events (see Cassandra Latency - Figure 3)&lt;&#x2F;li&gt;
&lt;li&gt;#5 Quine has no problem with two-node failure events.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;We observed that a persistor node high-latency event (7) has a more marked impact on performance than either a Quine node failure (5) or an outright failure of a persistor node (8). In the case of a clear failure, Kubernetes is quick to replace the node, allowing ingest to resume. When a persistor node is unresponsive but not clearly down, Quine’s response is to back pressure ingest until the node has recovered.&lt;&#x2F;p&gt;
&lt;p&gt;An alternate variation on this test could use more persistor machines to stabilize ingest rates during maintenance events.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;63348178ffb386cf42e5406b_Ingest%20Rate%20Per%20Host.png&quot; alt=&quot;A per-host diagram demonstrates when different operational events impacted performance of both individual hosts and the overall cluster.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Figure 2: Per Host Ingest Rate - Quine Only&lt;&#x2F;p&gt;
&lt;p&gt;The individual Quine node ingest graphs indicate when individual nodes are offline and reinforce the observation that Quine Enterprise’s cluster resilience allows for smooth operation during high-volume ingest, even in the face of a Quine node shutdown or failure. Quine’s overall performance tracks persistor performance more closely, which makes the persistor an area of operational focus for anyone planning a production deployment.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;633481f18e6e9a9ada113cdd_Cassandra%20Latency.png&quot; alt=&quot;Cassandra Latency events line up with overall decreases in cluster performance.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Figure 3: Cassandra Persistor Latency&lt;&#x2F;p&gt;
&lt;p&gt;The median query latency for the Cassandra cluster during this test was &amp;lt; 1 ms. Even during or following persistor shutdown (8) or node failure (7), cluster latency stayed &amp;lt; 1.5 ms. Events at (1), (5), and (8) all reflect increased latency for single nodes.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;standing-queries-and-1-million-4-node-traversals-per-second&quot;&gt;Standing Queries and 1 Million 4-node traversals per second&lt;&#x2F;h3&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;633482db1ba35cae8386af6c_Overall%20Standing%20Query%20Rate.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Figure 4: Standing Query Results (events emitted&#x2F;second)&lt;&#x2F;p&gt;
&lt;p&gt;The purpose of running any complex event processor, Quine included, is to detect and act on high-value events in real time. This could mean detecting indications of a cyber attack or video stream buffering, or identifying e-commerce upsell opportunities at checkout. This is where Quine really excels.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;getting-started&#x2F;standing-queries-tutorial.html&quot;&gt;Standing queries&lt;&#x2F;a&gt; are a unique feature of Quine. They monitor streams for specified patterns, maintain partial matches, and execute user-specified actions the instant a full match is made. Actions can include anything from updating the graph itself by creating new nodes or edges to writing results out to Kafka or Kinesis or posting results to a webhook.&lt;&#x2F;p&gt;
&lt;p&gt;In this test, Quine standing queries monitored for specific 4-node patterns requiring a 4-node traversal every time an event was ingested. Traditional graph databases slow down ingest when performing multi-node traversal. Not Quine. Quine’s ability to sustain high-speed data ingest together with simultaneous graph analysis is a revolutionary new capability. Not only did Quine ingest more than 1,000,000 events per second, it analyzed all that data in real-time to find more than 20,000 matches per second for complex graph patterns. This is a whole new world!&lt;&#x2F;p&gt;
&lt;h2 id=&quot;summary-results&quot;&gt;Summary Results&lt;&#x2F;h2&gt;
&lt;h3 id=&quot;resource-usage-and-performance-metrics-overview&quot;&gt;Resource Usage and Performance Metrics Overview&lt;&#x2F;h3&gt;
&lt;h4 id=&quot;quine-host-metrics&quot;&gt;Quine Host Metrics&lt;&#x2F;h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Description:&lt;&#x2F;strong&gt; GB RAM used per Quine host
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Value:&lt;&#x2F;strong&gt; 12 GB&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Description:&lt;&#x2F;strong&gt; CPU% used per Quine host
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Value:&lt;&#x2F;strong&gt; 60%&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h4 id=&quot;cassandra-persistor-node-metrics&quot;&gt;Cassandra Persistor Node Metrics&lt;&#x2F;h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Description:&lt;&#x2F;strong&gt; CPU% used per Cassandra persistor node
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Value:&lt;&#x2F;strong&gt; 80%+&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h4 id=&quot;performance-metrics&quot;&gt;Performance Metrics&lt;&#x2F;h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Description:&lt;&#x2F;strong&gt; Overall Ingest Event Records&#x2F;Second
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Value:&lt;&#x2F;strong&gt; &amp;gt;1,000,000&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Description:&lt;&#x2F;strong&gt; Standing Query Results&#x2F;Second
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Value:&lt;&#x2F;strong&gt; 21,000&#x2F;sec&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Description:&lt;&#x2F;strong&gt; Average Persistor Latency
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Value:&lt;&#x2F;strong&gt; 1 ms&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Description:&lt;&#x2F;strong&gt; Data Storage Disk Space Used (Cassandra)
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Value:&lt;&#x2F;strong&gt; 70 GB&#x2F;host&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h2 id=&quot;why-quine-hitting-1-million-events-sec-matters&quot;&gt;Why Quine Hitting 1 Million Events&#x2F;Sec Matters&lt;&#x2F;h2&gt;
&lt;p&gt;Since its release in 2007 at the start of the NoSQL revolution, Neo4J has proven conclusively the value of graph to connect and find complex patterns in &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;whats-the-difference-between-categorical-and-numerical-data&#x2F;&quot;&gt;categorical data&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;The graph data model is indispensable to everything from fraud detection to network observability to cybersecurity. It is used for recommendation engines, logistics, and XDR&#x2F;EDR.&lt;&#x2F;p&gt;
&lt;p&gt;But not long after NoSQL hit the scene, Kafka kicked off the movement toward real-time event processing. Soon, event processors like Flink, Spark Streaming and ksqlDB brought the ability to process live streams. These systems relied on less-expressive key-value stores or slower document and relational databases to save intermediate data.&lt;&#x2F;p&gt;
&lt;p&gt;Quine is the graph analog and is important because now you can do what graph is really good at -- finding complex patterns across multiple streams of data using not just numerical but categorical data.&lt;&#x2F;p&gt;
&lt;p&gt;Quine makes all the great graph use cases viable at high volumes and in real time.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;next-steps&quot;&gt;Next Steps&lt;&#x2F;h2&gt;
&lt;p&gt;If you want help planning your own test, or you would like to try Quine Enterprise, please contact us. You can also read more about &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;products&#x2F;quine&#x2F;&quot;&gt;Streaming Graph here&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;Or you can start learning about Quine now by visiting the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;&quot;&gt;Quine open source project&lt;&#x2F;a&gt;. We have a Slack channel where folks can ask questions and we are always up for a call.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Streaming Graph ETL: Real-time Video Observability Simplified</title>
        <published>2022-09-06T00:00:00+00:00</published>
        <updated>2022-09-06T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/streaming-graph-etl-real-time-video-observability-simplified/"/>
        <id>https://www.thatdot.com/blog/streaming-graph-etl-real-time-video-observability-simplified/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/streaming-graph-etl-real-time-video-observability-simplified/">&lt;h2 id=&quot;a-live-event-stream-a-cdn-and-a-manifest-services-provider-walk-into-a-bar&quot;&gt;A Live Event Stream, a CDN, and a Manifest Services Provider Walk into a Bar&lt;&#x2F;h2&gt;
&lt;p&gt;Video observability, or the end-to-end monitoring of complex video streaming architectures, entails some of the most challenging aspects of data engineering. A live event will usually traverse three and sometimes more partner systems on its path from origin to the end user.&lt;&#x2F;p&gt;
&lt;p&gt;Until recently, no single service provider in this chain of delivery had access to performance metrics of upstream or downstream providers, making diagnosing and resolving issues more difficult. But as standards begin to emerge for data sharing between partners, a new challenge has emerged: how to combine enormous amounts of high cardinality and high dimensionality data, formatted inconsistently, into a single cohesive view that can be acted on in real time?&lt;&#x2F;p&gt;
&lt;p&gt;Quine is a streaming graph processor that provides a unique solution for solving all of these challenges by combining graph data modeling (e.g., Neo4J) with highly efficient event stream processing (e.g., Flink or ksqlDB).&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;63179a4f8e0b738d84a9b92a_jumble%20of%20wires.jpg&quot; alt=&quot;A complicated tangle of computer cables.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;End-to-end video observability is complicated by the many platforms and services a single stream traverses.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;video-observability-is-hard&quot;&gt;Video Observability Is Hard&lt;&#x2F;h2&gt;
&lt;p&gt;End-to-end video observability is challenging for many reasons:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Multiple services from multiple vendors&lt;&#x2F;strong&gt;: No one vendor operates the entire platform. Critical-path services include origins, trans&#x2F;encoders, manifest services, entitlement services, ad services, CDNs, and video players. Each element generates its own logs, with different formats for user, client, and session IDs and time stamps, and focuses on a different part of the platform.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Many Dimensions to Track:&lt;&#x2F;strong&gt; Even when these logs are combined and somehow synchronized, you encounter high data dimensionality: device hardware configurations, device software versions, client IPs, server IPs, video player versions, video asset versions, etc.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;High Data Cardinality:&lt;&#x2F;strong&gt; Subnets and IPs, country&#x2F;state&#x2F;city designations, time stamps, and the combination of these with all the above dimensions.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Categorical Data:&lt;&#x2F;strong&gt; Most log data is non-numeric (URL strings, classifications, asset titles, IP addresses, etc.) and, while valuable, is often discarded. Encoding these values as numbers is very difficult to manage and not always useful.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Scale:&lt;&#x2F;strong&gt; Live video events generate significant volumes of data within very short time periods, as live broadcast events start for millions of viewers.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Real-time events need real-time fixes&lt;&#x2F;strong&gt;: The time to fix a video streaming issue is when the issue is ruining the user experience, especially for live streams.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h2 id=&quot;multiplying-complexity&quot;&gt;Multiplying Complexity&lt;&#x2F;h2&gt;
&lt;p&gt;Any one of the above reasons presents a significant barrier to correctly detecting and diagnosing network issues. Take this entire matrix of possible data combinations and operational challenges together and you are faced with a significant challenge to model and analyze the end-to-end behavior of live events in time frames suitable for remediation. Additionally, the costs of legacy log analysis tools are prohibitive at scale. As a result, most broadcasters monitor individual elements of the video delivery workflow and use intuition to link each element’s behavior to the end-to-end system. Sometimes, this doesn’t &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.macrumors.com&#x2F;2022&#x2F;08&#x2F;19&#x2F;netflix-down-streaming-issues&#x2F;&quot;&gt;work out so well&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-data-structure-for-video-observability&quot;&gt;What Data Structure For Video Observability?&lt;&#x2F;h2&gt;
&lt;p&gt;As described in an earlier blog, &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;defining-video-observability&#x2F;&quot;&gt;Defining Video Observability&lt;&#x2F;a&gt;, combining event data from each component of the video delivery workflow is required to understand the contributions of each component of the end-to-end video delivery experience.&lt;&#x2F;p&gt;
&lt;p&gt;Connecting the logs of origins, manifest services, and CDNs provides several meaningful benefits. It lets you:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;understand the impact of one system component on the complete system and other components.&lt;&#x2F;li&gt;
&lt;li&gt;build and measure KPIs that align with user experience.&lt;&#x2F;li&gt;
&lt;li&gt;prioritize issues based on their impact to user experience.&lt;&#x2F;li&gt;
&lt;li&gt;identify root causes of identified problems and issues.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Assembling log and event data into a holistic end-to-end view allows operators at any point in the experience stack to quickly identify an issue’s root cause and enables real-time remediation and automation. Without a representation of the entire system, it is quite common to diagnose causes incorrectly, wasting time and goodwill with operations staff, vendors, and customers.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;63176861b0b2326115ceb9ea_JBsXDYz--8r5DJrojPv_KcRLefdQu80qaRN0wTTPUbHkJUQ8LRDJexiaJI-ApUfEeFOlOgdDWMiVtQ04zDIdVLvkZTvE1_vh2_U5961tgTytfCPa70tKSUQ8UwWH5FpjWs3Jy51tTelGxEbP_l2KYA7UqZgTSDJ9oVa6z5R5BLC6T0sX81TV_LLWSQ.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;The emerging CMCD standard represents a more efficient mechanism for matching CDN and video player client data to correlate video stream viewer experience with the CDNs that deliver the video streams. I&#x27;ll dig deeper into the specifics of CMCD in a future post.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;solving-multi-source-data-ingestion-challenges&quot;&gt;Solving Multi-Source Data Ingestion Challenges&lt;&#x2F;h2&gt;
&lt;p&gt;Synthesizing a unified view from multiple event streams in real time presents operational challenges as well as some of the data-specific problems discussed above. When millions of devices are connecting to global CDNs with thousands of POPS in dozens of countries, data will almost certainly arrive out of order. As partners, event types, and platforms change, schemas must be able to react without downtime. And, perhaps most importantly of all, detecting patterns that indicate issues and acting on them instantaneously can be a challenge for most databases. Quine, which combines a graph data model with the real-time capabilities of event stream processing, is built to solve these operational issues.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;streaming-graph-efficient-analysis&quot;&gt;Streaming Graph Efficient Analysis&lt;&#x2F;h3&gt;
&lt;p&gt;Log event dimensionality and cardinality are a critical challenge in video observability. Near endless combinations of data, as shown below, require hundreds of tables in a traditional RDBMS. Joins that connect data from multiple tables are compute intensive, and the cost of these joins increases with the number of tables. Querying across tables is particularly expensive when there is a “fan out” of one table to subsidiary tables, as shown with “Asset” in the image.&lt;&#x2F;p&gt;
&lt;p&gt;Graph data modeling offers an alternate approach, storing dimensions that would be rows in tables as nodes and describing the relationship between nodes as edges. This model makes associating a video playout event with the CDN, user, asset, geography etc., a very low cost action. When applied to video observability data at scale, the efficiency of graph is significant as compared to traditional RDBMS operations.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;631793b06a65f845b7ac6e1b_Streaming%20Graph%20for%20Real-time%20Video%20Buffering%20Root%20Cause%20Analysis-2.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Graph data structures provide an excellent alternative to relational models for real-time analysis.&lt;&#x2F;p&gt;
&lt;p&gt;Importantly, once data is stored in the streaming graph, we can define new KPIs that encompass insights from each element of the video delivery workflow. Calculating a continuous state for end-to-end latency, or tracing the current state of CDN or asset health for a specific geo&#x2F;ASN&#x2F;CDN becomes trivial, even though this represents hundreds of thousands, or even millions, of separate values.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;categorical-data&quot;&gt;Categorical Data&lt;&#x2F;h3&gt;
&lt;p&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;whats-the-difference-between-categorical-and-numerical-data&#x2F;&quot;&gt;Categorical data&lt;&#x2F;a&gt; -- content titles, email addresses, process IDs, IP addresses -- is incredibly valuable for root cause analysis and, amazingly enough, frequently ignored by enterprises. Quine greatly expands the utility of log data and the effectiveness of log analytics by processing non-numerical data in its natural, categorical form. Avoiding one-hot encoding simplifies data management and reduces computation needs, while making the system more human-friendly to operate.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;knowing-when-to-act-on-streaming-data&quot;&gt;Knowing When to Act on Streaming Data&lt;&#x2F;h3&gt;
&lt;p&gt;A significant advantage of the Quine streaming graph is its ability to generate actions in real time as data arrives. It does this using a feature unique to Quine: &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;drive-streaming-event-workflows-with-standing-queries&#x2F;&quot;&gt;standing queries&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;Think of standing queries as a sort of filter placed inline with the event stream, watching for any event data that is part of a pattern of interest – for example, a series of events that suggest an issue with a POP’s network or client connectivity. As new events are ingested into the graph, standing queries update their partial matches, waiting and watching for a full match to occur.&lt;&#x2F;p&gt;
&lt;p&gt;Traditional systems must continuously query to see if a full match has occurred. This is an expensive operation and introduces delays. With Quine, when a full match occurs, action is instantaneous. Possible actions can include anything from sending alerts to other systems (via Kafka, Kinesis, HTTP POST, and more) to updating the graph data itself.&lt;&#x2F;p&gt;
&lt;p&gt;Either way, by acting in real time, Quine can be the difference between anticipating and avoiding an issue and trying to fix it once it has already occurred.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;out-of-order-data-handling&quot;&gt;Out-of-Order Data Handling&lt;&#x2F;h3&gt;
&lt;p&gt;In a distributed system of global scale, events do not always arrive in the order they were created. Systems that have dropped off the network can send event data when they reconnect seconds, minutes, hours, or even days later.&lt;&#x2F;p&gt;
&lt;p&gt;Quine solves this by maintaining partial matches to queries, adding to the graph as data arrives and triggering actions like alert messages when a complete match is made. The order, and the interval between events in a pattern, do not matter.&lt;&#x2F;p&gt;
&lt;p&gt;For example, the creation of a “user video session” will complete as soon as the periodic client beacons, CDN logs for video chunks, origin server logs, and manifest files all arrive.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;6317686145b2bde7b8349c53__e8z196cH663wocjOG0db7fITgv_ON6HQelCA695oTl4IL6ovKFvz3wBUKFuU5DT7htE6ReaQsVzfBMIMxBBsbXffdjiOK2MI4t9lF3Oxf_OXYwG3h2pAJePtxD_-FZZ8uGH1GRwhrTp5jsfxSGuI0NT_e4vh5RkhxOkJIWiyKYvrMckEYWkKpOIdw.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;h2 id=&quot;an-example-of-real-time-root-cause-analysis&quot;&gt;An Example of Real-time Root Cause Analysis&lt;&#x2F;h2&gt;
&lt;p&gt;The combination of all these streaming graph capabilities produces a system well suited for &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;building-a-quine-streaming-graph-ingest-streams&#x2F;&quot;&gt;ingestion&lt;&#x2F;a&gt; of event logs characterized by highly dimensional, categorical data from multiple systems or sources, as well as for the evaluation of this data for outage or service degradation conditions in real time.&lt;&#x2F;p&gt;
&lt;p&gt;Consider an example using client, CDN, and origin logs where the goal is to identify and track patterns of events suggestive of performance issues that could lead to service degradations, and to issue specific, actionable alerts when the number of these events (which I call KPIs here) exceeds a user-defined threshold.&lt;&#x2F;p&gt;
&lt;p&gt;After ingesting events into Quine, standing queries will continuously evaluate arriving data for patterns of service failure or degradation. When these “issue causes” are identified for any new data, high level KPIs (e.g. count of failure events for a server or Geo&#x2F;ASN) will roll up the individual events to assess the significance of issues.&lt;&#x2F;p&gt;
&lt;p&gt;When KPIs indicate a significant issue is occurring, the root cause definition is already known and made available to upstream systems for automated remediation, or published to NOC ticket management systems.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;6317cbc54679f15f334d031e_Streaming%20Graph%20for%20Real-time%20Video%20Buffering%20Root%20Cause%20Analysis.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;The figure above illustrates the events Quine monitors for and, in this case, a real-time alert (in red) that reports client-observed re-buffering at significant enough volume to warrant investigation, and identifies that the issue is related to a CDN edge server in Tampa that is serving users on the AT&amp;amp;T ISP (in orange).&lt;&#x2F;p&gt;
&lt;p&gt;The alert that is issued provides the information an operator would need to understand and take action on an issue. Standing queries can publish this alert, and&#x2F;or the raw event data that contributed to the KPI threshold being met, to Kafka, Kinesis, an API or even Slack -- whatever fits the desired workflow.&lt;&#x2F;p&gt;
&lt;p&gt;Without a graph data structure, combining all this categorical and numerical data into a single materialized view and quickly traversing connections to detect completed patterns would not be possible. However, unlike graph databases, Quine is designed to process high volumes of event data and trigger alerts in real time.&lt;&#x2F;p&gt;
&lt;p&gt;All this adds up to more reliable stream delivery, more revenue, satisfied advertisers, and most importantly of all, happy customers.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;try-quine-streaming-graph-yourself&quot;&gt;&lt;strong&gt;Try Quine Streaming Graph Yourself&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;If you want to try it on your own logs, here are some resources to help:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Download Quine - &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;download&quot;&gt;JAR file&lt;&#x2F;a&gt; | &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;hub.docker.com&#x2F;r&#x2F;thatdot&#x2F;quine&quot;&gt;Docker Image&lt;&#x2F;a&gt; | &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;thatdot&#x2F;quine&quot;&gt;GitHub&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Check out the Ingest Data into Quine blog series covering everything from &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;components&#x2F;ingest-sources&#x2F;kafka.html&quot;&gt;ingest from Kafka&lt;&#x2F;a&gt; to ingesting CSV data&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;cdn-cache-efficiency-by-segment&quot;&gt;CDN Cache Efficiency Recipe&lt;&#x2F;a&gt; - this recipe provides more ingest pattern examples&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Are You Ready for Low and Slow Auth Attacks?</title>
        <published>2022-08-23T00:00:00+00:00</published>
        <updated>2022-08-23T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/are-you-ready-for-low-and-slow-authentication-attacks/"/>
        <id>https://www.thatdot.com/blog/are-you-ready-for-low-and-slow-authentication-attacks/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/are-you-ready-for-low-and-slow-authentication-attacks/">&lt;h2 id=&quot;preventing-authentication-attacks-in-real-time&quot;&gt;Preventing Authentication Attacks In Real Time&lt;&#x2F;h2&gt;
&lt;p&gt;Authentication attacks come in many forms, each using different strategies with distinct, often difficult to detect, characteristics. Detecting password spraying attacks is particularly difficult due to the deliberately low frequency of authentication attempts, the number of services probed, and the extended time period across which attempts are made. Detecting and preventing password spraying attacks &lt;em&gt;in real time&lt;&#x2F;em&gt; is impossible with current solutions. I’ll take a look at why this is and how Quine changes the game.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;low-and-slow-attacks-brute-force-a-little-bit-at-a-time&quot;&gt;Low and Slow Attacks: Brute Force, A Little Bit At A Time&lt;&#x2F;h2&gt;
&lt;p&gt;In the past, brute force attacks have been synonymous with easy-to-spot bursts of machine-driven activity designed to overwhelm defenses. But as attackers gain sophistication, they have found ways to reduce their profile while still harnessing the power of automation.&lt;&#x2F;p&gt;
&lt;p&gt;Low and slow attacks use automation to spread authentication attempts over days, weeks, and months, in addition to distributing the attempts across a network of target systems, from a range of source IPs. Based on &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;attack.mitre.org&#x2F;techniques&#x2F;T1110&#x2F;003&#x2F;&quot;&gt;MITRE definitions of brute force attacks&lt;&#x2F;a&gt;, Password Spraying, Password Guessing, and Credential Stuffing attacks all leverage metered activity to probe password systems so slowly that failed attempts go undetected by legacy time window-based “lock out” business rules.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-low-and-slow-attacks-work&quot;&gt;Why Low and Slow Attacks Work&lt;&#x2F;h2&gt;
&lt;p&gt;Volumetric brute force password attack strategies can often be detected due to their size and velocity using typical statistical analysis mechanisms.&lt;&#x2F;p&gt;
&lt;p&gt;Password spraying attacks take a very different approach, probing multiple accounts for commonly used or compromised passwords. These attacks attempt to stay under the threshold that would trigger “3 strikes and you’re locked out” rules typically used by authentication applications.&lt;&#x2F;p&gt;
&lt;p&gt;Of course, current authentication attack prevention measures do not stop with rules defined in authentication applications. Logs, often from multiple systems (e.g. firewalls, DNS, and web authentication logs), are typically processed by log&#x2F;SIEM analysis solutions which perform more complex analysis, including analysis of multiple data sets concurrently or across longer time periods.&lt;&#x2F;p&gt;
&lt;p&gt;SIEMs, however, are by definition not analyzing data in real time, and their use is limited by the volume of data they retain for active analysis, and specifically by the cost of retaining and processing that data.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;62ea8fc9f013233aecc0ab86_low%20and%20slow.png&quot; alt=&quot;A graph with Complexity of Data as Y Axis and Time to Analyze as X Axis. Policies are Real-time but simple. SIEM log analysis is complex but slow.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Detection Time Frames vs. Low and Slow Attack Behavior Patterns&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Real-time application rulesets don’t have the context gained from looking at long time periods of data or from data sourced from other systems. Batch-based log&#x2F;SIEM analysis tools can perform more complex analytics but are not in the real-time flow of authentications, meaning you may not find out about successful attacks until hours or days later. They also make it prohibitively expensive to incorporate the extended time frames of data needed to find low and slow attack behaviors.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;62ea91a331321e189f365ac0_Attack%20Time%20Frame.png&quot; alt=&quot;A figure illustrating how low&#x2F;slow attacks extend over time periods beyond the storage and cost limits of current approaches.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Detection Time Frames vs. Low and Slow Attack Behavior Patterns&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;The tradeoffs with current approaches are stark: either impose time windows to process events in real time and reduce cost, or sacrifice real-time responsiveness to store and process data over a greater time interval at great expense. In either case, it isn’t clear you’ll be able to prevent all or even some low and slow attacks. That’s what makes this attack vector so insidious.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;low-cost-real-time-analysis-without-time-windows&quot;&gt;Low Cost, Real-time Analysis without Time Windows&lt;&#x2F;h2&gt;
&lt;p&gt;With low and slow attack strategies exploiting the limited time window visibility of existing application and log analysis solutions, new detection and response tools are needed. These tools need to:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;support detection of attack behavior patterns in logs from multiple systems, over extended periods of time, and&lt;&#x2F;li&gt;
&lt;li&gt;keep costs aligned with the large data retention needs of active, extended time window monitoring.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;62ea92267dd2019c1eb561ef_cost%20to%20store%20enough%20data%20to%20catch%20low%3Aslow.png&quot; alt=&quot;A graph showing Cost rising as data needed to analyze larger time windows increase using traditional tools, and the need to reduce cost.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Low and Slow Attack Detection Requires a New Tool ROI Paradigm&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Cost-effective complex log analysis at enterprise or service provider scale requires a new approach.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;streaming-graph-makes-windowless-pattern-detection-cost-effective-real-time&quot;&gt;Streaming Graph Makes Windowless Pattern Detection Cost Effective, Real Time&lt;&#x2F;h2&gt;
&lt;p&gt;The open source Quine Streaming Graph offers a new approach to the complex behavior analysis necessary for the detection of password spraying and other low and slow attacks (including advanced persistent threats, or APTs). Two key Quine innovations are of particular interest in this context: standing queries and partial match tracking over extended time windows.&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Standing queries are queries that live in the streaming graph and continuously filter new data for query matches in real time. They find low and slow behaviors across high volumes of logs from multiple systems and extended time periods, using graph query definitions that have proven much more efficient than traditional RDBMS query logic.&lt;&#x2F;li&gt;
&lt;li&gt;Partial match tracking across in-memory and persistent storage, at scale, allows Quine to retain possibly interesting incomplete matches until the moment when a complete match occurs. Deferring storage of high volumes of partial matches to inexpensive persistent storage addresses the cost issues associated with traditional log analysis systems, while Quine still operates in the real-time workflow, when attacks are occurring, to minimize the impact of a breach.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;And Quine eliminates time windows without incurring the cost of SIEM solutions, sifting through data from multiple sources to find and store only the patterns that matter – in this case, the ones that indicate a low and slow attack is underway.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;learn-more-or-try-quine-yourself&quot;&gt;&lt;strong&gt;Learn more or try Quine yourself&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Quine is available in both open source and enterprise editions. You can try it yourself. Learn how to ingest your own data and build a streaming graph that can detect all sorts of attacks in real time.&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Join &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;that.re&#x2F;chat&quot;&gt;Quine Community on Discord&lt;&#x2F;a&gt; and get help from thatDot engineers and community members.&lt;&#x2F;li&gt;
&lt;li&gt;Download Quine - &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;download&quot;&gt;JAR file&lt;&#x2F;a&gt; | &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;hub.docker.com&#x2F;r&#x2F;thatdot&#x2F;quine&quot;&gt;Docker Image&lt;&#x2F;a&gt; | &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;thatdot&#x2F;quine&quot;&gt;Github&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Check out the Ingest Data into Quine blog series covering everything from ingest from Kafka to ingesting .CSV data&lt;&#x2F;li&gt;
&lt;li&gt;Try the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;ethereum-tag-propagation&quot;&gt;Ethereum Fraud Detection recipe&lt;&#x2F;a&gt; - this recipe showcases ingest and standing query patterns that you may find helpful.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;Photo credit: &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;unsplash.com&#x2F;@karlibri&quot;&gt;&lt;strong&gt;Karl Ibri&lt;&#x2F;strong&gt; @karlibri&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>What&#x27;s the difference between Categorical and Numerical Data?</title>
        <published>2022-08-10T00:00:00+00:00</published>
        <updated>2022-08-10T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/whats-the-difference-between-categorical-and-numerical-data/"/>
        <id>https://www.thatdot.com/blog/whats-the-difference-between-categorical-and-numerical-data/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/whats-the-difference-between-categorical-and-numerical-data/">&lt;p&gt;According to a 2020 Microstrategy survey, 94% of enterprises report data and data analytics are crucial to their growth strategy. And yet, surprisingly, as much as 73% of the data that enterprises collect is never used, including a vast majority of what is termed “categorical data.”&lt;&#x2F;p&gt;
&lt;p&gt;Why would enterprises ignore an entire class of data? Especially when it is essential to high-priority use cases like personalization, customer 360, fraud detection and prevention, network performance monitoring, and supply chain management? The simple answer is that using categorical data with today’s tools is complex, and most data scientists aren’t trained to use it.&lt;&#x2F;p&gt;
&lt;p&gt;Figuring out how to use categorical data will help companies solve complex problems that have long evaded them. And they’ll be able to do so with data they already have. Here’s a look at categorical data, why it’s hard to wrangle, and how it could be useful.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;categorical-data-101&quot;&gt;&lt;strong&gt;Categorical Data 101&lt;&#x2F;strong&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;&lt;img src=&quot;&#x2F;img&#x2F;2022&#x2F;08&#x2F;image-2-convert.io_.webp&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;There are two main types of data: categorical and numerical. Numerical data, as the name implies, refers to numbers. Categorical data is everything else.&lt;&#x2F;p&gt;
&lt;p&gt;Categorical data is non-numerical information that is divided into groups.&lt;&#x2F;p&gt;
&lt;p&gt;As its name suggests, categorical data describes categories or groups. Some examples of categorical data could be:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;A list of the most popular baby names;&lt;&#x2F;li&gt;
&lt;li&gt;Census data, such as citizenship, gender, and occupation;&lt;&#x2F;li&gt;
&lt;li&gt;ID numbers, phone numbers, and email addresses;&lt;&#x2F;li&gt;
&lt;li&gt;Brands (Audi, Mercedes-Benz, Kia, etc.).&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;In some instances, the same information can be expressed as either categorical or numerical data. For example, weather can be described as either “60% chance of rain” or “partly cloudy.” Both mean the same thing to our brains, but the data takes a different form.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;the-challenges-of-categorical-data&quot;&gt;&lt;strong&gt;The Challenges of Categorical Data&lt;&#x2F;strong&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;The same thing that makes categorical data so powerful makes it challenging. While it is easy for you and me to tell the relative difference between a dog and a plane versus a dog and a cat, doing so computationally is not so straightforward. To express the difference between two pieces of categorical data, one must use graph-based analytical tools or have a background in graph theory. This is why “knowledge graphs” have been a recent hot topic. Since graph tools are not so widespread in today’s enterprise and academic landscape, data scientists instead fall back on the statistical techniques they know and for which there are ready tools.&lt;&#x2F;p&gt;
&lt;p&gt;Most machine learning algorithms can only handle numerical data. They can count instances of categorical data with real but limited utility. The other alternative is turning categorical data into numeric values using one of several encoding techniques. These techniques all tend to be slow and produce poor results – even making some goals impossible, like anomaly detection. Using categorical data comes with another challenge: high cardinality.&lt;&#x2F;p&gt;
&lt;p&gt;Cardinality refers to the number of possible values for a particular category. For example, the cardinality of a list of all models of iPhone ever made is a relatively manageable 34. On the other hand, a list of serial numbers for all 2.2 billion iPhones sold since production began represents a high-cardinality data set.&lt;&#x2F;p&gt;
&lt;p&gt;The size and complexity of traditional analytical approaches spiral quickly out of control with high-cardinality data. Additionally, almost all tools for turning categorical values into numbers (like one-hot encoding) require a fixed set of possible values known in advance.&lt;&#x2F;p&gt;
&lt;p&gt;As some high-cardinality data values are unknown, this poses a problem since those tools cannot represent data they have never seen. With all these challenges, you can begin to understand why enterprises end up ignoring categorical data altogether.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;so-what-can-you-do-with-categorical-data&quot;&gt;&lt;strong&gt;So, What Can You Do with Categorical Data?&lt;&#x2F;strong&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;The enormous and unrealized value of categorical data for enterprises resides in its ability to represent the relationships between values in a way humans can readily understand and express. These relationships can include all the properties associated with an object – I am tall, blonde, married, and have two children – or the relationship between two objects – I wrote this article, and you are reading this article.&lt;&#x2F;p&gt;
&lt;p&gt;You can use categorical data to efficiently group and connect classes of objects; for example, you can show all tall, blonde, married authors and the readers of their articles organized by geographic area and hobby. In doing so, you can uncover some unique insight and analysis. When you combine this “relationship thinking” with a computer’s ability to process enormous amounts of data, the astonishing power of categorical data becomes apparent.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;the-strengths-of-graph-technology&quot;&gt;&lt;strong&gt;The Strengths of Graph Technology&lt;&#x2F;strong&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;With the emergence of graph technology in recent years, enterprises can finally represent these relationships directly. A graph is built of nodes and edges; you can picture this with circles for nodes and arrows for edges that connect nodes. The node-edge-node pattern connects two categorical values (nodes) by a relationship represented by the edge.&lt;&#x2F;p&gt;
&lt;p&gt;This is a natural way to represent data because that node-edge-node pattern corresponds perfectly to the subject-predicate-object pattern at the core of a natural human language. So anything you can say in words can be represented naturally in a graph. Then we can analyze the relationships between the values by following the connections between categorical data in a graph.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;lh7-us.googleusercontent.com&#x2F;docsz&#x2F;AD_4nXe30ZMBNu6PR9iHQglGl46-JQw7mk8oWS1Ps842kZiSCnjPjCbMO-wCyBFpxgUKCr6_rFuGv1pmp3p8a264mc0kiAKx67mXF4xTjZfLsuwdQmelYd5cDvILdbHyQAd4mGoI3RxBfO0mz3Fcm7SfooRopBdP?key=tsc2l44cT0IepHUx74kgYQ&quot; alt=&quot;A graph data structure that has the same outline as two hemispheres of a human brain.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Graph data structures connect information in a way that resembles the way we speak and think.&lt;&#x2F;p&gt;
&lt;p&gt;The challenge of using categorical data is like having a pantry of canned food and no can opener. There’s food there, but you have no tools to access it. Instead of looking at the same data with the same approach, the next generation of streaming graph data tools needs to make categorical data more accessible and usable.&lt;&#x2F;p&gt;
&lt;p&gt;We already see the success of categorical data as the key to improving anomaly detection in cybersecurity. But it’s only now that the tools for using this data to solve challenging problems are becoming available.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;thatdot-software-for-categorical-data-processing&quot;&gt;&lt;strong&gt;thatDot Software for Categorical Data Processing&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;thatDot streaming graph software is built specifically for categorical data. It combines a graph data structure (like Neo4j or TigerGraph) with the performance and scale of event processing systems like Flink and Spark Streaming. &lt;strong&gt;thatDot Novelty&lt;&#x2F;strong&gt;, built on &lt;strong&gt;thatDot Streaming Graph&lt;&#x2F;strong&gt;, is the first anomaly detection system to use categorical data, making it uniquely powerful. thatDot Streaming Graph is powered by Quine open source software. You can try it yourself either by downloading Quine or starting a Streaming Graph free trial. Learn how to ingest your own categorical data and build a streaming graph that can detect all sorts of attacks in real time.&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Try &lt;strong&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;getting-started&#x2F;&quot;&gt;Streaming Graph&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; free for yourself.&lt;&#x2F;li&gt;
&lt;li&gt;Learn more about &lt;strong&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;products&#x2F;quine&#x2F;&quot;&gt;thatDot Streaming Graph&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;Join the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;that.re&#x2F;chat&quot;&gt;&lt;strong&gt;Quine Discord Community&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; and get help from thatDot engineers and community members.&lt;&#x2F;li&gt;
&lt;li&gt;Download open source Quine – &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;download&quot;&gt;&lt;strong&gt;JAR file&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; | &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;hub.docker.com&#x2F;r&#x2F;thatdot&#x2F;quine&quot;&gt;&lt;strong&gt;Docker Image&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; | &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;thatdot&#x2F;quine&quot;&gt;&lt;strong&gt;Github&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;h3 id=&quot;blog-posts-on-related-topics&quot;&gt;&lt;strong&gt;Blog Posts on Related Topics&lt;&#x2F;strong&gt;&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;stop-insider-threats-with-automated-behavioral-anomaly-detection&#x2F;&quot;&gt;&lt;strong&gt;Stop Insider Threats With Automated Behavioral Anomaly Detection&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;network-log-analysis-using-categorical-anomaly-detection&#x2F;&quot;&gt;&lt;strong&gt;Network Log Analysis Using Categorical Anomaly Detection&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;thatdot-anomaly-detector-enhancements-visualizations-and-data-transformations&#x2F;&quot;&gt;&lt;strong&gt;New to Quine’s Novelty Detector: Visualizations and Enhancements&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;This article, in a slightly altered form, &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.datanami.com&#x2F;2022&#x2F;07&#x2F;25&#x2F;what-is-categorical-data&#x2F;&quot;&gt;&lt;strong&gt;first appeared in Datanami&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; on July 25th, 2022. Photo by &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;unsplash.com&#x2F;@jjying?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText&quot;&gt;&lt;strong&gt;JJ Ying&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; on &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;unsplash.com&#x2F;s&#x2F;photos&#x2F;background?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText&quot;&gt;&lt;strong&gt;Unsplash&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Kafka data deduping made easy using Quine&#x27;s idFrom function</title>
        <published>2022-07-27T00:00:00+00:00</published>
        <updated>2022-07-27T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/kafka-data-deduping-made-easy-using-quines-idfrom-function/"/>
        <id>https://www.thatdot.com/blog/kafka-data-deduping-made-easy-using-quines-idfrom-function/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/kafka-data-deduping-made-easy-using-quines-idfrom-function/">&lt;h2 id=&quot;using-quine-with-kafka-as-source-and-sink-to-process-categorical-data&quot;&gt;Using Quine with Kafka as Source and Sink to Process Categorical Data&lt;&#x2F;h2&gt;
&lt;p&gt;Quine streaming graph is specifically designed to find high-value patterns in high-volume event streams, consuming data from APIs, data lakes, and most commonly, event stream processing systems. Quine is complementary to systems like Flink and ksqlDB, both of which are quite powerful but do not make it easy to connect and find complex patterns in &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;whats-the-difference-between-categorical-and-numerical-data&#x2F;&quot;&gt;categorical data&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;A streaming system like Kafka allows developers to divide their monolithic applications into manageable components while addressing resilience and scalability needs.&lt;&#x2F;p&gt;
&lt;p&gt;Switching to real-time event processing does not come without tradeoffs, however.  Duplicate messages are common in streaming systems, and duplicate events will inevitably show up in a Kafka stream, especially at scale.&lt;&#x2F;p&gt;
&lt;p&gt;Quine natively addresses duplicate and out-of-order data issues in streaming data pipelines.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-problem-message-duplication-causes-multiple-negative-effects&quot;&gt;The Problem: Message Duplication Causes Multiple Negative Effects&lt;&#x2F;h2&gt;
&lt;p&gt;In a high-volume data pipeline, duplicate messages are unavoidable. The duplication of events is often the necessary side effect of guaranteeing that data is successfully delivered. The traditional solution is for a consumer application to record what it&#x27;s seen recently and drop any event that has already been processed.&lt;&#x2F;p&gt;
&lt;p&gt;But event duplication can become a major challenge as your streaming system scales across multiple partitions. Stream consumers are usually distributed on different machines to help the system scale, making it difficult to quickly share knowledge of which events have already been processed. Each Kafka partition typically has its own consumer. If the consumer fails to process an event for any reason, then when it resumes it will request events starting from an earlier offset in Kafka. The result is that duplicate events will get sent downstream to other applications.&lt;&#x2F;p&gt;
&lt;p&gt;Processing events multiple times can cause inconsistencies within the &lt;em&gt;facts&lt;&#x2F;em&gt; that your application logic depends on. The effect is wrong analytic insights; or worse, your application performs the wrong actions.&lt;&#x2F;p&gt;
&lt;p&gt;Here are a few of the common approaches for managing duplicate events in a streaming system.&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Allow duplicate messages to occur&lt;&#x2F;strong&gt;. Maybe processing duplicate events is not a problem in your system. However, most of the time this is not the case.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Perform deduplication in a database&lt;&#x2F;strong&gt;. This approach starts off fine until your DB won’t scale. It is common for this to turn into a batch processing approach that defeats the reason that you decided to develop a streaming system in the first place.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Create a deduplication service&lt;&#x2F;strong&gt;. Call out from your streaming system to look up an event (or event ID) to see if it has already been processed. This is the natural evolution of option #2 which turns into its own expensive and painful service to manage.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Change your business logic or requirements to allow idempotent processing&lt;&#x2F;strong&gt;. If none of the previous options are appealing, you might try to alter your algorithms or your goals so that processing a duplicate message will have no effect. This is often impossible.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;h2 id=&quot;a-better-solution-locate-nodes-in-the-graph-with-idfrom&quot;&gt;A Better Solution: locate nodes in the graph with &lt;em&gt;&lt;strong&gt;&lt;code&gt;idFrom(…)&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;&lt;&#x2F;em&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Duplicate data delivery is one of the main problems Quine is built to solve. To understand how Quine solves this problem, let&#x27;s first understand two of Quine&#x27;s fundamental design concepts:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;In Quine, streaming event processing is performed by graph nodes backed by &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;doc.akka.io&#x2F;docs&#x2F;akka&#x2F;current&#x2F;general&#x2F;actors.html&quot;&gt;actors&lt;&#x2F;a&gt; scaled across any number of servers.&lt;&#x2F;li&gt;
&lt;li&gt;Quine behaves as if all nodes already exist.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;Each event that Quine processes operates on a specific set of nodes in the graph. With traditional static graphs, your application must ensure that each node is created exactly once—and this becomes a big performance drain. Quine behaves as if all nodes exist already, but are not yet filled with data or connected to any other nodes. You don’t have to worry about “creating nodes” twice because all possible nodes exist already. There will always be exactly one right place to handle each message, if only it can be found…&lt;&#x2F;p&gt;
&lt;p&gt;To find the node responsible for each message, Quine has a built-in function called &lt;strong&gt;&lt;code&gt;idFrom(…)&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;. &lt;code&gt;idFrom&lt;&#x2F;code&gt; takes data from the incoming event and turns it into a unique node ID in the graph. &lt;code&gt;idFrom&lt;&#x2F;code&gt; is entirely deterministic. &lt;strong&gt;Given the same arguments, &lt;code&gt;idFrom&lt;&#x2F;code&gt; will always return the same node ID&lt;&#x2F;strong&gt;. This is similar to a “consistent hashing” approach used for other purposes, but in this case, Quine returns a well-formed node ID instead of a hash.&lt;&#x2F;p&gt;
&lt;p&gt;Node IDs are user-configurable, so they can take many forms, but by default node IDs will be UUIDs. See the documentation on idProviders for more information on &lt;code&gt;idFrom&lt;&#x2F;code&gt; and alternate options for node ID types.&lt;&#x2F;p&gt;
&lt;p&gt;Once we know the ID of a node in the graph, that node will handle processing the event and deduplicating future events. So if the same event is received by Quine twice, &lt;code&gt;idFrom&lt;&#x2F;code&gt; will return the same &lt;code&gt;nodeId&lt;&#x2F;code&gt; each time. Since Quine only saves to disk the &lt;em&gt;changes&lt;&#x2F;em&gt; to each node, the duplicate event becomes a no-op. The practical effect of this is that using &lt;code&gt;idFrom&lt;&#x2F;code&gt; will resolve duplicate events in the stream automatically. So you can go back to building your application instead of micromanaging the event stream delivery guarantees.&lt;&#x2F;p&gt;
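&lt;p&gt;As a minimal sketch of that behavior, assume each event carries a unique &lt;code&gt;event_id&lt;&#x2F;code&gt; field (a hypothetical field name, as is the &quot;event&quot; prefix). If the ingest query below runs twice for the same event, both runs resolve to the same node and write the same properties, so the second run should leave the graph unchanged:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&#x2F;&#x2F; Hypothetical ingest query: event_id is an assumed field on the incoming event.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&#x2F;&#x2F; The same event always resolves to the same node, so a redelivery rewrites identical values.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (eventNode) WHERE id(eventNode) = idFrom(&amp;quot;event&amp;quot;, $that.event_id)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;SET eventNode = $that&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;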
&lt;p&gt;Using &lt;code&gt;idFrom&lt;&#x2F;code&gt; within ingest stream queries is standard practice, even when a node is expected to show up repeatedly in the successive events. Take, for example, the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;wikipedia-page-ingest&quot;&gt;Wikipedia page ingest&lt;&#x2F;a&gt; recipe. The ingest stream query refers to a &lt;strong&gt;&lt;code&gt;dbNode&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; for each database where a &lt;code&gt;page-create&lt;&#x2F;code&gt; event belongs.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;ingestStreams:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  - type: ServerSentEventsIngest&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    url: https:&#x2F;&#x2F;stream.wikimedia.org&#x2F;v2&#x2F;stream&#x2F;page-create&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    format:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      type: CypherJson&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      query: |-&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        MATCH (revNode) WHERE id(revNode) = idFrom(&amp;quot;revision&amp;quot;, $that.rev_id)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        MATCH (dbNode) WHERE id(dbNode) = idFrom(&amp;quot;db&amp;quot;, $that.database)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        MATCH (userNode) WHERE id(userNode) = idFrom(&amp;quot;id&amp;quot;, $that.performer.user_id)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        SET revNode = $that, revNode.type = &amp;quot;rev&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        SET dbNode.database = $that.database, dbNode.type = &amp;quot;db&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        SET userNode = $that.performer, userNode.type = &amp;quot;user&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        WITH *, datetime($that.rev_timestamp) AS d&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        CALL create.setLabels(revNode, [&amp;quot;rev:&amp;quot; + $that.page_title])&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        CALL create.setLabels(dbNode, [&amp;quot;db:&amp;quot; + $that.database])&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        CALL create.setLabels(userNode, [&amp;quot;user:&amp;quot; + $that.performer.user_text])&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        CALL reify.time(d, [&amp;quot;year&amp;quot;, &amp;quot;month&amp;quot;, &amp;quot;day&amp;quot;, &amp;quot;hour&amp;quot;, &amp;quot;minute&amp;quot;]) YIELD node AS timeNode&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        CREATE (revNode)-[:at]-&amp;gt;(timeNode)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        CREATE (revNode)-[:db]-&amp;gt;(dbNode)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        CREATE (revNode)-[:by]-&amp;gt;(userNode)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Let&#x27;s take a closer look at line two of the query. Notice that even when starting with an empty Quine system, we begin by &lt;em&gt;MATCHing&lt;&#x2F;em&gt; the &lt;code&gt;dbNode&lt;&#x2F;code&gt;. We don’t create it because it already exists. We MATCH it with a WHERE constraint on its ID using &lt;code&gt;idFrom&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (dbNode) WHERE id(dbNode) = idFrom(&amp;quot;db&amp;quot;, $that.database)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Using &lt;code&gt;idFrom&lt;&#x2F;code&gt;, Quine calculates the node ID using a combination of the string &quot;db&quot; and the value of the &lt;code&gt;database&lt;&#x2F;code&gt; field passed in from the event: &lt;code&gt;$that&lt;&#x2F;code&gt;. &lt;code&gt;idFrom&lt;&#x2F;code&gt; will always return the same node ID when given the same arguments.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;NOTE&lt;&#x2F;strong&gt;: It&#x27;s good practice to pass a descriptive name for the type of value as the first argument to &lt;code&gt;idFrom()&lt;&#x2F;code&gt;. The prefix effectively creates a namespace, which guards against accidental collisions between the IDs that get created. If another field coincidentally had the same value as &lt;code&gt;$that.database&lt;&#x2F;code&gt;, the prefix ensures that the same value from different types doesn’t accidentally refer to the same node.&lt;&#x2F;p&gt;
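&lt;p&gt;As a rough illustration of the two properties described above, here is a minimal Python sketch of a deterministic, namespaced ID function. It is not how Quine computes IDs: the helper name &lt;code&gt;id_from&lt;&#x2F;code&gt; and the choice of SHA-256 truncated to a UUID are assumptions made only to show that equal arguments always give the same ID, and that a prefix keeps equal values of different types apart.&lt;&#x2F;p&gt;

```python
import hashlib
import json
import uuid


def id_from(*args):
    """Hypothetical stand-in for idFrom: hash the arguments into a stable UUID."""
    # Serialize the arguments deterministically so equal inputs give equal bytes.
    payload = json.dumps(list(args), sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    # Use the first 16 bytes of the digest as a UUID-shaped node ID.
    return uuid.UUID(bytes=digest[:16])


# Same arguments, same ID, no matter how many times it is called.
same = id_from("db", "enwiki") == id_from("db", "enwiki")

# The prefix namespaces the value: a user who happens to be named "enwiki"
# does not land on the database node.
distinct = id_from("db", "enwiki") != id_from("user", "enwiki")
```

&lt;p&gt;Both &lt;code&gt;same&lt;&#x2F;code&gt; and &lt;code&gt;distinct&lt;&#x2F;code&gt; evaluate to &lt;code&gt;True&lt;&#x2F;code&gt; in this sketch.&lt;&#x2F;p&gt;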
&lt;p&gt;If we query the top five most connected database nodes, it reveals that &lt;code&gt;idFrom&lt;&#x2F;code&gt; deterministically calculated node IDs thousands of times over a short period while processing the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;stream.wikimedia.org&#x2F;?doc#&#x2F;streams&#x2F;get_v2_stream_mediawiki_page_create&quot;&gt;Wikipedia page-create&lt;&#x2F;a&gt; Kafka stream.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;❯ curl -s -X &amp;quot;POST&amp;quot; &amp;quot;http:&#x2F;&#x2F;0.0.0.0:8080&#x2F;api&#x2F;v1&#x2F;query&#x2F;cypher&amp;quot; \&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;     -H &amp;#39;Content-Type: text&#x2F;plain&amp;#39; \&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;     -d $&amp;#39;MATCH (n)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WHERE n.type = &amp;quot;db&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (n)-[r]-()&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;RETURN DISTINCT n.database, count(r)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;ORDER BY count(r) DESC&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;LIMIT 5&amp;#39; \&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;| jq .&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This produces the following results:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Database      Count&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;commonswiki   2953&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;wikidatawiki  1883&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;enwiki        790&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;ruwiki        144&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;enwiktionary  139&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Using &lt;code&gt;idFrom&lt;&#x2F;code&gt; to calculate the &lt;strong&gt;&lt;code&gt;nodeId&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; tells us exactly where in the graph a message should be handled, whether it’s the first or the thousandth time we’ve referred to that node. Each node applies an update only if its data actually needs to change, so when duplicate messages are routed to the same node, the second message behaves as a no-op and causes no troublesome side effects.&lt;&#x2F;p&gt;
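&lt;p&gt;The no-op behavior for duplicates can be pictured with a small Python sketch. This is an illustration only, not Quine code: the node is modeled as a plain dictionary and &lt;code&gt;apply_update&lt;&#x2F;code&gt; is a hypothetical helper that writes a property only when its value differs from what the node already holds.&lt;&#x2F;p&gt;

```python
def apply_update(node, properties):
    """Set properties on a node dict; report whether anything changed."""
    changed = False
    for key, value in properties.items():
        # Only write when the stored value differs from the incoming one.
        if node.get(key) != value:
            node[key] = value
            changed = True
    return changed


db_node = {}
event = {"database": "enwiki", "type": "db"}

first = apply_update(db_node, event)   # the node gains two properties
second = apply_update(db_node, event)  # duplicate event, nothing to do
```

&lt;p&gt;Because the second call finds every property already in place, replaying the same event leaves the node untouched.&lt;&#x2F;p&gt;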
&lt;p&gt;&lt;code&gt;idFrom&lt;&#x2F;code&gt; is a powerful tool that makes complex streaming data easier to reason about in a graph and is the foundation for developing with the Quine streaming graph.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;just-like-kafka-quine-is-open-source&quot;&gt;Just like Kafka, Quine is Open Source&lt;&#x2F;h2&gt;
&lt;p&gt;If you are using Kafka and have issues with duplicate data, Quine’s a great solution. Quine is open source so trying it out is as simple as downloading it and connecting it to Kafka.&lt;&#x2F;p&gt;
&lt;p&gt;Here’s a list of resources to get you started:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Download Quine - &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;download&quot;&gt;JAR file&lt;&#x2F;a&gt; | &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;hub.docker.com&#x2F;r&#x2F;thatdot&#x2F;quine&quot;&gt;Docker Image&lt;&#x2F;a&gt; | &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;thatdot&#x2F;quine&quot;&gt;GitHub&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Check out the Ingest Data into Quine blog series covering everything from ingest from Kafka to ingesting .CSV data&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;apache-log-analytics&quot;&gt;Apache Log Recipe&lt;&#x2F;a&gt; - this recipe provides more ingest pattern examples&lt;&#x2F;li&gt;
&lt;li&gt;Join &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;that.re&#x2F;chat&quot;&gt;Quine Community on Discord&lt;&#x2F;a&gt; and get help from thatDot engineers and community members.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Save Big on SIEM Storage Costs Using Quine&#x27;s Semantic ETL</title>
        <published>2022-07-21T00:00:00+00:00</published>
        <updated>2022-07-21T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/use-quine-graph-etl-to-reduce-siem-storage-costs/"/>
        <id>https://www.thatdot.com/blog/use-quine-graph-etl-to-reduce-siem-storage-costs/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/use-quine-graph-etl-to-reduce-siem-storage-costs/">&lt;h2 id=&quot;the-high-cost-of-storing-low-value-data&quot;&gt;The High Cost of Storing Low Value Data&lt;&#x2F;h2&gt;
&lt;p&gt;The high cost of SIEM has given rise to countless &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.google.com&#x2F;search?q=%22SIEM%22+reduce+cost&quot;&gt;articles and dozens of companies&lt;&#x2F;a&gt; promoting strategies or products to reduce monthly bills, with some claiming 50-90% reductions. While the 50-90% number seems a little overblown and is sure to be met with skepticism (enterprises tend to take a “better to store it and pay the price than regret we didn’t later” approach, especially when the data may have compliance implications&lt;sup&gt;1&lt;&#x2F;sup&gt;), the appeal is easy to understand.&lt;&#x2F;p&gt;
&lt;p&gt;I took a look at the current methods for reducing SIEM costs and compared them to what graph ETL using Quine can accomplish, all while considering the impact on data fidelity.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-state-of-stream-pre-processing-random-destructive-and-only-somewhat-effective&quot;&gt;The State of Stream Pre-Processing: Random, Destructive, and Only Somewhat Effective&lt;&#x2F;h2&gt;
&lt;p&gt;Legacy event log pre-processing offerings typically employ one or more of six basic strategies to reduce the amount of data stored in the SIEM:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Sample data&lt;&#x2F;li&gt;
&lt;li&gt;Filter out fields&lt;&#x2F;li&gt;
&lt;li&gt;Filter out events&lt;&#x2F;li&gt;
&lt;li&gt;De-duplicate&lt;&#x2F;li&gt;
&lt;li&gt;Aggregate&#x2F;roll-up&lt;&#x2F;li&gt;
&lt;li&gt;Re-route some data to cheaper alternatives for cold storage (e.g., Logstash or Amazon S3)&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;These solutions also usually include the ability to set rules that refine system behavior by data source or event type – for instance, sampling one in five events from a log of failed authentication attempts but one in twenty events from an Apache access log.&lt;&#x2F;p&gt;
&lt;p&gt;It is important to note that stream pre-processing can only be applied to each stream and each record individually. Since many modern event processing use cases — not just SIEM but those for machine learning and e-commerce — depend on combining multiple data sources to model complex events, the single-stream approach means storing duplicate data from each stream required to connect them later (in SQL terms, these data are the keys used to join the various data sets once they are stored).&lt;&#x2F;p&gt;
&lt;blockquote&gt;
&lt;p&gt;&quot;We were paying for 600 [GB] to 700 GB per day with Splunk, which meant we were lousy co-workers to our IT group, because we had to tell them, &#x27;Send us this field, not that field,&#x27; and limit the data ingestion severely,&quot; said John Gerber, principal cybersecurity analyst at Reston, Va., systems integrator SAIC. -- from &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.techtarget.com&#x2F;searchitoperations&#x2F;news&#x2F;252466110&#x2F;Elastic-SIEM-woos-enterprises-with-cost-savings&quot;&gt;Elastic SIEM woos enterprises with cost savings&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;p&gt;As the quote above makes clear, some approaches also require lots of operational intervention, meaning delays for analysts and data scientists and an overall increase in cost of ownership.&lt;&#x2F;p&gt;
&lt;p&gt;The more important limitation is that these approaches &lt;em&gt;&lt;strong&gt;cannot determine the value of the data they discard&lt;&#x2F;strong&gt;&lt;&#x2F;em&gt;. They either throw data away or, in the case of aggregation, &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;pubmed.ncbi.nlm.nih.gov&#x2F;25810242&#x2F;&quot;&gt;reduce fidelity&lt;&#x2F;a&gt;. All data is considered to have the same value.&lt;&#x2F;p&gt;
&lt;p&gt;Quine’s approach is different: it turns high volumes of low-value data into low volumes of high-value data.&lt;&#x2F;p&gt;
&lt;p&gt;Instead of storing data in Splunk or a similar system and then determining value, Quine can evaluate data as it arrives and make choices to store or discard based on the problem you are trying to solve.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;quine-ingest-queries-semantic-etl-for-high-value-data&quot;&gt;Quine Ingest Queries: Semantic ETL for High Value Data&lt;&#x2F;h2&gt;
&lt;p&gt;At the heart of how Quine processes data are two query types: ingest and standing queries (more on the latter below).&lt;sup&gt;2&lt;&#x2F;sup&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Quine uses ingest queries to consume event data and construct your streaming graph database. Ingest queries perform real-time ETL on incoming streams, combining multiple data sources (for example from multiple Kafka topics, Kinesis streams, data from databases, &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;ingesting-data-from-the-internet&#x2F;&quot;&gt;live feeds via APIs&lt;&#x2F;a&gt;) into a single streaming graph, eliminating the need to keep duplicate data around for joins.&lt;&#x2F;p&gt;
&lt;p&gt;Using Quine’s ingest ETL, you can join all the data, eliminating cross-data stream duplicates. That accounts for some incremental data reduction over existing methods, which along with the other five strategies (all of which Quine supports) means Quine offers superior savings on your SIEM costs. But more than just deduplicating data, joining streams lets you draw conclusions early about what makes some data more valuable than other data.&lt;&#x2F;p&gt;
&lt;p&gt;Quine’s real power, however, is its ability to apply a semantic filter to your data to find patterns made up of multiple events. And it does so as data streams in.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;save-only-the-patterns-that-matter&quot;&gt;Save Only the Patterns That Matter&lt;&#x2F;h2&gt;
&lt;p&gt;Ingest queries make it easy to organize the high value, often complex, patterns in data into graph structures. These patterns are characterized by the relationships between multiple events. In a practical sense, you are shaping the data into a form that anticipates the analysis you will perform downstream in your SIEM. Quine can join, interpret, and trim away any data not relevant to the answers.&lt;&#x2F;p&gt;
&lt;p&gt;What you end up creating in your graph-ETL are subgraphs, or patterns of two or more nodes and connecting edges.&lt;&#x2F;p&gt;
&lt;p&gt;Here are a few real world examples from the Quine community:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Find and store all instances where there have been attempts (both successful and failed) to log into the accounts of members of the executive team from multiple IP addresses&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;62d8b7a7d0b4827cdf58578c_bVJJ5cxDoxCu4Omi-Ol5Ot9ZBY1z91XO7pT56YjsHYPfBrxdTmU-6MMtnyPG4A1m7bKAPlmG0nXQ6dVc7kIeoffz3OXjvyTyZyVwefSDeub_8ivGvAyaP-EXxWRV257_ISViAJdIvl3jRrFT-dRsXZA.png&quot; alt=&quot;A graph with three node types -- IP address, Account, and Executive Team.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;A subgraph for monitoring authentication fraud attempts.&lt;&#x2F;p&gt;
&lt;ol start=&quot;2&quot;&gt;
&lt;li&gt;Find and store all instances where multiple processes in different office locations are sending messages to the same IP address&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;62d8b7a75e23fde55b28c6a3_mKTrcA113h27LHnSho38sbEJhzX3Y0lY26oJk0f3SRa6oOS9DbJyva3Q-BjgEHCJiKsNerTdgibtDiX5248tCxZfYelbV6l0_9Uu4h6fBPGeEaYp42H0OBLgF2w1r5O5GHp6hWNZKPiSHllF13Zojzc.png&quot; alt=&quot;A security-focused subgraph.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;A subgraph for monitoring processes and the IP addresses to which they write.&lt;&#x2F;p&gt;
&lt;p&gt;In both of these examples, the test for what you keep and what you discard is based on what might possibly be important, on what matters to your business.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-if-data-takes-time-to-become-interesting&quot;&gt;What if data takes time to become interesting?&lt;&#x2F;h2&gt;
&lt;p&gt;One challenge in processing streaming data – especially when event data arrives from many networked sources – is that it can arrive late or out of order, obscuring what would otherwise be an interesting pattern. Consider the examples above.&lt;&#x2F;p&gt;
&lt;p&gt;What if the login attempts in example one were spread out over days or even weeks?&lt;&#x2F;p&gt;
&lt;p&gt;What if log events from several locations in example two (above) were delayed for several hours or started at different times? Quine handles this late arriving data (as well as out of order data) using standing queries.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;62d8b7a7a13f0ca333a8b115_R9g-L0bLE2nguGQ3BRektSDq1d4L9Gtzao1fK3wuwgkX_iGkcgtGYlOR2u3p6DsWbrIrZbUPY6VtLULwj2BoIO2-gVUngIcrk-z-9H3u7a6QPIM7sqBRrkatR1YxA7WLR5CuvP3ZCo6JypuAWww23g.png&quot; alt=&quot;A five step diagram of a standing query in a graph.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Standing queries persist in the graph, storing partial matches and triggering actions when a full match occurs.&lt;&#x2F;p&gt;
&lt;p&gt;Standing queries live inside the graph and automatically propagate the incremental results computed from both historical data and incoming streaming data. Once matches are found, standing queries trigger actions using those results (e.g., execute code, transform other data in the graph, publish data to another system like Apache Kafka or Kinesis).&lt;&#x2F;p&gt;
&lt;p&gt;The implication for SIEM storage reduction is that Quine can temporarily retain &lt;em&gt;possibly interesting&lt;&#x2F;em&gt; incomplete patterns until a match occurs. In the meantime, that data is neither discarded nor taking up costly space in your SIEM. Then, at the instant the match occurs, it is sent along to the SIEM system for regular processing. If a match doesn’t occur within a useful period, the data can be discarded automatically.&lt;&#x2F;p&gt;
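&lt;p&gt;To make the retain-then-forward-or-expire idea concrete, here is a small Python sketch. It is a simplified model and not how Quine stores standing query state (Quine keeps partial matches on the graph nodes themselves). The class name &lt;code&gt;PartialMatchBuffer&lt;&#x2F;code&gt;, the event kinds, and the one-hour expiry are all hypothetical.&lt;&#x2F;p&gt;

```python
class PartialMatchBuffer:
    """Hold incomplete patterns until they complete or age out (illustrative only)."""

    def __init__(self, required, max_age):
        self.required = set(required)   # event kinds that make up a full match
        self.max_age = max_age          # seconds to keep an incomplete pattern
        self.pending = {}               # key: (first_seen, kinds seen so far)

    def observe(self, key, kind, now):
        """Record one event; return True when the pattern for key is complete."""
        first_seen, seen = self.pending.get(key, (now, set()))
        seen = seen | {kind}
        if self.required.issubset(seen):
            self.pending.pop(key, None)  # complete: this is what gets forwarded
            return True
        self.pending[key] = (first_seen, seen)
        return False

    def expire(self, now):
        """Drop incomplete patterns older than max_age; return the dropped keys."""
        stale = [k for k, (t, _) in self.pending.items() if now - t > self.max_age]
        for k in stale:
            del self.pending[k]
        return stale


buf = PartialMatchBuffer(required=["login_failed", "login_ok"], max_age=3600)
r1 = buf.observe("ceo", "login_failed", now=0)     # partial, held back
r2 = buf.observe("cfo", "login_failed", now=10)    # partial, held back
r3 = buf.observe("ceo", "login_ok", now=1800)      # completes the pattern
dropped = buf.expire(now=4000)                     # "cfo" never completed
```

&lt;p&gt;In this model only the completed &quot;ceo&quot; pattern would be forwarded to the SIEM, while the stale &quot;cfo&quot; fragment is dropped without ever being stored there.&lt;&#x2F;p&gt;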
&lt;p&gt;Want to go further? Consider bypassing your SIEM altogether and sending alerts and data directly to your SOC or NOC’s dashboards, analysts, or data science team as it arrives and matches occur. But that’s for the next blog post. Until then, try out Quine’s graph ETL on your own log data. It is open source and easy to get started with.&lt;&#x2F;p&gt;
&lt;p&gt;Who knows, it might just save you a few million dollars.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;help-getting-started&quot;&gt;Help Getting Started&lt;&#x2F;h2&gt;
&lt;p&gt;If you want to try it on your own logs, here are some resources to help:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Download Quine - &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;download&quot;&gt;JAR file&lt;&#x2F;a&gt; | &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;hub.docker.com&#x2F;r&#x2F;thatdot&#x2F;quine&quot;&gt;Docker Image&lt;&#x2F;a&gt; | &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;thatdot&#x2F;quine&quot;&gt;GitHub&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Check out the Ingest Data into Quine blog series covering everything from &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;components&#x2F;ingest-sources&#x2F;kafka.html&quot;&gt;ingest from Kafka&lt;&#x2F;a&gt; to ingesting .CSV data&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;apache-log-analytics&quot;&gt;Apache Log Recipe&lt;&#x2F;a&gt; - this recipe provides more ingest pattern examples&lt;&#x2F;li&gt;
&lt;li&gt;Join &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;that.re&#x2F;quine-slack&quot;&gt;Quine Community Slack&lt;&#x2F;a&gt; and get help from thatDot engineers and community members.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;hr &#x2F;&gt;
&lt;p&gt;1 Set aside that there are better, cheaper alternatives for this specific use case (using an expensive SIEM provider for this is sort of like renting a penthouse apartment for all your junk instead of a storage locker) and the fact remains: companies aren’t going to get rid of a certain amount of their data, no matter what.&lt;&#x2F;p&gt;
&lt;p&gt;2 If you are interested in a deeper technical understanding of Quine&#x27;s architecture, try our white paper.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Drive Streaming Event Workflows with Standing Queries</title>
        <published>2022-07-06T00:00:00+00:00</published>
        <updated>2022-07-06T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/drive-streaming-event-workflows-with-standing-queries/"/>
        <id>https://www.thatdot.com/blog/drive-streaming-event-workflows-with-standing-queries/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/drive-streaming-event-workflows-with-standing-queries/">&lt;h2 id=&quot;standing-queries-turning-event-driven-data-into-data-driven-events&quot;&gt;Standing Queries: Turning Event-Driven Data into Data-Driven Events&lt;&#x2F;h2&gt;
&lt;p&gt;Quine&#x27;s superpower is the ability to store business logic within the graph as a query and execute it there. That query can then operate directly on data as it streams in. We call this type of query a &lt;em&gt;standing query&lt;&#x2F;em&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;A standing query incrementally matches some graph structure while new data is ingested into the graph. Quine’s special design makes this process extremely fast and efficient. When a full pattern match is found, a standing query takes action.&lt;&#x2F;p&gt;
&lt;p&gt;A standing query is defined in two parts: a &lt;strong&gt;pattern&lt;&#x2F;strong&gt; and an &lt;strong&gt;output&lt;&#x2F;strong&gt;. The &lt;strong&gt;pattern&lt;&#x2F;strong&gt; defines what we want to match, expressed in Cypher using the form &lt;strong&gt;&lt;code&gt;MATCH …&lt;&#x2F;code&gt; &lt;code&gt;WHERE …&lt;&#x2F;code&gt; &lt;code&gt;RETURN …&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;. The &lt;strong&gt;output&lt;&#x2F;strong&gt; defines the action(s) to take for each result produced by the &lt;code&gt;RETURN&lt;&#x2F;code&gt;  in the pattern query.&lt;&#x2F;p&gt;
&lt;p&gt;The result of a standing query output is passed to a series of actions which process the &lt;strong&gt;output&lt;&#x2F;strong&gt;. This output can be logged, passed to other systems (via Kafka, Kinesis, HTTP POST, and more), or can even be used to perform additional actions like running new queries or even rewriting parts of the graph. Whatever logic your application needs.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;how-nodes-match-patterns&quot;&gt;How nodes match patterns&lt;&#x2F;h2&gt;
&lt;p&gt;Each node in Quine is backed by an actor, which makes each graph node act like its own little CPU. Actors function as lightweight, single-threaded logical computation units that maintain state and communicate with each other by passing messages.&lt;&#x2F;p&gt;
&lt;p&gt;The actor model enables you to execute a standing query that is stored in the graph and remembered automatically. When you issue a &lt;strong&gt;&lt;code&gt;DistinctId&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; standing query, the query is broken into individual steps that can be tested one at a time on individual nodes. Quine stores the result of each successive decomposition of a query (smaller and smaller queries) internally on the node issuing that portion of the query. The previous node&#x27;s query is essentially a subscription to the next node&#x27;s status as either matching the query or not.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;62be5742c71a16da652f0888_Quine%20Streaming%20Graph%20Actor%20Model.png&quot; alt=&quot;The Quine Streaming Graph asynchronous actor model showing actors (units of compute) associated with each node in graph.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;An actor associated with each node performs incremental computation.&lt;&#x2F;p&gt;
&lt;p&gt;Any changes in the next node’s pattern match state result in a notification to the querying node. In this way, a complex query is relayed through the graph, where each node subscribes to whether or not the next node fulfills its part of the query. When a complete match is made, or unmade, the chain is notified with results and an output action is triggered.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Info&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;There are two pattern match modes: &lt;code&gt;DistinctId&lt;&#x2F;code&gt; and &lt;code&gt;MultipleValues&lt;&#x2F;code&gt;.&lt;br &#x2F;&gt;
The pattern query must take the form &lt;strong&gt;&lt;code&gt;MATCH … WHERE … RETURN …&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;.&lt;br &#x2F;&gt;
When the mode is &lt;code&gt;DistinctId&lt;&#x2F;code&gt;, the pattern query &lt;code&gt;RETURN&lt;&#x2F;code&gt; must also be &lt;code&gt;DISTINCT&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;creating-a-standing-query&quot;&gt;Creating a standing query&lt;&#x2F;h2&gt;
&lt;p&gt;The first step to making a Standing Query is determining the graph pattern you want to watch for. You may have deployed Quine in your data pipeline to perform a series of tasks to isolate data, implement a specific feature, or monitor the stream to find a specific pattern in real time. In any case, Quine will implement your logic using Cypher.&lt;&#x2F;p&gt;
&lt;p&gt;Let&#x27;s demonstrate this concept using Quine&#x27;s built-in synthetic data generator, which was introduced in v1.3.0. Say that you need to establish the relationship between each number on a number line and the number you get when you divide it by 10 using &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;mathworld.wolfram.com&#x2F;IntegerDivision.html&quot;&gt;integer division&lt;&#x2F;a&gt; (where dividing always returns a whole number; the remainder is discarded).&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;ingestStreams:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  - format:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      query: |-&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        WITH gen.node.from(toInteger($that)) AS n,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;             toInteger($that) AS i&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        MATCH (thisNode), (nextNode), (divNode) &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        WHERE id(thisNode) = id(n) &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          AND id(nextNode) = idFrom(i + 1) &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          AND id(divNode) = idFrom(i &#x2F; 10) &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        SET thisNode.i = i,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            thisNode.prop = gen.string.from(i)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        CREATE (thisNode)-[:next]-&amp;gt;(nextNode), &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;               (thisNode)-[:div_by_ten]-&amp;gt;(divNode)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      type: CypherLine&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    type: NumberIteratorIngest&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    ingestLimit: 100000&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This ingest creates a graph with 100000 nodes and a shape that we can use for our example.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;62c5996d1c64c929f3834561_Screen%20Shot%202022-06-29%20at%2010.48.42%20AM.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Numbers divisible by 10 using integer division&lt;&#x2F;p&gt;
&lt;p&gt;In the example above, I want to count the unique times that a pattern like the one visualized above occurs in a sample of 100000 numbers. A key to our pattern is the existence of the &lt;code&gt;prop&lt;&#x2F;code&gt; property on a node, which the ingest query sets using the &lt;code&gt;gen.string.from()&lt;&#x2F;code&gt; function. The complete &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;thatdot&#x2F;quine&#x2F;blob&#x2F;main&#x2F;quine&#x2F;recipes&#x2F;sq-test.yaml&quot;&gt;recipe&lt;&#x2F;a&gt; is in the Quine repo if you want to follow along.&lt;&#x2F;p&gt;
&lt;p&gt;To detect a pattern in our data, we can write a Cypher query in the &lt;code&gt;pattern&lt;&#x2F;code&gt; section:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;standingQueries:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  - pattern:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      query: |-&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        MATCH (a)-[:div_by_ten]-&amp;gt;(b)-[:div_by_ten]-&amp;gt;(c)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        WHERE exists(c.prop)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        RETURN DISTINCT id(c) as id&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      type: Cypher&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    outputs:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      count-1000-results:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        type: Drop&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The pattern looks for a number which is the ten-divisor of another number which is also the ten-divisor of a number in the graph. That basically means it&#x27;s looking for one of the first 1000 nodes created by our &quot;number iterator&quot; ingest.&lt;&#x2F;p&gt;
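&lt;p&gt;The figure of 1000 can be checked with plain arithmetic, independent of Quine. The Python sketch below assumes the number iterator emits 0, 1, 2, and so on up to the ingest limit; it follows two integer divisions by ten from every ingested number and collects the distinct end points that were themselves ingested (and so carry &lt;code&gt;prop&lt;&#x2F;code&gt;).&lt;&#x2F;p&gt;

```python
# Assumption: the number iterator emits 0, 1, 2, ... up to the ingest limit.
ingested = range(100000)

# Follow two div_by_ten edges from every ingested number a: a to b to c.
# c must itself have been ingested for exists(c.prop) to hold.
matches = {(a // 10) // 10 for a in ingested if (a // 10) // 10 in ingested}

count = len(matches)
largest = max(matches)
```

&lt;p&gt;Under that assumption, &lt;code&gt;count&lt;&#x2F;code&gt; is 1000 and &lt;code&gt;largest&lt;&#x2F;code&gt; is 999, which is why the distinct matches are exactly the first 1000 numbers.&lt;&#x2F;p&gt;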
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;❯ java -jar quine -r sq-test.yaml&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Graph is ready&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Running Recipe Standing Query Test Recipe&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Using 1 node appearances&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Using 11 quick queries &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Running Standing Query STANDING-1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Running Ingest Stream INGEST-1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Quine web server available at http:&#x2F;&#x2F;0.0.0.0:8080&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;INGEST-1 status is completed and ingested 100000&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt; | =&amp;gt; STANDING-1 count 1000&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This example simply counts how many matches are detected, using the standing query &lt;code&gt;output&lt;&#x2F;code&gt; variant &lt;code&gt;type: Drop&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
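&lt;p&gt;As a sanity check on the count of 1000, the arithmetic behind the pattern can be reproduced outside Quine. This is a minimal sketch, assuming the ingest produced the integers 0 through 99999 and that each &lt;code&gt;div_by_ten&lt;&#x2F;code&gt; edge points from a number to its integer quotient by ten. The function name is hypothetical and nothing here calls Quine itself.&lt;&#x2F;p&gt;

```python
def count_distinct_roots(sample_size: int) -> int:
    """Count distinct c reachable as a -> a // 10 -> (a // 10) // 10."""
    ingested = set(range(sample_size))  # assumption: numbers 0..sample_size-1
    roots = set()
    for a in ingested:
        b = a // 10  # first div_by_ten hop
        c = b // 10  # second div_by_ten hop
        # The standing query requires c.prop to exist, i.e. c was itself ingested.
        if c in ingested:
            roots.add(c)
    return len(roots)


print(count_distinct_roots(100000))
```

&lt;p&gt;Under those assumptions, a sample of 100000 numbers yields 1000 distinct values for &lt;code&gt;c&lt;&#x2F;code&gt;, which agrees with the count reported by &lt;code&gt;STANDING-1&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;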
&lt;h2 id=&quot;standing-query-result-output-driving-workflows&quot;&gt;Standing query result output: driving workflows&lt;&#x2F;h2&gt;
&lt;p&gt;Say that instead of just counting the number of times that the pattern matches, we need to output the match for debugging or inspection. We can replace the &lt;code&gt;Drop&lt;&#x2F;code&gt; output with a &lt;code&gt;CypherQuery&lt;&#x2F;code&gt; that uses the matched result and then prints information to the console. When issuing a &lt;code&gt;DistinctId&lt;&#x2F;code&gt; standing query, the result of a match is a payload that looks like:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;{&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;meta&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;isPositiveMatch&amp;quot;: true,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;resultId&amp;quot;: &amp;quot;2a757517-1225-7fe2-0d0e-22625ad3be37&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    },&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;data&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;a.id&amp;quot;: 45110,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;a.prop&amp;quot;: &amp;quot;YH32SISr&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;b.id&amp;quot;: 4511,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;b.prop&amp;quot;: &amp;quot;fqx8aVAU&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;c.id&amp;quot;: 451,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;c.prop&amp;quot;: &amp;quot;61mTZqH8&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This payload includes the ID of the node that initially matched in the &lt;strong&gt;&lt;code&gt;data&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; field, so we can write a new Cypher query to fetch additional information triggered by this match:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (a)-[:div_by_ten]-&amp;gt;(b)-[:div_by_ten]-&amp;gt;(c)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WHERE id(c) = $that.data.id&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;RETURN a.id, a.prop, b.id, b.prop, c.id, c.prop&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The &lt;code&gt;MATCH&lt;&#x2F;code&gt; portion looks similar to our standing query, but this time we&#x27;re not monitoring the graph, we&#x27;re fetching data from the three-node pattern rooted at &lt;code&gt;(c)&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;Replacing the &lt;code&gt;count-1000-results&lt;&#x2F;code&gt; output with &lt;code&gt;inspect-results&lt;&#x2F;code&gt; from below would accomplish just that.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;inspect-results:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  type: CypherQuery&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  query: |-&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    MATCH (a)-[:div_by_ten]-&amp;gt;(b)-[:div_by_ten]-&amp;gt;(c)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    WHERE id(c) = $that.data.id&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    RETURN a.id, a.prop, b.id, b.prop, c.id, c.prop&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  andThen:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    type: PrintToStandardOut&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The outputs stage of a standing query is where you can express your business logic and put Quine to work for you in your data pipeline. Take some time to review all of the possible output types in our &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;reference&#x2F;rest-api.html#&#x2F;schemas&#x2F;StandingQueryResultOutput&quot;&gt;API documentation&lt;&#x2F;a&gt; located on the quine.io website.&lt;&#x2F;p&gt;
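&lt;p&gt;Business logic downstream of an output usually starts by unpacking the result payload. The following is a minimal sketch, assuming a consumer that receives JSON text in the shape of the sample payload shown earlier in this post. The &lt;code&gt;summarize_match&lt;&#x2F;code&gt; helper is hypothetical and not part of Quine.&lt;&#x2F;p&gt;

```python
import json

# Sample payload copied from the shape shown earlier in this post.
SAMPLE = """
{
    "meta": {
        "isPositiveMatch": true,
        "resultId": "2a757517-1225-7fe2-0d0e-22625ad3be37"
    },
    "data": {
        "a.id": 45110,
        "a.prop": "YH32SISr",
        "b.id": 4511,
        "b.prop": "fqx8aVAU",
        "c.id": 451,
        "c.prop": "61mTZqH8"
    }
}
"""


def summarize_match(payload_text: str):
    """Return (result_id, chain of ids) for a positive match, else None."""
    payload = json.loads(payload_text)
    meta = payload["meta"]
    if not meta["isPositiveMatch"]:
        return None  # skip anything that is not a positive match
    data = payload["data"]
    chain = [data["a.id"], data["b.id"], data["c.id"]]
    return meta["resultId"], chain


print(summarize_match(SAMPLE))
```

&lt;p&gt;Checking &lt;code&gt;isPositiveMatch&lt;&#x2F;code&gt; before acting keeps the handler from treating every payload as a new match.&lt;&#x2F;p&gt;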
&lt;h2 id=&quot;modifying-standing-queries&quot;&gt;Modifying standing queries&lt;&#x2F;h2&gt;
&lt;h3 id=&quot;modify-a-standing-query-output&quot;&gt;Modify a Standing Query Output&lt;&#x2F;h3&gt;
&lt;p&gt;Another time that you need to notify Quine of changes in your standing queries is when you modify the &lt;code&gt;outputs&lt;&#x2F;code&gt; section of an existing standing query. The Quine API has two methods for the &lt;code&gt;&#x2F;api&#x2F;v1&#x2F;query&#x2F;standing&#x2F;{standing-query-name}&#x2F;output&#x2F;{standing-query-output-name}&lt;&#x2F;code&gt; endpoint that allow you to &lt;code&gt;DELETE&lt;&#x2F;code&gt; and &lt;code&gt;POST&lt;&#x2F;code&gt; a new output to an existing standing query.&lt;&#x2F;p&gt;
&lt;p&gt;From above, let&#x27;s change the original standing query output type from &lt;code&gt;Drop&lt;&#x2F;code&gt; to a new &lt;code&gt;CypherQuery&lt;&#x2F;code&gt; that outputs the matches to the console. We will use two API calls to accomplish the change.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Delete the existing output:&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;curl --request DELETE \&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  --url http:&#x2F;&#x2F;0.0.0.0:8080&#x2F;api&#x2F;v1&#x2F;query&#x2F;standing&#x2F;STANDING-1&#x2F;output&#x2F;count-1000-results \&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  --header &amp;#39;Content-Type: application&#x2F;json&amp;#39;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;strong&gt;Create the new output:&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;curl --request POST \&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  --url http:&#x2F;&#x2F;0.0.0.0:8080&#x2F;api&#x2F;v1&#x2F;query&#x2F;standing&#x2F;STANDING-1&#x2F;output&#x2F;inspect-results \&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  --header &amp;#39;Content-Type: application&#x2F;json&amp;#39; \&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  --data &amp;#39;{&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;type&amp;quot;: &amp;quot;CypherQuery&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;query&amp;quot;: &amp;quot;MATCH (a)-[:div_by_ten]-&amp;gt;(b)-[:div_by_ten]-&amp;gt;(c) WHERE id(c) = $that.data.id RETURN a.id, a.prop, b.id, b.prop, c.id, c.prop&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;andThen&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;type&amp;quot;: &amp;quot;PrintToStandardOut&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;}&amp;#39;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;h3 id=&quot;propagate-a-new-standing-query&quot;&gt;Propagate a New Standing Query&lt;&#x2F;h3&gt;
&lt;p&gt;When a new standing query is registered in the system, it is automatically registered only on new nodes (or old nodes that are loaded back into the cache). This behavior is the default because proactively setting the standing query on all existing data might be quite costly, depending on how much historical data there is. So Quine defaults to the most efficient option.&lt;&#x2F;p&gt;
&lt;p&gt;However, sometimes there is a need to actively propagate standing queries across all previously ingested data as well. You can use the API to request that Quine propagate a new standing query to all nodes in the existing graph. Here&#x27;s how the request looks in &lt;code&gt;curl&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;curl --request POST \&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  --url &amp;#39;http:&#x2F;&#x2F;0.0.0.0:8080&#x2F;api&#x2F;v1&#x2F;query&#x2F;standing&#x2F;control&#x2F;propagate?include-sleeping=true&amp;#39; \&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  --header &amp;#39;Content-Type: application&#x2F;json&amp;#39;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Review the in-product API documentation via the Quine web interface for additional code snippets.&lt;&#x2F;p&gt;
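&lt;p&gt;The API calls in this section can also be scripted. The following sketch uses only the Python standard library to build the DELETE, POST, and propagate requests. It assumes the host, port, standing query name, and output names from the curl examples above, and the helper names are hypothetical. The requests are constructed but not sent; passing each one to &lt;code&gt;urllib.request.urlopen&lt;&#x2F;code&gt; against a running Quine instance would issue it.&lt;&#x2F;p&gt;

```python
import json
import urllib.request

BASE = "http://0.0.0.0:8080/api/v1/query/standing"  # host and port from the curl examples
JSON_HEADERS = {"Content-Type": "application/json"}


def delete_output(sq_name: str, output_name: str) -> urllib.request.Request:
    """Build the DELETE request that removes an existing output."""
    url = f"{BASE}/{sq_name}/output/{output_name}"
    return urllib.request.Request(url, headers=JSON_HEADERS, method="DELETE")


def create_output(sq_name: str, output_name: str, definition: dict) -> urllib.request.Request:
    """Build the POST request that attaches a new output."""
    url = f"{BASE}/{sq_name}/output/{output_name}"
    body = json.dumps(definition).encode("utf-8")  # json.dumps handles the quoting
    return urllib.request.Request(url, data=body, headers=JSON_HEADERS, method="POST")


def propagate(include_sleeping: bool = True) -> urllib.request.Request:
    """Build the POST request that propagates standing queries to existing nodes."""
    flag = "true" if include_sleeping else "false"
    url = f"{BASE}/control/propagate?include-sleeping={flag}"
    return urllib.request.Request(url, headers=JSON_HEADERS, method="POST")


inspect_results = {
    "type": "CypherQuery",
    "query": (
        "MATCH (a)-[:div_by_ten]->(b)-[:div_by_ten]->(c) "
        "WHERE id(c) = $that.data.id "
        "RETURN a.id, a.prop, b.id, b.prop, c.id, c.prop"
    ),
    "andThen": {"type": "PrintToStandardOut"},
}

requests_to_send = [
    delete_output("STANDING-1", "count-1000-results"),
    create_output("STANDING-1", "inspect-results", inspect_results),
    propagate(),
]
for req in requests_to_send:
    print(req.get_method(), req.full_url)
```

&lt;p&gt;Building the JSON body with &lt;code&gt;json.dumps&lt;&#x2F;code&gt; avoids hand-escaping the Cypher query inside a shell string.&lt;&#x2F;p&gt;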
&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;&#x2F;h2&gt;
&lt;p&gt;In this blog post, we looked at the different types of standing queries that you can create in Quine. A standing query is a powerful tool for data processing because it allows you to express your business logic as part of your data pipeline. We also looked at how you can modify an existing standing query output type and propagate a new standing query across the graph.&lt;&#x2F;p&gt;
&lt;p&gt;Quine is open source if you want to explore standing queries for yourself using your own data. Download a precompiled version or build it yourself from the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;thatdot&#x2F;quine&quot;&gt;Quine GitHub&lt;&#x2F;a&gt; codebase.&lt;&#x2F;p&gt;
&lt;p&gt;Have a question, suggestion, or improvement? I welcome your feedback! Please drop into &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine-io.slack.com&#x2F;&quot;&gt;Quine Slack&lt;&#x2F;a&gt; and let me know. I&#x27;m always happy to discuss Quine or answer questions.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Quine Streaming Graph 1.3.0: Focus on Usability, Query Performance</title>
        <published>2022-07-06T00:00:00+00:00</published>
        <updated>2022-07-06T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/news/quine-streaming-graph-1-3-0-focus-on-usability-query-performance/"/>
        <id>https://www.thatdot.com/news/quine-streaming-graph-1-3-0-focus-on-usability-query-performance/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/news/quine-streaming-graph-1-3-0-focus-on-usability-query-performance/">&lt;h2 id=&quot;performant-pagination-at-scale-improved-querying-and-user-docs-advanced-recipes&quot;&gt;Performant Pagination at Scale, Improved Querying and User Docs, Advanced Recipes&lt;&#x2F;h2&gt;
&lt;p&gt;It is hard to believe we released Quine 1.2.0 only six weeks ago, especially when I look at the work that has gone into not just Quine but also documentation, how-to blogs and example recipes. Indeed, 1.3.0 cements a pattern of releases made up of a few features needed to achieve performance at scale and loads of smaller usability improvements that has emerged since we released Quine as an open source project in February.&lt;&#x2F;p&gt;
&lt;p&gt;Additions to Quine include vastly improved pagination performance inside our Cypher compiler, overhauled API documentation, journals enabled by default when running recipes, improved Cypher query support, and a number of small but consequential changes to the system’s logging behavior.&lt;&#x2F;p&gt;
&lt;p&gt;In addition, we’ve migrated the documentation to its own site to make it easier to make community contributions and keep docs in sync with releases, added three new recipes and made substantial updates to one of the favorites.&lt;&#x2F;p&gt;
&lt;p&gt;The common theme throughout: usability and performance.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;pagination-in-quine-streaming-graph&quot;&gt;Pagination in Quine Streaming Graph&lt;&#x2F;h2&gt;
&lt;p&gt;As part of our work to make all aspects of the system perform predictably and well at throughput rates of hundreds of thousands or even millions of events per second, we have undertaken some plumbing upgrades.&lt;&#x2F;p&gt;
&lt;p&gt;To give you an idea of the engineering involved, check out &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;ignaciochiazzo.medium.com&#x2F;paginating-requests-in-apis-d4883d4c1c4c&quot;&gt;this blog post&lt;&#x2F;a&gt; about the three most common pagination approaches (or I can save you time and tell you it explains page, point, and keySet-based pagination). We combined aspects of all three in our approach.&lt;&#x2F;p&gt;
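&lt;p&gt;To make the difference between these approaches concrete, here is a generic sketch contrasting offset (page-based) pagination with keyset pagination over a sorted list of keys. It illustrates the general techniques only: the function names are hypothetical, and this is not a description of how the SKIP&#x2F;LIMIT optimizer in Quine is implemented.&lt;&#x2F;p&gt;

```python
from bisect import bisect_right


def page_by_offset(rows, skip, limit):
    """Offset pagination: walk past `skip` rows, then take `limit` rows."""
    return rows[skip:skip + limit]


def page_by_keyset(sorted_rows, last_key, limit):
    """Keyset pagination: resume right after the last key already returned."""
    start = 0 if last_key is None else bisect_right(sorted_rows, last_key)
    return sorted_rows[start:start + limit]


rows = list(range(0, 100, 5))  # sorted, unique keys: 0, 5, ..., 95

first = page_by_keyset(rows, None, 3)
second = page_by_keyset(rows, first[-1], 3)
print(first, second, page_by_offset(rows, 3, 3))
```

&lt;p&gt;Offset pagination has to step past every earlier row on each request, while keyset pagination resumes from the last key seen, which is why the latter tends to stay fast on deep pages.&lt;&#x2F;p&gt;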
&lt;p&gt;The other notable work focused on usability and community enablement.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;quine-and-api-documentation-plus-improved-usability&quot;&gt;Quine and API documentation plus Improved Usability&lt;&#x2F;h2&gt;
&lt;p&gt;We switched to the Stoplight Elements framework to make &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;reference&#x2F;rest-api.html#&#x2F;&quot;&gt;API documentation&lt;&#x2F;a&gt; easier to access and migrated from quine.io&#x2F;docs to docs.quine.io. Not huge changes in themselves, but together they ensure docs are more accessible to the community to modify and never lag releases.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;ingest-streams-from-kafka-and-other-sources&quot;&gt;Ingest Streams from Kafka and other Sources&lt;&#x2F;h3&gt;
&lt;p&gt;We also completed five blog posts on ingesting streams, ranging from simple CSV files to internet feeds to Kafka integration.&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;real-time-graph-analytics-for-kafka-streams-with-quine&#x2F;&quot;&gt;Real-time Graph Analytics for Kafka Streams with Quine&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;building-a-quine-streaming-graph-ingest-streams&#x2F;&quot;&gt;Building a Quine Streaming Graph: Ingest Streams&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;ingesting-data-from-the-internet&#x2F;&quot;&gt;Ingesting data from the Internet into Quine Streaming Graph&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;ingesting-from-multiple-data-sources-into-quine-streaming-graph&#x2F;&quot;&gt;Ingesting From Multiple Data Sources into Quine Streaming Graph&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;ingest-and-analyze-log-files-using-streaming-graph&#x2F;&quot;&gt;Ingest and Analyze Log Files Using Streaming Graph&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;live-event-stream-log-and-network-observability-recipes&quot;&gt;Live event stream, log, and network observability recipes&lt;&#x2F;h3&gt;
&lt;p&gt;And of course when we wrote an explainer on ingesting and processing log files, we couldn’t resist a recipe that uses Quine logs as the source. We all know that consuming, parsing, and visualizing Java log output is a huge challenge, one that lacks a widely available solution. We think Quine might be an answer. Use the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;quine-logs-recipe&quot;&gt;Quine Log Recipe&lt;&#x2F;a&gt; as a baseline, then modify the regular expression inside the ingest stream Cypher query to fit your logs.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;62c5ad36496b6524b5103f3b_LqdCv5EUtu9dGUDiZ3UF1HSWHKO0UvKV4bdlrtK1pft9CHmL0sA5aLuGciocE4hY1WMLZMsTKsswYZdfZmbiAwhkjbtNiRxtl0gyba6ckOBA7BYgoYZ0wr8yejBz1YRc4Mrr7pmkLNHEMWJK2V0.png&quot; alt=&quot;An ouroboros, or snake holding onto its own tail.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Quine processing its own logs.&lt;&#x2F;p&gt;
&lt;p&gt;In addition to the Quine Java log ingest recipe, we’ve created a recipe showing how to ingest and build a streaming graph from &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;imdb-movie-data&quot;&gt;a feed of IMDB&lt;&#x2F;a&gt; movie data. (For anyone really interested in log processing, there’s also an &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;apache-log-analytics&quot;&gt;Apache web logs analytics&lt;&#x2F;a&gt; recipe.) Rounding out the trio of new recipes is a fun one: Ethan’s &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;pi&quot;&gt;Pi Day recipe&lt;&#x2F;a&gt;, which uses Quine to calculate Pi with Leibniz’s formula.&lt;&#x2F;p&gt;
&lt;p&gt;On the topic of observability and root cause analysis, the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;cdn-cache-efficiency-by-segment&quot;&gt;CDN Cache Efficiency&lt;&#x2F;a&gt; recipe got a major update:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Moved shaping the graph from standing queries into the ingest stream.&lt;&#x2F;li&gt;
&lt;li&gt;Updated code to reflect Cypher best practices.&lt;&#x2F;li&gt;
&lt;li&gt;Added quick queries to perform efficiency calculations.&lt;&#x2F;li&gt;
&lt;li&gt;Optimized the manifestation of nodes.&lt;&#x2F;li&gt;
&lt;li&gt;Added client device nodes.&lt;&#x2F;li&gt;
&lt;li&gt;Increased the data sample size.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;quine-synthetic-data-generator&quot;&gt;Quine Synthetic Data Generator&lt;&#x2F;h3&gt;
&lt;p&gt;With Quine v1.3.0 we also introduced a powerful series of built-in synthetic data Cypher functions. The synthetic data functions can be used within ingest streams to create booleans, bytes, floats, integers, strings, or nodes. This allows you to generate streaming synthetic data that can be used for testing or development purposes.&lt;&#x2F;p&gt;
&lt;p&gt;Search for &lt;code&gt;gen.&lt;&#x2F;code&gt; to check out how to use the functions on the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;reference&#x2F;cypher&#x2F;cypher-functions.html#:~:text=gen.&quot;&gt;Cypher Functions&lt;&#x2F;a&gt; page of &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;&quot;&gt;docs.quine.io&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;next-up&quot;&gt;Next Up&lt;&#x2F;h2&gt;
&lt;p&gt;Quine is open source if you want to explore standing queries for yourself using your own data. Download &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;download&quot;&gt;a precompiled version&lt;&#x2F;a&gt; or build it yourself from the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;thatdot&#x2F;quine&quot;&gt;Quine GitHub&lt;&#x2F;a&gt; codebase.&lt;&#x2F;p&gt;
&lt;p&gt;Have a question, suggestion, or improvement? I welcome your feedback! Please drop into &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine-io.slack.com&#x2F;&quot;&gt;Quine Slack&lt;&#x2F;a&gt; and let me know. I&#x27;m always happy to discuss Quine or answer questions.&lt;&#x2F;p&gt;
&lt;p&gt;Release Notes:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Release Quine 1.3.0&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Features:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;- Added a pagination (SKIP&#x2F;LIMIT) optimizer to the Cypher query engine for &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  historical queries with no unaliased values (#1822)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;- Enabled journals by default when running a recipe (#1814)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;- Added support for using the Stoplight Elements interactive documentation &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  behind an authentication proxy (#1781)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Bugfixes:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;- Fixed an issue where waking up a node would not correctly re-register its&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  standing queries, potentially resulting in dropped results (#1830)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;- Fixed an issue where Cypher subqueries could be executed with too many &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  variables in scope (#1821)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;- Fixed an issue where some Cypher constructs (notably: variable-length &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  relationship patterns) could be executed with too many variables in scope (#1821)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;- Fixed a documentation rendering issue for Standing Query Outputs (#1815)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;- Renamed the metric &amp;quot;persistors.snapshot-sizes&amp;quot; to &amp;quot;persistor.snapshot-sizes&amp;quot; &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  for consistency (#1788)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;- Fixed the behavior of DISTINCT during Cypher query execution, making &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  it work correctly with SKIP and&#x2F;or LIMIT (#1777)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Misc:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;- Simplified startup log messages (#1831)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;- Update some error messages to use the correct name for DistinctId &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  Standing Queries (#1796)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;- Improved UX for API-issued historical queries near the present &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  time (#1786, #1789)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;- Removed logback-config logging library: to configure logging, use standard &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  logback.xml (#1754)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;- Added timestamps to node journal events in debug.node and node &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  debug APIs (#1741)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;- Removed StandingQueryPattern.Graph API (#1795)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;- Improved distribution of randomly-generated partitioned IDs (#1801)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;- Documented metrics endpoint in openapi specification (#1792)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;- Added peephole optimization for property value comparison (#1783)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;- Refactored to simplify DomainGraphBranch representation (#1771)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Updates:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;- rocksdbjni to 7.3.1 (#1825)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;- msgpack-core to 0.9.2 (#1824)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;- cats-core to 2.8.0 (#1826)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;- metrics to 4.2.10 (#1823)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;- scala-library to 2.12.16&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;- sbt-paradox to 0.10.2 (#1809)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;- sbt-scalafix to 0.10.1 (#1808)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;- scala-java-time to 2.4.0 (#1798)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;==== Quine Enterprise Additions ====&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Release Quine Enterprise 1.3.0&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Misc&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;- Removed hydrolix persistor (#1739)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Updates&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;- proguard-base to 7.2.2&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;- scala-logging to 3.9.5 (#1776)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;- classgraph to 4.8.147 (#1784)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;==== Quine.io &#x2F; docs.thatdot.com: Probably not in release notes ====&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;* 1471b8201 Fixed typo in Kinesis section (#1829)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;* e216d1475 Resolve left nav issue on docs page (#1819)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;* 60fc048b5 updated the social link to a community invite (#1816)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;* 72fe04a2d Added 3d data tutorial (#1806)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;* 01061a434 initial quine log recipe commit (#1802)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;* 96dba566b Added the movieData recipe. (#1787)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;* 6330c807f (query-manager-fiddling) 1.2-docs-bugFix (#1758)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;* b98b791d0 Refactor site to use - instead of _ in urls (#1772)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Understanding the Scale Limitations of Graph Databases</title>
        <published>2022-07-05T00:00:00+00:00</published>
        <updated>2022-07-05T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/understanding-the-scale-limitations-of-graph-databases/"/>
        <id>https://www.thatdot.com/blog/understanding-the-scale-limitations-of-graph-databases/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/understanding-the-scale-limitations-of-graph-databases/">&lt;h2 id=&quot;a-new-kind-of-database-using-graph-models-to-unlock-categorical-data&quot;&gt;A New Kind of Database: Using Graph Models to Unlock Categorical Data&lt;&#x2F;h2&gt;
&lt;p&gt;Graph databases and models have been around for well over a decade, and are among the most impactful technologies to emerge from the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.webopedia.com&#x2F;definitions&#x2F;nosql&#x2F;&quot;&gt;NoSQL&lt;&#x2F;a&gt; movement.&lt;&#x2F;p&gt;
&lt;p&gt;Graph data models are natively designed to focus on the relationships within and between &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.eweek.com&#x2F;big-data-and-analytics&#x2F;data-analytics&#x2F;&quot;&gt;data&lt;&#x2F;a&gt;, representing that data as nodes connected by edges. As such, the graph model is strikingly similar to the way humans often think and talk.&lt;&#x2F;p&gt;
&lt;p&gt;The node-edge-node pattern in a graph corresponds directly to the subject-predicate-object pattern common to languages like English. So, if you’ve ever used mind-mapping technology or diagrammed ideas on a whiteboard, you’ve created a graph.&lt;&#x2F;p&gt;
&lt;p&gt;A critical advantage of graph databases is their ability to express relationships between &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;what-is-categorical-data&#x2F;&quot;&gt;categorical data&lt;&#x2F;a&gt; (any non-numerical value e.g. email addresses, colors, models of cars, or geographic locations). This is not possible otherwise without using encoding methods that destroy much of the value of this data, and explains why most categorical data (and &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.inc.com&#x2F;jeff-barrett&#x2F;misusing-data-could-be-costing-your-business-heres-how.html&quot;&gt;73% of all data&lt;&#x2F;a&gt;) is simply ignored by enterprises.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;62c3aefcecf0519083f04e41_graph%20databases%20express%20relationships%20between%20categorgical%20data.jpg&quot; alt=&quot;An abstract photograph of a wire sculpture that resembles a graph.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Graph databases allow you to explore the relationships between data types.&lt;&#x2F;p&gt;
&lt;p&gt;Graph data models have become part of the standard toolkit for data scientists applying &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.eweek.com&#x2F;big-data-and-analytics&#x2F;top-ai-software&#x2F;&quot;&gt;artificial intelligence&lt;&#x2F;a&gt; (AI) to everything from fraud detection and manufacturing control systems to recommendation engines and customer 360s.&lt;&#x2F;p&gt;
&lt;p&gt;Given this broad applicability, it’s no surprise Gartner believes that &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.gartner.com&#x2F;doc&#x2F;4001808&quot;&gt;graph database technologies will be used&lt;&#x2F;a&gt; in more than 80% of data and analytics innovations, including real-time event streaming, by 2025. But &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.eweek.com&#x2F;database&#x2F;why-experts-see-graph-databases-headed-to-mainstream-use&#x2F;&quot;&gt;as adoption accelerates&lt;&#x2F;a&gt;, limitations and challenges are emerging. And one of the most significant limitations graph databases face is their inability to scale.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;volume-and-velocity-of-modern-data-generation&quot;&gt;&lt;strong&gt;Volume and Velocity of Modern Data Generation&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Much has changed since the emergence of the most recent generation of graph databases from a decade ago. Enterprises are dealing with previously unimaginable volumes of data to potentially query. That data enters and streams through the enterprise in a variety of channels, and enterprises want action on that information in real time.&lt;&#x2F;p&gt;
&lt;p&gt;Original graph designs couldn’t have imagined today’s sheer volume of data or the computation power needed to put that data to work. And it’s not just the volume of data dragging graph databases down. It’s the velocity of that data.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;62c3afa5d80178ad43ab3359_graph%20data%20models%20%20connections%20between%20data.jpg&quot; alt=&quot;A patch board with colorful cables.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Graph databases are great at allowing you to make connections, but they don&#x27;t scale.&lt;&#x2F;p&gt;
&lt;p&gt;While graph databases can excel at computation on moderately-sized sets of data at rest, they get especially &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.webopedia.com&#x2F;definitions&#x2F;data-silo&#x2F;&quot;&gt;siloed&lt;&#x2F;a&gt; and suffer significant tradeoffs when real-time actions on &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.eweek.com&#x2F;networking&#x2F;trends-streaming-data&#x2F;&quot;&gt;streaming data&lt;&#x2F;a&gt; are desired. Streaming is actively moving data; it constantly arrives from diverse sources.&lt;&#x2F;p&gt;
&lt;p&gt;Enterprises want to act on it immediately in event-processing pipelines because, when certain events are not caught as they happen, the opportunity to act disappears. Examples include security incidents, transaction processing (such as fraud or credit validations), and &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.eweek.com&#x2F;enterprise-apps&#x2F;what-is-automation&#x2F;&quot;&gt;automated&lt;&#x2F;a&gt; machine-to-machine actions.&lt;&#x2F;p&gt;
&lt;p&gt;Anomalies and patterns need to be recognized with AI and ML algorithms that can automate (or at least escalate) an action. And that recognition needs to occur before an automated action can proceed.&lt;&#x2F;p&gt;
&lt;p&gt;Graph databases were simply never built for this scenario. They are typically restricted to hundreds or thousands of events per second. But today’s enterprises need to be able to process a velocity of millions of events per second and, in some advanced use cases, tens of millions.&lt;&#x2F;p&gt;
&lt;p&gt;There’s a hard limit both on how quickly graph systems can process data and on how much complexity (like how many hops in the query) they can handle. Because of those limits, graph systems often don’t get used. Since graph systems don’t get used, data engineering teams have no option other than to recreate the graph database-like functionality spread throughout their &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.webopedia.com&#x2F;definitions&#x2F;microservice-architecture-microservices&#x2F;&quot;&gt;microservices&lt;&#x2F;a&gt; architecture.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-rise-of-custom-data-pipeline-development&quot;&gt;&lt;strong&gt;The Rise of Custom Data Pipeline Development&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;These workarounds to query the event streams in real time require significant effort. Developers typically turn to event stream processing systems like Flink and ksqlDB, which make it possible, but not easy, to use familiar &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.webopedia.com&#x2F;definitions&#x2F;sql-server&#x2F;&quot;&gt;SQL&lt;&#x2F;a&gt; query syntax to query the event streams.&lt;&#x2F;p&gt;
&lt;p&gt;It’s not uncommon for enterprises to have teams of data engineers developing extensive and complex microservice architectures for months or years to reach the scale and speed that streaming data demands. However, these systems tend to lack the expressive query structures needed to find complex patterns in streams efficiently.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;62c3b14b12573c0f48b46f13_The%20Rise%20of%20Streaming%20Graph.jpg&quot; alt=&quot;Event stream processing systems like Apache Kafka and Kinesis created a new, event-driven architecture.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Event stream processing systems like Apache Kafka and Kinesis created a new, event-driven architecture.&lt;&#x2F;p&gt;
&lt;p&gt;As noted, to operate at the volume and velocity that enterprises require, these systems have had to make tough tradeoffs that lead to significant limitations.&lt;&#x2F;p&gt;
&lt;p&gt;For example, time windows can restrict a system’s ability to connect events that do not arrive within a narrow time interval (often measured in seconds or minutes). This means that rather than providing some critical insight or business value, an event is instead simply ignored if it arrives even seconds too late.&lt;&#x2F;p&gt;
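&lt;p&gt;To make the time-window tradeoff concrete, here is a minimal Python sketch of a windowed join. It is illustrative only: the function name &lt;code&gt;join_within_window&lt;&#x2F;code&gt;, the event tuples, and the 60-second window are assumptions made for this example, not the API of Flink, ksqlDB, or any other system.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;
```python
def join_within_window(events, window_seconds):
    """Pair each response with its earlier request, but only inside the window.

    events: iterable of (timestamp_seconds, key, kind) tuples (hypothetical shape).
    Returns (matched, dropped).
    """
    pending = {}  # key -> timestamp of the latest unmatched request
    matched, dropped = [], []
    for ts, key, kind in sorted(events):
        # Evict requests that have aged out of the window.
        expired = [k for k, t in pending.items() if ts - t > window_seconds]
        for k in expired:
            del pending[k]
        if kind == "request":
            pending[key] = ts
        elif key in pending:
            matched.append((key, pending.pop(key), ts))
        else:
            # The matching request is gone, so this event cannot be connected.
            dropped.append((key, ts))
    return matched, dropped


events = [
    (0, "txn-1", "request"),
    (5, "txn-1", "response"),
    (10, "txn-2", "request"),
    (75, "txn-2", "response"),  # arrives 65 s after its request
]
matched, dropped = join_within_window(events, window_seconds=60)
print(matched)
print(dropped)
```
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;In this sketch the response for txn-2 arrives 65 seconds after its request, so the request has already been evicted and the late event lands in the dropped list instead of being connected.&lt;&#x2F;p&gt;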
&lt;p&gt;Even with costly limitations like time windows, event stream processing systems have been successful. Many can even scale to process millions of events per second—but with significant effort and limitations that fail to deliver the full power of graph data models.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;quine-streaming-graph-was-created-to-meet-demand&quot;&gt;&lt;strong&gt;Quine Streaming Graph Was Created to Meet Demand&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;The demand for insights from instant event data streams and the value they deliver has never been higher. As adoption accelerates, businesses should expect to see new data infrastructure emerge to eliminate many of the scale struggles that can hold back the power of graph database models. That&#x27;s why we created Quine streaming graph. Quine addresses the scalability problem with a graph system that can process &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;linear-scaling-to-1-1-trillion-monthly-log-events-in-thatdots-streaming-graph&#x2F;&quot;&gt;millions of events per second&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;Quine’s unique approach combines graph data and streaming technologies into a modern, developer-friendly open source software package. For the first time, teams can process categorical data in real time without resorting to encoding methods.&lt;&#x2F;p&gt;
&lt;p&gt;Developers and data pipeline engineers use Quine to rapidly build high volume, real-time, complex event processing workflows at scale, especially if they are using &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;quine-streaming-graph-is-a-natural-fit-for-kafka-pipelines&#x2F;&quot;&gt;Kafka&lt;&#x2F;a&gt; or &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;getting-started&#x2F;ingest-streams-tutorial&#x2F;?h=kinesis&quot;&gt;Kinesis&lt;&#x2F;a&gt;. A handful of Quine queries can replace months of development time and millions in costs, eliminating batch processing, multi-level joins, time windows, and other time-consuming and outdated processes that drag down and stall analysis on streaming data.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;next-steps&quot;&gt;Next Steps&lt;&#x2F;h3&gt;
&lt;p&gt;If you want to try Quine yourself, you can &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;download&quot;&gt;download&lt;&#x2F;a&gt; it here. To get started, try the Ethereum Blockchain Fraud Detection, &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;wikipedia-page-ingest&quot;&gt;Wikipedia Ingest&lt;&#x2F;a&gt;, or &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;apache-log-analytics&quot;&gt;Apache Log Analytics&lt;&#x2F;a&gt; recipes, which cover different use cases for streaming graph.&lt;&#x2F;p&gt;
&lt;p&gt;If you have questions or want to check out the community, &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;that.re&#x2F;quine-slack&quot;&gt;join the Quine Slack&lt;&#x2F;a&gt; or visit our &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;thatdot&#x2F;quine&quot;&gt;GitHub&lt;&#x2F;a&gt; page.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Note&lt;&#x2F;strong&gt;&lt;&#x2F;em&gt;: &lt;em&gt;A version of this post was previously published in eWeek on May 26th, 2022.&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;h4 id=&quot;photo-credits&quot;&gt;Photo Credits:&lt;&#x2F;h4&gt;
&lt;p&gt;Header image:  by &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;unsplash.com&#x2F;@jjying?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText&quot;&gt;JJ Ying&lt;&#x2F;a&gt; on &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;unsplash.com&#x2F;s&#x2F;photos&#x2F;connections?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText&quot;&gt;Unsplash&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Photo 1:  by &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;unsplash.com&#x2F;@alinnnaaaa?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText&quot;&gt;Alina Grubnyak&lt;&#x2F;a&gt; on &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;unsplash.com&#x2F;s&#x2F;photos&#x2F;connections?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText&quot;&gt;Unsplash&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Photo 2:  by &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;unsplash.com&#x2F;@barkiple?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText&quot;&gt;John Barkiple&lt;&#x2F;a&gt; on &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;unsplash.com&#x2F;s&#x2F;photos&#x2F;connections?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText&quot;&gt;Unsplash&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Photo 3:  by &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;unsplash.com&#x2F;@othentikisra?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText&quot;&gt;israel palacio&lt;&#x2F;a&gt; on &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;unsplash.com&#x2F;s&#x2F;photos&#x2F;connections?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText&quot;&gt;Unsplash&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;p&gt;thatDot appreciates the work of these artists and the fact they&#x27;ve made their excellent work available for use.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Network Log Analysis Using Categorical Anomaly Detection</title>
        <published>2022-06-24T00:00:00+00:00</published>
        <updated>2022-06-24T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/network-log-analysis-using-categorical-anomaly-detection/"/>
        <id>https://www.thatdot.com/blog/network-log-analysis-using-categorical-anomaly-detection/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/network-log-analysis-using-categorical-anomaly-detection/">&lt;p&gt;The distributed nature of modern virtualized software architectures has created added complexity in the networking stack, making it difficult to attribute behavior to any single service. Instrumenting services will give you insight into activity within the service, but doesn’t provide the entire picture. What’s missing is insight into the communication behaviors that happen between two logical hosts.&lt;&#x2F;p&gt;
&lt;p&gt;In an attempt to better expose this area, I found a dataset containing over 200 million network connection summary records from the open source &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;zeek.org&#x2F;&quot;&gt;Zeek&lt;&#x2F;a&gt; network monitoring service. Each Zeek log record contains a number of fields, including the originating host and the responding host, along with summary fields for connection state and connection history. A record converted to CSV looks like this (emphasis mine):&lt;&#x2F;p&gt;
&lt;p&gt;1331902125.080000, CIp1er3EKU2WUebCDe, &lt;strong&gt;192.168.202.94&lt;&#x2F;strong&gt;, 52307, &lt;strong&gt;192.168.23.100&lt;&#x2F;strong&gt;, 445, tcp, -, 10.550000, 4803, 3174, &lt;strong&gt;SF&lt;&#x2F;strong&gt;, -, 0, &lt;strong&gt;ShADdaFf&lt;&#x2F;strong&gt;, 32, 6475, 27, 4590, (empty)&lt;&#x2F;p&gt;
&lt;p&gt;The metrics available in those records feed standard monitors such as bandwidth (bytes received, bytes sent). Analyzing only the available metrics, however, ignores significant information encoded in the categorical elements of the log. This includes the hosts’ IP addresses and the summary abbreviations for connection state (SF) and connection history (ShADdaFf). For connection state, the entire field maps to a description. For connection history, each &lt;em&gt;character&lt;&#x2F;em&gt; maps to a different activity within the TCP lifecycle. Capital letters indicate activity from the originating host and lowercase letters indicate activity from the responding host.&lt;&#x2F;p&gt;
&lt;p&gt;Using thatDot Novelty Detector’s data transformation API, I was able to build a simple function to manipulate the raw logs into something more useful. The function is responsible for:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Mapping abbreviations to their corresponding definitions for easier understanding.&lt;&#x2F;li&gt;
&lt;li&gt;Separating the activity for sending and receiving hosts.&lt;&#x2F;li&gt;
&lt;li&gt;Creating the ordered data observation for submission to the API.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;This function was then stored as a transformation that could be applied to all incoming data.&lt;&#x2F;p&gt;
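&lt;p&gt;As a rough illustration of those three steps, here is a minimal Python sketch. It is not the Novelty Detector transformation API: the names &lt;code&gt;transform&lt;&#x2F;code&gt;, &lt;code&gt;CONN_STATE&lt;&#x2F;code&gt;, and &lt;code&gt;HISTORY&lt;&#x2F;code&gt; are hypothetical, the descriptions paraphrase the Zeek conn.log documentation and cover only a few abbreviations, and the ordering of the resulting observation is an assumption.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;
```python
# Illustrative sketch only. Descriptions paraphrase the Zeek conn.log docs;
# the names below are hypothetical and are not the Novelty Detector API.
CONN_STATE = {
    "S0": "connection attempt seen, no reply",
    "S1": "connection established, not terminated",
    "SF": "normal establishment and termination",
    "REJ": "connection attempt rejected",
}
HISTORY = {
    "s": "SYN", "h": "SYN-ACK", "a": "ACK",
    "d": "data", "f": "FIN", "r": "RST",
}


def transform(csv_line):
    """Turn one CSV conn.log record into an ordered observation (list of strings)."""
    fields = [field.strip() for field in csv_line.split(",")]
    orig_host, resp_host = fields[2], fields[4]
    state, history = fields[11], fields[14]

    originator, responder = [], []
    for ch in history:
        if not ch.isalpha():
            continue  # skip non-letter markers in the history field
        label = HISTORY.get(ch.lower(), "other:" + ch)
        # Uppercase letters come from the originator, lowercase from the responder.
        (originator if ch.isupper() else responder).append(label)

    # Assumed ordering: hosts first, then state, then each side's activity.
    return [
        orig_host,
        resp_host,
        CONN_STATE.get(state, state),
        " ".join(originator),
        " ".join(responder),
    ]


RECORD = ("1331902125.080000, CIp1er3EKU2WUebCDe, 192.168.202.94, 52307, "
          "192.168.23.100, 445, tcp, -, 10.550000, 4803, 3174, SF, -, 0, "
          "ShADdaFf, 32, 6475, 27, 4590, (empty)")
observation = transform(RECORD)
print(observation)
```
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;For the sample record above, the history ShADdaFf splits into SYN, ACK, data, FIN for the originating host and SYN-ACK, data, ACK, FIN for the responding host.&lt;&#x2F;p&gt;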
&lt;h2 id=&quot;data-transformation-map&quot;&gt;&lt;strong&gt;Data Transformation Map&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2ce1e35eab8ff8ee553c7_zeek_log.png&quot; alt=&quot;Diagram of a data transformation map.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;With the transformation in place, I was able to ingest the records and build a tree to visualize the connection history, ultimately giving us insight into a general fingerprint of conversation behavior. Once the system has recognized the fingerprint, it will begin to highlight connection paths that have deviated from normal behavior.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;visualization-of-communication-patterns&quot;&gt;&lt;strong&gt;Visualization Of Communication Patterns&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2ce1f9901ba78dff7c013_tree_with_title.png&quot; alt=&quot;An Anomaly Detector graph tree.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;The principal reason for using thatDot’s Novelty Detector for this analysis, however, is to surface the “novel” data from among the volumes of “normal” data. This sampled scatter plot does a nice job of identifying the highly novel network conversations. The items highest on the X axis are the most Novel observations, which may or may not also be Unique in the data. It is always interesting to see when Unique data, shown via the coloring, is NOT Novel. Differentiating such “false-positive” events is a significant benefit of including categorical data in our analysis.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;example-observation-detail-visualization&quot;&gt;&lt;strong&gt;Example Observation Detail Visualization&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2ce1e167f1209d8e01b49_15M_sample_scatter.png&quot; alt=&quot;Anomaly detector sample scatter chart&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;From this scatter plot we click through to one of the observations with a high novelty score, which leads us to the tree below. It shows that completing a handshake is abnormal for these two hosts; it is much more typical for these connections to time out.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;observation-detail-visualization&quot;&gt;&lt;strong&gt;Observation Detail Visualization&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;63226e4b0f29af7eb240df2d_network%20analysis%20anomaly%20-%20abnormal%20path.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;This same mechanism is useful for a range of use cases:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Real-time DDoS detection, such as &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;TCP_half-open&quot;&gt;TCP half-open&lt;&#x2F;a&gt; (SYN flood) attacks.&lt;&#x2F;li&gt;
&lt;li&gt;Public-private host communications: determine which hosts are trying to connect and why (protocol, port, etc.).&lt;&#x2F;li&gt;
&lt;li&gt;New protocol use between known hosts&lt;&#x2F;li&gt;
&lt;li&gt;New hosts successfully communicating with known hosts&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;In summary, this turns out to be a useful tool for enriching existing telemetry data in support of discovery, remediation, and automation.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;thatdot-novelty-detector&quot;&gt;&lt;strong&gt;thatDot Novelty Detector&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;thatDot Novelty Detector is the first general-use application designed for finding anomalies in real time in data sets that include categorical data. Available as an application for deployment in any cloud or data center, thatDot Novelty Detector exposes an API that scores submitted observations for their “novelty”, enabling real-time anomaly detection with fewer false positives than traditional threshold-based metric analysis.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Reducing False Positive Alerts With Contextual Anomaly Detection</title>
        <published>2022-06-24T00:00:00+00:00</published>
        <updated>2022-06-24T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/reducing-false-positive-alerts-with-contextual-anomaly-detection/"/>
        <id>https://www.thatdot.com/blog/reducing-false-positive-alerts-with-contextual-anomaly-detection/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/reducing-false-positive-alerts-with-contextual-anomaly-detection/">&lt;h3 id=&quot;too-many-false-positives&quot;&gt;&lt;strong&gt;Too many false positives!&lt;&#x2F;strong&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;Traditionally, monitoring alerts are produced by comparing metrics against thresholds to identify behavior outside the norm. This approach of metrics-based alert definitions often generates too many false positives, which lead to wasted human time and effort or, worse yet, to a loss of confidence and a habit of ignoring alerts as general practice!&lt;&#x2F;p&gt;
&lt;p&gt;Efforts to improve alert quality typically lead to devising more granular alerts. This approach leads to improved alerting for specific conditions, but introduces significant complexity in alert definitions and their associated maintenance as dimensionality increases. Machine Learning approaches often crumble under the same “curse of dimensionality” that humans feel: when looking at hundreds of alerts no person or machine can find the true anomalies. Dynamic threshold definitions that accommodate historically observed trends such as time-of-day or seasonal variations are helpful, but still limit us to looking for the problems &lt;em&gt;we know to expect&lt;&#x2F;em&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;What we all want are high-confidence alerts that identify truly anomalous events as they occur in real-time, from a system that learns and adapts to our data immediately.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;A New Approach: Use Categorical Data&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Categorical data is composed of the strings of information included in our logs and events: file names, IP addresses, HTTP status codes, geographical information, etc. Including categorical data in our monitoring analysis provides a greatly expanded context from which to evaluate application and network performance logs. As much as 80% of the information in our logs and events is categorical data. Why not include it in our monitoring? Doing so lets us reduce the false positives that often overwhelm the people monitoring these systems, and also lets us explain &lt;em&gt;WHY&lt;&#x2F;em&gt; an alert was generated.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Not Everything New Is Anomalous&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;The additional context gained by incorporating categorical dimensions of data provides a significant benefit in rapidly identifying unique data, identified as having high “surprise” value in our system, as well as recognizing anomalous data as separate from unique values. thatDot Novelty Detector learns a fingerprint for the data it observes, so that it can tell when “new” is actually just “normal”.&lt;&#x2F;p&gt;
&lt;p&gt;High cardinality is a normally expected condition of many data types. User agents, IP addresses, and file names, are all examples of data that can have many values. Shown below are two examples that illustrate the value of context for differentiating unique vs anomalous data.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2cc4ac7838611b78f2c6a_high_surprise_high_novelty.png&quot; alt=&quot;A view of scatter plot and violin graph for Novelty Detector.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;The above example shows the identification of a highly unique observation in a CDN log monitoring data set. The scatter plot of the data uses color to indicate the “surprise” or uniqueness of each observation, while the left-hand scale of the scatter plot indicates thatDot’s anomaly score for each observation. The tree to the right is from the thatDot Exploration UI and shows the context of the observation. It has both high Surprise and Anomaly scores, being the first observation of the FUJIFILM ISP out of 800,294 observations.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2cc4a820a43300195e090_high_surprise_low_novelty.png&quot; alt=&quot;A view of scatter plot and violin graph for Novelty Detector.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;In this second example we see an observation in the scatter plot that is yellow indicating high “surprise” or uniqueness, but this observation receives a low anomaly score from thatDot. thatDot’s Exploration UI tree shows that observing a unique Server IP value under the Spectrum ISP is not anomalous, despite this IP being seen for the first time, as the context of previous data has taught the system that new client IP values are a usual occurrence for the Spectrum ISP.&lt;&#x2F;p&gt;
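&lt;p&gt;The difference between “unique” and “anomalous” can be sketched with a toy scorer in Python. This is not thatDot’s algorithm: the class &lt;code&gt;ContextNovelty&lt;&#x2F;code&gt;, the context names, and the scoring formula are all assumptions chosen to illustrate one idea, namely that a never-before-seen value is less surprising in a context where new values are routine.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;
```python
from collections import defaultdict


class ContextNovelty:
    """Toy scorer: a new value is less novel where new values are routine."""

    def __init__(self):
        self.seen = defaultdict(set)       # context -> values observed so far
        self.total = defaultdict(int)      # context -> number of observations
        self.new_count = defaultdict(int)  # context -> observations that were new

    def observe(self, context, value):
        """Score one observation, then learn from it. Returns (is_unique, novelty)."""
        is_unique = value not in self.seen[context]
        # Smoothed estimate of how often this context has produced unseen values.
        p_new = (self.new_count[context] + 1) / (self.total[context] + 2)
        # Simplification: values already seen in this context score zero.
        novelty = 1.0 - p_new if is_unique else 0.0
        self.seen[context].add(value)
        self.total[context] += 1
        self.new_count[context] += int(is_unique)
        return is_unique, novelty


scorer = ContextNovelty()
for i in range(100):
    scorer.observe("isp-with-churn", "10.0.0." + str(i))  # every IP is new
    scorer.observe("isp-stable", "192.0.2.1")             # always the same IP

churn = scorer.observe("isp-with-churn", "10.0.1.1")
stable = scorer.observe("isp-stable", "192.0.2.99")
print(churn, stable)
```
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Both final observations are unique, but only the one under the stable context receives a high score, because the other context has taught the scorer that new values are a usual occurrence.&lt;&#x2F;p&gt;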
&lt;p&gt;&lt;strong&gt;Alerts With Fewer False Positives&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Utilizing the additional context provided by including categorical data in our anomaly detection can significantly improve the quality of our alerting. When we have high confidence in our ability to separate the real signal from the noise, users save the time they historically spent chasing false positives and get that time back to build more automation into their support processes.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;thatDot Novelty&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;thatDot Novelty is the first general-use application designed for finding anomalies in real time in data sets that include categorical data. Available as an application for deployment in any cloud or data center, thatDot Novelty exposes an API that scores submitted observations for their “novelty”, enabling real-time anomaly detection with fewer false positives than traditional threshold-based metric analysis. Read more about Novelty and access the &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;getting-started&#x2F;&quot;&gt;Novelty free trial here&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Where Quine Streaming Graph Fits In Kafka-Based Data Pipelines</title>
        <published>2022-06-22T00:00:00+00:00</published>
        <updated>2022-06-22T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/quine-streaming-graph-is-a-natural-fit-for-kafka-pipelines/"/>
        <id>https://www.thatdot.com/blog/quine-streaming-graph-is-a-natural-fit-for-kafka-pipelines/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/quine-streaming-graph-is-a-natural-fit-for-kafka-pipelines/">
&lt;h2 id=&quot;the-answer-to-the-common-question-what-is-quine-or-streaming-graph&quot;&gt;&lt;strong&gt;The Answer to the Common Question: What is Quine or Streaming Graph?&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;em&gt;“Quine is a real-time streaming graph that fits perfectly between two Kafka instances.”&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;p&gt;This is the most common answer I give whenever a data engineer asks “What is Quine?” As an answer, it works remarkably well. The reason it works is simple: everyone knows what Kafka does, even if they don’t run it themselves in production (which is rare).&lt;&#x2F;p&gt;
&lt;p&gt;It is also a heck of a lot pithier than:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;em&gt;“Quine is an open source stream processing application with a graph data model designed to ingest high volumes of event data from sources like Kafka or Kinesis and process them in real time using Cypher or Gremlin. The results of those queries can then be used to update the graph itself, can be stored in another database or data warehouse, or can be output back into the Kafka or Kinesis-based data pipeline. thatDot Streaming Graph is the commercial, distributed, high-scale version.”&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;p&gt;While accurate, this isn’t exactly a conversation starter like the shorter description.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;inline-graph-analytics&quot;&gt;&lt;strong&gt;Inline Graph Analytics&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;One of the first “a-ha’s” when we talk to data engineers operating real-time data pipelines is that while Quine shares much in common with graph databases (data is represented as nodes and edges, nodes have properties, and you can query it using the two most common graph query languages), it is radically different in one specific way: it runs inline with your stream, becoming another part of the data pipeline.&lt;&#x2F;p&gt;
&lt;p&gt;Unlike graph databases, which are static stores accumulating data and are therefore essentially an off-ramp from the data stream, Quine doesn’t divert the flow of data through the system.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;lh7-us.googleusercontent.com&#x2F;docsz&#x2F;AD_4nXcujMUZGnshX2RoO5VJ8uiCuzYg7lvrIlsORUCztheOoifYOS0RgFZP1bj0_NNyC8AAYqaVKZoikDzUJbj2nA1gbYLQp1F3x1hnJne9pHykKCduRqSkHxYpYzaXlZt7JZdQ8c4_soMaCZRUUvGAVfVYQv8?key=fsCB33Ra70Kf2U63sNrqrw&quot; alt=&quot;Kafka example for Graph DB&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;&lt;em&gt;Graph databases cannot process data inline in real time.&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;p&gt;This is not meant to be a slight on graph databases. They just weren’t built from the ground up to exist inline with Kafka-driven data streams.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;lh7-us.googleusercontent.com&#x2F;docsz&#x2F;AD_4nXf9k5x9QsUNhbpma7sELRAw88hQl4_TRbfRJFvqpEdzO-y4U1-kP4VvzsRUEHFXogvtbUDkWsTezBDdvtWYBDu6AsYLsMv9c_xMmI0QZm5L-id51qCbRgoRIyH7IVnfi01hLBL3E05k6K66v7Tnr2sy9iI?key=fsCB33Ra70Kf2U63sNrqrw&quot; alt=&quot;Kafka to Quine to Kafka&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;&lt;em&gt;Quine runs inline with the data flow to process data into a real-time graph.&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;h2 id=&quot;ingesting-kafka-data-to-build-a-real-time-streaming-graph&quot;&gt;&lt;strong&gt;Ingesting Kafka data to build a Real-time Streaming Graph&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;As the diagram above implies, Quine ingests data from Kafka and turns it into a dynamic streaming graph. In the &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;real-time-graph-analytics-for-kafka-streams-with-quine&#x2F;&quot;&gt;&lt;strong&gt;fifth installment of his Ingesting Data into Quine&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; blog series, Michael Aglietti covers the hows and whys in detail, so I won’t delve much deeper here.&lt;&#x2F;p&gt;
&lt;p&gt;I will make a few points:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Streaming Graph scales with Kafka&lt;&#x2F;strong&gt; – Quine is designed to process streaming data and turn it into a graph without slowing down the flow of data through the system. A single Quine node hosted on a commodity server can ingest and process &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;linear-scaling-to-1-1-trillion-monthly-log-events-in-thatdots-streaming-graph&#x2F;&quot;&gt;&lt;strong&gt;thousands of events per second&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt;. A &lt;strong&gt;thatDot Streaming Graph cluster&lt;&#x2F;strong&gt; can process millions of events per second with tens of thousands of simultaneous queries.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Quine takes advantage of Kafka’s ability to regulate the stream&lt;&#x2F;strong&gt; – as anyone who has operated a production system knows, things don’t always go smoothly. Perhaps a host in a Streaming Graph cluster fails and throughput slows as a hot spare comes online. In that case, Streaming Graph counts on Kafka’s ability to handle back pressure. But that’s not all. Streaming Graph itself is also back-pressured. If Quine or Streaming Graph is busy with a resource-intensive task downstream, or possibly waiting for the durable storage to finish processing, it will back-pressure the ingest stream so that it does not overwhelm other components.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
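&lt;p&gt;To make the back-pressure idea concrete, here is a minimal Python sketch. It is not Quine or Kafka code: it assumes a bounded in-memory queue standing in for the ingest buffer, and the names &lt;code&gt;run_pipeline&lt;&#x2F;code&gt;, &lt;code&gt;buffer_size&lt;&#x2F;code&gt;, and &lt;code&gt;process_seconds&lt;&#x2F;code&gt; are hypothetical. When the consumer falls behind, the producer blocks instead of piling up unprocessed events.&lt;&#x2F;p&gt;

```python
import queue
import threading
import time


def run_pipeline(event_count, buffer_size, process_seconds):
    """Push events through a bounded buffer; return processed events and peak depth."""
    # A bounded queue is the whole trick: put() blocks once the buffer is full.
    buffer = queue.Queue(maxsize=buffer_size)
    processed = []
    peak_depth = 0

    def consumer():
        while True:
            event = buffer.get()
            if event is None:  # sentinel: the producer has finished
                break
            time.sleep(process_seconds)  # stand-in for slow downstream work
            processed.append(event)

    worker = threading.Thread(target=consumer)
    worker.start()
    for event in range(event_count):
        buffer.put(event)  # waits here while the buffer is full (back pressure)
        peak_depth = max(peak_depth, buffer.qsize())
    buffer.put(None)
    worker.join()
    return processed, peak_depth


processed, peak_depth = run_pipeline(event_count=20, buffer_size=4, process_seconds=0.001)
print(len(processed), peak_depth)
```

&lt;p&gt;However slow the consumer is, the buffer never holds more than &lt;code&gt;buffer_size&lt;&#x2F;code&gt; events, and every event is still processed in order.&lt;&#x2F;p&gt;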
&lt;h2 id=&quot;inline-means-not-just-ingest-but-output&quot;&gt;&lt;strong&gt;Inline means Not Just Ingest but Output&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;If Quine were just a highly write-optimized graph data processor, it would be pretty remarkable. But &lt;em&gt;inline&lt;&#x2F;em&gt; means keeping the data flowing through the pipeline. It means Quine is not just a sink in Kafka terms but a high-velocity source. And this is where Quine is truly unique.&lt;&#x2F;p&gt;
&lt;p&gt;If ingest streams represent the &lt;em&gt;sink&lt;&#x2F;em&gt; side of Quine, standing queries turn Quine into a source.&lt;&#x2F;p&gt;
&lt;p&gt;The way standing queries work is that they persist at all times on all nodes in the graph, accumulating partial matches as data flows through and triggering an action when a complete match is made. Think of them as a net you stretch across the data stream that is designed to catch only specific data patterns.&lt;&#x2F;p&gt;
&lt;p&gt;Once a match is made, the standing query triggers an action, which can include executing an arbitrary piece of code, updating the graph itself, writing the results out to a database, or publishing data right back out to a Kafka topic.&lt;&#x2F;p&gt;
&lt;p&gt;And it can do this all with sub-millisecond latency.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;em&gt;It’s at this point in a call that the record scratch sound effect interrupts the conversation and one of the engineers on the call is like, “Hold up….I don’t believe you.” (If “Quine lives between two Kafka streams” is what we repeat most often on calls, “I don’t believe you” is what engineers we are talking to most often say.)&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Standing queries turn the whole idea of querying a database on its head. They are much closer to the continuous queries of an event stream processor. They work because Quine is built on an asynchronous actor model. That is, every node in the graph also has an actor associated with it capable of performing discrete compute tasks and sending messages to other nodes. The &lt;a href=&quot;&#x2F;img&#x2F;2024&#x2F;08&#x2F;White-Paper-Technical-Quine.pdf&quot;&gt;&lt;strong&gt;Quine technical white paper&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; digs into this all in depth if you are interested. What is important about standing queries is they allow Quine and Streaming Graph to not just ingest high volumes of data but process the data and then send it out to continue its journey through the data pipeline. No off-ramps. No slowdowns.&lt;&#x2F;p&gt;
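&lt;p&gt;The “accumulate partial matches, then fire on a complete match” behavior can be illustrated with a toy Python sketch. This is not how Quine implements standing queries (there is no actor model or graph traversal here), and &lt;code&gt;ToyStandingQuery&lt;&#x2F;code&gt;, &lt;code&gt;observe&lt;&#x2F;code&gt;, and the node names are hypothetical. The pattern in this sketch is simply “a node has all of the required properties.”&lt;&#x2F;p&gt;

```python
from collections import defaultdict


class ToyStandingQuery:
    """Toy matcher: fires once per node when every required property has been seen."""

    def __init__(self, required, on_match):
        self.required = frozenset(required)
        self.on_match = on_match
        self.partial = defaultdict(dict)  # node id -> properties seen so far
        self.matched = set()              # nodes that have already fired

    def observe(self, node_id, key, value):
        state = self.partial[node_id]
        state[key] = value  # remember the partial match
        complete = self.required.issubset(state)
        if complete and node_id not in self.matched:
            self.matched.add(node_id)
            self.on_match(node_id, dict(state))  # the triggered action


results = []
query = ToyStandingQuery(
    required=["hostname", "severity"],
    on_match=lambda node_id, props: results.append((node_id, props)),
)
query.observe("node-a", "hostname", "some.com")  # partial match, nothing fires
query.observe("node-b", "severity", "debug")     # partial match on another node
query.observe("node-a", "severity", "debug")     # completes the pattern for node-a
print(results)
```

&lt;p&gt;In this sketch the action appends to a list; in a real pipeline the equivalent step would be one of the outputs described above, such as publishing to a Kafka topic.&lt;&#x2F;p&gt;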
&lt;h2 id=&quot;when-real-time-really-does-mean-real-time&quot;&gt;&lt;strong&gt;When real-time really does mean real-time&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;By way of conclusion, let’s revisit the statement that kicked off this post: “Quine is a real-time streaming graph that fits perfectly between two Kafka instances.”&lt;&#x2F;p&gt;
&lt;p&gt;Graph analysis is incredibly powerful, especially when it comes to maximizing the value of categorical data. Graphs allow you to express relationships between objects in a direct and natural way that is both human readable and performant. Use cases like XDR, financial fraud detection, authentication attacks, insider trading prevention, or network observability and root cause analysis, would all benefit tremendously if they could apply a graph model to their data.&lt;&#x2F;p&gt;
&lt;p&gt;So why don’t they? The single biggest reason – and another thing we hear on calls all the time – is that graph databases can’t process the data fast enough. People end up batch processing data, which is the opposite of real time.&lt;&#x2F;p&gt;
&lt;p&gt;Quine is real-time graph processing that sits inline with your Kafka-based data pipeline and detects complex patterns the instant they emerge. Drop Quine in between two Kafka instances and you will discover a whole new dimension to your data.&lt;&#x2F;p&gt;
&lt;p&gt;And if you don’t believe me, that’s okay. We’re used to it.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;next-steps-and-further-reading&quot;&gt;&lt;strong&gt;Next Steps and Further Reading&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;thatDot Streaming Graph is the commercial, distributed cluster scale version. Try it out in the &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;getting-started&#x2F;&quot;&gt;&lt;strong&gt;Free Trial&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;Quine is open source if you want to try it for yourself. Download a precompiled version or build it yourself from the codebase on &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;thatdot&#x2F;quine&quot;&gt;&lt;strong&gt;Quine GitHub&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;Or drop into the &lt;strong&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;that.re&#x2F;chat&quot;&gt;Quine Discord Community&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt;. We&#x27;re always happy to discuss Quine or answer questions.&lt;&#x2F;p&gt;
&lt;p&gt;And if you have a question, suggestion, or improvement, &lt;strong&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;contact&#x2F;&quot;&gt;Contact Us&lt;&#x2F;a&gt;.&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;And if you’re interested in learning more about building a streaming graph from various ingest sources, check out previous installments in this blog series:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;real-time-graph-analytics-for-kafka-streams-with-quine&#x2F;&quot;&gt;&lt;strong&gt;Real-time Graph Analytics for Kafka Streams with Quine&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;building-a-quine-streaming-graph-ingest-streams&#x2F;&quot;&gt;&lt;strong&gt;Building a Quine Streaming Graph: Ingest Streams&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;ingesting-data-from-the-internet&#x2F;&quot;&gt;&lt;strong&gt;Ingesting data from the internet into Quine Streaming Graph&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;ingesting-from-multiple-data-sources-into-quine-streaming-graph&#x2F;&quot;&gt;&lt;strong&gt;Ingesting From Multiple Data Sources into Quine Streaming Graph&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;ingest-and-analyze-log-files-using-streaming-graph&#x2F;&quot;&gt;&lt;strong&gt;Ingest and Analyze Log Files Using Streaming Graph&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Ingest How-To: Real-time Graph Analytics for Kafka Streams with Quine</title>
        <published>2022-06-20T00:00:00+00:00</published>
        <updated>2022-06-20T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/real-time-graph-analytics-for-kafka-streams-with-quine/"/>
        <id>https://www.thatdot.com/blog/real-time-graph-analytics-for-kafka-streams-with-quine/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/real-time-graph-analytics-for-kafka-streams-with-quine/">&lt;h2 id=&quot;quine-adds-real-time-etl-for-kafka-based-event-streams&quot;&gt;Quine adds Real-time ETL for Kafka-based Event Streams&lt;&#x2F;h2&gt;
&lt;p&gt;Kafka is the tool of choice for data engineers when building streaming data pipelines. Adding Quine into a Kafka-centric data pipeline is the perfect way to introduce streaming analytics to the mix. Adding business logic directly into an event pipeline allows you to process high-value insights in real time. Quine also allows you to add processing of &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;whats-the-difference-between-categorical-and-numerical-data&#x2F;&quot;&gt;categorical data&lt;&#x2F;a&gt;, which makes up a vast majority of the data your business generates, yet is often overlooked or discarded.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;simple-streaming-pipeline-for-etl&quot;&gt;Simple Streaming Pipeline for ETL&lt;&#x2F;h2&gt;
&lt;p&gt;Consider this straightforward, minimum viable streaming pipeline.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;62ae6b5d9b6a1e70da060590_Ingest%205%20-%20Kafka%20Image%201.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;A simple streaming pipeline with Quine ingesting Kafka streaming data&lt;&#x2F;p&gt;
&lt;p&gt;In this simple pipeline, &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;vector.dev&#x2F;&quot;&gt;Vector&lt;&#x2F;a&gt; will produce events (&lt;code&gt;dummy_log&lt;&#x2F;code&gt; lines) once a second and stream them into a Kafka topic (&lt;code&gt;demo-logs&lt;&#x2F;code&gt;) where an ingest stream from Quine will transform the log events into a streaming graph.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;setting-up-vector&quot;&gt;Setting up Vector&lt;&#x2F;h2&gt;
&lt;p&gt;Start by installing Vector in your environment. My examples use macOS and may need slight modifications to work correctly in your environment. I installed Vector with &lt;code&gt;brew install vector&lt;&#x2F;code&gt;, which includes a sample &lt;code&gt;vector.toml&lt;&#x2F;code&gt; config in &lt;code&gt;&#x2F;opt&#x2F;homebrew&#x2F;etc&#x2F;vector&lt;&#x2F;code&gt;. I extended the sample Vector config to build our pipeline.&lt;&#x2F;p&gt;
&lt;p&gt;Run Vector to get a feel for the events that Vector emits.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;❯ vector -c &#x2F;opt&#x2F;homebrew&#x2F;etc&#x2F;vector&#x2F;vector.toml&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Vector generates dummy log lines from a built-in &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;vector.dev&#x2F;docs&#x2F;reference&#x2F;configuration&#x2F;sources&#x2F;demo_logs&#x2F;&quot;&gt;demo_logs&lt;&#x2F;a&gt; source. Vector parses each log line with the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;vector.dev&#x2F;docs&#x2F;reference&#x2F;vrl&#x2F;functions&#x2F;#parse_syslog&quot;&gt;parse_syslog&lt;&#x2F;a&gt; function and emits the result as a JSON object.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;{&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   &amp;quot;appname&amp;quot;: &amp;quot;Karimmove&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   &amp;quot;facility&amp;quot;: &amp;quot;lpr&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   &amp;quot;hostname&amp;quot;: &amp;quot;some.com&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   &amp;quot;message&amp;quot;: &amp;quot;Take a breath, let it go, walk away&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   &amp;quot;msgid&amp;quot;: &amp;quot;ID416&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   &amp;quot;procid&amp;quot;: 9207,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   &amp;quot;severity&amp;quot;: &amp;quot;debug&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   &amp;quot;timestamp&amp;quot;: &amp;quot;2022-06-14T15:34:11.936Z&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   &amp;quot;version&amp;quot;: 2&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Once Vector is emitting log entries, we need to connect that output to Kafka by adding a &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;vector.dev&#x2F;docs&#x2F;reference&#x2F;configuration&#x2F;sinks&#x2F;kafka&#x2F;&quot;&gt;Kafka sink&lt;&#x2F;a&gt; element to the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;gist.github.com&#x2F;maglietti&#x2F;abc26bb47c40940fb0b47ed37bed2c85&quot;&gt;vector.toml&lt;&#x2F;a&gt; file.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;# Stream parsed logs to kafka&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;[sinks.to_kafka]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;type = &amp;quot;kafka&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;inputs = [ &amp;quot;parse_logs&amp;quot; ]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;bootstrap_servers = &amp;quot;127.0.0.1:9092&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;key_field = &amp;quot;quine&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;topic = &amp;quot;demo-logs&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;encoding = &amp;quot;json&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;compression = &amp;quot;none&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;h3 id=&quot;local-kafka-instance-to-use-with-quine&quot;&gt;Local Kafka Instance to use with Quine&lt;&#x2F;h3&gt;
&lt;p&gt;Kafka is the next step in the pipeline. I set up a single-node Kafka cluster in Docker. There are plenty of examples online of how to set up a Kafka cluster in Docker, so set yours up in a way that fits your environment. My cluster uses a &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;gist.github.com&#x2F;maglietti&#x2F;03c09030feae1329950a3a1db2ed8fd8&quot;&gt;docker-compose&lt;&#x2F;a&gt; file that launches version 7.1.1 of Zookeeper and Kafka containers.&lt;&#x2F;p&gt;
&lt;p&gt;Start the Kafka cluster and create a topic called demo-logs.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;&#x2F;strong&gt;&lt;br &#x2F;&gt;
I had to run the docker compose up command a couple of times before both the Zookeeper and Kafka containers launched cleanly. Make sure the containers fully load at least once before including the &lt;code&gt;-d&lt;&#x2F;code&gt; option to run them in detached mode.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;❯ docker compose up -d&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;❯ docker exec kafka kafka-topics --bootstrap-server kafka:9092 --create --topic demo-logs&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Use &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;edenhill&#x2F;kcat&quot;&gt;kcat&lt;&#x2F;a&gt; to verify the Kafka cluster is up and that the &lt;code&gt;demo-logs&lt;&#x2F;code&gt; topic was configured.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;quine-config&quot;&gt;Quine Config&lt;&#x2F;h3&gt;
&lt;p&gt;Ok, let&#x27;s get Quine configured and ready to receive the log events from Kafka via an ingest stream. We can start with a simple ingest stream that takes each demo log line and creates a node.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;ingestStreams:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  - type: KafkaIngest&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    topics:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      - demo-logs&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    bootstrapServers: localhost:9092&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    format:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      type: CypherJson&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      query: |-&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        MATCH (n)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        WHERE id(n) = idFrom($that)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        SET n.line = $that&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;h3 id=&quot;launch-the-pipeline&quot;&gt;Launch the Pipeline&lt;&#x2F;h3&gt;
&lt;p&gt;Let&#x27;s launch Vector and Quine to get the pipeline moving.&lt;&#x2F;p&gt;
&lt;p&gt;Launch Vector using the modified &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;gist.github.com&#x2F;maglietti&#x2F;abc26bb47c40940fb0b47ed37bed2c85&quot;&gt;vector.toml&lt;&#x2F;a&gt; configuration.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;❯ vector -c vector.toml&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Launch Quine by running the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&quot;&gt;Kafka Pipeline&lt;&#x2F;a&gt; recipe.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;❯ java -jar quine-x.x.x.jar -r kafka_pipeline.yaml&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;And verify that we see nodes generated in Quine.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Quine app web server available at http:&#x2F;&#x2F;0.0.0.0:8080&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;| =&amp;gt; INGEST-1 status is running and ingested 18&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Congratulations! 🎉 Your pipeline is operating!&lt;&#x2F;p&gt;
&lt;h2 id=&quot;improving-the-ingest-query&quot;&gt;Improving the Ingest Query&lt;&#x2F;h2&gt;
&lt;p&gt;The ingest query that I started with is pretty basic. Using &lt;code&gt;CALL recentNodes(1)&lt;&#x2F;code&gt;, let&#x27;s take a look at the newest node in the graph and see what the query produced.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;❯ ## Get Latest Node&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;curl -s -X &amp;quot;POST&amp;quot; &amp;quot;http:&#x2F;&#x2F;0.0.0.0:8080&#x2F;api&#x2F;v1&#x2F;query&#x2F;cypher&amp;quot; \&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;     -H &amp;#39;Content-Type: text&#x2F;plain&amp;#39; \&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;     -d &amp;quot;CALL recentNodes(1)&amp;quot; \&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;| jq &amp;#39;.&amp;#39;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;{&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;columns&amp;quot;: [&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;node&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  ],&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;results&amp;quot;: [&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    [&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;id&amp;quot;: &amp;quot;9fde7ef4-c5ec-35f1-ae5f-619bd9ab7d5c&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;labels&amp;quot;: [],&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;properties&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          &amp;quot;line&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;appname&amp;quot;: &amp;quot;benefritz&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;facility&amp;quot;: &amp;quot;uucp&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;hostname&amp;quot;: &amp;quot;make.de&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;message&amp;quot;: &amp;quot;#hugops to everyone who has to deal with this&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;msgid&amp;quot;: &amp;quot;ID873&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;procid&amp;quot;: 871,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;severity&amp;quot;: &amp;quot;emerg&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;timestamp&amp;quot;: &amp;quot;2022-06-14T19:58:16.463Z&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;version&amp;quot;: 1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    ]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  ]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The ingest query created nodes using &lt;code&gt;idFrom()&lt;&#x2F;code&gt;, populated them with the properties that it received from Kafka, and didn&#x27;t create any relationships. We can make this node more useful by giving it a label and removing parameters that are not interesting to us. Additionally, using &lt;code&gt;reify.time()&lt;&#x2F;code&gt;, I can associate the node with a &lt;code&gt;timeNode&lt;&#x2F;code&gt; to stitch together events that occur across the network in time.&lt;&#x2F;p&gt;
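&lt;p&gt;The key property of &lt;code&gt;idFrom()&lt;&#x2F;code&gt; is that the same input always yields the same node ID, so repeated events land on the same node. The Python sketch below illustrates only that idea. It is not Quine&#x27;s actual hashing scheme: &lt;code&gt;id_from&lt;&#x2F;code&gt; is a hypothetical helper, and the choice of &lt;code&gt;uuid5&lt;&#x2F;code&gt; with &lt;code&gt;NAMESPACE_URL&lt;&#x2F;code&gt; is an arbitrary assumption, so the IDs it produces will not match Quine&#x27;s.&lt;&#x2F;p&gt;

```python
import json
import uuid

# Arbitrary namespace chosen for this sketch; Quine uses its own scheme.
SKETCH_NAMESPACE = uuid.NAMESPACE_URL


def id_from(*values):
    """Derive a repeatable UUID from the given values (illustration only)."""
    # Sorting keys makes the result independent of dictionary ordering.
    canonical = json.dumps(values, sort_keys=True, separators=(",", ":"))
    return uuid.uuid5(SKETCH_NAMESPACE, canonical)


line = {"appname": "benefritz", "hostname": "make.de", "msgid": "ID873"}
same_line = {"msgid": "ID873", "hostname": "make.de", "appname": "benefritz"}

print(id_from(line) == id_from(same_line))         # same content, same ID
print(id_from(line) == id_from("something else"))  # different content, different ID
```

&lt;p&gt;Passing a narrower value, such as only the hostname, would make every event from that host resolve to a single node, which is one way to start building relationships.&lt;&#x2F;p&gt;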
&lt;h3 id=&quot;analyzing-the-sample-data&quot;&gt;Analyzing the sample data&lt;&#x2F;h3&gt;
&lt;p&gt;Quine has a web-based graph explorer that really comes to life once you have a handle on the shape of the streaming data. But I am starting from the beginning with a bare-bones recipe. For me, when I start pulling apart a stream of data, I find that using the API to ask a few analytical questions serves me well.&lt;&#x2F;p&gt;
&lt;p&gt;I&#x27;ll use the &lt;code&gt;&#x2F;query&#x2F;cypher&lt;&#x2F;code&gt; endpoint to get a feel for the shape of the sample data streaming from Kafka. I don&#x27;t recommend doing a full node scan on a mature streaming graph, but my streaming graph is still young and small.&lt;&#x2F;p&gt;
&lt;p&gt;Using my REST API client of choice, I POST a Cypher query that returns the metrics (counts) for parameters that are interesting.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;62ae70ea76a60d6ce4394d85_%20Post%20Cypher%20to%20Quine%20Kafka%20Ingest%20.png&quot; alt=&quot;Using my REST API client of choice, I POST a Cypher query that returns the metrics (counts) for parameters that are interesting.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;That&#x27;s a lot of JSON results to review; let&#x27;s take this over to a Jupyter Notebook to continue the analysis. My REST API client includes a Python snippet tool that makes it really easy to move directly into code without having to start from scratch.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;62ae70d4d2f730145b6b0bcc_Quine%20Kafka%20Python%20Ingest.png&quot; alt=&quot;My REST API client includes a Python snippet tool that makes it really easy to move directly into code without having to start from scratch.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;In &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;gist.github.com&#x2F;maglietti&#x2F;4fdbc681d703490811a8988e16b08b3f&quot;&gt;Jupyter&lt;&#x2F;a&gt;, within a few cells, I had the JSON response data loaded into a Pandas DataFrame and an easy to review textual visualization of what the sample data contains.&lt;&#x2F;p&gt;
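&lt;p&gt;The counting step can be sketched in plain Python. This is a minimal sketch, not the notebook linked above: it uses &lt;code&gt;collections.Counter&lt;&#x2F;code&gt; from the standard library in place of Pandas, and &lt;code&gt;build_query_request&lt;&#x2F;code&gt; and &lt;code&gt;count_field&lt;&#x2F;code&gt; are hypothetical helper names. The &lt;code&gt;sample_response&lt;&#x2F;code&gt; copies the response shape from the &lt;code&gt;recentNodes&lt;&#x2F;code&gt; output shown earlier, trimmed to two fields, with values taken from the two sample log lines in this post. Passing 10 to &lt;code&gt;recentNodes&lt;&#x2F;code&gt; is an assumption.&lt;&#x2F;p&gt;

```python
import urllib.request
from collections import Counter

QUERY_URL = "http://0.0.0.0:8080/api/v1/query/cypher"  # endpoint from the curl example


def build_query_request(cypher):
    """Build (but do not send) a POST request carrying a Cypher query as plain text."""
    return urllib.request.Request(
        QUERY_URL,
        data=cypher.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def count_field(response, field):
    """Count the values of one log field across every node in a query response."""
    counts = Counter()
    for row in response["results"]:
        for node in row:
            counts[node["properties"]["line"][field]] += 1
    return counts


# Same shape as the recentNodes output, trimmed to the fields used here.
sample_response = {
    "columns": ["node"],
    "results": [
        [{"labels": [], "properties": {"line": {"facility": "lpr", "severity": "debug"}}}],
        [{"labels": [], "properties": {"line": {"facility": "uucp", "severity": "emerg"}}}],
    ],
}

request = build_query_request("CALL recentNodes(10)")
print(request.get_method(), request.full_url)
print(count_field(sample_response, "severity"))
```

&lt;p&gt;Against a running Quine instance, the request could be sent with &lt;code&gt;urllib.request.urlopen(request)&lt;&#x2F;code&gt; and the body parsed with &lt;code&gt;json.load&lt;&#x2F;code&gt; before counting.&lt;&#x2F;p&gt;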
&lt;p&gt;I let the pipeline run while I developed simple visualizations of the metrics. Right away, I could see that the sample data Vector produces is random and uniformly distributed across all of the parameters in the graph. And after 15,000 log lines, the sample generation exhausted all permutations of the data.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;62ae70390eca250e373b4403_Kafka%20Quine%20Ingest%20Graph%20Distribution%20of%20Events.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;h2 id=&quot;conclusions-and-next-steps&quot;&gt;Conclusions and Next Steps&lt;&#x2F;h2&gt;
&lt;p&gt;I learned a lot about streaming data while setting up this pipeline. Vector is a great tool that allows you to stream log files into Kafka for analysis. Add a Quine instance on the other side of Kafka, and you are able to perform streaming analytics inside a streaming graph using standing queries.&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Use the same workflow to develop an understanding of streaming data that you do for data at rest&lt;&#x2F;li&gt;
&lt;li&gt;Perform streaming analysis by connecting Quine to your Kafka cluster&lt;&#x2F;li&gt;
&lt;li&gt;Use Cypher ingest queries to form the graph within a Quine ingest stream.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Quine is open source if you want to run this analysis for yourself. Download a precompiled version or build it yourself from the codebase on &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;thatdot&#x2F;quine&quot;&gt;GitHub&lt;&#x2F;a&gt;. I published the recipe that I developed at &lt;code&gt;https:&#x2F;&#x2F;quine.io&#x2F;recipes&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;Have a question, suggestion, or improvement? I welcome your feedback! Please drop into &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;that.re&#x2F;quine-slack&quot;&gt;Quine Slack&lt;&#x2F;a&gt; and let me know. I&#x27;m always happy to discuss Quine or answer questions.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;further-reading&quot;&gt;Further Reading&lt;&#x2F;h3&gt;
&lt;p&gt;If you&#x27;re interested in learning more about building a streaming graph from various ingest sources, check out the previous installments in this blog series:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;building-a-quine-streaming-graph-ingest-streams&#x2F;&quot;&gt;Building a Quine Streaming Graph: Ingest Streams&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;ingesting-data-from-the-internet&#x2F;&quot;&gt;Ingesting data from the internet into Quine Streaming Graph&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;ingesting-from-multiple-data-sources-into-quine-streaming-graph&#x2F;&quot;&gt;Ingesting From Multiple Data Sources into Quine Streaming Graph&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;ingest-and-analyze-log-files-using-streaming-graph&#x2F;&quot;&gt;Ingest and Analyze Log Files Using Streaming Graph&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Modernizing ETL For Cloud</title>
        <published>2022-06-13T00:00:00+00:00</published>
        <updated>2022-06-13T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/modernizing-etl-for-cloud/"/>
        <id>https://www.thatdot.com/blog/modernizing-etl-for-cloud/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/modernizing-etl-for-cloud/">&lt;h2 id=&quot;quine-streaming-graph-a-new-approach-to-etl-for-cloud&quot;&gt;Quine Streaming Graph: A New Approach to ETL for Cloud&lt;&#x2F;h2&gt;
&lt;p&gt;Cloud architectures enable and encourage a new level of integration with 3rd party systems and data sources to deliver the enriched and personalized services our users and customers are looking for. Today’s data-driven services place significant new demands on our data pipelines, in terms of scale, agility and flexibility. Recent data pipeline evolution has focused on improving efficiency of existing ingestion workflows, but what we really need is to rethink the objective of data pipelines and let the needed form follow.&lt;&#x2F;p&gt;
&lt;p&gt;If our purpose is to drive event-driven architectures, train AI algorithms and filter big data for valuable data, then the real objective of a modern data pipeline is to assemble, distill and publish only the most relevant data needed to better inform and monitor our software infrastructure. We don’t want big lakes of data, we want small streams of high-value data.&lt;&#x2F;p&gt;
&lt;p&gt;Identifying, processing and packaging high-value data requires a lot from our data pipelines. Unlike ETL of the past, which operated within a limited and deterministic scope between a source and a sink, the cloud-era requires a much broader set of functions:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Rapid adoption of an ever-growing set of unstandardized data sources&lt;&#x2F;li&gt;
&lt;li&gt;Accommodating a range of ingestion methods, including files, APIs, and webhooks&lt;&#x2F;li&gt;
&lt;li&gt;Ingestion of data from a global footprint of partners and sources&lt;&#x2F;li&gt;
&lt;li&gt;Producing custom formats of data for consumption by our applications&lt;&#x2F;li&gt;
&lt;li&gt;Real-time processing of data&lt;&#x2F;li&gt;
&lt;li&gt;Simple management of ongoing ingestion and publication changes&lt;&#x2F;li&gt;
&lt;li&gt;Ease of use in support of a widening audience of less technical data users&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;This is a big ask.&lt;&#x2F;p&gt;
&lt;p&gt;The recent trend towards Data Lakes, Data Warehouses, Data Lake Houses, etc. has solved for some inefficiencies in data pipeline processing by concentrating data operations to avoid duplication of effort and data storage. These solutions, however, do not remove the complexity of downstream processing that is needed to make our data more valuable in terms of timeliness, relevance or insight. Data lakes push data pipeline complexity “underwater”; they do not eliminate it.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2c988cd3a4c7222a85f28_Datalakes_Do_Not_Solve_Complexity-296x300.png&quot; alt=&quot;Why Datalakes Do Not Solve Complexity&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;h5 id=&quot;data-lakes-move-data-pipeline-operational-complexity-underwater&quot;&gt;Data lakes move data pipeline operational complexity “underwater”&lt;&#x2F;h5&gt;
&lt;p&gt;Newer, real-time ETL solutions such as Apache Kafka combined with thatDot&#x27;s open source streaming graph &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;http:&#x2F;&#x2F;quine.io&quot;&gt;Quine&lt;&#x2F;a&gt;, however, promise a more “cloud-centric” approach to data pipeline engineering. These solutions combine multi-modal distributed data ingestion with real-time data transformation and computation, in the data ingestion process itself. The ability to operate on data as it is ingested provides significantly more efficient and simplified data operations, while expanding the range of functions available.&lt;&#x2F;p&gt;
&lt;p&gt;This approach of adding computation to data ingestion also brings a significant advantage in terms of distilling value from our data, turning big data into smart data, before it gets to our applications! This can be especially useful in use cases such as feeding data into ML&#x2F;AI solutions or for reducing data volume passed to downstream applications.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;62a7b63410c03b9542d80e69_ETL%20in%20the%20Cloud%20Quine.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;h5 id=&quot;embedding-compute-with-ingestion-is-efficient-and-delivers-real-time-etl&quot;&gt;Embedding compute with ingestion is efficient and delivers real-time ETL&lt;&#x2F;h5&gt;
&lt;p&gt;The tight integration of data operations capabilities directly with data streams ingestion delivers the wide range of capabilities needed to deliver on our “modern data pipeline” requirements.&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Efficient&lt;&#x2F;strong&gt; – a single system to orchestrate global data ingestion, transformation and publication of data in real time, operated using common tooling and methodologies&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;“Cloud” Data Ingestion&lt;&#x2F;strong&gt; – graceful accommodation of API and webhook integrations, distributed data ingestion, per-source configurable ingestion adaptors&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Real-time ETL&lt;&#x2F;strong&gt; – data is operated upon as it is ingested, combined with historical data as needed from any time window, and directly published to downstream systems&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Out-of-Order Data Handling&lt;&#x2F;strong&gt; – data is processed correctly regardless of the order in which it arrives or when it arrives&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Event Multiplexing&lt;&#x2F;strong&gt; – Decompose strings, CSV and JSON data into atomic elements that can be individually transformed and reassembled into custom data for use by downstream services&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Customizable Publication&lt;&#x2F;strong&gt; – Extensible operation by individual work groups, allowing them to define data format and transformation operations with common tools&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Manageability &amp;amp; Usability&lt;&#x2F;strong&gt; – Cloud-friendly system deployment and management with common tools and methodologies, and a single system path to explore for debugging&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;It is fantastic when new technologies allow us to increase speed and function, while also reducing complexity. The combination of compute functions with data ingestion provides a new way to meet business requirements, bringing a new level of agility and efficiency to increasingly complex data pipelines.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;from-streaming-graph-theory-to-practice&quot;&gt;From Streaming Graph Theory to Practice&lt;&#x2F;h3&gt;
&lt;p&gt;We&#x27;ve published a series of how-to blogs that take you step-by-step through the ETL process using Quine&#x27;s ingest feature. Together with &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;components&#x2F;ingest-sources&#x2F;ingest-sources.html#ingesting-event-driven-data&quot;&gt;Quine Docs,&lt;&#x2F;a&gt; these blogs will show you how to process high volumes of data with an intelligent, actor-based ETL system that can drive workflows.&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;building-a-quine-streaming-graph-ingest-streams&#x2F;&quot;&gt;Building a Quine Streaming Graph: Ingest Streams&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;ingesting-data-from-the-internet&#x2F;&quot;&gt;Ingesting data from the internet into Quine Streaming Graph&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;ingesting-from-multiple-data-sources-into-quine-streaming-graph&#x2F;&quot;&gt;Ingesting From Multiple Data Sources into Quine Streaming Graph&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;ingest-and-analyze-log-files-using-streaming-graph&#x2F;&quot;&gt;Ingest and Analyze Log Files Using Streaming Graph&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;next-steps&quot;&gt;Next Steps&lt;&#x2F;h3&gt;
&lt;p&gt;If you want to try Quine yourself, you can &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;download&quot;&gt;download&lt;&#x2F;a&gt; it. To get started, try the &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;real-time-blockchain-monitoring-is-hard-and-your-database-is-the-reason&#x2F;&quot;&gt;Ethereum Blockchain Fraud Detection&lt;&#x2F;a&gt;, &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;wikipedia-page-ingest&quot;&gt;Wikipedia Ingest&lt;&#x2F;a&gt;, or &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;apache-log-analytics&quot;&gt;Apache Log Analytics&lt;&#x2F;a&gt; recipes for different ingest stream examples.&lt;&#x2F;p&gt;
&lt;p&gt;If you have questions or want to check out the community, &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;that.re&#x2F;quine-slack&quot;&gt;join Quine Slack&lt;&#x2F;a&gt; or visit our &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;thatdot&#x2F;quine&quot;&gt;GitHub&lt;&#x2F;a&gt; page.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Ingest and Analyze Log Files Using Streaming Graph</title>
        <published>2022-06-07T00:00:00+00:00</published>
        <updated>2022-06-07T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/ingest-and-analyze-log-files-using-streaming-graph/"/>
        <id>https://www.thatdot.com/blog/ingest-and-analyze-log-files-using-streaming-graph/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/ingest-and-analyze-log-files-using-streaming-graph/">&lt;h2 id=&quot;processing-machine-logs-with-streaming-graph&quot;&gt;Processing Machine Logs with Streaming Graph&lt;&#x2F;h2&gt;
&lt;p&gt;You know we had to get here eventually. I&#x27;m looking into all of the ways that Quine can connect to and ingest streaming sources. Last time I covered &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;ingesting-from-multiple-data-sources-into-quine-streaming-graph&#x2F;&quot;&gt;ingest from multiple sources,&lt;&#x2F;a&gt; a Quine strength. Next up is my old friend, the log file.&lt;&#x2F;p&gt;
&lt;p&gt;Log files are structured streams of data that can be parsed with regular expressions. Log lines are emitted at all levels of an application. The challenge is that they are primarily islands of disconnected bits of the overall picture. Placed into a data pipeline, we can use Quine to combine different types of logs and use a standing query to match interesting patterns upstream of a log analytics solution like Splunk or Sumo Logic.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;log-line-structure&quot;&gt;Log Line Structure&lt;&#x2F;h2&gt;
&lt;p&gt;Processing log files can quickly become as messy as the log files themselves. I think that it&#x27;s best to approach a log file like any other data source and take the time to understand the log line structure before asking any questions.&lt;&#x2F;p&gt;
&lt;p&gt;Quine is an application that produces log lines, and just like many other applications, the structure of the log lines follows a pattern. The log line pattern is defined in Scala, making it very easy for us to understand what the log line contains.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;pattern = &amp;quot;%date %level [%mdc{akkaSource:-NotFromActor}] [%thread] %logger - %msg%n%ex&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;h2 id=&quot;quine-log-regex&quot;&gt;Quine Log RegEx&lt;&#x2F;h2&gt;
&lt;p&gt;Each Quine log line was assembled using the pre-defined pattern. This presents a perfect opportunity to use a regular expression, reverse the pattern, and build a streaming graph.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;NOTE&lt;&#x2F;strong&gt;&lt;br &#x2F;&gt;
The regex link in the example below uses the log output from a Quine Enterprise cluster.&lt;br &#x2F;&gt;
Learn more about the Streaming Graph and other products created by thatDot.&lt;br &#x2F;&gt;
The regular expression will work for both Streaming Graph and Novelty.&lt;&#x2F;p&gt;
&lt;p&gt;I developed a &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;regex101.com&#x2F;r&#x2F;02qYsJ&#x2F;3&quot;&gt;regular expression&lt;&#x2F;a&gt; that reverses the log line and returns the log elements for use by the ingest stream&#x27;s ingest query. I also &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;quine-logs-recipe&quot;&gt;published a recipe&lt;&#x2F;a&gt; on Quine.io that uses the regular expression to parse Quine log lines.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;(^\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2},\d{3}) # date and time string &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;(FATAL|ERROR|WARN|INFO|DEBUG)                  # log level&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;\[(\S*)\]                                      # actor address&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;\[(\S*)\]                                      # thread name&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;(\S*)                                          # logging class&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;-                                              # the log message&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;((?:(?!^[0-9]{4}(?:-[0-9]{2}){2}(?:[^|\r?\n]+){3}).*(?:\r?\n)?)+)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;h2 id=&quot;quine-log-ingest-stream&quot;&gt;Quine Log Ingest Stream&lt;&#x2F;h2&gt;
&lt;p&gt;In my previous article, I connected to a &lt;strong&gt;&lt;code&gt;CSV&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; file using the &lt;strong&gt;&lt;code&gt;CypherCsv FileIngest&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; format so that Quine could break the rows of data stored in the file back into columns. The &lt;strong&gt;&lt;code&gt;CypherLine FileIngest&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; format allows us to read each line into the &lt;code&gt;$that&lt;&#x2F;code&gt; variable and process it through a Cypher query.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;ingestStreams:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  - type: FileIngest&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    path: $in_file&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    format:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      type: CypherLine&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      query: |-&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &#x2F;&#x2F; Quine log pattern &amp;quot;%date %level [%mdc{akkaSource:-NotFromActor}] [%thread] %logger - %msg%n%ex&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        WITH text.regexFirstMatch($that, &amp;quot;(^\\d{4}-\\d{2}-\\d{2} \\d{1,2}:\\d{2}:\\d{2},\\d{3}) (FATAL|ERROR|WARN|INFO|DEBUG) \\[(\\S*)\\] \\[(\\S*)\\] (\\S*) - (.*)&amp;quot;) as r &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        WHERE r IS NOT NULL &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &#x2F;&#x2F; 0: whole matched line&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &#x2F;&#x2F; 1: date time string&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &#x2F;&#x2F; 2: log level&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &#x2F;&#x2F; 3: actor address. Might be inside of `akka.stream.Log(…)`&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &#x2F;&#x2F; 4: thread name&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &#x2F;&#x2F; 5: logging class&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &#x2F;&#x2F; 6: Message&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        WITH *, split(r[3], &amp;quot;&#x2F;&amp;quot;) as path, split(r[6], &amp;quot;(&amp;quot;) as msgPts&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        WITH *, replace(COALESCE(split(path[2], &amp;quot;@&amp;quot;)[-1], &amp;#39;No host&amp;#39;),&amp;quot;)&amp;quot;,&amp;quot;&amp;quot;) as qh&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        MATCH (actor), (msg), (class), (host)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        WHERE id(host)  = idFrom(&amp;quot;host&amp;quot;, qh)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          AND id(actor) = idFrom(&amp;quot;actor&amp;quot;, r[3])&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          AND id(msg)   = idFrom(&amp;quot;msg&amp;quot;, r[0])&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          AND id(class) = idFrom(&amp;quot;class&amp;quot;, r[5])&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        SET host: Host, host.address = split(qh, &amp;quot;:&amp;quot;)[0], host.port = split(qh, &amp;quot;:&amp;quot;)[-1], host.host = qh,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            actor: Actor, actor.address = r[3], actor.id = replace(path[-1],&amp;quot;)&amp;quot;,&amp;quot;&amp;quot;), actor.shard = path[-2], actor.type = path[-3],&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            msg: Message, msg.msg = r[6], msg.type = split(msgPts[0], &amp;quot; &amp;quot;)[0], msg.level = r[2],&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            class: Class, class.class = r[5]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        WITH * CALL reify.time(datetime({date: localdatetime(r[1], &amp;quot;yyyy-MM-dd HH:mm:ss,SSS&amp;quot;)})) YIELD node AS time&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        CREATE (actor)-[:sent]-&amp;gt;(msg),&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;               (actor)-[:of_class]-&amp;gt;(class),&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;               (actor)-[:on_host]-&amp;gt;(host),&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;               (msg)-[:at_time]-&amp;gt;(time)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The ingest stream definition:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Reads Quine log lines from a file&lt;&#x2F;li&gt;
&lt;li&gt;Parses each line with regex&lt;&#x2F;li&gt;
&lt;li&gt;Creates host, actor, message, and class nodes&lt;&#x2F;li&gt;
&lt;li&gt;Populates the node properties&lt;&#x2F;li&gt;
&lt;li&gt;Relates the nodes in the streaming graph&lt;&#x2F;li&gt;
&lt;li&gt;Anchors the message with a relationship to a time node from &lt;code&gt;reify.time&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
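&lt;p&gt;The parsing steps above can also be tried outside of Quine. The following is a minimal Python sketch, not part of the recipe: it applies the same regular expression as the ingest query to a single line and mirrors the query&#x27;s host extraction. The sample log line, the actor address in it, and the helper names &lt;code&gt;parse_log_line&lt;&#x2F;code&gt; and &lt;code&gt;host_of&lt;&#x2F;code&gt; are invented for illustration.&lt;&#x2F;p&gt;

```python
import re

# Same expression as the ingest query, written as a Python raw string.
QUINE_LOG_RE = re.compile(
    r"(^\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2},\d{3}) "
    r"(FATAL|ERROR|WARN|INFO|DEBUG) "
    r"\[(\S*)\] \[(\S*)\] (\S*) - (.*)"
)


def parse_log_line(line):
    """Return [whole match, date, level, actor, thread, class, message] or None."""
    match = QUINE_LOG_RE.search(line)
    if match is None:
        return None
    return [match.group(0), *match.groups()]


def host_of(actor_address):
    """Mirror the query: third '/'-separated segment, text after '@', ')' removed."""
    path = actor_address.split("/")
    try:
        segment = path[2]
    except IndexError:
        # Stands in for the COALESCE(..., 'No host') fallback in the query.
        return "No host"
    return segment.split("@")[-1].replace(")", "")


# Hypothetical sample line, made up to fit the log pattern.
sample = (
    "2022-06-07 09:15:02,123 DEBUG "
    "[akka://graph-service@127.0.0.1:25520/user/shard-1/node-42] "
    "[graph-service-dispatcher-7] com.example.Worker - Received message (id=1)"
)

r = parse_log_line(sample)
print("level:", r[2])
print("host:", host_of(r[3]))
print("message:", r[6])
```

&lt;p&gt;Lines that do not match return &lt;code&gt;None&lt;&#x2F;code&gt;, which plays the same role as the &lt;code&gt;WHERE r IS NOT NULL&lt;&#x2F;code&gt; filter in the ingest query.&lt;&#x2F;p&gt;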
&lt;h2 id=&quot;configuring-quine-logs&quot;&gt;Configuring Quine Logs&lt;&#x2F;h2&gt;
&lt;p&gt;Ok, let&#x27;s run this recipe and see how it works. By default, the log level in Quine is set to WARN. We can increase the log level in the configuration or pass in a Java system configuration property when we launch Quine.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;NOTE&lt;&#x2F;strong&gt;&lt;br &#x2F;&gt;
Set the log level in Quine (or Quine Enterprise) via the &lt;code&gt;thatdot.loglevel&lt;&#x2F;code&gt; configuration option.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;setting-log-level-in-configuration&quot;&gt;&lt;strong&gt;Setting Log Level in Configuration&lt;&#x2F;strong&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;Start by getting your current Quine configuration. The easiest way to get the configuration is to start Quine and then &lt;code&gt;GET&lt;&#x2F;code&gt; the configuration via an API call.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;❯ curl --request GET \&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  --url http:&#x2F;&#x2F;0.0.0.0:8080&#x2F;api&#x2F;v1&#x2F;admin&#x2F;config \&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  --header &amp;#39;Content-Type: application&#x2F;json&amp;#39; \&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&amp;gt; quine.conf&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Edit the &lt;code&gt;quine.conf&lt;&#x2F;code&gt; file and add &lt;code&gt;&quot;thatdot&quot;:{&quot;loglevel&quot;:&quot;DEBUG&quot;}&lt;&#x2F;code&gt;&lt;strong&gt;,&lt;&#x2F;strong&gt; before the &lt;code&gt;quine&lt;&#x2F;code&gt; object.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;❯ jq &amp;#39;.&amp;#39; quine.conf&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;{&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;thatdot&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;loglevel&amp;quot;: &amp;quot;DEBUG&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  },&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;quine&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;decline-sleep-when-access-within&amp;quot;: &amp;quot;0&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;decline-sleep-when-write-within&amp;quot;: &amp;quot;100ms&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;dump-config&amp;quot;: false,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;edge-iteration&amp;quot;: &amp;quot;reverse-insertion&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;id&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;partitioned&amp;quot;: false,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;type&amp;quot;: &amp;quot;uuid&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    },&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;in-memory-hard-node-limit&amp;quot;: 75000,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;in-memory-soft-node-limit&amp;quot;: 10000,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;labels-property&amp;quot;: &amp;quot;__LABEL&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;metrics-reporters&amp;quot;: [&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;type&amp;quot;: &amp;quot;jmx&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    ],&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;persistence&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;effect-order&amp;quot;: &amp;quot;memory-first&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;journal-enabled&amp;quot;: true,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;snapshot-schedule&amp;quot;: &amp;quot;on-node-sleep&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;snapshot-singleton&amp;quot;: false,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;standing-query-schedule&amp;quot;: &amp;quot;on-node-sleep&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    },&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;shard-count&amp;quot;: 4,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;should-resume-ingest&amp;quot;: false,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;store&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;create-parent-dir&amp;quot;: false,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;filepath&amp;quot;: &amp;quot;quine.db&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;sync-all-writes&amp;quot;: false,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;type&amp;quot;: &amp;quot;rocks-db&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;write-ahead-log&amp;quot;: true&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    },&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;timeout&amp;quot;: &amp;quot;2m&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;webserver&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;address&amp;quot;: &amp;quot;0.0.0.0&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;enabled&amp;quot;: true,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;port&amp;quot;: 8080&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
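&lt;p&gt;If you would rather script the edit than make it by hand, something like the following Python sketch would do it. It assumes the downloaded configuration is plain JSON, as the &lt;code&gt;jq&lt;&#x2F;code&gt; output above suggests; the helper name &lt;code&gt;with_log_level&lt;&#x2F;code&gt; and the trimmed-down sample configuration are made up for this example.&lt;&#x2F;p&gt;

```python
import json


def with_log_level(config, level="DEBUG"):
    """Return a copy of the config with thatdot.loglevel set, placed first."""
    thatdot = dict(config.get("thatdot", {}))
    thatdot["loglevel"] = level
    updated = {"thatdot": thatdot}
    for key, value in config.items():
        if key != "thatdot":
            updated[key] = value
    return updated


# Trimmed-down stand-in for the downloaded configuration (hypothetical subset).
downloaded = json.loads('{"quine": {"shard-count": 4, "timeout": "2m"}}')

result = with_log_level(downloaded)
print(json.dumps(result, indent=2))
```

&lt;p&gt;For the real file, load &lt;code&gt;quine.conf&lt;&#x2F;code&gt; with &lt;code&gt;json.load&lt;&#x2F;code&gt;, pass the result through the helper, and write it back with &lt;code&gt;json.dump&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;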
&lt;p&gt;Now, restart Quine and include the &lt;code&gt;config.file&lt;&#x2F;code&gt; property.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;java -Dconfig.file=quine.conf -jar quine-x.x.x.jar &amp;gt; quineLog.log&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;code&gt;DEBUG&lt;&#x2F;code&gt; level log lines will stream into the &lt;code&gt;quineLog.log&lt;&#x2F;code&gt; file.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;passing-log-level-at-runtime&quot;&gt;Passing Log Level at Runtime&lt;&#x2F;h3&gt;
&lt;p&gt;Another slightly more straightforward way to enable Quine logs is to pass in a Java system configuration property. Here&#x27;s how to start Quine and enable logging from the command line.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;java -Dthatdot.loglevel=DEBUG -jar quine-x.x.x.jar &amp;gt; quineLog.log&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;code&gt;DEBUG&lt;&#x2F;code&gt; level log lines will stream into the &lt;code&gt;quineLog.log&lt;&#x2F;code&gt; file.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;ingesting-other-log-formats&quot;&gt;Ingesting Other Log Formats&lt;&#x2F;h2&gt;
&lt;p&gt;You can easily modify the regex I developed for Quine log lines above to parse similar log output, like those found in *nix based system files or other Java applications.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;standard-ish-java-log-output&quot;&gt;Standard-ish Java Log Output&lt;&#x2F;h3&gt;
&lt;p&gt;Depending on the &lt;code&gt;log level&lt;&#x2F;code&gt;, Java emits a lot of information into logs. This ingest stream handles application log lines from most Java applications. Sometimes the log entry itself spans multiple lines.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;- type: FileIngest&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  path: $app_log&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  format:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    type: CypherJson&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    query: |-&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      WITH *, text.regexFirstMatch($that.message, &amp;#39;^(\\d{4}(?:-\\d{2}){2}(?:[^]\\r?\\n]+))\\s+?\\[(.+?)\\]\\s+?(\\S+?)\\s+(.+?)\\s+\\-\\s+((?:(?!^\\d{4}(?:-\\d{2}){2}(?:[^|\\r?\\n]+){3}).*(?:\\r?\\n)?)+)&amp;#39;) AS r WHERE r IS NOT NULL&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      CREATE (log {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        timestamp: r[1],&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        component: r[2],&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        level: r[3],&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        subprocess: r[4],&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        message: r[5],&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        type: &amp;#39;log&amp;#39;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      })&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &#x2F;&#x2F; Create hour&#x2F;minute buckets per event&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      WITH * WHERE r[1] IS NOT NULL CALL reify.time(datetime({date: localdatetime(r[1], &amp;quot;yyyy-MM-dd HH:mm:ss,SSS&amp;quot;)}), [&amp;quot;hour&amp;quot;,&amp;quot;minute&amp;quot;]) YIELD node AS timeNode&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &#x2F;&#x2F; Create edges for timeNodes&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      CREATE (log)-[:at]-&amp;gt;(timeNode)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;h3 id=&quot;ubuntu-ubuntu-22-04-lts-syslog&quot;&gt;Ubuntu 22.04 LTS Syslog&lt;&#x2F;h3&gt;
&lt;p&gt;If you&#x27;re developing distributed applications, you will most likely need a regular expression that parses the Ubuntu &lt;code&gt;&#x2F;var&#x2F;log&#x2F;syslog&lt;&#x2F;code&gt; file. First, edit &lt;code&gt;&#x2F;etc&#x2F;rsyslog.conf&lt;&#x2F;code&gt; and comment out the &lt;code&gt;$ActionFileDefaultTemplate&lt;&#x2F;code&gt; line shown below, so that rsyslog emits the high-precision RFC 3339 timestamps that the regular expression in this section expects.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;#&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;# Use traditional timestamp format.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;# To enable high precision timestamps, comment out the following line.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;#&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The log line format is:&lt;br &#x2F;&gt;
&lt;code&gt;%timestamp:::date-rfc3339% %HOSTNAME% %app-name% %procid% %msgid% %msg%\n&lt;&#x2F;code&gt;&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;- type: FileIngest&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  path: $syslog&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  format:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    type: CypherLine&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    query: |-&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      WITH text.regexFirstMatch($that, &amp;#39;^(\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.\\d*?\\+\\d{2}:\\d{2}|Z).?\\s(.*?)(?=\\s).?\\s(\\S+)\\[(\\S+)\\]:\\s(.*)&amp;#39;) AS s WHERE s IS NOT NULL&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      CREATE (syslog {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        timestamp: s[1],&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        hostname: s[2],&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        app_name: s[3],&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        proc_id: s[4],&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        message: s[5],&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        type: &amp;#39;syslog&amp;#39;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      })&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &#x2F;&#x2F; Create hour&#x2F;minute buckets per event&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      WITH * WHERE s[1] IS NOT NULL CALL reify.time(datetime({date: localdatetime(s[1], &amp;quot;yyyy-MM-dd&amp;#39;T&amp;#39;HH:mm:ss.SSSSSSz&amp;quot;)}), [&amp;quot;hour&amp;quot;,&amp;quot;minute&amp;quot;]) YIELD node AS timeNode&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &#x2F;&#x2F; Create edges for timeNodes&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      CREATE (syslog)-[:at]-&amp;gt;(timeNode)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;h3 id=&quot;mysql-error-log&quot;&gt;MySQL Error Log&lt;&#x2F;h3&gt;
&lt;p&gt;If you work on a web application that&#x27;s been around for a while, it&#x27;s probably sitting on top of a MySQL database. The traditional-format MySQL log messages have these &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;dev.mysql.com&#x2F;doc&#x2F;refman&#x2F;8.0&#x2F;en&#x2F;error-log-format.html&quot;&gt;fields&lt;&#x2F;a&gt;:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;code&gt;time thread [label] [err_code] [subsystem] msg&lt;&#x2F;code&gt;&lt;&#x2F;p&gt;
&lt;p&gt;For example:&lt;br &#x2F;&gt;
&lt;code&gt;2022-04-14T06:55:26.961757Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Socket: &#x2F;var&#x2F;run&#x2F;mysqld&#x2F;mysqlx.sock&lt;&#x2F;code&gt;&lt;&#x2F;p&gt;
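&lt;p&gt;Before wiring a pattern into an ingest stream, it can help to try it against a sample line. The following is a minimal sketch, assuming you run it as an ad hoc Cypher query and that the pattern shown (an assumption about the log layout, with &lt;code&gt;\\d+&lt;&#x2F;code&gt; accepting multi-digit thread ids) fits your log files. It uses the same &lt;code&gt;text.regexFirstMatch&lt;&#x2F;code&gt; function as the ingest queries in this post and returns the result for the example line.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&#x2F;&#x2F; Sketch: check a pattern against the sample line before ingesting&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WITH &amp;#39;2022-04-14T06:55:26.961757Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Socket: &#x2F;var&#x2F;run&#x2F;mysqld&#x2F;mysqlx.sock&amp;#39; AS line&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;RETURN text.regexFirstMatch(line, &amp;#39;^(\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.\\d{6}Z)\\s(\\d+)\\s\\[(\\S+)\\]\\s\\[(\\S+)\\]\\s\\[(\\S+)\\]\\s(.*)&amp;#39;) AS m&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The ingest queries in this post treat a null result as &quot;no match&quot; and read the captured groups starting at index 1, so the same conventions apply when inspecting &lt;code&gt;m&lt;&#x2F;code&gt; here.&lt;&#x2F;p&gt;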
&lt;p&gt;Add these log entries to your streaming graph for analysis too.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;- type: FileIngest&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  path: $sqlerr_log&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  format:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    type: CypherLine&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    query: |-&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      WITH text.regexFirstMatch($that, &amp;#39;^(\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\.\\d{6}Z)\\s(\\d+)\\s\\[(\\S+)\\]\\s\\[(\\S+)\\]\\s\\[(\\S+)\\]\\s(.*)&amp;#39;) AS m WHERE m IS NOT NULL&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      CREATE (sqllog {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        timestamp: m[1],&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        thread: m[2],&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        label: m[3],&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        err_code: m[4],&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        subsystem: m[5],&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        message: m[6],&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        type: &amp;#39;sqllog&amp;#39;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      })&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &#x2F;&#x2F; Create hour&#x2F;minute buckets per event&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      WITH * WHERE m[1] IS NOT NULL CALL reify.time(datetime({date: localdatetime(m[1], &amp;quot;yyyy-MM-dd&amp;#39;T&amp;#39;HH:mm:ss.SSSSSSz&amp;quot;)}), [&amp;quot;hour&amp;quot;,&amp;quot;minute&amp;quot;]) YIELD node AS timeNode&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &#x2F;&#x2F; Create edges for timeNodes&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      CREATE (sqllog)-[:at]-&amp;gt;(timeNode)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;&#x2F;h2&gt;
&lt;p&gt;Streaming data comes from all kinds of sources. With Quine, it&#x27;s easy to convert that data stream into a streaming graph.&lt;&#x2F;p&gt;
&lt;p&gt;Quine is open source, so you can run this analysis for yourself. Download a precompiled version or build it yourself from the codebase on &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;thatdot&#x2F;quine&quot;&gt;Quine GitHub&lt;&#x2F;a&gt;. I published the recipe that I developed at &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;quine-logs-recipe&quot;&gt;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;quine-log-recipe&lt;&#x2F;a&gt;. The page has instructions for downloading the &lt;code&gt;quineLog.log&lt;&#x2F;code&gt; files and running the recipe.&lt;&#x2F;p&gt;
&lt;p&gt;Have a question, suggestion, or improvement? I welcome your feedback! Please drop in to &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;that.re&#x2F;quine-slack&quot;&gt;Quine Slack&lt;&#x2F;a&gt; and let me know. I&#x27;m always happy to discuss Quine or answer questions.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Ingesting From Multiple Data Sources into Quine Streaming Graph</title>
        <published>2022-06-02T00:00:00+00:00</published>
        <updated>2022-06-02T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/ingesting-from-multiple-data-sources-into-quine-streaming-graph/"/>
        <id>https://www.thatdot.com/blog/ingesting-from-multiple-data-sources-into-quine-streaming-graph/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/ingesting-from-multiple-data-sources-into-quine-streaming-graph/">&lt;h2 id=&quot;building-a-streaming-graph-from-multiple-sources&quot;&gt;Building a Streaming Graph from Multiple Sources&lt;&#x2F;h2&gt;
&lt;p&gt;As part of the ongoing series in which I explore different ways to use the ingest stream to load data into Quine, I want to cover one of Quine&#x27;s specialities: building a streaming graph from multiple data sources. This time, we&#x27;ll work with CSV data exported from IMDb to answer the question: &lt;em&gt;&quot;Which actors have acted in and directed the same movie?&quot;&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;The CSV Files&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Usually, if someone says that they have data, most likely it&#x27;s going to be in &lt;code&gt;CSV&lt;&#x2F;code&gt; format or pretty darn close to it. (Or &lt;code&gt;JSON&lt;&#x2F;code&gt;, but that is another blog post.) In our case, we have two files filled with data in &lt;code&gt;CSV&lt;&#x2F;code&gt; format. Let&#x27;s inspect what&#x27;s inside.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;File 1:&lt;&#x2F;strong&gt; &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2560f76f6a75&#x2F;6298089ae7f6f6786995c761_movieData.csv&quot;&gt;&lt;strong&gt;movieData.csv&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;p&gt;The &lt;code&gt;movieData.csv&lt;&#x2F;code&gt; file contains records for actors, movies, and the actor&#x27;s relationship to the movie. Conveniently, each record type has a schema, flattened into rows during export.&lt;&#x2F;p&gt;
&lt;p&gt;Should we separate the data back into discrete files and then load them? No, we can set up separate ingest streams to act on each data type in the file. Effectively, we will separate the &quot;jobs to do&quot; into Cypher queries and stream in the data.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;File 2:&lt;&#x2F;strong&gt; &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2560f76f6a75&#x2F;629808a1744490320daca158_ratingData.csv&quot;&gt;&lt;strong&gt;ratingData.csv&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Our second file, &lt;code&gt;ratingData.csv&lt;&#x2F;code&gt;, is very straightforward. It contains 100,000 rows of movie ratings. Adding the &lt;code&gt;ratings&lt;&#x2F;code&gt; data into our model completes our discovery phase for the supplied data.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;62983e505d711b9875c7caa0_Schema%20for%20IMDB%20data%20.png&quot; alt=&quot;The IMDB csv data&amp;#39;s original RDBMS schema, including Person, Movie, Rating, and Join entities.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Original implied schema of IMDB data.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-cyphercsv-ingest-stream&quot;&gt;The &lt;strong&gt;CypherCsv&lt;&#x2F;strong&gt; Ingest Stream&lt;&#x2F;h2&gt;
&lt;p&gt;The Quine API documentation defines the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;reference&#x2F;rest-api.html#&#x2F;schemas&#x2F;com.thatdot.quine.routes.IngestStreamConfiguration&quot;&gt;schema&lt;&#x2F;a&gt; of the &lt;em&gt;File Ingest Format&lt;&#x2F;em&gt; ingest stream for us. The schema is robust and accommodates CSV, JSON, and line file types. Please take a moment to read through the documentation. Be sure to select type: FileIngest -&amp;gt; format: CypherCsv using the API documentation dropdowns.&lt;&#x2F;p&gt;
&lt;p&gt;I define ingest streams to transform and load the movie data into Quine. Quine ingest streams behave independently and in parallel when processing files. This means that we can have multiple ingest streams operating on a single file. This is the case for the movieData.csv file because there are several operations that we need to perform on multiple types of data.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;movie-rows&quot;&gt;Movie Rows&lt;&#x2F;h2&gt;
&lt;p&gt;The first ingest stream that I set up addresses the Movie rows in the movieData.csv file. There are 9,125 movies in the data set. Using an ingest query, I create two kinds of nodes from each Movie row: Movie and Genre. I store all of the movie data as properties on the Movie node.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WITH $that AS row&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (m) WHERE row.Entity = &amp;#39;Movie&amp;#39; AND id(m) = idFrom(&amp;quot;Movie&amp;quot;, row.movieId)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;SET&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  m:Movie,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  m.tmdbId = row.tmdbId,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  m.imdbId = row.imdbId,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  m.imdbRating = toFloat(row.imdbRating),&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  m.released = row.released,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  m.title = row.title,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  m.year = toInteger(row.year),&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  m.poster = row.poster,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  m.runtime = toInteger(row.runtime),&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  m.countries = split(coalesce(row.countries,&amp;quot;&amp;quot;), &amp;quot;|&amp;quot;),&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  m.imdbVotes = toInteger(row.imdbVotes),&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  m.revenue = toInteger(row.revenue),&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  m.plot = row.plot,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  m.url = row.url,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  m.budget = toInteger(row.budget),&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  m.languages = split(coalesce(row.languages,&amp;quot;&amp;quot;), &amp;quot;|&amp;quot;),&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  m.movieId = row.movieId&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WITH m,split(coalesce(row.genres,&amp;quot;&amp;quot;), &amp;quot;|&amp;quot;) AS genres&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;UNWIND genres AS genre&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WITH m, genre&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (g) WHERE id(g) = idFrom(&amp;quot;Genre&amp;quot;, genre)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;SET g.genre = genre, g:Genre&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MERGE (m:Movie)-[:IN_GENRE]-&amp;gt;(g:Genre)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Quine passes each line to the ingest stream via the variable &lt;code&gt;$that&lt;&#x2F;code&gt;, which I alias as &lt;code&gt;row&lt;&#x2F;code&gt;. The &lt;code&gt;MATCH&lt;&#x2F;code&gt; succeeds only when the &lt;code&gt;row.Entity&lt;&#x2F;code&gt; value is &lt;code&gt;Movie&lt;&#x2F;code&gt;, and it selects the node whose &lt;code&gt;id&lt;&#x2F;code&gt; is returned from the &lt;code&gt;idFrom()&lt;&#x2F;code&gt; function. &lt;code&gt;SET&lt;&#x2F;code&gt; is used to give the node a label and to store metadata as node properties.&lt;&#x2F;p&gt;
&lt;p&gt;Each movie row has a pipe &lt;code&gt;|&lt;&#x2F;code&gt; delimited list of genres in the &lt;code&gt;genres&lt;&#x2F;code&gt; column. I split the column value apart and create a Genre node for each genre in the list, labeled and containing the genre as a property.&lt;&#x2F;p&gt;
&lt;p&gt;Finally, the &lt;code&gt;Movie&lt;&#x2F;code&gt; node is related to the &lt;code&gt;Genre&lt;&#x2F;code&gt; node with &lt;code&gt;MERGE&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
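&lt;p&gt;Because &lt;code&gt;idFrom()&lt;&#x2F;code&gt; computes the same id from the same arguments, those arguments can be reused later to go straight to a node. The following is a minimal sketch, assuming the ingest has finished and that the data contains a genre named &quot;Comedy&quot; (a hypothetical value chosen for illustration). It looks up that Genre node by id and lists a few movies related to it.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&#x2F;&#x2F; &amp;quot;Comedy&amp;quot; is an assumed genre value; substitute one from your data&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (g) WHERE id(g) = idFrom(&amp;quot;Genre&amp;quot;, &amp;quot;Comedy&amp;quot;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (m:Movie)-[:IN_GENRE]-&amp;gt;(g)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;RETURN m.title AS title&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;LIMIT 10&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;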
&lt;h2 id=&quot;person-rows&quot;&gt;Person Rows&lt;&#x2F;h2&gt;
&lt;p&gt;The second ingest stream addresses the &lt;code&gt;Person&lt;&#x2F;code&gt; rows in the same way I did for the &lt;code&gt;Movie&lt;&#x2F;code&gt; rows. There are 19,047 person records in the &lt;code&gt;movieData.csv&lt;&#x2F;code&gt; file.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WITH $that AS row&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (p) WHERE row.Entity = &amp;quot;Person&amp;quot; AND id(p) = idFrom(&amp;quot;Person&amp;quot;, row.tmdbId)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;SET&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  p:Person,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  p.imdbId = row.imdbId,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  p.bornIn = row.bornIn,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  p.name = row.name,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  p.bio = row.bio,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  p.poster = row.poster,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  p.url = row.url,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  p.tmdbId = row.tmdbId,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  p.born = CASE row.born WHEN &amp;quot;&amp;quot; THEN null ELSE datetime(row.born + &amp;quot;T00:00:00Z&amp;quot;) END,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  p.died = CASE row.died WHEN &amp;quot;&amp;quot; THEN null ELSE datetime(row.died + &amp;quot;T00:00:00Z&amp;quot;) END&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The ingest query in this ingest stream matches when the &lt;code&gt;row.Entity&lt;&#x2F;code&gt; is &lt;code&gt;Person&lt;&#x2F;code&gt;, creates a node using the &lt;code&gt;idFrom()&lt;&#x2F;code&gt; function, and stores the Person metadata in node properties.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;join-rows&quot;&gt;Join Rows&lt;&#x2F;h2&gt;
&lt;p&gt;Looking at the rows that have &lt;code&gt;Join&lt;&#x2F;code&gt; in the &lt;code&gt;Entity&lt;&#x2F;code&gt; column leads me to believe that the data in this &lt;code&gt;CSV&lt;&#x2F;code&gt; file originated from a relational database. There are two types of joins in the file, &lt;code&gt;Acted&lt;&#x2F;code&gt; and &lt;code&gt;Directed&lt;&#x2F;code&gt;. The ingest queries below process them.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Acted In&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WITH $that AS row&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WITH row WHERE row.Entity = &amp;quot;Join&amp;quot; AND row.Work = &amp;quot;Acting&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (p) WHERE id(p) = idFrom(&amp;quot;Person&amp;quot;, row.tmdbId)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (m) WHERE id(m) = idFrom(&amp;quot;Movie&amp;quot;, row.movieId)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (r) WHERE id(r) = idFrom(&amp;quot;Role&amp;quot;, row.tmdbId, row.movieId, row.role)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;SET &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  r.role = row.role, &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  r.movie = row.movieId, &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  r.tmdbId = row.tmdbId, &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  r:Role&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MERGE (p:Person)-[:PLAYED]-&amp;gt;(r:Role)&amp;lt;-[:HAS_ROLE]-(m:Movie)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MERGE (p:Person)-[:ACTED_IN]-&amp;gt;(m:Movie)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Acted join rows create relationships between Person, Role, and Movie nodes. Two paths are created from each Person node. The first path, &lt;code&gt;(p)-[:PLAYED]-&amp;gt;(r)&amp;lt;-[:HAS_ROLE]-(m)&lt;&#x2F;code&gt;, connects an actor (Person) to each role they played and connects the movie (Movie) to that same role. The second path directly relates an actor to the movies they acted in.&lt;&#x2F;p&gt;
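&lt;p&gt;Once the Acted In rows are ingested, the longer path can be queried directly. The sketch below assumes a person named &quot;Tom Hanks&quot; exists in the data (a hypothetical value chosen for illustration) and follows the Person, Role, and Movie path to list each movie alongside the role played in it.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&#x2F;&#x2F; &amp;quot;Tom Hanks&amp;quot; is an assumed name; substitute one from your data&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (p:Person)-[:PLAYED]-&amp;gt;(r:Role)&amp;lt;-[:HAS_ROLE]-(m:Movie)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WHERE p.name = &amp;quot;Tom Hanks&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;RETURN m.title AS title, r.role AS role&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;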
&lt;p&gt;&lt;strong&gt;Directed&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WITH $that AS row&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WITH row WHERE row.Entity = &amp;quot;Join&amp;quot; AND row.Work = &amp;quot;Directing&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (p) WHERE id(p) = idFrom(&amp;quot;Person&amp;quot;, row.tmdbId)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (m) WHERE id(m) = idFrom(&amp;quot;Movie&amp;quot;, row.movieId)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MERGE (p:Person)-[:DIRECTED]-&amp;gt;(m:Movie)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The &lt;em&gt;Directed&lt;&#x2F;em&gt; ingest query matches join rows and creates a path relating directors with the movies they have directed.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;ratings&quot;&gt;Ratings&lt;&#x2F;h2&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WITH $that AS row&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (m) WHERE id(m) = idFrom(&amp;quot;Movie&amp;quot;, row.movieId)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (u) WHERE id(u) = idFrom(&amp;quot;User&amp;quot;, row.userId)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (rtg) WHERE id(rtg) = idFrom(&amp;quot;Rating&amp;quot;, row.movieId, row.userId, row.rating)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;SET u.name = row.name, u:User&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;SET rtg.rating = row.rating,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  rtg.timestamp = toInteger(row.timestamp),&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  rtg:Rating&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MERGE (u:User)-[:SUBMITTED]-&amp;gt;(rtg:Rating)&amp;lt;-[:HAS_RATING]-(m:Movie)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MERGE (u:User)-[:RATED]-&amp;gt;(m:Movie)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The last ingest query processes rows from the &lt;code&gt;ratingData.csv&lt;&#x2F;code&gt; file. The query creates User and Rating nodes, then relates them to each other and to the Movie node that was rated.&lt;&#x2F;p&gt;
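&lt;p&gt;As a quick sanity check on the ratings ingest, a query along these lines counts the Rating nodes attached to each movie. It is a sketch that relies only on the labels and relationships created by the ingest query above. Because &lt;code&gt;rtg.rating&lt;&#x2F;code&gt; is stored without numeric conversion, the sketch counts ratings rather than averaging them.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&#x2F;&#x2F; Sketch: the ten movies with the most ratings&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (m:Movie)-[:HAS_RATING]-&amp;gt;(rtg:Rating)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;RETURN m.title AS title, count(rtg) AS ratings&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;ORDER BY ratings DESC&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;LIMIT 10&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;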
&lt;h2 id=&quot;running-the-recipe&quot;&gt;Running the Recipe&lt;&#x2F;h2&gt;
&lt;p&gt;As my project progressed, I developed a &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;getting-started&#x2F;recipes-tutorial&#x2F;#what-is-a-quine-recipe&quot;&gt;Quine recipe&lt;&#x2F;a&gt; to load my &lt;code&gt;CSV&lt;&#x2F;code&gt; files and perform the analysis. Running the recipe requires a couple of Quine options to pass in the locations of the &lt;code&gt;CSV&lt;&#x2F;code&gt; files and an updated configuration setting.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;java \&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;-Dquine.in-memory-soft-node-limit=30000 \&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;-jar ..&#x2F;releases&#x2F;latest -r movieData \&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;--recipe-value movie_file=movieData.csv \&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;--recipe-value rating_file=ratingData.csv&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Ingesting the &lt;code&gt;CSV&lt;&#x2F;code&gt; files produces the following data set in Quine:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;6298028ec77f0df20e3f788f_Quine%20Schema%20IMDB%20Data.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;The data model in Quine for the IMDB data.&lt;&#x2F;p&gt;
&lt;p&gt;The orange Movie and Person nodes are created directly from the &lt;code&gt;Entity&lt;&#x2F;code&gt; column in &lt;code&gt;movieData.csv&lt;&#x2F;code&gt;. The User node is from &lt;code&gt;ratingData.csv&lt;&#x2F;code&gt; and the green nodes were derived from data stored within an entity row. The &lt;code&gt;ActedDirected&lt;&#x2F;code&gt; relationship is built by the standing query in the recipe.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;answering-the-question&quot;&gt;Answering the Question&lt;&#x2F;h2&gt;
&lt;p&gt;Getting all of this data into Quine was only part of the challenge. Remember the question that we were asked: &lt;em&gt;&quot;which actors have acted in and directed the same movie?&quot;&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Quine is a streaming graph; if we were to connect the ingest streams to the streaming source, rather than &lt;code&gt;CSV&lt;&#x2F;code&gt; files, the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;components&#x2F;writing-standing-queries.html&quot;&gt;standing query&lt;&#x2F;a&gt; inside of the recipe that I developed would answer the question for movies in the past as well as movies in the future.&lt;&#x2F;p&gt;
&lt;p&gt;Our standing query matches whenever the complete pattern is present: an actor (&lt;code&gt;Person&lt;&#x2F;code&gt;) who both &lt;code&gt;ACTED_IN&lt;&#x2F;code&gt; and &lt;code&gt;DIRECTED&lt;&#x2F;code&gt; the same movie.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (a:Movie)&amp;lt;-[:ACTED_IN]-(p:Person)-[:DIRECTED]-&amp;gt;(m:Movie) &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WHERE id(a) = id(m)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;RETURN id(m) as movieId, m.title as Movie, id(p) as personId, p.name as Actor&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;When the standing query completes a match, it processes the movie &lt;code&gt;id&lt;&#x2F;code&gt; and person &lt;code&gt;id&lt;&#x2F;code&gt; through the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;components&#x2F;writing-standing-queries.html#result-outputs&quot;&gt;output&lt;&#x2F;a&gt; query and actions.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;standingQueries:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  - pattern:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      type: Cypher&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      mode: MultipleValues&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      query: |-&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        MATCH (a:Movie)&amp;lt;-[:ACTED_IN]-(p:Person)-[:DIRECTED]-&amp;gt;(m:Movie) &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        WHERE id(a) = id(m)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        RETURN id(m) as movieId, m.title as Movie, id(p) as personId, p.name as Actor&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    outputs:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      set-ActedDirected:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        type: CypherQuery&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        query: |-&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          MATCH (m),(p)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          WHERE strId(m) = $that.data.movieId AND strId(p) = $that.data.personId&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          MERGE (p:Person)-[:ActedDirected]-&amp;gt;(m:Movie)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      log-actor-director:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        type: WriteToFile&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        path: &amp;quot;ActorDirector.jsonl&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The outputs of my standing query create a new &lt;code&gt;ActedDirected&lt;&#x2F;code&gt; relationship between the Person and Movie nodes, then log each match to the &lt;code&gt;ActorDirector.jsonl&lt;&#x2F;code&gt; file.&lt;&#x2F;p&gt;
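&lt;p&gt;To review the results inside the graph rather than in the log file, a query along these lines could be run once the recipe has processed the data. This is a minimal sketch that assumes the &lt;code&gt;ActedDirected&lt;&#x2F;code&gt; relationships from the output above have been created; the &lt;code&gt;LIMIT&lt;&#x2F;code&gt; value is arbitrary, and because the query is not anchored to a node ID it may be slow on a large graph.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&#x2F;&#x2F; Sketch only: assumes the ActedDirected relationships already exist.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&#x2F;&#x2F; The LIMIT of 10 is an arbitrary choice for a quick look at the data.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (p:Person)-[:ActedDirected]-&amp;gt;(m:Movie)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;RETURN p.name AS Actor, m.title AS Movie&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;LIMIT 10&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;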
&lt;p&gt;Four hundred ninety-one actors acted in and directed the same movie in our data set.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;{&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;data&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;Actor&amp;quot;: &amp;quot;Clint Eastwood&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;Movie&amp;quot;: &amp;quot;Unforgiven&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;movieId&amp;quot;: &amp;quot;4a6d64c8-9c90-3362-b443-4d2e7b2fb9d1&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;personId&amp;quot;: &amp;quot;4638a820-3b68-3fc7-9fa7-341e876b701e&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;&#x2F;h2&gt;
&lt;p&gt;Phew, we made it through! And we learned a lot along the way.&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;CSV data is streamed into Quine&lt;&#x2F;li&gt;
&lt;li&gt;Quine can read from external files and streaming providers&lt;&#x2F;li&gt;
&lt;li&gt;You can ingest multiple streams at once, movies and reviewers, and combine them into one streaming graph&lt;&#x2F;li&gt;
&lt;li&gt;Always separate ingest queries using the jobs-to-be-done framework&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Quine is open source, so you can run this analysis for yourself. Download a precompiled version or build it yourself from the codebase on &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;thatdot&#x2F;quine&quot;&gt;Quine GitHub&lt;&#x2F;a&gt;. I published the recipe that I developed at &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&quot;&gt;https:&#x2F;&#x2F;quine.io&#x2F;recipes&lt;&#x2F;a&gt;. The page has instructions for downloading the &lt;code&gt;CSV&lt;&#x2F;code&gt; files and running the recipe.&lt;&#x2F;p&gt;
&lt;p&gt;Have a question, suggestion, or improvement? I welcome your feedback! Please drop in to &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;that.re&#x2F;quine-slack&quot;&gt;Quine Slack&lt;&#x2F;a&gt; and let me know. I&#x27;m always happy to discuss Quine or answer questions.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Real-time Blockchain Monitoring is Hard without A Streaming Graph</title>
        <published>2022-05-31T00:00:00+00:00</published>
        <updated>2022-05-31T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/real-time-blockchain-monitoring-is-hard-and-your-database-is-the-reason/"/>
        <id>https://www.thatdot.com/blog/real-time-blockchain-monitoring-is-hard-and-your-database-is-the-reason/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/real-time-blockchain-monitoring-is-hard-and-your-database-is-the-reason/">&lt;h2 id=&quot;the-challenges-of-finding-fraud-on-the-blockchain&quot;&gt;The Challenges of Finding Fraud on the Blockchain&lt;&#x2F;h2&gt;
&lt;p&gt;Blockchain-based technology growth has been explosive, with over 10,000 cryptocurrencies alone available to rapidly growing consumer and commercial user bases. Real-time governance and compliance techniques are needed to ensure confidence in the space if cryptocurrencies are to be embraced as alternatives to fiat currencies. The combination of new technology, well-established user expectations for real-time transactions, and rapidly evolving regulations demands new tools to handle the complexities of these distributed and &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.trendmicro.com&#x2F;vinfo&#x2F;us&#x2F;security&#x2F;definition&#x2F;pseudonymization&quot;&gt;pseudonymized&lt;&#x2F;a&gt; systems.&lt;&#x2F;p&gt;
&lt;p&gt;Detecting, tracing, and mitigating fraud across blockchains relies on many of the same practices used by more traditional banking systems: modeling user behaviors, watching for suspicious transactions relative to known exploits or typical behavior patterns (often termed &quot;know your customer&quot; or &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.thalesgroup.com&#x2F;en&#x2F;markets&#x2F;digital-identity-and-security&#x2F;banking-payment&#x2F;issuance&#x2F;id-verification&#x2F;know-your-customer&quot;&gt;KYC&lt;&#x2F;a&gt;), and rapid action to limit fraudulent transactions (e.g., from hacked accounts), ideally in real time. The use of pseudonymity practices (e.g., the use of private addresses), however, requires new data analysis techniques and mechanisms that maximize the contextual value of the available data for identifying fraud while minimizing the impact of false positives and investigative overhead on customers and business operations.&lt;&#x2F;p&gt;
&lt;p&gt;Given that user identity data is more limited in crypto, it becomes necessary to maximize the use of available information about the interactions of accounts and wallets to identify and trace money laundering. Fortunately, the blockchains underlying cryptocurrencies are essentially append-only ledgers that provide a complete history of such interactions, and that history can be readily analyzed. That is, so long as tools suited to modeling and querying these relationships are available.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;relational-databases-work-at-low-transaction-volumes&quot;&gt;Relational Databases Work at Low Transaction Volumes&lt;&#x2F;h3&gt;
&lt;p&gt;Modeling and monitoring relationships between event data can be done at low volume using legacy relational database tools. Tables are built to represent the relationships between an address and its transactions, which are joined with tables about the addresses and their accounts, which are then joined with tables about blocks, which are joined with tables from other blockchains… Such “nested joins” and the use of Foreign Keys to relate tables together are the state of the art today, but they are computationally expensive and slow to return query results. Reducing queries to small blocks of time has been the modus operandi of the industry. However, “time windowing,” although a well-established practice, limits the data used to make decisions; it is the antithesis of the context enrichment we seek when analyzing the relationships between blockchain event logs.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;using-categorical-data-to-process-the-blockchain&quot;&gt;Using Categorical Data to Process the Blockchain&lt;&#x2F;h3&gt;
&lt;p&gt;Enter graph technology. Graph data structures are ideal for modeling the relationships described in blockchain events. Flows of cryptocurrency between accounts and wallets are ideal inputs for graph data modeling. Accounts, addresses, time references, devices, assets, transaction details, etc. are all examples of &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;what-is-categorical-data&#x2F;&quot;&gt;categorical data&lt;&#x2F;a&gt; connected by relationships and are therefore ideal to be represented as the nodes, edges, and properties provided in a graph data model. Most importantly, the graph data model makes the relationships between entities first-class citizens in the data model, so the costs and complexity associated with table joins are entirely eliminated.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;629534657541e766bcd6b242_kMFQWtuGZrQWbY5t4dt2tRowf_Zb5D7so9nTg1w-1rf8KcQJODpLX9Uq88DoCcG451Ih3HhiIva8JAH8MDouAy3_Y-6hfcRwUdr7z9GAa69Dnto7QKkgFhOn51o893ChBPm-dEF_efGpj4hJwQ.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Graph is the ideal data model for blockchain relationship tracing.&lt;&#x2F;p&gt;
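&lt;p&gt;As a rough illustration of this model, a single transaction event could be mapped onto account and transaction nodes with a Cypher ingest query along these lines. This is only a sketch: the event fields (&lt;code&gt;hash&lt;&#x2F;code&gt;, &lt;code&gt;sender&lt;&#x2F;code&gt;, &lt;code&gt;receiver&lt;&#x2F;code&gt;, &lt;code&gt;value&lt;&#x2F;code&gt;), the labels, and the relationship names are all hypothetical and would need to be adapted to the actual blockchain data.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&#x2F;&#x2F; Sketch only: field names, labels, and relationship types are assumptions.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&#x2F;&#x2F; $that holds the incoming event; idFrom derives a stable node ID from its arguments.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (tx), (src), (dst)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WHERE id(tx) = idFrom(&amp;quot;transaction&amp;quot;, $that.hash)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  AND id(src) = idFrom(&amp;quot;account&amp;quot;, $that.sender)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  AND id(dst) = idFrom(&amp;quot;account&amp;quot;, $that.receiver)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;SET tx.hash = $that.hash, tx.value = $that.value, tx:Transaction&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;SET src.address = $that.sender, src:Account&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;SET dst.address = $that.receiver, dst:Account&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;CREATE (src)-[:SENT]-&amp;gt;(tx)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;CREATE (tx)-[:RECEIVED_BY]-&amp;gt;(dst)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Because the relationships are stored as edges, following the flow of funds from one account to the next is a traversal rather than a chain of table joins.&lt;&#x2F;p&gt;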
&lt;p&gt;Knowing that a graph data model aligns well with blockchain event data, we then need to confront the well-known performance limitations of graph databases. Graph databases are still databases, and queries that traverse multiple degrees of relationships dramatically impact database performance. Unfortunately, this leads developers to once again fall back to batch processing with time-limited windows of data. While graph is more efficient than relational databases in modeling relationships, it still lacks the throughput needed for real-time fraud detection and mitigation use cases.&lt;&#x2F;p&gt;
&lt;p&gt;What is needed is a system that combines the graph data model with an event processing architecture that provides fast enough throughput to cost-effectively perform deep graph traversal queries across the complete history of one or more blockchains&#x27; events. Achieving such performance maximizes the contextual value of available event data by looking at both real-time and historical transactions, while acting fast enough to challenge new transactions in real time.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;enter-quine-streaming-graph-for-fraud-detection&quot;&gt;Enter Quine Streaming Graph for Fraud Detection&lt;&#x2F;h3&gt;
&lt;p&gt;This is where Quine streaming graph comes in. Quine is designed to process high volumes of event stream data in real time in order to detect complex and sometimes subtle patterns like the sort that might indicate fraud. Quine scales to tens of thousands of events per node, and can easily handle blockchain transaction volumes in real time, while also providing access to the complete trace of activities through and across blockchains. Quine can simultaneously consume both streaming data sources (&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;components&#x2F;ingest-sources&#x2F;kafka&#x2F;&quot;&gt;Kafka&lt;&#x2F;a&gt; and &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;components&#x2F;ingest-sources&#x2F;kinesis&#x2F;&quot;&gt;Kinesis&lt;&#x2F;a&gt;) and static sources stored in databases and data lakes in order to build an integrated graph data model.&lt;&#x2F;p&gt;
&lt;p&gt;If you are interested in trying Quine yourself, &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;download&quot;&gt;download&lt;&#x2F;a&gt; it and try it with the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;ethereum-tag-propagation&quot;&gt;Ethereum tag propagation&lt;&#x2F;a&gt; recipe. If you have questions or want to check out the community, join &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;that.re&#x2F;quine-slack&quot;&gt;Quine Slack&lt;&#x2F;a&gt; or visit our &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;thatdot&#x2F;quine&quot;&gt;GitHub&lt;&#x2F;a&gt; page.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;em&gt;Blog header photo credit: Photo by&lt;&#x2F;em&gt; &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;unsplash.com&#x2F;@hjrc33?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText&quot;&gt;&lt;em&gt;Héctor J. Rivas&lt;&#x2F;em&gt;&lt;&#x2F;a&gt; &lt;em&gt;on&lt;&#x2F;em&gt; &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;unsplash.com&#x2F;s&#x2F;photos&#x2F;blue?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText&quot;&gt;&lt;em&gt;Unsplash&lt;&#x2F;em&gt;&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Ingesting data from the internet into Quine Streaming Graph</title>
        <published>2022-05-24T00:00:00+00:00</published>
        <updated>2022-05-24T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/ingesting-data-from-the-internet/"/>
        <id>https://www.thatdot.com/blog/ingesting-data-from-the-internet/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/ingesting-data-from-the-internet/">&lt;p&gt;The previous article in this series (&lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;building-a-quine-streaming-graph-ingest-streams&#x2F;&quot;&gt;Quine Ingest Streams&lt;&#x2F;a&gt;) introduced the ingest stream and the basic structure for for creating them. In this article, I go deeper, exploring the ingest query and its role in the ingest stream.&lt;&#x2F;p&gt;
&lt;p&gt;A quick review of ingest streams:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;An ingest stream connects Quine to data producers.&lt;&#x2F;li&gt;
&lt;li&gt;Ingest streams use backpressure to avoid becoming overloaded.&lt;&#x2F;li&gt;
&lt;li&gt;Data is transformed by the ingest query into a streaming graph.&lt;&#x2F;li&gt;
&lt;li&gt;Using &lt;code&gt;idFrom&lt;&#x2F;code&gt; allows us to act as if all nodes in the graph already exist.&lt;&#x2F;li&gt;
&lt;li&gt;Ingest streams are created either by API calls or Recipes.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
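&lt;p&gt;The &lt;code&gt;idFrom&lt;&#x2F;code&gt; point in the list above can be sketched in a couple of lines of Cypher. This is a minimal illustration rather than part of any recipe: it assumes each incoming event carries &lt;code&gt;page_id&lt;&#x2F;code&gt; and &lt;code&gt;page_title&lt;&#x2F;code&gt; fields, and the &lt;code&gt;&quot;page&quot;&lt;&#x2F;code&gt; prefix and the &lt;code&gt;title&lt;&#x2F;code&gt; property are names chosen only for this example.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&#x2F;&#x2F; Sketch only: the &amp;quot;page&amp;quot; prefix and the title property are illustrative choices.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&#x2F;&#x2F; idFrom derives the same node ID from the same arguments every time, so the&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&#x2F;&#x2F; MATCH can address the node whether or not it has been written to before.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (page) WHERE id(page) = idFrom(&amp;quot;page&amp;quot;, $that.page_id)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;SET page.title = $that.page_title&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;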
&lt;p&gt;For this article, we use the built-in &lt;code&gt;wikipedia&lt;&#x2F;code&gt; recipe as a starting point.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;defining-an-ingest-stream&quot;&gt;Defining an Ingest Stream&lt;&#x2F;h2&gt;
&lt;p&gt;The &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;wikipedia-page-ingest&quot;&gt;wikipedia page ingest&lt;&#x2F;a&gt; recipe defines an ingest stream that receives updates from the &lt;code&gt;mediawiki.page-create&lt;&#x2F;code&gt; &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;stream.wikimedia.org&#x2F;v2&#x2F;ui&#x2F;#&#x2F;?streams=mediawiki.page-create&quot;&gt;event stream&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;Here&#x27;s a copy of the ingest stream from the recipe:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;ingestStreams:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  - type: ServerSentEventsIngest&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    url: https:&#x2F;&#x2F;stream.wikimedia.org&#x2F;v2&#x2F;stream&#x2F;page-create&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    format:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      type: CypherJson&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      query: |-&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        MATCH (revNode) WHERE id(revNode) = idFrom(&amp;quot;revision&amp;quot;, $that.rev_id)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        MATCH (dbNode) WHERE id(dbNode) = idFrom(&amp;quot;db&amp;quot;, $that.database)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        MATCH (userNode) WHERE id(userNode) = idFrom(&amp;quot;id&amp;quot;, $that.performer.user_id)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        SET revNode = $that, revNode.type = &amp;quot;rev&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        SET dbNode.database = $that.database, dbNode.type = &amp;quot;db&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        SET userNode = $that.performer, userNode.type = &amp;quot;user&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        WITH *, datetime($that.rev_timestamp) AS d&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        CALL create.setLabels(revNode, [&amp;quot;rev:&amp;quot; + $that.page_title])&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        CALL create.setLabels(dbNode, [&amp;quot;db:&amp;quot; + $that.database])&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        CALL create.setLabels(userNode, [&amp;quot;user:&amp;quot; + $that.performer.user_text])&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        CALL reify.time(d, [&amp;quot;year&amp;quot;, &amp;quot;month&amp;quot;, &amp;quot;day&amp;quot;, &amp;quot;hour&amp;quot;, &amp;quot;minute&amp;quot;]) YIELD node AS timeNode&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        CALL incrementCounter(timeNode, &amp;quot;count&amp;quot;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        CREATE (revNode)-[:at]-&amp;gt;(timeNode)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        CREATE (revNode)-[:db]-&amp;gt;(dbNode)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        CREATE (revNode)-[:by]-&amp;gt;(userNode)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This ingest stream has three elements: &lt;code&gt;type&lt;&#x2F;code&gt;, &lt;code&gt;url&lt;&#x2F;code&gt;, and &lt;code&gt;format&lt;&#x2F;code&gt;. The type declaration for an ingest stream establishes the structure for the ingest stream object definition. This ingest stream is a &lt;code&gt;ServerSentEventsIngest&lt;&#x2F;code&gt; stream.&lt;&#x2F;p&gt;
&lt;p&gt;Reviewing the &lt;code&gt;ServerSentEventsIngest&lt;&#x2F;code&gt; &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;reference&#x2F;rest-api.html#&#x2F;schemas&#x2F;com.thatdot.quine.routes.IngestStreamConfiguration&quot;&gt;schema documentation&lt;&#x2F;a&gt; from the API docs provides us with the schema that we need to follow for the ingest stream definition.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;NOTE&lt;&#x2F;strong&gt;&lt;br &#x2F;&gt;
The schema definition will default to File Ingest Stream when first opened.&lt;br &#x2F;&gt;
Be sure to click on the down arrow 🔽 next to File Ingest Stream and select Server Sent Events Stream from the drop down to view the correct schema.&lt;&#x2F;p&gt;
&lt;p&gt;Here&#x27;s the schema for a &lt;code&gt;ServerSentEventsIngest&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;628cf5e8c027e48ae5a3a349_Quine%20Ingest%20Stream%20Configuration.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Quine Server Ingest Stream Schema&lt;&#x2F;p&gt;
&lt;p&gt;The structure of the &lt;code&gt;ServerSentEventsIngest&lt;&#x2F;code&gt; stream is pretty straightforward.&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;type&lt;&#x2F;code&gt; specifies the schema type for the ingest stream&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;format&lt;&#x2F;code&gt; defines what the ingest stream will do with each line it receives
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;type&lt;&#x2F;code&gt; identifies the line format in the stream&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;query&lt;&#x2F;code&gt; defines the Cypher ingest query&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;parameter&lt;&#x2F;code&gt; names the parameter that stores the current datum&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;url&lt;&#x2F;code&gt; defines the connection URL for the data producer&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;parallelism&lt;&#x2F;code&gt; and &lt;code&gt;maximumPerSecond&lt;&#x2F;code&gt; tune the bandwidth for the ingest stream and when to apply backpressure&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h2 id=&quot;wikipedia-page-create-data&quot;&gt;Wikipedia &lt;code&gt;page-create&lt;&#x2F;code&gt; Data&lt;&#x2F;h2&gt;
&lt;p&gt;A quick aside: we need to understand the data that we are working with before we start pulling the ingest query apart.&lt;&#x2F;p&gt;
&lt;p&gt;Here&#x27;s a sample &lt;code&gt;page-create&lt;&#x2F;code&gt; JSON object to review. View more samples by visiting the Wikipedia &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;stream.wikimedia.org&#x2F;v2&#x2F;ui&#x2F;#&#x2F;&quot;&gt;event streams&lt;&#x2F;a&gt; page, selecting the &lt;code&gt;mediawiki.page-create&lt;&#x2F;code&gt; stream, then clicking the green &quot;Stream&quot; button.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;{&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;$schema&amp;quot;: &amp;quot;&#x2F;mediawiki&#x2F;revision&#x2F;create&#x2F;1.1.0&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;meta&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;uri&amp;quot;: &amp;quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Established_population&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;request_id&amp;quot;: &amp;quot;85b7bd4b-23a5-4c20-84a1-d89430c21f6c&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;id&amp;quot;: &amp;quot;8a34f1c0-a276-4a2b-ae2e-305f8822011c&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;dt&amp;quot;: &amp;quot;2022-05-20T16:43:34Z&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;domain&amp;quot;: &amp;quot;en.wikipedia.org&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;stream&amp;quot;: &amp;quot;mediawiki.page-create&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;topic&amp;quot;: &amp;quot;eqiad.mediawiki.page-create&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;partition&amp;quot;: 0,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;offset&amp;quot;: 231788500&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    },&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;database&amp;quot;: &amp;quot;enwiki&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;page_id&amp;quot;: 70828723,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;page_title&amp;quot;: &amp;quot;Established_population&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;page_namespace&amp;quot;: 0,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;rev_id&amp;quot;: 1088883819,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;rev_timestamp&amp;quot;: &amp;quot;2022-05-20T16:43:33Z&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;rev_sha1&amp;quot;: &amp;quot;d9uoc7gw3cj3ejhs8ihvsi61hp54icq&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;rev_minor_edit&amp;quot;: false,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;rev_len&amp;quot;: 82,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;rev_content_model&amp;quot;: &amp;quot;wikitext&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;rev_content_format&amp;quot;: &amp;quot;text&#x2F;x-wiki&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;performer&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;user_text&amp;quot;: &amp;quot;Invasive Spices&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;user_groups&amp;quot;: [&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;extendedconfirmed&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;*&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;user&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;autoconfirmed&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        ],&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;user_is_bot&amp;quot;: false,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;user_id&amp;quot;: 40272459,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;user_registration_dt&amp;quot;: &amp;quot;2020-09-30T23:11:08Z&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;user_edit_count&amp;quot;: 9319&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    },&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;page_is_redirect&amp;quot;: true,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;comment&amp;quot;: &amp;quot;#REDIRECT [[Naturalisation (biology)]] {{R cat shell| {{R from related topic}} }}&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;parsedcomment&amp;quot;: &amp;quot;#REDIRECT &amp;lt;a href=\&amp;quot;&#x2F;wiki&#x2F;Naturalisation_(biology)\&amp;quot; title=\&amp;quot;Naturalisation (biology)\&amp;quot;&amp;gt;Naturalisation (biology)&amp;lt;&#x2F;a&amp;gt; {{R cat shell| {{R from related topic}} }}&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;rev_slots&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;main&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;rev_slot_content_model&amp;quot;: &amp;quot;wikitext&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;rev_slot_sha1&amp;quot;: &amp;quot;d9uoc7gw3cj3ejhs8ihvsi61hp54icq&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;rev_slot_size&amp;quot;: 82,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;rev_slot_origin_rev_id&amp;quot;: 1088883819&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Take a moment to get familiar with the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;stream.wikimedia.org&#x2F;?doc#&#x2F;streams&#x2F;get_v2_stream_mediawiki_page_create&quot;&gt;page-create&lt;&#x2F;a&gt; schema from the Wikimedia API documentation. The sample object is too dense to read at a glance, so let&#x27;s clean it up. Showing just the keys from the object with &lt;code&gt;jq&lt;&#x2F;code&gt; makes it much easier to plan our ingest query.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;❯ jq &amp;#39;. | keys&amp;#39; &#x2F;tmp&#x2F;data.json&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;[&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;$schema&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;comment&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;database&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;meta&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;page_id&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;page_is_redirect&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;page_namespace&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;page_title&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;parsedcomment&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;performer&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;rev_content_format&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;rev_content_model&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;rev_id&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;rev_len&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;rev_minor_edit&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;rev_sha1&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;rev_slots&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;rev_timestamp&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The mediawiki recipe is an example use case for the &lt;code&gt;reify.time&lt;&#x2F;code&gt; user function. It creates time nodes in the graph and relates them to the &lt;code&gt;page-create&lt;&#x2F;code&gt; nodes based on the &lt;code&gt;rev_timestamp&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;To demonstrate the &lt;code&gt;reify.time&lt;&#x2F;code&gt; function, our ingest query creates revision nodes, db nodes, and user nodes that are related to each other and to their representative time nodes.&lt;&#x2F;p&gt;
&lt;p&gt;To learn more about creating time-series nodes in Quine, read about &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;reference&#x2F;reify-time.html&quot;&gt;time reification here.&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-ingest-query&quot;&gt;The Ingest Query&lt;&#x2F;h2&gt;
&lt;p&gt;The ingest query is the workhorse of the ingest stream. Each datum, the &lt;code&gt;page-create&lt;&#x2F;code&gt; object in this case, is processed by the ingest query. The query is written in Cypher and is responsible for parsing data, creating nodes, storing data and setting relationships in the streaming graph.&lt;&#x2F;p&gt;
&lt;p&gt;First, the ingest query creates the nodes we want using &lt;code&gt;MATCH&lt;&#x2F;code&gt; and &lt;code&gt;WHERE&lt;&#x2F;code&gt;. The node &lt;strong&gt;&lt;code&gt;id&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; is assigned using the &lt;code&gt;idFrom&lt;&#x2F;code&gt; function.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (revNode) WHERE id(revNode) = idFrom(&amp;quot;revision&amp;quot;, $that.rev_id)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (dbNode) WHERE id(dbNode) = idFrom(&amp;quot;db&amp;quot;, $that.database)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (userNode) WHERE id(userNode) = idFrom(&amp;quot;id&amp;quot;, $that.performer.user_id)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Notice that we pass two parameters to the &lt;code&gt;idFrom&lt;&#x2F;code&gt; function. The first parameter establishes a unique namespace for the &lt;code&gt;id&lt;&#x2F;code&gt; to avoid collisions. The second parameter is a value from the &lt;code&gt;page-create&lt;&#x2F;code&gt; object, such as the &lt;code&gt;rev_id&lt;&#x2F;code&gt; for the revision node. The result from &lt;code&gt;idFrom&lt;&#x2F;code&gt; is a deterministic UUID for each node.&lt;&#x2F;p&gt;
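&lt;p&gt;To make the idea of a deterministic id concrete, here is a minimal Python sketch. It is an analogy only: the &lt;code&gt;id_from&lt;&#x2F;code&gt; helper and &lt;code&gt;SKETCH_NAMESPACE&lt;&#x2F;code&gt; are hypothetical names, and hashing with &lt;code&gt;uuid.uuid5&lt;&#x2F;code&gt; is an assumption made for illustration, not the algorithm Quine uses inside &lt;code&gt;idFrom&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;

```python
import uuid

# Hypothetical namespace for this sketch only; Quine's idFrom has its own scheme.
SKETCH_NAMESPACE = uuid.UUID("00000000-0000-0000-0000-000000000000")


def id_from(*parts):
    """Return a deterministic UUID for the given parts (an analogy to idFrom)."""
    # Use repr() so that ("db", 1) and ("db", "1") map to different keys.
    key = "|".join(repr(p) for p in parts)
    return uuid.uuid5(SKETCH_NAMESPACE, key)


rev_a = id_from("revision", 1088883819)
rev_b = id_from("revision", 1088883819)
user = id_from("id", 1088883819)

print(rev_a == rev_b)  # same inputs produce the same id
print(rev_a == user)   # a different first argument produces a different id
```

&lt;p&gt;Because the id depends only on its inputs, every event that mentions the same revision resolves to the same node, and the first argument keeps a revision and a user that happen to share a number from colliding.&lt;&#x2F;p&gt;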
&lt;p&gt;Next, we store the &lt;code&gt;rev&lt;&#x2F;code&gt;, &lt;code&gt;db&lt;&#x2F;code&gt;, and &lt;code&gt;user&lt;&#x2F;code&gt; values as properties in the respective nodes and label each node for clarity in the graph explorer. Quine parses the ingested line and stores the results in a variable, &lt;code&gt;$that&lt;&#x2F;code&gt;. You can retrieve values from the ingested datum using dot notation as &lt;code&gt;$that.&amp;lt;attribute&amp;gt;&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;SET revNode = $that, revNode.type = &amp;quot;rev&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;SET dbNode.database = $that.database, dbNode.type = &amp;quot;db&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;SET userNode = $that.performer, userNode.type = &amp;quot;user&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;CALL create.setLabels(revNode, [&amp;quot;rev:&amp;quot; + $that.page_title])&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;CALL create.setLabels(dbNode, [&amp;quot;db:&amp;quot; + $that.database])&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;CALL create.setLabels(userNode, [&amp;quot;user:&amp;quot; + $that.performer.user_text])&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;There is quite a bit going on in the next line of the query, specifically the use of &lt;code&gt;WITH *&lt;&#x2F;code&gt;. Let&#x27;s take a moment to understand why we chose to use this pattern.&lt;&#x2F;p&gt;
&lt;p&gt;A &lt;code&gt;WITH&lt;&#x2F;code&gt; clause changes the scope of the variables available to the rest of the query. If you explicitly list each variable and accidentally omit one, it&#x27;s lost for the remainder of the query, and you can get unexpected errors. Using the &lt;code&gt;*&lt;&#x2F;code&gt; wildcard ensures that all nodes and variables stay at your disposal in the ingest query.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WITH *, datetime($that.rev_timestamp) AS d&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The ingest query makes a &lt;code&gt;CALL&lt;&#x2F;code&gt; to the &lt;code&gt;reify.time&lt;&#x2F;code&gt; function to create a new &lt;code&gt;timeNode&lt;&#x2F;code&gt;. The resulting node is based on the year, month, day, hour, and minute of the &lt;code&gt;rev_timestamp&lt;&#x2F;code&gt;. It also increments the &lt;code&gt;count&lt;&#x2F;code&gt; property of the &lt;code&gt;timeNode&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;CALL reify.time(d, [&amp;quot;year&amp;quot;, &amp;quot;month&amp;quot;, &amp;quot;day&amp;quot;, &amp;quot;hour&amp;quot;, &amp;quot;minute&amp;quot;]) &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;	YIELD node AS timeNode&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;CALL incrementCounter(timeNode, &amp;quot;count&amp;quot;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
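&lt;p&gt;The bucketing idea behind these two calls can be sketched in plain Python. This is a simplified analogy, not Quine&#x27;s implementation: it only truncates each timestamp to the minute and counts events per bucket, and the &lt;code&gt;minute_bucket&lt;&#x2F;code&gt; helper and the second and third timestamps are hypothetical.&lt;&#x2F;p&gt;

```python
from collections import Counter
from datetime import datetime, timezone

# Timestamps in the rev_timestamp format; only the first is from the sample event.
timestamps = [
    "2022-05-20T16:43:33Z",
    "2022-05-20T16:43:59Z",  # hypothetical
    "2022-05-20T16:44:02Z",  # hypothetical
]


def minute_bucket(ts):
    """Truncate an ISO-8601 UTC timestamp string to the start of its minute."""
    parsed = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return parsed.replace(second=0, microsecond=0)


# One counter entry per minute plays the role of a time node's count property.
counts = Counter(minute_bucket(ts) for ts in timestamps)
for bucket, count in sorted(counts.items()):
    print(bucket.isoformat(), count)
```

&lt;p&gt;Events that fall in the same minute share a bucket, which is what lets a single time node connect otherwise unrelated revisions.&lt;&#x2F;p&gt;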
&lt;p&gt;Finally, the ingest query creates the relationships between nodes in the graph.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;CREATE (revNode)-[:at]-&amp;gt;(timeNode)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;CREATE (revNode)-[:db]-&amp;gt;(dbNode)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;CREATE (revNode)-[:by]-&amp;gt;(userNode)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Now, let&#x27;s run the recipe to see how the ingest query builds out the graph in Quine. With the latest Quine jar file downloaded from &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;download&quot;&gt;Quine.io&lt;&#x2F;a&gt;, start the recipe from the command line.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;❯ java -jar quine-x.x.x.jar -r wikipedia&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The recipe includes a standing query that outputs nodes to the terminal as they arrive. You should see activity quickly after launching the recipe.&lt;&#x2F;p&gt;
&lt;p&gt;Before the graph gets too large, open Quine explorer (http:&#x2F;&#x2F;0.0.0.0:8080) and run the &lt;strong&gt;time nodes&lt;&#x2F;strong&gt; stored query. Each of the time nodes was created by the ingest query using the timestamp in the &lt;code&gt;page-create&lt;&#x2F;code&gt; object.&lt;&#x2F;p&gt;
&lt;p&gt;We call these synthetic nodes. Synthetic nodes are useful when looking for abstract patterns between loosely related nodes. In this case, they show which updates were made during a particular time bucket.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;628cf69dab26cf4671de4850_Quine%20Explorer%20UI%20MATCH.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Quine Exploration UI - Time Buckets&lt;&#x2F;p&gt;
&lt;p&gt;Using the API, let&#x27;s inspect the &lt;code&gt;ingest&lt;&#x2F;code&gt; stream using the ingest endpoint.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;❯ http GET http:&#x2F;&#x2F;0.0.0.0:8080&#x2F;api&#x2F;v1&#x2F;ingest Content-Type:application&#x2F;json&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;HTTP&#x2F;1.1 200 OK&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Content-Encoding: gzip&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Content-Type: application&#x2F;json&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Date: Fri, 20 May 2022 20:06:01 GMT&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Server: akka-http&#x2F;10.2.9&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Transfer-Encoding: chunked&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;{&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;INGEST-1&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;settings&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;format&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;                &amp;quot;parameter&amp;quot;: &amp;quot;that&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;                &amp;quot;query&amp;quot;: &amp;quot;MATCH (revNode) WHERE id(revNode) = idFrom(\&amp;quot;revision\&amp;quot;, $that.rev_id)\nMATCH (dbNode) WHERE id(dbNode) = idFrom(\&amp;quot;db\&amp;quot;, $that.database)\nMATCH (userNode) WHERE id(userNode) = idFrom(\&amp;quot;id\&amp;quot;, $that.performer.user_id)\nSET revNode = $that, revNode.type = \&amp;quot;rev\&amp;quot;\nSET dbNode.database = $that.database, dbNode.type = \&amp;quot;db\&amp;quot;\nSET userNode = $that.performer, userNode.type = \&amp;quot;user\&amp;quot;\nWITH *, datetime($that.rev_timestamp) AS d\nCALL create.setLabels(revNode, [\&amp;quot;rev:\&amp;quot; + $that.page_title])\nCALL create.setLabels(dbNode, [\&amp;quot;db:\&amp;quot; + $that.database])\nCALL create.setLabels(userNode, [\&amp;quot;user:\&amp;quot; + $that.performer.user_text])\nCALL reify.time(d, [\&amp;quot;year\&amp;quot;, \&amp;quot;month\&amp;quot;, \&amp;quot;day\&amp;quot;, \&amp;quot;hour\&amp;quot;, \&amp;quot;minute\&amp;quot;]) YIELD node AS timeNode\nCALL incrementCounter(timeNode, \&amp;quot;count\&amp;quot;)\nCREATE (revNode)-[:at]-&amp;gt;(timeNode)\nCREATE (revNode)-[:db]-&amp;gt;(dbNode)\nCREATE (revNode)-[:by]-&amp;gt;(userNode)&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;                &amp;quot;type&amp;quot;: &amp;quot;CypherJson&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            },&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;parallelism&amp;quot;: 16,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;type&amp;quot;: &amp;quot;ServerSentEventsIngest&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;url&amp;quot;: &amp;quot;https:&#x2F;&#x2F;stream.wikimedia.org&#x2F;v2&#x2F;stream&#x2F;page-create&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        },&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;stats&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;byteRates&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;                &amp;quot;count&amp;quot;: 1354157,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;                &amp;quot;fifteenMinute&amp;quot;: 1552.6927122874843,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;                &amp;quot;fiveMinute&amp;quot;: 1398.959143968717,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;                &amp;quot;oneMinute&amp;quot;: 1099.4731678954581,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;                &amp;quot;overall&amp;quot;: 1448.3578957557581&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            },&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;ingestedCount&amp;quot;: 914,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;rates&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;                &amp;quot;count&amp;quot;: 914,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;                &amp;quot;fifteenMinute&amp;quot;: 1.0510781922502073,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;                &amp;quot;fiveMinute&amp;quot;: 0.9474472912218986,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;                &amp;quot;oneMinute&amp;quot;: 0.7431750446830565,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;                &amp;quot;overall&amp;quot;: 0.9775815796950665&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            },&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;startTime&amp;quot;: &amp;quot;2022-05-20T19:50:26.494025Z&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;totalRuntime&amp;quot;: 934608&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        },&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;status&amp;quot;: &amp;quot;Running&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The ingest stream defined via the recipe is named &lt;code&gt;INGEST-1&lt;&#x2F;code&gt; and is currently running.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Info&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Did you know that you can make API calls directly from the embedded API documentation? Select the page icon (📄) from the left nav inside of Quine Explorer, then navigate to the API endpoint that you want to exercise. Adjust the API call as needed, and press the blue &quot;Send API Request&quot; button.&lt;&#x2F;p&gt;
&lt;p&gt;Pausing the stream via the API is done via the &lt;code&gt;&#x2F;ingest&#x2F;{name}&#x2F;pause&lt;&#x2F;code&gt; endpoint.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;❯ http PUT http:&#x2F;&#x2F;0.0.0.0:8080&#x2F;api&#x2F;v1&#x2F;ingest&#x2F;INGEST-1&#x2F;pause Content-Type:application&#x2F;json&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;HTTP&#x2F;1.1 200 OK&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Content-Encoding: gzip&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Content-Type: application&#x2F;json&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Date: Fri, 20 May 2022 20:09:27 GMT&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Server: akka-http&#x2F;10.2.9&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Transfer-Encoding: chunked&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;{&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;name&amp;quot;: &amp;quot;INGEST-1&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;settings&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;format&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;parameter&amp;quot;: &amp;quot;that&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;query&amp;quot;: &amp;quot;MATCH (revNode) WHERE id(revNode) = idFrom(\&amp;quot;revision\&amp;quot;, $that.rev_id)\nMATCH (dbNode) WHERE id(dbNode) = idFrom(\&amp;quot;db\&amp;quot;, $that.database)\nMATCH (userNode) WHERE id(userNode) = idFrom(\&amp;quot;id\&amp;quot;, $that.performer.user_id)\nSET revNode = $that, revNode.type = \&amp;quot;rev\&amp;quot;\nSET dbNode.database = $that.database, dbNode.type = \&amp;quot;db\&amp;quot;\nSET userNode = $that.performer, userNode.type = \&amp;quot;user\&amp;quot;\nWITH *, datetime($that.rev_timestamp) AS d\nCALL create.setLabels(revNode, [\&amp;quot;rev:\&amp;quot; + $that.page_title])\nCALL create.setLabels(dbNode, [\&amp;quot;db:\&amp;quot; + $that.database])\nCALL create.setLabels(userNode, [\&amp;quot;user:\&amp;quot; + $that.performer.user_text])\nCALL reify.time(d, [\&amp;quot;year\&amp;quot;, \&amp;quot;month\&amp;quot;, \&amp;quot;day\&amp;quot;, \&amp;quot;hour\&amp;quot;, \&amp;quot;minute\&amp;quot;]) YIELD node AS timeNode\nCALL incrementCounter(timeNode, \&amp;quot;count\&amp;quot;)\nCREATE (revNode)-[:at]-&amp;gt;(timeNode)\nCREATE (revNode)-[:db]-&amp;gt;(dbNode)\nCREATE (revNode)-[:by]-&amp;gt;(userNode)&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;type&amp;quot;: &amp;quot;CypherJson&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        },&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;parallelism&amp;quot;: 16,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;type&amp;quot;: &amp;quot;ServerSentEventsIngest&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;url&amp;quot;: &amp;quot;https:&#x2F;&#x2F;stream.wikimedia.org&#x2F;v2&#x2F;stream&#x2F;page-create&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    },&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;stats&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;byteRates&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;count&amp;quot;: 1653281,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;fifteenMinute&amp;quot;: 1530.565647994232,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;fiveMinute&amp;quot;: 1428.2092910910662,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;oneMinute&amp;quot;: 1488.2104624440235,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;overall&amp;quot;: 1448.444229804896&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        },&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;ingestedCount&amp;quot;: 1117,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;rates&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;count&amp;quot;: 1117,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;fifteenMinute&amp;quot;: 1.0361739604926652,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;fiveMinute&amp;quot;: 0.96669545913622,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;oneMinute&amp;quot;: 1.0032209384426753,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            &amp;quot;overall&amp;quot;: 0.9786067989220232&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        },&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;startTime&amp;quot;: &amp;quot;2022-05-20T19:50:26.494025Z&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;totalRuntime&amp;quot;: 1141066&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    },&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;status&amp;quot;: &amp;quot;Paused&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Notice that the updates in your terminal window stopped and the &lt;strong&gt;&lt;code&gt;INGEST-1&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; ingest stream has a status of &lt;code&gt;&quot;Paused&quot;&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;Restart the stream with a &lt;code&gt;PUT&lt;&#x2F;code&gt; to the &lt;code&gt;&#x2F;ingest&#x2F;{name}&#x2F;start&lt;&#x2F;code&gt; endpoint. Updates will resume in your terminal window and the ingest stream status will return to &lt;code&gt;&quot;Running&quot;&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
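&lt;p&gt;If you prefer to script these calls, the sketch below builds the same two &lt;code&gt;PUT&lt;&#x2F;code&gt; requests with Python&#x27;s standard library. The &lt;code&gt;ingest_request&lt;&#x2F;code&gt; helper is a hypothetical name, and the base URL assumes Quine is listening on the address used earlier in this post. The requests are only constructed here, not sent.&lt;&#x2F;p&gt;

```python
from urllib.request import Request

# Assumption: Quine is reachable at the address used earlier in this post.
BASE_URL = "http://0.0.0.0:8080/api/v1"


def ingest_request(name, action):
    """Build (but do not send) a PUT request that pauses or starts an ingest stream."""
    if action not in ("pause", "start"):
        raise ValueError("action must be 'pause' or 'start'")
    return Request(f"{BASE_URL}/ingest/{name}/{action}", method="PUT")


pause = ingest_request("INGEST-1", "pause")
start = ingest_request("INGEST-1", "start")
print(pause.get_method(), pause.full_url)
print(start.get_method(), start.full_url)
# To send a request to a running instance, pass it to urllib.request.urlopen.
```

&lt;p&gt;Restricting the action to the two values used in this post keeps a typo from turning into a request against an unintended endpoint.&lt;&#x2F;p&gt;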
&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;&#x2F;h2&gt;
&lt;p&gt;We are just getting warmed up with ingest streams! This post walked through a simple ingest stream and ingest query to read server-sent events (SSE) from the Wikipedia streaming events service.&lt;&#x2F;p&gt;
&lt;p&gt;Next up in the series is &lt;strong&gt;Ingesting CSV data&lt;&#x2F;strong&gt;, where we will go over how Quine streams in data that is stored in a CSV file.&lt;&#x2F;p&gt;
&lt;p&gt;I welcome your feedback! Drop in to &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;that.re&#x2F;quine-slack&quot;&gt;Quine Slack&lt;&#x2F;a&gt; and let me know what you think. I&#x27;m always happy to discuss Quine or answer questions.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Building a Quine Streaming Graph: Ingest Streams</title>
        <published>2022-05-21T00:00:00+00:00</published>
        <updated>2022-05-21T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/building-a-quine-streaming-graph-ingest-streams/"/>
        <id>https://www.thatdot.com/blog/building-a-quine-streaming-graph-ingest-streams/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/building-a-quine-streaming-graph-ingest-streams/">&lt;h2 id=&quot;quine-ingest-streams&quot;&gt;Quine Ingest Streams&lt;&#x2F;h2&gt;
&lt;p&gt;Quine is optimized to process high volumes of data in motion and then stream out high-quality insights in real-time. The ingest stream is where a streaming graph starts. It connects to data producers, transforms the data, then populates a streaming graph to be analyzed by standing queries.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;6286640ed44dedd4b8706254_Ingest%20Blog%201%20image%20body.png&quot; alt=&quot;Quine streaming graph: high volume data in, high value data out.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Quine streaming graph combines multiple sources to detect high value patterns.&lt;&#x2F;p&gt;
&lt;p&gt;Let&#x27;s get under the hood to understand how ingest streams work.&lt;&#x2F;p&gt;
&lt;p&gt;Quine is fundamentally a stream-oriented data processor that uses a graph data model. This provides optimal integration with streaming data producers and consumers such as Kafka and Kinesis. Quine builds on this streaming foundation to provide batch-like capabilities by converting data stored in files to streaming data to load into the graph.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;ingest-stream-concepts&quot;&gt;Ingest Stream Concepts&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;strong&gt;What is an Ingest Stream?&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;An &lt;em&gt;ingest stream&lt;&#x2F;em&gt; connects a data source to Quine and prepares the emitted data for the streaming graph. Within the ingest stream, an ingest query, written in Cypher, updates the streaming graph nodes and edges as data is received.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Backpressuring Ingest Streams&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Inevitably, when streaming data producers outpace consumers, the consumers become overwhelmed. In Quine, when an ingest stream begins to receive more data than it can process, it uses &quot;backpressure&quot; to manage the data flow and avoid becoming overwhelmed.&lt;&#x2F;p&gt;
&lt;p&gt;A backpressured system does not buffer. Instead, it causes upstream producers to &lt;em&gt;not&lt;&#x2F;em&gt; send data faster than it can be processed. The problem with buffering is that a buffer will eventually run out of space. And then what? The system must decide what to do when the buffer is full: drop new results, drop old results, crash the system, or backpressure.&lt;&#x2F;p&gt;
&lt;p&gt;Backpressure is a &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.reactive-streams.org&#x2F;&quot;&gt;protocol&lt;&#x2F;a&gt; defining how to send a logical signal up the stream with information about the downstream consumer&#x27;s readiness to receive more data. That backpressure signal follows the same path as data moving downstream, but in reverse. If downstream is not ready to consume, then upstream does not send.&lt;&#x2F;p&gt;
&lt;p&gt;Quine uses a reactive stream implementation of backpressure, &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;doc.akka.io&#x2F;docs&#x2F;akka&#x2F;current&#x2F;stream&#x2F;stream-flows-and-basics.html#core-concepts&quot;&gt;Akka Streams&lt;&#x2F;a&gt;, built on top of the actor model to ensure that the ingestion and processing of streams are resilient.&lt;&#x2F;p&gt;
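&lt;p&gt;To make the idea concrete, here is a minimal Python sketch of backpressure. It is not how Quine or Akka Streams implement it: a small bounded queue whose &lt;code&gt;put&lt;&#x2F;code&gt; call blocks stands in for the demand signal, and the names &lt;code&gt;run_pipeline&lt;&#x2F;code&gt;, &lt;code&gt;total_items&lt;&#x2F;code&gt;, and &lt;code&gt;capacity&lt;&#x2F;code&gt; are made up for illustration. The point is that a fast producer is held to the pace of a slow consumer, and nothing is dropped.&lt;&#x2F;p&gt;

```python
import queue
import threading
import time


def run_pipeline(total_items, capacity):
    """Push total_items through a bounded queue; return what the consumer saw."""
    channel = queue.Queue(maxsize=capacity)  # bounded: put() blocks when full
    received = []
    deepest = 0

    def consumer():
        while True:
            item = channel.get()
            if item is None:  # sentinel: the producer has finished
                break
            time.sleep(0.001)  # simulate a slow downstream stage
            received.append(item)

    worker = threading.Thread(target=consumer)
    worker.start()
    for i in range(total_items):
        channel.put(i)  # blocks while the queue is full, slowing the producer
        deepest = max(deepest, channel.qsize())
    channel.put(None)
    worker.join()
    return received, deepest


received, max_depth = run_pipeline(total_items=50, capacity=4)
print(len(received), max_depth)
```

&lt;p&gt;Every item arrives in order, and the number of items in flight never exceeds &lt;code&gt;capacity&lt;&#x2F;code&gt;, no matter how fast the producer loop could otherwise run.&lt;&#x2F;p&gt;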
&lt;p&gt;&lt;strong&gt;Info&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Curious about the operational challenge associated with reactive streams? Read the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;reactivemanifesto.org&quot;&gt;Reactive Manifesto&lt;&#x2F;a&gt; to understand the problems faced by every streaming processor in a high-volume data pipeline.&lt;&#x2F;p&gt;
&lt;p&gt;Including asynchronous, non-blocking backpressure is the only method to ensure that all data from a high-volume stream is processed without data loss or processing delays.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;All Nodes Exist&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;With a graph data model, nodes are the primary unit of data — much like a &quot;row&quot; is the primary unit of data in a relational database. However, unlike traditional graph data systems, a Quine user never has to create a node directly. Instead, &lt;em&gt;the system functions as if all nodes exist.&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Quine represents every possible node as an existing &quot;empty node&quot; with no interesting history. As data streams into the system, the node becomes interesting, and Quine creates a history for the node.&lt;&#x2F;p&gt;
&lt;p&gt;We added an &lt;code&gt;idFrom&lt;&#x2F;code&gt; function to Cypher that takes any number of arguments and deterministically produces a node ID from that data. This is similar to a consistent-hashing strategy, except that the ID produced from this function is always an ID that conforms to the type chosen for the ID provider.&lt;&#x2F;p&gt;
&lt;p&gt;You will use &lt;code&gt;idFrom&lt;&#x2F;code&gt; in the ingest query part of every ingest stream that you create. For example, the absolute minimum ingest query to load incoming data into the graph is simply a wrapper around the &lt;code&gt;idFrom&lt;&#x2F;code&gt; function.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (n) WHERE id(n) = idFrom($that) SET n.line = $that&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
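&lt;p&gt;The deterministic mapping behind &lt;code&gt;idFrom&lt;&#x2F;code&gt; can be illustrated with a short Python sketch. This is an analogy only, assuming a UUID-typed ID: it does not reproduce Quine&#x27;s actual algorithm, and the &lt;code&gt;id_from&lt;&#x2F;code&gt; helper and &lt;code&gt;SKETCH_NAMESPACE&lt;&#x2F;code&gt; constant are hypothetical. What it shows is the property that matters for ingest queries: the same inputs always produce the same ID, and different inputs produce different IDs.&lt;&#x2F;p&gt;

```python
import json
import uuid

# Hypothetical namespace for this sketch only; Quine's real idFrom does not use it.
SKETCH_NAMESPACE = uuid.UUID("00000000-0000-0000-0000-000000000000")


def id_from(*values):
    """Deterministically map any number of JSON-serializable values to a UUID."""
    canonical = json.dumps(values, separators=(",", ":"))
    return uuid.uuid5(SKETCH_NAMESPACE, canonical)


block_id = id_from("block", "0xabc")
same_id = id_from("block", "0xabc")
account_id = id_from("account", "0xabc")
print(block_id == same_id, block_id == account_id)  # prints: True False
```

&lt;p&gt;Because the ID depends only on the values passed in, two events that mention the same entity resolve to the same node without any lookup or coordination.&lt;&#x2F;p&gt;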
&lt;p&gt;&lt;strong&gt;Historical Versioning&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Each node in the graph records all of its historical changes over time. When a node&#x27;s properties or edges are changed, the change event and timestamp are saved to an append-only log for that particular node. This historical log can be replayed up to any desired moment in time, allowing for the system to quickly answer questions using the state of the graph as it was in the past. This is a technique known as &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;martinfowler.com&#x2F;eaaDev&#x2F;EventSourcing.html&quot;&gt;Event Sourcing&lt;&#x2F;a&gt;, applied individually to each node.&lt;&#x2F;p&gt;
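&lt;p&gt;Here is a minimal Python sketch of event sourcing applied to a single node. The &lt;code&gt;NodeJournal&lt;&#x2F;code&gt; class and its method names are hypothetical and say nothing about Quine&#x27;s storage format; the sketch also assumes that events arrive in timestamp order. It shows how replaying an append-only log up to a chosen moment rebuilds the node as it was at that time.&lt;&#x2F;p&gt;

```python
import bisect


class NodeJournal:
    """Append-only log of property changes for one node (illustrative only)."""

    def __init__(self):
        self._times = []    # event timestamps, assumed to arrive in ascending order
        self._changes = []  # (key, value) pairs aligned with _times

    def record(self, timestamp, key, value):
        # Events are only ever appended, never edited in place.
        self._times.append(timestamp)
        self._changes.append((key, value))

    def state_at(self, timestamp):
        """Replay every event up to and including timestamp."""
        cutoff = bisect.bisect_right(self._times, timestamp)
        state = {}
        for key, value in self._changes[:cutoff]:
            state[key] = value
        return state


journal = NodeJournal()
journal.record(100, "line", "first")
journal.record(200, "line", "second")
journal.record(300, "seen", True)
print(journal.state_at(250))  # prints: {'line': 'second'}
```

&lt;p&gt;Asking for the state at time 250 ignores the event at 300, so the historical view never leaks later changes.&lt;&#x2F;p&gt;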
&lt;h2 id=&quot;syntax-and-structure&quot;&gt;Syntax and Structure&lt;&#x2F;h2&gt;
&lt;p&gt;The first step when defining an ingest stream is to understand the overall shape of your data. This includes identifying the data elements necessary for standing queries to use in a &lt;code&gt;MATCH&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;An ingest stream is defined by setting a &lt;code&gt;type&lt;&#x2F;code&gt; described by the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;reference&#x2F;rest-api&#x2F;#&#x2F;schemas&#x2F;com.thatdot.quine.routes.IngestStreamInfo&quot;&gt;API documentation&lt;&#x2F;a&gt;. Quine supports eight types of ingest streams. Each type has a unique form and requires a specific structure to configure properly.&lt;&#x2F;p&gt;
&lt;p&gt;For example, constructing an ingest stream via the &lt;code&gt;&#x2F;api&#x2F;v1&#x2F;ingest&#x2F;{name}&lt;&#x2F;code&gt; &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;reference&#x2F;rest-api.html#&#x2F;paths&#x2F;api-v1-ingest-name&#x2F;post&quot;&gt;API&lt;&#x2F;a&gt; endpoint to read data from standard input and store each line as a node looks similar to the example below.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;{    &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;	&amp;quot;type&amp;quot;: &amp;quot;StandardInputIngest&amp;quot;,    &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;	&amp;quot;format&amp;quot;: {	    &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;		&amp;quot;type&amp;quot;: &amp;quot;CypherLine&amp;quot;,	   &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;		&amp;quot;query&amp;quot;: &amp;quot;MATCH (n) WHERE id(n) = idFrom($that) SET n.line = $that&amp;quot;    &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt; }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Quine natively reads from standard input, passing each line into the Cypher query as &lt;code&gt;$that&lt;&#x2F;code&gt;. A node ID is derived from the line using &lt;code&gt;idFrom($that)&lt;&#x2F;code&gt;. Then, each line is stored as a &lt;code&gt;line&lt;&#x2F;code&gt; property on that node in the streaming graph.&lt;&#x2F;p&gt;
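&lt;p&gt;If you would rather script the call than hand-write it, the request can be assembled with Python&#x27;s standard library. This is a sketch under assumptions: the &lt;code&gt;build_ingest_request&lt;&#x2F;code&gt; helper is hypothetical, the stream name &lt;code&gt;standardIn&lt;&#x2F;code&gt; is only an example, and the base URL assumes a local Quine instance on port 8080. The code builds the request without sending it, so you can inspect it first.&lt;&#x2F;p&gt;

```python
import json
import urllib.request


def build_ingest_request(name, base_url="http://localhost:8080"):
    """Build (but do not send) the POST that registers a standard-input ingest stream."""
    body = {
        "type": "StandardInputIngest",
        "format": {
            "type": "CypherLine",
            "query": "MATCH (n) WHERE id(n) = idFrom($that) SET n.line = $that",
        },
    }
    return urllib.request.Request(
        url=f"{base_url}/api/v1/ingest/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_ingest_request("standardIn")
print(request.get_method(), request.full_url)
# To send it against a running Quine instance:
# urllib.request.urlopen(request)
```

&lt;p&gt;The name in the URL path is the name the stream is registered under, which is what you use later to look up, pause, or cancel it.&lt;&#x2F;p&gt;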
&lt;p&gt;&lt;strong&gt;Info&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;When creating an ingest stream via the API, you can give the stream a meaningful name. For example, you can name the above ingest stream standardIn to make it easier to reference in your application.&lt;&#x2F;p&gt;
&lt;p&gt;Alternatively, when you create an ingest stream via a recipe, Quine automatically names each stream using the format INGEST-#: the first ingest stream defined in the recipe is INGEST-1, and subsequent ingest streams are named in order with # counting up.&lt;&#x2F;p&gt;
&lt;p&gt;Here is the same ingest stream defined in a Quine &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;reference&#x2F;recipe-ref-manual.html&quot;&gt;Recipe&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;ingestStreams:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  - type: StandardInputIngest&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    format:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      type: CypherLine&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      query: |-&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        MATCH (n)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        WHERE id(n) = idFrom($that)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        SET n.line = $that&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;h2 id=&quot;ingest-stream-reporting&quot;&gt;Ingest Stream Reporting&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;strong&gt;Inspecting Ingest Streams via the API&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Quine exposes a series of API endpoints that enable you to monitor and manage ingest streams while in operation. The complete endpoint definitions are available in the API documentation.&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;reference&#x2F;rest-api.html#&#x2F;paths&#x2F;api-v1-ingest&#x2F;get&quot;&gt;List all running ingest streams&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;reference&#x2F;rest-api.html#&#x2F;paths&#x2F;api-v1-ingest-name&#x2F;get&quot;&gt;Look up a running ingest stream&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;reference&#x2F;rest-api.html#&#x2F;paths&#x2F;api-v1-ingest-name--pause&#x2F;put&quot;&gt;Pause an ingest stream&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;reference&#x2F;rest-api.html#&#x2F;paths&#x2F;api-v1-ingest-name--start&#x2F;put&quot;&gt;Unpause an ingest stream&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;reference&#x2F;rest-api.html#&#x2F;paths&#x2F;api-v1-ingest-name&#x2F;delete&quot;&gt;Cancel a running ingest stream&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Let&#x27;s take a look at the information available from the &lt;code&gt;INGEST-1&lt;&#x2F;code&gt; ingest stream from the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;ethereum-tag-propagation&quot;&gt;Ethereum Tag Propagation&lt;&#x2F;a&gt; Recipe.&lt;&#x2F;p&gt;
&lt;p&gt;Start the recipe.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;❯ java -jar quine-x.x.x.jar -r ethereum&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;List the ingest streams started by the Ethereum recipe using the &lt;code&gt;&#x2F;api&#x2F;v1&#x2F;ingest&lt;&#x2F;code&gt; endpoint.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;❯ curl -s &amp;quot;http:&#x2F;&#x2F;localhost:8080&#x2F;api&#x2F;v1&#x2F;ingest&amp;quot; | jq &amp;#39;. | keys&amp;#39;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;[&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;INGEST-1&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;INGEST-2&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The Ethereum recipe creates two ingest streams: &lt;code&gt;INGEST-1&lt;&#x2F;code&gt; and &lt;code&gt;INGEST-2&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;Now, view the ingest stream stats using the &lt;code&gt;&#x2F;api&#x2F;v1&#x2F;ingest&#x2F;INGEST-1&lt;&#x2F;code&gt; endpoint.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;❯ curl -s &amp;quot;http:&#x2F;&#x2F;localhost:8080&#x2F;api&#x2F;v1&#x2F;ingest&#x2F;INGEST-1&amp;quot; | jq&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;{&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;name&amp;quot;: &amp;quot;INGEST-1&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;status&amp;quot;: &amp;quot;Running&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;settings&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;format&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;query&amp;quot;: &amp;quot;MATCH (BA), (minerAcc), (blk), (parentBlk)\nWHERE\n  &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    id(blk) = idFrom(&amp;#39;block&amp;#39;, $that.hash)\n  AND id(parentBlk) = &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    idFrom(&amp;#39;block&amp;#39;, $that.parentHash)\n  AND id(BA) = &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    idFrom(&amp;#39;block_assoc&amp;#39;, $that.hash)\n  AND id(minerAcc) = &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    idFrom(&amp;#39;account&amp;#39;, $that.miner)\nCREATE\n  &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    (minerAcc)&amp;lt;-[:mined_by]-(blk)-[:header_for]-&amp;gt;(BA),\n  &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    (blk)-[:preceded_by]-&amp;gt;(parentBlk)\nSET\n  BA:block_assoc,\n  &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;     BA.number = $that.number,\n  BA.hash = $that.hash,\n  &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;     blk:block,\n  blk = $that,\n  minerAcc:account,\n  &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;     minerAcc.address = $that.miner&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;parameter&amp;quot;: &amp;quot;that&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;type&amp;quot;: &amp;quot;CypherJson&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    },&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;url&amp;quot;: &amp;quot;https:&#x2F;&#x2F;ethereum.demo.thatdot.com&#x2F;blocks_head&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;parallelism&amp;quot;: 16,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;type&amp;quot;: &amp;quot;ServerSentEventsIngest&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  },&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;stats&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;ingestedCount&amp;quot;: 57,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;rates&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;count&amp;quot;: 57,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;oneMinute&amp;quot;: 0.045556443551085735,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;fiveMinute&amp;quot;: 0.06175571100053622,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;fifteenMinute&amp;quot;: 0.04159128290271318,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;overall&amp;quot;: 0.07659077758191643&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    },&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;byteRates&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;count&amp;quot;: 78451,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;oneMinute&amp;quot;: 62.49789862393008,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;fiveMinute&amp;quot;: 84.92629746711795,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;fifteenMinute&amp;quot;: 57.22987512826503,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;overall&amp;quot;: 105.41446006900763&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    },&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;startTime&amp;quot;: &amp;quot;2022-05-17T18:56:08.161500Z&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;totalRuntime&amp;quot;: 744041&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
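&lt;p&gt;As a quick sanity check on these numbers, the overall rates can be approximated by hand. The sketch below copies three values from the response above and assumes that &lt;code&gt;totalRuntime&lt;&#x2F;code&gt; is reported in milliseconds; that unit is an inference from the figures, not something stated in the response.&lt;&#x2F;p&gt;

```python
# Values copied from the INGEST-1 response above.
stats = {
    "ingestedCount": 57,
    "byteRates": {"count": 78451},
    "totalRuntime": 744041,  # assumed to be milliseconds
}

runtime_seconds = stats["totalRuntime"] / 1000
records_per_second = stats["ingestedCount"] / runtime_seconds
bytes_per_second = stats["byteRates"]["count"] / runtime_seconds
average_record_bytes = stats["byteRates"]["count"] / stats["ingestedCount"]

print(round(records_per_second, 4), round(bytes_per_second, 1), round(average_record_bytes))
# prints: 0.0766 105.4 1376
```

&lt;p&gt;Both results land very close to the reported &lt;code&gt;overall&lt;&#x2F;code&gt; values, which suggests the overall rate is simply the count divided by the elapsed runtime. The last figure says each ingested block averages a little under 1.4 KB.&lt;&#x2F;p&gt;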
&lt;p&gt;&lt;strong&gt;Reporting on Ingest Stream progress using a Status Query&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;When creating an ingest stream via a recipe, you can add a status query that runs continuously. For example, the status query below prints the current number of nodes in the graph, along with a link to the visualization in the web UI.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;statusQuery:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  cypherQuery: MATCH (n) RETURN count(n)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;h2 id=&quot;ingest-stream-blog-series&quot;&gt;Ingest Stream Blog Series&lt;&#x2F;h2&gt;
&lt;p&gt;This is just the beginning. There&#x27;s lots more to cover. Over the next few weeks, we will cover the most common ingest streams in separate blog posts.&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;ingesting-data-from-the-internet&#x2F;&quot;&gt;Ingesting data from an Internet Source&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;ingesting-from-multiple-data-sources-into-quine-streaming-graph&#x2F;&quot;&gt;Ingesting Multiple Sources&#x2F;CSV Data&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;ingest-and-analyze-log-files-using-streaming-graph&#x2F;&quot;&gt;Ingesting Log Files&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;real-time-graph-analytics-for-kafka-streams-with-quine&#x2F;&quot;&gt;Ingesting Data from Kafka&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Quine in a Data Pipeline&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;try-adding-ingest-data-to-quine-yourself&quot;&gt;Try Adding Ingest Data To Quine Yourself&lt;&#x2F;h3&gt;
&lt;p&gt;If you want to try Quine yourself, you can &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;download&quot;&gt;download&lt;&#x2F;a&gt; it here. In addition to the Ethereum recipe, take a look at the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;wikipedia-page-ingest&quot;&gt;Wikipedia Ingest&lt;&#x2F;a&gt; and &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;apache-log-analytics&quot;&gt;Apache Log Analytics&lt;&#x2F;a&gt; recipes for different ingest stream examples.&lt;&#x2F;p&gt;
&lt;p&gt;If you have questions or want to check out the community, &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;that.re&#x2F;quine-slack&quot;&gt;join Quine Slack&lt;&#x2F;a&gt; or visit our &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;thatdot&#x2F;quine&quot;&gt;GitHub&lt;&#x2F;a&gt; page.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Time Series Streaming Graph and other Quine 1.2.0 Highlights</title>
        <published>2022-05-10T00:00:00+00:00</published>
        <updated>2022-05-10T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/time-series-streaming-graph-and-other-quine-1-2-0-highlights/"/>
        <id>https://www.thatdot.com/blog/time-series-streaming-graph-and-other-quine-1-2-0-highlights/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/time-series-streaming-graph-and-other-quine-1-2-0-highlights/">&lt;h2 id=&quot;time-as-categorical-data-and-other-improvements&quot;&gt;Time As Categorical Data and other Improvements&lt;&#x2F;h2&gt;
&lt;p&gt;Last week saw the release of Quine 1.2.0 and, with it, some notable new features, several new recipes, and a cluster of performance-related improvements, including two important changes that affect backwards compatibility.&lt;&#x2F;p&gt;
&lt;p&gt;All in all, Quine Streaming Graph 1.2.0 is a significant update and worth a detailed look.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;time-series-streaming-graph&quot;&gt;Time Series Streaming Graph&lt;&#x2F;h3&gt;
&lt;p&gt;&lt;strong&gt;Reification of time&lt;&#x2F;strong&gt; – easily the most immediately useful feature of 1.2.0, &lt;code&gt;reify.time&lt;&#x2F;code&gt; is a new custom Cypher procedure you can use to generate a structured representation of timestamp data. Think buckets of days, hours, minutes, and seconds represented as nodes. Does this make Quine a time series database now? More like a &lt;strong&gt;time series streaming graph&lt;&#x2F;strong&gt;. &lt;code&gt;reify.time&lt;&#x2F;code&gt; makes it straightforward to create and connect events to time-based nodes and execute time-related data analysis via &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;docs&#x2F;getting-started&#x2F;writing-standing-queries&quot;&gt;standing queries&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;For time-series use cases when you&#x27;re looking for real-time insights, the ability to treat time as categorical data and render it as nodes on the graph is pretty powerful.&lt;&#x2F;p&gt;
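&lt;p&gt;To picture what treating time as categorical data means, here is a small Python sketch. It does not call or reproduce &lt;code&gt;reify.time&lt;&#x2F;code&gt;; the &lt;code&gt;PERIODS&lt;&#x2F;code&gt; table, the key formats, and the &lt;code&gt;time_buckets&lt;&#x2F;code&gt; helper are assumptions made for illustration. It turns one timestamp into a bucket key per period, which is the kind of value a time node could be identified by.&lt;&#x2F;p&gt;

```python
from datetime import datetime, timezone

# Bucket widths from coarsest to finest, as strftime patterns (an assumed layout).
PERIODS = [
    ("day", "%Y-%m-%d"),
    ("hour", "%Y-%m-%dT%H"),
    ("minute", "%Y-%m-%dT%H:%M"),
    ("second", "%Y-%m-%dT%H:%M:%S"),
]


def time_buckets(timestamp):
    """Return one (period, bucket_key) pair per period for the given moment."""
    return [(period, timestamp.strftime(pattern)) for period, pattern in PERIODS]


event_time = datetime(2022, 5, 10, 14, 30, 5, tzinfo=timezone.utc)
for period, key in time_buckets(event_time):
    print(period, key)
```

&lt;p&gt;Two events that occur in the same minute share the same minute key, so in a graph they would connect to the same time node, and a standing query can match on that shared node.&lt;&#x2F;p&gt;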
&lt;p&gt;And if you want to quickly try it yourself, we’ve added a new &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;wikipedia-page-ingest&quot;&gt;Wikipedia Ingest&lt;&#x2F;a&gt; recipe.&lt;&#x2F;p&gt;
&lt;p&gt;Here’s the &lt;strong&gt;&lt;code&gt;reify.time&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;docs&#x2F;reference&#x2F;reify-time&quot;&gt;documentation page&lt;&#x2F;a&gt; so you can learn more.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;cypher-feature-support&quot;&gt;Cypher Feature Support&lt;&#x2F;h3&gt;
&lt;p&gt;&lt;strong&gt;Cypher subqueries&lt;&#x2F;strong&gt; - with the introduction of the &lt;code&gt;CALL {}&lt;&#x2F;code&gt; clause, Quine now supports subqueries. Since Neo4j has the best Cypher documentation in the biz, here’s a &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;neo4j.com&#x2F;docs&#x2F;cypher-manual&#x2F;current&#x2F;clauses&#x2F;call-subquery&#x2F;index.html#subquery-unit&quot;&gt;link to their Cypher Manual page&lt;&#x2F;a&gt; on &lt;strong&gt;&lt;code&gt;CALL {}&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;OpenCypher and Java 11&lt;&#x2F;strong&gt; - starting with 1.2.0, Quine has switched to OpenCypher, which requires Java 11 or later. Previous versions required Java 8 or later.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;rest-api-and-storage-enhancements&quot;&gt;Rest API and Storage Enhancements&lt;&#x2F;h3&gt;
&lt;p&gt;&lt;strong&gt;REST API UPGRADE (DOCS)&lt;&#x2F;strong&gt; - we are preparing to move from the standard Swagger stacked REST API documentation to Stoplight’s Elements framework. Check it out on the REST API documentation page. If you have opinions or feedback, we’d love to hear them. Look for it in Quine itself in a near-term release.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;627abb697ae5bb0a7bf8d4f9_Quine%201.2.0%20Screenshot%20of%20new%20API%20docs%20.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Switch to Stoplight Elements for improved API docs legibility&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Added quine.persistence.effect-order&lt;&#x2F;strong&gt; – This &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;docs&#x2F;reference&#x2F;configuration&quot;&gt;configuration option&lt;&#x2F;a&gt; is particularly useful for Cassandra users who want to use Quine+Cassandra as their database of record. &lt;code&gt;quine.persistence.effect-order&lt;&#x2F;code&gt; can be set to either &lt;strong&gt;&lt;code&gt;memory-first&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; or &lt;strong&gt;&lt;code&gt;persistor-first&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;. The latter option, &lt;strong&gt;&lt;code&gt;persistor-first&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;, is meant to support this use case directly.&lt;&#x2F;p&gt;
&lt;p&gt;In the presence of system failure, with &lt;strong&gt;&lt;code&gt;persistor-first&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; Quine will have durably stored updates before anything else occurs (e.g. triggering standing queries).&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Supernode performance enhancements&lt;&#x2F;strong&gt; - supernodes, or nodes with &lt;em&gt;LOTS&lt;&#x2F;em&gt; of edges, can negatively impact performance. As part of an ongoing effort to mitigate that impact, we have added two features to improve performance:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Improved serialization for nodes with extremely large edge and&#x2F;or property counts in the persistence backend. Large property counts don’t technically make a node a supernode, but the performance problems are similar. &lt;em&gt;thatDot platform users should note that a similar change was made to Quine cluster member behavior.&lt;&#x2F;em&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Nodes with extremely large edge and&#x2F;or property counts can now be accessed via the Literal Operations REST APIs.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;&lt;strong&gt;Storage Format Change&lt;&#x2F;strong&gt; – data stored in Quine 1.1.2 and earlier cannot be used in Quine 1.2.0 without migration.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Additional storage improvements&lt;&#x2F;strong&gt; – in addition to improved serialization for nodes with large edge and&#x2F;or property counts, storage updates in Quine 1.2.0 include:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Calling the debug API on a node in a historical query now only includes journal events up to the time of the historical query&lt;&#x2F;li&gt;
&lt;li&gt;Renamed Cassandra store options &lt;strong&gt;&lt;code&gt;insert-timeout&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; and &lt;strong&gt;&lt;code&gt;select-timeout&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; to &lt;strong&gt;&lt;code&gt;write-timeout&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; and &lt;strong&gt;&lt;code&gt;read-timeout&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;, respectively (#1733)&lt;&#x2F;li&gt;
&lt;li&gt;Bugfix: Setting &lt;strong&gt;&lt;code&gt;snapshot-singleton=true&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;, &lt;strong&gt;&lt;code&gt;snapshot-schedule=on-node-update&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;, and &lt;strong&gt;&lt;code&gt;journal-enabled=false&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; no longer causes the most recent event on a node to be dropped&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;download-and-try-quine-1-2-0-today&quot;&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;http:&#x2F;&#x2F;quine.io&#x2F;download&quot;&gt;Download&lt;&#x2F;a&gt; and try Quine 1.2.0 today.&lt;&#x2F;h3&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>thatDot, makers of Quine, Announces CrowdStrike Falcon Fund Investment</title>
        <published>2022-04-27T00:00:00+00:00</published>
        <updated>2022-04-27T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/news/crowdstrike-invests-in-thatdot/"/>
        <id>https://www.thatdot.com/news/crowdstrike-invests-in-thatdot/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/news/crowdstrike-invests-in-thatdot/">&lt;p&gt;We are both excited and proud to announce that CrowdStrike, through its Falcon Fund, has made an investment in thatDot. This news comes on the heels of the &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;news&#x2F;announcing-open-source-release-of-quine-streaming-graph&#x2F;&quot;&gt;release of Quine&lt;&#x2F;a&gt;, our open source streaming graph software. The investment validates not only our vision for what Quine can be, but also the concrete progress we’ve made in executing on that vision.&lt;&#x2F;p&gt;
&lt;p&gt;The vision we are executing on is both simple and audacious: we see engineers and data scientists using Quine as &lt;em&gt;the&lt;&#x2F;em&gt; central hub for high-volume, real-time, complex event processing workflows at scale.&lt;&#x2F;p&gt;
&lt;p&gt;A handful of Quine queries can replace months of development time and millions in costs, eliminating the need to build complex microservices architectures that drag down and stall analysis on streaming data.&lt;&#x2F;p&gt;
&lt;p&gt;It is rare that such a revolutionary advance occurs in such an important infrastructure category. Investment from CrowdStrike and others will help thatDot scale more rapidly to meet the many applications of the platform across a large variety of use cases.&lt;&#x2F;p&gt;
&lt;p&gt;Or, as Michael Sentonas, Chief Technology Officer at CrowdStrike, said:&lt;&#x2F;p&gt;
&lt;blockquote&gt;
&lt;p&gt;“CrowdStrike and thatDot share a commitment to bringing speed and efficiency to data pipeline development teams through real-time, critical analysis of telemetry. The thatDot platform unlocks value for these teams, enabling them to understand and act upon massive amounts of data quickly and confidently.”&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;p&gt;For those of you who’ve already joined the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;that.re&#x2F;quine-slack&quot;&gt;Quine community&lt;&#x2F;a&gt;, thanks for helping us reach this important milestone. For those of you who are new to the community and to streaming graph, we want to &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;&quot;&gt;welcome you to the adventure&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Your Graph DB Won’t Scale? Stop Querying it.</title>
        <published>2022-04-14T00:00:00+00:00</published>
        <updated>2022-04-14T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/graph-db-wont-scale-stop-querying-it/"/>
        <id>https://www.thatdot.com/blog/graph-db-wont-scale-stop-querying-it/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/graph-db-wont-scale-stop-querying-it/">
&lt;p&gt;It is almost inevitable when talking to data engineers or scientists about Quine streaming graph that they start ticking off all the graph databases they’ve already tried and how vastly different each was to operate. That’s not surprising. The tidy category name of &lt;em&gt;graph database&lt;&#x2F;em&gt; obscures what is in fact a pretty diverse set of database technologies. From purpose-built graph databases (Neo4j and TigerGraph) to triple stores (AWS Neptune), and from distributed graphs tightly coupled to column stores (Titan) to multi-model document stores with graph wrappers (ArangoDB and OrientDB), you’ll find &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;people.cs.aau.dk&#x2F;~matteo&#x2F;&#x2F;publications&#x2F;journal&#x2F;2018-vldb-gdb.html&quot;&gt;&lt;strong&gt;widely divergent&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; underlying data structures, indexing and storage architectures, performance profiles, and target use cases.&lt;&#x2F;p&gt;
&lt;p&gt;They have two traits in common that always come up in these calls, though: they all behave like classic databases in that you must proactively query them to see if data is available, and they’ve proven &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;tdwi.org&#x2F;articles&#x2F;2017&#x2F;03&#x2F;14&#x2F;good-bad-and-hype-about-graph-databases-for-mdm.aspx&quot;&gt;&lt;strong&gt;unable to scale&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; beyond what are today relatively modest workloads, especially the increasingly common event stream processing workloads. Those two traits are in fact profoundly interconnected. Querying a database until you get the answer you need is such a deeply ingrained pattern when it comes to detecting patterns in data that we don’t even question it. It is also grossly inefficient. Compute resources are spent polling for data that either is not yet in the database or has already been sitting there for some time.&lt;&#x2F;p&gt;
&lt;p&gt;The only way to know if the data is available is to issue the query. Applying this model to event streams of even moderate volume (e.g. 1,000 events&#x2F;sec) only compounds the problem, rendering graph databases incapable of delivering results in anything close to a reasonable timeframe. So when we describe how Quine can scale up to thousands of events&#x2F;second per node, handle out-of-order and late-arriving data, and do it all without relying on time windows, people actually tell us they don’t believe us.&lt;&#x2F;p&gt;
&lt;p&gt;That’s impossible, they say. How? &lt;strong&gt;Our answer? Stop querying your data.&lt;&#x2F;strong&gt; It sounds absurd or deliberately provocative, but this is exactly the design choice Quine makes. Quine is a streaming graph. It combines characteristics of complex event processing software (consuming high-volume event streams from Kafka and Kinesis) with some of the defining aspects of graph databases.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;lh7-us.googleusercontent.com&#x2F;docsz&#x2F;AD_4nXeYSN-wp_ghrgWqqoZH37ChQQ14H_etIr-0y4qAIUNYPb7IblMrTDwojZFhpyMApinF7FN5ld-DwT0JmFov85kTafwVaqWrQX-dOGbJ8t38g5KPQzDaPJjRLOy7ADuZlwm7EqnMfZyu3GPGfxjgEEn4FB0?key=Fxm-gsIaz92YCMYN8DeBfw&quot; alt=&quot;Quine Streaming Graph model &quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;The evolution of Quine streaming graph.&lt;&#x2F;p&gt;
&lt;p&gt;Like a graph database, Quine supports the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;docs&#x2F;reference&#x2F;cypher-language&quot;&gt;&lt;strong&gt;Cypher&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; and &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;docs&#x2F;reference&#x2F;gremlin-language&quot;&gt;&lt;strong&gt;Gremlin&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; query syntaxes, its data structure is defined by nodes, edges, and properties of nodes, and it performs best when you structure your graph for the questions you need answered. But Quine’s approach to finding complex patterns and relationships in event data, and the scale at which it operates, differ fundamentally from those of a graph database. Quine doesn’t query the database.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;when-is-a-query-not-a-query-when-it-is-a-quine-standing-query&quot;&gt;&lt;strong&gt;When is a query not a query? When it is a Quine Standing Query.&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;The &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;docs&#x2F;getting-started&#x2F;writing-standing-queries&quot;&gt;&lt;strong&gt;standing query&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; is what makes Quine different. And after spending several hundred words telling you not to query your data if you want to scale, I must admit the name might be misleading. But 1) naming is hard and 2) let’s focus on the standing part for now. A standing query is like a filter on streams of event data. You issue it once, it propagates into the graph and then…you wait.&lt;&#x2F;p&gt;
&lt;p&gt;As events stream in, standing queries keep track of incremental changes in the graph state. At the instant a match is made, the standing query springs into action, &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;components&#x2F;writing-standing-queries.html#result-outputs&quot;&gt;&lt;strong&gt;triggering an arbitrary operation&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; you’ve specified – updating node properties, adding edges, writing out to a Kafka topic or a database.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;lh7-us.googleusercontent.com&#x2F;docsz&#x2F;AD_4nXeh29OFl-w5JeKkE7bIfnoMW81epP_qEb1593Iqv5VCuFsK7CiIyvnzR6JWTYESSsarPKicpt_5WSXL3SAiZ6_YRkXHPiFjkn9Ji6JAW8V0IHW28shn775yMUmsrzDGawX6g2bdn0uqD2I1eRgeCY1af7Wu?key=Fxm-gsIaz92YCMYN8DeBfw&quot; alt=&quot;Traditional DB Query Pattern&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Traditional DB query patterns vs. standing queries in Quine streaming graph&lt;&#x2F;p&gt;
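&lt;p&gt;To make the contrast concrete, here is a minimal Python sketch of the standing-query idea. It is not Quine code and does not use the Quine API: the &lt;code&gt;TinyStandingQuery&lt;&#x2F;code&gt; class, its &lt;code&gt;ingest&lt;&#x2F;code&gt; method, and the event names are all hypothetical, chosen only to show a pattern that is registered once and then matched incrementally as events arrive, in any order, with no polling.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from collections import defaultdict


class TinyStandingQuery:
    """Toy matcher: fire once per key when every required event kind has arrived."""

    def __init__(self, required_kinds, on_match):
        self.required = frozenset(required_kinds)
        self.on_match = on_match
        self.seen = defaultdict(set)  # key: event kinds observed so far
        self.fired = set()            # keys whose pattern already matched

    def ingest(self, key, kind):
        # Work happens only when an event arrives; nothing polls in between.
        if kind not in self.required or key in self.fired:
            return
        self.seen[key].add(kind)
        if self.seen[key] == self.required:
            self.fired.add(key)
            self.on_match(key)


matches = []
query = TinyStandingQuery(["login", "password_change", "transfer"], matches.append)

# Events may arrive out of order and interleaved across keys.
events = [
    ("alice", "transfer"),
    ("bob", "login"),
    ("alice", "login"),
    ("alice", "password_change"),  # completes the pattern for alice
    ("bob", "transfer"),
]
for key, kind in events:
    query.ingest(key, kind)

print(matches)  # ['alice']
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;In this toy version the callback fires the moment the last missing event for a key arrives, and an incomplete pattern (bob, above) costs nothing beyond the partial state already recorded.&lt;&#x2F;p&gt;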
&lt;p&gt;How is this possible without expending tremendous compute resources? The key innovation that makes standing queries possible – and therefore allows Quine to scale to millions of events per second – is that Quine combines a graph data model with an &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;docs&#x2F;about-quine&#x2F;main-concepts&quot;&gt;&lt;strong&gt;asynchronous actor-based&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; compute model built on the same graph. In Quine, every node both stores data and can instantiate an actor as needed to send and receive messages and perform arbitrary computation on these messages.&lt;&#x2F;p&gt;
&lt;p&gt;Actors resemble lightweight threads: they are computationally efficient, can be loaded on demand, and are reclaimed when no longer needed. Going back up a level, this design makes possible the incremental compute necessary for a standing query. As events are ingested, actors on nodes responsible for a standing query detect incremental changes to the graph state that matter to them and pass messages up a tree-like hierarchy of nodes responsible for coordinating the standing query.&lt;&#x2F;p&gt;
&lt;p&gt;Again, all this is done with lightweight actors and no compute is expended except when incremental matches are made. Unlike with &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;people.cs.aau.dk&#x2F;~matteo&#x2F;pdf&#x2F;vldb18.gdb.p386-lissandrini.pdf&quot;&gt;&lt;strong&gt;graph databases&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; [PDF; see &lt;em&gt;section 6.3 Complex Queries&lt;&#x2F;em&gt;], the patterns standing queries filter for can be quite complex without increasing query latency. In fact, the notion of query latency doesn’t really exist. A match is made when the data necessary for the match is present. No resources are expended until that happens.&lt;&#x2F;p&gt;
&lt;p&gt;Then, and only then, is an action triggered. Standing queries aren’t the only reason Quine processes high-volume event data so effectively. Its use of semantic caching – also a byproduct of a graph-based data and compute model – and division of read and write concerns between an in-memory graph structure and &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;docs&#x2F;components&#x2F;persistors&quot;&gt;&lt;strong&gt;write-optimized persistors&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; both contribute to its ability to scale. But it is the decision to not actively query the data that unlocks the performance at scale necessary for event-driven applications.&lt;&#x2F;p&gt;
&lt;p&gt;Graph databases remain a great choice for many uses. And the graph query syntax they mainstreamed is highly effective for finding deeper, more valuable relationships between events. But as event-driven data grows in both volume and importance, graph needs to evolve away from the old database patterns.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;learn-more-or-try-quine-streaming-graph&quot;&gt;&lt;strong&gt;Learn more or try Quine Streaming Graph&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;If you are interested in learning more about Quine’s architecture and design choices, or if you want to try Quine for yourself, visit &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;&quot;&gt;&lt;strong&gt;Quine.io&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; for docs and downloads.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>9 Events, 90 Days - See Quine Streaming Graph in Action</title>
        <published>2022-04-10T00:00:00+00:00</published>
        <updated>2022-04-10T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/news/nine-events-in-ninety-days-quine-streaming-graph/"/>
        <id>https://www.thatdot.com/news/nine-events-in-ninety-days-quine-streaming-graph/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/news/nine-events-in-ninety-days-quine-streaming-graph/">&lt;h2 id=&quot;9-events-90-days-whew&quot;&gt;9 Events, 90 Days. Whew!!!&lt;&#x2F;h2&gt;
&lt;p&gt;It has been an exciting couple of months since the launch of Quine, our open source software. The thatDot executive team has been connecting with developers, data engineers, and others, in person and virtually, to show how they can turn high-volume data into high-value data.&lt;&#x2F;p&gt;
&lt;p&gt;During the rest of the year, our team will be at additional events - including meetups and conferences - across the US to talk all things data, streaming graph and Quine. If you will be at one of these conferences and would like to schedule a meeting, contact us. Don&#x27;t see something you can attend? Drop us a line at &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;mailto:info@thatdot.com&quot;&gt;info@thatdot.com&lt;&#x2F;a&gt; and we&#x27;ll add you to our mailing list and find a way to connect.&lt;&#x2F;p&gt;
&lt;p&gt;And if you want to connect with other engineers and data scientists exploring Quine, join the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;join.slack.com&#x2F;t&#x2F;quine-io&#x2F;shared_invite&#x2F;zt-171x7z72f-a6Zqhrot3C7cwsR3xi5suA&quot;&gt;Quine community Slack channel&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;Our team will be at the following events:&lt;&#x2F;p&gt;
&lt;h3 id=&quot;odsc-east-2022-ryan-wright-ceo-and-founder&quot;&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;odsc.com&#x2F;boston&#x2F;&quot;&gt;ODSC East 2022&lt;&#x2F;a&gt;, Ryan Wright, CEO and Founder&lt;&#x2F;h3&gt;
&lt;p&gt;&lt;strong&gt;Boston, MA, April 19-21&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;&lt;em&gt;ODSC – Open Data Science Conference – is the largest applied data science conference, essential for anyone who wants to connect to the data science community and contribute to the open source applications it uses every day.&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Sessions:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;odsc.com&#x2F;speakers&#x2F;quine-a-streaming-graph-for-event-driven-data-pipelines&#x2F;&quot;&gt;Quine: A Streaming Graph for Event-Driven Data Pipelines&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;odsc.com&#x2F;speakers&#x2F;noiseless-anomaly-detection-with-streaming-graph-a-i&#x2F;&quot;&gt;Noiseless Anomaly Detection with Streaming Graph A.I.&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;the-knowledge-graph-conference-ryan-wright-ceo-and-founder&quot;&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.knowledgegraph.tech&#x2F;&quot;&gt;The Knowledge Graph Conference&lt;&#x2F;a&gt;, Ryan Wright, CEO and Founder&lt;&#x2F;h3&gt;
&lt;p&gt;&lt;strong&gt;New York City, May 2-6&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;&lt;em&gt;The Knowledge Graph Conference is emerging as the premiere source of learning around knowledge graph technologies. We believe knowledge graphs are an underutilized yet essential force for solving complex societal challenges like climate change, democratizing access to knowledge and opportunity, and capturing business value made possible by advances in AI.&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Session:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;&lt;strong&gt;We will update as soon as the schedule is released.&lt;&#x2F;strong&gt;&lt;&#x2F;em&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h2 id=&quot;past-quine-streaming-graph-events&quot;&gt;Past Quine Streaming Graph Events&lt;&#x2F;h2&gt;
&lt;p&gt;For those who will not be at the events listed above, recordings and slides from past presentations are available if you want to learn more about Quine:&lt;&#x2F;p&gt;
&lt;h3 id=&quot;recordings&quot;&gt;Recordings:&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;Let’s Talk Data with Joe Reis &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=1P-iHaAPs4g&amp;amp;t=4s&quot;&gt;https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=1P-iHaAPs4g&amp;amp;t=4s&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Scala Love &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=RF_-ETNVr4Y&amp;amp;list=PLBqWQH1MiwBTMk9HV-RNN7sQpB9ZPi_Az&amp;amp;index=12&quot;&gt;https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=RF_-ETNVr4Y&amp;amp;list=PLBqWQH1MiwBTMk9HV-RNN7sQpB9ZPi_Az&amp;amp;index=12&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Global Big Data Conference &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=2-APGoQ8QnI&quot;&gt;https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=2-APGoQ8QnI&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;ODSC Webinar &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=kpzjLTDhkoE&quot;&gt;https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=kpzjLTDhkoE&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;presentation&quot;&gt;Presentation:&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;drive.google.com&#x2F;file&#x2F;d&#x2F;1BnHoP6ehItqJH1IS8Y-q7RaYCCg_n6kG&#x2F;view&quot;&gt;Introducing Quine: A streaming graph for modern data pipelines&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Key Concepts to Help You Get Started With Streaming Graph</title>
        <published>2022-04-06T00:00:00+00:00</published>
        <updated>2022-04-06T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/key-concepts-to-help-you-get-started-with-streaming-graph/"/>
        <id>https://www.thatdot.com/blog/key-concepts-to-help-you-get-started-with-streaming-graph/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/key-concepts-to-help-you-get-started-with-streaming-graph/">&lt;h2 id=&quot;getting-started-with-streaming-graph&quot;&gt;Getting Started with Streaming Graph&lt;&#x2F;h2&gt;
&lt;p&gt;When I started at thatDot six weeks ago, I began keeping a journal of what I learned about Quine and this new category of software, streaming graph. Journaling helps me organize my thoughts, and it is both useful and fun to look back and see how my ideas and understanding evolved. It is also a great way to distill ideas and share concepts that I find challenging, exciting, or both.&lt;&#x2F;p&gt;
&lt;p&gt;And that’s what I hope this post can do for you: give you a good starting frame of reference for streaming graph while instilling in you the same sense of excitement I feel after six weeks of working with Quine.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;start-with-the-questions-you-need-answered&quot;&gt;Start with the questions you need answered.&lt;&#x2F;h2&gt;
&lt;p&gt;Before using Quine, ask yourself: do I know what question I am trying to answer? If you don’t, and you just want to load as much data as possible into a database and start exploring, Quine offers no great advantage over graph databases.&lt;&#x2F;p&gt;
&lt;p&gt;But if there are patterns of events that you want to watch for and act on the moment they are detected, Quine is the right choice.&lt;&#x2F;p&gt;
&lt;p&gt;Consider a few examples:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;you want to detect, then block, fraudulent blockchain transactions as well as identify in real time all parties who transact with the source.&lt;&#x2F;li&gt;
&lt;li&gt;you are collecting sensor readings and real-time environmental data and want to combine them with historical readings and maintenance data in order to anticipate and head off costly outages.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;In both cases the approach is the same: you start by identifying the patterns -- the collection of events related in a specific way to one another -- that indicate a suspect transaction or imminent system failure and what actions should be taken once that pattern emerges.&lt;&#x2F;p&gt;
&lt;p&gt;These patterns help to determine which data to represent as nodes, which as properties of nodes, and which as edges expressing the relationship between nodes. Getting this right is the key to building an efficient streaming graph and is the essence of graph data modeling. Because graphs are based on the same subject-predicate-object structure we use to communicate, I personally find expressing data models as graphs straightforward.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;quine-streaming-graph-is-a-lot-like-graph-databases-except-when-it-is-not&quot;&gt;&lt;strong&gt;Quine streaming graph is a lot like graph databases except when it is not.&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Experience with graph databases and graph query languages (especially Cypher) will make getting started with Quine relatively easy. In fact, most of your Cypher queries will &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;docs&#x2F;reference&#x2F;cypher-language&quot;&gt;&lt;em&gt;just work&lt;&#x2F;em&gt; on Quine&lt;&#x2F;a&gt;. This ease of getting started can lead you to mistake Quine for a traditional graph database.&lt;&#x2F;p&gt;
&lt;p&gt;Quine is much more than a graph database.&lt;&#x2F;p&gt;
&lt;p&gt;Quine was built for a specific purpose – to apply the graph data model so you can detect complex relationships within event streams in real time.&lt;&#x2F;p&gt;
&lt;p&gt;In its simplest form, Quine’s usage pattern is: consume streaming data from Kafka or Kinesis, and shape it into a graph. Then, create queries that find complex patterns among the relationships inside the graph, and trigger some arbitrary action, including modifying the graph itself or writing back out to &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;components&#x2F;ingest-sources&#x2F;kafka.html#reading-records-from-kafka&quot;&gt;Kafka&lt;&#x2F;a&gt; or &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;components&#x2F;ingest-sources&#x2F;kinesis.html&quot;&gt;Kinesis&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;The key here is finding patterns within streams of data and taking action on the matches in real-time.&lt;&#x2F;p&gt;
&lt;p&gt;If analogies are your thing, the difference between Quine streaming graph and a graph database is the difference between setting a net across a raging torrent to catch what flows through it and casting and recasting a net into a lake until you catch something interesting.&lt;&#x2F;p&gt;
&lt;p&gt;If Quine behaved like a legacy graph database, you’d need to checkpoint the stream, ingest a sampling of data, then query it and subsequent checkpoints continuously until a match was made, and only then could you take action. Not only would you expend resources on needless queries, but you would also introduce a delay between when a pattern is complete and when action is taken. Instead, using standing queries (more on this later) in Quine, you can trigger an action the instant a match occurs.&lt;&#x2F;p&gt;
&lt;p&gt;This has profound implications for the design and performance of your solution, allowing you to fine-tune the graph structure and queries to zero in on the patterns in event data that matter most to you. In other words, you still need to develop the questions to ask of your data in order to solve, or avoid, your business concerns.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;when-and-how-to-work-on-data-ingest-and-standing-queries&quot;&gt;When and how to work on data: ingest and standing queries.&lt;&#x2F;h2&gt;
&lt;p&gt;Quine operates on data using two contexts – &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;components&#x2F;ingest-sources&#x2F;ingest-sources.html#overview&quot;&gt;ingest queries&lt;&#x2F;a&gt; and &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;components&#x2F;writing-standing-queries.html&quot;&gt;standing queries&lt;&#x2F;a&gt;. Together, these control the majority of the interactions you’ll have with the streaming graph.&lt;&#x2F;p&gt;
&lt;p&gt;Standing queries, because they are such a unique and powerful feature of Quine, tend to get a lot of attention. But I’ve learned that the ingest query, while not as flashy, plays a critical role in building an efficient streaming graph.&lt;&#x2F;p&gt;
&lt;p&gt;Think of ingest queries like an ETL processor. They connect to one or more data sources, classify data as nodes or properties of nodes, optionally create edges, then load the data into the graph. The ingest query sets the structure that your questions will take.&lt;&#x2F;p&gt;
&lt;p&gt;Standing queries take over from ingest queries to look for patterns as they emerge from the event streams. The instant a match is made, standing queries trigger actions. Standing queries can send data to an external system (e.g., writing out to Kafka) or perform actions on the graph itself, like creating new nodes, updating properties, or creating and updating edges.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;624de0a0578d73cc485f4ce2_Ingest%20Query%20Standing%20Query%20Quine%20Streaming%20Graph.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Event driven data in. Data driven data out.&lt;&#x2F;p&gt;
&lt;p&gt;The best way to start learning about the role each query type plays is to look at recipes.&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;apache-log-analytics&quot;&gt;Apache Log recipe&lt;&#x2F;a&gt; provides a good example of the ingest query in action, extracting data using regex to create two node types (log, which represents an HTTP request and verb, which represents the HTTP Method). At the same time, it connects the log nodes to their associated HTTP Method using the verb edge. It gives you a great sense of how the structure of the graph maps to the sorts of questions you’d want to ask.&lt;&#x2F;li&gt;
&lt;li&gt;The &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;cdn-cache-efficiency-by-segment&quot;&gt;CDN Cache Efficiency recipe&lt;&#x2F;a&gt; uses a relatively simple ingest query and showcases the power of the standing query to transform the graph, incrementing counters when there are cache misses and classifying network elements based on reliability.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
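&lt;p&gt;As a rough illustration of the shape of the graph that the Apache Log ingest produces, the Python sketch below parses a few log lines with a regex and records log nodes, verb nodes, and verb edges in plain dictionaries and sets. It is an analogy rather than Quine code: the recipe does this work in its own ingest query, and the simplified regex, the &lt;code&gt;ingest&lt;&#x2F;code&gt; function, the node IDs, and the sample log lines here are my own assumptions.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import re

# Simplified pattern for Apache common log lines (an assumption, not the recipe's regex).
LOG_LINE = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) \S+')


def ingest(line, nodes, edges):
    """Turn one log line into a log node, a verb node, and a verb edge."""
    match = LOG_LINE.match(line)
    if match is None:
        return None  # skip lines that do not parse
    host, timestamp, method, path, status = match.groups()
    log_id = ("log", host, timestamp, path)
    verb_id = ("verb", method)
    nodes[log_id] = {"host": host, "path": path, "status": int(status)}
    nodes.setdefault(verb_id, {"method": method})  # one shared node per HTTP method
    edges.add((log_id, "verb", verb_id))
    return log_id


nodes, edges = {}, set()
lines = [
    '127.0.0.1 - - [10/Oct/2022:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '127.0.0.1 - - [10/Oct/2022:13:55:37 +0000] "POST /login HTTP/1.1" 302 512',
    '10.0.0.7 - - [10/Oct/2022:13:55:38 +0000] "GET /about.html HTTP/1.1" 200 1045',
]
for line in lines:
    ingest(line, nodes, edges)

verb_nodes = sorted(node_id[1] for node_id in nodes if node_id[0] == "verb")
print(verb_nodes)  # ['GET', 'POST']
print(len(edges))  # 3
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Because every request points at a shared verb node, a question like “which requests used POST?” becomes a one-hop traversal from that node, which is the sense in which the structure of the graph maps to the questions you want to ask.&lt;&#x2F;p&gt;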
&lt;h2 id=&quot;putting-it-all-together&quot;&gt;&lt;strong&gt;Putting it all together&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;With all this in mind – how streaming graph differs from graph DBs, the need to start with the question and work backward, and using recipes to understand the role of ingest and standing queries – pulling down, dissecting, and then modifying one of the advanced recipes is the logical next step.&lt;&#x2F;p&gt;
&lt;p&gt;I learned a lot exploring the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;ethereum-tag-propagation&quot;&gt;Ethereum Tag Propagation recipe&lt;&#x2F;a&gt;. It uses a live data feed from the Ethereum blockchain and both the ingest and standing queries are robust examples. The Ethereum recipe and the recipes mentioned above are all available as open source on &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;&quot;&gt;Quine.io&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;Interacting with streaming data in Quine opened my eyes to the possibilities of streaming graph solutions. Give it a try yourself. Quine is easy to get up and running from the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;download&quot;&gt;download&lt;&#x2F;a&gt; page.&lt;&#x2F;p&gt;
&lt;p&gt;Let me know how you do once you’ve had a chance to experiment. Reach out on the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;that.re&#x2F;quine-slack&quot;&gt;Quine community Slack channel&lt;&#x2F;a&gt;; I’m &lt;strong&gt;@allan&lt;&#x2F;strong&gt;. Did you figure out something really cool using Quine? Share your work with the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;thatdot&#x2F;quine&quot;&gt;community through GitHub&lt;&#x2F;a&gt;. Submit a pull request with your recipe and a short description.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>The Evolution To Streaming Graph from Graph Databases</title>
        <published>2022-03-22T00:00:00+00:00</published>
        <updated>2022-03-22T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/the-evolution-to-streaming-graph-from-graph-databases/"/>
        <id>https://www.thatdot.com/blog/the-evolution-to-streaming-graph-from-graph-databases/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/the-evolution-to-streaming-graph-from-graph-databases/">&lt;p&gt;After a decade of growth in large scale data repositories -- databases, data warehouses, data lakes, even data lake houses -- a &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.cio.com&#x2F;article&#x2F;303230&#x2F;why-everyones-talking-about-event-streaming.html&quot;&gt;perceptible shift&lt;&#x2F;a&gt; is underway in internet architecture toward &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.oreilly.com&#x2F;library&#x2F;view&#x2F;software-architecture-patterns&#x2F;9781491971437&#x2F;ch02.html&quot;&gt;event-driven programming&lt;&#x2F;a&gt;. This shift is being spurred on by a wide range of factors including consumer demand for real-time responsiveness, businesses seeking to personalize the customer experience to maximize engagement, and a shared desire for more effective security protections.&lt;&#x2F;p&gt;
&lt;p&gt;Finding sources of data to deliver on these objectives is a solved problem. The inexorable digitization of commerce, media and social experiences has turned every customer interaction into an ever-growing stream of event data from which companies wish to &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;smallbusiness.chron.com&#x2F;long-website-viewers-attention-72249.html&quot;&gt;distill actionable insights in real time&lt;&#x2F;a&gt;. Extracting high-confidence insights by joining multiple data sources to expand the context of understanding, however, remains a challenge.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Using Databases to Query Event Streams: Nobody Wins&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Together, the enormous scale of data and the requirement to process it in real time pose a huge challenge to companies embracing this new &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;aws.amazon.com&#x2F;event-driven-architecture&#x2F;&quot;&gt;event-driven paradigm&lt;&#x2F;a&gt;. Event streaming solutions like Kafka and Kinesis have quickly evolved to organize and operationalize data streams. However, once companies have turned their data into event streams, they encounter a second problem: how to query these streams in real time. Databases have proven ill-suited for this task.&lt;&#x2F;p&gt;
&lt;p&gt;Processing such big data volumes to extract valuable insights has made it necessary to break the endless stream of events into batches just so existing databases can join data together and apply computation to it. This means waiting for batches of data to load, a process that regularly takes hours. And then you are only querying these batches, or snapshots, of data. This is neither streaming nor real-time.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;62394d456f22b2042e0d3499_The%20graph%20data%20model%20encompases%20relational%20data%20model.jpg&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;It is straightforward to translate RDBMS data into graph data structures. Quine’s property graph structure encodes relationships as edges.&lt;&#x2F;p&gt;
&lt;p&gt;The limitations of databases manifest in the enterprise as complex systems composed of various &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;thenewstack.io&#x2F;the-rise-of-the-event-streaming-database&#x2F;&quot;&gt;libraries and services&lt;&#x2F;a&gt; tied together with custom code deployed as micro-services to cross the chasm between high-throughput event streaming platforms and the lower-throughput but higher value operations of databases.&lt;&#x2F;p&gt;
&lt;p&gt;Data engineers have been quite clever in developing sophisticated solutions to deal with these challenges. The typical solution is a complex set of micro-services that allows for agile adaptation to new constraints as they surface but results in an application stack that is difficult to support and requires expert consultation to manage or change. This complexity inevitably leads to a rebuild every 18-24 months: a poor investment and a poor solution.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Event Stream Processing Addresses only Half the Problem&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;To query the event streams in real time, developers had to turn to event stream processing systems like Flink and Spark. They make it possible for developers to use familiar SQL syntax to query the event streams. And while this involved some tradeoffs like &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;blog.knoldus.com&#x2F;windows-operator-heart-of-processing-infinite-streams-in-flink&#x2F;&quot;&gt;time windows&lt;&#x2F;a&gt;, in many regards event stream processing systems have been quite successful. Many can scale to process millions of events per second.&lt;&#x2F;p&gt;
&lt;p&gt;But these systems are designed for the relational data model, which, while pervasive in industry, lacks the expressive query structures needed to find complex patterns in streams. And it is in the detection of these complex patterns that the real power of event stream processing lies.&lt;&#x2F;p&gt;
&lt;p&gt;So what are we left with? Graph databases, which are well-suited for finding complex relationships with large data sets at rest, were not designed for real-time event data streams. They simply can’t keep up. And event stream processing systems lack the ability to query for the complex relationships and to do so without resorting to time windows.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Enter Streaming Graph&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;There is a better way: the streaming graph. A streaming graph is a variety of “&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;venturebeat.com&#x2F;2021&#x2F;04&#x2F;04&#x2F;what-is-a-streaming-database&#x2F;&quot;&gt;streaming database&lt;&#x2F;a&gt;,” an emerging class of software designed specifically to process infinite streams of data. Quine streaming graph brings together the scalability of event stream processing systems with the ability to query for complex relationships offered by graph databases.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;62394f306f22b21fe40fb5e8_Streaming%20graph%20combines%20graph%20databases%20and%20event%20stream%20processing%20systems.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Quine streaming graph adds the scalability of event stream processing systems to graph.&lt;&#x2F;p&gt;
&lt;p&gt;Instead of trying to engineer around the shortcomings of graph databases, Quine’s streaming graph technology introduces some important innovations:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Uses a graph data model to understand the relationships between data natively, without the need for joins, nested joins and foreign key management&lt;&#x2F;li&gt;
&lt;li&gt;Applies queries continuously and incrementally to newly arriving data, eliminating the need for time windowing&lt;&#x2F;li&gt;
&lt;li&gt;Distributes and parallelizes read and write operations to deliver high-throughput, low-latency queries at scale&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Quine streaming graph offers teams a simple, drop-in solution capable of ingesting high-volume Kafka or Kinesis data streams with sub-millisecond query performance, even when the query involves deep graph traversal. Quine also eliminates time windows and makes it possible to handle out-of-order and late-arriving data, a common limitation of other event stream processing systems.&lt;&#x2F;p&gt;
&lt;p&gt;Streaming graph fills the architectural gap that exists between high-volume event streaming and high-value graph database computation. The combination of a native understanding of data relationships with high volume event processing provides a new tool for realizing the real-time use cases event-driven programming is meant to enable. Graph AI techniques can now come out of the lab and drive the next generation of recommendations, root-cause analysis, fraud and security threat detection in production.&lt;&#x2F;p&gt;
&lt;p&gt;Learn more about Quine streaming graph, available both in open source and enterprise editions, at &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;&quot;&gt;www.thatdot.com&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Let&#x27;s Talk Streaming Graph! (with demo)</title>
        <published>2022-03-16T00:00:00+00:00</published>
        <updated>2022-03-16T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/streaming-graphs-w-ryan-wright-on-tgif-lets-talk-data/"/>
        <id>https://www.thatdot.com/blog/streaming-graphs-w-ryan-wright-on-tgif-lets-talk-data/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/streaming-graphs-w-ryan-wright-on-tgif-lets-talk-data/">&lt;h2 id=&quot;tgif-let-s-talk-streaming-graph-data&quot;&gt;TGIF! Let&#x27;s Talk (streaming graph) Data&lt;&#x2F;h2&gt;
&lt;p&gt;On Friday, 11 March, Ryan Wright sat down with &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.linkedin.com&#x2F;in&#x2F;josephreis&quot;&gt;Joe Reis&lt;&#x2F;a&gt; and &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.linkedin.com&#x2F;in&#x2F;housleymatthew&quot;&gt;Matt Housley&lt;&#x2F;a&gt;, CEO and CTO respectively of &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.ternarydata.com&#x2F;&quot;&gt;Ternary Data&lt;&#x2F;a&gt;, to discuss Quine, the open source streaming graph. What followed was an hour of remarkably incisive questions and well-informed discussion. In addition to the video, we&#x27;ve pulled out the transcript of some particularly good bits. If you like this, give Joe and Matt&#x27;s show a follow.
&lt;div class=&quot;video-embed&quot;&gt;
  &lt;iframe src=&quot;https:&#x2F;&#x2F;www.youtube-nocookie.com&#x2F;embed&#x2F;1P-iHaAPs4g&quot; title=&quot;YouTube video&quot;
    frameborder=&quot;0&quot; loading=&quot;lazy&quot;
    allow=&quot;accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture&quot;
    allowfullscreen&gt;&lt;&#x2F;iframe&gt;
&lt;&#x2F;div&gt;
&lt;&#x2F;p&gt;
&lt;h4 id=&quot;excerpt-one-0-46&quot;&gt;Excerpt One &lt;em&gt;0:46&lt;&#x2F;em&gt;&lt;&#x2F;h4&gt;
&lt;p&gt;&lt;strong&gt;Joe (Ternary):&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;blockquote&gt;
&lt;p&gt;I&#x27;m sorry, what the heck is a streaming graph?&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Ryan (thatDot, Quine):&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;blockquote&gt;
&lt;p&gt;So it&#x27;s kind of like, imagine the unholy love child of Apache Kafka and Neo4j... something sort of like a graph database, but aimed at high volume event stream processing. The goal is really to &lt;em&gt;interpret&lt;&#x2F;em&gt; what does that data stream mean? Like the data in that high volume event stream. What does it mean? And that&#x27;s what we build big micro service architectures for.&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&quot;&gt;Quine&lt;&#x2F;a&gt; is this standalone application meant to help make that process a whole lot easier. So the idea is basically you can consume that event data, form it into a graph, because the graph is so expressive, and so powerful, to look at and relate data to each other and analyze it, except that it&#x27;s really fast. And that&#x27;s really been the linchpin for the database mindset: [people think] graph DBs are cool, but they&#x27;re too slow. So Quine is trying to change that and bring fast streaming graphs that trigger action to the world of Event Stream Processing.&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;h4 id=&quot;excerpt-two-6-16&quot;&gt;Excerpt Two &lt;em&gt;6:16&lt;&#x2F;em&gt;&lt;&#x2F;h4&gt;
&lt;p&gt;&lt;strong&gt;Matt (Ternary):&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;blockquote&gt;
&lt;p&gt;It&#x27;s interesting to me, too, that you&#x27;re, you&#x27;re solving these very hard graph problems. But you guys also decided to adopt a real time analysis model. And by that, I mean, people complain a lot about micro batch. Micro batch is often a perfectly good approach. But you&#x27;re saying basically, that not only can you process this graph, but you&#x27;re not doing it in like ten second micro batches, you&#x27;re actually taking each event and saying, Okay, this event arriving triggers or this piece, this node triggers some kind of analysis that I can see things happening almost immediately. What kind of latency are we talking for that processing to happen?&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Ryan:&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;blockquote&gt;
&lt;p&gt;Yeah, oh, great question. So we&#x27;ve, we&#x27;ve measured it, and to kind of set the stage a little bit, the kind of thing that we&#x27;re typically looking for is, look for a graph pattern that is maybe four or five nodes connected in a certain kind of way. And so it&#x27;s each node in that graph, you might think of as one row in a relational table, you know, there&#x27;s an equivalence there to say these are the same kinds of things. And so when you traverse an edge, you&#x27;re doing a join between tables to say...this row here is connected to that row in that table over there. And so it&#x27;s the kind of situation that is usually join, join, join, join, join, there&#x27;s your answer. And so to do lots of those (joins) is just untenable. And so the reason to go in the direction of a graph is because it gives us instead of having to join tables, we just get the small little units of a node. And so we can hop across four, five, six, ten, fifty, you know, any number of nodes, in a much more efficient way than if we&#x27;re joining tables together. And so what we&#x27;ve done and measured for some of our applications is: look for patterns that are four or five nodes, and then measure the latency that it takes to compute that in the single digit microseconds. So something like 8,000 or 9,000 nanoseconds has been some of the fastest stuff that we do. Which to anybody who&#x27;s worked in this space, you should have the response that is: &#x27;No way, I don&#x27;t believe it.&#x27;&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Joe:&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;blockquote&gt;
&lt;p&gt;[laughing] I don&#x27;t believe it.&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;h3 id=&quot;quine-demo&quot;&gt;Quine Demo&lt;&#x2F;h3&gt;
&lt;p&gt;There&#x27;s also a demo if you&#x27;re interested, starting at the 40:21 mark.&lt;&#x2F;p&gt;
&lt;p&gt;If you want to learn more, check out the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;docs&#x2F;about-quine&#x2F;what-is-quine&quot;&gt;Quine.io docs&lt;&#x2F;a&gt; and never hesitate to jump into the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;that.re&#x2F;quine-slack&quot;&gt;Quine Community Slack&lt;&#x2F;a&gt; channel.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Recipe for Streaming Graph Success</title>
        <published>2022-03-10T00:00:00+00:00</published>
        <updated>2022-03-10T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/quine-streaming-graph-recipe-blog/"/>
        <id>https://www.thatdot.com/blog/quine-streaming-graph-recipe-blog/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/quine-streaming-graph-recipe-blog/">&lt;p&gt;Quine Recipes Make Getting Started Easy&lt;&#x2F;p&gt;
&lt;p&gt;In the world of infrastructure software, there is a certain cachet associated with standing up and operating vast, complicated systems. Like tearing down and rebuilding a motor or &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;hackaday.com&#x2F;2022&#x2F;03&#x2F;09&#x2F;upgraded-3d-printed-tank-gets-better-drivetrain-and-controls&#x2F;&quot;&gt;supercharging your 3D printer&lt;&#x2F;a&gt;, the challenge appeals to the engineer’s mind.&lt;&#x2F;p&gt;
&lt;p&gt;At least until the third or fourth time you have to redeploy one of those complex systems because of a particularly pernicious gremlin. That’s when you start asking yourself (usually at around 3 am the night before launch) why someone can’t make a distributed system that is both easy to deploy and designed to scale up to production workloads.&lt;&#x2F;p&gt;
&lt;p&gt;This is the exact question that led to the creation of &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;&quot;&gt;Quine streaming graph&lt;&#x2F;a&gt; and, more recently, the introduction of Quine recipes.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;intro-to-quine-streaming-graph-recipes&quot;&gt;Intro to Quine Streaming Graph Recipes&lt;&#x2F;h3&gt;
&lt;p&gt;A recipe, in simple terms, is a YAML or JSON document containing all the information Quine needs to execute any batch or streaming data processing task. It is referenced when invoking Quine and is often used for modeling, development and testing on local systems. Here’s an example of a recipe that creates a graph by ingesting each line in &quot;$in_file&quot; as a graph node with the property &quot;line&quot;:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;version: 1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;title: Ingest&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;contributor: The thatDot Team&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;summary: Ingest input file lines as graph nodes&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;description: Ingests each line in &amp;quot;$in_file&amp;quot; as graph node with property &amp;quot;line&amp;quot;.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;ingestStreams:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  - type: FileIngest&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    path: $in_file&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    format:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      type: CypherLine&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      query: |-&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        MATCH (n)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        WHERE id(n) = idFrom($that)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        SET n.line = $that&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;standingQueries: [ ]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;nodeAppearances: [ ]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;quickQueries: [ ]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;sampleQueries: [ ]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Pretty simple.&lt;&#x2F;p&gt;
&lt;p&gt;But don’t underestimate the extensibility of Recipes. Using the same simple template as this recipe, you can configure Quine to ingest and process multiple event streams, build highly-connected graphs, and set up standing queries that do everything from handling out-of-order and late arriving data to writing results back into the graph or out to Kafka topics.&lt;&#x2F;p&gt;
&lt;p&gt;And once you’ve constructed your recipe, everyone on your team has a handy reference for what you’ve built. This is especially useful for recurring tasks, like log processing (see the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;apache-log-analytics&quot;&gt;Apache Access Log recipe&lt;&#x2F;a&gt;), or for teams that are growing or that want to maintain continuity as people come and go. Did I mention that you can embed comments in your recipes?&lt;&#x2F;p&gt;
&lt;p&gt;Recipes are also a great way to contribute back to the community. For example, community member Alok Aggarwal contributed a recipe for calculating &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;cdn-cache-efficiency-by-segment&quot;&gt;CDN cache efficiency&lt;&#x2F;a&gt; that is already among the most popular on the Quine site.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Best part:&lt;&#x2F;strong&gt; once you’re satisfied with the recipe, it can be pushed to a production system via the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;docs&#x2F;reference&#x2F;rest-api&quot;&gt;Quine RESTful API&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;anatomy-of-a-quine-streaming-graph-recipe&quot;&gt;Anatomy of a Quine Streaming Graph Recipe:&lt;&#x2F;h3&gt;
&lt;p&gt;To develop a recipe that is executable from a command line, you may use the following YAML template as a starting point:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;version:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;title:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;contributor:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;summary:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;description:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;ingestStreams: []&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;standingQueries: []&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;nodeAppearances: []&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;quickQueries: []&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;sampleQueries: []&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;statusQueries: []&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The first five items – version number, title, author (you, the contributor), as well as summary and description (optional but nice to have) – are pretty self-explanatory. If you plan to submit the recipe to Quine.io, the optional fields should be filled in to provide the community with context for your recipe and any details such as data source and output formats.&lt;&#x2F;p&gt;
&lt;p&gt;The next two sections, ingestStreams and standingQueries, define your recipe’s behavior.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Ingesting and Modeling Data in the Streaming Graph&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;The first query type we will build is an Ingest Stream. Information in the Ingest Stream provides everything Quine needs to find and consume data in order to build a streaming graph.&lt;&#x2F;p&gt;
&lt;p&gt;Quine was specifically designed to handle the demands of high volume streaming data. You can use recipes to ingest from &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;docs&#x2F;components&#x2F;ingest-sources&#x2F;apache-kafka&quot;&gt;Kafka&lt;&#x2F;a&gt;, &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;docs&#x2F;components&#x2F;ingest-sources&#x2F;aws-kinesis-support&quot;&gt;Kinesis&lt;&#x2F;a&gt;, and &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;docs&#x2F;components&#x2F;ingest-sources&#x2F;aws-sns-and-sqs-support&quot;&gt;SNS&#x2F;SQS&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;In addition to event streaming sources, you may ingest data from &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;docs&#x2F;components&#x2F;ingest-sources&quot;&gt;files&lt;&#x2F;a&gt;, &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;docs&#x2F;components&#x2F;ingest-sources&quot;&gt;named pipes&lt;&#x2F;a&gt;, and &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;docs&#x2F;components&#x2F;ingest-sources&#x2F;standard-in&quot;&gt;stdin&lt;&#x2F;a&gt;. In the simple Ingest example above, the &lt;code&gt;FileIngest&lt;&#x2F;code&gt; type indicates that the source is a file, and the &lt;code&gt;CypherLine&lt;&#x2F;code&gt; format passes each line to the Cypher query as &lt;code&gt;$that&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;Data ingested from a file is read into the system line by line; from a functional perspective, it behaves just like a stream when consumed. The only difference is that a file is read into the system as fast as the system can handle, whereas a stream may be rate limited by the incoming data.&lt;&#x2F;p&gt;
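&lt;p&gt;For contrast with the file example, here is a minimal sketch of a streaming ingest. It is illustrative only and not taken from a published recipe: the topic name &lt;code&gt;events&lt;&#x2F;code&gt;, the broker address &lt;code&gt;localhost:9092&lt;&#x2F;code&gt;, and the &lt;code&gt;id&lt;&#x2F;code&gt; field on each event are hypothetical, and the field names &lt;code&gt;KafkaIngest&lt;&#x2F;code&gt;, &lt;code&gt;topics&lt;&#x2F;code&gt;, &lt;code&gt;bootstrapServers&lt;&#x2F;code&gt;, and &lt;code&gt;CypherJson&lt;&#x2F;code&gt; are assumed from the Kafka ingest documentation linked above, so verify them against your Quine version.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;ingestStreams:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  - type: KafkaIngest&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    # Hypothetical topic and broker address; replace with your own.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    topics:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      - events&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    bootstrapServers: localhost:9092&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    format:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      # Assumes each Kafka record is a JSON object with an id field.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      type: CypherJson&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      query: |-&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        MATCH (n)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        WHERE id(n) = idFrom($that.id)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        SET n = $that&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The Cypher query keeps the same shape as in the file example; only the source type and the record format change.&lt;&#x2F;p&gt;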
&lt;h3 id=&quot;standing-queries-quine-s-superpower&quot;&gt;Standing Queries: Quine’s Superpower&lt;&#x2F;h3&gt;
&lt;p&gt;Now that we have data ingested into the graph, we should do something with it (although you don’t have to). Let’s set up a &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;docs&#x2F;getting-started&#x2F;writing-standing-queries&quot;&gt;standing query&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;Standing queries persist in the graph, waiting until a query condition is matched and then triggering an action (e.g., updating the graph, executing code, or writing to Kafka). Standing queries are definitely worth mastering.&lt;&#x2F;p&gt;
&lt;p&gt;With every standing query, we have to provide two things. First, we provide the &lt;strong&gt;&lt;code&gt;pattern&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; for the system to match; then we describe the action we want it to take in the form of a query &lt;strong&gt;&lt;code&gt;output&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;.&lt;&#x2F;p&gt;
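&lt;p&gt;To make those two parts concrete, here is a minimal sketch that assumes the Ingest recipe shown earlier has already set a &lt;code&gt;line&lt;&#x2F;code&gt; property on each node. It is hypothetical and not part of the published recipe; the output name &lt;code&gt;print-line&lt;&#x2F;code&gt; is made up for illustration. The pattern matches nodes that have a &lt;code&gt;line&lt;&#x2F;code&gt; property, and the output looks up each matched node by id and prints its line.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;standingQueries:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  - pattern:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      # Part one: the pattern the system should match.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      query: |-&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        MATCH (n)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        WHERE n.line IS NOT NULL&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        RETURN id(n) AS id&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      type: Cypher&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      mode: MultipleValues&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    outputs:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      # Part two: the action to take for each match (hypothetical output name).&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      print-line:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        query: |-&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          MATCH (n)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          WHERE id(n) = $that.data.id&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          RETURN n.line AS line&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        type: CypherQuery&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        andThen:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          type: PrintToStandardOut&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;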
&lt;p&gt;The simple example we started with doesn’t include a standing query, so let’s take a look at the one from another recipe (&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&#x2F;ethereum-tag-propagation&quot;&gt;the Ethereum recipe&lt;&#x2F;a&gt;), which propagates the tainted flag along outgoing transaction paths:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;standingQueries:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  - pattern:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      query: |-&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        MATCH (tainted:account)&amp;lt;-[:from]-(tx:transaction)-[:to]-&amp;gt;(otherAccount:account),&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          (tx)-[:defined_in]-&amp;gt;(ba:block_assoc)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        WHERE&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          tainted.tainted IS NOT NULL&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          AND NOT EXISTS (ba.orphaned)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        RETURN&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          id(tainted) AS accountId,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          tainted.tainted AS oldTaintedLevel,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          id(otherAccount) AS otherAccountId&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      type: Cypher&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      mode: MultipleValues&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    outputs:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      propagate-tainted:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        query: |-&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          MATCH (tainted), (otherAccount)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          WHERE&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            tainted &amp;lt;&amp;gt; otherAccount&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            AND id(tainted) = $that.data.accountId&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            AND id(otherAccount) = $that.data.otherAccountId&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          WITH *, coll.min([($that.data.oldTaintedLevel + 1), otherAccount.tainted]) AS newTaintedLevel&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          SET otherAccount.tainted = newTaintedLevel&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          RETURN&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            strId(tainted) AS taintedSource,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            strId(otherAccount) AS newlyTainted,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            newTaintedLevel&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        type: CypherQuery&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        andThen:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          type: PrintToStandardOut&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;h3 id=&quot;optional-recipe-elements-to-customize-the-experience&quot;&gt;Optional Recipe Elements to Customize the Experience&lt;&#x2F;h3&gt;
&lt;p&gt;Now that you’ve ingested and are querying data, you can use the remaining parameters to customize the user experience.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;nodeAppearances: []&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; is used to customize the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;docs&#x2F;getting-started&#x2F;exploration-ui&quot;&gt;web exploration UI&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;quickQueries: []&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; is used to add queries to node context menus in the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;docs&#x2F;getting-started&#x2F;exploration-ui&quot;&gt;web exploration UI&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;sampleQueries: []&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; is used to customize the sample queries listed in the web UI.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;statusQueries: []&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; specifies a Cypher query to be executed and reported to the Recipe user.&lt;&#x2F;p&gt;
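&lt;p&gt;As a rough sketch of what two of these optional sections can look like when filled in, consider the fragment below. Treat it as an assumption-laden illustration rather than a reference: the field names (&lt;code&gt;predicate&lt;&#x2F;code&gt;, &lt;code&gt;quickQuery&lt;&#x2F;code&gt;, &lt;code&gt;querySuffix&lt;&#x2F;code&gt;, &lt;code&gt;sort&lt;&#x2F;code&gt;, &lt;code&gt;name&lt;&#x2F;code&gt;, &lt;code&gt;query&lt;&#x2F;code&gt;) and the &lt;code&gt;recentNodes&lt;&#x2F;code&gt; procedure are assumed from the recipe reference and published recipes, and the query names are made up, so check the Recipe Reference for your Quine version before relying on it.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;quickQueries:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  # An empty predicate is assumed to apply the quick query to every node.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  - predicate:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      propertyKeys: []&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      knownValues: {}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    quickQuery:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      name: Adjacent Nodes&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      querySuffix: MATCH (n)--(m) RETURN DISTINCT m&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      queryLanguage: Cypher&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      sort: Node&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;sampleQueries:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  # Hypothetical sample query name.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  - name: Show a few recent nodes&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    query: CALL recentNodes(10)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;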
&lt;h3 id=&quot;try-the-ethereum-tag-propagation-recipe&quot;&gt;Try the Ethereum Tag Propagation Recipe&lt;&#x2F;h3&gt;
&lt;p&gt;If you are interested in learning more about recipes, there’s no better way than to try one for yourself. I recommend the Ethereum Tag Propagation recipe because it uses actual live data and its use case, detecting tainted transactions on the blockchain, is more relevant by the day.&lt;&#x2F;p&gt;
&lt;p&gt;And if you are interested in creating your own recipe, here are some additional reference resources to get you started:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;docs&#x2F;reference&#x2F;recipe-reference&quot;&gt;Recipe Reference&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;docs&#x2F;reference&#x2F;cypher-language&quot;&gt;Cypher Language Reference&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;community&#x2F;contributing.html&quot;&gt;Writing and Contributing to the Community&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Thanks for taking the time to read this and bon appétit!&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Computing Recursive Rollups in a Kafka Event Streaming Pipeline</title>
        <published>2022-02-24T00:00:00+00:00</published>
        <updated>2022-02-24T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/computing-recursive-rollups-in-a-kafka-event-streaming-pipeline-with-quine/"/>
        <id>https://www.thatdot.com/blog/computing-recursive-rollups-in-a-kafka-event-streaming-pipeline-with-quine/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/computing-recursive-rollups-in-a-kafka-event-streaming-pipeline-with-quine/">&lt;h2 id=&quot;streaming-graph-combines-graph-dbs-with-stream-event-processing&quot;&gt;Streaming Graph Combines Graph DBs with Stream Event Processing&lt;&#x2F;h2&gt;
&lt;p&gt;Quine&#x27;s &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;docs&#x2F;about-quine&#x2F;main-concepts&quot;&gt;graph&lt;&#x2F;a&gt;-based streaming event architecture simplified my processing codebase. I first tried pre-aggregating rollup data in a Kafka Streams application, but found that Kafka Streams KTables were not a natural fit for my data set. My team then fell back on relational database (RDBMS) path-table patterns to represent the hierarchical data. With the RDBMS, we recomputed the rollups with complex recursive function queries through the path-table structure each time our UI needed to display that data. Using Quine, I replaced those complex queries with succinct Cypher queries that compute the rollup values in the stream and update them on each underlying event change.&lt;&#x2F;p&gt;
&lt;p&gt;The Cypher query language used in Quine encourages treating relationships as a primary quality of your data. It provides a rich set of features for constraining queries across multi-step relationships. The depth constraints on the relationship allowed me to replace those recursive SQL functions with an n-depth relationship expression. As I will show later, I was able to match all tiers of my hierarchical data with a single expression.&lt;&#x2F;p&gt;
&lt;p&gt;In my use case, I have a hierarchical graph of metadata that groups sets of pass&#x2F;fail Requirements. Event data carrying pass&#x2F;fail results relates directly to leaf nodes in that hierarchical graph. Complete sets of events are produced for multiple subjects, and we need to be able to provide the percent pass vs fail for each subject at every level of the grouping hierarchy.&lt;&#x2F;p&gt;
&lt;p&gt;Inserting Quine into my Kafka streaming pipeline is done by simply consuming from topics that are already inputs in Kafka. When deployed, Quine is configured to &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;docs&#x2F;components&#x2F;ingest-sources&quot;&gt;ingest data&lt;&#x2F;a&gt; from my existing results topic, and my grouping data topic. The new outputs from Quine are subscribed to by the streams recording service and written to the database.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;620c4bdd9a7ac6180290b228_iFXPtt2bCIhn96VQqH-YDCPPCH3jEwQIA0fxDe0utmwdFOomEcsvydZxZSLyrIoUyg2MzZfswNX9Ww0h-c_eBgOzCAfY9tqTVE_8jiwSBk5oSlqeowfZz5kLabsDfg5sJ8cHfdYk.png&quot; alt=&quot;A diagram of how Quine fits within Kafka streams.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;h3 id=&quot;ingesting-the-data&quot;&gt;Ingesting the Data&lt;&#x2F;h3&gt;
&lt;p&gt;In my system, when the grouped Requirement data is updated, it is streamed out over a Kafka topic. Systems being evaluated for satisfaction of these requirements produce &lt;strong&gt;&lt;code&gt;Result&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; events that are disembodied from the grouping context. They also flow in on a Kafka topic.&lt;&#x2F;p&gt;
&lt;p&gt;I started by populating the Quine graph with my hierarchical data. From my perspective, the easiest approach is to first pre-process the groups and leaf nodes into content suited for streaming input. So, a JSON document that has nested data structures can be flattened into expressions of nodes and their relationships. This upfront transformation can easily be performed by something like a Kafka Streams application.&lt;&#x2F;p&gt;
&lt;p&gt;I transform something like this document:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;{&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;groups&amp;quot;: [&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;name&amp;quot;: &amp;quot;group-1&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;groups&amp;quot;: [&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          &amp;quot;name&amp;quot;: &amp;quot;group-1-1&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          &amp;quot;items&amp;quot;: [&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            { &amp;quot;name&amp;quot;: &amp;quot;requirement-1.1-a&amp;quot;, … },&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            { &amp;quot;name&amp;quot;: &amp;quot;requirement-1.1-b&amp;quot;, … }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;          ]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        }, …&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Into this list of events that express the parent group relationship:&lt;&#x2F;p&gt;
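&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;{ &amp;quot;name&amp;quot;: &amp;quot;requirement-1.1-a&amp;quot;, &amp;quot;group&amp;quot;: &amp;quot;group-1-1&amp;quot;, &amp;quot;type&amp;quot;: &amp;quot;requirement&amp;quot;, … }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;{ &amp;quot;name&amp;quot;: &amp;quot;requirement-1.1-b&amp;quot;, &amp;quot;group&amp;quot;: &amp;quot;group-1-1&amp;quot;, &amp;quot;type&amp;quot;: &amp;quot;requirement&amp;quot;, … }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;{ &amp;quot;name&amp;quot;: &amp;quot;group-1-1&amp;quot;, &amp;quot;group&amp;quot;: &amp;quot;group-1&amp;quot;, &amp;quot;type&amp;quot;: &amp;quot;group&amp;quot;, … }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;{ &amp;quot;name&amp;quot;: &amp;quot;group-1&amp;quot;, &amp;quot;group&amp;quot;: &amp;quot;root&amp;quot;, &amp;quot;type&amp;quot;: &amp;quot;group&amp;quot;, … }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;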
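&lt;p&gt;The flattening step can be sketched in a few lines. This is a minimal Python illustration rather than the Kafka Streams code I run in production; the function name &lt;code&gt;flatten_groups&lt;&#x2F;code&gt; and the sample document are assumptions made for the example, and only the &lt;code&gt;name&lt;&#x2F;code&gt;, &lt;code&gt;group&lt;&#x2F;code&gt;, and &lt;code&gt;type&lt;&#x2F;code&gt; fields shown above are modeled.&lt;&#x2F;p&gt;

```python
import json


def flatten_groups(document, parent="root"):
    """Yield one flat event per requirement and per group, each naming its parent group."""
    for group in document.get("groups", []):
        # Leaf requirements point at the group that directly contains them.
        for item in group.get("items", []):
            yield {**item, "group": group["name"], "type": "requirement"}
        # Emit everything nested below this group, then the group itself.
        yield from flatten_groups(group, parent=group["name"])
        yield {"name": group["name"], "group": parent, "type": "group"}


# Hypothetical sample mirroring the nested document above.
nested = {
    "groups": [
        {
            "name": "group-1",
            "groups": [
                {
                    "name": "group-1-1",
                    "items": [
                        {"name": "requirement-1.1-a"},
                        {"name": "requirement-1.1-b"},
                    ],
                }
            ],
        }
    ]
}

events = list(flatten_groups(nested))
for event in events:
    print(json.dumps(event))
```

&lt;p&gt;Because every event names its own parent, the events can arrive in any order; nothing downstream needs to see the whole nested document.&lt;&#x2F;p&gt;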
&lt;p&gt;When the data flows into Quine, it is processed by two ingest queries, one for each value of &lt;strong&gt;&lt;code&gt;type&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; in my input JSON. Each ingest query is expressed as JSON data that you &lt;strong&gt;&lt;code&gt;POST&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; to the &lt;strong&gt;&lt;code&gt;&#x2F;api&#x2F;v1&#x2F;ingest&#x2F;&amp;lt;name&amp;gt;&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; REST endpoint. Set &lt;strong&gt;&lt;code&gt;&amp;lt;name&amp;gt;&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; to a unique value that identifies the ingest query. For example, my query path was: &lt;strong&gt;&lt;code&gt;&#x2F;api&#x2F;v1&#x2F;ingest&#x2F;test_groups&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;{&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  …,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;format&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;query&amp;quot;: &amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WITH * WHERE $that.type = &amp;#39;group&amp;#39;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  MATCH (g)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  WHERE id(g) = idFrom(&amp;#39;group&amp;#39;, $that.name)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  OPTIONAL MATCH (parent) WHERE id(parent) = idFrom(&amp;#39;group&amp;#39;, $that.group)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  CREATE (g)-[:has_parent]-&amp;gt;(parent)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  SET g = $that, g:Group&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;type&amp;quot;: &amp;quot;CypherJson&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;em&gt;Note: In production, I ingest my data from the Kafka streams. For experimentation, however, I used the&lt;&#x2F;em&gt; &lt;strong&gt;&lt;code&gt;FileIngest&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; &lt;em&gt;type of ingest query. This allowed rapid exploration of Quine without having to send events through Kafka. In that case, you would add fields like:&lt;&#x2F;em&gt; &lt;strong&gt;&lt;code&gt;type: FileIngest, path: &#x2F;json&#x2F;file&#x2F;to&#x2F;load.json&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;This ingest query processes every incoming record in which the JSON record referenced by &lt;strong&gt;&lt;code&gt;$that&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; has a field named &lt;strong&gt;&lt;code&gt;type&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; with a value of &lt;strong&gt;&lt;code&gt;group&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;. It then matches a node, which we refer to with the node variable &lt;em&gt;&lt;strong&gt;&lt;code&gt;g&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;&lt;&#x2F;em&gt;. The &lt;strong&gt;&lt;code&gt;WHERE&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; clause says the ID of node &lt;em&gt;&lt;strong&gt;&lt;code&gt;g&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;&lt;&#x2F;em&gt; must equal the result of the &lt;strong&gt;&lt;code&gt;idFrom&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.quine.io&#x2F;components&#x2F;id-provider.html#idfrom-&quot;&gt;function&lt;&#x2F;a&gt;. The &lt;strong&gt;&lt;code&gt;idFrom&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; function creates a unique, reproducible ID from the sequence of values given. In this case, we pass our node type and the record name. You do not have to worry about whether the node already exists. If it does, &lt;em&gt;&lt;strong&gt;&lt;code&gt;g&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;&lt;&#x2F;em&gt; will reference that node. If it does not, &lt;em&gt;&lt;strong&gt;&lt;code&gt;g&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;&lt;&#x2F;em&gt; will materialize a node for the computed ID. The &lt;strong&gt;&lt;code&gt;SET&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; clause later assigns all the fields in the incoming record as properties of the node, updating &lt;em&gt;&lt;strong&gt;&lt;code&gt;g&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;&lt;&#x2F;em&gt; if it previously existed.&lt;&#x2F;p&gt;
&lt;p&gt;This ingest query also creates a relationship between node &lt;em&gt;&lt;strong&gt;&lt;code&gt;g&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;&lt;&#x2F;em&gt; and the parent node identified by the &lt;strong&gt;&lt;code&gt;group&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; field in my incoming records. The &lt;strong&gt;&lt;code&gt;OPTIONAL MATCH&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; creates a secondary node query, exactly like the one for node &lt;em&gt;&lt;strong&gt;&lt;code&gt;g&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;&lt;&#x2F;em&gt;, but this time the values passed into &lt;strong&gt;&lt;code&gt;idFrom&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; represent the parent node. Then we declare a relationship using &lt;strong&gt;&lt;code&gt;CREATE&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; that says node &lt;em&gt;&lt;strong&gt;&lt;code&gt;g&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;&lt;&#x2F;em&gt; relates to the parent node, and we name that relationship &lt;strong&gt;&lt;code&gt;has_parent&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;Quine adds the &lt;strong&gt;&lt;code&gt;idFrom&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; function to the Cypher query language, along with the implicit materialization of nodes in match statements to facilitate building the graph from streaming sources without requiring your application to be concerned with the order of data insertion. If we know the expected identifier for the node that we are creating a relationship to, then it will materialize. It just will not have the attributes until a specific record adds them.&lt;&#x2F;p&gt;
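&lt;p&gt;To make the idea concrete, here is a rough Python analogy of a reproducible ID plus match-or-materialize lookup. It is only a sketch of the concept: the &lt;code&gt;id_from&lt;&#x2F;code&gt; and &lt;code&gt;match_node&lt;&#x2F;code&gt; helpers, the UUID namespace, and the dictionary-backed node store are all assumptions for illustration and say nothing about how Quine computes IDs or stores nodes internally.&lt;&#x2F;p&gt;

```python
import json
import uuid

# Hypothetical namespace used only for this illustration; it is not a Quine value.
EXAMPLE_NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")

# Stand-in for the graph: node ID mapped to its properties and outgoing edges.
nodes = {}


def id_from(*values):
    """Derive a reproducible ID from an ordered sequence of values."""
    # The same values in the same order always serialize to the same key.
    return uuid.uuid5(EXAMPLE_NAMESPACE, json.dumps(values))


def match_node(node_id):
    """Return the node for node_id, materializing an empty one if it is absent."""
    return nodes.setdefault(node_id, {"properties": {}, "edges": []})


# The child record arrives first; its parent is materialized without properties.
child = match_node(id_from("group", "group-1-1"))
parent = match_node(id_from("group", "group-1"))
child["edges"].append(("has_parent", id_from("group", "group-1")))
child["properties"] = {"name": "group-1-1", "group": "group-1", "type": "group"}

# Later the parent record arrives and fills in the node that already exists.
match_node(id_from("group", "group-1"))["properties"] = {
    "name": "group-1",
    "group": "root",
    "type": "group",
}
```

&lt;p&gt;The point of the sketch is that neither record needs to know whether the other has been seen yet, which is what makes insertion order irrelevant.&lt;&#x2F;p&gt;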
&lt;p&gt;I repeated this pattern to also ingest the &lt;strong&gt;&lt;code&gt;requirement&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; records. At this point my hierarchical data is represented in the Quine graph. Next, I have event records that have relationships to a Subject and a Requirement. This one is a little more complicated:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;{&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  …,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;format&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;query&amp;quot;: &amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (result), (requirement), (subject)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  WHERE id(result) = idFrom(&amp;#39;result&amp;#39;, $that.subjectId, $that.requirementName)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  AND id(subject) = idFrom(&amp;#39;subject&amp;#39;, $that.subjectId)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  AND id(requirement) = idFrom(&amp;#39;requirement&amp;#39;, $that.requirementName)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  CREATE (subject)-[:results]-&amp;gt;(result)-[:requirements]-&amp;gt;(requirement)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  SET result = {status: $that.status, timestamp: $that.timestamp}, result:Result, subject = {subjectId: $that.subjectId}, subject:Subject&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;type&amp;quot;: &amp;quot;CypherJson&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The above ingest query matches three different nodes as the data flows in, based on identifying data in the event record. It then assigns the event data to the subject and result nodes and creates two relationships: one from subject to result, and one from result to requirement.&lt;&#x2F;p&gt;
&lt;p&gt;The graph may look something like this now:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;6215737dfd8af97011cfb39c_graph%20visualization.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Graph visualization using the Quine web UI.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;streaming-data-out-of-quine&quot;&gt;Streaming data out of Quine&lt;&#x2F;h3&gt;
&lt;p&gt;My goal was to have the rollup computations performed in Quine and emitted as streaming events when any of the result nodes change. I record those in our relational database for the webservice’s API to retrieve with simple queries. To emit the data, we create a &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;docs&#x2F;getting-started&#x2F;writing-standing-queries&quot;&gt;Standing Query&lt;&#x2F;a&gt;. This query identifies changes that match and can trigger additional queries to fetch data and emit to one of several outputs, such as a log file, or Kafka topic.&lt;&#x2F;p&gt;
&lt;p&gt;The first part of the Standing Query is the pattern of data that triggers further behavior. Normally, a Standing Query is triggered only when new items join the set of results that match the pattern. Setting the &lt;strong&gt;&lt;code&gt;includeCancellations&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; option to true requests that the Standing Query also be triggered for nodes that stop matching the criteria. This detects any substantive change in my data set, because my status field is always either &lt;strong&gt;&lt;code&gt;PASS&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; or &lt;strong&gt;&lt;code&gt;FAIL&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;{&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;includeCancellations&amp;quot;: true,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &amp;quot;pattern&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;query&amp;quot;: &amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (s :Subject)-[:results]-&amp;gt;(res :Result)-[:requirements]-&amp;gt;(req :Requirement)-[:has_parent]-&amp;gt;(g :Group)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WHERE res.status = &amp;#39;PASS&amp;#39;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;RETURN DISTINCT id(res) as id&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;type&amp;quot;: &amp;quot;Cypher&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  },&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  …&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Next, we want to do something in response to finding these results that change their status. Namely, I want to aggregate the total and passing counts for each level in the graph of groups for each subject that has results. So, we will add to the standing query an output:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&amp;quot;outputs&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &amp;quot;createGroupResult&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;type&amp;quot;: &amp;quot;CypherQuery&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;query&amp;quot;: &amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (subject :Subject)-[:results]-&amp;gt;(res :Result)-[:requirements]-&amp;gt;(:Requirement)-[:has_parent*]-&amp;gt;(group :Group)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WHERE id(res) = $that.data.id&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (subject)-[:results]-&amp;gt;(allres :Result)-[:requirements]-&amp;gt;(:Requirement)-[:has_parent*]-&amp;gt;(group :Group)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;RETURN id(subject) as subject_id, id(group) as group_id, sum(CASE allres.status WHEN &amp;#39;PASS&amp;#39; THEN 1 ELSE 0 END) AS pass, count(allres) as total&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;      &amp;quot;, …&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That adds a CypherQuery output that consumes the IDs of the results that changed and computes the needed rollups. It first matches the subject related to that result and, with the relationship expression &lt;strong&gt;&lt;code&gt;-[:has_parent*]-&amp;gt;&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;, matches all the groups in the parent hierarchy. Then it performs another match constrained to that subject and each of those groups, where it finds &lt;em&gt;allres&lt;&#x2F;em&gt; (all the results in between) and uses the aggregation functions sum and count to produce the pass and total values.&lt;&#x2F;p&gt;
&lt;p&gt;That little ‘*’ asterisk replaces the recursive query functions we needed in our RDBMS, which are complex and difficult to work with in SQL. In a graph, walking an &lt;em&gt;n&lt;&#x2F;em&gt;-length chain of relationships is a natural fit. With an RDBMS, you often have to predict this sort of query in advance and denormalize the data at insertion time to allow efficient retrieval; otherwise, it requires fairly complex queries. When solving this with an RDBMS, you are creating a bespoke, poor man’s graph data store on top of models optimized for completely different purposes, and you do so without any of the tools inherent in Quine and the Cypher query language that allow working with the data efficiently.&lt;&#x2F;p&gt;
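&lt;p&gt;For readers who think more easily in imperative code, the rollup that the variable-length match expresses can be approximated as follows. This is a plain Python sketch over hypothetical in-memory sample data, not what Quine executes; the &lt;code&gt;has_parent&lt;&#x2F;code&gt; map, the &lt;code&gt;results&lt;&#x2F;code&gt; list, and the choice to stop the walk at &lt;code&gt;root&lt;&#x2F;code&gt; are assumptions for illustration.&lt;&#x2F;p&gt;

```python
from collections import defaultdict

# Hypothetical sample data: each node maps to its parent group.
has_parent = {
    "requirement-1.1-a": "group-1-1",
    "requirement-1.1-b": "group-1-1",
    "group-1-1": "group-1",
    "group-1": "root",
}

# (subject, requirement, status) triples standing in for Result nodes.
results = [
    ("subject-1", "requirement-1.1-a", "PASS"),
    ("subject-1", "requirement-1.1-b", "FAIL"),
]


def ancestors(node):
    """Walk the has_parent chain upward, yielding every group above node."""
    parent = has_parent.get(node)
    # Assumption: "root" only marks the top of the hierarchy and is not reported.
    while parent is not None and parent != "root":
        yield parent
        parent = has_parent.get(parent)


def rollup(result_rows):
    """Count passing and total results per (subject, group) at every level."""
    totals = defaultdict(lambda: {"pass": 0, "total": 0})
    for subject, requirement, status in result_rows:
        for group in ancestors(requirement):
            entry = totals[(subject, group)]
            entry["total"] += 1
            if status == "PASS":
                entry["pass"] += 1
    return dict(totals)


for (subject, group), counts in sorted(rollup(results).items()):
    print(subject, group, counts)
```

&lt;p&gt;The explicit loop in &lt;code&gt;ancestors&lt;&#x2F;code&gt; is the part that the single asterisk in the Cypher pattern stands in for.&lt;&#x2F;p&gt;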
&lt;p&gt;For future queries, I wanted to record these aggregations back into the graph. So before chaining in an external output to the Kafka Stream, I add another CypherQuery to the chain:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&amp;quot;andThen&amp;quot;: {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;type&amp;quot;: &amp;quot;CypherQuery&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;query&amp;quot;: &amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (subject :Subject), (group :Group)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WHERE id(subject) = $that.data.subject_id AND id(group) = $that.data.group_id&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;MATCH (gr)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WHERE id(gr) = idFrom(&amp;#39;group_result&amp;#39;, id(subject), id(group))&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;CREATE (subject)-[:group_results]-&amp;gt;(gr)-[:has_group]-&amp;gt;(group)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;SET gr = {pass: $that.data.pass, total: $that.data.total}, gr:GroupResult&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;RETURN subject.subjectId as subjectId, group.id as groupId, group.name as name, gr.pass as pass, gr.total as total&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        &amp;quot;, …&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This matches the group and subject from the previous output IDs, and then materializes a GroupResult that holds the pass and total values with relationships to that group and subject. Finally, it defines the return expression which will be a JSON object containing subjectId, groupId, name, pass, and total fields. We send them on to an external output by adding an andThen to the previous CypherQuery.&lt;&#x2F;p&gt;
&lt;p&gt;Note: for testing purposes, much like you can ingest from files, you can output to a file with a &lt;strong&gt;&lt;code&gt;WriteToFile&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; in the output chain: &lt;strong&gt;&lt;code&gt;andThen:{type:WriteToFile, path: rollups.json}&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;h3 id=&quot;a-win-win-situation&quot;&gt;A Win-Win Situation&lt;&#x2F;h3&gt;
&lt;p&gt;My organization has chosen to use Kafka streaming to produce our event data flows, and to build our web service APIs on top of traditional RDBMS data stores to match our existing skill set. By emitting my rollup values and storing them in the RDBMS, I am effectively materializing the view required to optimize answering the data questions our API authors encounter.&lt;&#x2F;p&gt;
&lt;p&gt;Working with the graph is more flexible than simulating a graph in an RDBMS with recursive queries. Ramping up on the Cypher query language was quick, as there are concise training materials in video form on various learning websites. Learning to ingest the data, compute the rollups, and egress the data from Quine took me about the same amount of clock time as a team of people spent expressing the recursive queries and tuning them in the RDBMS. The biggest issue is that, as a rule, with an RDBMS you must optimize your data insertion model for your data retrieval needs. This can lead to substantial changes for little questions. With Quine’s graph model, it is much easier to make incremental changes to the event data models without rewriting the whole pipeline.&lt;&#x2F;p&gt;
&lt;p&gt;Additionally, by feeding the rollups back into the Quine graph, they are available as inputs to further standing queries. It now becomes easy to create further events when the rollups change, such as raising a red flag if one of the subjects scores too low for one of the top tier groups.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;em&gt;Courses I used to ramp up on the Cypher query language:&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;NoSQL: Neo4j and Cypher (Part: 1-Beginners) &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.udemy.com&#x2F;course&#x2F;neo4j_beginners1&#x2F;&quot;&gt;https:&#x2F;&#x2F;www.udemy.com&#x2F;course&#x2F;neo4j_beginners1&#x2F;&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;NoSQL: Neo4j and Cypher (Part: 2-Intermediate) &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.udemy.com&#x2F;course&#x2F;neo4j_intermediate&#x2F;&quot;&gt;https:&#x2F;&#x2F;www.udemy.com&#x2F;course&#x2F;neo4j_intermediate&#x2F;&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Announcing Open Source Release of Quine Streaming Graph</title>
        <published>2022-02-23T00:00:00+00:00</published>
        <updated>2022-02-23T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/news/announcing-open-source-release-of-quine-streaming-graph/"/>
        <id>https://www.thatdot.com/news/announcing-open-source-release-of-quine-streaming-graph/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/news/announcing-open-source-release-of-quine-streaming-graph/">&lt;h2 id=&quot;the-world-s-first-streaming-graph-for-categorical-data&quot;&gt;The World&#x27;s First Streaming Graph for Categorical Data&lt;&#x2F;h2&gt;
&lt;p&gt;We at thatDot are pleased to announce today the release of &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;http:&#x2F;&#x2F;quine.io&#x2F;&quot;&gt;Quine&lt;&#x2F;a&gt;, our streaming graph for event processing, as an open source project. Quine’s &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;assets.website-files.com&#x2F;61f0aecf55af2560f76f6a75&#x2F;620fd58ba117ef2365c2ab07_Quine_StreamingGraph_WP1.1.pdf&quot;&gt;unique approach&lt;&#x2F;a&gt; combines graph data and streaming technologies into a modern, developer-friendly open source software package. For the first time, teams can process &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;what-is-categorical-data&#x2F;&quot;&gt;categorical data&lt;&#x2F;a&gt; in real time without resorting to encoding methods.&lt;&#x2F;p&gt;
&lt;p&gt;Developers and data pipeline engineers use Quine to rapidly build high volume, real-time, complex event processing workflows at scale. A handful of Quine queries can replace months of development time and millions in costs, eliminating batch processing, multi-level joins, time windows, and other time-consuming and outdated processes that drag down and stall analysis on streaming data.&lt;&#x2F;p&gt;
&lt;p&gt;“Enterprise data engineering teams are confined to the limitations and tradeoffs of the previous generation of event processing frameworks like Flink. They spend enormous time and effort building complicated event-driven architectures that only work on small time-windows of in-memory data and miss out on the bigger picture,” said Ryan Wright, the creator of Quine and Founder&#x2F;CEO of thatDot. “Quine can transform months of tedious data engineering into an afternoon’s work enabling data pipeline engineers to easily interpret high-volume event data streams, innovate and ship products faster, and to use the emerging Graph AI tools driving the next wave in machine learning.”&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Community Created, Pre-built Recipes for Common Workflows&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Early access launch partners, community members, and contributors have already created pre-built application functions called “&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;recipes&quot;&gt;recipes&lt;&#x2F;a&gt;” that package valuable use cases for one-click operation. These include:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Blockchain Real-time Tag Propagation -&lt;&#x2F;em&gt; Ingests Ethereum and propagates dirty money tags to trace money laundering.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;em&gt;CDN Cache Efficiency Analysis -&lt;&#x2F;em&gt; Continuously monitors CDN logs to materialize cache efficiency by PoP, Geography, and ASN, generating alerts and tracing root cause.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;em&gt;Apache Server Log Observability -&lt;&#x2F;em&gt; Ingests Apache server events and observes event lineage between services.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;“thatDot’s Quine is a powerful new tool for anyone building event-driven applications. Standing queries let us match complex patterns as data arrives as well as query the past shape of data without the restriction of time windows,” said Roy Hodgman, Data Science Manager, Rapid7.&lt;&#x2F;p&gt;
&lt;p&gt;Quine is freely available on &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;thatdot&#x2F;quine&quot;&gt;GitHub&lt;&#x2F;a&gt; and directly from the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;&quot;&gt;Quine Community website&lt;&#x2F;a&gt;. Quine is part of thatDot’s portfolio of event processing solutions. Elements of thatDot’s solutions were instrumental to DARPA’s cybersecurity research for insider threat detection and stopping Advanced Persistent Threats (APTs).&lt;&#x2F;p&gt;
&lt;p&gt;Register for the upcoming conferences, webinars or recordings showcasing Quine:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.eventbrite.com&#x2F;e&#x2F;webinar-introducing-quine-a-streaming-graph-for-modern-data-pipelines-tickets-250100987787&quot;&gt;Open Data Science Conference (ODSC) Webinar&lt;&#x2F;a&gt; (March 3, 2022)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;http:&#x2F;&#x2F;www.globalbigdataconference.com&#x2F;santa-clara&#x2F;global-artificial-intelligence-virtual-conference&#x2F;event-129.html&quot;&gt;Global Big Data &amp;amp; AI Conference&lt;&#x2F;a&gt; (March 18-19, 2022)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.meetup.com&#x2F;PDX-Data-Engineering&#x2F;events&#x2F;283493444&#x2F;&quot;&gt;PDXDataEngineering Meetup&lt;&#x2F;a&gt; (March 10, 2022)&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>AWS Names thatDot&#x27;s Novelty Detector As A Containers Anywhere Partner</title>
        <published>2021-12-08T00:00:00+00:00</published>
        <updated>2021-12-08T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/aws-announces-thatdot-anomaly-detector-as-a-containers-anywhere-launch-partner/"/>
        <id>https://www.thatdot.com/blog/aws-announces-thatdot-anomaly-detector-as-a-containers-anywhere-launch-partner/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/aws-announces-thatdot-anomaly-detector-as-a-containers-anywhere-launch-partner/">&lt;p&gt;Bringing cloud-based data management into the enterprise data center, where much enterprise data still lives, is now simpler than ever with AWS Containers Anywhere, and thatDot is excited to be a launch partner of the new &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;aws.amazon.com&#x2F;blogs&#x2F;aws&#x2F;new-aws-marketplace-for-containers-anywhere-to-deploy-your-kubernetes-cluster-in-any-environment&#x2F;&quot;&gt;AWS Marketplace for Containers Anywhere&lt;&#x2F;a&gt; program.&lt;&#x2F;p&gt;
&lt;p&gt;As expected, AWS re:Invent 2021 delivered great training content and insights into where the Cloud industry is going. The emphasis on AI and ML enablement was a clear topic of focus, reflecting the ever wider adoption of technology that leverages data for better business outcomes. With AWS Marketplace for Containers Anywhere, enterprises can leverage leading technologies within the bounds of their own data centers.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;thatdot-novelty-detector&quot;&gt;thatDot Novelty Detector&lt;&#x2F;h2&gt;
&lt;p&gt;Existing anomaly detection techniques rely on numerical data and threshold analysis, which break down in the face of high data dimensionality and produce high volumes of false-positives. thatDot Novelty Detector uses &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;what-is-categorical-data&#x2F;&quot;&gt;categorical data&lt;&#x2F;a&gt; to build a comprehensive behavioral fingerprint of your data. This &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;the-known-security-challenge-of-the-unknown&#x2F;&quot;&gt;deep contextual understanding&lt;&#x2F;a&gt; eliminates false-positives and explains &lt;em&gt;WHY&lt;&#x2F;em&gt; an anomaly was identified, making it immediately actionable.&lt;&#x2F;p&gt;
&lt;p&gt;Popular uses of thatDot Novelty Detector by AWS users include Detecting Stolen Credential Use in AWS CloudTrail Logs and &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;identifying-data-exfiltration-in-aws-cloudtrail-logs-using-categorical-anomaly-detection&#x2F;&quot;&gt;Detecting Data Exfiltration in AWS CloudTrail Logs&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;If you are looking to deploy thatDot Novelty Detector in any AWS region, you can follow &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.thatdot.com&#x2F;&quot;&gt;these instructions&lt;&#x2F;a&gt; on the thatDot documentation site.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Stop Insider Threats With Automated Behavioral Anomaly Detection</title>
        <published>2021-10-13T00:00:00+00:00</published>
        <updated>2021-10-13T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/stop-insider-threats-with-automated-behavioral-anomaly-detection/"/>
        <id>https://www.thatdot.com/blog/stop-insider-threats-with-automated-behavioral-anomaly-detection/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/stop-insider-threats-with-automated-behavioral-anomaly-detection/">&lt;h2 id=&quot;introduction-a-murky-and-labor-intensive-challenge&quot;&gt;Introduction: A Murky and Labor-Intensive Challenge&lt;&#x2F;h2&gt;
&lt;p&gt;Finding a malicious employee is one of the toughest cyber-security challenges in the industry. Someone who has been deliberately given access to sensitive information… but violates that trust and secretly steals private data to give to a third party. Finding such a threat has always been a murky and labor-intensive challenge. Until now.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;behavioral-anomaly-detection&quot;&gt;Behavioral Anomaly Detection&lt;&#x2F;h2&gt;
&lt;p&gt;We’ve developed a new technique for measuring how unusual each behavior is. Unlike traditional approaches, this innovation uses non-numeric &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;what-is-categorical-data&#x2F;&quot;&gt;categorical data&lt;&#x2F;a&gt; directly, instead of trying to convert it to numbers first. We can analyze that data in real-time to score each behavioral event to explain how unusual it is. As a result, it’s very easy to automate the analysis of behavioral data, and get powerful results in real-time.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;vast-insider-threat-dataset&quot;&gt;VAST Insider Threat Dataset&lt;&#x2F;h2&gt;
&lt;p&gt;A standard benchmark dataset for insider threat detection is the publicly available &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.osti.gov&#x2F;biblio&#x2F;1001546&quot;&gt;VAST Insider Threat challenge dataset&lt;&#x2F;a&gt;. Originally released in 2009, it remains a good example of both the data available to cyber-security professionals today and the challenge they face in detecting and stopping a malicious insider. This post will use the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;http:&#x2F;&#x2F;visualdata.wustl.edu&#x2F;varepository&#x2F;VAST%20Challenge%202009&#x2F;challenges&#x2F;MC1%20-%20Badge%20and%20Network%20Traffic&#x2F;&quot;&gt;VAST dataset for the “mini-challenge” #1: “MC1 – Badge and Network Traffic”&lt;&#x2F;a&gt; to demonstrate automated insider threat detection. For comparison, &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.cs.umd.edu&#x2F;hcil&#x2F;VASTchallenge09&#x2F;2009&#x2F;Palantir_VAST09&#x2F;traffic&#x2F;traffic_15&#x2F;Palantir_MC1&#x2F;Palantir_MC1&#x2F;index.htm&quot;&gt;the accepted solution to the problem can be found here&lt;&#x2F;a&gt;; it relied on laborious manual exploration.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-embassy-scenario&quot;&gt;The Embassy Scenario&lt;&#x2F;h2&gt;
&lt;p&gt;The data describes the workplace environment of 60 employees at an embassy. There are 30 offices, each shared by two people. A classified space is separated from the offices. There are two doors: one provides access between the outside and the office area, and the other provides access from the office space to the classified space.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2f5cef4b1a74c654a13f7_Embassy.jpeg&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Employees are expected to scan their badges every time they enter from the outside, and every time they enter and leave the classified space. Each office holds the desktop computers assigned uniquely to each employee who shares that office. The last octet of each computer address corresponds to the employee ID.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;source-data-download&quot;&gt;Source Data [&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;that.re&#x2F;insider-threat&quot;&gt;download&lt;&#x2F;a&gt;]&lt;&#x2F;h2&gt;
&lt;p&gt;Data from the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;that.re&#x2F;insider-threat&quot;&gt;dataset&lt;&#x2F;a&gt; is broken into three files: employee data in &lt;strong&gt;&lt;code&gt;employeeData.csv&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;, proximity data from door badge scanners in &lt;strong&gt;&lt;code&gt;proxLog.csv&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;, and network traffic in &lt;strong&gt;&lt;code&gt;IPLog3.5.csv&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;. Some sample records from each file (below) show their shape.&lt;&#x2F;p&gt;
&lt;p&gt;Note: the original data includes a &lt;strong&gt;&lt;code&gt;USER WARNING&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; that this is &lt;strong&gt;&lt;code&gt;Synthetic Data&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;. This field appears on each record of the employee and network traffic files, but is deleted from this point on for brevity.&lt;&#x2F;p&gt;
&lt;p&gt;Data is read from each &lt;strong&gt;&lt;code&gt;.csv&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; file and parsed into Python dictionaries, also parsing out timestamps into &lt;strong&gt;&lt;code&gt;datetime&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; objects:&lt;&#x2F;p&gt;
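&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;import csv&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;from datetime import datetime&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;from pprint import pprint&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;employeeData = []&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;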
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;ipLog = []&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;proxLog = []&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;# Reading employee data&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;with open(&amp;quot;employeeData.csv&amp;quot;, &amp;quot;r&amp;quot;) as fd:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    for l in csv.DictReader(fd):&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        del l[&amp;quot;USER_WARNING&amp;quot;]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        employeeData.append(l)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;# Reading IP log data&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;with open(&amp;quot;IPLog3.5.csv&amp;quot;, &amp;quot;r&amp;quot;) as fd:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    for l in csv.DictReader(fd):&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        del l[&amp;quot;USER WARNING&amp;quot;]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        l[&amp;quot;AccessTime&amp;quot;] = datetime.fromisoformat(l[&amp;quot;AccessTime&amp;quot;])&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        ipLog.append(l)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;# Reading proximity log data&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;with open(&amp;quot;proxLog.csv&amp;quot;, &amp;quot;r&amp;quot;) as fd:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    for l in csv.DictReader(fd):&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        l[&amp;quot;Datetime&amp;quot;] = datetime.fromisoformat(l[&amp;quot;Datetime&amp;quot;])&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        proxLog.append(l)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Looking at the first three sample records from each file helps build a sense of what this data looks like:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;pprint(employeeData[:3])&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;len(employeeData)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;[{&amp;quot;EmployeeID&amp;quot;: &amp;quot;0&amp;quot;, &amp;quot;IP&amp;quot;: &amp;quot;37.170.100.0&amp;quot;, &amp;quot;Office&amp;quot;: &amp;quot;0&amp;quot;},&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt; {&amp;quot;EmployeeID&amp;quot;: &amp;quot;1&amp;quot;, &amp;quot;IP&amp;quot;: &amp;quot;37.170.100.1&amp;quot;, &amp;quot;Office&amp;quot;: &amp;quot;0&amp;quot;},&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt; {&amp;quot;EmployeeID&amp;quot;: &amp;quot;2&amp;quot;, &amp;quot;IP&amp;quot;: &amp;quot;37.170.100.2&amp;quot;, &amp;quot;Office&amp;quot;: &amp;quot;1&amp;quot;}]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;60 records&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;pprint(proxLog[:3])&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;len(proxLog)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;[&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    Datetime: datetime.datetime(2008, 1, 1, 7, 28),&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    ID: &amp;quot;44&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    Type: &amp;quot;prox-in-building&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  },&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    Datetime: datetime.datetime(2008, 1, 1, 8, 31),&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    ID: &amp;quot;44&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    Type: &amp;quot;prox-in-classified&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  },&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    Datetime: datetime.datetime(2008, 1, 1, 9, 23),&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    ID: &amp;quot;38&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    Type: &amp;quot;prox-in-building&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  },&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;];&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;10162 records&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;pprint(ipLog[:3])&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;len(ipLog)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;[&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    DestIP: &amp;quot;37.170.100.200&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    ReqSize: 7063,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    RespSize: 49591,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    Socket: &amp;quot;80&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    SourceIP: &amp;quot;37.170.100.38&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  },&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    AccessTime: datetime.datetime(2008, 1, 1, 9, 43, 8, 861000),&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    DestIP: &amp;quot;37.157.76.124&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    ReqSize: 5171,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    RespSize: 434285,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    Socket: &amp;quot;80&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    SourceIP: &amp;quot;37.170.100.38&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  },&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    AccessTime: datetime.datetime(2008, 1, 1, 9, 47, 41, 282000),&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    DestIP: &amp;quot;37.170.30.250&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    ReqSize: 32818,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    RespSize: 182798,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    Socket: &amp;quot;25&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    SourceIP: &amp;quot;37.170.100.38&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  },&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;];&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;115414 records&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;h2 id=&quot;behavioral-data&quot;&gt;Behavioral Data&lt;&#x2F;h2&gt;
&lt;p&gt;The proximity data in &lt;strong&gt;&lt;code&gt;proxLog&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; and network traffic data in &lt;strong&gt;&lt;code&gt;ipLog&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; represent different kinds of behavioral data. Though they are separate files, each type of data is an aspect of the same behavioral information we’re trying to analyze. They can be combined by adding the relevant employee’s badge status from &lt;strong&gt;&lt;code&gt;proxLog&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; to each record from &lt;strong&gt;&lt;code&gt;ipLog&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;sorted_emp_prox = {} &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;for id in [e[&amp;quot;EmployeeID&amp;quot;] for e in employeeData]:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    sorted_emp_prox[id] = sorted([x for x in proxLog if x[&amp;quot;ID&amp;quot;] == id], key=lambda x: x[&amp;quot;Datetime&amp;quot;])&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;def emp_state_at_time(employee, dt):&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    last = None&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    for e in sorted_emp_prox[employee]:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        if e[&amp;quot;Datetime&amp;quot;] &amp;lt; dt:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            last = e&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        else:  # e[&amp;quot;Datetime&amp;quot;] &amp;gt;= dt&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            if last is not None:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;                return last[&amp;quot;Type&amp;quot;]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            else:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;                return &amp;quot;unknown&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    if last is not None:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        return last[&amp;quot;Type&amp;quot;]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    else:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        return &amp;quot;unknown&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;emp_for_ip = {e[&amp;quot;IP&amp;quot;]: e[&amp;quot;EmployeeID&amp;quot;] for e in employeeData}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;behaviors = [[emp_state_at_time(emp_for_ip[x[&amp;quot;SourceIP&amp;quot;]], x[&amp;quot;AccessTime&amp;quot;]), x[&amp;quot;SourceIP&amp;quot;], x[&amp;quot;Socket&amp;quot;], x[&amp;quot;DestIP&amp;quot;]] for x in ipLog]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;pprint(behaviors[:5])&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
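&lt;p&gt;The lookup in &lt;strong&gt;&lt;code&gt;emp_state_at_time&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; scans an employee’s badge events from the start for every network record. Because those events are already sorted by time, a binary search is one possible alternative. The following is a minimal sketch, not the code used for this post: the helper &lt;strong&gt;&lt;code&gt;state_at&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; and the parallel lists &lt;strong&gt;&lt;code&gt;times&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; and &lt;strong&gt;&lt;code&gt;types&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; are hypothetical names, and the two sample events are taken from the proximity records shown earlier.&lt;&#x2F;p&gt;

```python
from bisect import bisect_left
from datetime import datetime


def state_at(times, types, dt):
    """Return the badge state from the last event strictly before dt.

    times must be sorted ascending; types[i] is the event type at times[i].
    """
    # bisect_left returns how many entries in times are strictly earlier than dt.
    i = bisect_left(times, dt)
    return types[i - 1] if i > 0 else "unknown"


# Two badge events for one employee (hypothetical variable names).
times = [datetime(2008, 1, 1, 7, 28), datetime(2008, 1, 1, 8, 31)]
types = ["prox-in-building", "prox-in-classified"]

print(state_at(times, types, datetime(2008, 1, 1, 8, 0)))
```

&lt;p&gt;As in the original function, only events strictly before the requested time count, and an employee with no earlier badge event is reported as &lt;strong&gt;&lt;code&gt;unknown&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;.&lt;&#x2F;p&gt;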
&lt;p&gt;We’ve added proximity badge status to each network record, and then trimmed those records down to only four values relevant for our automated detection:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;[&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  [&amp;quot;prox-in-building&amp;quot;, &amp;quot;37.170.100.38&amp;quot;, &amp;quot;80&amp;quot;, &amp;quot;37.170.100.200&amp;quot;],&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  [&amp;quot;prox-in-building&amp;quot;, &amp;quot;37.170.100.38&amp;quot;, &amp;quot;80&amp;quot;, &amp;quot;37.157.76.124&amp;quot;],&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  [&amp;quot;prox-in-building&amp;quot;, &amp;quot;37.170.100.38&amp;quot;, &amp;quot;25&amp;quot;, &amp;quot;37.170.30.250&amp;quot;],&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  [&amp;quot;prox-in-building&amp;quot;, &amp;quot;37.170.100.38&amp;quot;, &amp;quot;80&amp;quot;, &amp;quot;37.116.192.39&amp;quot;],&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  [&amp;quot;prox-in-building&amp;quot;, &amp;quot;37.170.100.38&amp;quot;, &amp;quot;80&amp;quot;, &amp;quot;10.24.74.254&amp;quot;],&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  ...&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;];&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;115414 records&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;h2 id=&quot;automated-detection&quot;&gt;Automated Detection&lt;&#x2F;h2&gt;
&lt;p&gt;With the combined proximity+network data, we can dump that out as a &lt;strong&gt;&lt;code&gt;.csv&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; file and use the built-in “Getting Started” page to easily feed this to thatDot’s Novelty Detection system:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2f5cf4c22f9de95be04d2_td-ad-getting-starty-empty.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
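&lt;p&gt;Writing the combined records out as CSV takes only a few lines of standard-library Python. This is a rough sketch under stated assumptions: the helper name &lt;strong&gt;&lt;code&gt;behaviors_to_csv&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; is hypothetical, and it assumes one observation per line with no header row, which may not match what the upload page expects.&lt;&#x2F;p&gt;

```python
import csv
import io


def behaviors_to_csv(rows):
    """Serialize rows (lists of strings) to CSV text, one observation per line."""
    buf = io.StringIO()
    # Assumption: no header row; each line holds the four behavior values in order.
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


# Two of the combined records shown earlier in this post.
sample = [
    ["prox-in-building", "37.170.100.38", "80", "37.170.100.200"],
    ["prox-in-building", "37.170.100.38", "25", "37.170.30.250"],
]

print(behaviors_to_csv(sample), end="")
```

&lt;p&gt;To write a file instead, the same &lt;strong&gt;&lt;code&gt;csv.writer&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; can wrap a file object opened with &lt;strong&gt;&lt;code&gt;newline=&quot;&quot;&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;.&lt;&#x2F;p&gt;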
&lt;p&gt;Loading data trains the system and scores each result as it streams in.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2f5cff4b1a79b6a4a13f8_td-ad-getting-started-in-progress.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Streaming results are often more difficult to produce but more useful than batch results. However, since this scenario is a batch of data and meant to be analyzed as a batch, we can use a quick Python script to make a second pass that feeds the data to the novelty scoring system in a read-only fashion (does not update the model) to get results where each is informed by the entire batch:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;e = read_results(&amp;quot;behaviors&amp;quot;, behaviors, 10000)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Read: 10,000  Rate: 52,631 &#x2F; second&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Read: 10,000  Rate: 38,910 &#x2F; second&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Read: 10,000  Rate: 56,497 &#x2F; second&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Read: 10,000  Rate: 43,668 &#x2F; second&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Read: 10,000  Rate: 55,248 &#x2F; second&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Read: 10,000  Rate: 56,497 &#x2F; second&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Read: 10,000  Rate: 47,169 &#x2F; second&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Read: 10,000  Rate: 35,087 &#x2F; second&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Read: 10,000  Rate: 49,019 &#x2F; second&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Read: 10,000  Rate: 46,296 &#x2F; second&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Read: 10,000  Rate: 56,818 &#x2F; second&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Read:  5,414  Rate: 40,103 &#x2F; second&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
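&lt;p&gt;The output above reflects the batch size of 10,000 passed to &lt;strong&gt;&lt;code&gt;read_results&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;: 115,414 records split into eleven full batches and a final batch of 5,414. The internals of &lt;strong&gt;&lt;code&gt;read_results&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; are not shown in this post, so the sketch below only illustrates the batching arithmetic, using a hypothetical &lt;strong&gt;&lt;code&gt;batches&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; helper and placeholder integers in place of real records.&lt;&#x2F;p&gt;

```python
def batches(items, size):
    """Yield consecutive slices of items, each holding at most size elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Placeholder data: only the record count matters for this illustration.
records = list(range(115414))
sizes = [len(batch) for batch in batches(records, 10000)]

print(len(sizes), sizes[-1])
```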
&lt;p&gt;And plot the results:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;px.scatter(limit([x for x in e if x[&amp;quot;score&amp;quot;] &amp;gt; 0.5]), title=&amp;quot;Scores after observing all data (sampled)&amp;quot;, y=&amp;quot;score&amp;quot;, x=&amp;quot;sequence&amp;quot;, marginal_y=&amp;quot;violin&amp;quot;).show()&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2f5cf9627049c6ab680f4_vast-behaviors.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;More novel, higher-scoring results appear higher on the plot. This sample of the results (not every data point is shown) reveals a distinct group of nodes with very high scores, around 0.94.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2f5cf8a64e3e76c0017a1_vast-behaviors-result1.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Novelty Detector has found a small number of clearly unusual behaviors. The observation results show that each of these events was network traffic to the same destination IP address: &lt;strong&gt;&lt;code&gt;100.59.151.133&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt;. What’s more, these connections usually came from different source IP addresses, each while the employee responsible for that computer was badged into the classified space.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2f5cfe32e6a3ab4894050_vast-behaviors-result2.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;h2 id=&quot;who-done-it&quot;&gt;Who Done It?&lt;&#x2F;h2&gt;
&lt;p&gt;Novelty Detector successfully found the needle in the haystack! Eighteen records in the network traffic log correspond with traffic sent to this suspicious IP address:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;suspicious_ip = list(set([x[&amp;quot;observation&amp;quot;][3] for x in e if x[&amp;quot;score&amp;quot;] &amp;gt; 0.9]))[0]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;suspicious_ip_traffic = [x for x in ipLog if x[&amp;quot;DestIP&amp;quot;] == suspicious_ip]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;pprint(suspicious_ip_traffic[:3])&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;[{&amp;quot;AccessTime&amp;quot;: datetime.datetime(2008, 1, 8, 17, 1, 33, 1000),&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;	&amp;quot;DestIP&amp;quot;: &amp;quot;100.59.151.133&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;	&amp;quot;ReqSize&amp;quot;: 8889677,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;	&amp;quot;RespSize&amp;quot;: 12223,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;	&amp;quot;Socket&amp;quot;: &amp;quot;8080&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;	&amp;quot;SourceIP&amp;quot;: &amp;quot;37.170.100.31&amp;quot;},&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   {&amp;quot;AccessTime&amp;quot;: datetime.datetime(2008, 1, 10, 14, 27, 12, 238000),&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;	&amp;quot;DestIP&amp;quot;: &amp;quot;100.59.151.133&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;	&amp;quot;ReqSize&amp;quot;: 6543216,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;	&amp;quot;RespSize&amp;quot;: 22315,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;	&amp;quot;Socket&amp;quot;: &amp;quot;8080&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;	&amp;quot;SourceIP&amp;quot;: &amp;quot;37.170.100.31&amp;quot;},&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;   {&amp;quot;AccessTime&amp;quot;: datetime.datetime(2008, 1, 10, 16, 1, 53, 956000),&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;	&amp;quot;DestIP&amp;quot;: &amp;quot;100.59.151.133&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;	&amp;quot;ReqSize&amp;quot;: 8543125,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;	&amp;quot;RespSize&amp;quot;: 12312,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;	&amp;quot;Socket&amp;quot;: &amp;quot;8080&amp;quot;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;	&amp;quot;SourceIP&amp;quot;: &amp;quot;37.170.100.16&amp;quot;}]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  &lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  18 records&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;So who is the guilty party?&lt;&#x2F;p&gt;
&lt;p&gt;Using the behavioral data about employee proximity, we can put together a set of people who were available to access the compromised computer during each of the 18 illicit data transmissions:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;def available_emp(atTime):&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    a = []&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    available = [&amp;quot;prox-out-classified&amp;quot;, &amp;quot;prox-in-building&amp;quot;]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    for e in employeeData:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        if emp_state_at_time(e[&amp;quot;EmployeeID&amp;quot;], atTime) in available:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            a.append(e[&amp;quot;EmployeeID&amp;quot;])&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    return a&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;opportunities = [available_emp(x[&amp;quot;AccessTime&amp;quot;]) for x in suspicious_ip_traffic]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;[len(x) for x in opportunities]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;At each of the relevant times, between 38 and 60 employees were available:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;[52, 43, 49, 42, 42, 38, 53, 60, 47, 54, 52, 48, 47, 44, 44, 56, 45, 45]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;But who was available during &lt;em&gt;&lt;strong&gt;ALL&lt;&#x2F;strong&gt;&lt;&#x2F;em&gt; times?&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;had_opportunity = set(opportunities[0])&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;for o in opportunities:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    had_opportunity = had_opportunity.intersection(o)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;had_opportunity&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;{&amp;quot;27&amp;quot;, &amp;quot;30&amp;quot;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Only two employees! Maybe they are colluding. But if not, one of them would be a witness if their officemate’s computer was compromised. Looking at the traffic records, we can focus on suspects who wouldn’t be caught by an officemate in the room with the compromised computer:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;def not_caught_by_officemate(suspects, suspicious_events):&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    still_suspect = []&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    for s in suspects:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        suspect_office = [e[&amp;quot;Office&amp;quot;] for e in employeeData if e[&amp;quot;EmployeeID&amp;quot;] == s][0]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        officemate_ip = [e[&amp;quot;IP&amp;quot;] for e in employeeData if e[&amp;quot;Office&amp;quot;] == suspect_office and e[&amp;quot;EmployeeID&amp;quot;] != s][0]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        sent_traffic = [x[&amp;quot;SourceIP&amp;quot;] for x in suspicious_events]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        if officemate_ip in sent_traffic:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;            still_suspect.append(s)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    return still_suspect&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;not_caught_by_officemate(had_opportunity, suspicious_ip_traffic)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;pre class=&quot;giallo&quot; style=&quot;color: #BFBDB6; background-color: #0D1017;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;{&amp;quot;30&amp;quot;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That leaves only Employee #30 as the guilty party. Employee #30 had the opportunity during each event and could not have been observed by the officemate.&lt;&#x2F;p&gt;
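&lt;p&gt;The two filtering steps above can be condensed into one self-contained sketch. The employee records, availability sets, and source IPs below are invented toy data (not the VAST dataset), and &lt;code&gt;narrow_suspects&lt;&#x2F;code&gt; is a hypothetical helper name. The logic mirrors the notebook cells: intersect the availability sets, then keep only suspects whose officemate’s computer appears among the suspicious source IPs.&lt;&#x2F;p&gt;

```python
# Toy data, invented for illustration only (not the VAST dataset).
employees = [
    {"EmployeeID": "A", "Office": "101", "IP": "10.0.0.1"},
    {"EmployeeID": "B", "Office": "101", "IP": "10.0.0.2"},
    {"EmployeeID": "C", "Office": "102", "IP": "10.0.0.3"},
    {"EmployeeID": "D", "Office": "102", "IP": "10.0.0.4"},
    {"EmployeeID": "E", "Office": "103", "IP": "10.0.0.5"},
]
# One set per suspicious event: who was free to reach a computer at that time.
opportunities = [{"A", "B", "C"}, {"A", "C"}, {"A", "C", "D"}]
# Source IP of each suspicious transmission.
source_ips = ["10.0.0.2", "10.0.0.5", "10.0.0.2"]


def narrow_suspects(employees, opportunities, source_ips):
    # Step 1: keep only people who were available at every event.
    suspects = set.intersection(*opportunities)
    # Step 2: keep suspects whose officemate's computer sent suspicious traffic.
    office_of = {e["EmployeeID"]: e["Office"] for e in employees}
    remaining = set()
    for s in suspects:
        officemate_ips = {
            e["IP"]
            for e in employees
            if e["Office"] == office_of[s] and e["EmployeeID"] != s
        }
        if officemate_ips.intersection(source_ips):
            remaining.add(s)
    return remaining


print(narrow_suspects(employees, opportunities, source_ips))  # {'A'}
```

&lt;p&gt;Returning a set keeps the result in the same shape as the intersection step, so the two stages can be chained or reordered without converting types.&lt;&#x2F;p&gt;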
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2f5cfcef539caa111c3bc_case-closed.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;h2 id=&quot;bonus-real-time-automated-detection&quot;&gt;Bonus: Real-Time Automated Detection&lt;&#x2F;h2&gt;
&lt;p&gt;Novelty Detector doesn’t have to be used only on static data. In fact, it’s meant to run on live streaming data and to produce results immediately, in real-time. What would the experience be like to a person who was monitoring this data live while it came in? Could they have stopped the malicious insider sooner?&lt;&#x2F;p&gt;
&lt;p&gt;This image shows the “Results” section of the Novelty Detector tool, which updates live with all the recent results:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2f5cfb8d7a276f41985ad_td-ad-recents.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;This plot shows the scores as they were produced in real-time; higher dots correspond with more novel data. The colors in this plot indicate how unique each behavioral observation was (dark red means we had never seen that specific observation before). You can see from the x-axis that it shows only the most recent results. Another plot on this page shows the highest scoring results overall:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2f5cf46530559836cb649_td-ad-top.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Since these results are the real-time streaming results, each observation is scored immediately without any knowledge of what data will arrive after it. This system uses past data to understand how novel new data is, so at the beginning of the stream, there is a lot of novel data. Those early high scores are genuinely novel at that moment, given what little has been seen so far, but for those of us with knowledge of the whole dataset, we know that is not enough data to have a representative sample of what is yet to come.&lt;&#x2F;p&gt;
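&lt;p&gt;To make the warm-up effect concrete, here is a deliberately naive sketch. It is not Novelty Detector’s algorithm; it simply scores each observation as one minus the fraction of earlier observations that were identical to it. The function name &lt;code&gt;stream_scores&lt;&#x2F;code&gt; and the sample stream are made up for illustration. The point is only that every score is computed from what has been seen so far, so the first observations in a stream tend to score high.&lt;&#x2F;p&gt;

```python
from collections import Counter


def stream_scores(observations):
    """Score each observation using only the observations that preceded it."""
    seen = Counter()
    total = 0
    scores = []
    for obs in observations:
        # Fraction of prior observations identical to this one (0.0 if none yet).
        prior = seen[obs] / total if total else 0.0
        scores.append(1.0 - prior)
        # Only after scoring does the observation become part of the history.
        seen[obs] += 1
        total += 1
    return scores


stream = ["login", "login", "login", "upload"]
print(stream_scores(stream))  # [1.0, 0.0, 0.0, 1.0]
```

&lt;p&gt;The very first &lt;code&gt;login&lt;&#x2F;code&gt; scores 1.0 only because nothing preceded it, which is the toy analogue of the early high scores in the plot above.&lt;&#x2F;p&gt;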
&lt;p&gt;After about 25k observations in this dataset, the novelty scoring system has automatically learned what this data looks like. Notice that &lt;strong&gt;no data labeling is required!&lt;&#x2F;strong&gt; Simply turn on the system and let it observe a representative sample of your data. Ignore the early results while it is automatically training to fit the data. In a real-world scenario, this system would likely be running long before the insider began their malicious activity, so the early tuning would have long since been accomplished.&lt;&#x2F;p&gt;
&lt;p&gt;A security analyst observing this stream of results would regularly see low-scoring results they would easily ignore during the course of normal business. But the moment the malicious insider sneaked onto an employee’s computer, it would produce the first top-scoring result:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2f5cfe32e6a1d6f894067_td-ad-top-first.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;The observer would have a clear signal about the first occurrence of malicious activity: in this case, when Employee #30 used their officemate’s computer while #31 was in the classified space. That warning would come with enough information for an observer to understand the context of what is happening, why it is unusual, and where to go &lt;em&gt;RIGHT NOW&lt;&#x2F;em&gt; to catch the malicious insider red-handed! If the observer missed this opportunity or needed to wait for more evidence, subsequent malicious activity on other computers would give the authorities additional opportunities and evidence.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2f5cf4c22f90890be053e_td-ad-top-second.png&quot; alt=&quot;&quot; &#x2F;&gt;
&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2f5cf8a64e3952c0017c4_td-ad-top-third.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;h2 id=&quot;this-is-just-the-beginning&quot;&gt;This Is Just the Beginning…&lt;&#x2F;h2&gt;
&lt;p&gt;Novelty detection based on categorical data is a major new innovation in cybersecurity threat detection. It comes out of years of DARPA-funded R&amp;amp;D with some of the world’s foremost experts. As shown here, it’s a game-changer for insider threat detection, but it doesn’t stop here! This tool can be used in a wide variety of other domains to measure in real-time how unusual each and every piece of data is. Our blog includes other examples like &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;identifying-stolen-credential-use-in-aws-cloudtrail-logs-with-high-confidence-using-categorical-anomaly-detection&#x2F;&quot;&gt;detecting stolen credential use&lt;&#x2F;a&gt;, or &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;identifying-data-exfiltration-in-aws-cloudtrail-logs-using-categorical-anomaly-detection&#x2F;&quot;&gt;data exfiltration in the cloud&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;While anomaly detection algorithms have been around for decades, there has never been an effective way to apply them to categorical data… until now. The first of its kind, thatDot’s Novelty Detector is currently being used to detect real-time cybersecurity threats, reduce the cost of data analysis tools (e.g. only analyze the unusual data), find fraudulent activity, detect stolen credentials, analyze log data, audit financial transactions, and much more.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;try-novelty&quot;&gt;Try Novelty&lt;&#x2F;h2&gt;
&lt;p&gt;Experience the power of &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;products&#x2F;novelty&#x2F;&quot;&gt;thatDot&#x27;s Novelty&lt;&#x2F;a&gt;: sign up for our &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;getting-started&#x2F;&quot;&gt;free trial&lt;&#x2F;a&gt; today and see the difference for yourself. Or get in touch with us via the &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;getting-started&#x2F;&quot;&gt;request a demo&lt;&#x2F;a&gt; page to talk to a solution architect.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Quine Streaming Graph Scales to 1.1 Trillion Log Events per Month</title>
        <published>2021-09-21T00:00:00+00:00</published>
        <updated>2021-09-21T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/linear-scaling-to-1-1-trillion-monthly-log-events-in-thatdots-streaming-graph/"/>
        <id>https://www.thatdot.com/blog/linear-scaling-to-1-1-trillion-monthly-log-events-in-thatdots-streaming-graph/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/linear-scaling-to-1-1-trillion-monthly-log-events-in-thatdots-streaming-graph/">&lt;h2 id=&quot;just-the-numbers-please&quot;&gt;Just the numbers, please&lt;&#x2F;h2&gt;
&lt;ul&gt;
&lt;li&gt;Achieved a sustained rate of recording 425,000 records per second into our streaming graph&lt;&#x2F;li&gt;
&lt;li&gt;We ran 64 &lt;em&gt;thatDot Quine Enterprise&lt;&#x2F;em&gt; hosts on AWS c5.2xlarge instances (the most important feature being the 8 CPU cores per host)&lt;&#x2F;li&gt;
&lt;li&gt;This was supported by 15 Apache Cassandra hosts on m6gd.4xlarge instances&lt;&#x2F;li&gt;
&lt;li&gt;The total costs were less than $13&#x2F;hour to run at full scale using reserved instances&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2f146b8d7a2863515b987_thatDotConnect_425K-per-second.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-spend-the-time&quot;&gt;Why spend the time?&lt;&#x2F;h2&gt;
&lt;p&gt;thatDot&#x27;s enterprise version of &lt;em&gt;Quine&lt;&#x2F;em&gt; is designed to be a web-scale stream processing and data store solution. We’re always developing with an eye toward performance and wanted to demonstrate the linear scaling the system is built to deliver.&lt;&#x2F;p&gt;
&lt;p&gt;The architecture of &lt;em&gt;Quine&lt;&#x2F;em&gt; takes some interesting twists on traditional data management systems. For example, each node in a &lt;em&gt;Quine&lt;&#x2F;em&gt; datastore is capable of performing its own computation. This makes &lt;em&gt;Quine&lt;&#x2F;em&gt; hugely parallel, meaning that in practice, we can just about always make use of every part of every core on a host machine. Furthermore, we take advantage of deterministic entity resolution to ensure that regardless of how much data &lt;em&gt;Quine&lt;&#x2F;em&gt; has already ingested, each additional record takes roughly the same amount of time to process, analyze, and store.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;starting-the-scaling-effort&quot;&gt;Starting the scaling effort&lt;&#x2F;h2&gt;
&lt;p&gt;To start we set a goal of processing 250,000 records per second. If you ask me, that’s a &lt;em&gt;lot&lt;&#x2F;em&gt; of data. If you ask our lead Sales Engineer Josh, he’ll just laugh and say, “I’ve seen worse”. Regardless, it’s a nice round number that’s easy to talk about and do math with. Great for benchmarking. For our dataset, we generated a simulated log of process creation events, like you might find reported by an intrusion detection system. Each record had 9 fields of varying types.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2f147f47ae73a4af618f2_ScalingRecord.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Our plan was to create a node in &lt;em&gt;Quine&lt;&#x2F;em&gt; for each process event, creating a property on the node for each field, and linking each process to its parent via an edge. Each record was protobuf-encoded, to keep the [de]serialization simple but nontrivial, and to reduce how large the Kafka topic’s storage would need to be.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2f1460591e54e1212fc80_ConnectNode.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;We started small, with just 4 hosts, each a c5.2xlarge. We reached a moderate but respectable 30,000 records per second ingested, from Kafka to Cassandra. That’s 7,500 records per second per host. Once we had played around a bit, we doubled the cluster size to 8 hosts. Our ingest rate nearly doubled in turn, to 56,000 nodes created per second (7,000 per host).&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-big-clusters&quot;&gt;The big clusters&lt;&#x2F;h2&gt;
&lt;p&gt;With the &lt;em&gt;Quine&lt;&#x2F;em&gt; application smoothly performing on 8 hosts, we decided to quadruple the cluster size to 32 hosts, expecting to see our pattern of linear scaling continue. Unfortunately, instead of the 220,000 records per second we expected, we saw only 144,000. Strangely, the 144,000 figure wasn’t &lt;em&gt;stable&lt;&#x2F;em&gt; — the cluster constantly fluctuated in ingest rate. We were momentarily baffled, until we realized that in our excitement, we had forgotten to increase Cassandra’s size proportionally to the new &lt;em&gt;Quine&lt;&#x2F;em&gt; cluster size. The &lt;em&gt;Quine&lt;&#x2F;em&gt; graph will back-pressure when some system components, like the Cassandra data storage layer, cannot keep up. Back-pressuring keeps the overall system stable instead of overwhelming the now-underpowered data storage layer. The result was a disappointing-but-oddly-beautiful initial performance graph.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2f147d48222c28f8296f0_thatDot_32nodeperf.png&quot; alt=&quot;Performance was disappointing -- under 7K&#x2F;sec per host -- before tuning Cassandra.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Once we realized that the Cassandra JVM was scrambling to keep up with the memory pressure of a 32-host &lt;em&gt;Quine Enterprise&lt;&#x2F;em&gt; cluster, we had no trouble increasing the Cassandra cluster to use more hosts, and bigger hosts. By kicking Cassandra up from 9 2xlarge instances to 15 4xlarge instances, we had more than enough capacity to run &lt;em&gt;Quine&lt;&#x2F;em&gt; at scale. We hit our predicted 220,000 records per second without any trouble. The cluster smoothly maintained its pattern of linear scaling, ingesting a more or less constant 220,000 events per second, creating nodes and edges. We were &lt;em&gt;so close&lt;&#x2F;em&gt; to hitting our goal of 250,000 records per second. We probably could have added just a few more hosts to reach the goal. After all, we had every indication that every host added to the cluster would add around 7,000 records per second to the cluster’s ingest rate.&lt;&#x2F;p&gt;
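&lt;p&gt;The “few more hosts” estimate can be checked with a quick back-of-the-envelope calculation. The ingest figures are the ones quoted above; the variable names are ours, and the projection assumes the roughly 7,000 records per second per host would keep holding as hosts are added.&lt;&#x2F;p&gt;

```python
import math

# Measured cluster-wide ingest rates quoted in the text: hosts -> records/second.
measured = {4: 30_000, 8: 56_000, 32: 220_000}

# Per-host throughput at each cluster size.
per_host = {hosts: rate / hosts for hosts, rate in measured.items()}
print(per_host)  # {4: 7500.0, 8: 7000.0, 32: 6875.0}

# Assumption: each additional host contributes about 7,000 records/second.
goal = 250_000
assumed_per_host = 7_000
hosts_needed = math.ceil(goal / assumed_per_host)
extra_hosts = hosts_needed - 32

print(hosts_needed, extra_hosts)  # 36 4
```

&lt;p&gt;Under that assumption, about four additional hosts on top of the 32 would have been enough to reach the 250,000 goal.&lt;&#x2F;p&gt;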
&lt;p&gt;We did not add just a few more hosts. We doubled the cluster size instead, hitting an EC2 resource limit imposed by AWS, but once clear of that hurdle it was smooth sailing and everything was working as hoped!&lt;&#x2F;p&gt;
&lt;p&gt;We kicked off the ingest on each of the 64 clustered &lt;em&gt;thatDot Quine&lt;&#x2F;em&gt; hosts, and immediately, it was clear we’d far exceeded our goal.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2f14771acd8341b1d1a4b_thatDot_64nodeperf.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;h2 id=&quot;linear-scaling&quot;&gt;Linear Scaling!&lt;&#x2F;h2&gt;
&lt;p&gt;Linear scaling to 64 hosts had been achieved. Our initial target of 250,000 records per second was exceeded, and we maintained an ingest rate of over 425,000 log events per second, each creating a node with several properties, and most creating at least one edge to connect them into a graph. That’s over 1.1 &lt;em&gt;trillion&lt;&#x2F;em&gt; events a month.&lt;&#x2F;p&gt;
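&lt;p&gt;For readers who want to verify the headline number, the arithmetic is short. The only assumption is a 30-day month; the rate and host count come from the results above.&lt;&#x2F;p&gt;

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30  # assumption: a 30-day month

rate_per_second = 425_000  # sustained ingest rate reported above
hosts = 64

events_per_month = rate_per_second * SECONDS_PER_DAY * DAYS_PER_MONTH
per_host_rate = rate_per_second / hosts

print(f"{events_per_month:,} events per month")  # 1,101,600,000,000 events per month
print(f"{per_host_rate:,.0f} records per second per host")  # 6,641 records per second per host
```

&lt;p&gt;The per-host figure of roughly 6,600 records per second is close to the per-host rates seen on the smaller clusters, which is what linear scaling should look like.&lt;&#x2F;p&gt;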
&lt;h2 id=&quot;the-work-continues&quot;&gt;The work continues&lt;&#x2F;h2&gt;
&lt;p&gt;Of course, we’re not done. We’re always working to explore what can be done with stream processing systems at large scales, and we’re excited to offer our products to support others in their explorations.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Data Exfiltration Detection in AWS CloudTrail Logs Using Categorical Data</title>
        <published>2021-07-20T00:00:00+00:00</published>
        <updated>2021-07-20T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/identifying-data-exfiltration-in-aws-cloudtrail-logs-using-categorical-anomaly-detection/"/>
        <id>https://www.thatdot.com/blog/identifying-data-exfiltration-in-aws-cloudtrail-logs-using-categorical-anomaly-detection/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/identifying-data-exfiltration-in-aws-cloudtrail-logs-using-categorical-anomaly-detection/">&lt;h2 id=&quot;use-categorical-data-to-detect-and-prevent-data-exfiltration-from-aws-cloudtrail-logs&quot;&gt;Use Categorical Data to Detect and Prevent Data Exfiltration from AWS CloudTrail logs&lt;&#x2F;h2&gt;
&lt;p&gt;In our previous blog post, &lt;em&gt;Identifying stolen credential use in AWS CloudTrail logs with high confidence using categorical anomaly detection&lt;&#x2F;em&gt;, we discussed the power of graph machine learning to analyze the &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;what-is-categorical-data&#x2F;&quot;&gt;categorical data&lt;&#x2F;a&gt; in AWS CloudTrail logs to identify novel or anomalous behaviors. We then compared &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;products&#x2F;novelty&#x2F;&quot;&gt;thatDot Novelty Detector&lt;&#x2F;a&gt; findings to the results of traditional statistical anomaly detection tools. This article builds on our previous analysis to further investigate the use of categorical anomaly detection to identify multi-stage exploit campaigns in AWS CloudTrail logs.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;about-aws-cloudtrail-and-data-exfiltration&quot;&gt;&lt;strong&gt;About AWS CloudTrail and Data Exfiltration&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;AWS &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.aws.amazon.com&#x2F;cloudtrail&#x2F;index.html&quot;&gt;CloudTrail&lt;&#x2F;a&gt; monitors calls to AWS services and delivers detailed logs, providing a complete audit of management calls, with optional inclusion of data calls. To detect attacks effectively, you will need both, but the resulting high volume of log events creates a glut of data, making it very difficult to detect a behavioral attack like data exfiltration.&lt;&#x2F;p&gt;
&lt;p&gt;Data exfiltration can be an exploit on its own, or as is increasingly seen, it can be one step in a larger ransomware campaign. Attackers recognize the obfuscating effect of high volume data movement, and use the complexity of the logs to hide their activity by moving slowly and carefully. This challenge requires an anomaly detection system capable of understanding the shared characteristics of ‘novel’ behaviors while ignoring the simpler sequential and time stamped information often relied upon by other methods.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;data-exfiltration-from-aws-an-example&quot;&gt;&lt;strong&gt;Data Exfiltration from AWS — an Example&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Monitoring data transfer activity for data theft is a true “needle in a haystack” problem. A storage layer like S3 is in a continuous state of change with data being ingested, exported, updated, duplicated, moved, and expired. With storage being a fundamental service, monitoring user or system behavior at any scale can be challenging. Rulesets designed to limit undesired activity lack the granularity required to avoid unintended interruptions in business functions, and more granular rules are incredibly difficult to manage. Administrators need a better way to highlight system and user behaviors that stand out from the norm.&lt;&#x2F;p&gt;
&lt;p&gt;In the case described below, a hacker has initiated a series of steps to steal S3 data.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;preventing-exfiltration-using-graph-based-techniques&quot;&gt;&lt;strong&gt;Preventing Exfiltration using Graph-based Techniques&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;If your security monitoring system is configured to monitor CloudTrail logs and uses thatDot Novelty Detector, you can detect data theft quickly. Novelty Detector generates an observation for each CloudTrail event. Each observation receives a novelty score indicating how relevant the observation is and how much it warrants your attention. High novelty doesn’t immediately trigger a security event. However, the system will identify what makes the observation novel, providing the unique insight necessary to speed manual or automated categorization.&lt;&#x2F;p&gt;
&lt;p&gt;As observations stream into thatDot Novelty Detector, it generates real time plots showing it learning both system and user behavior and learning what is ‘normal’ for the environment. The plots below provide a graphical representation of the observed behaviors with scores for normal behaviors gradually dropping over time to form a down-and-to-the-right curve.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2f0428577ff46750d54d8_thatDot_AnomalyDetector_Baseline.png&quot; alt=&quot;thatDot Anomaly Detector Baseline&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Observations that stand out from the norm receive high novelty scores and appear higher on the chart. Looking at a plot of the most recent data, we see Novelty Detector has detected a series of observations with particularly high scores.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2f04216dfc848ad2756d4_thatDot_AnomalyDetector_DataExfil.png&quot; alt=&quot;thatDot Anomaly Detector Data Exfiltration Detection&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;The high scoring observations are for a particular user, &lt;em&gt;raul,&lt;&#x2F;em&gt; who initiated a series of API calls to the AWS S3 service:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;GetCallerIdentity – To validate S3 access for the credentials&lt;&#x2F;li&gt;
&lt;li&gt;CreateSnapshot – To create a data snapshot of the target data using Account ID and Volume ID&lt;&#x2F;li&gt;
&lt;li&gt;ModifySnapshotAttribute – To modify the snapshot permissions to allow export of data&lt;&#x2F;li&gt;
&lt;li&gt;At this point, the snapshot is “pulled” from an external account&lt;&#x2F;li&gt;
&lt;li&gt;DeleteSnapshot – To delete the snapshot and cover their tracks&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;These activities present as novel because the user, raul, does not typically execute these API calls. Novelty Detector not only identifies the activity as novel, but provides the critical insight that &lt;em&gt;raul’s execution of these actions is the most novel element.&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Novelty Detector retains the CloudTrail event ID and event time, enabling analysts to navigate to the actual CloudTrail events to investigate the details. Most importantly, it forwards the observation and score to a security monitoring system for action.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2f0425505b0fbec5c04da_thatDot_AnomalyDetector_MultiStageIndicatorDetection.png&quot; alt=&quot;Identifying Data Exfiltration in AWS Cloudtrail logs&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;One of the most compelling aspects of Novelty Detector’s identification of raul’s data exfiltration is the multi-stage view of all of the activities associated with this exploit. In our first blog, we showed how thatDot Novelty Detector was able to identify indicators of stolen credential use. When combined with our data exfiltration results, we are able to see the entire sequence of events: Stage 1, initial credential validity testing and broader probing of associated permissions, and Stage 2, the series of calls that indicate data exfiltration. Identifying multi-stage campaigns in this way provides the audit trail needed to trace the entire intrusion and thus understand the related consequences.&lt;&#x2F;p&gt;
&lt;p&gt;When we see this pattern and the multiple novel events generated by &lt;em&gt;raul,&lt;&#x2F;em&gt; it’s obvious that &lt;em&gt;raul&lt;&#x2F;em&gt;’s credentials are being used in multiple, highly unusual operations. In response, for this one user, we can immediately force a logout and password reset while also identifying the impacted data. This remediation limits the disruption to the impacted user and gives our analysts a comprehensive view of the breach impact.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;summary&quot;&gt;&lt;strong&gt;Summary&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;thatDot’s categorical analysis delivers the capability to generate high-value, high-confidence alerts, so analysts can quickly find &lt;em&gt;true&lt;&#x2F;em&gt; anomalies, judge them for maliciousness, easily trace the comprehensive impact of an exploit, and act accordingly. By looking beyond just numbers, thatDot filters out the noise and finds anomalies in the richer set of &lt;em&gt;categorical&lt;&#x2F;em&gt; data, including values like usernames, hostnames, file paths, URLs, process names, and more. These benefits mean that a security analyst can focus on high-value, high-likelihood alerts, leading to a 10x increase in productivity. Good security analysts are hard to find, and their limited time and the burnout common in the industry make it crucial to use their time wisely.&lt;&#x2F;p&gt;
&lt;p&gt;If you’d like to see this in action, or learn more from the team that is pioneering categorical data analysis, get in touch with thatDot.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>The Known Security Challenge of the Unknown</title>
        <published>2021-07-07T00:00:00+00:00</published>
        <updated>2021-07-07T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/the-known-security-challenge-of-the-unknown/"/>
        <id>https://www.thatdot.com/blog/the-known-security-challenge-of-the-unknown/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/the-known-security-challenge-of-the-unknown/">&lt;h2 id=&quot;lacking-categorical-data-enterprises-are-vulnerable-to-cyber-attacks&quot;&gt;&lt;strong&gt;Lacking Categorical Data, Enterprises are Vulnerable to Cyber Attacks&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Destructive attack campaigns like WannaCry, NotPetya, or even the Mirai DDoS family succeed because they integrate new techniques or new hardcoded credentials to access and victimize their targets. Once used, though, they rapidly lose much of their sting as they are understood and mitigated. Organizations patch exploitable vulnerabilities, change credentials, and block known-hostile traffic and executables with application firewalls and gateways. Traditional security is most effective when called upon to identify and block what it can understand.&lt;&#x2F;p&gt;
&lt;p&gt;Unfortunately, the profit motive and system complexity breed a seemingly endless stream of new threats. Some are trivially reconstituted versions of older attacks, like malware, and some are ingeniously constructed, like multi-component credential theft and ransomware campaigns. There will always be latency between the arrival of a new threat and the corresponding protection that will recognize and disrupt it.&lt;&#x2F;p&gt;
&lt;p&gt;To address this gap, security practitioners and vendors have long attempted to identify new threats by first learning what good traffic or artifacts look like, then identifying what’s new and applying some type of logic to decide if that new event or artifact is good or bad. This has been done with machine learning-based analysis of executable objects to identify malware and through network-based anomaly detection to find hostile behavioral patterns. Both approaches have struggled with a poor signal-to-noise ratio, requiring additional effort and expertise to distill the real threats from the vagaries of a dynamic environment.&lt;&#x2F;p&gt;
&lt;p&gt;These approaches are often used in tandem to minimize the likelihood of false negatives. Resulting security events can be ascribed to one of two detection techniques: Signature Detection or Anomaly Detection.&lt;&#x2F;p&gt;
&lt;h4 id=&quot;signature-detection&quot;&gt;&lt;strong&gt;Signature Detection&lt;&#x2F;strong&gt;&lt;&#x2F;h4&gt;
&lt;p&gt;When a pattern is available in an object or in a series of activities, signatures can be used to describe and then identify what is known. This creates a low number of false positives, but it puts the user on an unending treadmill of effort to research, identify, and upgrade protection with near-continuous updates.&lt;&#x2F;p&gt;
&lt;h4 id=&quot;anomaly-detection&quot;&gt;&lt;strong&gt;Anomaly Detection&lt;&#x2F;strong&gt;&lt;&#x2F;h4&gt;
&lt;p&gt;For anomaly detection to work, a baseline needs to be established or learned. After this, new behaviors, connections, users, or services will be surfaced as concerns for resolution by security analysts. This will create a low number of false negatives but will generate unmanageable quantities of false positives in any dynamic or user-facing environment. Anomalous events happen all the time, making them poor providers of conclusive data.&lt;&#x2F;p&gt;
&lt;p&gt;So, in short, signatures are too specific and time-delayed while anomalies are too general and time-consuming.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;enter-the-impact-of-novelty-detector&quot;&gt;&lt;strong&gt;Enter the Impact of Novelty Detector&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;When discovering and describing anomalous events, much of the event context is commonplace. It is the combination of elements that makes an event an anomaly. The characteristic of an element that creates an anomalous event is that element’s &lt;strong&gt;novelty&lt;&#x2F;strong&gt;. This more granular attribute of an event can be analyzed to disambiguate the troubling from the simply unusual. As an example, think of an anomaly that is created by an unexpected file access. That operation will be characterized by multiple elements, including the user, user IP address, target file, target file system, time of day, network, geography, filetype, user role, and others, which together form a behavior context. If the access is anomalous because it is a first-time file access (&lt;em&gt;&lt;strong&gt;the filename is novel&lt;&#x2F;strong&gt;&lt;&#x2F;em&gt;), that class of detection will generate a flood of false positives because the operation is new, but the context is quite common.&lt;&#x2F;p&gt;
&lt;p&gt;In contrast, consider a case where the characteristic that makes the event anomalous is &lt;strong&gt;the user&lt;&#x2F;strong&gt;. If multiple events are flagged and a user’s ID is the novel characteristic, this is more concerning. The context of the detection is a pattern of file accesses made unusual because the user doesn’t ordinarily access those files. Add to this a novel geography, time of day, or user&#x2F;role combination, and that novelty suddenly transforms a first-time access into a likely security event.&lt;&#x2F;p&gt;
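&lt;p&gt;The difference between a novel filename and a novel user can be illustrated with a deliberately crude sketch. This is not how Novelty Detector scores observations; it only records which values have been seen for each field and reports the fields whose value is new. The field names and sample events are hypothetical.&lt;&#x2F;p&gt;

```python
from collections import defaultdict

def novel_fields(event, seen):
    """Return the fields whose value was never seen before, then record them."""
    fresh = [field for field, value in event.items() if value not in seen[field]]
    for field, value in event.items():
        seen[field].add(value)
    return fresh

# Hypothetical baseline of ordinary file accesses.
seen = defaultdict(set)
baseline = [
    {"user": "alice", "file": "report.docx", "region": "us-west"},
    {"user": "alice", "file": "budget.xlsx", "region": "us-west"},
]
for event in baseline:
    novel_fields(event, seen)

# A familiar user opens a new file: only the filename is novel.
print(novel_fields({"user": "alice", "file": "notes.txt", "region": "us-west"}, seen))
# prints ['file']

# An unfamiliar user in an unfamiliar region opens a known file.
print(novel_fields({"user": "mallory", "file": "budget.xlsx", "region": "eu-north"}, seen))
# prints ['user', 'region']
```

&lt;p&gt;Even this toy version shows why knowing which element is novel matters: the first result is routine, while the second points at the user and location rather than the file.&lt;&#x2F;p&gt;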
&lt;blockquote&gt;
&lt;p&gt;Novelty data defines a path to evolving anomaly analysis from high volume, low confidence detections to low volume, high confidence events&lt;&#x2F;p&gt;
&lt;&#x2F;blockquote&gt;
&lt;p&gt;Collecting and training on novelty data defines a path to evolving anomaly analysis from high-volume, low-confidence detections to low-volume, high-confidence events. It represents a deeper level of modeling that simulates the type of second-order investigation and clarification typically performed by analysts. The savings in time, alert fatigue, and missed events make novelty analysis a foundational improvement for threat and active attack detection.&lt;&#x2F;p&gt;
&lt;p&gt;To learn more about novelty and its revolutionary impact on threat detection, visit &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;products&#x2F;novelty&#x2F;&quot;&gt;the Novelty Detector&lt;&#x2F;a&gt; overview page.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Find Stolen Credentials Use in AWS CloudTrail Logs using Quine Graph</title>
        <published>2021-04-24T00:00:00+00:00</published>
        <updated>2021-04-24T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/identifying-stolen-credential-use-in-aws-cloudtrail-logs-with-high-confidence-using-categorical-anomaly-detection/"/>
        <id>https://www.thatdot.com/blog/identifying-stolen-credential-use-in-aws-cloudtrail-logs-with-high-confidence-using-categorical-anomaly-detection/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/identifying-stolen-credential-use-in-aws-cloudtrail-logs-with-high-confidence-using-categorical-anomaly-detection/">&lt;h2 id=&quot;using-categorical-data-to-secure-cloud-infrastructure&quot;&gt;&lt;strong&gt;Using Categorical Data to Secure Cloud Infrastructure&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;em&gt;(Note: This is part one of a two-part series. If you find this blog interesting, be sure to check out&lt;&#x2F;em&gt; &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;identifying-data-exfiltration-in-aws-cloudtrail-logs-using-categorical-anomaly-detection&#x2F;&quot;&gt;&lt;em&gt;Stop Data Exfiltration in AWS CloudTrail Logs With Categorical Data&lt;&#x2F;em&gt;&lt;&#x2F;a&gt;&lt;em&gt;.)&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;p&gt;The move to the cloud presents new challenges for enterprise security teams. Systems are more distributed and the impact of credential theft is greater than ever. Running your services on a public cloud provider like AWS requires you to monitor and detect attacks in real time, but how do you do that without drowning in the noise? Existing tools can highlight statistical anomalies, but they are limited to counts and thresholds and have been shown to produce an unacceptable rate of false positives. Novelty Detector processes &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;what-is-categorical-data&#x2F;&quot;&gt;categorical data&lt;&#x2F;a&gt; to uncover real threats and reduce false positives.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;about-aws-cloudtrail&quot;&gt;&lt;strong&gt;About AWS CloudTrail&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;AWS &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.aws.amazon.com&#x2F;cloudtrail&#x2F;index.html&quot;&gt;CloudTrail&lt;&#x2F;a&gt; monitors calls to AWS services and delivers detailed logs, providing a complete audit of management calls, with optional inclusion of data calls. To detect attacks effectively, you will need both, but the resulting high volume of log events creates a glut of data, making it very difficult to detect a behavioral attack like those leveraging stolen credentials. Stolen credentials provide access to sensitive resources, and an attacker will tailor their activities to look like normal usage. In addition, attackers recognize the obfuscating effect of high traffic volume and use the complexity of the logs to hide their activity by moving slowly and carefully. Meeting this challenge requires anomaly detection that understands the shared characteristics of novel behaviors and ignores the simpler sequential and timestamped information that other methods often rely on.&lt;&#x2F;p&gt;
&lt;p&gt;In the example below, we have configured CloudTrail to monitor management, data, and Insights events. We have configured thatDot Novelty Detector to read the “trail” of CloudTrail events as they are written to S3.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;stolen-credential-use-an-example&quot;&gt;&lt;strong&gt;Stolen Credential Use—an Example&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Let’s say that, through a dark web scan, you’ve become aware that many of your staff’s credentials may have been compromised. It might be through a company partner, a service provider, or the breach of a service popular in your industry. To address these newly exposed credentials, a common brute-force remediation is to force a logout and password reset for every employee. This is obviously extremely disruptive and can create unintended interruptions and delays in business functions. By doing this, you have only changed the passwords; you have not detected the source, purpose, or target of a credential theft attack. A motivated adversary may have already installed keylogging or browser-subverting capabilities that will capture any new password and gather other authenticating credentials. If this has happened, then in spite of your efforts, you will still have no idea whether an attack is coming.&lt;&#x2F;p&gt;
&lt;p&gt;In the case described below, the compromise is real and a hacker has begun to probe the AWS account. She finds that she can successfully scan S3 buckets, creates a script to try other high-value services such as Service Discovery and ELB, and runs the script later that evening.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;enter-thatdot&quot;&gt;&lt;strong&gt;Enter: thatDot&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;If your security monitoring system is configured to monitor CloudTrail logs and uses &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;products&#x2F;novelty&#x2F;&quot;&gt;thatDot Novelty Detector&lt;&#x2F;a&gt;, you can detect the attack quickly. Novelty Detector generates an observation for each CloudTrail event, and each observation receives a novelty score that indicates how relevant it is and how much it warrants your attention. Novelty doesn’t immediately trigger a security event, but the system identifies what makes the observation novel, which provides the unique insight needed to speed manual or automated categorization. The real-time plot below shows Novelty Detector learning both system and user behavior: as it comes to see activity as normal, scores fall to form a down-and-to-the-right curve.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2d10aaba2715891bcfcec_Anomaly_Detection-1.png&quot; alt=&quot;thatDot Anomaly Detection Observations&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Looking at the data below, which shows the most recent observations, we see that Novelty Detector detected an observation with a particularly high score. User &lt;em&gt;raul&lt;&#x2F;em&gt; accessed three different AWS resources for the first time, and then three other AWS resources a few minutes later. The activities are novel, but the critical insight is that &lt;em&gt;raul’s execution of these actions is the element that is most novel.&lt;&#x2F;em&gt; Novelty Detector retains the CloudTrail event ID and event time, enabling us to navigate to the actual CloudTrail events and investigate the details. Most importantly, it forwards the observation and score to our security monitoring system for action.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2d10a80a92c529307da65_thatDot_Anomaly_Detector_Observation1.png&quot; alt=&quot;thatDot Anomaly Detector Observation&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Seeing this pattern and the multiple novel events generated by &lt;em&gt;raul,&lt;&#x2F;em&gt; it’s obvious that &lt;em&gt;raul&lt;&#x2F;em&gt;’s credentials are being used in multiple, highly unusual operations. In response, for this one user, we can immediately force a logout and password reset. This remediation limits the disruption to the impacted user. Because we’ve identified the attacked services, we can follow up by reviewing the services’ access logs for other signs of suspicious activity.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;how-is-thatdot-different&quot;&gt;&lt;strong&gt;How is thatDot Different?&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Let’s compare this to watching for credential misuse through AWS &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.aws.amazon.com&#x2F;awscloudtrail&#x2F;latest&#x2F;userguide&#x2F;cloudtrail-concepts.html#cloudtrail-concepts-insights-events&quot;&gt;CloudTrail Insights&lt;&#x2F;a&gt;, which many use to find anomalies in CloudTrail events. CloudTrail Insights is based on traditional anomaly detection techniques, which, in this case, watch for a simple variance in the number of API calls. This method did not highlight the attack, as the attack’s volume of events was too low. It did, however, highlight a number of other events that are clearly false positives.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2d10b04b071790524120a_CloudTrail_Insights_Management_Console.png&quot; alt=&quot;thatDot CloudTrail Insights Management Console&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;As an example, one of the CloudTrail Insights events shows that API calls to create a network interface on EC2 increased over a period of time. But this event is neither new nor novel. An EC2 instance can have multiple network interfaces (up to 15 on some instance types), and each of these can have one or more separate IP addresses. This is necessary for deploying services with multiple SSL certificates, and for many other purposes.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2d10a1781d780a265221b_CloudTrail_Insights_Event_Detail.png&quot; alt=&quot;thatDot CloudTrail Insights Event Detail&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;These types of false positives create two problems for security analysts. First, the additional volume of data forces extra work onto the analysts to recategorize and eliminate the false positives. Second, real events that may be detected are buried within the stream of mixed information, creating the alert fatigue and missed events that characterize overworked security teams. Using thatDot Novelty Detector creates a limited, high-confidence set of events, allowing analysts to identify, work, and resolve the most relevant ones. To find what’s truly novel, you will often need all of the context of an event. In this demonstration case, we’ve configured thatDot Novelty Detector to focus on each user’s use of different operations on all services at various times of day. This context is also crucial for skills-based routing: getting the incident information to the right team, who can follow up with timely verification and remediation.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;alert-on-what-matters&quot;&gt;&lt;strong&gt;Alert on What Matters&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Normally, you don’t watch Novelty Detector in real time. Your security monitoring system has integrations that alert you to the anomalies that you care about. In this example, we set up a PagerDuty integration with a score threshold of &amp;gt;0.99, so we were notified only for this urgent anomaly. For Slack, we set the threshold to &amp;gt;0.95, so we received six other observations of interest that we can tackle as our time permits.&lt;&#x2F;p&gt;
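&lt;p&gt;The threshold-based routing described here can be sketched in a few lines. This is an illustrative sketch only: the function and channel names are hypothetical and do not represent Novelty Detector’s integration API. Only the two thresholds come from the example.&lt;&#x2F;p&gt;

```python
PAGE_THRESHOLD = 0.99  # page someone only for the most urgent anomalies
CHAT_THRESHOLD = 0.95  # lower bar for observations to review as time permits

def route(score):
    """Return the (hypothetical) channel names that should receive this score."""
    channels = []
    if score > CHAT_THRESHOLD:
        channels.append("slack")
    if score > PAGE_THRESHOLD:
        channels.append("pagerduty")
    return channels

for score in (0.42, 0.97, 0.995):
    print(score, route(score))
# prints:
# 0.42 []
# 0.97 ['slack']
# 0.995 ['slack', 'pagerduty']
```

&lt;p&gt;Keeping the paging threshold stricter than the chat threshold means that only the rarest observations interrupt someone, while the rest queue up for review.&lt;&#x2F;p&gt;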
&lt;h2 id=&quot;summary&quot;&gt;&lt;strong&gt;Summary&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;These benefits mean that a security analyst can focus on high-value, high-likelihood alerts, leading to a 10x increase in productivity. Good security analysts are hard to find, and their limited time and the burnout common in the industry make it crucial to use their time wisely.&lt;&#x2F;p&gt;
&lt;p&gt;thatDot’s categorical analysis delivers the capability to generate these high-value, high-confidence alerts, so that analysts can quickly find &lt;em&gt;true&lt;&#x2F;em&gt; anomalies, judge them for maliciousness, and filter out the noise. thatDot finds anomalies in a richer set of data, including categorical values like usernames, hostnames, file paths, URLs, process names, and more, not just numbers. Better tools for examining more kinds of data find more true anomalies and filter out results that are unsurprising numeric outliers. In CloudTrail, other anomaly detection solutions miss the fact that &lt;em&gt;raul&lt;&#x2F;em&gt;’s account was used to try to access several new services because the context, the link between operation requests and a single user, went unexamined. This contextual awareness is the categorical difference.&lt;&#x2F;p&gt;
&lt;p&gt;If you’d like to see this in action, or learn more from the team that is pioneering categorical analysis, get in touch with thatDot.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>What Is Categorical Data? Comparing it to Numerical Data for Analytics</title>
        <published>2021-04-05T00:00:00+00:00</published>
        <updated>2021-04-05T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/what-is-categorical-data/"/>
        <id>https://www.thatdot.com/blog/what-is-categorical-data/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/what-is-categorical-data/">&lt;h2 id=&quot;two-kinds-of-data-categorical-and-numerical&quot;&gt;&lt;strong&gt;Two kinds of Data: Categorical and Numerical&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Data comes in two flavors: Numeric and Categorical. Numeric data is easy: it’s numbers. Categorical data is everything else.&lt;&#x2F;p&gt;
&lt;p&gt;As the name suggests, categorical data is information that comes in categories—which means each instance of it is distinct from the others.&lt;&#x2F;p&gt;
&lt;p&gt;Names are an example of categorical data, and my name is distinct from your name. In the unlikely event that your name is the same as mine, I’m sure our government-issued ID numbers, phone numbers, and email addresses, which are also categorical data, are distinct.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;examples-of-numeric-and-categorical-data&quot;&gt;&lt;strong&gt;Examples of Numeric and Categorical Data&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Numeric&lt;&#x2F;th&gt;&lt;th&gt;Categorical&lt;&#x2F;th&gt;&lt;&#x2F;tr&gt;&lt;&#x2F;thead&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Rate: 27 events&#x2F;second&lt;&#x2F;td&gt;&lt;td&gt;Name: Mary Shelley&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Score: 0.91237&lt;&#x2F;td&gt;&lt;td&gt;IP Address: 192.168.1.100&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Clicks (counts): 2,743&lt;&#x2F;td&gt;&lt;td&gt;File path: C:\Windows\System32\notepad.exe&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Money: $19.79&lt;&#x2F;td&gt;&lt;td&gt;Sentiment: cautiously optimistic&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Temperature: 72° F&lt;&#x2F;td&gt;&lt;td&gt;Address: 10 Downing Street&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Age: 27 years old&lt;&#x2F;td&gt;&lt;td&gt;Zip code: 97214&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Weight: 165 lbs.&lt;&#x2F;td&gt;&lt;td&gt;Email: &lt;strong&gt;info@thatDot.com&lt;&#x2F;strong&gt;&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Distance: 127 miles&lt;&#x2F;td&gt;&lt;td&gt;Flavor: Umami&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Location: 45.5209, -122.6778&lt;&#x2F;td&gt;&lt;td&gt;Location: 421 SW 6th Avenue, Portland, OR&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Color: #1f4c7c&lt;&#x2F;td&gt;&lt;td&gt;Color: blue&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Angle: 91°&lt;&#x2F;td&gt;&lt;td&gt;Angle: obtuse&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Weather: 60% chance of rain&lt;&#x2F;td&gt;&lt;td&gt;Weather: Partly Cloudy&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Time: 1617212687 (Unix time)&lt;&#x2F;td&gt;&lt;td&gt;Time: Wednesday at 10:44 am&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;&#x2F;tbody&gt;&lt;&#x2F;table&gt;
&lt;p&gt;As you can see from the examples at the bottom of this table, some kinds of data can be represented both as numeric data and categorical data. The type of information conveyed is different in each case, but this illustrates that there is often a reasonable relationship or translation between the two.&lt;&#x2F;p&gt;
&lt;p&gt;In fact, many times when a person is trying to use numeric data, they implicitly convert it into categorical data—at least mentally. If I offered to put either $5,000,791 or $5,000,792 into your bank account, you probably wouldn’t spend much time arguing about which deposit should be made.&lt;&#x2F;p&gt;
&lt;p&gt;The amounts are not categorically different. Your brain still says “$5 million.” Their difference doesn’t matter as much as the fact that they are just very big compared to the small $5 you might occasionally find on the sidewalk.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;lh7-us.googleusercontent.com&#x2F;docsz&#x2F;AD_4nXd7ZGuLu7Q4pzQaxCgAmPGocmFFJizZ4qaA4udBJigexCBc7TjjtFF5abeLlVcEJovP8OPpycOcTVW7bbCoqTdttE8auuxN7feTomeibXWEE9m_1pnQUoC5nyOf62TK6UailMQ4d_CmSUcAcHwIXVhb3x1Q?key=odSlTPiJq2XAaQZ5MATNQw&quot; alt=&quot;this dot is not that dot&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Categorical data is often directly interpretable by humans, and often more of a challenge to interpret with computers. Numeric data is produced by measuring, and you can usually divide it (at least conceptually) into smaller parts as much as you want (remember the plan in &lt;strong&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.imdb.com&#x2F;title&#x2F;tt0151804&#x2F;&quot;&gt;Office Space&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; to collect fractions of a penny?). Categorical data is counted or referenced, not measured.&lt;&#x2F;p&gt;
&lt;p&gt;It is often something you can point at or refer to linguistically. Each dot in a plot has a numeric position, but even if two dots have the same position, the dots themselves are distinct because “this dot” is not “that dot.” In short, categorical data is what numeric data is about.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-can-you-do-with-categorical-data&quot;&gt;&lt;strong&gt;What can you do with categorical data?&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;h3 id=&quot;ignore-it&quot;&gt;&lt;strong&gt;Ignore It&lt;&#x2F;strong&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;Unfortunately, the most common strategy is just to ignore the categorical data. Log data often holds a wealth of information about categorical values, but because of its volume and lack of tooling, most of that data sits unused in log archives on the vague hope that, if a human is ever forced to look at this data by some future algorithm, the human will be able to read the categorical information and understand it directly.&lt;&#x2F;p&gt;
&lt;p&gt;While the numerical data is processed by common analysis tools, the categorical data is ignored in the hope that numeric data happens to contain the answers that will be needed in the future.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;count-it&quot;&gt;&lt;strong&gt;Count It&lt;&#x2F;strong&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;If you don’t ignore categorical data, then by far the most common thing to do with it is to count the values. Entire tech stacks—and even entire companies—have been built around counting how many times each categorical value is seen.&lt;&#x2F;p&gt;
&lt;p&gt;It is often very useful to know how frequent some values are. Rare values can be insightful. Common values can help you understand your data better.&lt;&#x2F;p&gt;
&lt;p&gt;The word-count problem has become the de facto “hello, world” example for getting started with stream processing tools, as in this example from &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;spark.apache.org&#x2F;examples.html&quot;&gt;&lt;strong&gt;Apache Spark&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
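&lt;p&gt;For a sense of what counting categorical values looks like at small scale, here is a minimal sketch using Python’s standard library rather than a stream processor. The event names are made up for illustration.&lt;&#x2F;p&gt;

```python
from collections import Counter

# Hypothetical categorical values pulled from a log.
events = ["login", "read", "read", "login", "delete", "read"]

counts = Counter(events)

# Common values help describe the data.
print(counts.most_common(1))  # prints [('read', 3)]

# Rare values are often the insightful ones.
rare = [value for value, n in counts.items() if n == 1]
print(rare)  # prints ['delete']
```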
&lt;h3 id=&quot;turn-it-into-numbers&quot;&gt;&lt;strong&gt;Turn it into Numbers&lt;&#x2F;strong&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;If you try to do something more sophisticated than simple counting, then the next most common approach is to use one of a handful of techniques to try to represent the categorical data as numeric data. While counting is often the domain of data engineers (and much harder and &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;HyperLogLog&quot;&gt;&lt;strong&gt;more interesting than it looks&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt;), data scientists usually try to reach further; &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;One-hot&quot;&gt;&lt;strong&gt;one-hot encoding&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; is the most common technique.&lt;&#x2F;p&gt;
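&lt;p&gt;As a concrete illustration of one-hot encoding, the sketch below gives each distinct category its own position in a vector and marks that position with a 1. The helper name and sample values are hypothetical, and real projects would typically use a library encoder instead.&lt;&#x2F;p&gt;

```python
def one_hot(values):
    """Encode each value as a 0-or-1 vector with one slot per distinct category."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    vectors = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        vectors.append(row)
    return categories, vectors

categories, vectors = one_hot(["blue", "red", "blue", "green"])
print(categories)  # prints ['blue', 'green', 'red']
print(vectors)     # prints [[1, 0, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
```

&lt;p&gt;Note that the vector grows by one slot for every new category, which is one reason this approach strains on high-cardinality values such as usernames or file paths.&lt;&#x2F;p&gt;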
&lt;p&gt;More complex approaches try to “embed” the categorical data into a high-dimensional vector space. If successfully trained, this process will put similar values close to each other, and dissimilar values farther away.&lt;&#x2F;p&gt;
&lt;p&gt;Embedding techniques can accomplish almost miraculous results in some specialized contexts (&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Word2vec&quot;&gt;&lt;strong&gt;Word2Vec&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; is still an astounding result, eight years later!), but these techniques require huge amounts of data, expertly trained, in a batch process ahead of time, so they cannot be used on data previously unseen.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;connect-it&quot;&gt;&lt;strong&gt;Connect It&lt;&#x2F;strong&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;The hidden value of categorical data lies in its potential relationship to other values. Sophisticated embedding techniques can approximate these relationships, but a more natural approach is to represent the relationships directly.&lt;&#x2F;p&gt;
&lt;p&gt;The last 10 years have seen the emergence of graph technologies that do exactly that. A graph is built of nodes and edges; you can picture it with circles for nodes and arrows for edges.&lt;&#x2F;p&gt;
&lt;p&gt;The Node—Edge—Node pattern connects two categorical values (as nodes) by a relationship represented by the edge.&lt;&#x2F;p&gt;
&lt;p&gt;This is a very natural way to represent data because that Node—Edge—Node pattern corresponds perfectly to the Subject—Predicate—Object pattern at the core of natural language.&lt;&#x2F;p&gt;
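&lt;p&gt;One simple way to picture the Node—Edge—Node pattern in code is a list of (subject, predicate, object) triples. The sketch below is illustrative only: the names and values are invented, and it is not how any particular graph engine stores data.&lt;&#x2F;p&gt;

```python
from collections import defaultdict

# Each triple is (subject node, edge label, object node).
triples = [
    ("alice", "works_at", "acme"),
    ("bob", "works_at", "acme"),
    ("alice", "lives_in", "portland"),
]

def build_graph(triples):
    """Index outgoing edges by subject node."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph

def subjects_with(triples, predicate, obj):
    """Find every subject connected to obj by the given edge label."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)

graph = build_graph(triples)
print(graph["alice"])
print(subjects_with(triples, "works_at", "acme"))
```

&lt;p&gt;Because both people point at the same &lt;code&gt;acme&lt;&#x2F;code&gt; node, the relationship between them can be found by following edges rather than by comparing strings across records.&lt;&#x2F;p&gt;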
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;lh7-us.googleusercontent.com&#x2F;docsz&#x2F;AD_4nXcG91sFBNdawE8YkzYx7wGp55D33_bHLxzJea2Tcxt-xkUSmVk3rSsV84BveHMkwsnfBgDAWAB_0wPgt1z6tJetrHdDgLECvJls7__Yg421eqxTcFkEzm1DUCq9_o7pSTssTmLkDSotZMaIDTXKX4v4GfSS?key=odSlTPiJq2XAaQZ5MATNQw&quot; alt=&quot;Streaming Graph Screenshot  thatDot&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;With categorical values represented as nodes in a graph, a wealth of information can be represented or discovered by analyzing the structure of that graph.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Knowledge_graph&quot;&gt;&lt;strong&gt;Knowledge Graphs&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; can concisely represent the domain expertise of large groups, and can lead to new discoveries simply by connecting what we already know.&lt;&#x2F;p&gt;
&lt;p&gt;A connected graph is an ideal representation for flexible, schema-less data structures that are also easy to compute on.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-is-it-hard-to-use-categorical-data&quot;&gt;&lt;strong&gt;Why is it hard to use categorical data?&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;h3 id=&quot;bias-toward-numeric-analysis&quot;&gt;&lt;strong&gt;Bias Toward Numeric Analysis&lt;&#x2F;strong&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;Analysis tools focus almost exclusively on numeric data. Relationships among categorical values can be profoundly useful, but they are difficult to quantify.&lt;&#x2F;p&gt;
&lt;p&gt;Since they are hard to quantify, it’s hard to show that one analysis is obviously better than another. Graph Theory is a powerful discipline in mathematics focused on exactly that issue, but it is usually only taught at the graduate level and to very few practitioners overall.&lt;&#x2F;p&gt;
&lt;p&gt;As a result, most data analysts and data scientists limit their work to the quantitative tools they studied in statistics. The industry as a whole is very limited in the tools available for working with categorical data.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;high-cardinality&quot;&gt;&lt;strong&gt;High Cardinality&lt;&#x2F;strong&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;“Cardinality” refers to the number of possible values that might occur for a particular category. The cardinality of states in the U.S.A. is fifty.&lt;&#x2F;p&gt;
&lt;p&gt;A value with high cardinality is one where an inconveniently large number of different values show up in the data—like all possible street addresses in the U.S.A.&lt;&#x2F;p&gt;
&lt;p&gt;High cardinality becomes a challenge for some of the strategies mentioned above—like counting—because high cardinality requires maintaining a very large number of counters.&lt;&#x2F;p&gt;
&lt;p&gt;When you are interested in the relationship between multiple values with high cardinality, that usually means maintaining separate counters for every possible combination of values. The size and complexity of this approach spirals out of control very quickly with high cardinality data!&lt;&#x2F;p&gt;
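&lt;p&gt;A rough worked example shows how quickly the counter space grows. The cardinalities below are hypothetical round numbers, and the product is a worst-case bound that assumes every combination can occur.&lt;&#x2F;p&gt;

```python
from math import prod

def combination_counters(cardinalities):
    """Worst-case number of counters needed to track every combination."""
    return prod(cardinalities)

# Hypothetical cardinalities: 50 states, 10,000 user IDs, 1,000,000 addresses.
print(combination_counters([50]))
print(combination_counters([50, 10_000]))
print(combination_counters([50, 10_000, 1_000_000]))
```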
&lt;p&gt;A related challenge for working with high-cardinality categorical data is when you don’t know all possible values ahead of time.&lt;&#x2F;p&gt;
&lt;p&gt;You might call this a problem of “increasing cardinality.” Almost all of the tools for turning categorical values into numbers (like one-hot encoding and embedding techniques) require a fixed set of possible values, known in advance. These tools are not able to represent data they have never seen.&lt;&#x2F;p&gt;
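&lt;p&gt;The fixed-vocabulary limitation can be sketched with a hand-rolled one-hot encoder. This is a simplified illustration (the helper name &lt;code&gt;make_one_hot_encoder&lt;&#x2F;code&gt; is invented here), but library encoders face the same choice when a new value arrives: reject it, ignore it, or rebuild the encoding.&lt;&#x2F;p&gt;

```python
def make_one_hot_encoder(vocabulary):
    """Return an encoder bound to a fixed, ordered vocabulary."""
    index = {value: i for i, value in enumerate(vocabulary)}

    def encode(value):
        if value not in index:
            raise KeyError(f"unseen categorical value: {value!r}")
        vector = [0] * len(index)
        vector[index[value]] = 1
        return vector

    return encode

encode = make_one_hot_encoder(["red", "green", "blue"])
print(encode("green"))

# A value that was not in the vocabulary has no position in the vector.
try:
    encode("purple")
except KeyError as err:
    print("cannot encode:", err)
```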
&lt;h2 id=&quot;how-can-we-better-use-categorical-data&quot;&gt;&lt;strong&gt;How can we better use categorical data?&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Depending on the organization, the first step is to stop ignoring categorical data and start making use of it. More than half of the data collected by enterprise companies is never used! It’s collected, stored, and paid for… but never used.&lt;&#x2F;p&gt;
&lt;p&gt;Most of that unused data is categorical and can contain critical information to solve otherwise intractable problems. The tendency to ignore categorical data and instead use numeric data simply because current tools are built for numbers leaves many problems unsolved.&lt;&#x2F;p&gt;
&lt;p&gt;The industry behaves like the drunk person looking for their keys under a lamppost because that’s where the light is.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;our-tools-need-to-evolve&quot;&gt;&lt;strong&gt;Our Tools Need to Evolve&lt;&#x2F;strong&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;The development of graph databases over the last 10 years has been a major step forward in making use of categorical data. Tools like &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;neo4j.com&#x2F;&quot;&gt;&lt;strong&gt;Neo4j&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; have blazed this trail and proven the value of the graph model for getting to value in small categorical datasets.&lt;&#x2F;p&gt;
&lt;p&gt;As the world moves inexorably toward high-volume and real-time stream processing, this powerful graph data model needs to be supported by the next generation of high-volume stream processing tools.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;kafka.apache.org&#x2F;&quot;&gt;&lt;strong&gt;Apache Kafka&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; is an incredibly powerful tool for delivering event streams, and tools like &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;quine.io&#x2F;&quot;&gt;&lt;strong&gt;Quine&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; are being used to join those event streams together and process them through a streaming graph engine to produce a more intelligent real-time stream as output.&lt;&#x2F;p&gt;
&lt;p&gt;Categorical data is also proving to be the long-elusive key to improving anomaly detection in challenging domains like cybersecurity. Our company has developed a &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;products&#x2F;novelty&#x2F;&quot;&gt;&lt;strong&gt;streaming novelty detector for categorical data&lt;&#x2F;strong&gt;&lt;&#x2F;a&gt; which is able to produce real-time novelty scores, assessments, and explanations through behavioral fingerprinting.&lt;&#x2F;p&gt;
&lt;p&gt;This system has been shown to accurately assess the novelty of categorical data in cybersecurity event streams and reduce false positives by 99%.&lt;&#x2F;p&gt;
&lt;p&gt;The next generation of streaming data tools is making categorical data more accessible and usable. This long-neglected and underused class of data is already being collected by virtually all enterprise companies. It’s only now that the tools for using this data to solve challenging problems are becoming available.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>New to Quine&#x27;s Novelty Detector: Visualizations and Enhancements</title>
        <published>2021-02-02T00:00:00+00:00</published>
        <updated>2021-02-02T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/thatdot-anomaly-detector-enhancements-visualizations-and-data-transformations/"/>
        <id>https://www.thatdot.com/blog/thatdot-anomaly-detector-enhancements-visualizations-and-data-transformations/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/thatdot-anomaly-detector-enhancements-visualizations-and-data-transformations/">&lt;h2 id=&quot;adding-capabilities-to-novelty-detector&quot;&gt;Adding Capabilities to Novelty Detector&lt;&#x2F;h2&gt;
&lt;p&gt;Since the launch of thatDot’s real-time Novelty Detector for &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;what-is-categorical-data&#x2F;&quot;&gt;categorical data&lt;&#x2F;a&gt; in November, we have received numerous feature requests for additional data exploration and data transformation capabilities. We are excited to announce the addition of these key functions in the latest release, available now from &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.thatdot.com&#x2F;novelty&#x2F;using-novelty&#x2F;aws-quick-start&#x2F;aws-quickstart.html&quot;&gt;AWS Marketplace&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;data-exploration-capabilities&quot;&gt;Data Exploration Capabilities:&lt;&#x2F;h3&gt;
&lt;p&gt;While the primary output of thatDot Novelty Detector is our Novelty Score API response payload, numerous users shared that they found value in the data exploration tools we use in our demos. This is especially useful when iterating on new use cases or digging into the details of specific anomaly events. To better support these requirements we have added the following to Novelty Detector.&lt;&#x2F;p&gt;
&lt;h4 id=&quot;data-distribution-plots&quot;&gt;&lt;strong&gt;Data Distribution Plots&lt;&#x2F;strong&gt;&lt;&#x2F;h4&gt;
&lt;p&gt;Data Distribution Plots are a sampled view of API responses that provide visual insight into score distribution and rapid identification of your most anomalous observations. Plots combine Sequence, Novelty Scores, Uniqueness Scores, and score distribution and display different ranges of observations, including long term history, recent observations and high-scoring events.&lt;&#x2F;p&gt;
&lt;p&gt;Plots feature significant interactivity, including continuous updates with new observations, drill-down to smaller data populations, and click-through to any single observation. Lastly, the entire Plots page can be rendered for each anomaly context you have configured in thatDot Novelty Detector.&lt;&#x2F;p&gt;
&lt;h4 id=&quot;example-data-distribution-plot&quot;&gt;&lt;strong&gt;Example Data Distribution Plot&lt;&#x2F;strong&gt;&lt;&#x2F;h4&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2cccf9a9230282e43b517_thatDot_Anomaly_Detector_Plots.png&quot; alt=&quot;A scatter plot shows a highly anomalous event exposed by Novelty Detector &quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;h4 id=&quot;observation-detail-visualizations&quot;&gt;&lt;strong&gt;Observation Detail&lt;&#x2F;strong&gt; &lt;strong&gt;Visualizations&lt;&#x2F;strong&gt;&lt;&#x2F;h4&gt;
&lt;p&gt;Observation Detail Visualizations are used to discover “why” an observation has been scored as it was, revealing the root cause of a score. They are accessed by clicking on any data point in your Data Distribution Plots, or by querying for the sequence number of a particular observation directly in the thatDot Discovery UI. Observation details show the relational context of each data element in the observation and a count of the number of times that value has been observed in the context of the data element preceding it. Clicking on any data element allows you to expand the tree to see the range of values observed in the data set.&lt;&#x2F;p&gt;
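&lt;p&gt;To make the idea of counting a value “in the context of the data element preceding it” concrete, here is a small sketch that counts each element of an ordered observation within its prefix. It is an illustration of the general idea only, with invented function names and sample data; it is not the Novelty Detector implementation or its scoring method.&lt;&#x2F;p&gt;

```python
from collections import Counter

def observe(counts, observation):
    """Record one ordered observation; count each value within its prefix."""
    for i in range(1, len(observation) + 1):
        counts[tuple(observation[:i])] += 1

def explain(counts, observation):
    """For each element, how often it was seen after the elements before it."""
    return [
        (observation[i - 1], counts[tuple(observation[:i])])
        for i in range(1, len(observation) + 1)
    ]

counts = Counter()
for obs in [
    ["storage", "alice", "US"],
    ["storage", "alice", "US"],
    ["storage", "bob", "US"],
    ["storage", "alice", "FR"],
]:
    observe(counts, obs)

print(explain(counts, ["storage", "alice", "FR"]))
```

&lt;p&gt;In this toy data the service and user are familiar, but the final element has been seen only once in that context, which is the kind of detail an expanded observation tree makes visible.&lt;&#x2F;p&gt;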
&lt;h4 id=&quot;example-observation-detail-visualization&quot;&gt;&lt;strong&gt;Example Observation Detail Visualization&lt;&#x2F;strong&gt;&lt;&#x2F;h4&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2cccfe77b16f85be1907b_thatDot-Anomaly-Detector-Observation-Details.png&quot; alt=&quot;thatDot Anomaly Detector Observation Details&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;h3 id=&quot;data-transformation-functions&quot;&gt;Data Transformation Functions:&lt;&#x2F;h3&gt;
&lt;p&gt;Quickly transforming or reordering data elements to experiment and refine your anomaly detection efforts was a top request by users. We are excited to share that users may now define data transformations using JavaScript, removing the need for external data preprocessing and allowing rapid iteration of new anomaly detection scenarios.&lt;&#x2F;p&gt;
&lt;p&gt;Available to all users, the data transformation API supports a range of operations that can be applied against both batch and streaming data.&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Decomposing strings&lt;&#x2F;strong&gt; into components, which is particularly useful for decomposing directory paths or user agents&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Concatenation of fields&lt;&#x2F;strong&gt; into a single aggregate value&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Encoding numerical values as strings&lt;&#x2F;strong&gt;, such as converting metrics into good&#x2F;poor&#x2F;bad buckets or turning timestamps into time-of-day descriptions such as morning, mid-day, evening and night&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Data filtering&lt;&#x2F;strong&gt; to remove data not needed for your model&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Data obfuscation&lt;&#x2F;strong&gt; including data hashing or masking&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Data reordering&lt;&#x2F;strong&gt; to assess the impact of different data contexts&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;thatDot’s built-in Data Transformation Functions allow users to rapidly modify their observations, greatly increasing the pace of model testing and iteration.&lt;&#x2F;p&gt;
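&lt;p&gt;The kinds of operations listed above can be sketched as follows. Note the assumptions: Novelty Detector transformations are written in JavaScript, while this sketch uses Python purely to illustrate the logic, and every function name, field name, and bucket boundary here is hypothetical rather than part of the product API.&lt;&#x2F;p&gt;

```python
from datetime import datetime, timezone
import hashlib

def split_path(path):
    """Decompose a directory path into its components."""
    return [part for part in path.split("/") if part]

def time_of_day(timestamp):
    """Encode a Unix timestamp (UTC) as a coarse time-of-day label."""
    hour = datetime.fromtimestamp(timestamp, tz=timezone.utc).hour
    if hour >= 21 or hour in range(0, 5):
        return "night"
    if hour >= 17:
        return "evening"
    if hour >= 11:
        return "mid-day"
    return "morning"

def mask(value):
    """Obfuscate a value with a truncated SHA-256 digest."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]

def transform(record):
    """Build an ordered observation (list of strings) from a raw record."""
    return [
        time_of_day(record["ts"]),
        mask(record["user"]),
        *split_path(record["path"]),
    ]

# Hypothetical raw record; 1612252800 is 2021-02-02 08:00 UTC.
record = {"ts": 1612252800, "user": "alice", "path": "/var/log/app/"}
print(transform(record))
```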
&lt;p&gt;We at thatDot are excited to share these new updates with you and welcome additional feedback and feature requests. As noted above, the latest release of thatDot Novelty Detector for Categorical Data is available now from &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;aws.amazon.com&#x2F;marketplace&#x2F;pp&#x2F;B08L8CPH2P?ref_=srh_res_product_title&quot;&gt;AWS Marketplace&lt;&#x2F;a&gt; or you can &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;contact&#x2F;&quot;&gt;contact thatDot.com&lt;&#x2F;a&gt; for a demo.&lt;&#x2F;p&gt;
&lt;h4 id=&quot;thatdot-novelty-detector&quot;&gt;thatDot Novelty Detector&lt;&#x2F;h4&gt;
&lt;p&gt;thatDot Novelty Detector is the first general-use application designed for finding anomalies in real-time in data sets that include categorical data. Available as an application for deployment in any cloud or data center, thatDot Novelty Detector exposes an API that scores submitted observations for their “novelty,” enabling real-time anomaly detection with fewer false positives than traditional threshold-based metric analysis.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>The World’s First Real-time Novelty Detector For Categorical Data</title>
        <published>2020-12-02T00:00:00+00:00</published>
        <updated>2020-12-02T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/the-worlds-first-real-time-novelty-detector-for-categorical-data/"/>
        <id>https://www.thatdot.com/blog/the-worlds-first-real-time-novelty-detector-for-categorical-data/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/the-worlds-first-real-time-novelty-detector-for-categorical-data/">&lt;h2 id=&quot;an-industry-first-novelty-detection-for-categorical-data-in-real-time&quot;&gt;An Industry First: Novelty Detection for Categorical Data in Real-Time&lt;&#x2F;h2&gt;
&lt;p&gt;thatDot is excited to share the general availability of the world’s first system for real-time categorical anomaly detection. Data Engineers, Developers, and Data Scientists can now generate a real-time score showing the novelty of any &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;what-is-categorical-data&#x2F;&quot;&gt;categorical data&lt;&#x2F;a&gt;—greatly extending the science of anomaly detection beyond the mainstay of time-series numerical analysis, and opening up zettabytes of data for new insights.&lt;&#x2F;p&gt;
&lt;p&gt;Most of the big data collected globally is not numerical data; so traditional tools don’t apply. File names, email addresses, postal addresses, demographic groupings, IP addresses, given names, and other identifiers are all examples of categorical data that cannot be natively processed by existing anomaly detection technology. thatDot’s Novelty Detector is expanding the frontier of what data can be analyzed in real-time for anomalous signals.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;turn-high-volume-data-into-high-value-data&quot;&gt;Turn High-Volume Data into High-Value Data&lt;&#x2F;h2&gt;
&lt;p&gt;“Big data comes with a curse: most of it is useless, but you can’t tell which data is valuable and which is mundane. We’re changing that. Early users of thatDot’s Novelty Detector have quickly made critical discoveries and found new value in both existing datasets and incoming streams of new data. Unlocking the insight in real-time data streams is helping our customers accelerate product development and operational issue analysis, benefiting both the revenue and cost sides of their business,” said Ryan Wright, Founder and CEO of thatDot, Inc.&lt;&#x2F;p&gt;
&lt;p&gt;Built upon Quine’s stateful streaming data engine, thatDot’s Novelty Detector easily scales the dynamic graphical models used to find true anomalies and explains why they stand out. The broad contextual information available to our graph processing engine dramatically reduces false positives so users don’t waste time and resources on unneeded verification and issue resolution.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;use-case-real-time-access-log-fingerprinting&quot;&gt;Use Case: Real-time Access Log Fingerprinting&lt;&#x2F;h2&gt;
&lt;p&gt;Cloud services are powerful and ubiquitous, but each service is used differently, by different users in different places, and for different reasons. How can you tell if one of those users’ credentials is compromised? You’d need a system to “learn” what is normal for each service, for each user, in each location. Creating that training data would be nearly impossible. thatDot Novelty Detector trains itself (no training data required!) and immediately flags the compromised usage in real-time. By using the values from log data: [Service name, REST API endpoint, User ID, Country, City] and any other relevant information like time-of-day or specific service information, anomalous access patterns become immediately apparent. And with the context-aware explanations, thatDot Novelty Detector will tell you what was so unusual about that anomaly.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;thatdot-categorical-anomaly-detection&quot;&gt;thatDot Categorical Anomaly Detection&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;628e589b623bf9240f189141_Novelty%20First%20Categorical%20Anomaly.png&quot; alt=&quot;Varieties of categorical data.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Categorical data provides insight into user behavior, system flows, and config changes all in real time.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;simple-to-use&quot;&gt;Simple to Use&lt;&#x2F;h2&gt;
&lt;p&gt;thatDot Novelty Detector is available as a container for rapid deployment in common container management platforms such as Kubernetes or AWS Elastic Container Service (ECS). Turn it on and thatDot Novelty Detector offers a simple REST API to ingest data—as a continual stream, or as a batch—and return a novelty score for each data observation. Together with the summary score, a valuable set of additional information explaining why that observation is anomalous is delivered with every data observation.&lt;&#x2F;p&gt;
&lt;p&gt;No tuning, training, or setting hyper-parameters is needed with thatDot Novelty Detector. It is ready to run immediately, allowing rapid use all the way from research projects to large scale industrial applications.&lt;&#x2F;p&gt;
&lt;p&gt;In support of data exploration and presentation, a graphical visualization of the observed data is provided via the included user interface.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;free-to-try&quot;&gt;Free to Try&lt;&#x2F;h2&gt;
&lt;p&gt;thatDot Novelty Detector is available now on the &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.thatdot.com&#x2F;novelty&#x2F;using-novelty&#x2F;aws-quick-start&#x2F;aws-quickstart.html&quot;&gt;AWS Marketplace&lt;&#x2F;a&gt; or you can contact thatDot.com for information about custom deployments. All users receive a free tier of usage on either platform and discounted annual commitment pricing is available for high volume use.&lt;&#x2F;p&gt;
&lt;p&gt;Media interest, please contact&lt;&#x2F;p&gt;
&lt;p&gt;Robert Malnati&lt;&#x2F;p&gt;
&lt;p&gt;thatDot, Inc.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a href=&quot;mailto:info@thatdot.com&quot;&gt;info@thatdot.com&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Draw Connections to Build Insights</title>
        <published>2020-09-02T00:00:00+00:00</published>
        <updated>2020-09-02T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/draw-connections-to-build-insights/"/>
        <id>https://www.thatdot.com/blog/draw-connections-to-build-insights/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/draw-connections-to-build-insights/">&lt;p&gt;In &lt;a href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;blog&#x2F;3d-data&#x2F;&quot;&gt;the first post in this series&lt;&#x2F;a&gt;, we introduced the term “3D Data” as a mnemonic and a way to think about streaming data processing that incrementally builds toward human-level data questions with the power for deep contextual explanations (e.g. “Is my system running well?”). In this post, we dive deeper into the first “D”: &lt;em&gt;Draw Connections&lt;&#x2F;em&gt;, to further explore the benefits for data analysis and the groundbreaking result for streaming computation.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;connections-for-data-structure&quot;&gt;Connections for Data Structure&lt;&#x2F;h2&gt;
&lt;p&gt;Data is related to other data, and there is value in drawing those connections explicitly; that’s the premise. In practice, this means using edges in a graph structure to encode some of the data[1]. A Node-Edge-Node pattern in a graph corresponds to a Subject-Predicate-Object pattern in natural language. So when we talk about creating edges between nodes, it’s best to think of that edge as a predicate.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2ca6930c433e58e474ce3_Screen-Shot-2020-09-02-at-10.10.06-AM-1024x510.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Data used for edges usually comes from two sources: 1.) values given in the data itself, and 2.) tacit knowledge about what the data means.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;data-values-as-connections&quot;&gt;Data values as connections&lt;&#x2F;h2&gt;
&lt;p&gt;Let’s assume JSON objects are the input data format for a stream processing system. Every new object is a new piece of data to be considered by the system. If a NoSQL document store is used, then you can simply save the object as a new item in your document store. However, if you do that, you will need to traverse the object to understand how its values are related to other objects in the store[2].&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2ca68390c50f0306d18f1_Screen-Shot-2020-09-02-at-11.15.41-AM-1024x354.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;An alternative data representation would treat the JSON object as a node in a graph. The key&#x2F;value pairs in the JSON object become properties on a node. But with a graph structure available, we have a new data modeling choice: an option to take a property from the JSON data and encode it as an edge to another node. The JSON key is used to create the edge label and the value is stored in the node on the other end of the edge[3]. The choice of which properties to pull out into separate nodes is a data modeling choice. So is the decision to use a single node for all occurrences of the value (i.e. from multiple different JSON objects) vs. a separate node for each occurrence of the same value.&lt;&#x2F;p&gt;
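&lt;p&gt;A minimal sketch of this modeling choice, using plain Python dictionaries as a stand-in for a graph: keys named in &lt;code&gt;edge_keys&lt;&#x2F;code&gt; are pulled out as edges to a single shared node per value, and all other keys stay as node properties. The function name, the ID scheme, and the sample events are assumptions for illustration, not Quine syntax.&lt;&#x2F;p&gt;

```python
import json

def ingest(graph, node_id, obj, edge_keys):
    """Add one JSON object to the graph.

    Keys listed in edge_keys become edges to shared value nodes;
    every other key stays a property on the object's own node.
    """
    node = graph["nodes"].setdefault(node_id, {})
    for key, value in obj.items():
        if key in edge_keys:
            target = f"{key}:{value}"  # one shared node per distinct value
            graph["nodes"].setdefault(target, {"value": value})
            graph["edges"].append((node_id, key, target))
        else:
            node[key] = value

graph = {"nodes": {}, "edges": []}
events = [
    '{"id": "e1", "user": "alice", "bytes": 512}',
    '{"id": "e2", "user": "alice", "bytes": 2048}',
]
for line in events:
    obj = json.loads(line)
    ingest(graph, obj["id"], obj, edge_keys={"user"})

print(graph["edges"])
```

&lt;p&gt;Both events end up connected to the same &lt;code&gt;user:alice&lt;&#x2F;code&gt; node. Choosing a separate node per occurrence instead would only require making the target ID unique per event.&lt;&#x2F;p&gt;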
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2ca6837642478d01151eb_Screen-Shot-2020-09-02-at-11.15.59-AM-1024x361.png&quot; alt=&quot;&quot; &#x2F;&gt;
&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2ca685d2c8d65757b79ea_Screen-Shot-2020-09-02-at-11.16.10-AM-1024x224.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;This method becomes especially interesting and powerful when applying it to nested JSON objects. Instead of each nested object being treated opaquely as a blob, the inner JSON object is the definition for a new collection of properties on a different node, with the edge labelled with the nested object’s key. This process can occur recursively as much as needed.&lt;&#x2F;p&gt;
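&lt;p&gt;The recursive step can be sketched in a few lines. This is an illustrative sketch with invented names and integer node IDs; it handles nested objects only and leaves arrays and ID strategy aside.&lt;&#x2F;p&gt;

```python
from itertools import count

def ingest_nested(obj, nodes, edges, ids):
    """Turn a nested dict into nodes and labelled edges; return the node ID."""
    node_id = next(ids)
    props = {}
    nodes[node_id] = props
    for key, value in obj.items():
        if isinstance(value, dict):
            # A nested object becomes its own node, linked by the key.
            child_id = ingest_nested(value, nodes, edges, ids)
            edges.append((node_id, key, child_id))
        else:
            props[key] = value
    return node_id

nodes, edges = {}, []
event = {
    "type": "login",
    "user": {"name": "alice", "location": {"country": "US", "city": "Portland"}},
}
root = ingest_nested(event, nodes, edges, count())
print(nodes)
print(edges)
```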
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2ca689f5b0568ed905005_Screen-Shot-2020-09-02-at-11.24.42-AM-1024x544.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;h2 id=&quot;tacit-knowledge-as-connections&quot;&gt;Tacit knowledge as connections&lt;&#x2F;h2&gt;
&lt;p&gt;In addition to the obvious data being the source of connections between data elements, we have often found it useful to reify other assumed knowledge into specific nodes in the graph and connect them to other data. A trivial example might be to create nodes to represent the buckets of time that are relevant to the problem (days of the week, morning&#x2F;afternoon&#x2F;evening&#x2F;night, every second, etc.), and then connect the associated data with those time buckets.&lt;&#x2F;p&gt;
&lt;p&gt;Creating a node to represent an item of tacit knowledge provides a location to store other data for relevant conclusions. Drawing conclusions from that data will be discussed more in the next post in this 3D Data series.&lt;&#x2F;p&gt;
&lt;p&gt;There are many kinds of tacit knowledge, and many kinds of data we often use but don’t literally represent. Choosing insightful examples is very application-specific, but considering the kinds of answers for which the data is being used is often illustrative. In our experience, there are often intermediate objects like “user” or “session” which are the subject of qualitative questions (e.g. “Did all users have a good experience?”) that are easily overlooked when considering data representations. These intermediate objects are easy to overlook because they aren’t the direct subject of any single piece of data but instead are the object or concept that is behind either the data or the questions. Reifying these objects makes them available for computation, as a place to store intermediate answers, and core components in meaningful explanations.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;connections-for-computation&quot;&gt;Connections for Computation&lt;&#x2F;h2&gt;
&lt;p&gt;The data modeling techniques described above are often enlightening and useful. They can be applied across a broad collection of technologies. Relational databases or document stores can simulate graphs, with some extra computation. Joining two tables in a relational database is akin to traversing across the edges in a graph. This works well for batch operations; but we built &lt;em&gt;thatDot Quine&lt;&#x2F;em&gt; specifically to operate on streaming data. So instead of using extra computation to simulate a graph, we turn this on its head and use less computation over a graph to get iterative results we can produce in real-time.&lt;&#x2F;p&gt;
&lt;p&gt;The result of iterative processing on a native graph means that we can make use of a technique called “semantic caching.” Semantic caching is a technique that uses the structure of the data to inform how computation should be performed. While this topic deserves its own separate discussion, we leave a mention here as a pointer to the deeper computational motivation for drawing connections in data. For those who can’t wait, we touch on this topic and other related concepts in the technology overview section, and our solutions team is always ready to discuss applications to your problem space.&lt;&#x2F;p&gt;
&lt;p&gt;Both for data modeling and stream processing, the first step for realizing 3D Data is the same: Draw connections between data. You already have the data. Pulling out edges from that data and encoding other aspects you already know is a brilliant way to get started building powerful real-time answers with context to human-level questions.&lt;&#x2F;p&gt;
&lt;p&gt;[1] We are assuming a property-graph model where nodes in the graph are distinguished with IDs and contain a set of key-value pairs called “properties”, and edges connect exactly two nodes and have an edge label with an optional direction. Variations on the property-graph model are in wide use in differing contexts. Sometimes edges themselves are allowed to have properties as well as nodes, but in our model, we do not assume that is always true.&lt;&#x2F;p&gt;
&lt;p&gt;[2] Most object stores will index some values in these objects so they can be found more easily. This is an important step, but does not change the structure of the data stored. The need to carefully choose what to index becomes a critical consideration.&lt;&#x2F;p&gt;
&lt;p&gt;[3] This is very similar to how the W3C RDF spec represents what are often stored as node properties. &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.w3.org&#x2F;TR&#x2F;rdf-schema&#x2F;#ch_properties&quot;&gt;https:&#x2F;&#x2F;www.w3.org&#x2F;TR&#x2F;rdf-schema&#x2F;#ch_properties&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>The Three D’s of Graph Data</title>
        <published>2020-07-06T00:00:00+00:00</published>
        <updated>2020-07-06T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/3d-data/"/>
        <id>https://www.thatdot.com/blog/3d-data/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/3d-data/">&lt;h2 id=&quot;putting-data-in-perspective-with-streaming-graph&quot;&gt;&lt;strong&gt;Putting Data In Perspective with Streaming Graph&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Africa is bigger than Greenland. A LOT bigger! You already knew that, but you also know that the world is round. When you try to show a globe on a flat map, you have to distort the original shape of some parts of the globe. Depending on where you start, different parts of the map will be distorted to different degrees. This is why a Mercator Projection of the Earth shows Africa and Greenland appearing about the same size. If you live in Europe or the United States, your region probably seems about right. If you only need to focus on a small area, flat maps are very useful! However, when you need a global perspective, they are impossibly broken.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;&#x2F;img&#x2F;2020&#x2F;07&#x2F;3D-image.png&quot; alt=&quot;Mercator Projection, Courtesy of Daniel R. Strebe&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Mercator Projection, Courtesy of Daniel R. Strebe&lt;&#x2F;p&gt;
&lt;p&gt;We face the same problem when using data to understand complex systems. Data comes from one source, one vendor, or one engineer’s guess about what needs to be seen and is convenient to collect. When it works well, it gives us something like a flat map that is helpful to understand a small area that is part of a complex system. If you try to use that perspective to understand the system as a whole, the end result is grossly distorted!&lt;&#x2F;p&gt;
&lt;p&gt;This problem emerges in virtually all kinds of complex systems. CDN logs can help shed light on real-time video delivery across the internet, but dramatically obscure the quality of experience each of a million viewers is having. Monitoring file access logs can help explain security events in a live computer system, but reveals little about injected shellcode. Counting Twitter followers might show you who the “influencers” are, but tells you little about the swelling revolution.&lt;&#x2F;p&gt;
&lt;p&gt;The tendency of data engineers and analysts is to convert the complexity of these data sources into many “metrics” that show counts over time for dozens of possible measurements. This converts each digital signal into one compound analog chart, and the hope is that important events will result in a spike or a dip in one of those charts… and that the analyst will be able to explain what the spike &lt;em&gt;means&lt;&#x2F;em&gt; when it recurs. This approach is cumbersome and error-prone, and it requires constant vigilance by experts. But there is a better way.&lt;&#x2F;p&gt;
&lt;p&gt;Understanding a complex system from its data requires leaving the simplified world of flat data metrics and building a 3D model of the system’s data: use data to build a globe instead of a map. This three-dimensional model will allow you to see the system from every angle and correctly view the whole world. We have distilled this process down into three main principles for building 3D data.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;1-draw-connections-between-data&quot;&gt;&lt;strong&gt;1. Draw connections between data.&lt;&#x2F;strong&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;Data &lt;em&gt;should&lt;&#x2F;em&gt; come from many different sources. Pull it into the same system so that you can connect elements in one set to the others. You will not have all possible data at the beginning, so you cannot know what the ideal data schema is. Accept that and use a system which will let you draw connections between data items over time and create new associations that weren’t obvious at the beginning. There are many ways to connect the dots. Allow for them all to be represented in the data—you won’t always know which are most valuable until the end.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;2-define-new-levels-of-data&quot;&gt;&lt;strong&gt;2. Define new levels of data.&lt;&#x2F;strong&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;Deliberately build new data elements. Build them out of other data elements. Decide what a pattern of events &lt;em&gt;means&lt;&#x2F;em&gt;, and create a new data element to represent that meaningful object. Be sure to connect the new objects to the data they were produced from, as well as to any other data that offers a potentially useful connection. Arrange any items which have a useful order; record every order that could be useful. Build beyond a single level: after building a new object out of many lower-level details, do it again and consider how those objects can be combined to make an even more meaningful object. Building up at each level will result in fewer data elements at that level, but each of them will be more meaningful and closer to the &lt;em&gt;actual&lt;&#x2F;em&gt; questions you want to answer. Building up from low-level data into increasingly abstract data levels is what produces the 3D data needed to answer complex questions.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;3-drill-down-to-answer-questions&quot;&gt;&lt;strong&gt;3. Drill down to answer questions.&lt;&#x2F;strong&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;3D data &lt;em&gt;directly&lt;&#x2F;em&gt; answers human-level questions at its highest points. Literally. There is a piece of data, built from other data, that says “yes” or “no” to a question like: “Is this video stream high quality?” If the answer is “no,” drill down and look at the data used to build this last level to start explaining why. If you need more detail, keep drilling down to lower levels of data. Answering new questions is a matter of defining new data combinations that build toward literally providing that answer. There isn’t a privileged perspective or one single “right” way to look at the data. Build every useful abstraction toward every question you want to answer.&lt;&#x2F;p&gt;
&lt;p&gt;This view of 3D Data builds upon current practices and is also revolutionary in its power! We believe this so strongly that we’ve built a streaming data processing system to do 3D Data processing in real-time on a massive scale. At thatDot, our mission is to turn high-volume data into high-value data. This framework for 3D Data processing is how we do it.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Defining Video Observability</title>
        <published>2020-06-16T00:00:00+00:00</published>
        <updated>2020-06-16T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/blog/defining-video-observability/"/>
        <id>https://www.thatdot.com/blog/defining-video-observability/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/blog/defining-video-observability/">&lt;h2 id=&quot;streaming-graph-means-real-time-root-cause-analysis&quot;&gt;Streaming Graph Means Real-time Root Cause Analysis&lt;&#x2F;h2&gt;
&lt;p&gt;Imagine if the next time your video streaming operations dashboard-of-choice warns you that 100 users experienced video start failures in the last minute, it took only one click to see each of those sessions and learn that they all trace back to a corrupted file for a single bit rate of an iOS-specific encoding of an asset on one of your CDNs. Better yet, your platform triggers a workflow to have this specific bit rate version of the asset re-encoded and uploaded to the CDN.&lt;&#x2F;p&gt;
&lt;p&gt;Video observability can make this a reality.&lt;&#x2F;p&gt;
&lt;p&gt;We enjoyed Mux’s recent blog, “&lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;mux.com&#x2F;blog&#x2F;what-is-video-observability&#x2F;&quot;&gt;What Is Video Observability&lt;&#x2F;a&gt;”, and wanted to expand on Steve Lyons’s discussion. General observability solutions such as Splunk and Datadog ingest and aggregate CDN and origin logs as well as metrics from video specialists like &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;mux.com&#x2F;data&#x2F;&quot;&gt;Mux Data&lt;&#x2F;a&gt; to provide dashboards that help support video services. Operations teams overlay charts in these dashboards in their search for context, and then, based on intuition, dive into individual systems to explore granular event data to debug issues and answer cause-and-effect questions.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2c64a1c35a12206c36048_videoobservability-before.png&quot; alt=&quot;A flow chart of logs and how they are processed for observability.&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;h4 id=&quot;traditional-monitoring-based-video-observability&quot;&gt;Traditional Monitoring-Based Video Observability&lt;&#x2F;h4&gt;
&lt;p&gt;This aggregated monitoring scales well, but it shortchanges our ability to understand cause and effect, complicates debugging, and forces our understanding of the system into functional components instead of a natural, logical, human view. What if, instead, we preserved all the granular data while also building abstracted views of our end-to-end platform to enable a new paradigm in acting on our data? To accommodate this new model, we need to keep the raw event data but also transform it into sessions and then into new composite metrics that span multiple elements of our platform for more holistic analysis and alerting. To achieve this definition of video observability, we need to rethink our data pipeline as follows.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2c64b5abc8fc092e677d5_VideoObservabilityElements-900.png&quot; alt=&quot;Video Observability Timeline&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;h4 id=&quot;video-observability-solution-elements&quot;&gt;Video Observability Solution Elements&lt;&#x2F;h4&gt;
&lt;p&gt;Video streams are what our audiences are experiencing and paying for, so we should assess our platform’s performance in terms that relate to their experience. Once sessionized, our observability provides an inter-related view of the streaming platform as a single system, allowing us to see the cause and effect between packagers, CDNs, asset IDs, players, etc. This holistic view allows us to ask questions about the platform in more natural terms, independent of the components.&lt;&#x2F;p&gt;
&lt;p&gt;– “What is the root cause of this rebuffering event?”&lt;br &#x2F;&gt;
– “Is one CDN providing a better user experience than another?”&lt;br &#x2F;&gt;
– “What part of the streaming platform is causing latency in our video streams?”&lt;br &#x2F;&gt;
– “Are we delivering good quality video to users of the new Android OS?”&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;uploads-ssl.webflow.com&#x2F;61f0aecf55af2565526f6a95&#x2F;61f2c64bb716d08954fbcc2b_videoobservability-thatDot.png&quot; alt=&quot;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;h4 id=&quot;granular-and-sessionized-video-observability&quot;&gt;Granular and Sessionized Video Observability&lt;&#x2F;h4&gt;
&lt;p&gt;Video observability enables us to see the abstract system experience, as seen by our audience, while preserving the granular details we need for rapid discovery of the root causes of issues in our platform. When we have this high-confidence view of how system elements relate, we can streamline work processes and implement automation. Video observability brings opportunities to deliver better customer service by reducing MTTR while also freeing our operations and engineering staff from time spent in “war rooms” inferring root causes.&lt;&#x2F;p&gt;
&lt;p&gt;Adopting this broader video observability definition supports our entire business. Developers can leverage this same insight to understand the impact of changes to individual components on the entirety of the system. Architects can use granular data to perform historical analysis without having to rebuild models from logs. Product managers can directly see the QoE benefit of new investments or lower-cost infrastructure substitutions. In all cases, understanding the relationships between the logs, events, and metrics in terms that relate to the services we deliver enables such insights so we can take well-informed action.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Introducing thatDot</title>
        <published>2020-05-26T00:00:00+00:00</published>
        <updated>2020-05-26T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/news/introducing-thatdot/"/>
        <id>https://www.thatdot.com/news/introducing-thatdot/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/news/introducing-thatdot/">&lt;p&gt;That word… ”that.” It’s how you point with a word. What better way to express what you mean than by directly pointing? “What you mean” is the whole point. What you mean is where you spend your time—that’s why we’re here.&lt;&#x2F;p&gt;
&lt;p&gt;We started thatDot after decades of building streaming data systems to extract value from data. Over and over again, the experience has taught us: there needs to be something better than building bespoke backend systems every time. —a &lt;em&gt;product&lt;&#x2F;em&gt; to make getting value from data at scale as easy as “that.” Data arrives a piece at a time. What we’re looking for is never so simple as just one item among many. The pieces we need are buried in a sea of data we don’t. We need to put the data together and see how it all relates to find the meaningful piece. The world is swimming in data, but to piece it together into that one most meaningful event… well that’s everything.&lt;&#x2F;p&gt;
&lt;p&gt;thatDot’s mission is to produce meaning from data. Meaningful data is valuable data. —to turn high-volume data into high-value data.&lt;&#x2F;p&gt;
&lt;p&gt;For the last six years, our team has been working to create a fundamentally new technology aimed at getting value from data easily at large scale. With major funding from DARPA, we have been able to approach these fundamental problems from a fresh perspective. The result is a new technology built from the ground up to solve the challenges of modern data processing in the enterprise. A new perspective has brought new capabilities which enable radical new results.&lt;&#x2F;p&gt;
&lt;p&gt;Our technology unifies the storage and computational models for high-volume streaming enterprise data. As a result, each piece of data can perform its own computation—much like the neural network in the brain both stores information and relays signals used for complex reasoning. Combined with a powerful representation of time from end-to-end, thatDot users trigger new actions from the data itself.&lt;&#x2F;p&gt;
&lt;p&gt;We’re working with some of the world’s most remarkable companies to bring insight and understanding to inhuman amounts of data. The time it takes to conceive and build a new data pipeline, to test a data hypothesis, or to add a product feature is often measured in months. thatDot is turning this into hours or minutes. And the question that once waited on overnight batch processing, so that today’s question was answered tomorrow, can now be answered in less time than it takes to click the checkout button.&lt;&#x2F;p&gt;
&lt;p&gt;The last few decades have seen nearly every company become a software company, and every software company has become a data company. Understanding that data well enough and fast enough to take action currently takes a small army of highly skilled software engineers. The process is complicated and error-prone, leaving many desired capabilities nothing but a distant dream. At thatDot, we believe a new product in this space can liberate engineering teams from crippling complexity, deliver previously unachievable capabilities, and bring enterprise companies back to building their businesses.&lt;&#x2F;p&gt;
&lt;p&gt;Ryan Wright&lt;br &#x2F;&gt;
Founder &amp;amp; CEO&lt;br &#x2F;&gt;
thatDot, Inc.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>thatDot Raises Funding To End Microservices Complexity</title>
        <published>2020-05-26T00:00:00+00:00</published>
        <updated>2020-05-26T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Unknown
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://www.thatdot.com/news/thatdot-raises-funding-to-end-microservices-complexity/"/>
        <id>https://www.thatdot.com/news/thatdot-raises-funding-to-end-microservices-complexity/</id>
        
        <content type="html" xml:base="https://www.thatdot.com/news/thatdot-raises-funding-to-end-microservices-complexity/">&lt;p&gt;May 26, 2020, Portland, OR. thatDot announces a $2M funding round led by Oregon Venture Fund (OVF), with participation by Hale Capital Partners and Galois, Inc. Leveraging years of DARPA-funded R&amp;amp;D, thatDot is the creator of &lt;em&gt;thatDot&lt;&#x2F;em&gt;, an enterprise software solution for the real-time discovery and navigation of data relationships in highly connected data, such as monitoring, security, and commerce event streams. &lt;em&gt;thatDot&lt;&#x2F;em&gt; can ingest billions of events while building a rich relationship graph, identifying correlations, isolating anomalies, and triggering workflows to unlock the value of big data in real time. thatDot powers use cases such as real-time root cause analysis, online video observability, streaming anomaly detection, data lineage, fraud detection, and application security tracing, reporting, and alerting.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Solving for Microservice Complexity&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;As a foundational element of an event driven software architecture, &lt;em&gt;thatDot&lt;&#x2F;em&gt; accelerates new service development and improves user experience for enterprise organizations by unlocking the value of their big data in real time. “Big data applications are predominately built on microservices architectures that require scarce technical talent and significant operational overhead. It’s been the only way to build highly scalable applications and services—until now. thatDot’s simplified event stream processing and analysis capabilities are a revolutionary step forward—it’s how big data applications will be built in the future,” said Nick Wade of Oregon Venture Fund.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Revolutionary Technology&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;“It is exciting to see this revolutionary technology, developed to satisfy the forward-looking requirements from DARPA, reach the market,” said Rob Wiltbank, CEO at Galois. “thatDot unlocks the value of streaming big data, combining high volume capabilities with automated intelligent data analysis, it will dramatically change back-end software development.”&lt;&#x2F;p&gt;
&lt;p&gt;thatDot’s distributed stream-processing fabric ingests and stores event data and combines it with a semantic graph layer to accelerate complex queries over large amounts of data spread across broad time spans. This unique combination of technology enables several critical capabilities:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Real-time pattern recognition in streaming data&lt;&#x2F;li&gt;
&lt;li&gt;Real-time anomaly detection in streaming categorical data&lt;&#x2F;li&gt;
&lt;li&gt;Complete change tracking for every update made to data, with easy historical queries&lt;&#x2F;li&gt;
&lt;li&gt;Unified “Standing Queries” that instantly trigger custom actions on complex stream patterns&lt;&#x2F;li&gt;
&lt;li&gt;Low-code usability&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;&lt;strong&gt;Repeat Entrepreneurs&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;thatDot is led by repeat technology entrepreneurs from companies such as Citrix, Urban Airship, Motorola, Janrain and Cedexis. “Raising this foundational round of capital during Covid-19 is a strong endorsement of thatDot’s value proposition and market momentum,” said Ryan Wright, Founder and CEO of thatDot. “The world is realizing that microservices are too complex and expensive to manage and orchestrate, and thatDot offers a compelling new way to program back-end operations directly from event data streams.”&lt;&#x2F;p&gt;
&lt;p&gt;For more information, please visit &lt;a rel=&quot;noopener external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.thatdot.com&#x2F;&quot;&gt;www.thatdot.com&lt;&#x2F;a&gt; to explore our solutions or sign up for periodic updates.&lt;&#x2F;p&gt;
</content>
        
    </entry>
</feed>
