The Marriage of Fast and Big Data

John Piekos, VP-Engineering, VoltDB

Introduction

We live in a world of “smart everything,” with smart phones, smart devices and the emergence of the Internet of Things (IoT). We’re always on and always connected. Both humans and machines are creating new data at an alarming rate, with the world’s data roughly doubling every two years. Every interaction we have with these applications and services is captured, logged, and often analyzed after the fact. In fact, your company most likely has a Big Data initiative, collecting historical data in Hadoop and mining strategic insights from historical patterns and behavior.

This abundance of new data presents an enormous opportunity for businesses. The opportunity isn’t simply capturing this data and dumping it into a Hadoop data lake for historical analysis. Rather, it lies in ingesting, interpreting and responding to ‘fast data’ the moment it enters your enterprise. The ability to handle fast data creates a new breed of applications: real-time enterprise applications. Building these fast data applications is the business world’s next big challenge and, equally, its next big opportunity.

In this article we’ll examine the emerging enterprise application architecture needed to support the development of real-time applications that rely on both fast data and Big Data workflows.

The Marriage of fast data and Big Data – and what it means for CIOs   

The convergence of fast data with Big Data can be described simply as applying intelligence to data the moment it is created in order to generate business value. We call these applications fast data applications. Fast data applications give enterprises and CIOs the ability to perform real-time monitoring and alerting, understanding and reacting to what is going on “right now.” They also enable per-event or per-message personalization, often applying historical intelligence to each incoming event to calculate a smart response. What are some examples of emerging fast data applications?

Consider these:

Mobile – In just over 40 years we’ve gone from rotary telephones to mobile smart phones and tablets. Mobile billing and policy applications process hundreds of thousands of requests per second, enforcing usage policies and quotas, maintaining balances, and generating personalized offers based on historical usage patterns and device location.
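
To make the per-request policy check concrete, here is a minimal Python sketch of quota enforcement. It assumes a hypothetical in-memory `Subscriber` record that tracks a data allowance in megabytes; a real billing system would also handle concurrency, persistence and pricing rules, none of which are shown.

```python
from dataclasses import dataclass


@dataclass
class Subscriber:
    # Hypothetical record: remaining allowance and usage so far, in megabytes.
    balance_mb: float
    usage_mb: float = 0.0


def authorize_usage(sub: Subscriber, requested_mb: float) -> bool:
    """Approve a usage request only if it fits in the remaining balance.

    The balance is debited in the same step as the decision, so the next
    request for this subscriber sees the updated quota.
    """
    if requested_mb <= 0:
        raise ValueError("requested_mb must be positive")
    if requested_mb > sub.balance_mb:
        return False  # policy: reject requests that would exceed the quota
    sub.balance_mb -= requested_mb
    sub.usage_mb += requested_mb
    return True
```

Keeping the check and the debit together is what lets the application enforce quotas per request rather than reconciling overages later.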

Digital Ads – Digital ad applications target specific user segments. In today’s always-connected world, households have many Internet devices, often a dozen or more. Historical data analysis can identify device IDs and segment them by household, providing that result set to the fast data layer as lookup/correlation tables. Using this historical result set, the fast data application can determine the proper ad to serve, subject to campaign budget, in just a few milliseconds.

Location-based Applications – Geofencing applications provide real-time alerts when device sensors enter or leave regions. Personalized offers are delivered to cell phones based on proximity and past buying habits.
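
A geofence alert boils down to comparing each new location reading with the device’s previous inside/outside state. The sketch below assumes a circular fence and great-circle (haversine) distance; the class name, the fence shape and the "enter"/"leave" labels are illustrative choices, not a specific product’s API.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


class CircularGeofence:
    """Hypothetical fence: a center point plus a radius in meters."""

    def __init__(self, lat, lon, radius_m):
        self.lat, self.lon, self.radius_m = lat, lon, radius_m
        self._inside = {}  # device_id -> whether the last reading was inside

    def update(self, device_id, lat, lon):
        """Return 'enter', 'leave', or None for one new location reading."""
        inside = haversine_m(self.lat, self.lon, lat, lon) <= self.radius_m
        was_inside = self._inside.get(device_id, False)
        self._inside[device_id] = inside
        if inside and not was_inside:
            return "enter"
        if was_inside and not inside:
            return "leave"
        return None  # no boundary crossing, so no alert
```

Only state transitions produce alerts, so a device that lingers inside the region does not trigger a new offer on every reading.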

Online Gaming – Fast data applications are being built to track real-time leaderboards and per-user rankings, making them available instantly. Historical intelligence is used to drive real-time in-game up-sells.
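
As a rough illustration of keeping leaderboards current as events arrive, here is a minimal in-memory Python sketch. The class and method names are hypothetical, and the linear scans in `top` and `rank` are fine for a sketch; a production engine would keep the scores indexed so these queries stay fast at scale.

```python
import heapq


class Leaderboard:
    """Cumulative scores per player, updated one scoring event at a time."""

    def __init__(self):
        self.scores = {}  # player_id -> cumulative score

    def record(self, player_id, points):
        """Apply one scoring event and return the player's new total."""
        self.scores[player_id] = self.scores.get(player_id, 0) + points
        return self.scores[player_id]

    def top(self, n):
        """Top-n (player_id, score) pairs; ties are broken by player id."""
        return heapq.nsmallest(n, self.scores.items(), key=lambda kv: (-kv[1], kv[0]))

    def rank(self, player_id):
        """1-based rank: one more than the number of strictly higher scores."""
        if player_id not in self.scores:
            raise KeyError(player_id)
        score = self.scores[player_id]
        return 1 + sum(1 for s in self.scores.values() if s > score)
```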

Smart Grids and the Internet of Things (IoT) – Our home thermostats have been replaced with smart home systems managed by applications on our smart phones. Our electric meters are also smart, or soon will be, and are monitored and managed by smart grid electric utilities. Online sensors are proliferating in every facet of our lives, from traffic monitoring to personal fitness devices. Real-time analytics and alerts make up the first wave of fast data applications. Quickly following are responsive and reactive applications that, for example, adjust usage and electrical fee structures based on a combination of historical seasonal modeling and real-time usage statistics.

As you can see from these use cases, fast data applications have the following technical characteristics:

1. The ability to ingest and process incoming messages, events, and transactions at rates of tens of thousands to millions per second. Fast data applications must be able to keep up, in real time, as data volumes grow.

2. The ability to calculate and provide real-time analytics on windows of fast-moving data. Beyond simply accepting the data, the application must analyze it and maintain real-time analytics (counters, aggregations, leaderboards) without slowing down the ingestion engine.

3. The ability to respond in sub-millisecond or single-digit-millisecond time, providing real-time personalization and decisioning to the originator. In addition to real-time analytics, computed responses must be returned, again without slowing down the ingestion engine.
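
Characteristic 2 is easiest to meet when each statistic is maintained incrementally rather than recomputed per query. The following Python sketch keeps a count, sum and mean over the last `window_s` seconds at amortized constant cost per event; it assumes timestamps arrive in non-decreasing order and is meant as an illustration, not as how any particular engine implements windows.

```python
from collections import deque


class SlidingWindowStats:
    """Count, sum and mean of values seen in the window (now - window_s, now]."""

    def __init__(self, window_s):
        self.window_s = window_s
        self.events = deque()  # (timestamp, value) pairs in arrival order
        self.total = 0.0       # running sum of values currently in the window

    def add(self, ts, value):
        # Assumes ts is non-decreasing across calls.
        self.events.append((ts, value))
        self.total += value
        self._evict(ts)

    def _evict(self, now):
        # Drop events that have aged out of the window, adjusting the sum.
        cutoff = now - self.window_s
        while self.events and self.events[0][0] <= cutoff:
            _, old = self.events.popleft()
            self.total -= old

    def snapshot(self, now):
        """Current statistics, e.g. for a dashboard query at time `now`."""
        self._evict(now)
        count = len(self.events)
        mean = self.total / count if count else 0.0
        return {"count": count, "sum": self.total, "mean": mean}
```

Each event is appended once and evicted once, so maintaining the window does not grow more expensive as ingestion rates rise.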

The Fast Data Application Stack

Let’s dive into what this new fast and Big Data application architecture looks like:

Data arrives either from a queuing system or directly from an application: perhaps a mobile device, a user’s action in a web browser, a smart sensor (such as a smart electric meter), or one of many other possible sources.

This data is ingested and processed in sub-millisecond or single-digit-millisecond time. The engine must handle high ingestion rates and must be able to scale out (add more nodes to add capacity) as it approaches peak rates. While traditional streaming systems are capable of this level of ingestion, recent innovations in in-memory relational database technology make these databases a perfect component in this architecture.
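
Scaling out generally depends on routing every event for the same key (a device, a subscriber) to the same partition, so that each node owns a disjoint slice of the data. Here is a minimal Python sketch of hash partitioning under that assumption; the function names are hypothetical. It uses `hashlib` rather than the built-in `hash()`, because Python randomizes string hashes per process, which would route the same key differently on different nodes.

```python
import hashlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition number, stable across processes and machines."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def route(events, num_partitions):
    """Group (key, payload) events by partition, preserving per-key order."""
    buckets = defaultdict(list)
    for key, payload in events:
        buckets[partition_for(key, num_partitions)].append((key, payload))
    return dict(buckets)
```

One caveat: plain modulo hashing remaps most keys whenever `num_partitions` changes. Systems that add nodes while running often avoid this by using a fixed number of logical partitions that are moved between nodes, or by using consistent hashing. This sketch covers neither.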

As part of the data ingestion process, real-time analytics are updated and made available for query by corporate dashboards. These include time-based aggregations (statistics over the last n minutes) as well as rankings and leaderboards (top-n lists). Additionally, if an in-memory relational database is the ingestion engine, real-time ad-hoc analytics over the streaming data are also available through standard relational reporting tools.

A response, if required, is computed, often using historical lookup tables residing in-process, and returned to the caller. For example, if the fast data application is serving digital ads to end-user devices, it may receive an ID or cookie identifying the device. It may then reference a set of pre-computed tables, populated from Big Data mining queries, that correlate device IDs to households and households to demographic segment targets. Using this segmentation mapping, the optimal ad is selected, the campaign’s budget spend is validated, and the ad is returned to the caller, all within a few milliseconds.
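
The ad-serving flow described above can be sketched in a few lines of Python. The lookup tables, the `Campaign` fields and the highest-bid selection rule are all assumptions made for illustration. The small dictionaries stand in for tables that Big Data mining jobs would populate, and a real decision engine would apply far richer targeting and pacing logic.

```python
from dataclasses import dataclass


@dataclass
class Campaign:
    ad_id: str
    segment: str       # demographic segment this campaign targets
    bid_cents: int     # price paid per impression
    budget_cents: int  # remaining campaign budget


# Hypothetical pre-computed lookup tables (normally loaded from the Big Data layer).
DEVICE_TO_HOUSEHOLD = {"cookie-123": "hh-9", "cookie-456": "hh-9", "cookie-789": "hh-2"}
HOUSEHOLD_TO_SEGMENT = {"hh-9": "young-family", "hh-2": "retiree"}


def choose_ad(device_id, campaigns, default_ad="house-ad"):
    """Select an ad for one request: device -> household -> segment -> campaign."""
    household = DEVICE_TO_HOUSEHOLD.get(device_id)
    segment = HOUSEHOLD_TO_SEGMENT.get(household) if household else None
    # Only campaigns that target this segment and can still afford one impression.
    eligible = [c for c in campaigns
                if c.segment == segment and c.budget_cents >= c.bid_cents]
    if not eligible:
        return default_ad  # unknown device or no funded campaign
    best = max(eligible, key=lambda c: c.bid_cents)
    best.budget_cents -= best.bid_cents  # validate and debit spend in one step
    return best.ad_id
```

Every step is an in-memory lookup or a scan of a short candidate list, which is what makes a millisecond-scale response plausible once the heavy segmentation work has been done offline.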

The fast data ingestion engine usually holds enough data to maintain the analytics used to compute real-time responses, typically anywhere from a day’s worth to several months’ worth. As data ages out, it is exported or streamed to a historical repository such as Hadoop or a Big Data OLAP system, which can store petabytes of data or more and provides batch-oriented historical analysis.
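
The age-out step can be pictured as "select expired rows, hand them to the exporter, delete them." The Python sketch below uses the standard-library `sqlite3` module as a stand-in for the ingestion database and a plain list as a stand-in for the Hadoop or OLAP export target. Both stand-ins, as well as the `events` table and its columns, are assumptions made for illustration.

```python
import sqlite3


def age_out(conn: sqlite3.Connection, now_s: int, retention_s: int, sink: list) -> int:
    """Export events older than the retention window to `sink`, then delete them.

    Assumes a hypothetical table: events(id INTEGER PRIMARY KEY, ts INTEGER, payload TEXT).
    """
    cutoff = now_s - retention_s
    # The connection context manager commits on success and rolls back on error,
    # so the DELETE only takes effect if the whole block completes.
    with conn:
        rows = conn.execute(
            "SELECT id, ts, payload FROM events WHERE ts < ? ORDER BY ts", (cutoff,)
        ).fetchall()
        # The sink is not part of the transaction: if the DELETE fails, these rows
        # can be exported again next time, so downstream loading should tolerate duplicates.
        sink.extend(rows)
        conn.execute("DELETE FROM events WHERE ts < ?", (cutoff,))
    return len(rows)
```

Exporting before deleting gives at-least-once delivery to the historical store, which is usually preferable to risking data loss at the boundary between the fast and big layers.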

Fast and Big – The New Corporate Data Architecture

It is likely that you already have a Big Data deployment of some sort, either a Hadoop cluster or perhaps a commercial OLAP system such as Vertica or Netezza. What might the full architecture look like when you add a fast data engine?

This diagram outlines the components of a fast data pipeline. Fast data arrives and is processed by the ingestion engine, an in-memory relational database. Real-time analytics and per-event responses (decisions) are handled by the fast data engine. Historical analytics are computed and provided by the Big Data layer. Intelligence mined from the Big Data repository is periodically fed into the fast data ingestion engine for use in per-event decisioning.
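
The feedback loop from the Big Data layer to the fast data engine is often a periodic bulk reload of lookup tables. A simple way to avoid serving decisions from a half-loaded table is to build the new version completely and then swap it in with a single reference assignment. The Python sketch below assumes the Big Data job emits a two-column CSV (key, value); the class name and the CSV format are hypothetical.

```python
import csv
import io


class LookupTable:
    """Read-mostly key -> value table that is replaced wholesale on refresh."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def refresh_from_csv(self, text):
        """Build a new mapping from a two-column CSV export, then swap it in."""
        new_data = {}
        for row in csv.reader(io.StringIO(text)):
            if not row:
                continue  # skip blank lines
            if len(row) != 2:
                raise ValueError(f"expected 2 columns, got {len(row)}: {row!r}")
            key, value = row
            new_data[key] = value
        # The new dict is never mutated after this point, so a lookup sees
        # either the complete old table or the complete new one.
        self._data = new_data
        return len(new_data)
```

Because the old table stays in place until the new one is fully built, a malformed export raises an error and leaves the current table untouched.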

Conclusion

The world’s data is doubling in size every two years, driven by mobile adoption, connected devices, sensors, social media, and the Internet of Things (IoT). The next wave of the Big Data evolution is upon us: fast data. Successfully handling fast data is a challenge that requires new software architecture patterns. But it is also a huge opportunity: tapping into streams of fast-moving data provides fertile ground for innovative applications that can drive new corporate revenue. Don’t be stuck in the slow lane.
