Redis: Setting Big Data on Fire

For those unfamiliar with Redis, it is an open source, in-memory data structure server. It was originally conceived to solve a problem that required speed and simplicity, but it soon became clear that Redis had applications far beyond its original intent. Redis has since grown to include many data structures that resolve very complex programming problems with simple commands executed within the data store, reducing coding effort, increasing throughput and reducing latency, to the extent that it is now benchmarked as the fastest database in the world.

However, many question why something that runs in a computer’s memory, and is by definition equipped to handle small data, is being used in big data scenarios. Redis has many use cases in big data scenarios, driven primarily by its extreme performance, simplicity and, above all, versatility.

High-Speed Data Ingest: Big data is characterized by its volume, variety and velocity. While numerous disk-based databases such as MongoDB or HBase can handle big volume and big variety, keeping up with big velocity often requires throwing hardware at the problem. The firehose of data from high-traffic websites, massively multiplayer online games, chat, messaging, media, IoT applications, and applications that call multiple APIs or analyze streaming data is too fast for most disk-based databases to handle.

Redis runs extremely efficiently in memory and handles extraordinarily high-velocity data easily, needing only a few standard servers to deliver millions of operations per second with sub-millisecond latency. It can be used to store high-velocity data so that none of it is lost before offline analysis. Like other modern NoSQL databases, Redis is schema-less, but when one of its data structures (such as Hashes or Sorted Sets) is used, users can take advantage of its extremely efficient in-memory operations to further accelerate how data is processed. Advanced Redis use cases include using Redis data structures to avoid the overhead associated with ETL.
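As a rough illustration of the ingest pattern described above, the following Python sketch buffers a batch of events in a single network round trip. It assumes the third-party redis-py client; the key names ("event:<id>", "events:by_time") and the event fields "id" and "ts" are hypothetical and chosen only for this example.

```python
def ingest_events(client, events, index_key="events:by_time"):
    """Buffer a batch of events in Redis using one pipelined round trip.

    `client` is assumed to be a redis-py client, e.g. redis.Redis().
    Each event is a dict with hypothetical "id" and "ts" (timestamp)
    fields plus any other payload fields.
    """
    # A non-transactional pipeline queues commands locally and sends
    # them together when execute() is called.
    pipe = client.pipeline(transaction=False)
    for event in events:
        key = f"event:{event['id']}"
        # Store the event's fields in a Hash (values sent as strings).
        pipe.hset(key, mapping={k: str(v) for k, v in event.items()})
        # Index the event by timestamp in a Sorted Set so a later batch
        # job can read events back in time order.
        pipe.zadd(index_key, {key: event["ts"]})
    pipe.execute()
    return len(events)

# Usage (assumes a Redis server on localhost and `pip install redis`):
#   import redis
#   r = redis.Redis()
#   ingest_events(r, [{"id": 1, "ts": 1700000000.0, "kind": "click"}])
```

A downstream job could then read the Sorted Set by score range and move the events into a disk-based store for longer-term analysis.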

Redis can be used as a first responder database in front of other disk-based databases to handle high-velocity throughput scenarios with extreme simplicity and performance, while the other databases are used for longer-term storage. San Francisco-based HotelTonight uses Redis to handle rapidly changing data such as new statistics, API interactions, network timeouts and traffic problems with customer devices, for which they don’t want to incur the overhead of designing schemas, building tables and writing to their MySQL databases.

In-Database Real-Time Analytics: Big data often requires that insights be derived in real time. For real-time analytics such as top scorers in an online game, recommendation engines on an e-commerce website or real-time session analysis, Redis is the speediest choice of data store available. The Motley Fool (fool.com), Financial Engines, Moovit and other companies use Redis heavily for real-time data insights that drive application behavior.
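The "top scorers" case maps naturally onto a Sorted Set, where Redis keeps members ordered by score as they are updated. The sketch below is a minimal example under that assumption, using the redis-py client; the board key and player names are hypothetical.

```python
def record_score(client, board, player, points):
    """Add points to a player's running total and return the new total.

    `client` is assumed to be a redis-py client, e.g. redis.Redis().
    """
    # ZINCRBY creates the member if it does not exist yet.
    return client.zincrby(board, points, player)

def top_scorers(client, board, n=10):
    """Return up to n (player, score) pairs, highest score first."""
    # ZREVRANGE reads the Sorted Set from the highest score downward.
    return client.zrevrange(board, 0, n - 1, withscores=True)

# Usage (assumes a Redis server on localhost and `pip install redis`):
#   import redis
#   r = redis.Redis(decode_responses=True)
#   record_score(r, "game:scores", "alice", 120)
#   record_score(r, "game:scores", "bob", 95)
#   top_scorers(r, "game:scores", n=2)
```

Because the ranking is maintained inside the data store, the application does not need to sort scores itself on every read.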

Location-sensitive applications such as transportation or social applications benefit from the geo-location features in Redis, which answer queries such as nearby destinations, facilities or friends more quickly.
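A "what is nearby" query of this kind might look like the sketch below. It assumes the redis-py client and sends the GEOADD and GEORADIUS commands through execute_command, so it does not rely on the helper-method signatures of any particular redis-py version. The key and place names are hypothetical. Note that newer Redis releases (6.2 and later) deprecate GEORADIUS in favor of GEOSEARCH.

```python
def add_place(client, key, name, longitude, latitude):
    """Register a named place at the given coordinates.

    `client` is assumed to be a redis-py client, e.g. redis.Redis().
    """
    # GEOADD takes longitude first, then latitude, then the member name.
    return client.execute_command("GEOADD", key, longitude, latitude, name)

def places_nearby(client, key, longitude, latitude, radius_km):
    """Return the names of places within radius_km, nearest first."""
    # ASC asks Redis to sort the matches by distance from the center.
    return client.execute_command(
        "GEORADIUS", key, longitude, latitude, radius_km, "km", "ASC"
    )

# Usage (assumes a Redis server on localhost and `pip install redis`):
#   import redis
#   r = redis.Redis(decode_responses=True)
#   add_place(r, "places", "ferry-building", -122.3937, 37.7955)
#   places_nearby(r, "places", -122.40, 37.79, 5)
```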

Probably less well known is that Redis, when used as an off-heap cache, can accelerate HBase throughput by 200 to 1,000 percent, and when used with Spark (instead of Tachyon), can cut execution latencies by up to 98 percent. As a result, Redis can play a key role in big data analytics, turning slow batch analytics into real-time decisions that can result in better customer experiences or more accurate business tradeoffs.

Cost Reduction: Using Redis as a first responder database to handle high-volume data ingest, consolidate data writes or provide rapid throughput reduces the number of disk-based databases you need to run, saving tremendously on operational and maintenance costs.

David Pfister of Financial Engines finds that the traffic reduction to MySQL servers alone was worth deploying Redis; all the other benefits (session management, sequential processing of multi-threaded transactions, pub-sub) were a bonus. The throughput provided by Redis saves users from spending millions of dollars on additional servers to handle the load, as well as on the personnel needed to manage their databases.

In addition, by running Redis on Flash memory, users can maintain the same sub-millisecond latency at considerably lower memory cost. Flash memory, when used as an extension of RAM, provides a lower-cost alternative to keeping an entire dataset in memory while still maintaining high throughput.

Some of the world’s largest companies, such as Alcatel-Lucent, Twitter, Uber, Snapchat and GitHub, use Redis to power their applications, as do numerous high-profile startups such as Docker, Airbnb, Pinterest and Square, so Redis is well proven in production. Incorporating Redis into a big data stack is likely to reduce the time to value of big data.

Contributed by: Leena Joshi, Vice President of Product Marketing at Redis Labs