How IoT & Pre-emptive Analytics at the Edge Change How We Think About Applications

In this special guest feature, Scott Gnau, CTO of Hortonworks, welcomes us to the future of data, where the modern data application will need to be highly portable, containerized and connected. Scott has spent his entire career in the data industry, most recently as president of Teradata Labs, where he provided visionary direction for research, development and sales support activities related to Teradata integrated data warehousing, big data analytics, and associated solutions. He also drove Teradata's technology investments and acquisitions related to the Teradata Labs solution portfolio. Scott holds a BSEE from Drexel University.

The volume of data will continue to grow exponentially over the next decade.  We already know this.  It is forcing enterprises to upgrade their data architectures in order to store and extract value from vast amounts of data, and they are starting to see both the value and the opportunity cost of doing this for all data-in-motion and data-at-rest.

But whether it comes from millions of mobile phones, autonomous cars and trucks, and street lights, or from billions of meters and sensors, IoT will grow to be orders of magnitude bigger than we believe today.  Every prediction about growth I have made to date has turned out to be too low, so I expect to be right this time.

This has huge implications for data architectures, for how and where analytics are done, and for the very concept of software.

It’s driven by the need for increasingly pre-emptive analytics that can drive a transaction instead of just modifying or optimizing it.  What I mean by pre-emptive is that a lot of truly unique business value comes not just from processing data and understanding recent or even real-time events, but from doing it before the consumer walks up to the counter, before they log in, or before a sensor triggers.

So, in terms of data architecture, thinking about how we connect all our data – across multiple cloud providers and data centers – is really important.  Why?  Because it delivers the ability to connect customers, products and supply chain into a single integrated, business- and user-centric picture rather than an application-centric view.  Doing this well can have a transformative impact on the ability to identify new revenue streams, save costs and improve customer intimacy.

But where it gets really interesting is when we think about how we’ll also need to keep a lot of decision-making, and therefore software, out at the edge, where the data is.

Though bandwidth is becoming ubiquitous and costs are always declining, data will always have gravity.  And speed and convenience will always matter, to consumers and to businesses alike.  For example, pushing the result of a credit card fraud detection algorithm out to the store or the cardholder before the fraud has happened, instead of reporting it afterwards, matters.  Delivering a real-time or pre-emptive decision to an autonomous vehicle at 65 mph matters even more.  Electrons only move at the speed of light.  There will often simply not be enough time for a round trip.
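To make the round-trip point concrete, here is a quick back-of-the-envelope calculation; the latency figures are illustrative assumptions, not measurements of any particular network:

```python
# How far does a vehicle travel while waiting for a decision to come back?
# The round-trip times below are assumptions chosen only for illustration.
MPH_TO_MPS = 0.44704  # miles per hour -> metres per second

def distance_travelled(speed_mph: float, round_trip_ms: float) -> float:
    """Metres covered while a remote decision is in flight."""
    return speed_mph * MPH_TO_MPS * (round_trip_ms / 1000.0)

# Assumed round trips: on-vehicle compute, nearby edge/cloud, distant data center.
for rtt_ms in (10, 50, 200):
    print(f"{rtt_ms:>4} ms round trip at 65 mph -> "
          f"{distance_travelled(65, rtt_ms):.1f} m travelled")
```

Even an optimistic 50 ms round trip means the vehicle has moved more than a metre before the answer arrives, and a distant data center can cost several metres, which is why much of the decision-making has to happen locally.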

So from a data science point of view, we are now always thinking about how we can connect and manage all the data from a matrix of communicating devices and still guarantee its provenance.  In an IoT world with sensors all over the place communicating with each other rather than through a central location, managing that flow is a really big and important problem to solve.
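One minimal sketch of what guaranteeing provenance can look like at the device level; the record fields and hash-chaining scheme here are hypothetical illustrations, not a description of any particular product:

```python
import hashlib
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class ProvenanceRecord:
    """Lineage metadata attached to one sensor reading (illustrative only)."""
    device_id: str         # which sensor produced the reading
    hop: str               # which device or gateway handled it at this step
    timestamp: float       # when this hop handled the reading
    payload_hash: str      # hash of the reading itself
    prev_record_hash: str  # hash of the previous record, forming a chain

def hash_record(record: ProvenanceRecord) -> str:
    """Hash a record deterministically so the next hop can chain to it."""
    encoded = json.dumps(asdict(record), sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()

def attach_provenance(reading: dict, device_id: str, hop: str, prev_hash: str = ""):
    """Wrap a raw reading with a provenance record; return the message and its hash."""
    record = ProvenanceRecord(
        device_id=device_id,
        hop=hop,
        timestamp=time.time(),
        payload_hash=hashlib.sha256(
            json.dumps(reading, sort_keys=True).encode()
        ).hexdigest(),
        prev_record_hash=prev_hash,
    )
    return {"reading": reading, "provenance": asdict(record)}, hash_record(record)

# A reading originates on a sensor, then is forwarded peer-to-peer via a gateway.
msg, h1 = attach_provenance({"temp_c": 21.4}, device_id="sensor-17", hop="sensor-17")
msg, h2 = attach_provenance(msg["reading"], device_id="sensor-17",
                            hop="gateway-03", prev_hash=h1)
print(msg["provenance"])
```

Each hop adds its own record and references the hash of the previous one, so a downstream consumer can detect a missing or altered hop even when the data never passed through a central location.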

Equally important, we now also think architecturally about how software is developed, delivered and deployed in relation to data.  Think again about those connected driverless cars, sending and receiving data bi-directionally over various kinds of networks about what is going on around them.  There will be no reason to do a lot of this processing in a data center.  Some will happen in the cloud.  Some via GPS.  Some in the vehicle itself.  So we will need to be able to push those applications across that same matrix as an important part of the architecture.

In other words, true application portability will also be a central differentiator between the future and the past of data. Applications of old were siloed, vertically integrated, proprietary, highly structured and in many cases monolithic. That created leverage that saved IT a lot of cost.  But modern data applications must be able to access broad amounts of data-at-rest and data-in-motion that can be very loosely correlated in real time and applied to machine learning analysis, at the edge and everywhere in between.  The modern data application will need to be highly portable, containerized and connected too.
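As a rough sketch of that idea, the fragment below shows a small data application in which everything location-specific is externalized into configuration (the BROKER_URL and MODEL_PATH names, the endpoint and the model format are hypothetical), so the same containerized code could be deployed in the cloud, in a data center or on an edge gateway:

```python
import json
import os
from typing import Iterable

# Where the app runs is supplied by deployment configuration, not hard-coded,
# so the same container image can be pushed to the cloud or out to the edge.
BROKER_URL = os.getenv("BROKER_URL", "tcp://localhost:1883")   # hypothetical stream endpoint
MODEL_PATH = os.getenv("MODEL_PATH", "/models/fraud_v1.json")  # hypothetical model artifact

def load_model(path: str) -> dict:
    """Load a tiny scoring model; here just a threshold read from a JSON file."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"threshold": 0.8}  # default when no model artifact is mounted

def score(event: dict, model: dict) -> bool:
    """Pre-emptive decision: flag the event before the transaction completes."""
    return event.get("risk", 0.0) >= model["threshold"]

def run(events: Iterable[dict]) -> None:
    model = load_model(MODEL_PATH)
    for event in events:
        decision = "block" if score(event, model) else "allow"
        # In a real deployment the decision would be published back to the broker
        # at BROKER_URL; printing keeps the sketch self-contained.
        print(f"{event['id']}: {decision}")

if __name__ == "__main__":
    # Stand-in for a stream of data-in-motion arriving at the edge.
    run([{"id": "txn-001", "risk": 0.35}, {"id": "txn-002", "risk": 0.91}])
```

Packaging this code with its runtime into a container image and injecting BROKER_URL and MODEL_PATH at deploy time is one way the same application can move across the matrix described above.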

Welcome to the future of data.

 
