Overwhelmed by Data? Here’s How to Get Control of It

In this special guest feature, Amnon Drori, Co-founder and CEO of Octopai, discusses how companies can gain visibility and control over their data lineage by leveraging metadata, and how this will make GDPR compliance a manageable task instead of an impossible pursuit. Amnon has over 20 years of leadership experience in technology companies. Before co-founding Octopai he led sales efforts at companies like Panaya (acquired by Infosys), Zend Technologies (acquired by Rogue Wave Software), ModusNovo and Alvarion, and also served as the Chief Revenue Officer at CoolaData, a big data behavioral analytics platform. Amnon studied Management and Computer Science at the Open University of Tel Aviv.

Big businesses, and smaller ones too, generate reams of data today. And big data will only get bigger: in 2017 there were nearly 9 billion devices connected to the internet, a number that Gartner estimates will exceed 20 billion by 2020, and all of them supply a steady stream of data.

Indeed, companies have gotten very good at collecting data—but they aren’t as good at using it effectively. It’s estimated that anywhere from 60% to 95% of data collected by companies just lies around collecting dust. But considering that analytics today can do wonders with collected data, providing insights on how to increase sales, what new products to deploy, how to cut administrative or manufacturing costs, and much more, it seems strange that organizations would let the data lie fallow, especially when that data can make a business more profitable. Gaining control of data should be a key business strategy for any organization. So why aren’t businesses in better control of their data?

One reason is that there is simply too much data for them to handle. A single gigabyte of data represents roughly 64,782 Word pages, and there are many, many gigabytes to search through. Over 2.5 quintillion bytes of data are produced worldwide each day, and even a mid-sized company today produces far more data than the largest enterprises of the last century. With numbers like those, just structuring the data in databases has become a major problem.

And even when the data is structured, metadata issues can make accessing it a headache. When data is formatted differently in various databases or containers, for example when some birth dates are recorded European style (day/month/year) and some American style (month/day/year), searching for that data becomes a major challenge. That’s because search programs written to capture data in one format won’t capture data recorded in another format. The same goes for names (first/last vs. last/first vs. first/middle/last), addresses (5-digit ZIP vs. 9-digit ZIP), and more.
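To make the date-format problem concrete, here is a minimal Python sketch of one way to reconcile birth dates recorded in different styles. The source names (crm_eu, billing_us, warehouse) and the format table are hypothetical, chosen only for illustration; the sketch assumes the organization already knows which convention each source uses, which is exactly the kind of metadata that is often missing.

```python
from datetime import date, datetime

# Hypothetical metadata: the date layout each source system uses.
# These source names are illustrative assumptions, not real systems.
SOURCE_DATE_FORMATS = {
    "crm_eu": "%d/%m/%Y",      # European style: day/month/year
    "billing_us": "%m/%d/%Y",  # American style: month/day/year
    "warehouse": "%Y-%m-%d",   # ISO 8601: year-month-day
}


def normalize_birth_date(raw: str, source: str) -> date:
    """Parse a raw date string using the format recorded for its source."""
    fmt = SOURCE_DATE_FORMATS[source]
    return datetime.strptime(raw.strip(), fmt).date()


# The same birthday (4 March 1990) as three systems might store it.
records = [
    ("crm_eu", "04/03/1990"),
    ("billing_us", "03/04/1990"),
    ("warehouse", "1990-03-04"),
]

# After normalization, all three records collapse to a single date.
normalized = {normalize_birth_date(raw, src) for src, raw in records}
```

Note that a string like 03/04/1990 is ambiguous on its own: without metadata saying which convention the source follows, no search program can tell whether it means 3 April or 4 March.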

To benefit from the data they have, organizations usually utilize Business Intelligence (BI) teams to handle the searching and querying of data. BI teams are the organization’s experts on where to find data, how to write algorithms to find it, and how to structure it in a report that provides useful insights. But BI teams are only human—and these days they too are overwhelmed by the sheer amount of data being thrown at them. In addition, the metadata issue has become overwhelming, as data is located in dozens of different places (databases, Twitter feeds, Facebook logs, Salesforce reports, etc.)—and understanding what data an organization has is a huge, near-impossible task.

On top of that, organizations can now be punished for failing to get control of their data. GDPR rules require companies (even those not based in Europe) to delete, on request and without undue delay, personal information they hold about individuals in the EU. Failure to do so, or to demonstrate to regulators that they can do so, could result in hefty fines for an organization. And because BI teams have to do much of this retrieval work manually, organizations face significant risks on both fronts: losing out on valuable insights and finding themselves facing European regulators.

The bottom line is that companies are missing out on a treasure trove of information they could be using to improve their business models, customer base, and standing in the industry. There’s a world of data out there, and each nugget of it can mean more money, more efficiency, and more benefits for organizations. But to realize those benefits, organizations need solutions that address their “pain points”: specifically, getting a handle on the huge amounts of data they need to process in order to get to the actionable insights they seek, as well as overcoming the challenges presented by metadata issues.

And while there is a range of solutions available for enhancing the work of BI teams, organizations will likely have the most success with automated data search systems, whose algorithms are designed to handle both of those issues. With the right solution, organizations will find that they can gain control over their data and realize the benefits that lie within.

 
