Data Curation: The Missing Ingredient to Self-service Analytics

In this special guest feature, Stephanie McReynolds, Vice President of Marketing at Alation, discusses how data curation – and more recently, automated curation technology that relies on artificial intelligence (AI) and machine learning algorithms – is emerging as the key, missing ingredient to creating a culture that understands and embraces data. With over 15 years of data infrastructure and application experience, Stephanie has a track record of bringing new technologies to market and into the hands of business analysts. Prior to Alation, she was instrumental in building the first marketing team at the self-service data preparation provider Trifacta. She previously held senior product management positions at a number of data companies including Teradata, Aster Data and Oracle.

Businesses today are spending millions of dollars on business intelligence (BI) and analytics tools, with the market expected to reach $18.3 billion in 2017. The goal of these investments is to get more people using data to make decisions. Given the rapidly growing volumes of data within organizations and the proven impact of data-driven strategies, the investment makes sense. Yet despite growing interest in enabling employees with data, these efforts often fail to get them to actually make more data-driven decisions. One survey from PwC and the Economist Intelligence Unit showed that 58 percent of managers continue to base decisions on intuition, even when they have access to the appropriate data.

The problem is not a lack of analytic tooling, but a lack of knowledge of, and trust in, data – what one might summarize as data literacy. The way data is processed can be complex: multiple transformations and calculations often reshape a data set before it reaches the desk of a business decision-maker. People need to understand these nuances, and know how to interpret the data accurately, before they can trust it enough to base decisions on it. Data curation – and more recently, automated curation technology that relies on artificial intelligence (AI) and machine learning algorithms – is emerging as the key missing ingredient in creating a culture that understands and embraces data.

Curation is not a new concept. We have long been curating content as a society, starting with art and newspapers and evolving to social platforms such as Facebook and Pinterest. But the concept of data curation hasn’t penetrated many modern analytics organizations.

As Dave Wells of Eckerson Group pointed out recently, five years ago data created by ERP, CRM and other internal systems comprised all of the data being analyzed. That paradigm has changed dramatically over the last couple of years. Now, internally created data is a small sliver of the pie, and the amount of data originating from outside, uncontrolled sources has increased drastically, whether accessed or purchased externally, downloaded from open data sites or generated by partners. This has created a critical need to share the context of data that comes from these new sources, a practice we call data curation. Organizations must build in data curation and oversight if they have any hope of getting value from their data.

At the center of data curation is the notion of a data set: a mashup of data, rather than a physical store of data. Data sets are reusable components – anyone conducting analysis should share the data sets they create and expect them to be reused. Reusability is key to self-service at scale. Companies such as GoDaddy and eBay have already embraced this approach to harvesting and distributing data for reuse, allowing any user to become a curator of data knowledge and resulting in higher productivity.

Data curation observes the use of data, focusing on how context, narrative and meaning can be collected around a reusable data set. It creates trust in data by tracking the social network and social bonds between users of data. By employing lists, popularity rankings, annotations, relevance feeds, comments, articles and the upvoting or downvoting of data assets, curation takes organizations beyond data documentation to creating trust in data across the enterprise.

All of these features contribute to the overarching goal of self-service analytics: giving individuals a one-stop shop for insights and data knowledge. The data marketplace is fast emerging as an alternative to basing decisions on intuition, offering a system where users can ‘shop’ for data as they would in an online store. Data curation supports this marketplace by letting people across an organization discover and share their own experiences using data and, as a result, build out a richer data catalog.

At the same time, effective self-service requires control over data quality. Increasing volumes of accessible data are driving organizations to adopt curation as a way to manage data value, knowledge and quality. Alongside data governance and stewardship, which help manage the use and condition of data, curation can help answer questions like: In which use cases can I trust this data? Who really knows how this data was prepared? What algorithms, reports and dashboards have used it in the past?

Combining these elements creates a single source of reference for your data-driven culture and amplifies the value enterprises gain from their data. According to Gartner’s February 2017 Magic Quadrant for Business Intelligence and Analytics Platforms, “By 2020, organizations that offer users access to a curated catalog of internal and external data will realize twice the business value from analytics investments than those that do not.” By building in data curation as a practice, organizations can complete their self-service environments – and generate real business impact.

 
