Embrace Innovation While Reducing Risk: The Three Steps to AI-Grade Data at Scale

The pace of GenAI innovation is putting transformative ways of doing business within reach, but it is also exposing data gaps that increase AI’s risks and potential downsides. While GenAI is helping many organizations unlock operational efficiencies, research suggests a smaller share are realizing its potential to change the way they innovate and develop products.

There are some very public examples of the costly or embarrassing outcomes that follow when AI projects fail, and those failures are usually tied to systemic data management challenges. Even the most advanced organizations can struggle to provide the “right” data to their models, and according to researchers at RAND, many lack the infrastructure needed to work with and manage data.

Realizing AI’s potential depends on feeding models a large, reliable supply of data that is of sufficient quality and that can be managed and governed. Too many models are raised on a poor diet, making the adage “garbage in, garbage out” more resonant than ever.

Data professionals are struggling to meet these requirements with practices and procedures that pre-date AI. So how should data teams shape up for the AI race? The first step is recognizing that the problem exists. Here are three leading indicators:

  • Heavy reliance on manual data management. Engineers are rolling up their sleeves to build and maintain data pipelines, standardize and classify data, and find and fix problems. This is time-consuming, inefficient and unreliable, and no amount of additional headcount will solve it.
  • Lack of data visibility. Most of the data flowing through organizations is dark: it lacks detail about its ownership, its source, or who has modified it. This creates a significant risk of feeding incomplete or inappropriate data into models, and possibly of breaching intellectual property and data protection rules. It also makes it difficult to establish accountability for regulatory compliance.
  • Data can’t be operationalized as a safe, reliable or reusable corporate asset. This can manifest in a number of ways, but leading indicators include: difficulty finding data on a consistent or repeatable basis, which pushes up project costs and slows delivery; difficulty setting and enforcing rules on data use and protection, which creates regulatory and compliance gaps; and an inability to manage or move data based on priority and value, which increases storage and infrastructure costs.

If any of this sounds familiar, there’s a proven three-step course of action for getting data-fit for AI.

First, cut the overhead involved in preparing data at every stage. That means leveraging automation and building an environment capable of accessing, discovering, classifying and quality-testing both unstructured and structured data, regardless of its location or format. Deploy tools and techniques that speed and streamline delivery, such as pipeline templates, no matter the scale of the computing environment.
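To make that concrete, here is a minimal Python sketch of the kind of automated profiling and classification such an environment performs. The pattern rules, thresholds and function names are illustrative assumptions, not a reference to any particular product:

```python
import pandas as pd

# Illustrative classification rules; a real deployment would pull these
# from a catalog or classification service rather than hard-coding them.
PATTERNS = {
    "email": r"^[^@\s]+@[^@\s]+\.[^@\s]+$",
    "us_phone": r"^\+?1?[-. ]?\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}$",
}

def classify_column(series: pd.Series, sample_size: int = 100) -> list[str]:
    """Label a column by pattern-matching a sample of its values."""
    sample = series.dropna().astype(str).head(sample_size)
    if sample.empty:
        return []
    return [label for label, pattern in PATTERNS.items()
            if sample.str.fullmatch(pattern).mean() > 0.9]

def quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column completeness, uniqueness and inferred classification."""
    return pd.DataFrame({
        "completeness": df.notna().mean(),
        "uniqueness": df.nunique() / len(df),
        "labels": [classify_column(df[col]) for col in df.columns],
    })

raw = pd.DataFrame({
    "contact": ["a@example.com", "b@example.com", None],
    "notes": ["ok", "follow up", "ok"],
})
print(quality_report(raw))
```

The point is not the specific checks but that they run automatically: the same report can be generated for every table that enters the environment, with no engineer in the loop.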

Next, establish insight and control. Automatically classify and label data at the source, using terminology relevant to the organization that travels with the data as it moves through projects. Use a catalog capable of understanding and acting on this information: capturing the provenance of data and its journey, while setting and implementing rules on access and protection at the metadata level. A catalog of this caliber brings knowledge and power, making quality data readily accessible, streamlining projects, and ensuring data is consumed responsibly, according to policies and rules for security and governance.
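As a rough illustration of labels and policies travelling with data, the sketch below models a catalog entry whose classification tags gate access at the metadata level. The field names, labels and clearance values are hypothetical:

```python
from dataclasses import dataclass, field

# A minimal, hypothetical catalog record: labels applied at the source
# travel with the dataset, and access rules key off that metadata.
@dataclass
class CatalogEntry:
    dataset: str
    owner: str
    source: str                       # provenance: where the data came from
    labels: list[str]                 # classification applied at the source
    lineage: list[str] = field(default_factory=list)  # upstream datasets
    access_policy: str = "restricted"

def can_access(entry: CatalogEntry, clearances: set[str]) -> bool:
    """Enforce a rule at the metadata level: PII needs explicit clearance."""
    if "pii" in entry.labels:
        return "pii_approved" in clearances
    return entry.access_policy == "open" or "internal" in clearances

profiles = CatalogEntry(
    dataset="sales.customer_profiles",
    owner="data-platform-team",
    source="crm_export",
    labels=["pii", "customer"],
    lineage=["raw.crm_dump"],
)
print(can_access(profiles, {"internal"}))      # False: PII clearance missing
print(can_access(profiles, {"pii_approved"}))  # True
```

Because the rule is evaluated against metadata rather than the data itself, it applies wherever the dataset travels, and the lineage field answers the ownership and provenance questions that dark data cannot.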

Last comes efficient data delivery. Brush aside the manual processes that are prone to error at scale, heap workloads onto engineers and result in poor-quality data. Automation frees up resources and sets the conditions for consistently delivering AI-ready data, while saving IT teams integration headaches and avoiding technical debt.
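A minimal sketch of the pipeline-template idea, assuming simple pandas-based stages; the stage names and checks are placeholders:

```python
from typing import Callable
import pandas as pd

# A reusable pipeline template: stages are plain functions, so the same
# skeleton is reused across datasets instead of being hand-built each time.
Stage = Callable[[pd.DataFrame], pd.DataFrame]

def run_pipeline(df: pd.DataFrame, stages: list[Stage]) -> pd.DataFrame:
    """Apply each stage in order, failing fast when a check is violated."""
    for stage in stages:
        df = stage(df)
    return df

def enforce_schema(df: pd.DataFrame) -> pd.DataFrame:
    expected = {"customer_id", "amount"}
    missing = expected - set(df.columns)
    if missing:
        raise ValueError(f"schema check failed, missing: {sorted(missing)}")
    return df

def drop_incomplete(df: pd.DataFrame) -> pd.DataFrame:
    return df.dropna()

raw = pd.DataFrame({"customer_id": [1, 2, None], "amount": [9.5, 3.0, 1.2]})
clean = run_pipeline(raw, [enforce_schema, drop_incomplete])
print(clean)  # the incomplete third row is removed before delivery
```

Templating the skeleton once and swapping in dataset-specific stages is what turns delivery from a per-project engineering effort into a repeatable, low-debt operation.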

GenAI has proved to be the calling card of modern AI. But turning pilots and pockets of deployment into game-changing outcomes means laying solid foundations for data access, quality, availability, delivery and governance. Do that, and company-wide, AI-grade data fitness is within reach.

About the Author

Kunju Kashalikar is Senior Director of Product Management at Pentaho. He is a senior leader with deep expertise in product development, data management and AI/ML technologies, and has a proven track record of delivering hybrid cloud products and solutions across data management and edge, leveraging design thinking. He leads product management for the Pentaho platform.
