In this special guest feature, Andrew Herman, President of CorSource, addresses data quality, a challenge facing all companies in the age of mass data collection. CorSource is a business intelligence services firm based in Portland, Oregon.
In the past couple of years, estimates of the cost of bad data to companies have ranged from “$5 million annually” (Forbes) to 30% of total revenue (Ovum Research). Clearly there is a data problem, and most companies are ignoring it, even as they continue to collect exponentially more data from internal and external sources.
The data challenge
As technology enters every facet of business and society, the amount of data available to harvest grows with it. Data flows in from internal business systems (often legacy systems), SaaS applications, external online sources (think Big Data), and the Internet of Things. Many companies collect at least some of this data in warehouses, but collection is about as far as most have gotten. The whole point of gathering the data is to analyze it, yet most users don’t trust their data enough to perform analysis that is actually useful.
To find out whether data quality is an issue at your company, ask yourself a few simple questions:
- Do I trust my company’s data enough to put my job on the line for it?
- Are there duplicate fields and field titles when I connect data sources together?
- Are the results of reporting often far off from my estimates?
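The duplicate-fields question above is one you can answer programmatically before attempting any joins. Here is a minimal sketch in Python; the field lists are hypothetical examples, not taken from any real system:

```python
# Sketch: flag duplicate field names when combining two data sources.
# The field lists below are hypothetical examples.

def find_field_collisions(source_a, source_b):
    """Return field names that appear in both sources (case-insensitive)."""
    normalized_a = {name.strip().lower() for name in source_a}
    normalized_b = {name.strip().lower() for name in source_b}
    return sorted(normalized_a & normalized_b)

crm_fields = ["Customer_ID", "customer_id ", "Email", "Region"]
erp_fields = ["CUSTOMER_ID", "Order_Total", "region"]

# Duplicates hiding inside a single source ("Customer_ID" vs "customer_id ")
dupes_in_crm = len(crm_fields) - len({f.strip().lower() for f in crm_fields})

collisions = find_field_collisions(crm_fields, erp_fields)
print(dupes_in_crm)  # 1
print(collisions)    # ['customer_id', 'region']
```

Finding more than a handful of collisions or in-source duplicates is usually a sign that the underlying systems were never reconciled, which is exactly the condition the questions above are probing for.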
In simple terms, data quality is the discipline of ensuring that data is fit for business purposes. The key word is “discipline.” Companies approach it in different ways, but successful programs share three commonalities:
- Someone in charge of company-wide data who is not necessarily housed in IT (these days companies are establishing Chief Data Officers).
- A system of checks and balances to ensure that, once cleaned, data stays that way.
- Company-wide buy-in from end users who understand how they can be a part of the data solution going forward.
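In practice, the “checks and balances” commonality often takes the form of validation rules applied before a record is accepted into a system. The sketch below illustrates the idea; the rules themselves are hypothetical examples, not a prescription from this article:

```python
# Sketch of simple record validation rules, applied as a gate before
# data is accepted. The rules below are hypothetical examples.

import re

RULES = [
    ("customer_id", lambda v: bool(v), "customer_id must be present"),
    ("email",
     lambda v: v is None or re.match(r"[^@\s]+@[^@\s]+\.[^@\s]+$", v),
     "email must be well-formed when present"),
    ("region",
     lambda v: v in {"North", "South", "East", "West"},
     "region must be a known value"),
]

def validate(record):
    """Return a list of rule violations for one record (empty = clean)."""
    errors = []
    for field, check, message in RULES:
        if not check(record.get(field)):
            errors.append(message)
    return errors

good = {"customer_id": "C001", "email": "a@example.com", "region": "West"}
bad = {"customer_id": "", "email": "not-an-email", "region": "Wst"}

print(validate(good))  # []
print(len(validate(bad)))  # 3
```

The value of encoding rules like these is that they run automatically on every record, rather than relying on periodic manual cleanup, which is what keeps cleaned data clean.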
Given the number of technology resources available, why have more companies not addressed their information challenges? Common reasons include confusion over who is ultimately responsible for acting as the data caretaker, and the sheer size of the undertaking: companies simply do not know where to start. Many try to solve data issues internally in fits and starts, which usually means more spreadsheets and long closed-door data correction/management sessions (that involve a lot of shouting). In the end, the result is often something that only some people trust, and that is not sustainable year after year. So, what’s a company to do?
Tackling the data challenge
For one thing, addressing data quality does not have to be an all-at-once effort. The most successful initiatives start with a couple of systems, analyze their data challenges, and set up rules to fix and ultimately govern the data automatically. As results become apparent, other systems can be added to the mix. From the start, it is important to have buy-in from top leadership to invest in data quality, with clear ROI goals and success criteria.
The process of data improvement involves four steps: profiling, cleansing, remediation, and governance. Data profiling is a high-level assessment of the data (duplicates, field title issues, missing or misplaced information, and more), combined with impact analysis (“How does this affect the company?”) and cause analysis (“What happens if we do not fix these problems?”). Cleansing is where data is standardized across systems and enriched with missing, accurate information, and where rules are set up for data quality firewalls. Remediation involves implementing the aspects of the four Master Data Management styles that work best for existing and new business processes, taking a 360-degree, multi-domain view of the data. The last step is data governance: overall stewardship of the data, including monitoring and ongoing correction. Repeated on an ongoing basis, these steps ensure that data remains accurate for all end users and decision makers.
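A first profiling pass can be as simple as counting missing values per field and exact duplicate records. The sketch below shows the idea on a small list-of-dicts dataset; the records and field names are illustrative assumptions, not anyone’s actual methodology:

```python
# Sketch of a basic profiling pass: count missing values per field and
# exact duplicate records. Records and field names are illustrative.

from collections import Counter

records = [
    {"customer_id": "C001", "email": "a@example.com", "region": "West"},
    {"customer_id": "C001", "email": "a@example.com", "region": "West"},  # exact duplicate
    {"customer_id": "C002", "email": None, "region": "East"},             # missing email
    {"customer_id": "C003", "email": "c@example.com", "region": None},    # missing region
]

def profile(rows):
    """Return per-field missing-value counts and the number of duplicate rows."""
    missing = Counter()
    for row in rows:
        for field, value in row.items():
            if value in (None, ""):
                missing[field] += 1
    seen = Counter(tuple(sorted(r.items())) for r in rows)
    duplicates = sum(count - 1 for count in seen.values())
    return dict(missing), duplicates

missing_counts, duplicate_rows = profile(records)
print(missing_counts)  # {'email': 1, 'region': 1}
print(duplicate_rows)  # 1
```

Counts like these feed directly into the impact and cause analysis described above: once you know which fields are incomplete and how many records are duplicated, you can start asking what those defects cost the business.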
Successfully tackling data quality is imperative, and achievable with a progressive, methodical approach. Your competitors are struggling with this very issue, and the question is whether this is going to remain your problem, or just theirs.
Do you have concerns about Data Quality? CorSource will host a related webinar entitled Your Data, Fit or Faltering? You can register now for the event, which takes place September 16, 9:30 AM Pacific.