The importance of environmental, social, and governance (ESG) programs has increased since the COVID-19 pandemic. Spurred by societal transformations like climate change, shifting public values, and declining political trust, companies are forced to respond to new demands from employees, customers, regulators, and investors. According to data from Gartner, 85% of investors now consider ESG factors as part of their investment strategy. [1] Management at the largest companies is responding. In the fourth quarter of 2020, nearly 130 S&P 500 companies mentioned ESG on earnings calls, up from 80 the previous quarter. [2]
ESG is no longer discretionary for managers; it’s a mandate. As stakeholder priorities change, ESG plays a role in defining companies’ overall strategy, touching operational areas as diverse as customer support, supply chain management, and IT and security operations. As society demands greater visibility into, and accountability for, how companies use data, there is a corresponding demand for better privacy controls, tighter access rights, and compliance with national and local regulations.
The cost of getting it wrong is steep. Equifax paid $575M, the largest such penalty to date, for exposing data on nearly 150 million customers. [3] Google and H&M paid fines of $56.6M and $41M, respectively, for GDPR violations. [4] And GDPR penalties are climbing: research from DLA Piper found that between January 2020 and January 2021, fines increased 40% and data breach notifications rose 19%. [5]
All of this puts pressure on companies to improve their data governance programs. But data governance is challenging for enterprises, starting with the fact that there’s no widely accepted definition of what it encompasses. For some, data governance is synonymous with data quality, which falls short of what stakeholders demand today. Another popular view frames it as ensuring data is fit for use by data consumers, which is too vague and subjective. If we want to align data governance with today’s ESG priorities, we need a compliance-focused approach: modern data governance programs must ensure that specific data sets are available and remain in compliance with regulatory requirements, while supporting effective reporting and integrated customer management.
Structuring a data governance program around transactional data is a challenge, but governing observability data is close to impossible. The sources of transactional data are largely fixed; there are only so many ways a customer can place an order or initiate a return, for example. Transactional data represents steps in a defined business process, which makes its governance tractable. None of this holds true for observability data.
Observability data is the collection of metrics, events, logs, and traces (aka MELT) emitted by your applications, infrastructure, and hardware distributed across data centers, public clouds, and edge deployments. It exists in hundreds or thousands of different formats, with volumes that dwarf transactional data. It’s also inconsistent. One developer may scatter detailed logging statements throughout her code. Another may prefer a terse approach, using internal lingo intelligible only to a specific person or team. Yet another might leave debug logging enabled during a production push, embedding terabytes of personally identifiable information (PII) in events and logs stored in lightly protected cloud object storage. Observability data is a mess.
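To make the inconsistency concrete, here are three invented log lines that could all describe the same authentication event. They illustrate the three styles above: structured and verbose, terse internal shorthand, and debug output that leaks PII:

```
2021-06-14T09:21:07Z WARN auth-svc login_failed user_id=48213 src_ip=10.4.1.22 reason=bad_password
lgn fail u48213 chk w/ #ops-auth
DEBUG AuthController.authenticate() payload={"email": "jane@example.com", "password": "hunter2"}
```

A governance program has to cope with all three shapes at once, often within a single application.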
Compounding the problem is the current state of the observability and monitoring industry. These solutions rely on a dated model of restrictive data silos, where one observability source feeds a single destination. This limits visibility and access while driving up costs and compliance risk. Emerging standards, like OpenTelemetry, promise solutions, but the same vendors that created the problem are creating the standards, leaving little hope for meaningful change.
Meeting your company’s ESG goals requires a different approach to observability data governance. That approach starts with simplifying your data pipelines around a single, strategic control point. From there, you need context-aware data processing and routing, plus alerting when observability data changes. Adopting an observability pipeline accomplishes each of these objectives.
Simplify data pipelines
Today’s observability data pipelines are point-to-point affairs. In a common scenario, a software agent runs on a server or alongside an application and sends data to a specific data platform: a log analytics platform, a time-series database, or an application performance monitoring (APM) tool. If an agent isn’t involved, a process collects data from remote sources and loads it into a destination platform.
This legacy setup has multiple problems. Each agent and platform has its own rules for masking, filtering, or routing data, and those rules are applied inconsistently, leading to unpredictable governance gaps. New platforms also require new agents. That sounds innocuous, but agents must be deployed across hundreds of thousands of servers, resulting in massive overhead. They also expand your security exposure footprint and degrade performance on the servers where they run. Agents are not free.
By simplifying your data pipelines around a single strategic control point, like an observability pipeline, you can consume data from multiple agents and route it to multiple destinations. This eliminates point-to-point data connections and offers a single interface for governing observability data. New tools can be brought online using existing agents, lowering the management burden on your team while improving security and performance.
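As a rough illustration, the Python sketch below shows the fan-in/fan-out shape of a single control point: every agent feeds one pipeline, and the pipeline feeds every destination. The agent and destination names are invented for the example; this is not any vendor’s actual API.

```python
# Minimal sketch of a single control point for observability data.
# Agent and destination names are illustrative assumptions.
from typing import Callable

Event = dict  # an event is just a bag of fields plus routing metadata

def from_syslog_agent() -> list[Event]:
    # Stand-in for data arriving from an existing syslog agent.
    return [{"source": "syslog", "region": "eu-west-1", "message": "user login ok"}]

def from_apm_agent() -> list[Event]:
    # Stand-in for data arriving from an existing APM agent.
    return [{"source": "apm", "region": "us-east-1", "message": "span auth 42ms"}]

# Destinations are pluggable; adding one does not require a new agent.
DESTINATIONS: dict[str, Callable[[Event], None]] = {
    "log_analytics": lambda e: print("-> log analytics:", e),
    "metrics_store": lambda e: print("-> metrics store:", e),
}

def run_pipeline() -> None:
    # Fan in from every agent, then fan out to every destination.
    # Governance logic (masking, filtering, routing) lives here, in one place.
    for event in from_syslog_agent() + from_apm_agent():
        for sink in DESTINATIONS.values():
            sink(event)

run_pipeline()
```

The point of the shape is that governance rules are written once, at the control point, instead of once per agent-to-platform connection.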
Build context-based routing for improved control
The next step is building rules for how data should be handled based on where it came from, where it’s going, and what it contains. This allows you to comply with regulations like CCPA and GDPR, among others. You may not want data from certain sources to leave or enter specific jurisdictions. Log data containing accidental PII, such as when someone deploys an updated application with debug logging enabled, may need additional redaction or routing to a remediation tool.
Because a centralized control point sees all of your observability data, it gives users a single place to define these rules based on the data’s context.
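A minimal sketch of what such rules might look like, assuming a deliberately crude regex-based PII detector and invented region and destination names:

```python
# Context-aware routing at the control point: decisions hinge on where the
# data came from, what it contains, and where it is allowed to go.
# The PII pattern and destination names are illustrative assumptions.
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")  # crude detector, for the sketch only

def route(event: dict) -> tuple[str, dict]:
    # Redact accidental PII before the event leaves the pipeline.
    if EMAIL.search(event["message"]):
        event = {**event, "message": EMAIL.sub("[REDACTED]", event["message"])}
    # Keep EU-originated data in an EU destination to respect residency rules.
    if event["region"].startswith("eu-"):
        return "eu_log_analytics", event
    return "us_log_analytics", event

print(route({"region": "eu-west-1", "message": "login by jane@example.com"}))
# -> ('eu_log_analytics', {'region': 'eu-west-1', 'message': 'login by [REDACTED]'})
```

Centralizing rules like these means a regulatory change becomes an edit in one place, rather than a redeployment across hundreds of agents.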
Stay flexible
Finally, observability data governance needs to be flexible. Regulations and compliance requirements are constantly changing, and so is observability data itself: there’s always a new platform, format, or tool to deploy. Data governance must be agile enough to adapt to these changing environments. A strategic control point for all observability data lets data stewards react in minutes or hours instead of weeks or months.
Conclusion
Observability data comprises the logs, events, metrics, and traces that make security, performance management, and monitoring possible. While often overlooked, governing these data sources is critical in today’s enterprises. The current state of observability data management is, at best, fragmented and ad hoc. By adopting an observability pipeline as a key component in your observability infrastructure, you can centralize your governance efforts while remaining agile in the face of constant change.
Evidence
[1] Gartner, “ESG Expectations and Corporate Purpose,” June 2021
[2] FactSet, https://insight.factset.com/more-than-one-in-four-sp-500-companies-cited-esg-on-earnings-calls-for-q4
[3] CSO Online, https://www.csoonline.com/article/3410278/the-biggest-data-breach-fines-penalties-and-settlements-so-far.html
[4] Tessian, https://www.tessian.com/blog/biggest-gdpr-fines-2020/
[5] DLA Piper, https://www.dlapiper.com/en/us/insights/publications/2020/01/gdpr-data-breach-survey-2020/
About the Author
Nick Heudecker is the Senior Director of Market Strategy at Cribl. Prior to joining Cribl, he spent over seven years as an industry analyst at Gartner, covering the data and analytics market. With over 20 years of experience, he has led engineering and product teams across multiple successful startups in the media and advertising industries.