No matter how much technology organizations amass, and regardless of its novelty, the business cases it supports, or the competitive advantage it affords, it’s bound to become a liability without a holistic data strategy dictating its long-term sustainability for enterprise objectives.
Whether implementing deep neural networks, Robotic Process Automation, cloud computing, or any other technology adopted as a knee-jerk reaction to the tumult of the past year, when technology became a panacea for doing more with less, organizations must strategize how the data flowing through these channels affects mission-critical objectives from all angles, both positively and negatively.
According to Profisee VP & MDM Strategist and former Gartner analyst Bill O’Kane, oftentimes they don’t. “The thing that sinks a lot of data management programs is they focus on data,” O’Kane indicated. “That’s usually compelling for a little while, but the trick is to focus on the outcomes you’re going to get.”
Concentrating on the outcomes of the data coursing through myriad enterprise technologies is a prerequisite for outlining data strategy which, although containing certain universal precepts, will ultimately differ according to each organization, its technology, and its respective ends.
Nonetheless, by focusing on core strategic elements of data quality, data modeling, and data provenance, in conjunction with contemporary necessities of remote access and data security, organizations can formulate the optimal data strategy accounting for any technology—or business climate.
Strategic Implementations
Conventional data strategy consists of offense and defense, which O’Kane termed “proactive and reactive.” The former is for capitalizing on data assets and increasing revenues; the latter is based on mitigating risk and reducing expenses. Within this framework, however, there’s a crucial vector of progress organizations would do well to consider for their strategies. O’Kane described this motion as starting on the left with “things we do now, but badly. So, we’re inefficient, we’re inaccurate, or not good at whatever it is. To the right of it is things we can’t do at all.” This approach forces companies to base their strategy on both the areas that need remediation and the capabilities to which they aspire. When acting on this information, it’s vital to not just solicit IT involvement but also engage business stakeholders “to tell me what it is that we can’t do that we should be able to, that our competitors can,” O’Kane remarked.
Data Quality, Data Modeling
Data quality will always remain central to data strategy, data management, and data governance. It’s the basis for trusting data to make accurate decisions. Organizations can improve data quality by centralizing the rules and transformations that standardize how data are represented, which necessitates surmounting situations in which data are “siloed or fragmented,” O’Kane specified. Data virtualization, data fabrics, knowledge graphs, and Master Data Management (MDM) are all centralized methods for overcoming this issue. MDM “won’t let you enter bad data,” O’Kane noted. “If you carry it to its logical conclusion, all your apps are talking to it in real time and checking for duplicates and accessing the same set of rules.”
Machine learning is a ubiquitous driver for data quality these days; without trusted inputs, it delivers dubious outputs. The foregoing centralization methods are effective because they standardize the data modeling involved, which is foundational for integrating data to eliminate silos. At its most granular level, rectifying these varying data models requires standardizing individual terms and definitions. According to O’Kane, such “semantic consistency” is an “esoteric benefit” for rectifying differences in data models, which MDM and credible centralization approaches provide.
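To make the idea of apps “accessing the same set of rules” concrete, here is a minimal sketch in Python. It is not how any particular MDM product works; the `MasterStore` class, the `standardize` function, and the choice of email as the matching key are all assumptions made for illustration.

```python
import re

# Hypothetical central rule: every application calls this one function
# instead of keeping its own copy of the standardization logic.
def standardize(record):
    """Return a copy of the record with name and email in canonical form."""
    return {
        "name": " ".join(record["name"].split()).title(),
        "email": record["email"].strip().lower(),
    }

# Deliberately simple email shape check (assumption, not a full validator).
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

class MasterStore:
    """Toy in-memory hub that rejects invalid or duplicate records."""

    def __init__(self):
        self._by_email = {}

    def add(self, record):
        clean = standardize(record)
        if not EMAIL_PATTERN.match(clean["email"]):
            raise ValueError(f"invalid email: {clean['email']!r}")
        if clean["email"] in self._by_email:
            raise ValueError(f"duplicate record for {clean['email']}")
        self._by_email[clean["email"]] = clean
        return clean

store = MasterStore()
store.add({"name": "  ada   lovelace ", "email": "Ada@Example.com "})
```

Because the second attempt to add the same person, however it is capitalized or spaced, standardizes to the same key, the store raises an error rather than silently creating a duplicate.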
Remote Access
Unifying data models, implementing data quality, and centralizing certain data management aspects rectify common areas of remediation while enabling companies to do things they aspire to, like machine learning or democratizing data and analytics for users of all levels. Contemporary data strategy trends prioritize remote access, which supports proactive and reactive concerns pertaining to risk management. “Remote access is the only way some of us have been able to work during these lockdowns,” reflected BackupAssist CEO Linus Chang. “Unfortunately, that also opens the door for hackers to do exactly the same.” Common attacks include ransomware and exfiltration, the latter of which exposes organizations’ private data on the public internet. “In terms of not losing data, preventing data deletions, and mitigating ransomware, obviously backups are key,” Chang posited.
Organizations must also account for users interacting with office materials on home devices. “You need to lock down what they can download on that personal PC: what type of access they can have,” commented T2 Tech Group Program Manager Kyle Torf. This concept is aligned with least-privilege access principles, which are predicated on assessing overall access to data systems, what functionality specific users require, and which objects that functionality should apply to, in some instances at a level as granular as individual schemas. An emergent trend in remote work is the use of virtual desktops, which Torf characterized as a desktop on a remote server that “presents that desktop to you as if it’s the desktop on your PC, but really that’s running on the server and all you have is a view to that desktop where you can access the applications and do your daily work on it.”
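The least-privilege idea can be sketched as a deny-by-default lookup. The roles, schema names, and the `GRANTS` table below are hypothetical and exist only to illustrate the principle; real access control systems are considerably richer.

```python
# Hypothetical grants: (role, schema) -> actions explicitly allowed.
GRANTS = {
    ("analyst", "sales"): {"read"},
    ("data_engineer", "sales"): {"read", "write"},
}

def is_allowed(role, schema, action):
    """Deny by default; allow only actions explicitly granted for the schema."""
    return action in GRANTS.get((role, schema), set())
```

Anything not listed is refused, so an analyst can read the sales schema but cannot write to it or touch any other schema until someone grants that access explicitly.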
Data Security
Protecting data assets is the capital concern of reactive data strategies, particularly because of regulatory repercussions spawned from data breaches. Remote user access heightens security concerns about “people downloading things and manipulating the data at home,” Torf admitted. “So there’s a lot more from a data loss prevention standpoint.” Chang outlined six major data security principles for this facet of data strategy, including:
- Data Integrity: There are numerous approaches for preserving data integrity, including detecting changes in file systems or suspicious files. Heuristic rules can determine if files were altered, while other techniques bait malefactors by planting “strategically known sets of data, and if those get changed, then we can say something suspicious is going on and start file analysis,” Chang revealed.
- Authenticity: Identity and Access Management mechanisms verify who users are prior to accessing information systems. Competitive ones “ensure anything outside of the network goes through multi-factor authentication,” Torf detailed. “That can be run through different SIM providers and look at the logs to see any type of variations in activity that might lead to a breach.”
- Availability: This domain involves business continuity and restoration processes for data, meaning “is it there when I need it, or is [it] going to take me a day to get it, or is it going to take me three months to recover it?” Chang disclosed.
- Utility: Data utility extends the notion of availability to ensure that data restored in a recovery are actually usable. Depending on where data are stored, especially in the cloud, data held in proprietary or outdated formats can be unusable.
- Confidentiality: Preserving confidentiality is essential when data are on others’ servers, which is typically the case in the cloud. Client side encryption minimizes this risk and safeguards against certain malware attacks. “Encryption hides the type of data on a hard drive,” Chang noted. “Ransomware typically goes after data that it can identify. It will go after Word documents or Excel documents.” Certain encryption conceals these file types “like a decoy,” Chang observed. “If a bank doesn’t look like a bank it won’t get robbed.”
- Data Sovereignty: Data sovereignty pertains to where data are stored and if they’re on a company’s native soil—which isn’t always true in public clouds and can compromise regulatory adherence.
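The decoy technique mentioned under Data Integrity can be sketched in a few lines of Python. This is an illustrative assumption about how such a check might look, not BackupAssist’s implementation; the function names are hypothetical, and only the standard `hashlib` and `pathlib` modules are used.

```python
import hashlib
from pathlib import Path

def fingerprint(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def snapshot(paths):
    """Record a baseline digest for each planted decoy file."""
    return {str(p): fingerprint(p) for p in paths}

def changed_decoys(baseline):
    """Return decoy paths that are missing or whose contents no longer match."""
    suspicious = []
    for path, digest in baseline.items():
        if not Path(path).exists() or fingerprint(path) != digest:
            suspicious.append(path)
    return suspicious
```

A scheduled job would take a `snapshot` once, when the decoys are planted, and then call `changed_decoys` periodically; any non-empty result is the cue to “start file analysis.”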
Data Lineage and Blockchain
Prudent data strategy encompasses data lineage to show regulatory compliance and denote areas of remediation. While provenance is typically based on metadata chronicling data’s enterprise journey, modern approaches do so utilizing distributed ledgers. According to Vendia CEO Tim Wagner, blockchain technologies can pinpoint “the provenance of every file,” including who originated and disseminated “every piece of data you shared.” Blockchain’s immutable traceability actuates data governance policy while ensuring users comply with it. This vast capacity is applicable within and between organizations, which is pivotal for auditing.
Moreover, it typifies the remote access paradigm reinforced by current public health issues since “your auditor doesn’t have to be in the same room; they don’t have to be onsite,” Wagner added. “You can just use a program now.” The ramifications of this newfound reality are manifold: it automates provenance and auditing, provides a foundation for classifying data, and hints at a future in which all data lineage is just as readily transparent. “Imagine if all your cloud data came with a little tag that told you this was originally from Acme Corporation, this was shared by a different company, and now you can go back and change your mind about [using it],” Wagner said.
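The tamper-evidence behind this kind of traceability can be illustrated with a hash-chained log, in which each entry stores the hash of the one before it. The sketch below is a single-process toy, not a distributed ledger and not Vendia’s product; the `ProvenanceLog` class and its field names are assumptions for illustration.

```python
import hashlib
import json

def _entry_hash(entry):
    """Hash the fields of an entry, including the previous entry's hash."""
    body = {k: entry[k] for k in ("file", "actor", "action", "prev")}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

class ProvenanceLog:
    """Append-only log in which each entry is chained to its predecessor."""

    def __init__(self):
        self.entries = []

    def record(self, file, actor, action):
        prev = self.entries[-1]["hash"] if self.entries else ""
        entry = {"file": file, "actor": actor, "action": action, "prev": prev}
        entry["hash"] = _entry_hash(entry)
        self.entries.append(entry)
        return entry

    def verify(self):
        """Return True only if no entry was altered, removed, or reordered."""
        prev = ""
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != _entry_hash(entry):
                return False
            prev = entry["hash"]
        return True

    def lineage(self, file):
        """List who did what to a file, in the order it was recorded."""
        return [(e["actor"], e["action"]) for e in self.entries if e["file"] == file]
```

An auditor working remotely could call `verify` and `lineage` on a copy of the log; editing any earlier entry changes its hash and breaks the chain, so `verify` returns False. A real ledger adds replication and consensus across parties, which this sketch omits.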
The Solution
Organizations seek technology to do more with less during today’s turbulent business conditions. Data strategy elucidates what ‘more’ entails, whether it really can be achieved with less, and the longstanding consequences of leveraging various technologies to this end. It requires companies to uncover the intricacies of proactive and reactive approaches to improve what they do poorly, enabling them to achieve what they currently can’t.
About the Author
Jelani Harper is an editorial consultant servicing the information technology market. He specializes in data-driven applications focused on semantic technologies, data governance and analytics.