Data Sovereignty Extends Beyond Real Data

In this special guest feature, Omar Ali Fdal, CEO at Statice, argues that data sovereignty does not obstruct innovation. Rather, it enables us to become more independent and to control our digital assets, and synthetic data has a big role to play in this transformation. Statice is a Berlin-based provider of state-of-the-art data privacy technology for health, insurance and banking companies. A former engineer, Omar has always been motivated by building products that solve major problems. Before founding Statice, he worked as a Research Engineer in search and data mining for several companies, such as Amadeus IT Group in France.

92% of the Western world’s data is stored on US-owned servers. In other words, most European data is governed by systems outside of Europe. As a result, EU citizens lose control of their data, local law enforcement becomes less effective, and non-EU technology companies gain more economic and social control in the EU.

That is why the European Union has set a high priority on data sovereignty, the idea that data should be subject to the laws and governance structures of the country in which they are collected. In addition to highlighting the need to develop a competitive world-class digital economy, the European Council is giving particular attention to data security and AI.

Europe’s new policy orientation aims to increase data protection, preserve sovereignty, maximize technological innovation, and share data for the public good.

Data collaboration is hampered by fragmented regulatory landscapes

Interestingly, security, compliance, and data protection can be major roadblocks to innovation. The regulatory landscape for data protection is extremely fragmented. Laws vary from country to country, requiring businesses to establish policies for protecting PII (personally identifiable information), managing data-related risks, and complying with their legal obligations. As of last year, over 120 countries had implemented some form of privacy law to protect citizens and their data.

Figure: Data protection laws of the world. Source: https://www.dlapiperdataprotection.com/

Because the global regulatory landscape is so fragmented, different regulatory systems inevitably come into conflict. This creates friction in data collaboration, which is one of the key drivers of innovation.

The Schrems II case illustrates this problem well. On July 16th 2020, the Court of Justice of the European Union invalidated the EU-US Privacy Shield, which many companies used to transfer data between the EU and the US, because of concerns about US government surveillance. Prior to Schrems II, thousands of US companies relied on the Privacy Shield to conduct transatlantic trade.

Innovation relies on exploratory uses of data

The situation becomes even more complex when we consider individual sovereignty over personal data. The GDPR, for instance, requires data controllers to inform data subjects, at the time of collection, about how and why their data will be used. The problem is that data teams often struggle to define, at the outset of a project, what data they need and how they will use it.

“At the beginning of a project, it is extremely difficult to have strong arguments for and against different variations of data use,” says Dr. Sören Erdweg, Artificial Intelligence & Data Development at Provinzial, Germany’s second largest public insurer. “We do not know which data we intend to use at the outset […] to have a clear understanding of what we require for our model.” 

Innovation requires exploratory data use, but this conflicts with data sovereignty. There is clearly a disconnect between data sovereignty recommendations and the best conditions for innovation, and this disconnect calls for a paradigm shift.

We need insights, not secrets

In reality, we do not need personal secrets to gain insights, yet real data is full of them. The good news is that there is an alternative. Gartner predicts that most AI systems will use synthetic data by 2030, meaning AI will rely more on artificially generated data than on data gathered from real people or real events. This is already the case for autonomous driving, where simulated driving data are used more often than real driving data collected on the road.

But what is synthetic data? 

Synthetic data is data that has been generated artificially rather than collected from real people or events. A synthetic dataset matches the quality of the original data and preserves its statistical distributions, so it looks and behaves like real-world data.

Synthetic data can be created in two ways:

  • From prior knowledge: if you know the rules that govern your data, you can use them to simulate new data points. For example, if you know your customers are women over 20 with certain characteristics, you can use that knowledge to generate matching records artificially.
  • Directly from the real data: machine learning or other AI algorithms learn the distributions of, and relationships within, the original dataset. Once these relationships are learned, the model can create new records.

Synthetic data can help reconcile sovereignty constraints with best practices for innovation in three ways.

First, it improves privacy guarantees for individuals. Instead of relying on highly sensitive real datasets that contain personal information, organizations can use synthetic data, a much less sensitive alternative in which privacy is built in by design rather than added as an afterthought.

Second, it reduces dependence on a handful of data holders. Today, most personal data is held by a few tech giants or traded by a few data brokers. Synthetic data gives organizations a more independent channel for generating and acquiring data, with fast access to datasets that would otherwise be expensive and time-consuming to collect.

Third, it gives organizations better control over the data they use, so they can share it across borders with their partners more confidently.

Innovation requires adaptability. Today’s organizations must be resilient not only in their legal practices but also in adopting technology solutions that make data exchange safer, smoother, and compliant.

Data sovereignty does not obstruct innovation. Rather, it enables us to become even more independent and control our digital assets. And synthetic data has a big role to play in this transformation. 
