Top Five Data Privacy Issues that Artificial Intelligence and Machine Learning Startups Need to Know

In this special guest feature, Joseph E. Mutschelknaus, a director in Sterne Kessler’s Electronics Practice Group, addresses some of the top data privacy compliance issues that startups dealing with AI and ML applications face. Joseph prosecutes post-issuance proceedings and patent applications before the United States Patent & Trademark Office. He also assists with district court litigation and licensing issues. Based in Washington, D.C. and renowned for more than four decades for dedication to the protection, transfer, and enforcement of intellectual property rights, Sterne, Kessler, Goldstein & Fox is one of the most highly regarded intellectual property specialty law firms in the world.

Last year, the Federal Trade Commission (FTC) hit both Facebook and Google with record fines relating to their handling of personal data. The California Consumer Privacy Act (CCPA), which is widely viewed as the toughest privacy law in the U.S., came online this year. Every U.S. state has its own data breach notification law. And the limits of the EU’s General Data Protection Regulation (GDPR), which impacts companies around the world, are being tested in European courts.

For artificial intelligence (AI) startups, data is king. Data is needed to train machine learning algorithms, and in many cases is the key differentiator from competitors. Yet personal data, that is, data relating to an individual, is also subject to an increasing array of regulations.

As last year’s $5 billion fine on Facebook demonstrates, the penalties for noncompliance with privacy laws can be severe. In this article, I review the top five privacy compliance issues that every AI or machine learning startup needs to be aware of and have a plan to address.

1. Consider how and when data can be anonymized

Privacy laws are concerned with regulating personally identifiable information. If an individual’s data can be anonymized, most of the privacy issues evaporate. That said, often the usefulness of data is premised on being able to identify the individual that it is associated with, or at least being able to correlate different data sets that are about the same individual.

Computer scientists may recognize a technique called a one-way hash as a way to anonymize data used to train machine learning algorithms. Hash operations work by converting data into a number in such a way that the original data cannot be derived from the number alone. For example, if a data record has the name “John Smith” associated with it, a hash operation may convert the name “John Smith” into a numerical form from which it is mathematically difficult or impossible to derive the individual’s name. This anonymization technique is widely used, but it is not foolproof: when the possible inputs are easy to guess, as names often are, someone can hash a list of candidates and look for a match. The European data protection authorities have released detailed guidance on how hashes can and cannot be used to anonymize data.
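
As a rough illustration of both the technique and its limits, here is a minimal Python sketch using the standard hashlib and hmac modules. The function names, the name normalization, and the use of a secret key are my own assumptions for illustration. They are not drawn from any regulator's guidance, and a keyed hash on its own does not settle whether data counts as anonymized under a given law.

```python
import hashlib
import hmac
from typing import Optional, Sequence


def plain_hash(name: str) -> str:
    """Unkeyed SHA-256 of a normalized name (hypothetical helper)."""
    normalized = name.strip().lower().encode("utf-8")
    return hashlib.sha256(normalized).hexdigest()


def keyed_hash(name: str, secret_key: bytes) -> str:
    """HMAC-SHA-256 of a normalized name; requires the secret key to reproduce."""
    normalized = name.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()


def dictionary_attack(token: str, candidates: Sequence[str]) -> Optional[str]:
    """Try to reverse an unkeyed hash by hashing guessed names."""
    for candidate in candidates:
        if plain_hash(candidate) == token:
            return candidate
    return None


token = plain_hash("John Smith")
# Anyone holding a list of likely names can recover the original.
print(dictionary_attack(token, ["Jane Doe", "John Smith"]))
```

The same name always produces the same token, which is what lets separate data sets about one individual be correlated. That property is also what makes guessing attacks possible when no secret key is involved.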

Another factor to consider is that many of these privacy regulations, including the GDPR, cover not just data where an individual is identified, but also data where an individual is identifiable. There is an inherent conflict here. Data scientists want a data set that is as rich as possible. Yet, the richer the data set is, the more likely an individual can be identified from it.

For example, The New York Times wrote an investigative piece on location data. Although the data was anonymized, the Times was able to identify the data record describing the movements of New York City Mayor Bill de Blasio, by simply cross-referencing the data with his known whereabouts at Gracie Mansion. This example illustrates the inherent limits to anonymization in dealing with privacy compliance.
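
The general idea of cross-referencing can be sketched in a few lines of Python. The records, place names, and function below are entirely invented for illustration. This is not the Times's data or method, only a toy showing how a handful of known sightings can narrow a pseudonymous data set to one record.

```python
from typing import Dict, Iterable, List, Set, Tuple

Sighting = Tuple[str, str]  # (place, time), both hypothetical labels


def matching_pseudonyms(
    traces: Dict[str, Set[Sighting]], known_sightings: Iterable[Sighting]
) -> List[str]:
    """Return pseudonyms whose trace contains every known sighting."""
    known = set(known_sightings)
    return sorted(p for p, points in traces.items() if known <= points)


# Invented location traces keyed by pseudonym; no names appear anywhere.
traces = {
    "user_a91": {("residence_x", "07:00"), ("office_y", "09:00"), ("gym_z", "18:00")},
    "user_c07": {("residence_q", "07:00"), ("office_y", "09:00")},
    "user_f33": {("residence_x", "07:00"), ("cafe_w", "12:00")},
}

# Two publicly known whereabouts are enough to single out one record.
print(matching_pseudonyms(traces, [("residence_x", "07:00"), ("office_y", "09:00")]))
```

With only one known sighting, two records still match. Adding a second reduces the candidates to one, which is why richer data sets are easier to re-identify.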

2. What is needed in a compliant privacy policy

If anonymization is not possible in the context of your business, the next step is to obtain the consent of the data subjects. This can be tricky, particularly in cases where the underlying data is surreptitiously gathered.

Many companies rely on privacy policies as a way of getting data subjects’ consent to collect and process personal information. For this to be effective, the privacy policy must explicitly and particularly state how the data is to be used. Generally stating that the data may be used to train algorithms is usually insufficient. If your data scientists find a new use for the data you’ve collected, you must return to the data subjects and get them to agree to an updated privacy policy. The FTC regards a company’s noncompliance with its own privacy policy as an unfair or deceptive trade practice subject to investigation and possible penalty. This sort of noncompliance was the basis for the $5 billion fine assessed against Facebook last year.

3. How to provide a right to be forgotten

To comply with many of these regulations, including the GDPR and CCPA, you must provide not only a way for a data subject to refuse consent, but also a way for a data subject to withdraw consent already given. This is sometimes called a “right to erasure” or a “right to be forgotten.” In some cases, a company must provide a way for subjects to restrict uses of data, offering data subjects a menu of ways the company can and cannot use collected data.
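
One way to picture such a menu is a per-subject record of permitted purposes that every processing step checks before touching the data. The Python sketch below is a minimal illustration under that assumption. The class name, purpose labels, and subject identifier are hypothetical, and what counts as valid consent is a legal question the code does not answer.

```python
from typing import Dict, Optional, Set


class ConsentRegistry:
    """Tracks which purposes each data subject has agreed to (illustrative only)."""

    def __init__(self) -> None:
        self._granted: Dict[str, Set[str]] = {}

    def grant(self, subject_id: str, purpose: str) -> None:
        """Record consent for one specific, named purpose."""
        self._granted.setdefault(subject_id, set()).add(purpose)

    def withdraw(self, subject_id: str, purpose: Optional[str] = None) -> None:
        """Withdraw one purpose, or every purpose when none is named."""
        if purpose is None:
            self._granted.pop(subject_id, None)
        else:
            self._granted.get(subject_id, set()).discard(purpose)

    def allows(self, subject_id: str, purpose: str) -> bool:
        """Default to False: no recorded consent means no processing."""
        return purpose in self._granted.get(subject_id, set())


registry = ConsentRegistry()
registry.grant("subject-1", "fraud_detection")
registry.grant("subject-1", "model_training")
registry.withdraw("subject-1", "model_training")
print(registry.allows("subject-1", "fraud_detection"))  # True
print(registry.allows("subject-1", "model_training"))   # False
```

Because the default answer is False, a new use that was never presented to the data subject is blocked until consent for it is recorded.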

In the context of machine learning, this can be very tricky. Some algorithms, once trained, are difficult to untrain. The ability to remove personal information has to be baked into the system design at the outset.
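
As a minimal sketch of what baking this in could look like, the Python example below keys every training record to a data subject, so that an erasure request both deletes the records and flags the model for retraining. The class, its fields, and the stand-in "model" (a simple average) are assumptions for illustration. Retraining from scratch on the remaining data is only one possible approach, and whether it satisfies a particular regulation is a question for counsel.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TrainingStore:
    """Training data keyed by data subject so it can be removed later."""

    records: Dict[str, List[float]] = field(default_factory=dict)
    model_is_stale: bool = False

    def add(self, subject_id: str, value: float) -> None:
        self.records.setdefault(subject_id, []).append(value)
        self.model_is_stale = True

    def erase(self, subject_id: str) -> bool:
        """Delete a subject's records; mark the model stale if any existed."""
        removed = self.records.pop(subject_id, None) is not None
        if removed:
            self.model_is_stale = True
        return removed

    def retrain(self) -> Optional[float]:
        """Rebuild the stand-in model (a mean) from the remaining records."""
        values = [v for vs in self.records.values() for v in vs]
        self.model_is_stale = False
        return sum(values) / len(values) if values else None


store = TrainingStore()
store.add("subject-1", 10.0)
store.add("subject-2", 20.0)
print(store.retrain())           # 15.0
store.erase("subject-1")
print(store.model_is_stale)      # True
print(store.retrain())           # 20.0
```

The design point is the link between each record and its data subject. Without that link, there is no way to find what must be removed or to know which trained models were affected.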

4. What processes and safeguards need to be in place to properly handle personal data

Privacy compliance attorneys need to be directly involved in the product design effort. Even in big, sophisticated companies, compliance issues usually arise when those responsible for privacy compliance aren’t aware of or don’t understand the underlying technology.

The GDPR requires certain companies to designate data protection officers who are responsible for compliance. There are also record-keeping and auditing obligations in many of these regulations.

5. How to ensure that data security practices are legally adequate

Having collected personal data, you are under an obligation to keep it secure. The FTC regularly brings enforcement actions against companies with unreasonably bad security practices and has detailed guidelines on what practices it considers appropriate.

If a data breach does occur, you should immediately contact a lawyer. Every U.S. state has its own laws governing data breach notification, and each imposes different requirements in terms of notification and possibly remuneration.

Collecting personal data is an essential part of many machine learning startups. Lack of a well-constructed compliance program can be an Achilles’ heel to any business plan. It is a recipe for an expensive lawsuit or government investigation that could be fatal to a young startup business. So, a comprehensive compliance program has to be an essential part of any AI/ML startup’s business plan.
