How ML Powers Data Access Governance with Immuta & Databricks

In today’s fast-moving world, cloud data analytics can separate the industry leaders from the fast-followers. Data is one of the most valuable assets an organization has, yet data analytics hits a roadblock when the right mechanisms — including cloud data platforms that leverage machine learning — are not in place.

Without the appropriate tools to automate data access governance, data teams are often required to manually grant or restrict data access to individual users, in addition to developing a data pipeline that efficiently delivers secure, high-quality data for analytics.

Immuta’s native integration with Databricks leverages ML to help Databricks customers tackle data access governance obstacles and safely unlock data’s potential.

To zero in on how Immuta leverages ML to power secure, scalable data access governance in Databricks, let’s look at two common data access control challenges.

Manually identifying, tagging, and classifying sensitive data

The sheer number of data sources available today, and the amount of data being generated and collected, can be overwhelming for data teams, but not as overwhelming as sifting through all that data to identify sensitive information.

Detecting sensitive data is like trying to find a needle in a haystack. Doing it manually is time-intensive and error-prone, and in data analytics, failing to identify sensitive data can have far-reaching consequences, from legal action to damaged reputations.

Databricks customers can avoid these risks with sensitive data discovery. Immuta’s active data catalog works in the background so that as Databricks users register data sources with Immuta, those sources are automatically scanned and tagged for sensitive fields, such as PII or PHI. These same ML capabilities also apply when organizations create company-specific tags. For instance, if an organization must adhere to industry regulations, such as 23 NYCRR 500 in the financial services industry or ICD 501 in the federal intelligence community, data teams can create tags that Immuta’s algorithm remembers and then applies to each subsequent data set registered through Databricks, classifying incoming data appropriately. Regardless of the compute engines being leveraged, Immuta has the intelligence to tag sensitive data, which reduces the burden on data teams and makes data pipeline curation more secure.
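To make the idea of scan-and-tag concrete, here is a minimal sketch in Python. It is not Immuta's algorithm or API: it swaps in simple regular-expression rules for the ML-based classification described above, and the tag names, the `PATTERNS` table, the `tag_columns` function, and the 0.8 match threshold are all assumptions made for illustration.

```python
import re

# Hypothetical rules standing in for a real classifier; the tag names are invented.
PATTERNS = {
    "PII.email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "PII.ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


def tag_columns(rows, threshold=0.8):
    """Return {column: [tags]} for columns whose sampled values mostly match a rule."""
    tags = {}
    columns = list(rows[0].keys()) if rows else []
    for col in columns:
        # Sample the non-null values of this column as strings.
        values = [str(r[col]) for r in rows if r.get(col) is not None]
        col_tags = []
        for tag, pattern in PATTERNS.items():
            if not values:
                continue
            match_rate = sum(bool(pattern.match(v)) for v in values) / len(values)
            if match_rate >= threshold:
                col_tags.append(tag)
        tags[col] = sorted(col_tags)
    return tags


sample = [
    {"name": "Ada", "contact": "ada@example.com", "tax_id": "123-45-6789"},
    {"name": "Lin", "contact": "lin@example.org", "tax_id": "987-65-4321"},
]
print(tag_columns(sample))
# {'name': [], 'contact': ['PII.email'], 'tax_id': ['PII.ssn']}
```

A company-specific tag would, in this sketch, simply be another entry in `PATTERNS`; once defined, it is applied to every data set scanned afterward without further manual work.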

Managing role explosion with static access controls

Immuta’s research shows that 80% of data teams use role-based access control (RBAC) or “all-or-nothing” access control policies for identity and access management. However, the static nature of these data access controls works against everything machine learning aims to accomplish. Since RBAC requires new roles to be created for each new user or data set, mapping permissions to corresponding roles quickly becomes difficult to track and manage efficiently.
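A quick back-of-the-envelope calculation shows why roles multiply. The sketch below assumes, purely for illustration, that access depends on department, region, and data sensitivity; none of these names or counts come from Immuta's research. With one role per combination, the role count is the product of the dimensions, while the number of attribute values to maintain is only their sum.

```python
from itertools import product

# Hypothetical access dimensions, invented for this example.
departments = ["finance", "marketing", "hr", "engineering"]
regions = ["us", "eu", "apac"]
sensitivity = ["public", "internal", "restricted"]

# Static roles: one role per (department, region, sensitivity) combination.
static_roles = [f"{d}_{r}_{s}" for d, r, s in product(departments, regions, sensitivity)]

# Attributes: each value is defined once and combined at decision time.
attribute_values = len(departments) + len(regions) + len(sensitivity)

print(len(static_roles))   # 36
print(attribute_values)    # 10
```

Adding a fifth department in this toy model creates nine more roles but only one more attribute value, and the gap widens with every new dimension.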

Immuta’s native integration with Databricks grants or restricts data access at query time using attribute-based access control (ABAC). ABAC avoids the need to continuously create new roles, and the human overhead that doing so requires, because once attributes are defined, they are automatically applied. In practice, when Databricks data teams identify specific attributes, like user title, data location, or data type, Immuta remembers and enforces policies according to those attributes. This avoids having to create roles with each new user, data set, or regulation change, since ML aids in recognizing attributes across all new and existing data registered through Databricks. Additionally, when Databricks customers add multiple cloud data platforms to their cloud ecosystem, Immuta seamlessly carries over the same attributes and enforces policies accordingly, making policies cloud-agnostic and the user experience more flexible.

As a result, Databricks customers report reducing the number of roles in their systems by a factor of 100 when using Immuta’s attribute-based access control.
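The query-time decision described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Immuta's policy engine or its API: the policy format, the `is_allowed` and `visible_columns` functions, and the attribute names are all hypothetical.

```python
# Each hypothetical policy protects one data tag and lists the user
# attribute values that are allowed to see data carrying that tag.
policies = [
    {"tag": "PII", "requires": {"title": {"compliance_officer"}}},
    {"tag": "PHI", "requires": {"department": {"clinical"}}},
]

# Column -> tags, as a tagging step might have produced them.
table = {"order_id": set(), "email": {"PII"}, "diagnosis": {"PHI"}}


def is_allowed(user_attrs, column_tags, policies):
    """A column is visible only if the user satisfies every policy on its tags."""
    for policy in policies:
        if policy["tag"] in column_tags:
            for attr, allowed in policy["requires"].items():
                if user_attrs.get(attr) not in allowed:
                    return False
    return True


def visible_columns(user_attrs, table, policies):
    """Evaluate access per request from attributes, with no roles stored anywhere."""
    return [col for col, tags in table.items() if is_allowed(user_attrs, tags, policies)]


analyst = {"title": "analyst", "department": "marketing"}
officer = {"title": "compliance_officer", "department": "clinical"}

print(visible_columns(analyst, table, policies))  # ['order_id']
print(visible_columns(officer, table, policies))  # ['order_id', 'email', 'diagnosis']
```

Note that a new user needs only attributes, and a new data set needs only tags; the two existing policies cover both without any new role being created.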

Leveraging ML, Immuta streamlines typically time-consuming data access governance processes, enabling Databricks customers to securely access their data faster and increase data engineering productivity by 40%.

Databricks and Immuta seamlessly implement automated data access governance in a best-of-breed data analytics platform, empowering data engineers and architects, data owners, and end-users to unlock more value from their data. To learn more about Immuta’s native integration with Databricks, download A Guide to Data Access Governance with Immuta and Databricks.