AI for Legalese

Have you ever signed a lengthy legal contract you didn’t fully read? Or have you ever read a contract you didn’t fully understand? Contract review is a time-consuming and labor-intensive process for everyone concerned, including contract attorneys. Help is on the way. IBM researchers are exploring ways for AI to make tedious tasks like contract review easier, faster, and more accurate.

A team from IBM Research-Almaden led by Yunyao Li demonstrated a new tool called HEIDL (Human-in-the-loop linguistic Expressions wIth Deep Learning) at the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019). The tool was created in collaboration with researchers from the University of Michigan. HEIDL is a natural language processing (NLP) tool that works with humans to both label training data and improve the machine-learned model. Details of the research are available in the paper “HEIDL: Learning Linguistic Expressions with Deep Learning and Human-in-the-Loop.” The demo video uses contract language labeled by IBM attorneys to illustrate how NLP can classify key terms and phrases with input from subject matter experts, bringing us one step closer to understanding both deep learning and contracts.

While the role of humans is increasingly recognized in the machine learning community, the representation of and interaction with models in current human-in-the-loop machine learning (HITL-ML) approaches are too low-level and too far removed from humans’ conceptual models. The researchers demonstrate HEIDL, a prototype HITL-ML system that exposes the machine-learned model through high-level, explainable linguistic expressions formed of predicates representing the semantic structure of text.

In HEIDL, the human’s role is elevated from simply evaluating model predictions to interpreting and even updating the model logic directly, by enabling interaction with the rule predicates themselves. Raising the currency of interaction to such semantic levels calls for new interaction paradigms between humans and machines that result in improved productivity for the text analytics model development process. Moreover, because humans are involved in the process, the human-machine co-created models generalize better to unseen data: domain experts can instill their expertise by extrapolating from what automated algorithms have learned from a small amount of labeled data.
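
To make the idea of rules built from predicates more concrete, here is a minimal sketch in Python. It is not HEIDL's actual code or rule format. The predicate names, the sentence features, and the tiny labeled set are all hypothetical and chosen only for illustration. The sketch treats a rule as a conjunction of named predicates over a sentence's semantic features, then shows how an expert might drop one overly narrow predicate and compare precision and recall before and after the edit.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: one sentence as a dict of semantic features.
Sentence = Dict[str, str]


@dataclass(frozen=True)
class Predicate:
    """A named, human-readable test over a sentence's features."""
    name: str
    test: Callable[[Sentence], bool]


@dataclass
class Rule:
    """A rule assigns `label` when every predicate holds (a conjunction)."""
    label: str
    predicates: List[Predicate]

    def matches(self, sentence: Sentence) -> bool:
        return all(p.test(sentence) for p in self.predicates)

    def without(self, name: str) -> "Rule":
        """Return a copy with one predicate removed, as an expert edit might."""
        kept = [p for p in self.predicates if p.name != name]
        return Rule(self.label, kept)


def precision_recall(rule: Rule, data: List[Tuple[Sentence, bool]]) -> Tuple[float, float]:
    """Score a rule against (sentence, is_positive) pairs."""
    tp = sum(1 for s, y in data if rule.matches(s) and y)
    fp = sum(1 for s, y in data if rule.matches(s) and not y)
    fn = sum(1 for s, y in data if not rule.matches(s) and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Illustrative predicates (assumed, not taken from the paper).
modal_is_shall = Predicate("modal_is_shall", lambda s: s.get("modal") == "shall")
subject_is_party = Predicate(
    "subject_is_party", lambda s: s.get("subject") in {"supplier", "buyer"}
)
tense_is_future = Predicate("tense_is_future", lambda s: s.get("tense") == "future")

# A rule as an automated learner might propose it from a few labeled examples.
learned_rule = Rule("obligation", [modal_is_shall, subject_is_party, tense_is_future])

# Made-up labeled sentences: True means the sentence states an obligation.
labeled = [
    ({"modal": "shall", "subject": "supplier", "tense": "future"}, True),
    ({"modal": "shall", "subject": "buyer", "tense": "present"}, True),
    ({"modal": "may", "subject": "supplier", "tense": "future"}, False),
    ({"modal": "shall", "subject": "agreement", "tense": "present"}, False),
]

# The expert judges the tense predicate too narrow and removes it.
edited_rule = learned_rule.without("tense_is_future")

print("learned:", precision_recall(learned_rule, labeled))
print("edited: ", precision_recall(edited_rule, labeled))
```

On this toy data the learned rule misses one of the two obligations, and the edited rule recovers it without adding false positives. Because each predicate has a readable name, the expert can reason about what to change without inspecting model weights.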

Contributed by Daniel D. Gutierrez, Managing Editor and Resident Data Scientist for insideAI News. In addition to being a tech journalist, Daniel is also a data science consultant, author, and educator, and sits on a number of advisory boards for various start-up companies.
