3 NLP Trends Prime for Improvement in the New Year

In this special guest feature, David Talby, CTO, John Snow Labs, discusses why the new year is expected to be another pivotal one for NLP growth and which trends are driving it, focusing on three areas where significant improvements will affect NLP adoption over the next year. David Talby, PhD, MBA, has spent his career making AI, big data and data science solve real-world problems in healthcare, life science, and related fields.

From healthcare and finance to retail and customer service, natural language processing (NLP) technology has steadily gained steam across vertical industries over the last several years. While the technology is still in its infancy, major advances are arriving quickly, helping democratize it and deliver real business impact for the enterprise organizations realizing its value. Like 2020, the new year is expected to be another pivotal one for NLP growth, and several trends are driving this. Significant improvements in three specific areas will affect NLP adoption over the next year – here are the ones to watch out for:

  1. Accuracy: The State-of-the-Art Keeps Improving 

One of the biggest challenges cited by respondents to a recent NLP survey was accuracy. In fact, more than 40% of all respondents cited accuracy as the most important criterion they use to evaluate NLP libraries. Accuracy is vital for highly regulated industries such as healthcare and finance, where even small misinterpretations can have big implications. That said, new academic research is helping providers of NLP technology challenge the status quo, making it possible for customers to put new, highly accurate pre-trained models into production almost immediately.

For example, John Snow Labs recently released new named entity recognition (NER) and classification models for adverse drug events (ADE), which enable users to more accurately detect and prevent harmful reactions to medications. Pre-trained with Clinical BioBERT embeddings, the most powerful contextual language model in the clinical domain today, these models outperform current state-of-the-art solutions and will only continue to improve. Enhancements like this not only help save hospitals and care providers massive overhead costs, but also protect patient safety. And the best part? The same approach can be applied to uncover important findings in any industry.
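To make this concrete, here is a minimal sketch of how a pre-trained clinical NER model might be applied to flag ADE mentions. It uses the Hugging Face transformers pipeline API with a placeholder model ID rather than John Snow Labs' actual release, so treat it as an illustration of the pattern, not the vendor's recipe:

```python
# Illustrative sketch only: apply a generic pre-trained token-classification
# (NER) model to flag adverse-drug-event mentions in a clinical note.
# "some-org/clinical-ade-ner" is a placeholder model ID, not an actual release;
# substitute whichever ADE model you have access to.
from transformers import pipeline

ade_ner = pipeline(
    "token-classification",
    model="some-org/clinical-ade-ner",   # placeholder model ID
    aggregation_strategy="simple",       # merge word pieces into whole entities
)

note = "Patient developed severe nausea and a skin rash after starting lisinopril."
for entity in ade_ner(note):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```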

  2. More Pre-Trained Models Become Widely Available

With the advent of model hubs, cloud providers, and open-source NLP libraries, data scientists are spoilt for choice when it comes to the thousands of pre-trained models at their disposal. While it’s great that the NLP community and its resources have grown so rapidly, the waters get a bit murky when it comes to choosing the right model for your specific NLP project – especially for machine learning novices. In response, better faceted search, curated suggestions, and smarter ranking of search results are coming to fruition. This has been a focus for model hubs like Hugging Face, where anyone can upload models, which can make it tough to find what you’re looking for.
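For readers who prefer to search programmatically, here is a small sketch of narrowing down hub-hosted models by keyword and popularity. It assumes the huggingface_hub client library, with download count standing in for the smarter ranking the hubs are building:

```python
# Sketch: filter and rank models on the Hugging Face Hub programmatically.
# Assumes: pip install huggingface_hub
from huggingface_hub import HfApi

api = HfApi()
# Search for NER-related models and sort by downloads, a rough popularity proxy.
for model in api.list_models(search="ner", sort="downloads", limit=5):
    print(model.id)
```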

Shifting gears to NLP libraries, many now provide ongoing support for the models they publish. This means that models and pipelines for each NLP task are regularly updated or replaced when a better state-of-the-art algorithm, model, or embedding becomes available. To take ease of use a step further, running many of the most accurate and complex deep learning models in history has been reduced to a single line of Python code, with Auto NLP on the horizon. This not only helps democratize the use of NLP among those just getting started, but also makes it even easier for skilled data scientists to quickly find what they need and get to work – a win-win for all.
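The "single line of Python" claim is easy to see with a concrete example. The sketch below uses the Hugging Face transformers pipeline API, one popular way to do this rather than any particular vendor's library, and downloads a default pre-trained model on first use:

```python
# A pre-trained deep learning model behind a one-line interface.
# Assumes: pip install transformers (plus a backend such as PyTorch)
from transformers import pipeline

classify = pipeline("sentiment-analysis")  # pulls a default pre-trained model
print(classify("The new release cut our processing time in half."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```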

  3. Better Support for Under-Represented Languages

NLP is only as powerful as the languages it’s able to understand, and until recently, those languages were largely limited to English, Mandarin, and a few others. If we want to see further adoption and implementation of NLP, it’s crucial to expand current offerings to support the many languages spoken across the globe. Thankfully, multilingual offerings are now being released almost as fast as the accuracy of the technology is improving.

Cloud providers, such as Google, now offer support for hundreds of languages, making NLP available to data scientists worldwide. With techniques such as language-agnostic sentence embeddings, zero-shot learning, and the recent public availability of multilingual embeddings, this is becoming the norm. As with the previous trend, broader access to code and support for many languages evens the playing field for NLP users, promoting a culture of diversity and inclusion in technology at a time when it’s needed most.
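To show what language-agnostic sentence embeddings look like in practice, here is a brief sketch using the publicly available LaBSE model via the sentence-transformers package; it is just one example of the multilingual embeddings mentioned above:

```python
# Sketch: language-agnostic sentence embeddings with LaBSE.
# Assumes: pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/LaBSE")

english = model.encode("The patient reported a severe headache.")
spanish = model.encode("El paciente reportó un dolor de cabeza severo.")

# Sentences with the same meaning land close together regardless of language.
print(util.cos_sim(english, spanish))
```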

2020 was an exciting year for NLP, and it’s clear that the advances and enhancements to the technology, coupled with its wider availability and democratization, will only accelerate its application. While there are still challenges and growing pains that come with the territory, the potential for NLP is vast and the industry’s adoption of new capabilities is just taking off.
