What do rockets have to do with large language models?
By now, everyone has seen ChatGPT and experienced its power. Unfortunately, many have also experienced its flaws, like hallucinations and other unsavory hiccups. The core technology behind it is immensely powerful, but to control large language models (LLMs) properly, they need to be surrounded by a collection of smaller models and integrations.
As a rocket nerd and an aerospace graduate, I find rockets a good analogy here. Everyone has seen rockets take off and has been impressed by their primary engines. What many don’t realize, however, is that smaller engines, called vernier thrusters, are attached to the sides of the rocket.
These thrusters may seem like minor additions, but in reality they provide much-needed stability and maneuverability. Without them, a rocket can’t follow a controlled trajectory; in fact, absent these thrusters, the bigger engines would certainly crash it.
The same is true for large language models.
The Power of Combining Models
Over the years, AI practitioners have developed task-specific machine learning models and chained them together to perform complex language tasks. At Moveworks, we chain several such models, each with a distinct job in figuring out what the user wants: language detection, spell correction, named entity extraction, primary entity identification, and statistical grammar models. This system is very powerful and works remarkably well.
First, it is blazing fast and computationally cheap. More importantly, it is very controllable: when several different models come together to perform a task, you can observe which part of the stack fails or underperforms, and that gives you leverage to influence the system’s behavior. The trade-off is that it is a complex system.
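To make this concrete, here is a minimal sketch of what such a chained pipeline can look like. Every function below is a hypothetical stand-in for a dedicated trained model; the real stack is, of course, far more sophisticated.

```python
# A minimal sketch of a chained, task-specific NLU pipeline.
# Every function is a hypothetical stand-in for a dedicated trained model.

def detect_language(query: str) -> str:
    # Stand-in for a language-detection model.
    return "en"

def correct_spelling(query: str) -> str:
    # Stand-in for a spell-correction model.
    return query.replace("pasword", "password")

def extract_entities(query: str) -> list[str]:
    # Stand-in for a named-entity-recognition model.
    return [t for t in query.split() if t.istitle() and len(t) > 3]

def resolve_intent(query: str, entities: list[str]) -> str:
    # Stand-in for the statistical grammar / intent models.
    return "reset_password" if "password" in query.lower() else "unknown"

def understand(query: str) -> dict:
    # Each stage is separately observable, so a failure or regression
    # can be traced to exactly one model in the stack.
    language = detect_language(query)
    cleaned = correct_spelling(query)
    entities = extract_entities(cleaned)
    intent = resolve_intent(cleaned, entities)
    return {"language": language, "entities": entities, "intent": intent}

print(understand("How do I reset my Okta pasword?"))
```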
In comes a large language model like OpenAI’s GPT-4.
Enter GPT-4: A Game Changer
GPT-4 can be controlled via the prompts provided to the model.
This means you can give it a user query and ask it to perform a variety of tasks against that query, and tools like LangChain let you build applications around this programmatically. So in essence, you end up with a single model to rule them all.
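As a rough sketch, assuming the OpenAI Python SDK (v1.x) with an API key in the environment, asking GPT-4 to handle intent and slot extraction in a single prompt might look like this (the query and the JSON schema in the system prompt are purely illustrative):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

query = "Reset my VPN password and let IT know it's urgent."

# One prompt can ask the model to perform several tasks at once:
# detect the intent, extract the slots, and normalize them to JSON.
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "Extract the user's intent and any relevant slots from "
                    "the request. Reply as JSON with the keys 'intent', "
                    "'slots', and 'urgency'."},
        {"role": "user", "content": query},
    ],
)
print(response.choices[0].message.content)
```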
Not so fast.
In their current state, LLMs like GPT-4 still lack controllability. There is no guarantee, and little predictability, that the model will fill the slots correctly.
Does it understand your enterprise-specific vernacular well enough to be reliable? Does it understand when it might be hallucinating? Or whether it’s sharing sensitive information with someone who shouldn’t be seeing it? In all three cases, the answer is no.
At their core, language models are designed to be creative engines. They are trained on massive datasets from the internet, which means that, out of the box, they are constrained to the data they’ve been fed. Given a prompt about something they haven’t been trained on, they will hallucinate; from the model’s perspective, it is merely taking creative liberties.
Take, for example, looking up someone’s phone number in your organization. You may ask ChatGPT for Larry from accounting’s phone number, and it could spit out a convincing 10-digit number. But if the model was never trained on that information, it cannot possibly provide an accurate response.
The same is true for org-specific vernacular. Conference room names are a great example here. Let’s say your Toronto office has a conference room named Elvis Presley, but you’re not sure where to find it. If you were to ask ChatGPT where you can find Elvis Presley, it may tell you he’s six feet underground instead of pulling up a map of your Toronto office.
Further, depending on prompt size, GPT-4 calls are expensive and come with much higher latency, which makes them cost-prohibitive if used without care.
Controlling the Power of LLMs
Much like rockets, LLM-based systems have their primary engines: the GPT class of models that offer impressive capabilities. However, to harness this power effectively, we must surround them with what I like to call our version of vernier thrusters, a collection of smaller models and integrations that provide the much-needed control and verifiability.
To avoid misleading and risky outputs, the model needs access to company-specific data sources, such as HRIS systems and knowledge bases. You can then build “vernier thrusters” by fine-tuning the model on internal documents, chaining model APIs with data lookups, and integrating with existing security and permission settings, a class of techniques known as retrieval augmentation. Retrieval augmentation alone won’t eliminate hallucinations, so consider adding a class of models that verify that the outputs produced by LLMs are based on facts and grounded data.
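As one illustration of that pattern, here is a minimal retrieval-augmentation sketch. The documents, roles, and keyword-overlap retrieval are invented for demonstration; a real system would query actual HRIS and knowledge-base sources through an embedding index and enforce the organization’s real permissions.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Invented stand-in for an enterprise knowledge base with permission
# metadata; a real system would pull from HRIS systems, wikis, etc.
DOCUMENTS = [
    {"text": "Conference room 'Elvis Presley' is on floor 3 of the Toronto office.",
     "allowed_roles": {"employee"}},
    {"text": "Payroll runs on the 15th and the last day of each month.",
     "allowed_roles": {"employee"}},
    {"text": "Executive compensation bands are restricted to HR admins.",
     "allowed_roles": {"hr_admin"}},
]

def retrieve(query: str, user_role: str) -> list[str]:
    # Toy keyword-overlap retrieval plus a permission filter; production
    # systems use embedding indexes and the org's real access controls.
    terms = {w.strip("?.,'") for w in query.lower().split() if len(w) > 3}
    hits = []
    for doc in DOCUMENTS:
        if user_role not in doc["allowed_roles"]:
            continue
        doc_terms = {w.strip("?.,'") for w in doc["text"].lower().split()}
        if terms & doc_terms:
            hits.append(doc["text"])
    return hits

def answer(query: str, user_role: str) -> str:
    context = "\n".join(retrieve(query, user_role)) or "No matching documents."
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Answer only from the provided context. If the "
                        "answer is not there, say you don't know.\n\n"
                        f"Context:\n{context}"},
            {"role": "user", "content": query},
        ],
    )
    return response.choices[0].message.content

print(answer("Where is the Elvis Presley conference room?", "employee"))
```

Note that the permission filter matters as much as the retrieval itself: it is what keeps the model from sharing sensitive information with someone who shouldn’t see it.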
These complementary models temper the core model’s imagination with real-world grounding in organizational specifics, and they verify the outputs the core model produces.
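A hedged sketch of such a verifier, reusing GPT-4 itself as an entailment-style checker (in production this could equally be a smaller, purpose-trained model):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def is_grounded(draft_answer: str, context: str) -> bool:
    # Ask a second model pass whether the draft answer is fully supported
    # by the retrieved context; an unsupported answer can be replaced with
    # a safe fallback instead of being shown to the user.
    verdict = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Reply with exactly one word, SUPPORTED or "
                        "UNSUPPORTED: is the answer fully backed by the "
                        "context?"},
            {"role": "user",
             "content": f"Context:\n{context}\n\nAnswer:\n{draft_answer}"},
        ],
    )
    return verdict.choices[0].message.content.strip().upper() == "SUPPORTED"

context = "Conference room 'Elvis Presley' is on floor 3 of the Toronto office."
print(is_grounded("The Elvis Presley room is on floor 3 in Toronto.", context))
```

When the check fails, the system can fall back to “I don’t know” or escalate to a human rather than present an ungrounded answer to the user.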
With the right vernier thrusters in place, enterprises can get these high-powered rockets off the ground and steer them in the right direction.
About the Author
Varun Singh is the President and co-founder of Moveworks — the leading AI copilot for the enterprise. Varun oversees the Product Management, Product Design, Customer Success, and Professional Services functions, and is committed to delivering the best possible AI-powered support experience to enterprises worldwide. He holds a Ph.D. in Engineering and Design Optimization from the University of Maryland, College Park, and a Master’s degree in Engineering and Applied Mathematics from UCLA.