In the Large Language Model space, one acronym is frequently put forward as the solution to all of its weaknesses. Hallucinations? RAG. Privacy? RAG. Confidentiality? RAG. Unfortunately, ask people to define RAG and the definitions are all over the place.
RAG stands for Retrieval Augmented Generation. At its core, RAG is a simple concept. However, it is a simple concept that can be implemented a hundred different ways, and that might be the problem. RAG is often defined as a specific solution – while in reality, RAG is an approach with multiple implementations.
RAG takes an open-ended question – one that relies on the training data of the model to give you an answer – and turns it into an in-context question. An in-context question includes everything needed to answer the question – within the question itself.
For example, an open-ended question might be: “When was Obama the President of the USA?” To turn this into an in-context question, a list of the terms of all US Presidents might be supplied together with the question: “This is a list of the presidents of the US and the periods they were in power; use it to answer: when was Obama the President of the USA?”
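In practice, this is just prompt construction: the retrieved content is placed in front of the question before the combined text is sent to the model. The Python sketch below illustrates the idea; the template wording and the example list of terms are illustrative assumptions, not any particular product's format.

```python
# Minimal sketch: turning an open-ended question into an in-context question
# by prepending retrieved content to the prompt. The template wording and the
# example context are illustrative only.

def build_in_context_question(question: str, retrieved_context: str) -> str:
    """Combine retrieved content and the question into a single prompt."""
    return (
        "Use only the context below to answer the question.\n\n"
        f"Context:\n{retrieved_context}\n\n"
        f"Question: {question}"
    )

presidential_terms = (
    "George W. Bush: 2001-2009\n"
    "Barack Obama: 2009-2017\n"
    "Donald Trump: 2017-2021"
)
prompt = build_in_context_question(
    "When was Obama the President of the USA?", presidential_terms
)
# 'prompt' is what gets sent to the LLM instead of the bare question.
```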
Another example is the open-ended question: “Does my travel insurance cover rock climbing in Chile?” There is no way an LLM knows which insurance company you use, which policy you hold, or what that specific policy says. However, if the appropriate policy is retrieved and supplied as context together with the question “Does this specific insurance policy cover rock climbing in Chile?”, then the open-ended question that a large model can’t answer has been turned into a specific question that it can.
Turning these open-ended questions into in-context questions requires orchestration. This orchestration includes a retrieval step, where content that might contain the answer is retrieved and added as context to the question before it is sent to the LLM.
The retrieval step can be implemented as a simple keyword search in a search engine, a semantic search in a vector database, or even a series of steps where content is retrieved from a knowledge graph. RAG is simply any solution that adds this retrieval step. The important aspect is that the data being retrieved does not need to be part of the large language model’s training data.
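Put together, the orchestration is a small retrieve-then-generate loop. The sketch below uses a toy keyword search over an in-memory document list and a placeholder for the model call; a real system would swap in a search engine, vector database, or knowledge graph, and an actual LLM API.

```python
# Minimal retrieve-then-generate sketch. The document list, the keyword scoring,
# and call_llm() are placeholders standing in for a real search backend and LLM API.

from typing import List

DOCUMENTS = [
    "Policy A: rock climbing is covered worldwide up to 3,000 metres.",
    "Policy B: winter sports are excluded outside the EU.",
]

def retrieve(question: str, docs: List[str], top_k: int = 1) -> List[str]:
    """Toy keyword retrieval: rank documents by word overlap with the question."""
    words = set(question.lower().split())
    ranked = sorted(
        docs,
        key=lambda d: len(words & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def call_llm(prompt: str) -> str:
    """Placeholder for a call to whichever LLM or API the deployment uses."""
    return "<model answer based on the supplied context>"

def answer(question: str) -> str:
    context = "\n".join(retrieve(question, DOCUMENTS))
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return call_llm(prompt)

print(answer("Does my travel insurance cover rock climbing in Chile?"))
```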
A lot of business logic can be added to the retrieval step as well. Searches and queries can be constrained to a specific set of documents, or according to privacy and confidentiality rules. The retrieval step can be local and on-premise, while the answer generation can use APIs or cloud-hosted models. Just keep in mind that the retrieved content must be sent together with the instruction to the large language model.
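As a rough illustration of such constraints, retrieval can filter on document metadata before any ranking happens. The field names and the department-based permission check below are assumptions made for the example, not a prescribed design.

```python
# Sketch: constraining retrieval with business rules. The metadata fields and the
# permission model are illustrative assumptions.

from dataclasses import dataclass
from typing import List

@dataclass
class Document:
    text: str
    department: str
    confidential: bool

def constrained_retrieve(
    question: str, docs: List[Document], user_department: str, top_k: int = 3
) -> List[str]:
    """Search only documents the user's department may see, excluding confidential ones."""
    allowed = [
        d for d in docs
        if d.department == user_department and not d.confidential
    ]
    words = set(question.lower().split())
    ranked = sorted(
        allowed,
        key=lambda d: len(words & set(d.text.lower().split())),
        reverse=True,
    )
    return [d.text for d in ranked[:top_k]]
```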
The hype around RAG as an eliminator of LLM weaknesses is partly warranted. Hallucinations can be greatly reduced, enterprises can use their own data, and answers can be more grounded in factual sources. However, different categories of questions are better solved by different implementations of RAG. Enterprises need to evaluate what kinds of questions they expect and what kind of source material contains the answers, and then build what can become complex orchestrations to implement the appropriate RAG patterns. Suddenly, it’s a project that needs domain experts, developers, data scientists, quality assurance, testing, and lifecycle management. It becomes an IT project.
It’s also not the be-all and end-all of useful LLMs. A RAG system can still hallucinate; it just hallucinates a lot less. RAG is also a question-and-answer technique – it can’t magically call APIs, create plans, or reason. Outside of Q&A, there are other techniques that might yield better results.
About the Author
Magnus Revang is Chief Product Officer of Openstream.ai. With 25 years of rich experience in UX strategy, design, and groundbreaking market research in artificial intelligence, Magnus is an award-winning product leader. As a recognized thought leader in the fields of UX, AI, and Conversational Virtual Assistants, Magnus Revang leads Openstream’s Eva™ (Enterprise Virtual Assistant) platform, working with customers and driving the future of the platform.