NOTES :: Retrieval Augmented Generation (RAG)

A solution pattern for leveraging large language models (LLMs).
Systems that use an LLM, but on their own content.

Start

Large language models can be inconsistent. Sometimes they nail the answer to questions, other times they regurgitate random facts from their training data. If they occasionally sound like they have no idea what they’re saying, it’s because they don’t. LLMs know how words relate statistically, but not what they mean. Retrieval-Augmented Generation (RAG) is the process of optimizing the output of a large language model (LLM), so it references an authoritative knowledge base outside of its training data sources before generating a response. RAG extends the already powerful capabilities of LLMs to specific domains or an organization’s internal knowledge base, all without the need to retrain the model.

Question Vector

When a user asks a model a question, the question is turned into a series of numbers (a vector) that best represents it in numerical form. This question vector is then “matched” to the closest “answer” vector or vectors.
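To make "turned into a series of numbers" a bit more concrete, here is a minimal Python sketch. It assumes a toy word-count "embedding" over a tiny made-up vocabulary. The names VOCABULARY and embed are hypothetical, and a real system would call a trained embedding model instead of counting words.

```python
import re

# Hypothetical, tiny vocabulary; a real embedding model learns its own features.
VOCABULARY = ["build", "cabinet", "kitchen", "internet", "pc", "wood"]


def embed(text: str) -> list[int]:
    """Toy embedding: count how often each vocabulary word appears in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [words.count(term) for term in VOCABULARY]


question_vector = embed("How do I build a kitchen cabinet")
print(question_vector)  # [1, 1, 1, 0, 0, 0]
```

Two questions that share words end up with similar vectors here. Trained embeddings go further and place texts with similar meaning close together, even when the wording differs.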

In our very recent past (at the time of this post), a search engine would “travel the internet” for you when you asked it a question. It would present results back to you in the order it thought was most relevant to your question. These could have been just a list of links, or links with a short description, and eventually pictures were returned alongside the text, giving the user the option of choosing which “answer” they wished to explore or learn more about. Search engines were able to engage more and more senses over time.

So a large language model does all of this and then…

in effect, it clicks on the links, reads the text, and works its magic to directly answer the question you asked.

So if I searched for “How do I build a kitchen cabinet”:

A search engine would have given me links to how-to sites, maybe some diagrams, some pictures, etc.
Now, an LLM would give you the diagrams and maybe the list of parts you would need: how much wood to buy, and where to make the cuts so that things are sized right. The more specific and direct your question, the better the results can be for you.

So what if you want to use a closed source of data….?

Maybe it is your IT ticket system?
Maybe it is your Enterprise Architecture Project Board?
What if it is your internal sales data?

Can we present relevant information to the user, based on the private data that we have?

We would need to add a set or sets of instructions to the bot along with the prompt from the user. In other words, we would refine or filter the user prompt so that it is focused on a particular set of data, or a specific function.

User Prompt :: I need help with my PC’s Internet

Instruction :: This is a customer service prompt, meant to answer employee questions from the internet that center around their hardware… (it’s a bad example, but hopefully you get the point.)

Additional :: This is the knowledge base used by the technical support teams, with all the process and procedure data.
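The three pieces above can be stitched into a single piece of text before it goes to the model. Here is a minimal Python sketch, assuming the model simply accepts one text prompt. The function build_prompt, the layout of the prompt, and the knowledge base excerpt are all made up for illustration and are not any vendor's API.

```python
def build_prompt(instruction: str, knowledge: list[str], user_prompt: str) -> str:
    """Combine the instruction, knowledge excerpts, and user prompt into one text."""
    knowledge_block = "\n".join(f"- {item}" for item in knowledge)
    return (
        f"Instruction :: {instruction}\n"
        f"Additional ::\n{knowledge_block}\n"
        f"User Prompt :: {user_prompt}"
    )


prompt = build_prompt(
    instruction="Answer employee hardware questions using only the excerpts below.",
    # Made-up excerpt standing in for real knowledge base content.
    knowledge=["Made-up excerpt: restart the router before opening a ticket."],
    user_prompt="I need help with my PC's Internet",
)
print(prompt)
```

In practice the knowledge list would hold only the pieces of the knowledge base relevant to the question, not the whole thing, since the model can only read a limited amount of text at once.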

All of the content needs to be broken up into chunks. Those chunks are sent to an embedding model (usually a companion to the large language model), which turns each one into a vector. Each chunk will have a vector, which is just a series of numbers: a numeric representation of that paragraph, or a hash if you will. But these hashes are more like an SSDeep hash, where similar chunks of data have similar-looking hashes, so that they can be grouped together.
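Breaking content into chunks can be as simple as cutting it every so many words. Here is a minimal Python sketch, assuming fixed-size word chunks. The function chunk_text and the max_words default are hypothetical choices for illustration.

```python
def chunk_text(text: str, max_words: int = 50) -> list[str]:
    """Split text into chunks of at most max_words words each."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]


document = "one two three four five six seven"
chunks = chunk_text(document, max_words=3)
print(chunks)  # ['one two three', 'four five six', 'seven']
```

Real pipelines often split on paragraphs or sentences instead, and let neighbouring chunks overlap a little so that an idea is not cut in half. Each chunk would then be passed to the embedding model to get its vector.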

The question vector is then compared with the vectors of your content, and the ‘x’ data chunks whose vectors are closest to the question vector are returned.
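One common way to measure "closest" is cosine similarity, which compares the direction of two vectors. Here is a minimal Python sketch, assuming cosine similarity as the measure (the notes above do not say which one is used). The chunk names and vectors are made-up numbers, and top_x is a hypothetical helper.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_x(question_vector: list[float],
          chunk_vectors: dict[str, list[float]],
          x: int = 2) -> list[str]:
    """Return the names of the x chunks most similar to the question vector."""
    ranked = sorted(
        chunk_vectors,
        key=lambda name: cosine_similarity(question_vector, chunk_vectors[name]),
        reverse=True,
    )
    return ranked[:x]


# Made-up vectors standing in for what an embedding model would produce.
chunk_vectors = {
    "router reset steps": [0.9, 0.1, 0.0],
    "vacation policy": [0.0, 0.2, 0.9],
    "wifi troubleshooting": [0.8, 0.3, 0.1],
}
question_vector = [1.0, 0.2, 0.0]
print(top_x(question_vector, chunk_vectors, x=2))
# ['router reset steps', 'wifi troubleshooting']
```

This compares the question against every chunk, which is fine for small collections. Larger collections usually rely on a vector database or an approximate nearest-neighbour index to do the same job faster.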

Retrieval :: You are retrieving the relevant data from your content.

Augmented :: You are augmenting the AI’s generation capability with the information contained within your data.

Generation :: The model generates its answer from the user’s prompt plus the retrieved data.

Links ::
https://www.youtube.com/watch?v=u47GtXwePms
https://research.ibm.com/blog/retrieval-augmented-generation-RAG
https://blogs.nvidia.com/blog/what-is-retrieval-augmented-generation/
https://aws.amazon.com/what-is/retrieval-augmented-generation/#:~:text=Retrieval%2DAugmented%20Generation%20(RAG),and%20useful%20in%20various%20contexts