Exploring retrieval-augmented generation (RAG)
I wanted to explore Retrieval-Augmented Generation, so I set up Romeo, a little Flask app that only knows the text from Romeo and Juliet.
What is Retrieval-Augmented Generation (RAG)?
Retrieval-Augmented Generation (RAG) combines retrieval over a known body of information with a large language model to generate contextually accurate responses. This approach is useful in several common scenarios, including:
- Improving search results
- Providing assistance in legal or medical contexts
- Summarizing large collections of documents
Large language models are only as current as their training data at a particular point in time. RAG allows for dynamic updates to information, which can be particularly valuable when dealing with rapidly evolving topics. Additionally, large language models do not have direct access to private documents; these can be loaded into a RAG database, providing a secure and controlled way to incorporate sensitive or proprietary information into the generation process.
Key Components of RAG
Retrieve Data
Gather relevant data from a predefined source, forming the knowledge base. In my case, retrieve the text of “Romeo and Juliet” from Project Gutenberg.
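A minimal sketch of this step (the Gutenberg URL and helper name are just for illustration, not Romeo's actual code):

```python
import requests

# Project Gutenberg hosts a plain-text edition of "Romeo and Juliet" (eBook #1513).
# The exact URL is an assumption; any plain-text copy of the play will do.
GUTENBERG_URL = "https://www.gutenberg.org/cache/epub/1513/pg1513.txt"

def fetch_play(url: str = GUTENBERG_URL) -> str:
    """Download the raw text of the play."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.text

raw_text = fetch_play()
```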
Parse (Chunk) Data
Segment the retrieved data into manageable chunks, such as sentences or paragraphs. Use configurable parameters to determine chunk size and overlap for optimal retrieval. I use my Text-Extractor for this step.
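For illustration, a standalone chunker with configurable chunk size and overlap might look like this naive sketch (not Text-Extractor itself):

```python
import re

def chunk_sentences(text: str, sentences_per_chunk: int = 5, overlap: int = 1) -> list[str]:
    """Split text into sentences, then group them into overlapping chunks."""
    # Naive sentence splitting on end-of-sentence punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    step = max(sentences_per_chunk - overlap, 1)
    chunks = []
    for start in range(0, len(sentences), step):
        chunk = " ".join(sentences[start:start + sentences_per_chunk])
        if chunk:
            chunks.append(chunk)
    return chunks

chunks = chunk_sentences(raw_text, sentences_per_chunk=5, overlap=1)  # raw_text from the previous sketch
```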
Generate Vectors
Utilize an embedding model that excels at capturing the semantic nuances of text. I used nomic-embed-text. It is important to note that the same model must be used throughout the process to ensure consistency of results.
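With Ollama running locally, embedding each chunk is roughly this (the helper name is mine, and it assumes the model has been pulled with `ollama pull nomic-embed-text`):

```python
import ollama

def embed(text: str) -> list[float]:
    """Embed text with the same model used for both chunks and queries."""
    # Assumes a local Ollama instance with the nomic-embed-text model available.
    response = ollama.embeddings(model="nomic-embed-text", prompt=text)
    return response["embedding"]

chunk_vectors = [embed(chunk) for chunk in chunks]  # chunks from the previous step
```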
Save to a Vector DB
Store the generated vectors in a vector database. I use ChromaDB deployed on my home server. It is easy to manage, open-source, and optimized for storing and querying high-dimensional vectors.
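Storing the chunks and their vectors in ChromaDB looks roughly like this (the host, port, and collection name are placeholders for my setup):

```python
import chromadb

# Connect to a ChromaDB server; chromadb.PersistentClient(path=...) works for a local store too.
client = chromadb.HttpClient(host="homeserver.local", port=8000)
collection = client.get_or_create_collection(name="romeo_and_juliet")

# Store each chunk alongside its pre-computed embedding.
collection.add(
    ids=[f"chunk-{i}" for i in range(len(chunks))],
    documents=chunks,
    embeddings=chunk_vectors,
)
```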
Receive Request
Have some way for a user to submit a request. In my case, a simple Flask App with a form.
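The Flask side is a single route with a form, roughly like the sketch below (the template name and the answer_question helper, defined in the last step, are placeholders):

```python
from flask import Flask, render_template, request

app = Flask(__name__)

@app.route("/", methods=["GET", "POST"])
def ask():
    answer = None
    if request.method == "POST":
        question = request.form["question"]
        answer = answer_question(question)  # RAG pipeline from the steps below
    return render_template("ask.html", answer=answer)
```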
Generate Vectors for the Request
Convert the user query into a vector representation using the same embedding model, ensuring compatibility with the stored data vectors.
Retrieve Closest Vectors from the Vector DB
Query the vector database for the most similar vectors, retrieving the most semantically relevant chunks.
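Putting these two steps together, the query is embedded with the same model and the closest chunks come back from ChromaDB (the function name and result count are illustrative):

```python
def retrieve_context(question: str, n_results: int = 20) -> list[str]:
    """Embed the question, then pull the most similar chunks from the vector DB."""
    query_vector = embed(question)  # same nomic-embed-text embedding as the chunks
    results = collection.query(query_embeddings=[query_vector], n_results=n_results)
    return results["documents"][0]  # documents matching the first (and only) query
```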
Generate Final Response Using Retrieved Information
Use the retrieved data chunks to generate a contextually accurate response. In Romeo, I use llama3, combining the information returned from the vector DB with the user's question to produce a coherent answer.
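Roughly, this final step stitches the retrieved chunks into a prompt and hands it to llama3 via Ollama (the prompt wording here is an illustration, not Romeo's exact prompt):

```python
import ollama

def answer_question(question: str) -> str:
    """Combine retrieved chunks with the question and ask llama3 for an answer."""
    context = "\n\n".join(retrieve_context(question))
    prompt = (
        "Answer the question using only the excerpts from Romeo and Juliet below.\n\n"
        f"Excerpts:\n{context}\n\nQuestion: {question}"
    )
    response = ollama.generate(model="llama3", prompt=prompt)
    return response["response"]
```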
Grading the model
I started with 10 questions pulled from a quick internet search for a "Romeo and Juliet" quiz:
- Fill in the blank: “These violent delights have violent ends, And in their triumph die, like _____, / Which as they kiss consume.”
- To which city does Romeo flee after he is banished for killing Tybalt?
- How old is Juliet?
- Who gives a famous speech about a fairy named Queen Mab?
- Why is Friar John unable to deliver a letter to Romeo informing him of Friar Laurence and Juliet’s plan to simulate Juliet’s death?
- Which characters are involved in the fight that gets Romeo banished?
- Why doesn’t Juliet want to marry Paris?
- What qualities describe Romeo?
- What steps are crucial to making Friar Lawrence’s plan work?
- What are significant themes in the play?
I then set up two students, Dave and Andy. Both use 100 sentences of retrieved context to answer each quiz question; however, Dave uses 20 chunks of 5 sentences each, while Andy uses a single chunk of 100 sentences.
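In terms of the pipeline parameters above, the two setups amount to something like this (an illustrative summary, not Romeo's actual configuration format):

```python
# Both students see roughly 100 sentences of context per question.
students = {
    "Dave": {"sentences_per_chunk": 5, "n_results": 20},   # 20 small chunks
    "Andy": {"sentences_per_chunk": 100, "n_results": 1},  # 1 large chunk
}
```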
I had an LLM compare their answers.
Comparison of Dave and Andy’s Responses to “Romeo and Juliet” Quiz:
Strengths and Key Differences:
Dave: Dave showcased a strong command of the text, excelling in:
- Accurate Recollection of Plot Details: He correctly identified specific plot events and the roles of characters within these events.
- Direct Quotations: Dave accurately filled in critical quotations from the play, demonstrating his attention to the language used by Shakespeare.
Andy: Andy demonstrated a good grasp of broader thematic elements and character analysis:
- Character Descriptions: He effectively captured the emotional depth and romantic nature of Romeo.
- Thematic Insight: Andy provided a nuanced understanding of the play’s themes, such as love versus hate and fate.
Educational Feedback:
Dave’s precision with factual details suggests a deep textual engagement, but he could benefit from exploring the play’s thematic implications more deeply. Andy’s strength in thematic interpretation is evident, yet he would benefit from a closer study of specific plot details to avoid inaccuracies.
Both participants display commendable knowledge, with Dave excelling in detail-oriented accuracy and Andy in thematic and character analysis. By addressing their respective areas for improvement, each could achieve a more rounded understanding of “Romeo and Juliet.”
Lessons Learned
- Chunk Size Matters: Smaller chunks enable more precise retrieval of detailed information, as demonstrated by Dave's responses, while larger chunks provide a broader understanding of themes and character insights, as observed in Andy's answers.
- Vector Database Efficiency: Vector databases are optimized for storing and querying high-dimensional vectors, which keeps similarity search fast as the knowledge base grows.
- Training and Retraining: Continual improvement is necessary for a RAG system. Planning for retraining is essential; this includes preserving the original data and its associated metadata.