January 13, 2025
Disclaimer: This series explores concepts and creative solutions for building Advanced Retrieval Augmented Generation (RAG) Systems. If you’re new to RAG systems, I recommend exploring some introductory materials first. Microsoft’s Introduction to RAG in AI development and Google Cloud’s free course on Retrieval Augmented Generation are good places to start.
Welcome back to my deep dive into advanced RAG systems. In my previous posts, I explored document splitting and handling visual elements. Now, we’re going to discuss a technique that can significantly enhance your RAG system’s understanding and response quality – document summaries.
To better understand how summaries can help, we first need to address a fundamental limitation of RAG systems. When a RAG retriever returns matches from the semantic search algorithm, it often misses relevant context-specific information. This happens because the retriever focuses narrowly on finding document chunks whose text matches the specific wording of the user’s prompt. Simply put, if the user doesn’t provide sufficient detail in their request, the retriever will struggle to return specific information and will tend toward broad, non-specific results. Semantic matching is very effective when users understand how to prompt an LLM properly. However, we need to build systems that also serve users who haven’t developed this skill yet.
Consider a simple question: ‘What color is the sky?’ While the conventional answer might be ‘blue,’ this response lacks context. Are we discussing the sky’s general appearance? Its current color at a specific location and time? Or do we mean the sky on the planet Venus? Inexperienced chatbot users will often phrase their questions narrowly. Most will assume a chatbot understands its operational context, just as they would when speaking to a human. When developing RAG systems, we need to account for these implicit contextual assumptions.
One approach to improving the quality of results is to provide document summaries. More specifically, whenever the retriever returns one or more chunks from a document, we also insert a complete summary of that document alongside the chunks. Let me illustrate this with a real-world scenario I encountered:
Note: The data is anonymized, and the retrieved chunks and summary have been simplified for brevity.
User query: Why did project Aurora overrun its budget?
Retrieved RAG data:
– The project exceeded its projected budget by 40%.
– The development team exceeded their projected hours by 127% in Q2.
– Sarah implemented an aggressive sprint schedule to make up for lost time.

LLM response:
The project exceeded its projected budget by 40% primarily because the development team exceeded their projected hours by 127% in Q2.
While technically accurate, this response misses crucial context. The user asked why, which implies some analysis of the underlying causes, not simply a restatement of the figures. Now let’s see what happens when we add an LLM-generated document summary:
Document summary added:
Project Aurora was a software migration project launched in 2021. The original technical lead left unexpectedly in March. The new lead had to spend two months learning the legacy system, which wasn’t accounted for in the original timeline or budget. Despite the team’s best efforts to accelerate development afterward, the project exceeded its budget by 40%.

Enhanced LLM response:
The project exceeded its projected budget by 40% primarily because the original technical lead left unexpectedly in March, necessitating the new lead to spend two months learning the legacy system. This delay was not accounted for in the original timeline or budget. Despite the team’s efforts to accelerate development, they were unable to compensate for the earlier setback, leading to the budget overrun.
With the summary included, the LLM can provide a more complete and nuanced explanation that captures the root cause of the budget overrun, not just its symptoms.
When implementing document summaries in a RAG system, we generate them during the initial document ingestion phase. Each complete document is processed by an LLM acting as a “Summary Agent.” The Summary Agent is tasked with creating concise, well-structured summaries that capture essential information contained in a document and the document’s overall context. Summaries are not vectorized but rather stored in a separate database. The retriever collects the summary when document chunks from that document are semantically matched with a user prompt.
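Concretely, a minimal sketch of this pipeline might look like the following. The in-memory stores, the llm_complete callable, and the fixed-size chunking are illustrative assumptions standing in for your actual vector database, LLM client, and document splitter:

```python
from typing import Callable

# A minimal sketch, assuming an in-memory index and a provider-agnostic
# llm_complete callable; these names are illustrative placeholders.

chunk_index: list[dict] = []         # stand-in for a vector store
summary_store: dict[str, str] = {}   # doc_id -> whole-document summary

def ingest_document(doc_id: str, text: str,
                    llm_complete: Callable[[str], str],
                    chunk_size: int = 500) -> None:
    """Index fixed-size chunks and store one LLM-written summary per document."""
    for start in range(0, len(text), chunk_size):
        chunk_index.append({"doc_id": doc_id, "text": text[start:start + chunk_size]})
    prompt = ("Summarize the following document in at most 150 words, "
              "covering its essential facts and overall context:\n\n" + text)
    summary_store[doc_id] = llm_complete(prompt)   # stored separately, never vectorized

def build_context(matched_chunks: list[dict]) -> str:
    """Prepend each source document's summary to the retrieved chunks."""
    doc_ids = {chunk["doc_id"] for chunk in matched_chunks}
    parts = [f"Summary of {d}: {summary_store[d]}" for d in sorted(doc_ids)]
    parts += [f"Chunk: {chunk['text']}" for chunk in matched_chunks]
    return "\n\n".join(parts)
```

At query time you run your semantic search as usual, then pass its matched chunks through build_context so the LLM sees each source document’s summary ahead of the chunks.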
The quality of your summaries depends heavily on providing clear, consistent instructions to your Summary Agent. Your prompt should instruct the LLM to focus on:
– capturing the essential information contained in the document;
– conveying the document’s overall context;
– keeping the structure and format consistent across all summaries;
– staying concise.
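As a starting point, a Summary Agent prompt along these lines covers those points; the 150-word limit and the exact wording are assumptions to tune against your own corpus:

```python
# One possible Summary Agent prompt; the word limit and phrasing are
# starting-point assumptions, not fixed requirements.
SUMMARY_AGENT_PROMPT = """\
You are a Summary Agent. Summarize the document below in at most 150 words.
- Capture the essential facts, decisions, and outcomes.
- State the document's overall context: what it is, who it concerns, and when.
- Use the same structure and tone for every document.
- Do not add information that is not in the document.

Document:
{document_text}
"""
```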
The most common issue I’ve encountered when implementing document summaries with RAG is overly verbose summaries. These can significantly increase your input token count when the retriever references multiple documents in a single prompt. To control the cost of LLM queries, it is important to enforce short summaries and to constrain the maximum number of documents retrieved for a single prompt.
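Two simple guards, sketched here with assumed limits, help enforce this: hard-cap each summary’s length in case the model ignores the word limit, and when many documents match, keep summaries only for the most frequently matched ones:

```python
from collections import Counter

MAX_SUMMARY_WORDS = 150   # assumed budget; tune against your token costs
MAX_DOCS_PER_PROMPT = 3   # assumed cap on summaries per query

def trim_summary(summary: str, max_words: int = MAX_SUMMARY_WORDS) -> str:
    """Hard-cap a summary's length in case the model ignores the word limit."""
    words = summary.split()
    if len(words) <= max_words:
        return summary
    return " ".join(words[:max_words]) + " ..."

def pick_documents(matched_chunks: list[dict],
                   max_docs: int = MAX_DOCS_PER_PROMPT) -> list[str]:
    """Keep summaries only for the documents matched most often."""
    counts = Counter(chunk["doc_id"] for chunk in matched_chunks)
    return [doc_id for doc_id, _ in counts.most_common(max_docs)]
```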
When integrating summaries with your RAG system, focus on finding the right balance. You want to provide enough context through the summary without overwhelming the LLM with redundant information. The goal of adding summaries isn’t to make responses more accurate but rather to make responses more contextually aware.
Document summaries are a powerful tool for enhancing RAG system responses. However, their effectiveness depends heavily on careful implementation. By focusing on consistent formatting and capturing only essential information, you can significantly improve the contextual understanding of your RAG system’s responses.
In the next post, I’ll explore the power of metadata in RAG systems. I’ll examine how metadata can enhance document retrieval and optimize semantic search results. Finally, I’ll demonstrate how metadata can be used to benchmark RAG systems and make chatbot behavior more explainable.
Technical Lead – Robotics & AI | France