January 13, 2025
Disclaimer: This series explores concepts and creative solutions for building Advanced Retrieval Augmented Generation (RAG) Systems. If you’re new to RAG systems, I recommend exploring some introductory materials first. Microsoft’s Introduction to RAG in AI development and Google Cloud’s free course on Retrieval Augmented Generation are good places to start.
Welcome back to my deep dive into advanced RAG systems. In my previous posts, I explored document splitting and handling visual elements. Now, we’re going to discuss a technique that can significantly enhance your RAG system’s understanding and response quality – document summaries.
To better understand how summaries can help, we first need to address a fundamental limitation of RAG systems. When a RAG retriever returns matches from the semantic search algorithm, it often misses relevant context-specific information. This happens because the retriever focuses narrowly on finding document chunks whose text matches the specific wording of the user’s prompt. Simply put, if the user doesn’t provide sufficient detail in their request, the retriever will struggle to return specific information and will tend toward broad, non-specific results. Semantic matching is very effective when users understand how to prompt an LLM properly. However, we need to build systems that also serve users who haven’t developed this skill yet.
Consider a simple question: ‘What color is the sky?’ While the conventional answer might be ‘blue,’ this response lacks context. Are we discussing the sky’s general appearance? Its current color at a specific location and time? Or do we mean the sky on the planet Venus? Inexperienced chatbot users will often phrase their questions narrowly. Most will assume a chatbot understands its operational context, just as they would when speaking to a human. When developing RAG systems, we need to account for these implicit contextual assumptions.
One approach to improving the quality of results is to provide document summaries. More specifically, whenever the retriever returns one or more chunks from a document, we also insert a complete summary of that document alongside the chunks. Let me illustrate this with a real-world scenario I encountered:
Note: The data is anonymized, and the retrieved chunks and summary have been simplified for brevity.
User query: Why did project Aurora overrun its budget?
Retrieved RAG data:
– The project exceeded its projected budget by 40%.
– The development team exceeded their projected hours by 127% in Q2.
– Sarah implemented an aggressive sprint schedule to make up for lost time.

LLM response:
The project exceeded its projected budget by 40% primarily because the development team exceeded their projected hours by 127% in Q2.
While technically accurate, this response misses crucial context. The user asked why, which implies some analysis of the underlying causes, not simply a restatement of the figures. Now let’s see what happens when we add an LLM-generated document summary:
Document summary added:
Project Aurora was a software migration project launched in 2021. The original technical lead left unexpectedly in March. The new lead had to spend two months learning the legacy system, which wasn’t accounted for in the original timeline or budget. Despite the team’s best efforts to accelerate development afterward, the project exceeded its budget by 40%.

Enhanced LLM response:
The project exceeded its projected budget by 40% primarily because the original technical lead left unexpectedly in March, necessitating the new lead to spend two months learning the legacy system. This delay was not accounted for in the original timeline or budget. Despite the team’s efforts to accelerate development, they were unable to compensate for the earlier setback, leading to the budget overrun.
With the summary included, the LLM can provide a more complete and nuanced explanation that captures the root cause of the budget overrun, not just its symptoms.
When implementing document summaries in a RAG system, we generate them during the initial document ingestion phase. Each complete document is processed by an LLM acting as a “Summary Agent.” The Summary Agent is tasked with creating concise, well-structured summaries that capture essential information contained in a document and the document’s overall context. Summaries are not vectorized but rather stored in a separate database. The retriever collects the summary when document chunks from that document are semantically matched with a user prompt.
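Concretely, a minimal sketch of this pipeline might look like the following. The in-memory stores, the llm_complete callable, and the fixed-size chunking are illustrative assumptions standing in for your actual vector database, LLM client, and document splitter:

```python
from typing import Callable

# A minimal sketch, assuming an in-memory index and a provider-agnostic
# llm_complete callable; these names are illustrative placeholders.

chunk_index: list[dict] = []         # stand-in for a vector store
summary_store: dict[str, str] = {}   # doc_id -> whole-document summary

def ingest_document(doc_id: str, text: str,
                    llm_complete: Callable[[str], str],
                    chunk_size: int = 500) -> None:
    """Index fixed-size chunks and store one LLM-written summary per document."""
    for start in range(0, len(text), chunk_size):
        chunk_index.append({"doc_id": doc_id, "text": text[start:start + chunk_size]})
    prompt = ("Summarize the following document in at most 150 words, "
              "covering its essential facts and overall context:\n\n" + text)
    summary_store[doc_id] = llm_complete(prompt)   # stored separately, never vectorized

def build_context(matched_chunks: list[dict]) -> str:
    """Prepend each source document's summary to the retrieved chunks."""
    doc_ids = {chunk["doc_id"] for chunk in matched_chunks}
    parts = [f"Summary of {d}: {summary_store[d]}" for d in sorted(doc_ids)]
    parts += [f"Chunk: {chunk['text']}" for chunk in matched_chunks]
    return "\n\n".join(parts)
```

At query time you run your semantic search as usual, then pass its matched chunks through build_context so the LLM sees each source document’s summary ahead of the chunks.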
The quality of your summaries depends heavily on providing clear, consistent instructions to your Summary Agent. Your prompt should instruct the LLM to focus on:
– capturing the essential information contained in the document;
– conveying the document’s overall context;
– keeping the structure and format consistent across all summaries;
– staying concise.
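As a starting point, a Summary Agent prompt along these lines covers those points; the 150-word limit and the exact wording are assumptions to tune against your own corpus:

```python
# One possible Summary Agent prompt; the word limit and phrasing are
# starting-point assumptions, not fixed requirements.
SUMMARY_AGENT_PROMPT = """\
You are a Summary Agent. Summarize the document below in at most 150 words.
- Capture the essential facts, decisions, and outcomes.
- State the document's overall context: what it is, who it concerns, and when.
- Use the same structure and tone for every document.
- Do not add information that is not in the document.

Document:
{document_text}
"""
```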
The most common issue I’ve encountered when implementing document summaries with RAG is overly verbose summaries. These can significantly increase your input token count when the retriever references multiple documents in a single prompt. To control the cost of LLM queries, it is important to enforce short summaries and to constrain the maximum number of documents retrieved for a single prompt.
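Two simple guards, sketched here with assumed limits, help enforce this: hard-cap each summary’s length in case the model ignores the word limit, and when many documents match, keep summaries only for the most frequently matched ones:

```python
from collections import Counter

MAX_SUMMARY_WORDS = 150   # assumed budget; tune against your token costs
MAX_DOCS_PER_PROMPT = 3   # assumed cap on summaries per query

def trim_summary(summary: str, max_words: int = MAX_SUMMARY_WORDS) -> str:
    """Hard-cap a summary's length in case the model ignores the word limit."""
    words = summary.split()
    if len(words) <= max_words:
        return summary
    return " ".join(words[:max_words]) + " ..."

def pick_documents(matched_chunks: list[dict],
                   max_docs: int = MAX_DOCS_PER_PROMPT) -> list[str]:
    """Keep summaries only for the documents matched most often."""
    counts = Counter(chunk["doc_id"] for chunk in matched_chunks)
    return [doc_id for doc_id, _ in counts.most_common(max_docs)]
```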
When integrating summaries with your RAG system, focus on finding the right balance. You want to provide enough context through the summary without overwhelming the LLM with redundant information. The goal of adding summaries isn’t to make responses more accurate but rather to make responses more contextually aware.
Document summaries are a powerful tool for enhancing RAG system responses. However, their effectiveness depends heavily on careful implementation. By focusing on consistent formatting and capturing only essential information, you can significantly improve the contextual understanding of your RAG system’s responses.
In the next post, I’ll explore the power of metadata in RAG systems. I’ll examine how metadata can enhance document retrieval and optimize semantic search results. Finally, I’ll demonstrate how metadata can be used to benchmark RAG systems and make chatbot behavior more explainable.
Technical Lead – Robotics & AI | France