This Sunday Paper will actually be part of a trilogy of posts as I aim to make up for lost time. Each post will be released 24 hours after the last, with this post being the first. The second and third posts will focus on new LLM architectures and LLM Grokking respectively, so please stay tuned for those. Today, though, I am exploring several papers which aim to combine Retrieval Augmented Generation (RAG) with Knowledge Graphs (KGs) in order to decrease LLM hallucination. In addition, I wanted to read a few papers on modern prompting and prompt analysis, as these papers similarly aim to improve model performance without further training or fine-tuning. I hope you find these papers as interesting as I have.
Sunday: Brittle ReAct Prompting (paper)
This paper analyzes ReAct prompting through a series of ablations designed to determine the true source of the strategy’s value. These ablations reveal that the primary strength of ReAct lies in the similarity of the in-context examples to the current task, underscoring the continued importance of manual prompt construction, which is not ideal. Furthermore, the study finds that the structure of the information is not crucial; reasoning and acting can occur in unconventional orders or be grouped differently without significantly impacting performance. Overall, the paper suggests that ReAct prompting effectively degrades to Chain of Thought (CoT) prompting, and that example retrieval may matter far more than the specific language used within the prompts themselves.
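To make concrete what is being ablated, here is a minimal sketch of a ReAct-style in-context example; the task, the field names, and the search action are my own illustrative choices rather than anything taken from the paper.

```python
# A minimal, illustrative ReAct-style in-context example. The paper's ablations
# suggest that what matters most is how similar this example is to the incoming
# question, not the exact ordering or labeling of the Thought/Action/Observation
# fields.
REACT_EXAMPLE = """Question: What year was the author of 'Dune' born?
Thought: I need to find who wrote 'Dune', then find their birth year.
Action: search("Dune novel author")
Observation: 'Dune' was written by Frank Herbert.
Thought: Now I need Frank Herbert's birth year.
Action: search("Frank Herbert birth year")
Observation: Frank Herbert was born in 1920.
Answer: 1920"""

def build_prompt(question: str) -> str:
    """Prepend the in-context example to a new question."""
    return f"{REACT_EXAMPLE}\n\nQuestion: {question}\nThought:"
```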
Monday: Buffer of Thoughts (paper)
The Buffer of Thoughts prompting methodology introduces an approach that first distills an incoming query to its root problem type. By comparing this root problem type to similar problems stored in a buffer memory, the method retrieves a relevant thought template (and examples) to guide the response. Because unique experiences are written back into the buffer, the system forms a hybrid short/long-term memory and can actually improve over time. As a result, Buffer of Thoughts sets new state-of-the-art (SOTA) results on various prompting-based tasks while requiring far fewer large language model (LLM) calls than iterative prompting strategies, thus achieving high throughput.
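As a rough mental model of the retrieval loop, here is a toy sketch in Python; the buffer contents, the distill placeholder, and the use of string similarity in place of an LLM and learned embeddings are all my own simplifications.

```python
from difflib import SequenceMatcher

# A highly simplified sketch of Buffer of Thoughts-style retrieval. The real
# method uses an LLM to distill the query and learned embeddings to match it
# against stored thought templates; here string similarity stands in for both.
meta_buffer = {
    "solve a linear equation": "Isolate the variable by applying inverse operations to both sides.",
    "find a set intersection": "List both sets, then keep only the elements that appear in each.",
}

def distill(query: str) -> str:
    """Placeholder for the LLM call that reduces a query to its root problem type."""
    return query.lower()

def retrieve_template(query: str) -> str:
    """Return the thought template whose problem type best matches the distilled query."""
    root = distill(query)
    best_type = max(meta_buffer, key=lambda t: SequenceMatcher(None, root, t).ratio())
    return meta_buffer[best_type]

def update_buffer(problem_type: str, template: str) -> None:
    """Write a newly distilled template back so future queries can reuse it."""
    meta_buffer.setdefault(problem_type, template)

print(retrieve_template("Please solve the linear equation 3x + 5 = 20"))
```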
Tuesday: ImplicitCoT (paper)
ImplicitCoT introduces a new “training” methodology which aims to teach large language models (LLMs) to natively perform Chain of Thought (CoT) reasoning, reducing the need for long and explicit prompts. The approach in this paper is tuned for mathematics: the model is first trained to generate complete CoT responses, and is then iteratively taught to skip steps within its reasoning while maintaining the same final answer. At the end of this process the authors claim that the model retains its CoT reasoning capabilities without having to generate as many tokens. While these steps are straightforward in math and arithmetic tasks, generalizing this capability to more complex language tasks holds significant potential, provided one believes the model has genuinely learned to reason through this method. Further analysis will be needed to show that the model is not simply overfitting numerical data, but if that analysis holds it could prove to be a substantial finding. The main limit to language applications at the moment is stability: the training method currently requires consistency in reasoning patterns and output token counts, something which arithmetic can easily guarantee but language rarely can.
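To illustrate the curriculum, here is a toy sketch of how the training targets for a single arithmetic example might shrink from stage to stage; the example, the step format, and the choice to drop steps from the front are my own simplifications of the paper’s procedure.

```python
# A rough sketch of the data curriculum behind implicit CoT: the same arithmetic
# example is shown with progressively fewer explicit reasoning steps while the
# final answer is kept fixed. A real pipeline would fine-tune the model at each
# stage; here we only print what each stage's training target would look like.
question = "What is 12 * 34?"
cot_steps = [
    "12 * 34 = 12 * 30 + 12 * 4",
    "12 * 30 = 360",
    "12 * 4 = 48",
    "360 + 48 = 408",
]
answer = "408"

for n_removed in range(len(cot_steps) + 1):
    target = cot_steps[n_removed:] + [f"Answer: {answer}"]
    print(f"Stage {n_removed}: {' | '.join(target)}")
```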
Wednesday: Arithmetic Embeddings (paper)
The paper introduces Abacus Embeddings, a new approach based on absolute positional embeddings, but now with positional resets designed to enhance large language models’ (LLMs) ability to perform arithmetic. This method involves resetting the positional offset to zero at the end of each semantic unit, such as a K-digit number. Although currently demonstrated only for arithmetic tasks, this approach significantly improves state-of-the-art (SOTA) performance and holds potential for extension to language tasks if semantic units can be abstracted to words, sentences, and/or paragraphs. For those of you who have read earlier editions of the Sunday Paper, this may seem similar to the application of Contextual Positional Embeddings.
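A tiny sketch of the positional-reset idea is shown below; the character-level tokenization and the exact reset rule are my own simplifications (the paper’s digit ordering and offset details differ), but it conveys why every number ends up being seen with the same local positions.

```python
# A minimal sketch of positional resets: every digit gets an offset relative to
# the start of its own number, so the model sees the same positions for a
# number's digits regardless of where that number appears in the sequence.
# Tokenization here is character-level and purely illustrative.
def abacus_positions(tokens: list[str]) -> list[int]:
    positions = []
    offset = 0
    for token in tokens:
        if token.isdigit():
            positions.append(offset)
            offset += 1
        else:
            positions.append(0)  # non-digit token: take a default position
            offset = 0           # and reset so the next number starts from zero
    return positions

tokens = list("123+4567=")
print(list(zip(tokens, abacus_positions(tokens))))
# [('1', 0), ('2', 1), ('3', 2), ('+', 0), ('4', 0), ('5', 1), ('6', 2), ('7', 3), ('=', 0)]
```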
Thursday: HippoRAG (paper)
HippoRAG aims to revolutionize RAG pipelines through the use of language extraction techniques inspired by the hippocampus in order to convert information from a document corpus into a knowledge graph. This approach addresses multi-hop questions, where two disparate pieces of context reference the same entity and are both needed to formulate an accurate answer. Questions such as “Which French soccer players have won the World Cup?” fit within this category and tend to frustrate traditional retrieval algorithms. HippoRAG’s extraction method instead identifies language triples offline in the form of (head, relationship, tail); a common triple would be a (subject, verb, object) triple. These triples are then encoded within a Knowledge Graph and later retrieved for the LLM, resulting in significant performance improvements. It is worth noting that this method is limited to answering questions that reference at least one of these three parts of the triple, making it effective for 'Who', 'When', or 'What' questions, but less so for 'Why' questions.
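Here is a toy sketch of what the offline triple store and a naive retrieval step might look like; the hand-written triples and exact entity matching stand in for the LLM-based open information extraction and the graph-based retrieval used in the actual system.

```python
# A toy sketch of a (head, relationship, tail) triple store. Real HippoRAG-style
# pipelines extract these triples with an LLM and retrieve over a full knowledge
# graph; hand-written triples and substring matching stand in for both here.
triples = [
    ("Kylian Mbappe", "plays for", "France"),
    ("Kylian Mbappe", "won", "2018 World Cup"),
    ("France", "won", "2018 World Cup"),
]

def retrieve(entities: list[str]) -> list[tuple[str, str, str]]:
    """Return every stored triple whose head or tail mentions a query entity."""
    return [
        (h, r, t) for (h, r, t) in triples
        if any(e in h or e in t for e in entities)
    ]

# Entities extracted from "Which French soccer players have won the World Cup?"
print(retrieve(["France", "World Cup"]))
```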
Friday: GNN-RAG (paper)
GNN-RAG enhances knowledge graph (KG) retrieval for Retrieval-Augmented Generation (RAG) by employing a Graph Neural Network (GNN) classifier to determine the relevance of each node in the KG to the input question. The GNN identifies candidate answer nodes, which are then traced back to the question entities along the shortest paths between them. These paths form relevant context traces that can be used to generate accurate answers, effectively reducing potential errors and improving the overall performance of the retrieval process. I actually love the simplicity and elegance of this solution, as it can reduce the number of calls to much more expensive models. The primary issue I see with this methodology is the acquisition of initial training data for the GNN, especially if the graph is dynamic in nature. This is something that may be solved with the emergence of Graph Foundation Models.
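A simplified sketch of the retrieval step is below; the toy KG, the hard-coded node scores (standing in for a trained GNN), and the 0.5 threshold are all illustrative assumptions on my part.

```python
import networkx as nx

# A simplified sketch of GNN-RAG-style retrieval. A trained GNN would assign
# each KG node a relevance score for the question; here the scores are
# hard-coded. Candidate answer nodes are traced back to the question entity
# along shortest paths, and those paths become the context handed to the LLM.
kg = nx.Graph()
kg.add_edges_from([
    ("France", "Kylian Mbappe"),
    ("Kylian Mbappe", "2018 World Cup"),
    ("France", "Paris"),
])

node_scores = {"Kylian Mbappe": 0.9, "2018 World Cup": 0.2, "Paris": 0.1}
question_entity = "France"

candidates = [n for n, s in node_scores.items() if s > 0.5]
context_paths = [
    nx.shortest_path(kg, source=question_entity, target=c) for c in candidates
]
print(context_paths)  # [['France', 'Kylian Mbappe']]
```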
Saturday: GraphRAG (paper)
GraphRAG builds on the concepts introduced in the RAPTOR paper (discussed in a previous Sunday Paper), extending them into the realm of knowledge graphs. Given a KG, GraphRAG performs hierarchical graph clustering, where each cluster is summarized and then re-clustered. Through this iterative process, KG clusters effectively replace the document chunks used in RAPTOR, enhancing the organization and retrieval of information within the knowledge graph. I find this method to be interesting as it is well positioned to handle a scaling KG without linearly scaling retrieval costs. It is similarly able to take advantage of graph structure instead of document structure, something which is inherently more robust and generalizes well across data sources.
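Here is a toy sketch of a single level of this clustering-and-summarizing loop; greedy modularity communities and string joins stand in for the dedicated community-detection algorithm and LLM-written summaries of the real pipeline, and the example graph is my own.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# A toy sketch of one level of a GraphRAG-style pipeline: detect communities in
# the KG and summarize each one. The real method uses a hierarchical community-
# detection algorithm and an LLM-written summary per community, then repeats the
# process on the summaries to build the hierarchy described above.
kg = nx.Graph()
kg.add_edges_from([
    ("France", "Kylian Mbappe"), ("Kylian Mbappe", "2018 World Cup"),
    ("Python", "Guido van Rossum"), ("Python", "CPython"),
])

def summarize(nodes) -> str:
    """Placeholder for an LLM call that summarizes a community of entities."""
    return ", ".join(sorted(nodes))

communities = greedy_modularity_communities(kg)
level_one_summaries = [summarize(c) for c in communities]
print(level_one_summaries)
```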
Conclusion
Overall, I don’t think that any of these methods solve the current retrieval failures of modern RAG pipelines; however, I do think that these papers are a step in the right direction. Being able to store information in a way that captures its many relations is paramount to fast and accurate RAG. The biggest problem with these methods at the moment is the conversion of natural language into the space of knowledge graphs, a process which is far from perfect. With adequate offline information extraction, I believe that we will see these KG-based RAG pipelines take over RAG research and practically eliminate hallucinations during question answering.
The only other component missing from this discussion of RAG is the topic of knowledge compression, something which will be briefly covered in tomorrow’s Sunday Paper on Lamini Memory. I am actually super excited about this paper, and I think it proposes an entire research lab’s worth of questions that I would love to investigate myself or see investigated by others within the community.
If you liked this post, please check out my main blog and consider subscribing for free. All of my content is free and will continue to be free. I try to post on my main blog twice a month on Mondays (this may change to Wednesdays going forward), and I will aim to post here every Sunday. I like to talk about cutting-edge AI research and AI philosophy in a manner that is easy to understand for semi-technical audiences.