Executive Summary

Retrieval-Augmented Generation (RAG) is a framework in natural language processing (NLP) that combines retrieval-based and generation-based techniques to improve the quality of generated text: relevant passages are first retrieved from a knowledge base and then supplied to the model as context for its response. It lets users apply Large Language Models (LLMs) to their own knowledge bases without high barriers to entry such as model fine-tuning.

Because RAG is an emerging technology, developers in government are bound to face various challenges in taking their RAG applications from development to production. One core challenge is uncertainty over how reliable RAG systems are, which undermines confidence in, and support for, deployment. This playbook aims to address this challenge by providing practical guidance on how to build, evaluate and improve RAG systems effectively, grounded in government-specific use cases.

We cover various options for building RAG applications in government, ranging from no-code/low-code solutions to custom pipelines using open-source frameworks. We also suggest metrics for evaluating RAG systems, and provide a technical overview of key concepts relevant to improving RAG performance. Finally, we present a series of experiments on two realistic government use cases (Hansard and Judiciary) that show how these concepts can be applied to systematically iterate on and improve RAG performance.

As RAG becomes more mainstream, we can expect many innovations that make RAG systems more accurate and efficient. We hope that this document gives readers a starting point for building a fundamental understanding of how RAG systems work, and we encourage readers to keep an open mind as the technology continues to develop.