
Purpose of this Playbook

In this playbook, we focus on tackling the problem of Poor Performance, including the Challenges in Evaluating Performance mentioned in the previous section. To address the lack of an evaluation framework, we first provide a survey of RAG evaluation metrics used in the industry. Next, we curate our own datasets and build RAG apps using both “few-click” end-to-end solutions provided by Cloud Service Providers and our own customised RAG pipelines, then use some of the surveyed evaluation metrics to benchmark them.

With a proper evaluation framework in place, we survey methods that can be used to improve the performance of RAG pipelines and carry out experiments applying some of them to our customised RAG pipeline. With every modification, we measure the performance difference from the baseline pipeline to show what works and what does not.
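
To make this workflow concrete, the sketch below shows one way to compare a baseline pipeline against a modified one on a shared evaluation set, using retrieval hit rate as an example metric. It is a minimal illustration rather than the playbook's actual harness: `baseline_retrieve`, `modified_retrieve`, and the sample evaluation set are hypothetical placeholders to be replaced with your own pipelines, data, and metrics.

```python
# Minimal sketch: measure the performance difference of a modified RAG pipeline
# against a baseline on the same evaluation set. Retrieval hit rate is used here
# purely as an example metric; swap in whichever metrics suit your use case.

from typing import Callable, Dict, List


def hit_rate(retrieve: Callable[[str], List[str]], eval_set: List[Dict]) -> float:
    """Fraction of questions whose relevant document appears in the retrieved set."""
    hits = sum(
        1 for ex in eval_set if ex["relevant_doc_id"] in retrieve(ex["question"])
    )
    return hits / len(eval_set)


# Hypothetical retrievers standing in for the retrieval step of each pipeline.
def baseline_retrieve(question: str) -> List[str]:
    return ["doc_1", "doc_2", "doc_3"]


def modified_retrieve(question: str) -> List[str]:
    return ["doc_1", "doc_4", "doc_7"]


# Hypothetical evaluation set: each question is paired with its relevant document.
eval_set = [
    {"question": "What is the refund policy?", "relevant_doc_id": "doc_4"},
    {"question": "How do I reset my password?", "relevant_doc_id": "doc_1"},
]

baseline_score = hit_rate(baseline_retrieve, eval_set)
modified_score = hit_rate(modified_retrieve, eval_set)
print(f"Baseline hit rate: {baseline_score:.2f}")
print(f"Modified hit rate: {modified_score:.2f}")
print(f"Delta vs baseline: {modified_score - baseline_score:+.2f}")
```

The key point is that every modification is scored on the same evaluation set as the baseline, so the reported delta reflects the change itself rather than a change in the data.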

The optimisation of RAG pipelines is highly dependent on the dataset and use case, so what works for one might not apply to another. The objective here is not to prescribe solutions that guarantee performance improvements. Instead, it is to show the approaches we have taken to improve performance across different use cases. The reader can use these as references and apply them where appropriate, at their own discretion. We encourage the reader to run their own experiments to validate the effectiveness of each modification on their own datasets and use cases to obtain the best performance.