ABSTRACT
Name : Alvin Xavier Rakha Wardhana
Study Program : Computer Science
Title : Optimization of Retrieval-Augmented Generation with Prompt
Compression: Balancing Token Efficiency and Performance
Counselor : Adila Alfa Krisnadhi, S.Kom., M.Sc., Ph.D.
This research investigates the integration of context compression methods within a
retrieval-augmented generation (RAG) system to address the challenges of processing
long contexts in generative tasks. Specifically, it evaluates how well various compression
techniques reduce information redundancy while minimizing performance degradation,
measured by BERTScore relative to the full context. The study
also examines how two retrieval approaches, cosine similarity-based retrieval and perfect
retrieval, affect overall system performance. Datasets from several domains are used in
this research, including RAGBench, a ready-to-use benchmark for retrieval-augmented
generation, and raw PDF documents, such as the Human Nutrition Textbook by the
University of Hawai’i and finance-oriented documents from the Indonesian bank BRI,
which require preprocessing to extract structured content. These
datasets were chosen to represent diverse challenges in retrieval and compression, such as
handling varying document lengths and domain-specific contexts. Experimental results
show that compression methods, such as LLMLingua and its variants, achieve significant
reductions in context size while maintaining competitive BERTScore F1. Among these,
LLMLingua consistently achieves the best balance between answer quality and cost
reduction, as indicated by the highest values of CQHS, a newly proposed harmonic metric
that combines answer quality (BERTScore F1) and cost reduction (compression ratio).
Furthermore, the choice of retrieval method significantly influences performance,
with perfect retrieval consistently outperforming cosine similarity-based retrieval. These
findings highlight the importance of selecting effective compression strategies and robust
retrieval techniques for optimizing RAG pipelines. Future work aims to explore dynamic
and context-aware compression methods, as well as extend the evaluation to broader
datasets and real-world applications.
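
One plausible instantiation of such a harmonic combination, assuming both quantities are normalized to [0, 1] and weighted equally (the precise definition of CQHS is given in the body of the thesis), is

\[
\mathrm{CQHS} = \frac{2 \cdot \mathrm{F1}_{\text{BERTScore}} \cdot r_{\text{comp}}}{\mathrm{F1}_{\text{BERTScore}} + r_{\text{comp}}},
\]

where \(r_{\text{comp}}\) denotes the cost reduction from compression (here assumed to be expressed as the fraction of tokens removed), so that the score rewards methods that keep answer quality high while discarding a large share of the context.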