ABSTRACT
Name : Alvin Xavier Rakha Wardhana
Study Program : Computer Science
Title : Optimization of Retrieval-Augmented Generation with Prompt
Compression: Balancing Token Efficiency and Performance
Counselor : Adila Alfa Krisnadhi, S.Kom., M.Sc., Ph.D.
This research investigates the integration of context compression methods within a
retrieval-augmented generation (RAG) system to address the challenges of processing
long contexts in generative tasks. Specifically, it evaluates how well various compression
techniques reduce information redundancy while minimizing performance degradation,
measured by BERTScore relative to the full context. The study
also examines how two retrieval approaches, cosine similarity-based retrieval and perfect
retrieval, affect overall system performance. Datasets from several domains are used in
this research, including RAGBench, a ready-to-use benchmark for retrieval-augmented
generation, and raw PDF documents, such as the Human Nutrition Textbook by the
University of Hawai’i and finance-oriented documents from the Indonesian bank BRI,
which require preprocessing to extract structured content. These
datasets were chosen to represent diverse challenges in retrieval and compression, such as
handling varying document lengths and domain-specific contexts. Experimental results
show that compression methods, such as LLMLingua and its variants, achieve significant
reductions in context size while maintaining competitive BERTScore F1. Among these,
LLMLingua consistently achieves the best balance between answer quality and cost
reduction, as indicated by the highest values of CQHS, a newly proposed harmonic metric
that combines answer quality (BERTScore F1) and cost reduction (compression ratio).
Furthermore, the choice of retrieval method significantly influences performance,
with perfect retrieval consistently outperforming cosine similarity-based retrieval. These
findings highlight the importance of selecting effective compression strategies and robust
retrieval techniques for optimizing RAG pipelines. Future work aims to explore dynamic
and context-aware compression methods, as well as extend the evaluation to broader
datasets and real-world applications.
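
One plausible instantiation of such a harmonic combination, assuming both quantities are normalized to [0, 1] and weighted equally (the precise definition of CQHS is given in the body of the thesis), is

\[
\mathrm{CQHS} = \frac{2 \cdot \mathrm{F1}_{\text{BERTScore}} \cdot r_{\text{comp}}}{\mathrm{F1}_{\text{BERTScore}} + r_{\text{comp}}},
\]

where \(r_{\text{comp}}\) denotes the cost reduction from compression (here assumed to be expressed as the fraction of tokens removed), so that the score rewards methods that keep answer quality high while discarding a large share of the context.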