Rifat Shahriyar's Lab
Institution: Bangladesh University of Engineering and Technology
Department: Department of Computer Science and Engineering
Featured research
Despite recent advances in large language models (LLMs), vast amounts of information lie beyond their static knowledge base. Retrieval-Augmented Generation (RAG) lets models retrieve and use up-to-date information, which is essential in fast-changing fields such as science and current affairs. However, there is currently no effective RAG pipeline for Bengali, no reliable Bengali retriever, and no benchmark for Bengali information retrieval. In this work, we extensively study Bengali retrieval performance with established methods, such as representation-based similarity models and late-interaction models. We fine-tune Contextualized Late Interaction over BERT (ColBERT) and benchmark Bengali retrieval performance on the SQuAD BN dataset. We also integrate our ColBERT retriever with a Bengali LLM, BN-RAG-LLaMA3-8b, to build a complete RAG pipeline, and show that the LLM's question-answering performance improves by 8% with RAG compared to without it.
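The two retriever families in the abstract differ in how they compare a query with a document. Below is a minimal Python sketch of that difference. It is not the authors' implementation. It uses toy 2-dimensional token embeddings in place of real BERT outputs. The function names are hypothetical, and the choice of mean pooling for the representation-based model is also an assumption. The late-interaction score follows the ColBERT MaxSim rule: each query token is matched to its most similar document token, and those maxima are summed.

```python
import math

def normalize(v):
    """L2-normalize a vector (returned unchanged if it is all zeros)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_pool(vectors):
    """Average token vectors into one vector (assumed pooling choice)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def single_vector_score(query_tokens, doc_tokens):
    """Representation-based: pool each side into one vector, then cosine similarity."""
    return dot(normalize(mean_pool(query_tokens)), normalize(mean_pool(doc_tokens)))

def maxsim_score(query_tokens, doc_tokens):
    """Late interaction (ColBERT-style MaxSim): for each query token, take the
    highest cosine similarity over all document tokens, then sum over query tokens."""
    q = [normalize(t) for t in query_tokens]
    d = [normalize(t) for t in doc_tokens]
    return sum(max(dot(qt, dt) for dt in d) for qt in q)

if __name__ == "__main__":
    # Hypothetical toy embeddings: one query with two tokens, two candidate documents.
    query = [[1.0, 0.0], [0.0, 1.0]]
    doc_exact = [[1.0, 0.0], [0.0, 1.0]]  # one token matching each query token
    doc_blend = [[1.0, 1.0]]              # a single token "between" both query tokens
    for name, doc in [("exact", doc_exact), ("blend", doc_blend)]:
        print(name, single_vector_score(query, doc), maxsim_score(query, doc))
```

In this toy case, both documents receive the same single-vector score because pooling averages away the token-level difference. MaxSim, by contrast, ranks the document with exact token matches higher. This kind of fine-grained matching is one motivation for late-interaction retrievers such as ColBERT.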