Featured research (1)

Despite recent advancements in large language models (LLMs), vast amounts of information lie beyond their static knowledge base. Retrieval-Augmented Generation (RAG) enables models to retrieve and use up-to-date data, which is essential for fast-changing fields such as science and current affairs. However, there is currently no effective RAG pipeline for Bengali, no reliable Bengali retriever, and no benchmark for Bengali information retrieval. In this work, we extensively study Bengali retrieval performance with established methods, e.g., representation-based similarity models and late interaction models. We fine-tune Contextualized Late Interaction over BERT (ColBERT) and benchmark Bengali retrieval performance on the SQuAD BN dataset. We also integrate our ColBERT retriever with a Bengali LLM, BN-RAG-LLaMA3-8b, to build a full RAG pipeline, and demonstrate a notable 8% improvement in the LLM's question-answering capability by comparing the LLM with and without RAG.
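The abstract contrasts representation-based similarity models with late interaction models such as ColBERT. The sketch below is illustrative only and is not the authors' code: it assumes token embeddings are already computed (toy 2-D vectors stand in for real BERT outputs) and uses mean pooling for the single-vector baseline, which is an assumption since models may pool differently. All function names are hypothetical. The late-interaction score follows the ColBERT-style MaxSim idea: each query token is matched to its most similar document token, and those maxima are summed.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean_pool(tokens: List[Vector]) -> Vector:
    """Average token embeddings into one vector (assumed pooling strategy)."""
    return [sum(col) / len(tokens) for col in zip(*tokens)]


def single_vector_score(query: List[Vector], doc: List[Vector]) -> float:
    """Representation-based similarity: cosine between one pooled vector per text."""
    return dot(normalize(mean_pool(query)), normalize(mean_pool(doc)))


def maxsim_score(query: List[Vector], doc: List[Vector]) -> float:
    """Late interaction: sum over query tokens of the best cosine match in the doc."""
    q = [normalize(t) for t in query]
    d = [normalize(t) for t in doc]
    return sum(max(dot(qt, dt) for dt in d) for qt in q)


# Toy example (hypothetical vectors, not real embeddings).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]  # matches each query token exactly
doc_b = [[1.0, 1.0]]              # a single "blended" token

print(single_vector_score(query, doc_a), single_vector_score(query, doc_b))
print(maxsim_score(query, doc_a), maxsim_score(query, doc_b))
```

In this toy case, pooling collapses both documents to the same direction, so the single-vector scores tie (both about 1.0). MaxSim keeps token-level matches and ranks `doc_a` (2.0) above `doc_b` (about 1.414). This token-level matching is the property that motivates late interaction retrievers.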

Lab head

Rifat Shahriyar
Department
  • Department of Computer Science and Engineering
About Rifat Shahriyar
  • My research interests are memory management (garbage collection), programming languages, software engineering, and natural language processing.

Members (3)

Kazi Samin Mubasshir
  • Purdue University West Lafayette
Sanju Basak
  • Bangladesh University of Engineering and Technology
Tanveer Muttaqueen
  • Bangladesh University of Engineering and Technology
Abhik Bhattacharjee
  • Not confirmed yet
Kazi Sajeed Mehrab
  • Not confirmed yet
Noshin Nawal
  • Not confirmed yet