An example of contextual image reference, where referencing images of a Brachiosaurus can greatly enhance user comprehension and engagement.

Source publication
Preprint
Vision-Language Models (VLMs) have demonstrated remarkable capabilities in understanding multimodal inputs and have been widely integrated into Retrieval-Augmented Generation (RAG) based conversational systems. While current VLM-powered chatbots can provide textual source references in their responses, they exhibit significant limitations in refere...

Context in source publication

Context 1
... identify this gap as the absence of Contextual Image Reference: the capability to strategically select and incorporate relevant images from retrieved documents to enhance response comprehension and user engagement. As demonstrated in Figure 1, when discussing complex subjects like the Brachiosaurus, purely textual descriptions of its physical characteristics often fail to convey information intuitively. Despite its potential impact on multimodal conversation systems, the challenge of contextual image referencing remains largely unexplored in current research. ...
