Figure 1 - available via license: Creative Commons Attribution 4.0 International
An example of contextual image reference, where referencing images of a Brachiosaurus can greatly enhance user comprehension and engagement.
Source publication
Vision-Language Models (VLMs) have demonstrated remarkable capabilities in understanding multimodal inputs and have been widely integrated into Retrieval-Augmented Generation (RAG) based conversational systems. While current VLM-powered chatbots can provide textual source references in their responses, they exhibit significant limitations in refere...
Context in source publication
Context 1
... identify this gap as the absence of Contextual Image Reference, the capability to strategically select and incorporate relevant images from retrieved documents to enhance response comprehension and user engagement. As demonstrated in Figure 1, when discussing complex subjects like the Brachiosaurus, purely textual descriptions of its physical characteristics often fail to convey information intuitively. Despite its potential impact on multimodal conversation systems, the challenge of contextual image referencing remains largely unexplored in current research. ...