More and more educational institutions are making lecture videos available online. Since more than 100 empirical studies document that captioning a video improves comprehension of, attention to, and memory for the video [1], it makes sense to provide those lecture videos with captions. However, studies also show that in human communication the words themselves contribute only 7% to the clarity of a message, while vocal delivery, i.e., tone, intonation, and verbal pace, contributes 38% [2]. Consequently, in this paper we address the question of whether an AI-based visualization of voice characteristics in captions can further improve students' viewing and learning experience with lecture videos. For this visualization of the speaker's voice characteristics in the captions, we use the WaveFont technology [3–5], which processes the voice signal and intuitively displays loudness, speed, and pauses in the subtitle font. In our survey of 48 students, a significant majority of the participants preferred the WaveFont captions for watching lecture videos in every surveyed category: visualization of voice characteristics, understanding the content, following the content, linguistic understanding, and identifying important words.
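To make the general idea concrete, the sketch below illustrates one plausible caption-styling pipeline; it is not the WaveFont implementation, which is described in [3–5]. It assumes word-level timestamps are available (e.g., from a forced aligner), approximates loudness by the RMS energy of each word's audio segment, and maps that value to an HTML/CSS font weight, rendering long inter-word gaps as a visible pause marker. All function and parameter names here are hypothetical.

```python
import numpy as np

# Hypothetical word timing: (word, start_sec, end_sec), e.g. from a forced aligner.
Word = tuple[str, float, float]

def rms_loudness(samples: np.ndarray, sr: int, start: float, end: float) -> float:
    """RMS energy of the (mono, float) audio segment spanning one word."""
    seg = samples[int(start * sr):int(end * sr)]
    return float(np.sqrt(np.mean(seg ** 2))) if seg.size else 0.0

def style_captions(words: list[Word], samples: np.ndarray, sr: int,
                   pause_threshold: float = 0.3) -> str:
    """Map per-word loudness to a CSS font weight; mark long pauses."""
    loudness = [rms_loudness(samples, sr, s, e) for _, s, e in words]
    lo, hi = min(loudness), max(loudness)
    spans = []
    for i, ((w, s, e), l) in enumerate(zip(words, loudness)):
        # Normalize loudness into the 300-900 font-weight range.
        weight = 300 + int(600 * (l - lo) / (hi - lo + 1e-9))
        if i > 0 and s - words[i - 1][2] > pause_threshold:
            spans.append("…")  # visible marker for a long pause
        spans.append(f'<span style="font-weight:{weight}">{w}</span>')
    return " ".join(spans)

if __name__ == "__main__":
    sr = 16000
    audio = (np.random.randn(sr * 2) * 0.1).astype(np.float32)  # dummy waveform
    words = [("This", 0.0, 0.3), ("really", 0.35, 0.7), ("matters", 1.2, 1.8)]
    print(style_captions(words, audio, sr))
```

One appeal of mapping voice features to font attributes rather than to extra symbols is graceful degradation: a player that ignores the inline styling still shows an ordinary, fully readable caption text.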