Aim/Purpose: This exploratory qualitative case study examines how high-school learners of English perceive a pedagogical intervention that progressively reduces captions (full, sentence-level, keyword, and no captions) as a means of enhancing language learning.
Background: Recognizing the limitations of caption usage in fostering independent listening comprehension in non-captioned environments, this research builds upon and extends the foundational work of Vanderplank (2016), who highlighted the necessity of a comprehensive blend of tasks, strategies, and focused viewing, as well as the need to actively engage language learners in watching captioned materials.
Methodology: In a qualitative research design, participants were exposed to authentic video texts during a five-week listening course. Participants completed an entry survey and, after interacting with each captioning type, wrote individual reflections and participated in focus group sessions. This methodological approach allowed for an in-depth exploration of learners’ experiences across different captioning scenarios, providing a nuanced understanding of the pedagogical intervention’s impact on their perceived language development process.
Contribution: By bridging the research-practice gap, our study offers valuable insights into designing pedagogical interventions that reduce caption dependence, thereby preparing language learners for success in real-world, caption-free listening scenarios.
Findings: Our findings show that learners appreciate the varied captioning approaches not only for their role in supporting text comprehension, vocabulary acquisition, pronunciation, and on-task focus but also for facilitating the integration of new linguistic knowledge with existing background knowledge. Crucially, our study uncovers a positive reception towards the gradual shift from fully captioned to uncaptioned materials, highlighting a stepwise reduction of caption dependence as instrumental in boosting learners’ confidence and sense of achievement in mastering L2 listening skills.
Recommendations for Practitioners: The implications of our findings are threefold, addressing input selection, task design orchestration, and reflective practices. We advocate for a deliberate selection of input that resonates with learners’ interests and contextual realities, alongside task designs that progressively reduce caption reliance and encourage active learner engagement and collaborative learning. Furthermore, our study underscores the importance of reflective practices in enabling learners to articulate their learning preferences and strategies, thereby fostering a more personalized and effective language learning experience.
Recommendation for Researchers: Listening comprehension is a complex process that is clearly influenced by the input, the task, and/or learner characteristics. Comparative studies may struggle to control and account for all these variables, making it challenging to attribute observed differences solely to caption reduction.
Impact on Society: This research responds to the call for innovative teaching practices in language education. It sets the stage for future inquiries into the nuanced dynamics of caption usage in language learning, advocating for a more learner-centered and adaptive approach.
Future Research: Longitudinal quantitative studies that measure comprehension as caption support is gradually reduced (full, partial, and keyword) are strongly needed. Other studies could examine a range of individual differences (working memory capacity, age, levels of engagement, and language background) when reducing caption support. Future research could also examine caption use among students with learning difficulties and/or disabilities.