December 2024
Journal of Audiovisual Translation
This paper discusses the potential use of Automatic Speech Recognition (ASR) tools to produce intralingual subtitles for broadcasting purposes. Two ASR tools were trialled by an international broadcaster to produce automatic subtitles for pre-recorded content in English and in Italian: a British talk show and a US feature film dubbed into Italian. A study was commissioned to compare the performance of the two tools on these materials. Our evaluation focused on two key dimensions: the accuracy of the transcript and the readability of the subtitles. Accuracy was assessed quantitatively using an adaptation of the NER and NTR models (Romero-Fresco & Martínez 2015, Romero-Fresco & Pöchhacker 2017), which focuses on ASR-generated errors and categorises them by error type (content- or form-related) and by level of severity (minor, standard and critical). Readability was assessed qualitatively by analysing text segmentation, namely line breaks and subtitle breaks. Our findings indicate that all the ASR outputs fell short of the 98% accuracy threshold expected in the broadcasting industry, although performance was notably better in English. Moreover, subtitle segmentation and timing were relatively poor in the subtitles produced by both tools in both languages. Therefore, the ASR-generated subtitles from the samples provided by the broadcaster can only be considered an intermediate step: substantial human input is required before the tools are put to work (customisation) and after the ASR has generated the subtitles (human post-editing) to produce broadcast-ready subtitles.

Lay summary

This paper explores using Automatic Speech Recognition (ASR) tools to create intralingual subtitles (i.e. subtitles in the same language as the audio) for pre-recorded content. The study was commissioned by an international broadcaster to test two ASR tools for generating subtitles in English and Italian, covering a British talk show and a US film dubbed into Italian.
The study compared the tools' performance along two dimensions: the subtitles' accuracy and their readability. Accuracy was measured using a model that enabled us to categorise and weigh the errors generated by the ASR tools; readability was assessed by examining line breaks and subtitle breaks. The evaluation revealed that both tools fell short of the industry's expected 98% accuracy, especially in Italian. Additionally, subtitle segmentation and timing were subpar in both languages. Consequently, substantial human involvement, including customisation and post-editing, is necessary to produce high-quality broadcast-ready subtitles.
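The NER-style scoring described above can be sketched in a few lines. This is a minimal illustration only, assuming the severity weights commonly used with the NER model (minor 0.25, standard 0.5, critical 1.0); the function and variable names are ours, not the paper's, and the paper's actual adaptation may weigh or categorise errors differently.

```python
# Illustrative NER-style accuracy score (after Romero-Fresco & Martínez 2015).
# Assumed severity weights: minor 0.25, standard 0.5, critical 1.0.

SEVERITY_WEIGHTS = {"minor": 0.25, "standard": 0.5, "critical": 1.0}

def ner_accuracy(word_count, errors):
    """Return accuracy as a percentage.

    word_count: N, the number of words in the subtitle transcript.
    errors: list of (category, severity) pairs, where category is
            'content' or 'form' and severity is a SEVERITY_WEIGHTS key.
    """
    penalty = sum(SEVERITY_WEIGHTS[severity] for _category, severity in errors)
    return (word_count - penalty) / word_count * 100

# Hypothetical sample: 1000 words with 12 minor, 6 standard and 2 critical errors.
errs = ([("form", "minor")] * 12
        + [("content", "standard")] * 6
        + [("form", "critical")] * 2)
print(round(ner_accuracy(1000, errs), 2))  # 99.2, above the 98% broadcast threshold
```

A score like 99.2% would clear the 98% threshold mentioned above; the samples in this study did not reach it.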