Making manual scoring of typed transcripts a thing of the past: a commentary on Herrmann (2025)

Abstract

Coding the accuracy of typed transcripts from experiments testing speech intelligibility is an arduous endeavor. Herrmann (2025) presents a novel approach for automating the scoring of such listener transcripts, leveraging Natural Language Processing (NLP) models. It involves calculating the semantic similarity between transcripts and target sentences using high-dimensional vectors, generated by NLP models such as ADA2, GPT2, BERT, and USE. This approach demonstrates exceptional accuracy, with negligible underestimation of intelligibility scores (by about 2-4%), numerically outperforming simpler computational tools like Autoscore and TSR. The method uniquely relies on semantic representations generated by large language models. At the same time, these models also form the Achilles heel of the technique: the transparency, accessibility, data security, ethical framework, and cost of the selected model directly impact the suitability of the NLP-based scoring method. Hence, working with such models can raise serious risks regarding the reproducibility of scientific findings. This in turn emphasizes the need for fair, ethical, and evidence-based open-source models. With such models, Herrmann’s new tool represents a valuable addition to the speech scientist’s toolbox.
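The core of the approach is comparing embedding vectors of a transcript and its target sentence. A minimal sketch of such a comparison is below, using cosine similarity over toy vectors; the vectors and their dimensionality are illustrative placeholders, not output of any of the actual models (ADA2, GPT2, BERT, USE) evaluated by Herrmann (2025).

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional vectors standing in for model embeddings; real NLP
# models produce vectors with hundreds to thousands of dimensions.
target = [0.2, 0.7, 0.1, 0.5]         # embedding of the target sentence
transcript = [0.25, 0.6, 0.15, 0.45]  # embedding of the listener's transcript

score = cosine_similarity(target, transcript)
```

In this sketch, a score near 1 indicates that the transcript is semantically close to the target sentence; a scoring pipeline would then map such similarity values onto intelligibility scores.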

Type
Publication
Speech, Language and Hearing 28(1), doi:10.1080/2050571X.2025.2514395
Hans Rutger Bosker
Assistant Professor

My research interests include speech perception, audiovisual integration, and prosody.