A tool for efficient and accurate segmentation of speech data: Announcing POnSS

Joe Rodd, Caitlin Decuyper, Hans Rutger Bosker, Louis ten Bosch

January 2021

Abstract

Despite advances in automatic speech recognition (ASR), human input is still essential for producing research-grade segmentations of speech data. Conventional approaches to manual segmentation are very labor-intensive. We introduce POnSS, a browser-based system that is specialized for the task of segmenting the onsets and offsets of words, which combines aspects of ASR with limited human input. In developing POnSS, we identified several sub-tasks of segmentation, and implemented each of these as separate interfaces for the annotators to interact with to streamline their task as much as possible. We evaluated segmentations made with POnSS against a baseline of segmentations of the same data made conventionally in Praat. We observed that POnSS achieved comparable reliability to segmentation using Praat, but required 23% less annotator time investment. Because of its greater efficiency without sacrificing reliability, POnSS represents a distinct methodological advance for the segmentation of speech data.

Publication

Behavior Research Methods, 53, 744-756, doi:10.3758/s13428-020-01449-6

Hans Rutger Bosker

Assistant Professor

My research interests include speech perception, audiovisual integration, and prosody.