Rapid Prosody Transcription as a tool to assess the perception of phrase-level prominence in Mandarin

Abstract

A linguistic unit is commonly considered to bear prominence when it is perceived as more salient than adjacent units. Acoustic factors such as duration and intensity play important roles in signaling prominence across languages. However, the realization of focal prominence through pitch variation differs between stress-accent languages (e.g., English, Dutch) and tonal languages such as Mandarin. In stress-accent languages, words with higher fundamental frequency (f0) are typically perceived as more prominent, whereas in Mandarin, lexical tones are produced within an expanded f0 range. While the acoustic cues to prominence have been studied in production, less is known about how Rapid Prosody Transcription (RPT) can capture the perception of phrase-level prominence in Mandarin. Following previous RPT methodology, 65 participants listened to 146 utterances extracted from Mandarin TED Talks and were instructed to choose the words that sounded prosodically prominent. Pairwise Cohen’s Kappa analysis revealed lower agreement on prominent words (mean K = 0.255) than reported in previous RPT studies. The present study provides new insight into the perception of phrase-level prominence as a language-specific phenomenon while also highlighting the limitations of using RPT in an understudied tonal language, motivating future work linking perceptual judgments with acoustic correlates of phrase-level prominence.
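
To make the agreement measure concrete, the following is a minimal sketch of how a mean pairwise Cohen's kappa could be computed from RPT-style data. It is not the authors' analysis pipeline, which is not described on this page. It assumes each participant's response is coded as one binary mark per word (1 = marked prominent, 0 = not marked); the function names `cohen_kappa` and `mean_pairwise_kappa` and the toy data are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of binary marks (1 = prominent)."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(a)
    # Observed agreement: proportion of words on which the two raters match.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's own marking rate.
    pa = sum(a) / n
    pb = sum(b) / n
    p_e = pa * pb + (1 - pa) * (1 - pb)
    if p_e == 1.0:
        # Assumption: both raters gave the same constant response, so kappa
        # is undefined; this sketch returns 1.0 by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def mean_pairwise_kappa(ratings):
    """Average Cohen's kappa over all unordered pairs of raters."""
    pairs = list(combinations(ratings, 2))
    if not pairs:
        raise ValueError("need at least two raters")
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Toy data (invented): three raters, six words each.
toy_ratings = [
    [1, 0, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 1],
]
print(mean_pairwise_kappa(toy_ratings))
```

Kappa corrects raw agreement for the agreement expected by chance given how often each rater marks words, so a low mean value such as the one reported above indicates that listeners often disagreed about which specific words were prominent, beyond what their overall marking rates would predict.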

Type
Publication
In Proceedings of Speech Prosody 2026, 877-881, doi:10.21437/SpeechProsody.2026-178
Patrick Louis Rohrer
Assistant Professor

My main research interests lie in the relationship between gesture and prosody from a crosslinguistic perspective, and how these two modes jointly contribute to the conveyance of communicative meaning, as well as their effects on cognition and acquisition.

Hans Rutger Bosker
Assistant Professor

My research interests include speech perception, audiovisual integration, and prosody.