Look and listen

Listening test

In the video below, you’ll see a simple display with four objects. First, check that you can name each of the four objects. Then play the video. You’ll hear a female voice asking you to press the button for one of the objects (i.e., to click on it). While watching and listening, try to keep track of where your eyes go in the display…

Do you have any idea what the speaker’s next word will be? Probably not, right? Did you notice anything in particular about where your gaze went in the display? Since you probably didn’t know which object the speaker was going to name, chances are your eyes were all over the place.

OK, next video. It’s the same display, but with a new audio recording. Have a look and see if you can tell which of the four objects the speaker selects…

Well, in this case, you may already have a slight hunch, right? The speaker hesitated at the end of her utterance, didn’t she? As a native speaker of British English, she is unlikely to have much trouble naming common objects like a lion, an ear, or a bike. So could it be that she’ll refer to the Italian moka pot in the top left?

Disfluencies help you predict what’s coming up

Natural speech is messy. We stumble over words, lose our train of thought, and produce tons of uhms and uhs. Still, these disfluencies don’t occur at random points in an utterance. We are much more likely to stumble before rare (low-frequency), novel (not mentioned before), and complex (long) words than before common, simple words.

Interestingly, human listeners seem to be aware of this. In our experiments, we presented listeners with displays like the ones above together with spoken instructions to click on one of the objects. While people were watching and listening, we used eye-tracking to record where they were looking on the screen (see lab photo below). This allowed us to follow their gaze on a millisecond time scale as the utterance unfolded. Results showed that when people heard the speaker hesitate, they were much more likely to look at a low-frequency object, like the moka pot, than at high-frequency objects (Bosker et al., 2014).
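To give a rough idea of what “more likely to look at” means in numbers, here is a minimal sketch of one common way to summarize gaze data of this kind: split time into bins and compute, per bin, the share of gaze samples that fall on each object. The gaze samples, the 200 ms bin size, and the `fixation_proportions` helper are all hypothetical and invented for illustration; this is not the analysis pipeline used in Bosker et al. (2014), and real studies typically add statistical modelling on top of such proportions.

```python
from collections import Counter

# Hypothetical gaze samples: (time in ms relative to the onset of the object's name,
# object being looked at). Invented data; real eye trackers sample far more densely.
samples = [
    (0, "lion"), (50, "moka pot"), (100, "moka pot"), (150, "bike"),
    (200, "moka pot"), (250, "moka pot"), (300, "ear"), (350, "moka pot"),
]

def fixation_proportions(samples, bin_ms):
    """Hypothetical helper: return {bin_start_ms: {object: proportion of samples in that bin}}."""
    bins = {}
    for t, obj in samples:
        start = (t // bin_ms) * bin_ms  # assign each sample to the bin it falls in
        bins.setdefault(start, Counter())[obj] += 1
    result = {}
    for start, counts in sorted(bins.items()):
        total = sum(counts.values())
        result[start] = {obj: n / total for obj, n in counts.items()}
    return result

props = fixation_proportions(samples, bin_ms=200)
for start, p in props.items():
    print(start, p)
```

With this toy data, the share of looks to the moka pot rises from half of the samples in the first bin to three quarters in the second; plotting such proportions over time is what lets researchers see when listeners start favouring one object.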

Eye-tracking lab at the MPI. (C) Max-Planck-Gesellschaft, https://www.mpi.nl/page/mpi-labs

OK, let’s try again…

Here’s another video, again with the same display, but another audio recording. Once again, have a listen and see if you can tell which object the speaker will name:

In the last few milliseconds of the clip, you may have caught the beginning of the object’s name. Did she say “Now press the button for the li-…”? Does that mean we’ve finally figured out that it’ll be the lion after all?

Let’s find out:


Filler words can be misleading

As mentioned before, speech is messy. We not only produce hesitations and other disfluencies but also litter our speech with seemingly meaningless filler words, such as ‘you know’, ‘well’, and (worst of all) ‘like’. Our listeners, in turn, are tasked with distilling from this chaos what we actually want to communicate.

And that can be hard. Filler words share their sounds (phonology) with many other words. The filler ‘like’ shares its initial sounds with words such as ‘lion’, ‘lime’, ‘lice’, ‘lightbulb’, etc. Our experiments have shown that listeners actually consider these similar-sounding words (so-called cohort competitors) when they encounter ‘like’. When presented with displays containing one cohort competitor (e.g., lion) and three distractors, participants were biased towards looking at the lion upon hearing “…for the like…”. This suggests that filler words, like “like” (see what I did there?), can make word recognition less efficient (Bosker et al., 2021).

Why is this important?

Eye-tracking can reveal the time course of speech processing. It tracks people’s gaze with millisecond precision, often without participants themselves being aware of their own looking behavior. As such, it can show at what moment particular acoustic and/or visual cues influence speech perception. Such temporal information has, for instance, been used to discriminate between competing models of word recognition.

Relevant papers

(2020). Eye-tracking the time course of distal and global speech rate effects. Journal of Experimental Psychology: Human Perception and Performance, 46(10), 1148-1163, doi:10.1037/xhp0000838.
