Speech comprehension involves more than just identifying speech sounds. It also requires the use of prosodic cues, which can be conveyed auditorily (e.g., intonation) but also visually (e.g., prominence-lending beat gestures). Prior studies on unimodal speech emphasize the critical role of prosody in comprehension, in higher-level processing (e.g., pragmatics), and even in lexical access. For instance, people are faster at identifying a word when it is prosodically accented than when it is unaccented. This study tested whether beat gestures, serving as visual prominence cues, can similarly aid lexical access even in situations where other cues are already highly supportive of word recognition (e.g., semantically constraining sentences). Moreover, we investigated whether this facilitation effect would be modulated by the (mis)alignment of the beat gesture with the word-internal prominence (i.e., stressed syllables). To answer these questions, we presented participants, in a lexical decision task, with videos of a talker producing semantically constraining sentences containing a critical disyllabic sentence-final target. The target was either produced without a gesture or accompanied by a beat gesture aligned to the stressed or unstressed syllable. Response times showed that participants were generally faster when the target was presented together with a beat gesture, regardless of its within-word alignment. Moreover, we found that this facilitatory effect was larger for words than for pseudowords. These results provide evidence that beat gestures, even when they are not essential for successful speech comprehension, affect lexical access in highly constraining contexts.