Visual cues play a key role in speech perception. Beat gestures (i.e., simple up-and-down hand movements) usually co-occur with prominence in speech. Previous studies found that the timing of a hand beat can indicate word stress. The present study further examines whether hand beat timing influences spoken word recognition in a gradient fashion. While watching videos of a native speaker of Dutch uttering the disyllabic word voornaam and making a hand beat, 40 participants decided whether they heard the word with initial stress (VOORnaam, “first name”) or final stress (voorNAAM, “respectable”). Crucially, nine beat apex timings were equally distributed between the pitch peaks of the two syllables. Results showed a gradient effect of hand beat timing on stress perception, and this effect did not appear to be susceptible to brief pretest feedback implying that visual cues should be ignored. Our findings provide novel evidence for audiovisual interaction and can inform gesture generation in conversational agents.