In natural situations, speech perception often takes place during the concurrent execution of other cognitive tasks, such as listening while viewing a visual scene. The execution of a dual task typically has detrimental effects on concurrent speech perception, but how exactly cognitive load disrupts speech encoding is still unclear. The detrimental effect on speech representations may consist of either a general reduction in the robustness of processing of the speech signal (‘noisy encoding’), or, alternatively it may specifically influence the temporal sampling of the sensory input, with listeners missing temporal pulses, thus underestimating segmental durations (‘shrinking of time’). The present study investigated whether and how spectral and temporal cues in a precursor sentence that has been processed under high vs. low cognitive load influence the perception of a subsequent target word. If cognitive load effects are implemented through ‘noisy encoding’, increasing cognitive load during the recursor should attenuate the encoding of both its temporal and spectral cues, and hence reduce the contextual effect that these cues can have on subsequent target sound perception. However, if cognitive load effects are expressed as ‘shrinking of time’, context effects should not be modulated by load, but a main effect would be expected on the perceived duration of the speech signal. Results from two experiments indicate that increasing cognitive load (manipulated through a secondary visual search task) did not modulate temporal (Experiment 1) or spectral context effects (Experiment 2). However, a consistent main effect of cognitive load was found: increasing cognitive load during the precursor induced a perceptual increase in its perceived speech rate, biasing the perception of a following target word towards longer durations. This finding suggests that cognitive load effects in speech perception are implemented via ‘shrinking of time’, in line with a temporal sampling framework. In addition, we argue that our results align with a model in which early (spectral and temporal) normalization is unaffected by attention but later adjustments may be attention-dependent.