© 2016 Elsevier Inc. Immediate memory for spoken sequences depends on their rhythm: different levels of accuracy and patterns of error are seen depending on how items are spaced in time. Current models address these phenomena only partially or not at all. We investigate the idea that temporal grouping effects are an emergent property of a general serial ordering mechanism based on a population of oscillators locally sensitive to amplitude modulations on different temporal scales. Two experiments show that the effects of temporal grouping are independent of the predictability of the grouping pattern, consistent with this model's stimulus-driven mechanism and inconsistent with alternative accounts in terms of top-down processes. The second experiment reports detailed and systematic differences in the recall of irregularly grouped sequences that are broadly consistent with predictions of the new model. We suggest that the bottom-up multi-scale population oscillator (or BUMP) mechanism is a useful starting point for a general account of serial order in language processing more widely.