The present study investigated the binding of verbal identity and spatial location in the retention of sequences of spatially distributed acoustic stimuli. Study stimuli varying in verbal content and spatial location (e.g. V1S1, V2S2, V3S3, V4S4) were followed by a recognition probe stimulus. The critical test of the binding, or integration, of the verbal and spatial features of the study stimuli compared intact probes, which preserved the studied association of those features (e.g. V2S2 or V3S3), with recombined probes (e.g. V2S3 or V3S2), which used verbal and spatial features from study items but in new combinations. A series of five experiments showed evidence of the binding of sound identity and location information for both verbal stimuli (spoken letters) and artificial non-verbal stimuli. Although binding tended to be stronger for the more recent items of the sequence, there was consistent evidence that feature associations were retained for the early sequence items, suggesting that the binding of auditory features is durable over time (at least 5.5 s) and survives the interpolated processing of other stimuli. Both spatial and verbal recognition judgments were affected by the association of verbal and spatial features when the test procedure required attention to both classes of information. However, when participants were able to focus attention on one class of information and ignore the other, spatial recognition judgments showed an advantage for intact probes over recombined probes, whereas verbal recognition judgments did not. The results are discussed with reference to the primacy of identity and location in the representation of sounds in working memory.
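
The distinction between intact and recombined probes can be illustrated with a minimal sketch. It is not the authors' procedure: the function name `make_probes` and the tuple representation of a stimulus are assumptions made for illustration, and the abstract does not state how many or which recombined probes were actually presented.

```python
from itertools import permutations

def make_probes(study):
    """Enumerate candidate probes for one study sequence.

    study: list of (verbal, spatial) pairs in presentation order,
           e.g. ("V2", "S2") for the second item.
    Returns (intact, recombined), where intact probes repeat a studied
    pairing and recombined probes pair the verbal feature of one study
    item with the spatial feature of a different study item.
    """
    studied = set(study)
    intact = list(study)
    recombined = [
        (a[0], b[1])
        for a, b in permutations(study, 2)   # ordered pairs of distinct items
        if (a[0], b[1]) not in studied       # guard against repeated features
    ]
    return intact, recombined

# Hypothetical four-item sequence using the abstract's notation.
study = [("V1", "S1"), ("V2", "S2"), ("V3", "S3"), ("V4", "S4")]
intact, recombined = make_probes(study)
```

Both probe types contain only studied features, so a difference in recognition performance between them can be attributed to memory for the pairing rather than for the individual features.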