Background: Scale heterogeneity, or differences in the error variance of choices, may account for a significant amount of the observed variation in the results of discrete choice experiments (DCEs) when comparing preferences between different groups of respondents. Objective: The aim of this study was to identify if, and how, scale heterogeneity has been addressed in healthcare DCEs that compare the preferences of different groups. Methods: A systematic review identified all healthcare DCEs published between 1990 and February 2016. The full-text of each DCE was then screened to identify studies that compared preferences using data generated from multiple groups. Data were extracted and tabulated on year of publication, samples compared, tests for scale heterogeneity, and analytical methods to account for scale heterogeneity. Narrative analysis was used to describe if, and how, scale heterogeneity was accounted for when preferences were compared. Results: A total of 626 healthcare DCEs were identified. Of these 199 (32%) aimed to compare the preferences of different groups specified at the design stage, while 79 (13%) compared the preferences of groups identified at the analysis stage. Of the 278 included papers, 49 (18%) discussed potential scale issues, 18 (7%) used a formal method of analysis to account for scale between groups, and 2 (1%) accounted for scale differences between preference groups at the analysis stage. Scale heterogeneity was present in 65% (n = 13) of studies that tested for it. Analytical methods to test for scale heterogeneity included coefficient plots (n = 5, 2%), heteroscedastic conditional logit models (n = 6, 2%), Swait and Louviere tests (n = 4, 1%), generalised multinomial logit models (n = 5, 2%), and scale-adjusted latent class analysis (n = 2, 1%). Conclusions: Scale heterogeneity is a prevalent issue in healthcare DCEs. Despite this, few published DCEs have discussed such issues, and fewer still have used formal methods to identify and account for the impact of scale heterogeneity. The use of formal methods to test for scale heterogeneity should be used, otherwise the results of DCEs potentially risk producing biased and potentially misleading conclusions regarding preferences for aspects of healthcare.