© 2016 Jensen et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Background: Accurate estimates of multiple breath washout (MBW) outcomes require correct operation of the device, appropriate distraction of the subject to ensure they breathe in a manner representative of their relaxed tidal breathing pattern, and appropriate interpretation of the acquired data. Based on available recommendations for an acceptable MBW test, we aimed to develop a protocol to systematically evaluate MBW measurements against these criteria.

Methods: An experienced MBW operator systematically reviewed 50 MBW test occasions for technical elements and for whether the breathing pattern was representative of relaxed tidal breathing. The impact of qualitative and quantitative criteria on inter-observer agreement was assessed across eight MBW operators (n = 20 test occasions, compared using a Kappa statistic).

Results: Using qualitative criteria, 46/168 trials were rejected: 16.6% were technically unacceptable and 10.7% were excluded due to an inappropriate breathing pattern. Reviewer agreement was good using qualitative criteria and improved further with quantitative criteria, from (κ = 0.53-0.83%) to (κ = 0.73-0.97%), but at the cost of excluding further test occasions in this retrospective data analysis.

Conclusions: The application of the systematic review improved inter-observer agreement but did not affect reported MBW outcomes.
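The Methods mention a Kappa statistic for inter-observer agreement without specifying the variant or how the eight operators were compared. As a minimal sketch, assuming Cohen's kappa applied to one pair of reviewers making accept/reject decisions, the calculation could look like the following. The function name `cohen_kappa` and the reviewer decisions are hypothetical and purely illustrative; they are not study data.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)

    # Observed agreement: fraction of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement expected from each rater's marginal label frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if p_expected == 1.0:
        # Kappa is undefined when both raters use one identical label throughout;
        # returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical accept/reject decisions for 10 trials (illustrative only).
reviewer_1 = ["accept", "accept", "reject", "accept", "reject",
              "accept", "accept", "reject", "accept", "accept"]
reviewer_2 = ["accept", "accept", "reject", "accept", "accept",
              "accept", "accept", "reject", "accept", "reject"]

kappa = cohen_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.2f}")
```

In this illustrative case the reviewers agree on 8 of 10 trials, but because both accept most trials, a large share of that agreement is expected by chance, so kappa is noticeably lower than the raw agreement of 0.8. With more than two reviewers, pairwise kappas or a multi-rater statistic such as Fleiss' kappa would be possible alternatives.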