Background: Patient-reported outcome (PRO) measurement instruments are increasingly influential, so rigorous scrutiny of them has never been more important. Above all, PRO instruments should be valid: shown to assess what they purport to assess. Objectives: To evaluate a widely used fatigue PRO instrument, highlight key issues in understanding PRO instrument validity, demonstrate the limitations of standard approaches to evaluating it, and justify notable changes to the validation process. Methods: A two-phase evaluation of the 40-item Fatigue Impact Scale (FIS): (1) a qualitative evaluation of content and face validity using expert opinion (n=30) and a modified Delphi technique; and (2) a quantitative psychometric evaluation of internal and external construct validity, using traditional and modern methods, on data from 333 people with multiple sclerosis. Results: Qualitative evaluation did not support the content or face validity of the FIS. Experts agreed with the subscale placement of 23 of the 40 items (58%) and classified all 40 items as non-specific to fatigue impact. Nevertheless, standard quantitative psychometric evaluations largely implied that the FIS subscales were reliable and valid. Conclusions: Standard quantitative 'psychometric' evaluations of PRO instrument validity can be misleading. Evaluation of existing PRO instruments requires both qualitative and statistical methods. Development of new PRO instruments requires stronger conceptual underpinning, clearer definitions of the substantive variables to be measured, and hypothesis-testing experimental designs. © The Author(s) 2013.