The debate on the causal association between vitamin D status, measured as serum concentration of 25-hydroxyvitamin D (25[OH]D), and various health outcomes warrants investigation in large-scale health surveys. Measuring the 25(OH)D concentration for each participant is not always feasible, because of the logistics of blood collection and the costs of vitamin D testing. To address this problem, past research has used predicted 25(OH)D concentration, based on multivariable linear regression, as a proxy for unmeasured vitamin D status. We restate this approach in a mathematical framework, to deduce its possible pitfalls. Monte Carlo simulation and real data from the National Health and Nutrition Examination Survey 2005–06 are used to confirm the deductions. The results indicate that variables that are used in the prediction model (for 25[OH]D concentration) but not in the model for the health outcome (called instrumental variables), play an essential role in the identification of an effect. Such variables should be unrelated to the health outcome other than through vitamin D; otherwise the estimate of interest will be biased. The approach of predicted 25(OH)D concentration derived from multivariable linear regression may be valid. However, careful verification that the instrumental variables are unrelated to the health outcome is required.
Ding, N., Dear, K., Xiang, F., & Lucas, R. (2015). Tightrope Walking: Using Predictors of 25(OH)D Concentration Based on Multivariable Linear Regression to Infer Associations with Health Risks. PLoS One, 10(5), 1-13. https://doi.org/10.1371/journal.pone.0125551