TY - JOUR
T1 - The value of gene-based selection of tag SNPs in genome-wide association studies
AU - Wiltshire, Steve
AU - De Bakker, P.I.W.
AU - Daly, M.J.
PY - 2006
Y1 - 2006
N2 - Genome-wide association scans are rapidly becoming reality, but there is no present consensus regarding genotyping strategies to optimise the discovery of true genetic risk factors. For a given investment in genotyping, should tag SNPs be selected in a gene-centric manner, or instead, should coverage be optimised based on linkage disequilibrium alone? We explored this question using empirical data from the HapMap-ENCODE project, and we found that tags designed specifically to capture common variation in exonic and evolutionarily conserved regions provide good coverage for 15 - 30% of the total common variation (depending on the population sample studied), and yield genotype savings compared with an anonymous tagging approach that captures all common variation. However, the same number of tags based on linkage disequilibrium alone captures substantially more (30 - 46%) of the total common variation. Therefore, the best strategy depends crucially on the unknown degree to which functional variation resides in recognisable exons and evolutionarily conserved sequence. A hypothetical but reasonable scenario might be one in which trait-causing variation is equally distributed between exons plus conserved sequence, and the rest of the genome. In this scenario, our analysis suggests that a tagging approach that captures variation in exons and conserved sequence provides only modestly better coverage of putatively causal variation than does anonymous tagging. In HapMap CEU samples (with northern and western European ancestry), we observed roughly equivalent coverage for equal investment for both tagging strategies.
AB - Genome-wide association scans are rapidly becoming reality, but there is no present consensus regarding genotyping strategies to optimise the discovery of true genetic risk factors. For a given investment in genotyping, should tag SNPs be selected in a gene-centric manner, or instead, should coverage be optimised based on linkage disequilibrium alone? We explored this question using empirical data from the HapMap-ENCODE project, and we found that tags designed specifically to capture common variation in exonic and evolutionarily conserved regions provide good coverage for 15 - 30% of the total common variation (depending on the population sample studied), and yield genotype savings compared with an anonymous tagging approach that captures all common variation. However, the same number of tags based on linkage disequilibrium alone captures substantially more (30 - 46%) of the total common variation. Therefore, the best strategy depends crucially on the unknown degree to which functional variation resides in recognisable exons and evolutionarily conserved sequence. A hypothetical but reasonable scenario might be one in which trait-causing variation is equally distributed between exons plus conserved sequence, and the rest of the genome. In this scenario, our analysis suggests that a tagging approach that captures variation in exons and conserved sequence provides only modestly better coverage of putatively causal variation than does anonymous tagging. In HapMap CEU samples (with northern and western European ancestry), we observed roughly equivalent coverage for equal investment for both tagging strategies.
U2 - 10.1038/sj.ejhg.5201678
DO - 10.1038/sj.ejhg.5201678
M3 - Article
SN - 1018-4813
VL - 14
SP - 1209
EP - 1214
JO - European Journal of Human Genetics
JF - European Journal of Human Genetics
ER -