TY - JOUR
T1 - Cross-validated Between Group PCA Scatterplots
T2 - A Solution to Spurious Group Separation?
AU - Cardini, Andrea
AU - Polly, P. David
PY - 2020/3/1
Y1 - 2020/3/1
AB - Between-group PCA (bgPCA) has been developed to summarize group differences in high-dimensional spaces, such as those in geometric morphometrics and microarray data, where the number of variables is often larger than the sample size. However, it has recently been shown that this technique inflates apparent differences as seen in scatterplots and, in extreme cases, can even create differences where there are none, an effect that becomes more exaggerated as dimensionality increases. In this study, we explore whether leave-one-out cross-validated scatterplots, in which cross-validated scores are used to construct the final ordination instead of the conventional ones, can mitigate the issue. Using simulated data with either isotropic variation or covariance, and increasing the number of variables, we show that cross-validated bgPCs reduce but do not completely remove the distortion of mean differences. However, although scatterplots might still depict inaccurate relationships between group means and must therefore be interpreted with great caution, cross-validation largely solves the issue of spurious separation. Thus, cross-validated bgPCA offers a substantial improvement for faithfully summarizing overlap or separation among groups in high-dimensional spaces, and its results will be largely consistent with distance-based permutation tests of significance for group mean differences in the full data space.
KW - Classification
KW - Covariance
KW - Geometric morphometrics
KW - Group differences
KW - Multivariate analysis
KW - Sampling error
UR - http://www.scopus.com/inward/record.url?scp=85079174924&partnerID=8YFLogxK
U2 - 10.1007/s11692-020-09494-x
DO - 10.1007/s11692-020-09494-x
M3 - Article
AN - SCOPUS:85079174924
SN - 0071-3260
VL - 47
SP - 85
EP - 95
JO - Evolutionary Biology
JF - Evolutionary Biology
IS - 1
ER -