Cross-validated Between Group PCA Scatterplots: A Solution to Spurious Group Separation?

Andrea Cardini, P. David Polly

Research output: Contribution to journal › Article › peer-review

37 Citations (Scopus)

Abstract

Between group PCA (bgPCA) has been developed to summarize group differences in high-dimensional spaces, such as those in geometric morphometrics and microarray data, where the number of variables is often larger than the sample size. However, it has very recently been shown that this technique inflates apparent differences in scatterplots and, in extreme cases, can even create differences where there are none, an effect that becomes more exaggerated as dimensionality increases. In this study, we explore whether leave-one-out cross-validated scatterplots, in which cross-validated scores rather than the conventional ones are used to construct the final ordination, can mitigate the issue. Using simulated data with either isotropic variation or covariance, and an increasing number of variables, we show that cross-validated bgPCs reduce, but do not completely remove, the distortion of mean differences. However, although scatterplots may still depict inaccurate relationships between group means and must therefore be interpreted with great caution, cross-validation largely solves the issue of spurious separation. Thus, cross-validated bgPCA offers a major improvement for faithfully summarizing overlap or separation among groups in high-dimensional spaces, and its results will be largely consistent with distance-based permutation tests of significance for group mean differences in the full data space.
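The contrast between conventional and leave-one-out bgPCA scores can be illustrated with a minimal Python/NumPy sketch. It is not the authors' code and does not reproduce their simulations. Several choices are assumptions of this sketch: axes are the principal components of the group means, centred on the unweighted mean of the group means; each leave-one-out fit is aligned to the full-data axes by an orthogonal Procrustes rotation (axis signs and order can otherwise flip between fits); and separation is summarized by a hypothetical `separation_r2` statistic, the share of score variance explained by group membership. The simulated data are pure noise with more variables than specimens, so there are no true group differences.

```python
import numpy as np


def bgpca_axes(X, groups):
    """Return (center, axes) for between-group PCA: PCs of the group means.

    Centring on the unweighted mean of group means is an assumption here.
    At most k - 1 informative axes exist for k groups.
    """
    labels = np.unique(groups)
    means = np.array([X[groups == g].mean(axis=0) for g in labels])
    center = means.mean(axis=0)
    # Rows of vt are the loadings of the centred group-mean matrix.
    _, _, vt = np.linalg.svd(means - center, full_matrices=False)
    return center, vt[: len(labels) - 1].T  # axes: p x (k - 1)


def bgpca_scores(X, groups):
    """Conventional bgPCA: each specimen helps define the axes it is scored on."""
    center, axes = bgpca_axes(X, groups)
    return (X - center) @ axes, axes


def cv_bgpca_scores(X, groups):
    """Leave-one-out scores: specimen i is projected on axes fitted without it."""
    _, ref_axes = bgpca_scores(X, groups)
    scores = np.empty((X.shape[0], ref_axes.shape[1]))
    for i in range(X.shape[0]):
        keep = np.arange(X.shape[0]) != i
        center, axes = bgpca_axes(X[keep], groups[keep])
        # Orthogonal Procrustes alignment to the full-data axes
        # (an assumption of this sketch, not taken from the paper).
        u, _, wt = np.linalg.svd(axes.T @ ref_axes)
        axes = axes @ (u @ wt)
        scores[i] = (X[i] - center) @ axes
    return scores


def separation_r2(scores, groups):
    """Hypothetical summary: share of total score variance due to group (0 to 1)."""
    total = ((scores - scores.mean(axis=0)) ** 2).sum()
    within = sum(
        ((scores[groups == g] - scores[groups == g].mean(axis=0)) ** 2).sum()
        for g in np.unique(groups)
    )
    return 1.0 - within / total


# Pure-noise data: 3 groups of 10 specimens, 100 variables, no real differences.
rng = np.random.default_rng(1)
n_per_group, n_groups, p = 10, 3, 100
groups = np.repeat(np.arange(n_groups), n_per_group)
X = rng.standard_normal((n_per_group * n_groups, p))

conv, _ = bgpca_scores(X, groups)
cv = cv_bgpca_scores(X, groups)
print(f"conventional R^2    = {separation_r2(conv, groups):.2f}")
print(f"cross-validated R^2 = {separation_r2(cv, groups):.2f}")
```

With data like these, the conventional scores typically cluster by group even though the groups do not differ, because each specimen's own values pull the axes toward it. The leave-one-out scores are computed from axes fitted without the specimen, so they largely overlap. Exact values depend on the seed and on the assumed centring and alignment choices.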

Original language: English
Pages (from-to): 85-95
Number of pages: 11
Journal: Evolutionary Biology
Volume: 47
Issue number: 1
Publication status: Published - 1 Mar 2020
