TY - JOUR
T1 - Prediction of multivariate responses with a selected number of principal components
AU - Koch, Inge
AU - Naito, Kanta
PY - 2010/7/1
Y1 - 2010/7/1
N2 - This paper proposes a new method and algorithm for predicting multivariate responses in a regression setting. Research into the classification of high dimension low sample size (HDLSS) data, in particular microarray data, has made considerable advances, but regression prediction for high-dimensional data with continuous responses has had less attention. Recently Bair et al. (2006) proposed an efficient prediction method based on supervised principal component regression (PCR). Motivated by the fact that using a larger number of principal components results in better regression performance, this paper extends the method of Bair et al. in several ways: a comprehensive variable ranking is combined with a selection of the best number of components for PCR, and the new method further extends to regression with multivariate responses. The new method is particularly suited to addressing HDLSS problems. Applications to simulated and real data demonstrate the performance of the new method. Comparisons with the findings of Bair et al. (2006) show that for high-dimensional data in particular the new ranking results in a smaller number of predictors and smaller errors.
AB - This paper proposes a new method and algorithm for predicting multivariate responses in a regression setting. Research into the classification of high dimension low sample size (HDLSS) data, in particular microarray data, has made considerable advances, but regression prediction for high-dimensional data with continuous responses has had less attention. Recently Bair et al. (2006) proposed an efficient prediction method based on supervised principal component regression (PCR). Motivated by the fact that using a larger number of principal components results in better regression performance, this paper extends the method of Bair et al. in several ways: a comprehensive variable ranking is combined with a selection of the best number of components for PCR, and the new method further extends to regression with multivariate responses. The new method is particularly suited to addressing HDLSS problems. Applications to simulated and real data demonstrate the performance of the new method. Comparisons with the findings of Bair et al. (2006) show that for high-dimensional data in particular the new ranking results in a smaller number of predictors and smaller errors.
KW - Dimension selection
KW - Principal component regression
KW - Supervised learning
KW - Variable ranking
UR - http://www.scopus.com/inward/record.url?scp=77949568746&partnerID=8YFLogxK
U2 - 10.1016/j.csda.2010.01.030
DO - 10.1016/j.csda.2010.01.030
M3 - Article
AN - SCOPUS:77949568746
SN - 0167-9473
VL - 54
SP - 1791
EP - 1807
JO - Computational Statistics and Data Analysis
JF - Computational Statistics and Data Analysis
IS - 7
ER -