A Comprehensive Comparison of Machine Learning and Feature Selection Methods for Maize Biomass Estimation Using Sentinel-1 SAR, Sentinel-2 Vegetation Indices, and Biophysical Variables

Chi Xu, Yanling Ding, Xingming Zheng, Yeqiao Wang, Rui Zhang, Hongyan Zhang, Zewen Dai, Qiaoyun Xie

Research output: Contribution to journal, Article, peer-review

10 Citations (Scopus)


Rapid and accurate estimation of maize biomass is critical for predicting crop productivity. The Sentinel-1 (S-1) synthetic aperture radar (SAR) and Sentinel-2 (S-2) missions offer a new opportunity to map biomass. Selecting appropriate predictor variables is crucial for improving the accuracy of biomass estimation. We developed models from SAR polarization indices, vegetation indices (VIs), and biophysical variables (BPVs) based on Gaussian process regression (GPR) and random forest (RF) with feature optimization to retrieve maize biomass in Changchun, Jilin Province, northeastern China. Three new predictors, one from each type of remote sensing data, were proposed based on the correlations with biomass measured in June, July, and August 2018. The results showed that a predictor combining vertical-vertical polarization (VV), vertical-horizontal polarization (VH), and the difference between VH and VV (VH-VV), derived from S-1 images of June, July, and August, respectively, with GPR and RF, provided a more accurate estimation of biomass (R² = 0.81–0.83, RMSE = 0.40–0.41 kg/m²) than the models based on single SAR polarization indices, their combinations, or optimized features (R² = 0.04–0.39, RMSE = 0.84–1.08 kg/m²). Among the S-2 VIs, the GPR model using a combination of the ratio vegetation index (RVI) of June, the normalized difference infrared index (NDII) of July, and the normalized difference vegetation index (NDVI) of August achieved R² = 0.83 and RMSE = 0.39 kg/m², much better than single VIs, their combinations, or optimized features (R² = 0.31–0.77, RMSE = 0.47–0.87 kg/m²). A BPV predictor combining leaf chlorophyll content (CAB) in June, canopy water content (CWC) in July, and fractional vegetation cover (FCOVER) in August, with RF, also yielded the highest accuracy (R² = 0.85, RMSE = 0.38 kg/m²) compared with single BPVs, their combinations, or the optimized subset. Overall, the three combined predictors contributed significantly to improving the accuracy of biomass estimation with the GPR and RF methods. This study offers new insights into the application of S-1 and S-2 data to maize biomass modeling.
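As a rough illustration of the combined SAR predictor and the two accuracy metrics quoted in the abstract, the sketch below assembles one feature per month (June VV, July VH, August VH-VV) and computes RMSE and R². It is not the authors' code: the function names are hypothetical, backscatter is assumed to be in dB (so VH-VV is a plain subtraction), and R² is assumed to be the coefficient of determination, which may differ from the definition used in the article. The regression step itself (GPR or RF) is left out.

```python
import math


def combined_sar_predictor(june_vv, july_vh, aug_vh, aug_vv):
    """Build one feature row per field plot: (June VV, July VH, August VH-VV).

    Inputs are equal-length lists of backscatter values, assumed to be in dB.
    """
    return [
        (vv_jun, vh_jul, vh_aug - vv_aug)
        for vv_jun, vh_jul, vh_aug, vv_aug in zip(june_vv, july_vh, aug_vh, aug_vv)
    ]


def rmse(observed, predicted):
    """Root mean square error, in the units of the observations (e.g. kg/m²)."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (an assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

The feature rows returned by `combined_sar_predictor` could then be passed to any regression model, and `rmse` and `r_squared` applied to its predictions on held-out biomass measurements.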

Original language: English
Article number: 4083
Journal: Remote Sensing
Issue number: 16
Publication status: Published - 20 Aug 2022
Externally published: Yes
