Integrating machine learning and environmental variables to constrain uncertainty in crop yield change projections under climate change

Linchao Li, Yan Zhang, Bin Wang, Puyu Feng, Qinsi He, Yu Shi, Ke Liu, Matthew Tom Harrison, De Li Liu, Ning Yao, Yi Li, Jianqiang He, Hao Feng, Kadambot H.M. Siddique, Qiang Yu

Research output: Contribution to journal › Article › peer-review



Robust crop yield projections under future climates are a fundamental prerequisite for reliable policy formation. Both process-based crop models and statistical models are commonly used for this purpose. Process-based models tend to simplify processes, underrepresent the effects of extreme events, and ignore biotic pressures, while statistical models cannot deterministically capture the intricate biological and physiological processes underpinning crop growth. We attempted to overcome the shortcomings of both modelling frameworks by separately coupling the dynamic linear model (DLM) and the random forest machine learning model (RF) with nine global gridded crop models (GGCMs), in order to improve the accuracy and reduce the uncertainty of maize (Zea mays L.) and soybean (Glycine max [L.] Merrill) yield projections. Our results demonstrated that using RF in concert with GGCMs substantially improved model accuracy across China's maize and soybean belt, and that this improvement surpassed the one achieved with DLM. For maize, the GGCM+RF models increased r from 0.15–0.61 to 0.64–0.77 and decreased nRMSE from approximately 0.20–0.50 to 0.13–0.17 compared with GGCMs alone. For soybean, they increased r from 0.37–0.70 to 0.54–0.70 and decreased nRMSE from 0.17–0.35 to 0.17–0.20 compared with GGCMs alone. The main factors influencing maize yield change were chilling days (CD), crop pests and diseases (CPD), and drought, while for soybean they were CPD, tropical days (days on which the maximum temperature exceeds a threshold), and drought. Our approach reduced projection uncertainty by 33–78% for maize and by 56–68% for soybean. For GGCMs alone, the main source of uncertainty was the crop model. For GGCM+RF, the main source of uncertainty was the global climate model for the 2040–2069 period and the climate scenario for the 2070–2099 period.
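Model skill is reported above as the Pearson correlation coefficient (r) and the normalised root-mean-square error (nRMSE) between simulated and observed yields. The sketch below computes both metrics in plain Python. It assumes nRMSE is the RMSE divided by the mean observed yield, since the normalisation used in the study is not stated here; the function names `pearson_r` and `nrmse` are illustrative, not taken from the paper.

```python
import math
from typing import Sequence


def pearson_r(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Pearson correlation between observed and simulated yields."""
    n = len(obs)
    if n != len(sim) or n < 2:
        raise ValueError("need two equal-length series of at least 2 values")
    mean_obs = sum(obs) / n
    mean_sim = sum(sim) / n
    # Covariance and variances share the same 1/n factor, so it cancels out.
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    if var_obs == 0 or var_sim == 0:
        raise ValueError("r is undefined for a constant series")
    return cov / math.sqrt(var_obs * var_sim)


def nrmse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """RMSE normalised by the mean observed yield (assumed normalisation)."""
    n = len(obs)
    if n != len(sim) or n == 0:
        raise ValueError("need two non-empty, equal-length series")
    rmse = math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / n)
    return rmse / (sum(obs) / n)
```

Under these definitions, a hybrid GGCM+RF model outperforms a GGCM alone when, against the same observed yield series, it gives a higher `pearson_r` and a lower `nrmse`.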
Our results provide a novel, robust, and pragmatic framework for constraining uncertainty and thereby assessing the impact of future climate change on crop yields more accurately. These results could be used to interpret future ensemble studies by accounting for uncertainty in crop and climate models, and to assess future emissions scenarios.
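Attributing projection uncertainty to the crop model, the global climate model (GCM), and the climate scenario can be illustrated with a simplified main-effects variance decomposition over a full-factorial ensemble. This is a sketch under stated assumptions, not the authors' method: the function `main_effect_shares`, the `(crop_model, gcm, scenario)` key layout, the balanced-design requirement, and the lumping of all interactions into one residual share are hypothetical choices.

```python
import math
from collections import defaultdict
from itertools import product
from statistics import fmean, pvariance
from typing import Dict, Tuple

# Hypothetical key layout: (crop_model, gcm, scenario) -> projected yield change.
Key = Tuple[str, str, str]
FACTORS = ("crop_model", "gcm", "scenario")


def main_effect_shares(proj: Dict[Key, float]) -> Dict[str, float]:
    """Share of total ensemble variance explained by each factor's main effect.

    Assumes a balanced full-factorial ensemble (every crop model run with every
    GCM under every scenario). Interactions are lumped into one residual share.
    """
    levels = [{key[i] for key in proj} for i in range(len(FACTORS))]
    if not proj or len(proj) != math.prod(len(lv) for lv in levels):
        raise ValueError("ensemble must be a non-empty full factorial")

    values = list(proj.values())
    grand = fmean(values)
    total = pvariance(values)
    if total == 0:
        # No spread in the ensemble, so there is no uncertainty to attribute.
        return {name: 0.0 for name in (*FACTORS, "interaction")}

    shares: Dict[str, float] = {}
    for i, name in enumerate(FACTORS):
        # Group projections by the level of factor i and average each group.
        groups = defaultdict(list)
        for key, value in proj.items():
            groups[key[i]].append(value)
        level_means = [fmean(g) for g in groups.values()]
        # In a balanced design, the main-effect variance is the mean squared
        # deviation of the level means from the grand mean.
        effect_var = fmean([(m - grand) ** 2 for m in level_means])
        shares[name] = effect_var / total
    # Whatever the main effects leave unexplained is attributed to interactions.
    shares["interaction"] = max(0.0, 1.0 - sum(shares.values()))
    return shares


if __name__ == "__main__":
    # Toy additive ensemble: yield change = crop-model offset + GCM offset.
    crop_offset = {"cm1": 0.0, "cm2": 2.0}
    gcm_offset = {"g1": 0.0, "g2": 1.0}
    toy = {
        (c, g, s): crop_offset[c] + gcm_offset[g]
        for c, g, s in product(crop_offset, gcm_offset, ("ssp126", "ssp585"))
    }
    print(main_effect_shares(toy))
```

In the toy ensemble the scenario has no effect, so its share is zero and the crop model explains four times as much variance as the GCM. Comparing such shares for a GGCM-only ensemble and a GGCM+RF ensemble, per projection period, is one way to see which source dominates.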

Original language: English
Article number: 126917
Journal: European Journal of Agronomy
Publication status: Published - Sept 2023


