Crop Yield Prediction Based on Bacterial Biomarkers and Machine Learning

Li Ma, Wenquan Niu, Guochun Li, Yadan Du, Jun Sun, Kadambot H.M. Siddique

Research output: Contribution to journal, Article, peer-review


Bacteria serve as a holistic indicator of soil fertility by incorporating both biotic and abiotic aspects of past and present ecosystems. However, a research gap still exists in yield prediction models based on simple and reliable bacterial indicators. This study explores whether machine learning, deep learning, and bacterial biomarker communities can be used to accurately predict crop yields. Soil moisture, nutrients, and bacterial communities under different irrigation (I0, I1, I2) and fertilization (N0, N1, N2, N3, O1, O2, O3) treatments were measured using soil physicochemical analysis and high-throughput sequencing, and were then used to predict crop yield with Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Back Propagation Neural Network (BPNN) models. RF and XGBoost were superior in modeling yield, with R² values of 0.813 and 0.818 and RMSE values of 969.420 kg ha⁻¹ and 957.000 kg ha⁻¹, respectively, outperforming BPNN (R² = 0.541, RMSE = 1,519.680 kg ha⁻¹). Soil organic carbon and bacterial biomarkers were the most influential factors on yield, with importance values of 21.54% and 40.31%, respectively. Removing the bacterial biomarker community significantly decreased the models' R² by 15.37–56.19%, whereas removing the overall bacterial community decreased the RF model's R² by 0.51% and increased the R² of XGBoost and BPNN by 1.89% and 12.75%, respectively. These findings demonstrate the feasibility of constructing yield prediction models based on bacterial communities and, for the first time, emphasize the important role of bacterial biomarkers in yield prediction. The RF and XGBoost models should be prioritized when predicting yield.
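For readers unfamiliar with the metrics quoted above, the following is a minimal sketch of how R², RMSE, and the change in R² after removing a feature group could be computed. It is not the authors' code. The function names and the sample yields are hypothetical, and the sketch assumes the reported percentage changes are relative to the full model's R², which the abstract does not state.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error, in the same units as the yields (e.g. kg/ha)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def relative_change_percent(r2_full, r2_reduced):
    """Percentage change in R2 after dropping a feature group (negative = decrease).

    Assumption: the change is expressed relative to the full model's R2.
    """
    return 100.0 * (r2_reduced - r2_full) / r2_full


# Made-up yields (kg/ha) for illustration only; these are not data from the study.
observed = [8000.0, 9000.0, 10000.0, 11000.0]
predicted = [8500.0, 8500.0, 10500.0, 10500.0]

print(rmse(observed, predicted))
print(r_squared(observed, predicted))
print(relative_change_percent(0.8, 0.6))
```

In practice the same quantities are available from common machine learning libraries; the explicit formulas are shown here only to make the definitions concrete.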

Original language: English
Number of pages: 17
Journal: Journal of Soil Science and Plant Nutrition
Early online date: 27 Mar 2024
Publication status: Published - 2024

