Random Forest (RF) Wrappers for Waveband Selection and Classification of Hyperspectral Data

N.K. Poona, Adriaan Van Niekerk, R.L. Nadel, R. Ismail

    Research output: Contribution to journal (Article)

    12 Citations (Scopus)

    Abstract

    © Society for Applied Spectroscopy. © The Author(s) 2015. Hyperspectral data collected using a field spectroradiometer were used to model asymptomatic stress in Pinus radiata and Pinus patula seedlings infected with the pathogen Fusarium circinatum. Spectral data were analyzed using the random forest algorithm. To improve the classification accuracy of the model, subsets of wavebands were selected using three feature selection algorithms: (1) Boruta; (2) recursive feature elimination (RFE); and (3) area under the receiver operating characteristic curve of the random forest (AUC-RF). Results highlighted the robustness of these feature selection methods when used in conjunction with the random forest algorithm for analyzing hyperspectral data. Overall, the Boruta feature selection algorithm provided the best results. When discriminating F. circinatum stress in P. radiata seedlings, Boruta-selected wavebands (n = 69) yielded the best overall classification accuracies (training error of 17.00%, independent test error of 17.00%, and an AUC value of 0.91). Classification results were, however, significantly lower for P. patula seedlings, with a training error of 24.00%, an independent test error of 38.00%, and an AUC value of 0.65. A hybrid selection method that uses combinations of wavebands selected by the three feature selection algorithms was also tested. The hybrid method improved classification accuracies for P. patula but not for P. radiata. The results of this study provide impetus towards implementing a hyperspectral framework for detecting stress within nursery environments.
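The abstract reports model performance as error rates and AUC values. As a minimal sketch of how such figures are conventionally computed for a two-class problem (this is not the authors' code; the function names, labels, scores, and the 0.5 decision threshold below are all hypothetical), the error rate is the fraction of misclassified samples, and the AUC can be obtained as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one:

```python
from typing import Sequence


def error_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted class differs from the true class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC by pairwise comparison of positive and negative scores.

    Each (positive, negative) pair counts 1 if the positive scores higher,
    0.5 if the scores tie, and 0 otherwise.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = infected, 0 = healthy) and classifier scores;
# these are illustrative values, not data from the study.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
# Assumed decision threshold of 0.5 for turning scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(f"error rate: {error_rate(y_true, y_pred):.2%}")
print(f"AUC: {auc(y_true, scores):.2f}")
```

The pairwise form is quadratic in the number of samples, which is acceptable for small sets; a rank-based computation would be the usual choice for larger ones.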
    Original language: English
    Pages (from-to): 322-333
    Journal: Applied Spectroscopy
    Volume: 70
    Issue number: 2
    DOI: 10.1177/0003702815620545
    Publication status: Published - 2016

    Cite this

    Poona, N.K. ; Van Niekerk, Adriaan ; Nadel, R.L. ; Ismail, R. / Random Forest (RF) Wrappers for Waveband Selection and Classification of Hyperspectral Data. In: Applied Spectroscopy. 2016 ; Vol. 70, No. 2. pp. 322-333.
    @article{a625959b74c84fecad79058beaaee6f4,
    title = "Random Forest (RF) Wrappers for Waveband Selection and Classification of Hyperspectral Data",
    author = "N.K. Poona and {Van Niekerk}, Adriaan and R.L. Nadel and R. Ismail",
    year = "2016",
    doi = "10.1177/0003702815620545",
    language = "English",
    volume = "70",
    pages = "322--333",
    journal = "Applied Spectroscopy",
    issn = "0003-7028",
    publisher = "Society for Applied Spectroscopy",
    number = "2",

    }
