Quantitative characterization of bovine serum albumin thin-films using terahertz spectroscopy and machine learning methods

Yiwen Sun, Pengju Du, Xingxing Lu, Pengfei Xie, Zhengfang Qian, Shuting Fan, Zexuan Zhu

    Research output: Contribution to journal › Article

    3 Citations (Scopus)

    Abstract

    The development of new spectral analysis methods in bio thin-film detection has generated intense interest in terahertz (THz) spectroscopy and its application in a wide range of fields. In this paper, machine learning methods are applied for the first time to the quantitative characterization of deposited bovine serum albumin (BSA) thin-films detected by terahertz time-domain spectroscopy. The spectral data of BSA thin-films prepared from solutions with concentrations ranging from 0.5 to 35 mg/ml are analyzed using the support vector regression method to learn the underlying model relating the frequency response to the target concentration. The learned model successfully predicts the concentrations of the unknown test samples with a coefficient of determination R² = 0.97932. Furthermore, to identify the relevance of each frequency to the concentration, the maximal information coefficient statistical analysis is used, and the three most discriminating frequencies in the THz range are identified at 1.2, 1.1 and 0.5 THz, respectively, which means that a good prediction of BSA concentration can be achieved using only the top three relevant frequencies. Moreover, the top discriminating frequencies are in good agreement with the frequencies predicted by a long-wavelength elastic vibration model for the BSA protein.
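
    The coefficient of determination quoted in the abstract is conventionally defined as R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the true values from their mean. The following is a minimal sketch of that computation, not the authors' code; the function name `r_squared` and the sample concentrations are hypothetical and chosen only for illustration (they are not data from the paper).

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical concentrations in mg/ml (illustrative only, not from the paper)
true_conc = [0.5, 5.0, 15.0, 25.0, 35.0]
pred_conc = [1.0, 4.5, 16.0, 24.0, 35.5]
score = r_squared(true_conc, pred_conc)
print(f"R^2 = {score:.4f}")
```

    A value of 1 corresponds to perfect prediction, while a model that always predicts the mean concentration scores 0, so a reported value close to 1 indicates that the predicted concentrations track the true ones closely on the test samples.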

    Original language: English
    Article number: #313259
    Pages (from-to): 2917-2929
    Number of pages: 13
    Journal: Biomedical Optics Express
    Volume: 9
    Issue number: 7
    DOI: 10.1364/BOE.9.002917
    Publication status: Published - 1 Jul 2018

    Fingerprint

    Terahertz Spectroscopy
    machine learning
    Bovine Serum Albumin
    albumins
    serums
    learning
    thin films
    spectroscopy
    Vibration
    Blood Proteins
    Spectrum Analysis
    coefficients
    Machine Learning
    statistical analysis
    spectrum analysis
    regression analysis
    proteins
    vibration

    Cite this

    Sun, Yiwen ; Du, Pengju ; Lu, Xingxing ; Xie, Pengfei ; Qian, Zhengfang ; Fan, Shuting ; Zhu, Zexuan. / Quantitative characterization of bovine serum albumin thin-films using terahertz spectroscopy and machine learning methods. In: Biomedical Optics Express. 2018 ; Vol. 9, No. 7. pp. 2917-2929.
    @article{b0f6f61e90f34e2da1eeda7efdb3681f,
    title = "Quantitative characterization of bovine serum albumin thin-films using terahertz spectroscopy and machine learning methods",
    author = "Yiwen Sun and Pengju Du and Xingxing Lu and Pengfei Xie and Zhengfang Qian and Shuting Fan and Zexuan Zhu",
    year = "2018",
    month = "7",
    day = "1",
    doi = "10.1364/BOE.9.002917",
    language = "English",
    volume = "9",
    pages = "2917--2929",
    journal = "Biomedical Optics Express",
    issn = "2156-7085",
    publisher = "Optical Soc Amer",
    number = "7",

    }

    TY - JOUR

    T1 - Quantitative characterization of bovine serum albumin thin-films using terahertz spectroscopy and machine learning methods

    AU - Sun, Yiwen

    AU - Du, Pengju

    AU - Lu, Xingxing

    AU - Xie, Pengfei

    AU - Qian, Zhengfang

    AU - Fan, Shuting

    AU - Zhu, Zexuan

    PY - 2018/7/1

    Y1 - 2018/7/1

    UR - http://www.scopus.com/inward/record.url?scp=85049366778&partnerID=8YFLogxK

    U2 - 10.1364/BOE.9.002917

    DO - 10.1364/BOE.9.002917

    M3 - Article

    VL - 9

    SP - 2917

    EP - 2929

    JO - Biomedical Optics Express

    JF - Biomedical Optics Express

    SN - 2156-7085

    IS - 7

    M1 - #313259

    ER -