TY - JOUR
T1 - ECMSRC
T2 - A sparse learning approach for the prediction of extracellular matrix proteins
AU - Naseem, Imran
AU - Khan, Shujaat
AU - Togneri, Roberto
AU - Bennamoun, Mohammed
PY - 2017/8/1
Y1 - 2017/8/1
N2 - Background: The extracellular matrix (ECM) is a dynamic, physiologically active component of all living tissues and plays a vital role in their function. Mutations in ECM genes have been shown to cause several diseases, including cancer. Reliable prediction of ECM proteins is therefore of prognostic significance. Objective: Because ECM proteins are closely related to secretory proteins, a number of researchers have investigated secretory proteins to explore the properties of the ECM, but only a few have focused on classifying ECM and non-ECM proteins. In this research, we propose a novel approach for predicting ECM proteins from protein sequences. Method: The most discriminant features are selected by maximizing class relevance and minimizing redundancy (mRMR) in an information-theoretic sense. The sparsity of these discriminant features is harnessed to employ sparse representation classification (SRC) for the prediction of ECM proteins. Results: The proposed algorithm achieves a test accuracy of 81.06% on a standard dataset, which is superior to the EcmPred approach. For the prediction of experimentally verified human ECM proteins, we report a verification accuracy of 80%, which outperforms the EcmPred approach by a margin of 5%. Conclusion: ECMSRC outperforms the EcmPred method in test accuracy and Youden's index. Notably, it uses fewer features than the EcmPred method (40 features) to achieve this superior performance. The MATLAB implementation of ECMSRC is available at http://sp.gsse.pafkiet.edu.pk/downloads.
AB - Background: The extracellular matrix (ECM) is a dynamic, physiologically active component of all living tissues and plays a vital role in their function. Mutations in ECM genes have been shown to cause several diseases, including cancer. Reliable prediction of ECM proteins is therefore of prognostic significance. Objective: Because ECM proteins are closely related to secretory proteins, a number of researchers have investigated secretory proteins to explore the properties of the ECM, but only a few have focused on classifying ECM and non-ECM proteins. In this research, we propose a novel approach for predicting ECM proteins from protein sequences. Method: The most discriminant features are selected by maximizing class relevance and minimizing redundancy (mRMR) in an information-theoretic sense. The sparsity of these discriminant features is harnessed to employ sparse representation classification (SRC) for the prediction of ECM proteins. Results: The proposed algorithm achieves a test accuracy of 81.06% on a standard dataset, which is superior to the EcmPred approach. For the prediction of experimentally verified human ECM proteins, we report a verification accuracy of 80%, which outperforms the EcmPred approach by a margin of 5%. Conclusion: ECMSRC outperforms the EcmPred method in test accuracy and Youden's index. Notably, it uses fewer features than the EcmPred method (40 features) to achieve this superior performance. The MATLAB implementation of ECMSRC is available at http://sp.gsse.pafkiet.edu.pk/downloads.
KW - Extracellular matrix
KW - Human proteome
KW - mRMR feature selection
KW - Pattern classification
KW - Protein prediction
KW - Sparse representation
UR - http://www.scopus.com/inward/record.url?scp=85027320306&partnerID=8YFLogxK
U2 - 10.2174/1574893611666151215213508
DO - 10.2174/1574893611666151215213508
M3 - Article
AN - SCOPUS:85027320306
SN - 1574-8936
VL - 12
SP - 361
EP - 368
JO - Current Bioinformatics
JF - Current Bioinformatics
IS - 4
ER -