In extreme cold weather, living organisms produce Antifreeze Proteins (AFPs) to counter the otherwise lethal intracellular formation of ice. Structures and sequences of various AFPs exhibit a high degree of heterogeneity, consequently the prediction of the AFPs is considered to be a challenging task. In this research, we propose to handle this arduous manifold learning task using the notion of localized processing. In particular an AFP sequence is segmented into two sub-segments each of which is analyzed for amino acid and di-peptide compositions. We propose to use only the most significant features using the concept of information gain (IG) followed by a random forest classification approach. The proposed RAFP-Pred achieved an excellent performance on a number of standard datasets. We report a high Youden's index (sensitivity+specificity-1) value of 0.75 on the standard independent test data set outperforming the AFP-PseAAC, AFP PSSM, AFP-Pred and iAFP by a margin of 0.05, 0.06, 0.14 and 0.68 respectively. The verification rate on the UniProKB dataset is found to be 83.19% which is substantially superior to the 57.18% reported for the iAFP method.
|Number of pages||7|
|Journal||IEEE/ACM Transactions on Computational Biology and Bioinformatics|
|Publication status||Published - 1 Jan 2018|