An Explainable Identifier of iGHBPs Peptides Based on Deep PSSM Features and Learning Approaches


Article Information

Title: An Explainable Identifier of iGHBPs Peptides Based on Deep PSSM Features and Learning Approaches

Authors: Rahu Sikander, Mujeebu Rehman, Tarique Ali Brohi, Arif Ahmed, Ali Ghulam, Sultan Ahmed

Journal: Insights-Journal of Health and Rehabilitation

HEC Recognition History: Category Y, from 2024-10-01 to 2025-12-31

Publisher: Health And Research Insights (SMC-Private) Limited

Country: Pakistan

Year: 2024

Volume: 2

Issue: 2

Language: English

DOI: 10.71000/7dqqxs92

Keywords: Deep learning, DPC, ACC, GRU

Abstract

A growth hormone binding protein (GHBP), also referred to as a soluble carrier protein, interacts effectively and non-covalently with growth hormone. Accurately recognizing GHBPs from a given protein sequence is crucial for understanding biological processes and cell growth. In the postgenomic era, a large amount of protein sequence data has accumulated, which makes it even more urgent to build an integrated computational method that can quickly and precisely identify possible GHBPs from a huge number of candidate proteins. In this work, we present iGHBP, a GHBP predictor tool. To date, scant attention has been paid to the protein descriptors used in feature extraction, such as the amino acid index, a collection of 20 numerical values that indicate different physicochemical and biological attributes of amino acids, and dipeptide composition (DPC). This study introduces a novel machine learning predictor called accurate computational identification of growth hormone binding proteins (ac-iGHBPs), which uses a gated recurrent unit (GRU) technique. We performed a cross-validation analysis to demonstrate the effectiveness of our feature selection process, and the results showed that iGHBP had an accuracy of 84.9%, 7% higher than the control extremely randomized tree predictor trained with all features. Furthermore, in an independent evaluation on a different dataset, our new iGHBP strategy performed better than the existing method.


Research Objective

To develop an accurate and efficient computational tool, iGHBP, for predicting growth hormone binding proteins (GHBP) using advanced feature extraction and machine learning techniques.


Methodology

The study employed two feature extraction methods: amino acid composition (AAC) and dipeptide composition (DPC), both applied to position-specific scoring matrices (PSSM) generated by PSI-BLAST. Machine learning algorithms, including Gated Recurrent Unit (GRU), Random Forest (RF), and K-Nearest Neighbor (K-NN), were evaluated. Performance was assessed using five-fold and ten-fold cross-validation and an independent dataset, with evaluation metrics including accuracy, sensitivity, specificity, Matthews correlation coefficient (MCC), and area under the curve (AUC).
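As a minimal sketch of the two feature types named above, the following uses the definitions commonly given for AAC-PSSM (the mean of each of the 20 PSSM columns) and DPC-PSSM (the averaged product of scores at consecutive positions, giving 400 values). The function names and the plain list-of-lists PSSM representation are assumptions made for illustration; this summary does not state the exact normalization the authors applied.

```python
from typing import List

Matrix = List[List[float]]  # one row per residue, one column per amino acid


def aac_pssm(pssm: Matrix) -> List[float]:
    """Average each PSSM column over the sequence length (20 values)."""
    length = len(pssm)
    n_cols = len(pssm[0])
    return [sum(row[j] for row in pssm) / length for j in range(n_cols)]


def dpc_pssm(pssm: Matrix) -> List[float]:
    """Average the product of column i at position k and column j at
    position k + 1 over all consecutive pairs (20 x 20 = 400 values)."""
    length = len(pssm)
    if length < 2:
        raise ValueError("DPC-PSSM needs at least two residues")
    n_cols = len(pssm[0])
    features = []
    for i in range(n_cols):
        for j in range(n_cols):
            total = sum(pssm[k][i] * pssm[k + 1][j] for k in range(length - 1))
            features.append(total / (length - 1))
    return features


# Toy 3-residue PSSM (made-up scores, not real PSI-BLAST output).
toy_pssm = [[1.0] * 20, [2.0] * 20, [3.0] * 20]
aac_features = aac_pssm(toy_pssm)   # 20-dimensional vector
dpc_features = dpc_pssm(toy_pssm)   # 400-dimensional vector
```

Both functions return fixed-length vectors regardless of sequence length, which is what lets proteins of different lengths be fed to the same classifier.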

Methodology Flowchart
graph TD
    A["Data Collection & Preparation"] --> B["Feature Extraction: AAC-PSSM, DPC-PSSM"];
    B --> C["Machine Learning Model Training"];
    C --> D["Model Evaluation: Cross-validation, Independent Dataset"];
    D --> E["Performance Assessment: Accuracy, Sensitivity, Specificity, MCC, AUC"];
    E --> F["Comparison with Existing Methods"];
    F --> G["Conclusion: iGHBP Predictor"];
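The cross-validation step in the flowchart can be illustrated with a small stratified k-fold splitter. This is a sketch under assumptions: the summary does not say whether the authors stratified their folds or how they shuffled, and the function name, seed, and toy labels are hypothetical.

```python
import random
from typing import List, Tuple


def stratified_kfold(labels: List[int], k: int,
                     seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs, dealing the samples of
    each class round-robin into folds so class balance is preserved."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    splits = []
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        splits.append((train, test))
    return splits


# Toy labels: 10 positives (1) and 10 negatives (0), for illustration only.
toy_labels = [1] * 10 + [0] * 10
five_fold = stratified_kfold(toy_labels, k=5)
```

Each sample appears in exactly one test fold, so averaging a metric over the k folds uses every sample once for evaluation.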

Discussion

The study highlights the effectiveness of advanced computational methods, specifically feature mining and machine learning, for predicting GHBPs. The integration of Deep-PSSM with GRU demonstrated superior performance, emphasizing the importance of precise feature selection for high predictive accuracy. The findings suggest potential for advancing GHBP research and facilitating drug discovery. Limitations include a constrained dataset and the need for external validation with larger datasets.
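For readers unfamiliar with the GRU mentioned above, a single GRU update can be sketched in plain Python as follows. This is the generic textbook cell, not the authors' network: the layer sizes, the parameter names in the `params` dictionary, and the convention h_new = (1 - z) * h + z * candidate are assumptions (some libraries swap the roles of z and 1 - z).

```python
import math
from typing import Dict, List

Vector = List[float]


def _affine(W, x: Vector, U, h: Vector, b: Vector) -> Vector:
    """Compute W @ x + U @ h + b with plain lists."""
    return [
        sum(w * xi for w, xi in zip(W[i], x))
        + sum(u * hi for u, hi in zip(U[i], h))
        + b[i]
        for i in range(len(b))
    ]


def _sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def gru_step(x: Vector, h: Vector, params: Dict[str, list]) -> Vector:
    """One GRU update: update gate z, reset gate r, candidate state."""
    z = [_sigmoid(a) for a in _affine(params["Wz"], x, params["Uz"], h, params["bz"])]
    r = [_sigmoid(a) for a in _affine(params["Wr"], x, params["Ur"], h, params["br"])]
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a)
            for a in _affine(params["Wh"], x, params["Uh"], reset_h, params["bh"])]
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def zero_params(n_in: int, n_hidden: int) -> Dict[str, list]:
    """All-zero parameters, used here only to make the example checkable."""
    p: Dict[str, list] = {}
    for gate in ("z", "r", "h"):
        p["W" + gate] = [[0.0] * n_in for _ in range(n_hidden)]
        p["U" + gate] = [[0.0] * n_hidden for _ in range(n_hidden)]
        p["b" + gate] = [0.0] * n_hidden
    return p


# Feed one 20-value PSSM row (made-up scores) into a 2-unit cell.
state = gru_step([0.5] * 20, [2.0, -4.0], zero_params(20, 2))
```

With all-zero parameters both gates equal 0.5 and the candidate state is 0, so the cell simply halves the previous state; trained weights would instead let the gates decide, per position, how much sequence context to keep.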


Key Findings

The iGHBP predictor using AAC-PSSM features achieved an accuracy of 95.4%, sensitivity of 91.8%, specificity of 99.1%, and MCC of 92.1%. The DPC-PSSM approach yielded an accuracy of 93.4%, sensitivity of 93.7%, specificity of 93.9%, and MCC of 87.9%. The GRU model, particularly with AAC-PSSM features, outperformed other machine learning models like RF and K-NN.
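The metrics reported above follow from the four counts of a binary confusion matrix. The sketch below uses the standard formulas; the function name and the example counts are hypothetical and are not taken from the paper. Note that MCC is defined on the range -1 to 1, so a value quoted as a percentage is presumably the coefficient multiplied by 100.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and MCC from confusion counts."""
    total = tp + fn + tn + fp
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        # MCC is conventionally set to 0 when any marginal count is zero.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Made-up counts for illustration: 50 positives and 50 negatives.
example = binary_metrics(tp=45, fn=5, tn=48, fp=2)
```

Unlike accuracy, MCC uses all four counts, which is why it is often reported alongside sensitivity and specificity for binary predictors.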


Conclusion

The iGHBP predictor demonstrated exceptional performance in accurately identifying growth hormone binding proteins, offering a valuable tool for biological research and therapeutic development. Future work will focus on expanding datasets and incorporating advanced validation techniques to further enhance predictive accuracy.


Fact Check

1. The iGHBP predictor achieved an accuracy of 95.4% using AAC-PSSM features. (Confirmed in Results section).
2. The DPC-PSSM approach with GRU achieved an accuracy of 93.4%. (Confirmed in Results section).
3. The study initially used a dataset of 123 proteins and constructed an independent dataset of 46 positive and 46 negative samples. (Confirmed in Methods section).

