DETECTING PHISHING ATTACKS IN CYBERSECURITY USING MACHINE LEARNING WITH DATA PREPROCESSING AND FEATURE ENGINEERING


Article Information

Title: DETECTING PHISHING ATTACKS IN CYBERSECURITY USING MACHINE LEARNING WITH DATA PREPROCESSING AND FEATURE ENGINEERING

Authors: Sohaib Latif, Saher Pervaiz

Journal: Kashf Journal of Multidisciplinary Research (KJMR)

HEC Recognition History: Category Y, from 2024-10-01 to 2025-12-31

Publisher: Kashf Institute of Development & Studies

Country: Pakistan

Year: 2025

Volume: 2

Issue: 3

Language: en

DOI: 10.71146/kjmr335

Keywords: Ensemble learning, Fraud detection, Phishing Detection, Email Security, Spam Filtering

Abstract

Phishing attacks are one of the most persistent cybersecurity threats, evolving rapidly to bypass traditional security measures. Given the widespread use of email for sensitive communications, detecting phishing attempts has become more critical than ever. This study explores the effectiveness of multiple machine learning models in classifying phishing emails using a dataset of 39,000 samples. To enhance accuracy, we employ preprocessing techniques such as feature engineering, vectorization, and class balancing with SMOTE (Synthetic Minority Over-sampling Technique). Our analysis compares various models, including Random Forest, XGBoost, Logistic Regression, Naïve Bayes, and AdaBoost, evaluating their performance using precision, recall, F1-score, and accuracy metrics. The results demonstrate that ensemble learning techniques, particularly XGBoost and Random Forest, significantly outperform other models, achieving accuracy rates as high as 99.00%. These findings reinforce the importance of advanced classification techniques and data preprocessing in phishing detection. Beyond academic implications, our research contributes to strengthening email security, mitigating financial losses, and protecting personal data from cyber threats. Future work could focus on integrating deep learning models and real-time detection systems to further improve accuracy and adaptability.
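The abstract names two steps that are easy to illustrate in isolation: class balancing with SMOTE and evaluation with precision, recall, F1-score, and accuracy. The following is a minimal pure-Python sketch of both, not the authors' implementation. The function names (smote_oversample, classification_metrics), the parameter k, and the toy data are all hypothetical and chosen only for illustration; a real pipeline would more likely rely on an established library for both steps.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Sketch of SMOTE-style oversampling (hypothetical helper).

    Each synthetic point is placed at a random position on the line
    segment between a randomly chosen minority sample and one of its
    k nearest minority-class neighbours. Needs at least two samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def classification_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F1 and accuracy for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Toy data (assumed, not from the study): 2-D feature vectors for the
# minority class, and labels where 1 = phishing, 0 = legitimate.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_oversample(minority, n_new=6)

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

In this toy example the predictions contain three true positives, three true negatives, one false positive, and one false negative, so all four metrics come out to 0.75. On an imbalanced dataset the metrics would usually diverge, which is one reason to report precision, recall, and F1-score alongside accuracy. Oversampling of this kind is normally applied to the training split only, so that synthetic samples do not leak into the evaluation data.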
