Title: EFFECT OF SAMPLE SIZE ON THE ACCURACY OF MACHINE LEARNING CLASSIFICATION MODELS
Authors: Roidar Khan, Shehzad Khan, Naseemullah, Aasim Ullah, Atif Khan
Journal: Spectrum of Engineering Sciences
| Category | From | To |
|---|---|---|
| Y | 2024-10-01 | 2025-12-31 |
Publisher: Sociology Educational Nexus Research Institute
Country: Pakistan
Year: 2025
Volume: 3
Issue: 7
Language: en
Keywords: Machine learning; Accuracy; Classification Algorithms; Predictive Performance; Sample size
The reliability and effectiveness of machine learning classification models are heavily influenced by the size of the training dataset. This study examines the impact of varying sample sizes on the predictive performance of five widely used classification algorithms: Logistic Regression, Decision Tree, Random Forest, Support Vector Machine (SVM), and Naïve Bayes. Using simulated datasets ranging from 50 to 5000 samples, each model was evaluated based on four key performance metrics: Accuracy, Precision, Recall, and F1-score. The analysis reveals that while all models benefit from increased data, their sensitivity to sample size varies significantly. Logistic Regression and SVM present consistent and robust performance across all sample sizes, whereas Naïve Bayes performs surprisingly well even with limited data. In contrast, Decision Trees display instability in smaller datasets but show notable improvement at larger scales. Random Forests, though slower to improve, achieve competitive results as sample size increases. These findings provide valuable insights for practitioners selecting algorithms under varying data availability conditions and emphasize the importance of aligning model complexity with dataset size to achieve optimal classification performance.
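To make the evaluation setup concrete, the following is a minimal sketch of a sample-size sweep scored with the four metrics named in the abstract. It is not the authors' code. The data generator (one Gaussian feature per class), the intermediate sample size of 500, the test-set size, and the nearest-centroid classifier are all assumptions chosen to keep the example dependency-free; the nearest-centroid rule is a stand-in and is not one of the five algorithms studied. Only the metric definitions (accuracy, precision, recall, F1 from confusion counts) are standard.

```python
import random
from statistics import mean


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def simulate(n, rng):
    """Hypothetical generator: balanced classes, one Gaussian feature each."""
    ys = [i % 2 for i in range(n)]
    xs = [rng.gauss(1.5 * y, 1.0) for y in ys]  # class means 0.0 and 1.5 (assumed)
    return xs, ys


def fit_centroids(xs, ys):
    """Stand-in model: store the mean feature value of each class."""
    m0 = mean(x for x, y in zip(xs, ys) if y == 0)
    m1 = mean(x for x, y in zip(xs, ys) if y == 1)
    return m0, m1


def predict(centroids, xs):
    """Assign each point to the class with the nearer centroid."""
    m0, m1 = centroids
    return [1 if abs(x - m1) < abs(x - m0) else 0 for x in xs]


def sweep(sizes=(50, 500, 5000), n_test=2000, seed=0):
    """Train at each sample size and score on one fixed held-out test set."""
    rng = random.Random(seed)
    x_test, y_test = simulate(n_test, rng)
    results = {}
    for n in sizes:
        x_train, y_train = simulate(n, rng)
        model = fit_centroids(x_train, y_train)
        results[n] = classification_metrics(y_test, predict(model, x_test))
    return results


if __name__ == "__main__":
    for n, scores in sweep().items():
        print(n, {k: round(v, 3) for k, v in scores.items()})
```

Scoring every training size against the same held-out set keeps the comparison across sizes fair. To reproduce a study of this kind, the stand-in classifier would be replaced by the actual algorithms, and each size would typically be repeated over several random seeds to measure instability at small sample sizes.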