
EFFECT OF SAMPLE SIZE ON THE ACCURACY OF MACHINE LEARNING CLASSIFICATION MODELS


Article Information

Title: EFFECT OF SAMPLE SIZE ON THE ACCURACY OF MACHINE LEARNING CLASSIFICATION MODELS

Authors: Roidar Khan, Shehzad Khan, Naseemullah, Aasim Ullah, Atif Khan

Journal: Spectrum of Engineering Sciences

HEC Recognition History: Category Y, from 2024-10-01 to 2025-12-31

Publisher: Sociology Educational Nexus Research Institute

Country: Pakistan

Year: 2025

Volume: 3

Issue: 7

Language: en

Keywords: Machine learning, Accuracy, Classification Algorithms, Predictive Performance, Sample size

Abstract

The reliability and effectiveness of machine learning classification models are heavily influenced by the size of the training dataset. This study examines the impact of varying sample sizes on the predictive performance of five widely used classification algorithms: Logistic Regression, Decision Tree, Random Forest, Support Vector Machine (SVM), and Naïve Bayes. Using simulated datasets ranging from 50 to 5000 samples, each model was evaluated based on four key performance metrics: Accuracy, Precision, Recall, and F1-score. The analysis reveals that while all models benefit from increased data, their sensitivity to sample size varies significantly. Logistic Regression and SVM present consistent and robust performance across all sample sizes, whereas Naïve Bayes performs surprisingly well even with limited data. In contrast, Decision Trees display instability in smaller datasets but show notable improvement at larger scales. Random Forests, though slower to improve, achieve competitive results as sample size increases. These findings provide valuable insights for practitioners selecting algorithms under varying data availability conditions and emphasize the importance of aligning model complexity with dataset size to achieve optimal classification performance.
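The kind of experiment the abstract describes can be sketched in a few lines. The following is a minimal illustration, not the authors' code: the abstract does not specify how the data were simulated, so the two-class Gaussian generator, the class separation (`shift`), the specific sample sizes between 50 and 5000, and the fixed test set are all assumptions. Only one of the five algorithms is shown, a hand-written Gaussian Naive Bayes, to keep the example dependency-free. The four metrics use their standard definitions.

```python
import math
import random


def make_dataset(n, rng, shift=2.0, n_features=2):
    """Simulate n labeled points (assumed generator, not from the paper).

    Class 0 features ~ N(0, 1); class 1 features ~ N(shift, 1).
    """
    data = []
    for i in range(n):
        label = i % 2  # alternate labels so the classes are balanced
        row = [rng.gauss(shift * label, 1.0) for _ in range(n_features)]
        data.append((row, label))
    rng.shuffle(data)
    return data


def fit_gaussian_nb(train):
    """Estimate the class prior and per-feature mean/variance for each class."""
    model = {}
    for label in (0, 1):
        rows = [x for x, y in train if y == label]
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        variances = [
            sum((v - m) ** 2 for v in c) / len(c) + 1e-9  # floor avoids zero variance
            for c, m in zip(cols, means)
        ]
        model[label] = (len(rows) / len(train), means, variances)
    return model


def predict(model, x):
    """Return the class with the highest log-posterior under the fitted model."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


def classification_metrics(y_true, y_pred, positive=1):
    """Compute Accuracy, Precision, Recall, and F1-score for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def sweep(sample_sizes=(50, 200, 1000, 5000), test_size=2000, seed=0):
    """Train on each sample size and score against one fixed held-out test set."""
    rng = random.Random(seed)
    test = make_dataset(test_size, rng)
    y_true = [y for _, y in test]
    results = {}
    for n in sample_sizes:
        model = fit_gaussian_nb(make_dataset(n, rng))
        y_pred = [predict(model, x) for x, _ in test]
        results[n] = classification_metrics(y_true, y_pred)
    return results


if __name__ == "__main__":
    for n, metrics in sweep().items():
        print(n, {name: round(value, 3) for name, value in metrics.items()})
```

Scoring every training size against the same held-out test set keeps the comparison fair: differences between rows then reflect the amount of training data rather than a different evaluation sample. Repeating the sweep over several seeds and averaging would give a more stable picture, particularly at the smallest sizes.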

