Clustering Algorithms: An Investigation of K-mean and DBSCAN on Different Datasets

Article Information

Title: Clustering Algorithms: An Investigation of K-mean and DBSCAN on Different Datasets

Authors: Arooj Zahra, Nabeel Asghar

Journal: Machines and Algorithms

Year: 2023

Volume: 2

Issue: 2

Language: en

Keywords: Unsupervised machine learning; Clustering algorithms; DB-SCAN; K-Means; Classifiers

Abstract

Machine learning is the branch of artificial intelligence that studies techniques allowing systems to learn autonomously and produce outcomes based on past experience without being explicitly programmed. Supervised and unsupervised learning are its major categories. Our research focuses on unsupervised learning, which works with unlabeled data. Clustering is an unsupervised learning method that groups unlabeled data items by similarity. Several studies have compared clustering algorithms in terms of complexity, performance, and the effect of the number of clusters on performance. To our knowledge, however, no study has evaluated clustering methods across both small and large datasets. We therefore conducted a detailed study evaluating the DBSCAN and K-Means algorithms on small and large datasets. We collected 17 heterogeneous, publicly available, open-access machine learning datasets from online sources such as the UCI repository, KEEL, and Kaggle. The datasets are divided into small and large categories according to the number of instances in each. Several preprocessing techniques are applied to improve dataset quality. The class field is then removed from each preprocessed dataset, and the data is passed to the two clustering algorithms. The clustered data is analyzed with three classifiers, K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Naïve Bayes (NB), and the accuracy of each classifier is used to evaluate the clustering algorithm's performance. The final analysis found that K-Means performs better on large datasets, whereas DBSCAN is more efficient on small datasets.
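The evaluation pipeline described in the abstract (drop the class field, cluster, then measure how accurately classifiers predict the cluster assignments) can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the use of scikit-learn, the Iris dataset as a stand-in for one small dataset, standard scaling as the preprocessing step, 5-fold cross-validation, the exclusion of DBSCAN noise points, and every hyperparameter (`n_clusters`, `eps`, `min_samples`, `n_neighbors`) are assumptions made for the example. The helper name `evaluate_clustering` is hypothetical.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def evaluate_clustering(X, cluster_labels, cv=5):
    """Return mean cross-validated accuracy of KNN, SVM and NB at
    predicting the cluster labels (hypothetical helper, assumed protocol)."""
    # DBSCAN marks noise as -1; assumption: noise points are excluded.
    mask = cluster_labels != -1
    X_kept, y_kept = X[mask], cluster_labels[mask]
    if len(set(y_kept)) < 2:
        return None  # a classifier needs at least two clusters to separate
    classifiers = {
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "SVM": SVC(kernel="rbf"),
        "NB": GaussianNB(),
    }
    return {
        name: float(
            cross_val_score(clf, X_kept, y_kept, cv=cv, scoring="accuracy").mean()
        )
        for name, clf in classifiers.items()
    }


# Assumption: Iris stands in for a "small" dataset. Only the feature matrix
# is loaded; the class field (iris.target) is never used.
X = StandardScaler().fit_transform(load_iris().data)

# Cluster the unlabeled data with both algorithms (illustrative settings).
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)

# Score each clustering by how well the three classifiers recover it.
kmeans_scores = evaluate_clustering(X, kmeans_labels)
dbscan_scores = evaluate_clustering(X, dbscan_labels)

print("K-Means:", kmeans_scores)
print("DBSCAN: ", dbscan_scores)
```

Under this reading, higher classifier accuracy suggests that the clusters are well separated in feature space. Note that DBSCAN's result depends strongly on `eps` and `min_samples`, so a fair comparison across the 17 datasets would need those values tuned per dataset.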
