Experiments on document clustering in Tamil language


Article Information

Title: Experiments on document clustering in Tamil language

Authors: Syed Sabir Mohamed, Shanmugasundaram Hariharan

Journal: ARPN Journal of Engineering and Applied Sciences

HEC Recognition History
Category  From        To
Y         2023-07-01  2024-09-30
Y         2022-07-01  2023-06-30
Y         2021-07-01  2022-06-30
X         2020-07-01  2021-06-30

Publisher: Khyber Medical College, Peshawar

Country: Pakistan

Year: 2018

Volume: 13

Issue: 10

Language: English

Abstract

With the rapid development of the Internet, the number of documents in electronic form is huge and grows day by day. To address the modern information overload problem effectively, it is extremely important to organize documents by topic. Commonly, this is achieved with clustering techniques, and document clustering is an important tool for applications such as Web search engines. This paper deals with the clustering of Tamil documents. Clustering is an unsupervised learning process that organizes documents or text files into distinct groups without prior knowledge. This paper uses the Vector Space Model, otherwise known as the “Term-Frequency Approach”, to cluster the documents. Stop words, which are frequent but meaningless terms, are removed from the input text documents to reduce the amount of text to be processed. The cosine similarity measure is then applied to find the similarity between the input text documents. Finally, clustering is performed with the K-Medoid algorithm, and the optimal number of medoids and the corresponding clusters are found.
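
The pipeline described in the abstract (stop-word removal, term-frequency vectors, cosine similarity, K-Medoid clustering) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The stop-word list, the whitespace tokenizer, the English placeholder documents, the "first k documents" initialisation, and all function names are assumptions made for illustration; the search for the optimal number of medoids is not covered.

```python
import math
from collections import Counter

# Hypothetical stop-word list; a real Tamil list would be supplied here.
STOP_WORDS = {"the", "of", "a", "is", "and"}

def tf_vector(text, stop_words=STOP_WORDS):
    """Term-frequency vector of one document after stop-word removal."""
    tokens = [t for t in text.lower().split() if t not in stop_words]
    return Counter(tokens)

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def assign(dist, medoids):
    """Attach every document index to its nearest medoid."""
    clusters = {m: [] for m in medoids}
    for i in range(len(dist)):
        nearest = min(medoids, key=lambda m: dist[i][m])
        clusters[nearest].append(i)
    return clusters

def k_medoids(vectors, k, max_iter=100):
    """Alternate assignment and medoid update until the medoids are stable.

    Distance is 1 - cosine similarity. Initial medoids are simply the
    first k documents (an assumption; degenerate inputs such as duplicate
    documents are handled only crudely).
    """
    n = len(vectors)
    dist = [[1.0 - cosine_similarity(vectors[i], vectors[j])
             for j in range(n)] for i in range(n)]
    medoids = list(range(k))
    for _ in range(max_iter):
        clusters = assign(dist, medoids)
        new_medoids = []
        for m, members in clusters.items():
            if not members:          # keep a medoid that lost all members
                new_medoids.append(m)
                continue
            # New medoid: the member with the smallest total distance
            # to the other members of its cluster.
            new_medoids.append(
                min(members, key=lambda c: sum(dist[c][j] for j in members)))
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return assign(dist, medoids)

# Placeholder documents; Tamil text would be tokenized the same way here.
docs = [
    "the cricket match score of the team",
    "team wins the cricket match",
    "rain and flood weather report",
    "the weather report is heavy rain",
]
vectors = [tf_vector(d) for d in docs]
clusters = k_medoids(vectors, k=2)
print(clusters)  # maps each medoid's document index to its cluster members
```

Using one minus the cosine similarity as the distance keeps the sketch close to the measures named in the abstract. A production system would likely need a Tamil-aware tokenizer and a better initialisation strategy.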

