Title: Experiments on document clustering in Tamil language
Authors: Syed Sabir Mohamed, Shanmugasundaram Hariharan
Journal: ARPN Journal of Engineering and Applied Sciences
Publisher: Khyber Medical College, Peshawar
Country: Pakistan
Year: 2018
Volume: 13
Issue: 10
Language: English
With the rapid development of the Internet, the number of documents in electronic form is huge and grows day by day. To address the resulting information overload effectively, it is important to organize documents by topic, which is commonly achieved with clustering techniques. Document clustering is an important tool for applications such as Web search engines. This paper deals with the clustering of Tamil documents. Clustering is an unsupervised learning process that organizes documents or text files into distinct groups without prior knowledge of their categories. The paper uses the Vector Space Model, which the authors also refer to as the “Term-Frequency Approach”, to cluster the documents. Stop words, which are frequent terms that carry little meaning, are removed from the input text documents to reduce the amount of text to be processed. The cosine similarity measure is then applied to find the similarity between the input documents. Finally, clustering is performed using the K-Medoid algorithm, and the optimal number of medoids and the corresponding clusters are found.
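The pipeline outlined in the abstract (stop-word removal, term-frequency vectors, cosine similarity, K-Medoid clustering) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The whitespace tokenizer, the English toy stop-word list, and the function names `tf_vector`, `cosine_similarity`, and `k_medoids` are assumptions made for illustration. The sketch also swaps in an exhaustive search over all medoid sets instead of an iterative K-Medoid procedure, which is practical only for very small collections, and it takes the number of clusters `k` as an input instead of searching for an optimal value.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical stop-word list; a real Tamil list would replace it.
STOP_WORDS = {"the", "a", "is", "of", "and", "in"}

def tf_vector(text, stop_words=STOP_WORDS):
    """Build a sparse term-frequency vector after dropping stop words."""
    tokens = [t for t in text.lower().split() if t not in stop_words]
    return Counter(tokens)

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def k_medoids(vectors, k):
    """Exhaustive k-medoids: try every set of k documents as medoids and
    keep the set with the lowest total cosine distance (assumed variant)."""
    n = len(vectors)
    dist = [[1.0 - cosine_similarity(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]
    best_cost, best_medoids = math.inf, None
    for medoids in combinations(range(n), k):
        cost = sum(min(dist[i][m] for m in medoids) for i in range(n))
        if cost < best_cost:
            best_cost, best_medoids = cost, medoids
    # Assign each document to its nearest medoid (label = medoid index).
    labels = [min(best_medoids, key=lambda m: dist[i][m]) for i in range(n)]
    return best_medoids, labels

# Toy English documents stand in for Tamil text here.
docs = [
    "the cricket match score cricket",
    "cricket team score",
    "rain and flood weather",
    "weather rain forecast",
]
vectors = [tf_vector(d) for d in docs]
medoids, labels = k_medoids(vectors, k=2)
```

On these toy documents, the two cricket texts share one medoid and the two weather texts share the other, because documents from different topics have no terms in common and therefore a cosine similarity of zero.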