Utilizing lexical relationship in term-based similarity measure to improve Indonesian short text classification


Article Information

Title: Utilizing lexical relationship in term-based similarity measure to improve Indonesian short text classification

Authors: Husni Thamrin, Atiqa Sabardila

Journal: ARPN Journal of Engineering and Applied Sciences

HEC Recognition History
Category From To
Y 2023-07-01 2024-09-30
Y 2022-07-01 2023-06-30
Y 2021-07-01 2022-06-30
X 2020-07-01 2021-06-30

Publisher: Khyber Medical College, Peshawar

Country: Pakistan

Year: 2016

Volume: 11

Issue: 22

Language: English

Abstract

This paper compares the performance of three text similarity algorithms: one that uses a pure cosine function and two that use the Dice function and take word relatedness into account. In one of the latter two, the relatedness of two words is determined from their lexical relationship; in the other, it is determined from the co-occurrence of the two words in a corpus. The text similarity score is used to classify Indonesian short texts with the k-nearest neighbour algorithm. The study employed more than 150 short texts, of which 112 were used for learning and 43 for testing. The short texts were sentences or phrases from a SWOT (strength, weakness, opportunity and threat) analysis of an organization. Manual classification of the SWOT issues was conducted by the organization, and the result was treated as the classification target. Our research shows that including word relatedness in the semantic vectors increases the sentence similarity score and enhances the performance of text classification. Without word relatedness, the F-Measure of the k-nearest neighbour classification algorithm is 0.39. Including word relatedness based on lexical relationship improves the F-Measure to as high as 0.595, while word relatedness based on co-occurrence increases the F-Measure to 0.4.
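The abstract does not give the exact formulas, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes a plain term-frequency cosine, a "soft" Dice coefficient in which each term is credited with its best relatedness score against the other text, and a simple majority-vote k-nearest neighbour classifier. The function names, the toy Indonesian tokens, and the relatedness value 0.8 are all hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Plain cosine similarity over term-frequency vectors (no word relatedness)."""
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def soft_dice_similarity(tokens_a, tokens_b, relatedness):
    """Dice coefficient in which related (not only identical) words count as partial matches.

    `relatedness` maps frozenset({word1, word2}) to a score in [0, 1].
    This is an assumed formulation; with an empty mapping it reduces to
    the ordinary Dice coefficient 2|A and B| / (|A| + |B|).
    """
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a or not set_b:
        return 0.0

    def best_match(term, others):
        # Identical words score 1.0; otherwise look up the relatedness score.
        return max(
            1.0 if term == other else relatedness.get(frozenset((term, other)), 0.0)
            for other in others
        )

    match_a = sum(best_match(t, set_b) for t in set_a)
    match_b = sum(best_match(t, set_a) for t in set_b)
    return (match_a + match_b) / (len(set_a) + len(set_b))


def knn_classify(query, labelled_texts, similarity, k=3):
    """Return the majority label among the k labelled texts most similar to the query."""
    ranked = sorted(labelled_texts, key=lambda item: similarity(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two phrases that share one word and two near-synonym pairs.
text_a = ["jumlah", "dosen", "memadai"]
text_b = ["jumlah", "pengajar", "cukup"]
toy_relatedness = {
    frozenset(("dosen", "pengajar")): 0.8,  # assumed score, for illustration only
    frozenset(("memadai", "cukup")): 0.8,   # assumed score, for illustration only
}

print(cosine_similarity(text_a, text_b))
print(soft_dice_similarity(text_a, text_b, {}))
print(soft_dice_similarity(text_a, text_b, toy_relatedness))
```

In this toy case the two phrases share only one literal word, so the plain cosine and plain Dice scores are low, while the relatedness-aware score is noticeably higher. This matches the abstract's observation that word relatedness raises sentence similarity scores. Whether that helps classification depends on the quality of the relatedness source.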

