Map reduce based bag of phrases representation and distributional features incorporation for text classification


Article Information

Title: Map reduce based bag of phrases representation and distributional features incorporation for text classification

Authors: M. Janaki Meena

Journal: ARPN Journal of Engineering and Applied Sciences

HEC Recognition History
Category  From        To
Y         2023-07-01  2024-09-30
Y         2022-07-01  2023-06-30
Y         2021-07-01  2022-06-30
X         2020-07-01  2021-06-30

Publisher: Khyber Medical College, Peshawar

Country: Pakistan

Year: 2018

Volume: 13

Issue: 11

Language: English

Abstract

Text classification is the basic step in developing intelligent information systems for tasks such as language identification, biography generation, authorship verification, content filtering, search personalization, product classification, sentiment analysis, detection of malicious activities, patent classification, and opinion mining. Since the early 1990s, various machine learning approaches have been applied to text classification. Document representation is the process of converting raw documents into a set of features that can be fed into machine learning algorithms. Features for applying machine learning algorithms to a text corpus can be words, n-grams (phrases), or synsets. How a feature is distributed within a document also helps determine its importance. In this research, a MapReduce-based bag-of-phrases representation is used for classifying text with a Naïve Bayes classifier. The proposed feature selection algorithm is converted to the MapReduce programming model and the results are discussed. Precision and recall are the metrics used in this research to compare the results. It has been observed that the bag-of-phrases representation gives better accuracy for technical documents and that including distributional features improves the accuracy of the classifier.
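
The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea, not the author's method. It assumes phrases are word bigrams, simulates the map and reduce steps in a single Python process rather than on a cluster, and uses a multinomial Naïve Bayes classifier with add-one smoothing. All function names (`phrases`, `map_doc`, `reduce_counts`, `train`, `classify`) are hypothetical, and distributional features are not modeled here.

```python
import math
from collections import defaultdict


def phrases(tokens, n=2):
    """Return the word n-grams (assumed here to be the 'phrases') of a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def map_doc(label, text, n=2):
    """Map step: emit ((label, phrase), 1) for every phrase in one document."""
    for phrase in phrases(text.lower().split(), n):
        yield (label, phrase), 1


def reduce_counts(pairs):
    """Shuffle and reduce step: sum the emitted values for each key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def train(docs, n=2):
    """Build per-class phrase counts and class document counts from (label, text) pairs."""
    phrase_counts = reduce_counts(
        kv for label, text in docs for kv in map_doc(label, text, n)
    )
    doc_counts = reduce_counts((label, 1) for label, _ in docs)
    vocab = {phrase for (_, phrase) in phrase_counts}
    return phrase_counts, doc_counts, vocab


def classify(text, model, n=2):
    """Pick the class with the highest smoothed Naive Bayes log-probability."""
    phrase_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in doc_counts.items():
        class_total = sum(c for (l, _), c in phrase_counts.items() if l == label)
        score = math.log(n_docs / total_docs)  # class prior
        for phrase in phrases(text.lower().split(), n):
            if phrase in vocab:  # phrases unseen in training are ignored
                count = phrase_counts.get((label, phrase), 0)
                score += math.log((count + 1) / (class_total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy corpus, invented purely for illustration.
docs = [
    ("tech", "map reduce programming model"),
    ("tech", "map reduce job scheduling"),
    ("sport", "football match result"),
    ("sport", "football match report"),
]
model = train(docs)
print(classify("the map reduce model", model))
```

In a real MapReduce deployment, `map_doc` would run in parallel over document splits and `reduce_counts` would run per key after the shuffle; the in-memory version above only mirrors that data flow.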

