
W-rank: A keyphrase extraction method for webpage based on linguistics and DOM-base features


Article Information

Title: W-rank: A keyphrase extraction method for webpage based on linguistics and DOM-base features

Authors: Himat Shah, Shafique Ahmed, Anwar Ali Sathio, Asadullah Burdi

Journal: VAWKUM Transactions on Computer Sciences

HEC Recognition History
Category From To
Y 2024-10-01 2025-12-31
Y 2023-07-01 2024-09-30
Y 2022-07-01 2023-06-30

Publisher: VFAST-Research Platform

Country: Pakistan

Year: 2023

Volume: 11

Issue: 1

Language: English

DOI: 10.21015/vtcs.v11i1.1493


Abstract

This paper addresses the problem of automatic keyphrase extraction from webpage text. Our method, which we call W-rank, is unsupervised. First, we extract the text of a webpage and tokenize it into three candidate lists: unigrams, bigrams, and noun phrases. We then assign a score to each candidate based on its appearance in linguistic and DOM-based feature sets. In the final step, we rank the candidates by score, select the top 5 keyphrases from each list, and combine them into the final set of keyphrases for the webpage. We place particular emphasis on the relevance of the keyphrases to the page content, using linguistic features. We compare our method with other methods using precision, recall, and F-score. The experimental results show that W-rank improves on our previous method, D-rank, and outperforms other state-of-the-art methods.
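
The pipeline outlined in the abstract (extract text, build candidate lists, score, keep the top 5 per list, combine) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not name the actual linguistic or DOM features, so the tag weights in DOM_WEIGHTS, the stopword list, and all function names here are hypothetical. The noun-phrase list is omitted because it would need a part-of-speech tagger, and the evaluation helper assumes exact-match comparison against a gold keyphrase list.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights: the abstract does not list the real DOM features.
DOM_WEIGHTS = {"title": 3.0, "h1": 2.0, "h2": 1.5}
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "is", "with"}
SKIPPED_TAGS = {"script", "style"}


class TextCollector(HTMLParser):
    """Collect (enclosing tag, text) pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags
        self.chunks = []  # (tag, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching open tag.
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        tag = self.stack[-1] if self.stack else ""
        if tag not in SKIPPED_TAGS and data.strip():
            self.chunks.append((tag, data))


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def candidates(tokens):
    """Build unigram and bigram candidate lists, skipping stopwords."""
    unigrams = [t for t in tokens if t not in STOPWORDS]
    bigrams = [
        f"{a} {b}"
        for a, b in zip(tokens, tokens[1:])
        if a not in STOPWORDS and b not in STOPWORDS
    ]
    return unigrams, bigrams


def rank_keyphrases(html, top_k=5):
    """Score candidates by tag weight and return the top_k of each list."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    uni_scores, bi_scores = Counter(), Counter()
    for tag, text in parser.chunks:
        weight = DOM_WEIGHTS.get(tag, 1.0)  # plain body text counts once
        unigrams, bigrams = candidates(tokenize(text))
        for u in unigrams:
            uni_scores[u] += weight
        for b in bigrams:
            bi_scores[b] += weight
    top_uni = [p for p, _ in uni_scores.most_common(top_k)]
    top_bi = [p for p, _ in bi_scores.most_common(top_k)]
    return top_uni + top_bi


def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall, and F1 against a gold keyphrase list."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1
```

Calling rank_keyphrases(page_html) returns up to ten phrases: the five highest-scoring unigrams followed by the five highest-scoring bigrams. Passing that list and a hand-labelled keyphrase list to precision_recall_f1 gives the three measures the abstract mentions.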

