W-rank: A keyphrase extraction method for webpage based on linguistics and DOM-base features

Article Information

Title: W-rank: A keyphrase extraction method for webpage based on linguistics and DOM-base features

Authors: Himat Shah, Shafique Ahmed, Anwar Ali Sathio, Asadullah Burdi

Journal: VAWKUM Transactions on Computer Sciences

HEC Recognition History

Category	From	To
Y	2024-10-01	2025-12-31
Y	2023-07-01	2024-09-30
Y	2022-07-01	2023-06-30

Publisher: VFAST-Research Platform

Country: Pakistan

Year: 2023

Volume: 11

Issue: 1

Language: English

DOI: 10.21015/vtcs.v11i1.1493

Abstract

This paper addresses the problem of automatic keyphrase extraction from webpage text. Our method, which we call W-rank, is unsupervised. We first extract the text of a webpage and tokenize it into three candidate lists: unigrams, bigrams, and noun phrases. We then assign a score to each candidate based on its appearance in the linguistic and DOM-based feature sets. In the final step, we rank the candidates by score, select the top 5 keyphrases from each list, and combine them into the final set of keyphrases for the webpage. We use linguistic features to emphasize the relevance of the keyphrases to the page content. We compare our method with other methods using precision, recall, and F-score. The experimental results show that W-rank improves on the performance of our previous method, D-rank, and outperforms other state-of-the-art methods.
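The pipeline outlined in the abstract (build candidate lists, score them with DOM-based features, take the top 5 from each list, then evaluate with precision, recall, and F-score) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not give the feature definitions or weights, so `DOM_WEIGHTS`, `STOPWORDS`, and every function name below are hypothetical, and the scoring is a plain weighted frequency count. The noun-phrase list is omitted because it would need a part-of-speech tagger.

```python
import re
from collections import Counter

# Hypothetical weights per DOM region; the abstract does not state the real ones.
DOM_WEIGHTS = {"title": 3.0, "h1": 2.0, "body": 1.0}
# Tiny illustrative stopword list, standing in for the linguistic filtering.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "is", "with"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def candidates(tokens, n):
    """Return the n-grams of `tokens` that contain no stopword."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return [" ".join(g) for g in grams if not any(w in STOPWORDS for w in g)]


def score_candidates(sections, n):
    """Sum a region weight for every occurrence of each n-gram candidate.

    `sections` maps a DOM region name (e.g. "title") to the text found there.
    """
    scores = Counter()
    for region, text in sections.items():
        weight = DOM_WEIGHTS.get(region, 1.0)
        for cand in candidates(tokenize(text), n):
            scores[cand] += weight
    return scores


def top_k(scores, k=5):
    """Rank by descending score; break ties alphabetically for determinism."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [cand for cand, _ in ranked[:k]]


def extract_keyphrases(sections, k=5):
    """Combine the top-k unigrams and top-k bigrams into one list."""
    result = []
    for n in (1, 2):
        for phrase in top_k(score_candidates(sections, n), k):
            if phrase not in result:
                result.append(phrase)
    return result


def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall, and F-score against gold keyphrases."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


page = {
    "title": "Keyphrase extraction for webpages",
    "body": "Keyphrase extraction ranks candidate words. Extraction uses DOM features.",
}
keyphrases = extract_keyphrases(page)
```

With the assumed weights, a term that appears in the title outranks one that appears only in the body, which is the general idea behind scoring candidates by where they occur in the DOM. A real system would replace the weighted count with the paper's own linguistic and DOM-based features.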
