Title: W-rank: A keyphrase extraction method for webpage based on linguistics and DOM-base features
Authors: Himat Shah, Shafique Ahmed, Anwar Ali Sathio, Asadullah Burdi
Journal: VAWKUM Transactions on Computer Sciences
Publisher: VFAST-Research Platform
Country: Pakistan
Year: 2023
Volume: 11
Issue: 1
Language: English
This paper addresses the problem of automatic keyphrase extraction from webpage text. Our method, which we call W-rank, is unsupervised. First, we extract the text of a webpage and tokenize it into three candidate lists: unigrams, bigrams, and noun phrases. Next, we assign a score to every candidate based on its individual appearance in linguistic and DOM-based feature sets. Finally, we rank the candidates in each list by score, select the top 5 keyphrases from each list, and combine them into the final keyphrase set for the webpage. We place particular emphasis on the relevance of the keyphrases to the page content, using linguistic features. We compare our method with other methods using precision, recall, and F-score. The experimental results show that W-rank improves on the performance of our previous method, D-rank, and outperforms other state-of-the-art methods.
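The three-list pipeline described in the abstract can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not enumerate W-rank's linguistic or DOM-based feature sets, so the page fields (`title`, `headings`, `anchors`, `body`), their weights, the stopword list, and every function name here are hypothetical. A candidate's score is simply the sum of the weights of the fields whose text contains it. Noun phrases are chunked with a simple adjective*-noun+ pattern over Penn Treebank tags that are assumed to come from an external POS tagger; the tags in the example are written by hand.

```python
import re

# Illustrative stopword list (an assumption; a real system would use a fuller one).
STOPWORDS = {"a", "an", "and", "are", "by", "for", "from", "in", "is", "it",
             "of", "on", "or", "that", "the", "this", "to", "we", "with"}

# Hypothetical page fields and weights standing in for W-rank's feature sets,
# which the abstract does not enumerate.
FIELD_WEIGHTS = {"title": 3.0, "headings": 2.0, "anchors": 1.0, "body": 1.0}


def tokenize(text):
    """Lowercase word tokens; hyphenated words such as 'dom-based' stay whole."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def unigrams(tokens):
    return [t for t in tokens if t not in STOPWORDS]


def bigrams(tokens):
    """Adjacent token pairs in which neither token is a stopword."""
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])
            if a not in STOPWORDS and b not in STOPWORDS]


def noun_phrases(tagged):
    """Multi-word chunks matching JJ* NN+ over (word, Penn Treebank tag) pairs."""
    phrases, chunk, has_noun = [], [], False

    def flush():
        nonlocal chunk, has_noun
        if has_noun and len(chunk) > 1:  # single nouns are already unigrams
            phrases.append(" ".join(chunk).lower())
        chunk, has_noun = [], False

    for word, tag in tagged:
        if tag.startswith("JJ"):
            if has_noun:  # an adjective after a noun starts a new chunk
                flush()
            chunk.append(word)
        elif tag.startswith("NN"):
            chunk.append(word)
            has_noun = True
        else:
            flush()
    flush()
    return phrases


def score(candidate, padded_fields):
    """Sum the weights of the fields whose token sequence contains the candidate."""
    return sum(FIELD_WEIGHTS.get(name, 0.0)
               for name, padded in padded_fields.items()
               if f" {candidate} " in padded)


def top_k(candidates, padded_fields, k=5):
    """Highest-scoring unique candidates; ties keep first-occurrence order."""
    unique = list(dict.fromkeys(candidates))
    return sorted(unique, key=lambda c: -score(c, padded_fields))[:k]


def w_rank_sketch(fields, tagged_body, k=5):
    """Score three candidate lists, take the top k of each, merge without duplicates."""
    per_field = [tokenize(text) for text in fields.values()]
    padded = {name: " " + " ".join(tokenize(text)) + " " for name, text in fields.items()}
    lists = [
        [u for toks in per_field for u in unigrams(toks)],
        [b for toks in per_field for b in bigrams(toks)],  # no bigrams across fields
        noun_phrases(tagged_body),
    ]
    final = []
    for candidates in lists:
        for phrase in top_k(candidates, padded, k):
            if phrase not in final:
                final.append(phrase)
    return final


page = {
    "title": "Keyphrase extraction for webpages",
    "headings": "Linguistic features and DOM features",
    "anchors": "related work",
    "body": ("Keyphrase extraction selects phrases that summarize a webpage. "
             "We rank candidate phrases with linguistic features and DOM features."),
}
# Hand-tagged body in word/TAG form; in practice a POS tagger would supply these.
TAGGED_BODY = ("Keyphrase/NN extraction/NN selects/VBZ phrases/NNS that/WDT "
               "summarize/VBP a/DT webpage/NN ./. We/PRP rank/VBP candidate/NN "
               "phrases/NNS with/IN linguistic/JJ features/NNS and/CC DOM/NNP "
               "features/NNS ./.")
tagged = [tuple(pair.rsplit("/", 1)) for pair in TAGGED_BODY.split()]

print(w_rank_sketch(page, tagged))
```

On this toy page the sketch returns 11 keyphrases rather than 15: phrases already selected from an earlier list are not added twice, and the noun-phrase list yields only four candidates. Raw bigrams such as `extraction selects` can still reach the top 5, which illustrates why a grammatically filtered noun-phrase list is a useful complement to plain n-grams.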
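The evaluation metrics named in the abstract have standard set-based definitions: precision is the fraction of extracted keyphrases that appear in the gold (reference) set, recall is the fraction of gold keyphrases that were extracted, and F-score is their harmonic mean, F = 2PR / (P + R). The sketch that follows assumes exact matching after lowercasing and whitespace normalization. The abstract does not say whether the authors match exactly, stem, or allow partial matches, nor how scores are averaged across webpages, so the macro-average helper and all function names are illustrative assumptions.

```python
def normalize(phrase):
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(phrase.lower().split())


def precision_recall_f1(extracted, gold):
    """Exact-match precision, recall and F1 between extracted and gold keyphrases."""
    predicted = {normalize(p) for p in extracted}
    reference = {normalize(g) for g in gold}
    hits = len(predicted & reference)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def macro_average(runs):
    """Average per-page (precision, recall, F1) over (extracted, gold) pairs."""
    triples = [precision_recall_f1(extracted, gold) for extracted, gold in runs]
    if not triples:
        return 0.0, 0.0, 0.0
    n = len(triples)
    return tuple(sum(t[i] for t in triples) / n for i in range(3))


extracted = ["Keyphrase extraction", "webpage", "DOM features", "ranking"]
gold = ["keyphrase extraction", "dom features", "linguistic features"]
p, r, f = precision_recall_f1(extracted, gold)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=0.500 R=0.667 F=0.571
```

Here two of the four extracted phrases match the three gold phrases, so P = 2/4 and R = 2/3. Macro-averaging weights every webpage equally; micro-averaging (pooling hits across all pages before dividing) is the other common choice.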