Focused crawling of online business Web pages using latent semantic indexing approach


Article Information

Title: Focused crawling of online business Web pages using latent semantic indexing approach

Authors: Thamer Salah, Sabrina Tiun

Journal: ARPN Journal of Engineering and Applied Sciences

HEC Recognition History
Category  From        To
Y         2023-07-01  2024-09-30
Y         2022-07-01  2023-06-30
Y         2021-07-01  2022-06-30
X         2020-07-01  2021-06-30

Publisher: Khyber Medical College, Peshawar

Country: Pakistan

Year: 2016

Volume: 11

Issue: 15

Language: English

Abstract

With the exponential growth of textual information available on the Internet, there is an emerging need to find relevant, timely and in-depth knowledge about business topics. The sheer size of this data makes it very difficult to retrieve, analyze and use the valuable information in such texts manually. In this paper, we address a challenging task: crawling business-specific knowledge on the Web. The main goal of the paper is to describe a new method of focused crawling with latent semantic indexing for online business web pages. We describe a new model for online business text crawling which seeks, acquires, maintains and filters business pages. The model consists of two main modules: a crawling system and a text filtering system. The crawler collects as many web pages as possible from news websites. This focused crawler is guided by a latent semantic index and information from WordNet (the business filter), which learns to recognize the relevance of a web page with respect to the business topic; it also uses a set of domain-specific keywords. Results obtained on real-world online data show that the focused crawler is very effective for building high-quality collections of business Web documents.
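
To make the idea of a latent-semantic relevance filter more concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the six-snippet training corpus, the topic description, the choice of two latent dimensions, and all function names are assumptions made for illustration, and the WordNet and keyword components mentioned in the abstract are left out. The sketch builds a term-document matrix, estimates its top singular vectors by power iteration, and scores a candidate page by the cosine between its latent-space projection and that of a business topic description.

```python
import math
import random
import re

# Hypothetical training corpus: three business and three sports snippets.
CORPUS = [
    "stock market shares investors profit",
    "company profit revenue market growth",
    "bank investors revenue shares company",
    "football match goal team player",
    "team player coach match season",
    "goal season coach football league",
]


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


VOCAB = sorted({t for doc in CORPUS for t in tokenize(doc)})
INDEX = {term: i for i, term in enumerate(VOCAB)}


def term_vector(text):
    """Raw term-count vector over VOCAB; unknown words are ignored."""
    vec = [0.0] * len(VOCAB)
    for tok in tokenize(text):
        if tok in INDEX:
            vec[INDEX[tok]] += 1.0
    return vec


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def latent_basis(doc_vectors, k, iterations=300, seed=0):
    """Approximate the top-k left singular vectors of the term-document matrix.

    Runs power iteration on A A^T and orthogonalizes each iterate against
    the vectors already found, so no linear-algebra library is required.
    """
    rng = random.Random(seed)
    basis = []
    for _ in range(k):
        v = [rng.random() for _ in VOCAB]
        for _ in range(iterations):
            # w = A (A^T v), accumulated one document column at a time.
            w = [0.0] * len(VOCAB)
            for d in doc_vectors:
                weight = dot(d, v)
                for i, x in enumerate(d):
                    w[i] += weight * x
            # Gram-Schmidt step: remove components along earlier vectors.
            for u in basis:
                proj = dot(w, u)
                w = [wi - proj * ui for wi, ui in zip(w, u)]
            norm = math.sqrt(dot(w, w))
            v = [wi / norm for wi in w]
        basis.append(v)
    return basis


# Two latent dimensions is an assumption that suits this toy corpus.
BASIS = latent_basis([term_vector(d) for d in CORPUS], k=2)


def project(text):
    """Coordinates of a text in the latent space spanned by BASIS."""
    vec = term_vector(text)
    return [dot(vec, u) for u in BASIS]


def relevance(page_text, topic_text):
    """Cosine similarity of two texts in the latent space (0.0 if undefined)."""
    p, q = project(page_text), project(topic_text)
    denom = math.sqrt(dot(p, p)) * math.sqrt(dot(q, q))
    return dot(p, q) / denom if denom else 0.0


TOPIC = "market profit company investors"
business_page = "the bank reported revenue growth and higher stock prices"
sports_page = "the coach praised every player after the league match"

print("business page:", round(relevance(business_page, TOPIC), 3))
print("sports page:  ", round(relevance(sports_page, TOPIC), 3))
```

In this toy setting the business page shares no words with the topic description, yet it should still score close to 1 because both project onto the same latent direction, while the sports page should score close to 0. A focused crawler could use such a score to decide which fetched pages to keep and which outgoing links to follow first; the threshold and link-ordering policy would have to be chosen separately.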

