Optical Character Recognition for Nastaleeq Printed Urdu Text using Histogram of Oriented Gradient Features


Article Information

Authors: Awais Ahmad, Fatima Yousaf, Tanzeela Kousar

Journal: Machines and Algorithms

Year: 2024

Volume: 3

Issue: 1

Language: en

Keywords: Support Vector Machine, Urdu Language, optical character recognition, HOG features, Connected Components

Abstract

Research on optical character recognition (OCR) has focused on digitizing text in images. Urdu OCR is particularly challenging because a character can take several different shapes depending on its position in the word, which makes Urdu harder to recognize than English and similar languages. The proposed research recognizes offline printed Urdu text with a segmentation-free (holistic) approach, in which whole ligatures are classified rather than individual characters. Horizontal histogram projection extracts the text lines from an image, and connected-component labelling then segments each extracted text line into ligatures. For each sub-word/ligature, a set of 14 statistical features is extracted together with histogram of oriented gradients (HOG) features to train the model. The open-source UPTI dataset is used to train and test the proposed algorithm, and a support vector machine (SVM) with a radial basis function (RBF) kernel classifies the ligatures. The proposed algorithm achieves a character recognition rate of 97.3% on this dataset.
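The line- and ligature-segmentation steps described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the page is assumed to be already binarized into a list of pixel rows (1 = ink, 0 = background), a text line is taken to be any run of consecutive rows with at least `min_ink` ink pixels, and ligatures are found with 8-connected labelling. The function names and the `min_ink` threshold are hypothetical.

```python
from collections import deque

# A binarized page: one list per pixel row, 1 = ink, 0 = background.
Image = list[list[int]]


def horizontal_projection(img: Image) -> list[int]:
    """Horizontal histogram projection: the number of ink pixels in each row."""
    return [sum(row) for row in img]


def extract_text_lines(img: Image, min_ink: int = 1) -> list[tuple[int, int]]:
    """Return (start_row, end_row) pairs, end exclusive, for each run of rows
    with at least `min_ink` ink pixels; rows below the threshold separate lines."""
    profile = horizontal_projection(img)
    lines: list[tuple[int, int]] = []
    start = None
    for row, count in enumerate(profile):
        if count >= min_ink and start is None:
            start = row  # entering a text line
        elif count < min_ink and start is not None:
            lines.append((start, row))  # leaving a text line
            start = None
    if start is not None:  # a line that touches the bottom edge
        lines.append((start, len(profile)))
    return lines


def label_components(img: Image) -> tuple[list[list[int]], int]:
    """8-connected component labelling by breadth-first flood fill.
    Returns a label map (0 = background, 1..n = component ids) and n."""
    height = len(img)
    width = len(img[0]) if img else 0
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if img[y][x] and not labels[y][x]:
                count += 1  # unvisited ink pixel: start a new component
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if (0 <= ny < height and 0 <= nx < width
                                    and img[ny][nx] and not labels[ny][nx]):
                                labels[ny][nx] = count
                                queue.append((ny, nx))
    return labels, count


# Usage on a tiny 4 x 6 page with two text lines.
page = [
    [0, 1, 1, 0, 0, 1],
    [0, 1, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
]
for start, end in extract_text_lines(page):
    _, n = label_components(page[start:end])
    print(start, end, n)  # prints "0 2 2", then "3 4 2"
```

In Urdu, dots and diacritics are usually disconnected from the main body of a ligature, so this sketch labels them as separate components. A practical system would still need to attach such secondary components to their main ligature body.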

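The feature-extraction and classification steps can also be sketched, though only partially. The abstract neither gives the HOG parameters nor defines the 14 statistical features, so this sketch covers only a simplified HOG descriptor and the RBF kernel that an RBF-kernel SVM evaluates between feature vectors. The following are all assumptions, not values from the paper: the 4-pixel cell size, the 9 unsigned orientation bins, the single global L2 normalization (standard HOG normalizes overlapping blocks instead), `gamma=0.5`, and the function names. In practice, library routines such as scikit-image's `skimage.feature.hog` and scikit-learn's `sklearn.svm.SVC(kernel="rbf")` would typically be used.

```python
import math

# A grayscale ligature image: one list of floats per pixel row. Ligature
# images are assumed to be resized to a common size beforehand, so every
# descriptor has the same length.
Gray = list[list[float]]


def hog_features(img: Gray, cell: int = 4, bins: int = 9) -> list[float]:
    """Simplified HOG: magnitude-weighted, hard-binned histograms of unsigned
    gradient orientations over non-overlapping cells, concatenated and
    L2-normalized once (standard HOG normalizes overlapping blocks instead)."""
    height, width = len(img), len(img[0])
    rows, cols = height // cell, width // cell
    hist = [[[0.0] * bins for _ in range(cols)] for _ in range(rows)]
    bin_width = 180.0 / bins
    for y in range(rows * cell):
        for x in range(cols * cell):
            # Central-difference gradients, clamped at the image border.
            gx = img[y][min(x + 1, width - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, height - 1)][x] - img[max(y - 1, 0)][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue
            # Unsigned orientation in [0, 180) degrees.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # The extra % guards against angle == 180.0 caused by float rounding.
            b = int(angle // bin_width) % bins
            hist[y // cell][x // cell][b] += magnitude
    features = [v for cell_row in hist for h in cell_row for v in h]
    norm = math.sqrt(sum(v * v for v in features)) or 1.0
    return [v / norm for v in features]


def rbf_kernel(a: list[float], b: list[float], gamma: float = 0.5) -> float:
    """RBF kernel K(a, b) = exp(-gamma * ||a - b||^2), the similarity an
    RBF-kernel SVM computes between two feature vectors."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


# Usage: 8 x 8 images containing a vertical and a horizontal edge.
vertical = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
horizontal = [[0.0] * 8 for _ in range(4)] + [[1.0] * 8 for _ in range(4)]
fv, fh = hog_features(vertical), hog_features(horizontal)
print(len(fv), rbf_kernel(fv, fv), rbf_kernel(fv, fh))  # 36, 1.0 and about 0.368
```

The vertical edge puts all of its weight in the 0° bin of each cell, and the horizontal edge puts all of its weight in the bin containing 90°. Their descriptors are therefore orthogonal unit vectors, and the kernel value drops from 1.0 for identical inputs to exp(-0.5 · 2) = exp(-1).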
