Title: Optical Character Recognition for Nastaleeq Printed Urdu Text using Histogram of Oriented Gradient Features
Authors: Awais Ahmad, Fatima Yousaf, Tanzeela Kousar
Journal: Machines and Algorithms
Year: 2024
Volume: 3
Issue: 1
Language: en
Keywords: Support Vector Machine; Urdu Language; optical character recognition; HOG features; Connected Components
The focus of research on optical character recognition (OCR) has been to digitize text in images. Urdu OCR is particularly challenging because a character can take several shapes depending on its position in the word, which makes Urdu harder to recognize than English and similar languages. The proposed research recognizes offline printed Urdu text using a segmentation-free (holistic) approach, in which whole ligatures rather than individual characters are classified. Horizontal histogram projection is used to extract text lines from an image, and connected-component labelling is then used to segment each extracted text line into ligatures. To train the proposed model, a set of 14 statistical features, along with HOG features, is extracted for each sub-word/ligature. The open-source UPTI dataset is used to train and test the proposed algorithm, and an SVM with a radial basis function (RBF) kernel is used to classify the ligatures. The proposed algorithm achieves a character recognition rate of 97.3% on this dataset.
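The abstract names the pipeline stages but not their parameters. The following is a minimal NumPy sketch of the first stages, assuming a binarized page in which text pixels are 1. It covers line extraction with a horizontal projection profile, ligature extraction with 8-connected component labelling (ordered right to left, as Urdu is read), and a simplified HOG descriptor. The function names, the 32x32 glyph size, nearest-neighbour resizing, the 4-pixel cells, 9 orientation bins and single global L2 normalisation (no overlapping blocks) are illustrative assumptions, not the authors' settings. The sketch also treats every component, including detached dots, as a separate ligature candidate, and it omits the 14 statistical features because the abstract does not define them.

```python
import numpy as np
from collections import deque


def segment_lines(binary: np.ndarray) -> list[tuple[int, int]]:
    """Horizontal projection: rows with no ink separate text lines.
    Returns (top, bottom) row ranges, bottom exclusive."""
    profile = binary.sum(axis=1)  # number of ink pixels in each row
    lines, start = [], None
    for row, count in enumerate(profile):
        if count > 0 and start is None:
            start = row                    # first inked row of a line
        elif count == 0 and start is not None:
            lines.append((start, row))     # first blank row after a line
            start = None
    if start is not None:                  # line touching the bottom edge
        lines.append((start, len(profile)))
    return lines


def connected_components(binary: np.ndarray) -> list[tuple[int, int, int, int]]:
    """Label 8-connected ink regions by breadth-first search.
    Returns bounding boxes (top, left, bottom, right), bottom/right exclusive."""
    h, w = binary.shape
    seen = np.zeros((h, w), dtype=bool)
    boxes = []
    for r in range(h):
        for c in range(w):
            if binary[r, c] and not seen[r, c]:
                seen[r, c] = True
                queue = deque([(r, c)])
                top, left, bottom, right = r, c, r, c
                while queue:
                    y, x = queue.popleft()
                    top, bottom = min(top, y), max(bottom, y)
                    left, right = min(left, x), max(right, x)
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and not seen[ny, nx]:
                                seen[ny, nx] = True
                                queue.append((ny, nx))
                boxes.append((top, left, bottom + 1, right + 1))
    # Urdu runs right to left, so order components by right edge, descending.
    boxes.sort(key=lambda b: -b[3])
    return boxes


def resize_nearest(img: np.ndarray, size=(32, 32)) -> np.ndarray:
    """Nearest-neighbour resize so every ligature yields a fixed-length descriptor."""
    rows = (np.arange(size[0]) * img.shape[0] / size[0]).astype(int)
    cols = (np.arange(size[1]) * img.shape[1] / size[1]).astype(int)
    return img[np.ix_(rows, cols)]


def hog(img: np.ndarray, cell: int = 4, bins: int = 9) -> np.ndarray:
    """Simplified HOG: magnitude-weighted histograms of unsigned gradient
    orientation per cell, L2-normalised over the whole descriptor."""
    img = img.astype(float)
    gy, gx = np.gradient(img)                          # derivatives along rows, columns
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0       # unsigned orientation in [0, 180)
    idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    ch, cw = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((ch, cw, bins))
    for i in range(ch):
        for j in range(cw):
            sl = (slice(i * cell, (i + 1) * cell), slice(j * cell, (j + 1) * cell))
            hist[i, j] = np.bincount(idx[sl].ravel(), weights=mag[sl].ravel(), minlength=bins)
    v = hist.ravel()
    return v / (np.linalg.norm(v) + 1e-8)


def ligature_features(page: np.ndarray, size=(32, 32)) -> np.ndarray:
    """Lines -> ligature candidates -> one HOG vector per candidate."""
    feats = []
    for top, bottom in segment_lines(page):
        line = page[top:bottom]
        for t, l, b, r in connected_components(line):
            crop = np.pad(line[t:b, l:r], 1)   # blank margin so stroke edges produce gradients
            feats.append(hog(resize_nearest(crop, size)))
    return np.array(feats)
```

The resulting vectors, after appending whatever statistical features are chosen, could be passed to an RBF-kernel SVM such as scikit-learn's `SVC(kernel="rbf")`, with `C` and `gamma` tuned on the training split of UPTI.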