A corpus for Amazigh transcribed to Latin OCR systems’ evaluation


Article Information

Title: A corpus for Amazigh transcribed to Latin OCR systems’ evaluation

Authors: Khadija E. L. Gajoui, Fadoua Ataa Allah, Mohammed Oumsis

Journal: ARPN Journal of Engineering and Applied Sciences

HEC Recognition History
Category    From          To
Y           2023-07-01    2024-09-30
Y           2022-07-01    2023-06-30
Y           2021-07-01    2022-06-30
X           2020-07-01    2021-06-30

Publisher: Khyber Medical College, Peshawar

Country: Pakistan

Year: 2018

Volume: 13

Issue: 22

Language: English

Abstract

Corpora, initially created as resources for linguistic research, are increasingly attracting the attention of machine learning researchers, who are examining their potential for training and testing optical character recognition (OCR) systems. In this vein, this paper is concerned with OCR of printed historical and recent documents written in Amazigh transcribed to Latin. It focuses, in particular, on building a representative corpus dedicated to this language. We describe the construction procedure of this corpus at three levels: line, word, and character. We then conduct a comparative evaluation of the corpus using an OCR system based on the Long Short-Term Memory (LSTM) approach. The comparison is based on recognition rates and on convergence in terms of the number of iterations. The evaluation shows that the line-level corpus gives the best result compared to the other levels, with an error rate of 10.3%.
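
The abstract does not state how the error rate is computed. As an illustration only, OCR output is often scored with a character error rate: the edit (Levenshtein) distance between the ground-truth text and the recognized text, divided by the length of the ground truth. The sketch below assumes that definition; the function names `edit_distance` and `character_error_rate` are hypothetical and are not taken from the paper.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning reference into hypothesis."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]  # distance from a non-empty reference prefix to an empty hypothesis
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # delete ref_char
                curr[j - 1] + 1,     # insert hyp_char
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Assumed metric: edit distance normalized by ground-truth length."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Under this assumed definition, an error rate of 10.3% would correspond to roughly one character edit per ten ground-truth characters. The same function could be applied to line, word, or character images alike, since it only compares the transcribed strings.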
