Title: A corpus for Amazigh transcribed to Latin OCR systems’ evaluation
Authors: Khadija E. L. Gajoui, Fadoua Ataa Allah, Mohammed Oumsis
Journal: ARPN Journal of Engineering and Applied Sciences
Publisher: Khyber Medical College, Peshawar
Country: Pakistan
Year: 2018
Volume: 13
Issue: 22
Language: English
Corpora, initially created as resources for linguistic research, are increasingly attracting the attention of machine learning researchers, who are examining their potential for training and testing optical character recognition (OCR) systems. Following this logic, this paper is concerned with OCR of printed historical and recent documents written in Amazigh transcribed to Latin. It focuses in particular on building a representative corpus dedicated to this language. We describe the construction procedure of this corpus at three levels: line, word, and character. We then conduct a comparative evaluation of the corpus using an OCR system based on the Long Short-Term Memory (LSTM) approach. The levels are compared in terms of recognition rate and convergence, measured by the number of iterations. The evaluation shows that the line-level corpus gives the best result compared with the other levels, with an error rate of 10.3%.
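The abstract does not say how the error rate is computed. As a minimal sketch, assuming a character error rate defined as the total edit distance between OCR output and ground truth divided by the number of ground-truth characters (a common convention in OCR evaluation, not confirmed by the source), the measure could look like this. The function names and the sample strings are hypothetical and are not taken from the corpus.

```python
from typing import Sequence


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn hyp into ref."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(references: Sequence[str],
                         hypotheses: Sequence[str]) -> float:
    """Assumed definition: summed edit distance over summed reference length."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("reference text is empty")
    total_edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars


# Illustrative strings only; a real evaluation would pair each ground-truth
# line, word, or character with the corresponding OCR output.
ground_truth = ["azul fellawen"]
ocr_output = ["azul felawen"]
print(f"CER: {character_error_rate(ground_truth, ocr_output):.3f}")
```

The same function applies at each of the three corpus levels, since a line, a word, and a single character are all plain strings. Only the segmentation of the ground truth changes.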