
Towards Sindhi Corpus Construction


Article Information

Title: Towards Sindhi Corpus Construction

Authors: Mutee U Rahman

Journal: Linguistics and Literature Review

HEC Recognition History
Category  From        To
Y         2024-10-01  2025-12-31
Y         2023-07-01  2024-09-30
Y         2022-07-01  2023-06-30
Y         2020-07-01  2021-06-30

Publisher: University of Management & Technology

Country: Pakistan

Year: 2015

Volume: 1

Issue: 1

Language: English

DOI: 10.32350/llr/11/04

Keywords: script, corpus construction, unigram, bigram, trigram frequencies, orthography

Abstract

The paper discusses the current state of Sindhi corpus construction. Corpus development issues, including corpus acquisition, preprocessing, and tokenization, are examined in detail. Preliminary results and observations are presented: letter unigram, bigram, and trigram frequencies, word frequencies, and word bigram frequencies. The limitations of the current Sindhi corpus and directions for future work are also discussed. The paper also explores the orthography and script of the Sindhi language with reference to corpus development.
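
As an illustration of the kind of statistics the abstract mentions, the following is a minimal Python sketch of letter and word n-gram counting. It is not the paper's method: the function names `letter_ngrams` and `word_ngrams` are hypothetical, and splitting on whitespace is a simplifying assumption that a real Sindhi tokenizer would likely need to refine.

```python
from collections import Counter


def letter_ngrams(text, n):
    """Count letter n-grams inside each whitespace-separated token.

    Assumption: tokens are separated by whitespace, and n-grams do not
    span token boundaries.
    """
    counts = Counter()
    for token in text.split():
        for i in range(len(token) - n + 1):
            counts[token[i:i + n]] += 1
    return counts


def word_ngrams(text, n):
    """Count word n-grams (as tuples) over whitespace-separated tokens."""
    tokens = text.split()
    return Counter(
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


if __name__ == "__main__":
    sample = "abab ab"  # placeholder text, not corpus data
    print(letter_ngrams(sample, 1).most_common())
    print(letter_ngrams(sample, 2).most_common())
    print(word_ngrams(sample, 2).most_common())
```

Calling `letter_ngrams(text, 1)`, `letter_ngrams(text, 2)`, and `letter_ngrams(text, 3)` gives letter unigram, bigram, and trigram counts, while `word_ngrams(text, 1)` and `word_ngrams(text, 2)` give word and word bigram counts. Dividing each count by the total yields relative frequencies.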

