Title: PTHP: Index for Optimizing Genome Assembly Overlapping and Read Alignment
Authors: Sherif Magdy Mohamed Abdelaziz Barakat, Roselina Sallehuddin, Siti Sophiayati Yuhaniz, Raja Farhana Raja Khairuddin, Yusliza Yusoff
Journal: International Journal of Membrane Science and Technology
Publisher: Cosmos Scholars Publishing House
Country: Pakistan
Year: 2023
Volume: 10
Issue: 1
Language: English
Sequencing technology cannot read a genome end to end; it yields the sequence only as a massive number of short strings called reads. Genome assembly reconstructs the complete genome from these reads, either from the overlaps between reads (the de novo approach) or by aligning the reads to their positions in an available reference genome (the reference-guided approach). In both cases, millions of reads must be searched for overlaps or alignments, a well-known data structure problem called all-against-all. Many studies have proposed indexes, such as the hash index and the prefix tree index, as well as parallelization techniques, to optimize overlapping or read alignment individually. However, because of the massive data volume and the repeats in the genome, limitations still affect index efficiency, and further enhancements are required. This article introduces a new hybrid index named the Prefix Tree Hash Partitioned (PTHP) index, which combines a prefix-tree index, a hash index, the pigeonhole concept, and parallelization. On simulated and real datasets, the PTHP index shows significant results: it reduces the computational time complexity of overlapping and read alignment, and thus the assembly time, outperforming both the prefix tree index and the hash index. Improving the performance of overlapping and read alignment with the PTHP index also yields strong results in optimizing hybrid genome assembly, which combines both approaches.
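To make two of the ingredients named in the abstract concrete, the following is a minimal sketch of how a hash index and the pigeonhole concept can work together in read alignment. It is not the PTHP index itself: the prefix tree and the parallelization are omitted, and every name here (`build_hash_index`, `align_read`, `seed_len`, `max_mismatches`) is a hypothetical choice made for illustration. The sketch assumes substitution-only errors. Under that assumption, a read that aligns with at most k mismatches and is cut into k + 1 non-overlapping pieces must contain at least one piece that matches the reference exactly, so exact lookups in the hash index are enough to find every candidate position.

```python
from collections import defaultdict


def build_hash_index(reference, seed_len):
    """Map every substring of length seed_len to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - seed_len + 1):
        index[reference[pos:pos + seed_len]].append(pos)
    return index


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def align_read(read, reference, index, seed_len, max_mismatches):
    """Return sorted start positions where the read aligns to the
    reference with at most max_mismatches substitutions.

    Assumption: the index was built with the same seed_len, and the
    read is long enough to hold max_mismatches + 1 disjoint seeds.
    """
    if seed_len * (max_mismatches + 1) > len(read):
        raise ValueError("read too short for the requested seeds")
    hits = set()
    for piece in range(max_mismatches + 1):
        offset = piece * seed_len
        seed = read[offset:offset + seed_len]
        # Exact lookup: by the pigeonhole argument, at least one seed
        # of a valid alignment is found here.
        for seed_pos in index.get(seed, ()):
            start = seed_pos - offset
            if start < 0 or start + len(read) > len(reference):
                continue
            # Verify the full read against the candidate window.
            window = reference[start:start + len(read)]
            if hamming(read, window) <= max_mismatches:
                hits.add(start)
    return sorted(hits)


# Toy data, invented for this example only.
reference = "ACGTACGTTAGCCGATAGGCTA"
read = "TATCCGATAGGC"  # reference[8:20] with one substitution
index = build_hash_index(reference, seed_len=6)
print(align_read(read, reference, index, seed_len=6, max_mismatches=1))
```

In this toy run the first seed contains the substitution and finds nothing, while the second seed matches exactly and leads to the one verified position. Repeats in the genome would make a seed map to many positions, which is one reason the abstract cites repeats as a limit on index efficiency.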