
Real-Time Voice-to-Voice Translation for Cross-Lingual Communication: Cascade Pipeline and RNN Based Approach


Article Information

Title: Real-Time Voice-to-Voice Translation for Cross-Lingual Communication: Cascade Pipeline and RNN Based Approach

Authors: Shanza Bibi, Hina Sattar, Laraib Fatima, Ayesha Iqbal, Umar Farooq Shafi

Journal: Journal of Computing & Biomedical Informatics

HEC Recognition History
Category  From        To
Y         2023-07-01  2024-09-30
Y         2022-07-01  2023-06-30

Publisher: Research Center of Computing & Biomedical Informatics

Country: Pakistan

Year: 2025

Volume: 9

Issue: 1

Language: en

Keywords: Real-Time Voice Translation; Voice-to-Voice Translation; Speech-to-Text Translation; Text-to-Speech Translation; Multi-Languages Translation


Abstract

Language diversity creates communication challenges, particularly in face-to-face conversations, and real-time voice-to-voice translation for cross-lingual communication can bridge these gaps and enable smooth conversations. Most of Pakistan's population speaks Urdu and is not proficient in English, so language is a major barrier to accessing information and participating in global discourse. This study addresses that barrier by using machine learning for multilingual voice translation: the system translates Pakistan's native languages into English to support real-time communication. The real-time speech translation system uses a two-stage approach. First, the system is trained by combining a custom dataset with a Wav2Vec 2.0 model pre-trained on unlabeled data, achieving 98.76% accuracy. Second, a cascade pipeline translates the text from the source language into the target language. Within the cascade pipeline, recognition accuracy differs by language, reflecting each language's linguistic prominence and the availability of training data. The system takes the user's voice as input from a microphone and uses Automatic Speech Recognition (ASR) to convert the speech into text [1]. A Text-to-Speech (TTS) [2] module then converts the translated text back into voice. This end-to-end pipeline enables effective real-time communication and offers a user-friendly way to overcome the language barrier in multilingual environments, substantially narrowing gaps in multilingual communication.
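
The cascade described above chains three independent stages, each consuming the previous stage's output: speech to source text (ASR), source text to target text (translation), and target text to speech (TTS). The sketch below illustrates only that structure. The class CascadeTranslator and the toy_asr, toy_mt and toy_tts functions are hypothetical stand-ins, not the authors' components; a real system would plug in an ASR model (for example a fine-tuned Wav2Vec 2.0), a machine translation model and a TTS engine at those points.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical stage signatures for the three cascade components.
AsrFn = Callable[[bytes], str]           # audio -> source-language text
MtFn = Callable[[str, str, str], str]    # (text, src_lang, tgt_lang) -> target text
TtsFn = Callable[[str], bytes]           # target text -> audio


@dataclass
class CascadeTranslator:
    asr: AsrFn
    mt: MtFn
    tts: TtsFn
    source_lang: str = "ur"
    target_lang: str = "en"

    def translate(self, audio: bytes) -> Tuple[str, str, bytes]:
        """Run one utterance through ASR -> MT -> TTS, returning every intermediate."""
        transcript = self.asr(audio)                                          # stage 1
        translation = self.mt(transcript, self.source_lang, self.target_lang)  # stage 2
        speech = self.tts(translation)                                        # stage 3
        return transcript, translation, speech


# --- Toy stand-ins so the sketch runs without any models (all hypothetical) ---
def toy_asr(audio: bytes) -> str:
    # Pretend the "audio" bytes are already UTF-8 text.
    return audio.decode("utf-8")


TOY_LEXICON = {"salaam": "hello", "dost": "friend"}


def toy_mt(text: str, src: str, tgt: str) -> str:
    # Word-by-word dictionary lookup; unknown words pass through unchanged.
    return " ".join(TOY_LEXICON.get(w, w) for w in text.lower().split())


def toy_tts(text: str) -> bytes:
    # Pretend synthesis: just encode the text as bytes.
    return text.encode("utf-8")


if __name__ == "__main__":
    pipeline = CascadeTranslator(asr=toy_asr, mt=toy_mt, tts=toy_tts)
    print(pipeline.translate(b"salaam dost"))
    # ('salaam dost', 'hello friend', b'hello friend')
```

Keeping the stages behind plain function signatures is what makes a cascade attractive: each component can be trained, evaluated or swapped separately. The trade-off is that recognition errors in stage 1 propagate into translation and synthesis.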

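The abstract does not say how the Wav2Vec 2.0 model's frame-level outputs are turned into text. Wav2Vec 2.0 is commonly fine-tuned for speech recognition with a Connectionist Temporal Classification (CTC) output layer; assuming that setup, which the source does not confirm, the simplest decoder is greedy (best-path) CTC decoding: take the highest-scoring label in each frame, merge consecutive repeats, then drop the blank token. The vocabulary, blank index and scores below are hypothetical toy values.

```python
from itertools import groupby
from typing import List, Sequence

# Hypothetical character vocabulary; index 0 is the CTC blank token.
VOCAB = ["-", "h", "e", "l", "o", " "]
BLANK = 0


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      vocab: Sequence[str], blank: int = BLANK) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, remove blanks."""
    best_path = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    collapsed = [label for label, _ in groupby(best_path)]  # merge consecutive repeats
    return "".join(vocab[label] for label in collapsed if label != blank)


def one_hot(index: int, size: int) -> List[float]:
    # Toy per-frame score vector that puts all mass on one label.
    return [1.0 if j == index else 0.0 for j in range(size)]


if __name__ == "__main__":
    # The blank between the two runs of "l" keeps the double letter in "hello".
    path = [1, 1, 2, 0, 3, 3, 0, 3, 4]
    frames = [one_hot(i, len(VOCAB)) for i in path]
    print(ctc_greedy_decode(frames, VOCAB))  # hello
```

Greedy decoding is fast enough for real-time use; beam search with a language model usually improves accuracy at extra cost.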

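The abstract reports 98.76% accuracy without naming the metric. Speech recognition is commonly scored by word error rate (WER): the word-level edit distance (substitutions + deletions + insertions) between the recognized text and a reference transcript, divided by the number of reference words. Some work reports accuracy as 1 − WER. The sketch below assumes that convention, which may not be the one the authors used; the function names and example sentences are illustrative only.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Rolling DP row: at the start of iteration i, prev[j] is the edit
    # distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of ref word r
                curr[j - 1] + 1,         # insertion of hyp word h
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            )
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    # Assumed convention: accuracy = 1 - WER (can go negative with many insertions).
    return 1.0 - word_error_rate(reference, hypothesis)


if __name__ == "__main__":
    print(word_error_rate("mera naam ali hai", "mera nam ali hai"))  # 0.25
    print(word_accuracy("mera naam ali hai", "mera nam ali hai"))    # 0.75
```

Because WER is computed per reference word, a per-language breakdown like the one mentioned in the abstract can be obtained by averaging it separately over each language's test utterances.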

