ENHANCING SPEECH EMOTION RECOGNITION WITH DEEP LEARNING THROUGH DATA FUSION, SPECTROGRAM AUGMENTATION, AND HYBRID FEATURE INTEGRATION


Article Information

Title: ENHANCING SPEECH EMOTION RECOGNITION WITH DEEP LEARNING THROUGH DATA FUSION, SPECTROGRAM AUGMENTATION, AND HYBRID FEATURE INTEGRATION

Authors: Muhammad Talha Jahangir, Mujahid Hussain, Nashitah Alwaz, Muhammad Musawir Saeed, Waheed Ahmad, Uzair Ahmad, Hammad Toheed Khan

Journal: Spectrum of Engineering Sciences

HEC Recognition History: Category Y, from 2024-10-01 to 2025-12-31

Publisher: Sociology Educational Nexus Research Institute

Country: Pakistan

Year: 2025

Volume: 3

Issue: 9

Language: en

Keywords: Deep learning, Speech Emotion Recognition, Convolutional Neural Network, MFCC, data fusion, Human-Computer Interaction (HCI), Bidirectional Long Short-Term Memory, Spectrogram Augmentation, Mel Spectrogram, Root mean square

Abstract

Speech Emotion Recognition (SER), which lets computers decode human emotions from vocal cues, is one of the central components of affective computing. Achieving high SER accuracy remains difficult because of the variability of speech patterns, the scarcity of data, and the complexity of emotional expression. Our proposed deep learning-based SER architecture applies data fusion across four baseline datasets: RAVDESS, TESS, CREMA-D, and SAVEE. The model combines Convolutional Neural Networks (CNNs) with Bidirectional Long Short-Term Memory (BiLSTM) to efficiently capture spatial and temporal characteristics. With a classification accuracy of 98%, the proposed framework improves SER performance and enables computers to detect and respond to human emotions immediately, fostering a more empathetic and adaptive human-computer interaction.
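
The abstract does not say which form of spectrogram augmentation was used. As an illustration only, the sketch below shows one common variant, random frequency and time masking, in plain Python. The function name mask_spectrogram, its parameters, and the toy input are assumptions made for this example and are not taken from the article.

```python
import random


def mask_spectrogram(spec, max_freq_width=2, max_time_width=3, fill=0.0, rng=None):
    """Return a masked copy of `spec`.

    `spec` is a list of frequency rows, each a list of time frames.
    One random band of rows and one random span of columns are set to `fill`.
    This is an illustrative masking scheme, not the article's method.
    """
    if rng is None:
        rng = random.Random()
    n_freq, n_time = len(spec), len(spec[0])
    out = [row[:] for row in spec]  # copy so the input is left untouched

    # Frequency mask: blank a band of consecutive rows (width may be 0).
    f_width = rng.randint(0, min(max_freq_width, n_freq))
    f_start = rng.randint(0, n_freq - f_width)
    for f in range(f_start, f_start + f_width):
        out[f] = [fill] * n_time

    # Time mask: blank a span of consecutive columns (width may be 0).
    t_width = rng.randint(0, min(max_time_width, n_time))
    t_start = rng.randint(0, n_time - t_width)
    for row in out:
        for t in range(t_start, t_start + t_width):
            row[t] = fill
    return out


# Toy 4-band x 6-frame "spectrogram" with every energy equal to 1.0.
toy = [[1.0] * 6 for _ in range(4)]
augmented = mask_spectrogram(toy, rng=random.Random(0))
```

Each call yields a differently masked copy of the same recording, so a small training set can be expanded without collecting new audio. In practice the input would be a Mel spectrogram computed from a waveform rather than a toy grid.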

