Transformers-based Approach for Speech Emotion Recognition

Date
2024
Publisher
Faculty of Sciences
Abstract
Most voice assistants, smart devices, and robots in use today are not smart enough to understand emotions: they simply receive commands and follow them, with no emotional intelligence. When people talk to each other, they grasp the situation from each other's voices and react to it; for instance, if someone is angry, the other person will try to calm them by speaking in a soft tone. Such adjustments are not possible for smart devices or voice assistants because they lack emotional intelligence. Enabling devices to understand emotions would therefore significantly enhance their capabilities and bring them one step closer to human-like intelligence. To address this limitation, our system introduces a novel approach to integrating emotional intelligence into smart devices. The approach proposed in this thesis follows a typical machine learning workflow, encompassing data preparation, model training, and evaluation. It leverages pre-trained models and transfer learning for feature extraction from emotion datasets; key components include Mel-frequency spectrogram extraction alongside the Wav2Vec pre-trained Transformer model for feature extraction. Further steps involve dataset splitting, fine-tuning the HuBERT pre-trained model for speech emotion recognition (SER), and emotion classification. The system also performs speaker gender identification (male or female). The standard RAVDESS and CREMA-D datasets were used for training and evaluation, yielding accuracies of 84.25% and 71%, respectively.