Using neural network technologies to determine a person's emotional state in oral communication
Human oral speech often carries an emotional connotation because emotions and mood influence the physiology of the vocal tract and, as a result, speech itself. When a person is happy, worried, sad, or angry, this is reflected in various characteristics of the voice, the pace of speech, and its intonation. Consequently, assessing a person's emotional state through speech can benefit many areas of life, such as medicine, psychology, criminology, marketing, and education. In medicine, assessing emotions from speech can help in the diagnosis and treatment of mental disorders, in monitoring a patient's emotional state, in identifying diseases such as Alzheimer's at an early stage, and in diagnosing autism. In psychology, this method can be useful for studying emotional reactions to various stimuli and situations. In criminology, speech analysis and emotion detection can be used to detect false statements and deception. In marketing and advertising, it can help understand consumer reactions to a product or an advertising campaign. In education, assessing emotions from speech can be used to analyze the emotional state of students and to optimize the educational process.
Thus, automating the emotion recognition process is a promising area of research, and the use of various machine learning methods and pattern recognition algorithms can make the process more accurate and efficient.
To address the challenge of identifying paralinguistic expressions of emotion in human speech, a neural network approach is proposed; this methodology has demonstrated efficacy on complex problems for which an exact solution may be elusive. The work presents a neural network of convolutional architecture that recognizes four human emotions (sadness, joy, anger, neutral) from oral speech. Particular attention is paid to the formation of a dataset for training and testing the model, since at present there are practically no open speech databases for the study of paralinguistic phenomena (especially in Russian). This study uses the Dusha emotional speech database.
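As an illustration of the kind of convolutional classifier described above, the sketch below shows a minimal PyTorch model that maps a spectrogram-like input to logits over the four emotion classes. The class name, layer sizes, and input dimensions are illustrative assumptions and do not reproduce the authors' exact architecture.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of a small convolutional classifier over
# spectrogram inputs; all layer sizes are illustrative assumptions,
# not the architecture reported in the paper.
class EmotionCNN(nn.Module):
    def __init__(self, n_classes: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # input: (batch, 1, freq_bins, frames)
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                     # pool to a fixed-size vector
        )
        self.classifier = nn.Linear(64, n_classes)       # sadness, joy, anger, neutral

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.features(x)
        return self.classifier(h.flatten(1))

# Example: one spectrogram with 64 frequency bins and 128 time frames.
logits = EmotionCNN()(torch.randn(1, 1, 64, 128))
print(logits.shape)  # torch.Size([1, 4])
```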
Mel-spectrograms of the speech signal are used as features for emotion recognition; compared with low-level descriptors, this increased the recognition rate and the operating speed of the neural network.
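For reference, a log-mel-spectrogram of the kind used as the input feature can be computed as in the following sketch. The file name and all parameter values (sample rate, FFT size, hop length, number of mel bands) are assumptions chosen for illustration, not the settings used in the study.

```python
import librosa
import numpy as np

# Hypothetical feature-extraction sketch: load a mono speech signal and
# compute a log-scaled mel-spectrogram. "utterance.wav" and the numeric
# parameters below are illustrative assumptions.
y, sr = librosa.load("utterance.wav", sr=16000)
mel = librosa.feature.melspectrogram(
    y=y, sr=sr, n_fft=1024, hop_length=256, n_mels=64
)
log_mel = librosa.power_to_db(mel, ref=np.max)  # log scale, common for CNN inputs
print(log_mel.shape)  # (n_mels, n_frames)
```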
Experiments on the test set showed that the presented neural network recognizes human emotions from oral speech in 75% of cases, which is a high result.
Further research involves training and, if necessary, upgrading the presented neural network to recognize paralinguistic phenomena not considered in this study, such as deception, fatigue, and depression.