<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd">
<article article-type="research-article" dtd-version="1.2" xml:lang="ru" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><front><journal-meta><journal-id journal-id-type="issn">2313-8912</journal-id><journal-title-group><journal-title>Research Result. Theoretical and Applied Linguistics</journal-title></journal-title-group><issn pub-type="epub">2313-8912</issn></journal-meta><article-meta><article-id pub-id-type="doi">10.18413/2313-8912-2024-10-4-0-2</article-id><article-id pub-id-type="publisher-id">3673</article-id><article-categories><subj-group subj-group-type="heading"><subject>Large Language Models and Prompt Engineering in Linguistics</subject></subj-group></article-categories><title-group><article-title>Using neural network technologies in determining the emotional state of a person in oral communication</article-title><trans-title-group xml:lang="en"><trans-title>Using neural network technologies in determining the emotional state of a person in oral communication</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Balabanova</surname><given-names>Tatyana N.</given-names></name><name xml:lang="en"><surname>Balabanova</surname><given-names>Tatyana N.</given-names></name></name-alternatives><email>Sozonova@bsu.edu.ru</email><xref ref-type="aff" rid="aff1" /></contrib><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Gaivoronskaya</surname><given-names>Diana I.</given-names></name><name xml:lang="en"><surname>Gaivoronskaya</surname><given-names>Diana I.</given-names></name></name-alternatives><email>trubitsyna@bsuedu.ru</email><xref ref-type="aff" rid="aff1" /></contrib><contrib contrib-type="author"><name-alternatives><name xml:lang="ru"><surname>Doborovich</surname><given-names>Anna N.</given-names></name><name xml:lang="en"><surname>Doborovich</surname><given-names>Anna N.</given-names></name></name-alternatives><email>doborovich@bsu.edu.ru</email><xref ref-type="aff" rid="aff2" /></contrib></contrib-group><aff id="aff2"><institution>Belgorod State National Research University, Russia</institution></aff><aff id="aff1"><institution>Belgorod State National Research University, Russia</institution></aff><pub-date pub-type="epub"><year>2024</year></pub-date><volume>10</volume><issue>4</issue><fpage>0</fpage><lpage>0</lpage><self-uri content-type="pdf" xlink:href="/media/linguistics/2024/4/Research_Result_4-42-18-40.pdf" /><abstract xml:lang="ru"><p>Human oral speech often has an emotional connotation because emotions and mood influence the physiology of the vocal tract and, as a result, speech. When a person is happy, worried, sad or angry, this is reflected in various characteristics of the voice, the pace of speech and its intonation. Accordingly, assessing a person&amp;rsquo;s emotional state through speech can benefit various areas of life, such as medicine, psychology, criminology, marketing and education. In medicine, assessing emotions from speech can help in the diagnosis and treatment of mental disorders, in monitoring the emotional state of a patient, in identifying diseases such as Alzheimer&amp;rsquo;s disease at an early stage, in diagnosing autism, etc. In psychology, this method can be useful for studying emotional reactions to various stimuli and situations. In criminology, speech analysis and emotion detection can be used to detect false statements and deception. In marketing and advertising, it can help understand consumer reactions to a product or advertising campaign. In education, assessing emotions from speech can be used to analyze the emotional state of students and optimize the educational process.

Thus, automation of the emotion recognition process is a promising area of research, and the use of various machine learning methods and pattern recognition algorithms can make the process more accurate and efficient.

To address the challenge of identifying paralinguistic expressions of emotion in human speech, a neural network approach is proposed; this methodology has proven effective for complex problems where an exact solution may be elusive. The work presents a convolutional neural network that recognizes four human emotional states (sadness, joy, anger, neutral) in oral speech. Particular attention is paid to forming a dataset for training and testing the model, since at present there are practically no open speech databases for studying paralinguistic phenomena (especially in Russian). This study uses the Dusha emotional speech database.

Mel-spectrograms of the speech signal are used as features for recognizing emotions; compared to low-level descriptors, they increased both the recognition rate and the operating speed of the neural network.

Experiments on the test set showed that the presented neural network correctly recognizes human emotions in oral speech in 75% of cases, which is a high result.

Further research involves training and, if necessary, upgrading the presented neural network to recognize paralinguistic phenomena not considered in this study, such as deception, fatigue, depression, etc.



</p></abstract><trans-abstract xml:lang="en"><p>Human oral speech often has an emotional connotation because emotions and mood influence the physiology of the vocal tract and, as a result, speech. When a person is happy, worried, sad or angry, this is reflected in various characteristics of the voice, the pace of speech and its intonation. Accordingly, assessing a person&amp;rsquo;s emotional state through speech can benefit various areas of life, such as medicine, psychology, criminology, marketing and education. In medicine, assessing emotions from speech can help in the diagnosis and treatment of mental disorders, in monitoring the emotional state of a patient, in identifying diseases such as Alzheimer&amp;rsquo;s disease at an early stage, in diagnosing autism, etc. In psychology, this method can be useful for studying emotional reactions to various stimuli and situations. In criminology, speech analysis and emotion detection can be used to detect false statements and deception. In marketing and advertising, it can help understand consumer reactions to a product or advertising campaign. In education, assessing emotions from speech can be used to analyze the emotional state of students and optimize the educational process.

Thus, automation of the emotion recognition process is a promising area of research, and the use of various machine learning methods and pattern recognition algorithms can make the process more accurate and efficient.

To address the challenge of identifying paralinguistic expressions of emotion in human speech, a neural network approach is proposed; this methodology has proven effective for complex problems where an exact solution may be elusive. The work presents a convolutional neural network that recognizes four human emotional states (sadness, joy, anger, neutral) in oral speech. Particular attention is paid to forming a dataset for training and testing the model, since at present there are practically no open speech databases for studying paralinguistic phenomena (especially in Russian). This study uses the Dusha emotional speech database.

Mel-spectrograms of the speech signal are used as features for recognizing emotions; compared to low-level descriptors, they increased both the recognition rate and the operating speed of the neural network.

Experiments on the test set showed that the presented neural network correctly recognizes human emotions in oral speech in 75% of cases, which is a high result.

Further research involves training and, if necessary, upgrading the presented neural network to recognize paralinguistic phenomena not considered in this study, such as deception, fatigue, depression, etc.



</p></trans-abstract><kwd-group xml:lang="ru"><kwd>Speech data</kwd><kwd>Speech databases</kwd><kwd>Neural networks</kwd><kwd>Convolutional neural networks</kwd><kwd>Emotion recognition</kwd><kwd>Classification</kwd><kwd>Classification methods</kwd></kwd-group><kwd-group xml:lang="en"><kwd>Speech data</kwd><kwd>Speech databases</kwd><kwd>Neural networks</kwd><kwd>Convolutional neural networks</kwd><kwd>Emotion recognition</kwd><kwd>Classification</kwd><kwd>Classification methods</kwd></kwd-group></article-meta></front><back><ref-list><title>References</title><ref id="B1"><mixed-citation>Abramov,&amp;nbsp;K.&amp;nbsp;V., Balabanova,&amp;nbsp;T.&amp;nbsp;N. and Gaivoronskaya,&amp;nbsp;D.&amp;nbsp;I. (2024). Ispolzovanie nejronnyh setej dlja raspoznavanija agressii po rechevomu signalu [Using neural networks to recognize aggression by speech signal], Information Systems and Technologies, №&amp;nbsp;2&amp;nbsp;(142), 28&amp;ndash;36. (In Russian)</mixed-citation></ref><ref id="B2"><mixed-citation>Albornoz,&amp;nbsp;E.&amp;nbsp;M., Milone,&amp;nbsp;D.&amp;nbsp;H. and Rufiner,&amp;nbsp;H.&amp;nbsp;L. (2011). Spoken emotion recognition using hierarchical classifiers, Computer Speech &amp;amp; Language, 25&amp;nbsp;(3), 556&amp;ndash;570. (In English)</mixed-citation></ref><ref id="B3"><mixed-citation>Ayadi, M.&amp;nbsp;El., Kamel,&amp;nbsp;M.&amp;nbsp;S. and Karray,&amp;nbsp;F. (2011). Survey on speech emotion recognition: Features, classification schemes, and databases, Pattern Recognition, 44&amp;nbsp;(3), 572&amp;ndash;587. (In English)</mixed-citation></ref><ref id="B4"><mixed-citation>Balabanova,&amp;nbsp;T.&amp;nbsp;N., Abramov,&amp;nbsp;K.&amp;nbsp;V. (2023).
Paralingvisticheskij analiz dlja raspoznavanija agressii po rechi cheloveka [Paralinguistic analysis for recognizing aggression from human speech], Naukoemkie tehnologii i innovacii (XXV nauchnye chtenija): Sbornik dokladov Mezhdunarodnoj nauchno-prakticheskoj konferencii, Belgorod, Belgorodskij gosudarstvennyj tehnologicheskij universitet im. V.&amp;nbsp;G.&amp;nbsp;Shuhova, 697&amp;ndash;700. (In Russian)</mixed-citation></ref><ref id="B5"><mixed-citation>Balabanova,&amp;nbsp;T.&amp;nbsp;N., Abramov,&amp;nbsp;K.&amp;nbsp;V., Boldyshev,&amp;nbsp;A.&amp;nbsp;V. and Dolbin,&amp;nbsp;D.&amp;nbsp;M. (2023). Automatic Detection of Anger and Aggression in Speech Signals, Economics. Information technologies, 50&amp;nbsp;(4), 944&amp;ndash;954. DOI: 10.52575/2687-0932-2023-50-4-944-954 (In Russian)</mixed-citation></ref><ref id="B6"><mixed-citation>Chen,&amp;nbsp;L., Mao,&amp;nbsp;X., Xue,&amp;nbsp;Y. and Cheng,&amp;nbsp;L.&amp;nbsp;L. (2012). Speech emotion recognition: Features and classification models, Digital Signal Processing, 22&amp;nbsp;(6), 1154&amp;ndash;1160. (In English)</mixed-citation></ref><ref id="B7"><mixed-citation>Cowie,&amp;nbsp;R., Douglas-Cowie,&amp;nbsp;E., Tsapatsoulis,&amp;nbsp;N., Votsis,&amp;nbsp;G., Kollias,&amp;nbsp;S., Fellenz,&amp;nbsp;W. and Taylor,&amp;nbsp;J.&amp;nbsp;G. (2001). Emotion recognition in human-computer interaction, IEEE Signal Processing Magazine. 18&amp;nbsp;(1), 32&amp;ndash;80. (In English)</mixed-citation></ref><ref id="B8"><mixed-citation>Dellaert,&amp;nbsp;F., Polzin,&amp;nbsp;T. and Waibel,&amp;nbsp;A. (1996). Recognizing emotion in speech, Proceeding of Fourth International Conference on Spoken Language Processing (ICSLP), 1970&amp;ndash;1973. (In English)</mixed-citation></ref><ref id="B9"><mixed-citation>Dvoynikova,&amp;nbsp;A.&amp;nbsp;A., Karpov,&amp;nbsp;A.&amp;nbsp;A. (2020). 
Analiticheskij obzor podhodov k raspoznavaniju tonal&amp;rsquo;nosti russkojazychnyh tekstovyh dannyh [An analytical review of approaches to recognizing the tonality of Russian-language text data], Informacionno-upravljajushhie sistemy, 4&amp;nbsp;(107), 20&amp;ndash;30. DOI: 10.31799/1684-8853-2020-4-20-30 (In Russian)</mixed-citation></ref><ref id="B10"><mixed-citation>Fedotov,&amp;nbsp;D., Kaya,&amp;nbsp;H. and Karpov&amp;nbsp;A. (2018). Context Modeling for Cross-Corpus Dimensional Acoustic Emotion Recognition: Challenges and Mixup, Proceedings of 20th International Conference on Speech and Computer (SPECOM-2018), 155&amp;ndash;165. DOI: 10.1007/978-3-319-99579-3_17 (In English)</mixed-citation></ref><ref id="B11"><mixed-citation>Gorshkov,&amp;nbsp;Yu.&amp;nbsp;G., Dorofeev,&amp;nbsp;A.&amp;nbsp;V. (2003). Rechevye detektory lzhi kommercheskogo primenenija [Speech lie detectors for commercial use], Informacionnyj most (INFORMOST). Radiojelektronika i Telekommunikacija, 6, 13&amp;ndash;15. (In Russian)</mixed-citation></ref><ref id="B12"><mixed-citation>Grimm,&amp;nbsp;M., Kroschel,&amp;nbsp;K., Mower,&amp;nbsp;E. and Narayanan,&amp;nbsp;S. (2007). Primitives-based evaluation and estimation of emotions in speech, Speech Communication, 49&amp;nbsp;(10&amp;ndash;11), 787&amp;ndash;800. (In English)</mixed-citation></ref><ref id="B13"><mixed-citation>Holden,&amp;nbsp;K.&amp;nbsp;T. and Hogan,&amp;nbsp;J.&amp;nbsp;T. (1993). The emotive impact of foreign intonation: An experiment in switching English and Russian intonation, Language and Speech, 36&amp;nbsp;(1), 67&amp;ndash;88. (In English)</mixed-citation></ref><ref id="B14"><mixed-citation>Hozjan,&amp;nbsp;V. and Kačič,&amp;nbsp;Z. (2003). Context-Independent Multilingual Emotion Recognition from Speech Signals, International Journal of Speech Technology, 6, 311&amp;ndash;320. 
(In English)</mixed-citation></ref><ref id="B15"><mixed-citation>Hsu,&amp;nbsp;W.&amp;nbsp;N., Bolte,&amp;nbsp;B., Tsai,&amp;nbsp;Y.-H.&amp;nbsp;H., Lakhotia,&amp;nbsp;K., Salakhutdinov,&amp;nbsp;R., Mohamed,&amp;nbsp;A.-r. (2021). Hubert: Self-supervised speech representation learning by masked prediction of hidden units, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29, 3451&amp;ndash;3460. (In English)</mixed-citation></ref><ref id="B16"><mixed-citation>Kerkeni,&amp;nbsp;L., Serrestou,&amp;nbsp;Y., Mbarki,&amp;nbsp;M., Raoof,&amp;nbsp;K., Ali Mahjoub,&amp;nbsp;M. and Cleder,&amp;nbsp;C. (2020). Automatic Speech Emotion Recognition Using Machine Learning, Virginia Commonwealth University, United States of America. (In English)</mixed-citation></ref><ref id="B17"><mixed-citation>Kim,&amp;nbsp;J., Truong,&amp;nbsp;K.&amp;nbsp;P., Englebienne,&amp;nbsp;G., Evers,&amp;nbsp;V. (2017). Learning spectro-temporal features with 3D CNNs for speech emotion recognition, Proceedings of the 7th International Conference on Affective Computing and Intelligent Interaction (ACII), 383&amp;ndash;388. DOI: 10.1109/ACII.2017.8273628 (In English)</mixed-citation></ref><ref id="B18"><mixed-citation>Lemaev,&amp;nbsp;V.&amp;nbsp;I., Lukashevich,&amp;nbsp;N.&amp;nbsp;V. (2024). Avtomaticheskaja klassifikacija jemocij v rechi: metody i dannye [Automatic classification of emotions in speech: methods and data], Litera, 4, 159&amp;ndash;173. DOI: 10.25136/2409-8698.2024.4.70472 (In Russian)</mixed-citation></ref><ref id="B19"><mixed-citation>Makarova,&amp;nbsp;V. (2000). Acoustic cues of surprise in Russian questions, Journal of the Acoustical Society of Japan (E), 21&amp;nbsp;(5), 243&amp;ndash;250. DOI: 10.1250/ast.21.243 (In English)</mixed-citation></ref><ref id="B20"><mixed-citation>Maysak,&amp;nbsp;N.&amp;nbsp;V. (2010). 
Matrica social&amp;rsquo;nyh deviacij: klassifikacija tipov i vidov deviantnogo povedenija [The matrix of social deviations: classification of types and kinds of deviant behavior], Sovremennye problemy nauki i obrazovanija, 4, 78&amp;ndash;86. (In Russian)</mixed-citation></ref><ref id="B21"><mixed-citation>Neiberg,&amp;nbsp;D., Elenius,&amp;nbsp;K. and Laskowski,&amp;nbsp;K. (2006). Emotion recognition in spontaneous speech using GMMs, INTERSPEECH 2006 &amp;ndash; ICSLP, Ninth International Conference on Spoken Language Processing, 809&amp;ndash;812. (In English)</mixed-citation></ref><ref id="B22"><mixed-citation>New,&amp;nbsp;T.&amp;nbsp;L., Foo,&amp;nbsp;S.&amp;nbsp;W. and De&amp;nbsp;Silva,&amp;nbsp;L.&amp;nbsp;C. (2003). Speech emotion recognition using hidden Markov models, Speech Communication, 41&amp;nbsp;(4), 603&amp;ndash;623. (In English)</mixed-citation></ref><ref id="B23"><mixed-citation>Nogueiras,&amp;nbsp;A., Moreno,&amp;nbsp;A., Bonafonte,&amp;nbsp;A., Mari&amp;ntilde;o,&amp;nbsp;J.&amp;nbsp;B. (2001). Speech emotion recognition using hidden Markov models, Proceedings of EUROSPEECH 2001, 7th European conference on speech communication and technology, 746&amp;ndash;749. (In English)</mixed-citation></ref><ref id="B24"><mixed-citation>Perepelkina,&amp;nbsp;O., Kazimirova,&amp;nbsp;E., Konstantinova,&amp;nbsp;M. (2018).&amp;nbsp;RAMAS: Russian Multimodal Corpus of Dyadic Interaction for studying emotion recognition, PeerJ Preprints, 6:e26688v1. https://doi.org/10.7287/peerj.preprints.26688v1 (In English)</mixed-citation></ref><ref id="B25"><mixed-citation>Russell,&amp;nbsp;J.&amp;nbsp;A., Posner,&amp;nbsp;J., Peterson,&amp;nbsp;B.&amp;nbsp;S. (2005). The circumplex model of affect: an integrative approach to affective neuroscience, cognitive development, and psychopathology, Dev Psychopathol. 17&amp;nbsp;(3), 715&amp;ndash;734. DOI: 10.1017/S0954579405050340 (In English)</mixed-citation></ref><ref id="B26"><mixed-citation>Raudys,&amp;nbsp;S. (2003).
On the universality of the single-layer perceptron model, Neural Networks and Soft Computing. Physica, Heidelberg, 79&amp;ndash;86. (In English)</mixed-citation></ref><ref id="B27"><mixed-citation>Sadiq,&amp;nbsp;S., Mehmood,&amp;nbsp;A., Ullah,&amp;nbsp;S., Ahmad,&amp;nbsp;M., Sang Choi,&amp;nbsp;G., On,&amp;nbsp;Byung-Won. (2021). Aggression detection through deep neural model on twitter, Future Generation Computer Systems, 114, 120&amp;ndash;129. (In English)</mixed-citation></ref><ref id="B28"><mixed-citation>Sahoo,&amp;nbsp;S., Routray,&amp;nbsp;A. (2016). Detecting aggression in voice using inverse filtered speech features, IEEE Transactions on Affective Computing, 9&amp;nbsp;(2), 217&amp;ndash;226. DOI: 10.1109/TAFFC.2016.2615607 (In English)</mixed-citation></ref><ref id="B29"><mixed-citation>Santos,&amp;nbsp;F., Dur&amp;atilde;es,&amp;nbsp;D., Marcondes,&amp;nbsp;F.&amp;nbsp;M., Hammerschmidt,&amp;nbsp;N., Lange,&amp;nbsp;S., Machado,&amp;nbsp;J., Novais,&amp;nbsp;P. (2021). In-car violence detection based on the audio signal, Proceedings of the International Conference on Intelligent Data Engineering and Automated Learning. Springer, 437&amp;ndash;445. https://doi.org/10.1007/978-3-030-91608-4_43 (In English)</mixed-citation></ref><ref id="B30"><mixed-citation>Shakhovsky,&amp;nbsp;V.&amp;nbsp;I. (2009). Jemocii kak obekt issledovanija v lingvistike [Emotions as an object of research in linguistics], Voprosy psiholingvistiki, 9, 29&amp;ndash;43. (In Russian)</mixed-citation></ref><ref id="B31"><mixed-citation>Siging,&amp;nbsp;W. (2009). Recognition of human emotion in speech using modulation spectral features and support vector machines: master of science thesis, Department of Electrical and Computer Engineering Queen&amp;rsquo;s University, Kingston, Ontario. (In English)</mixed-citation></ref><ref id="B32"><mixed-citation>Surabhi,&amp;nbsp;V., Saurabh,&amp;nbsp;M. (2016). Speech emotion recognition. 
A review, International Research Journal of Engineering and Technology (IRJET), 03, 313&amp;ndash;316. (In English)</mixed-citation></ref><ref id="B33"><mixed-citation>Svetozarova,&amp;nbsp;N.&amp;nbsp;D. (1982). Intonacionnaja sistema russkogo jazyka [Intonation system of the Russian language], Izd-vo Len. un-ta, Leningrad. (In Russian)</mixed-citation></ref><ref id="B34"><mixed-citation>Uzdyaev,&amp;nbsp;M.&amp;nbsp;Yu. (2020). Nejrosetevaja model&amp;rsquo; mnogomodal&amp;rsquo;nogo raspoznavanija chelovecheskoj agressii [Neural network model of multimodal recognition of human aggression], Vestnik KRAUNC. Fiziko-matematicheskie nauki, 33&amp;nbsp;(4), 132&amp;ndash;149. DOI: 10.26117/2079-6641-2020-33-4-132-149 (In Russian)</mixed-citation></ref><ref id="B35"><mixed-citation>Velichko,&amp;nbsp;A., Markitantov,&amp;nbsp;M., Kaya,&amp;nbsp;H., Karpov,&amp;nbsp;A. (2022). Complex Paralinguistic Analysis of Speech: Predicting Gender, Emotions and Deception in a Hierarchical Framework, Proceedings of Interspeech, 4735&amp;ndash;4739. DOI: 10.21437/Interspeech.2022-11294 (In English)</mixed-citation></ref><ref id="B36"><mixed-citation>Vu,&amp;nbsp;M.T., Beurton-Aimar,&amp;nbsp;M. and Marchand,&amp;nbsp;S. (2021). Multitask multi-database emotion recognition, Proceedings of IEEE/CVF International Conference on Computer Vision, 3637&amp;ndash;3644. DOI: 10.1109/ICCVW54120.2021.00406 (In English)</mixed-citation></ref><ref id="B37"><mixed-citation>Wang,&amp;nbsp;J., Xue,&amp;nbsp;M., Culhane,&amp;nbsp;R., Diao,&amp;nbsp;E., Ding, J., Tarokh, V. (2020). Speech emotion recognition with dual-sequence LSTM architecture, Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6474&amp;ndash;6478. DOI: 10.1109/ICASSP40776.2020.9054629 (In English)</mixed-citation></ref><ref id="B38"><mixed-citation>Wu,&amp;nbsp;S., Falk,&amp;nbsp;T.&amp;nbsp;H. and Chan,&amp;nbsp;W.&amp;nbsp;Y. (2011). 
Automatic speech emotion recognition using modulation spectral features, Speech Communication, 53, 768&amp;ndash;785. (In English)</mixed-citation></ref><ref id="B39"><mixed-citation>Zeiler,&amp;nbsp;M.&amp;nbsp;D., Fergus,&amp;nbsp;R. (2013).&amp;nbsp;Visualizing and understanding convolutional networks, Computer Vision and Pattern Recognition (ECCV 2014), 818&amp;ndash;833. DOI: 10.48550/arXiv.1311.2901 (In English)</mixed-citation></ref></ref-list></back></article>