Emotion recognition in conversation


Emotion recognition in conversation (ERC) is a sub-field of emotion recognition that focuses on mining human emotions from conversations or dialogues involving two or more interlocutors. Datasets in this field are usually derived from social platforms, which offer plentiful, freely available samples that often contain multimodal data. Self- and inter-personal influences play a critical role in identifying basic emotions such as fear, anger, joy, and surprise. The more fine-grained the emotion labels are, the harder it is to detect the correct emotion. ERC poses a number of challenges, including conversational-context modeling, speaker-state modeling, the presence of sarcasm in conversation, and emotion shift across consecutive utterances of the same interlocutor.

The task

The task of ERC is to detect the emotion expressed by the speaker in each utterance of a conversation. ERC depends on three primary factors: the conversational context, the interlocutors' mental states, and their intent.
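
The task formulation can be illustrated with a minimal Python sketch. The names Utterance, context_window, and emotion_shifts are hypothetical and chosen only for illustration, and the example dialogue and its labels are invented. The sketch represents a conversation as an ordered list of labelled utterances, extracts the preceding utterances that a classifier might use as conversational context, and finds the positions where a speaker's emotion differs from that speaker's previous utterance.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Utterance:
    speaker: str
    text: str
    emotion: Optional[str] = None  # gold label, if the utterance is annotated


def context_window(conversation: List[Utterance], index: int, size: int = 2) -> List[Utterance]:
    """Return up to `size` utterances that precede conversation[index]."""
    start = max(0, index - size)
    return conversation[start:index]


def emotion_shifts(conversation: List[Utterance]) -> List[int]:
    """Return indices where a speaker's label differs from the label of
    that same speaker's previous labelled utterance."""
    last_label: Dict[str, str] = {}
    shifts: List[int] = []
    for i, utterance in enumerate(conversation):
        if utterance.emotion is None:
            continue  # unlabelled utterances cannot signal a shift
        previous = last_label.get(utterance.speaker)
        if previous is not None and previous != utterance.emotion:
            shifts.append(i)
        last_label[utterance.speaker] = utterance.emotion
    return shifts


# Invented two-party dialogue, for illustration only.
conversation = [
    Utterance("A", "I got the job!", "joy"),
    Utterance("B", "Wait, really?", "surprise"),
    Utterance("A", "But I have to move away.", "sadness"),
    Utterance("B", "That is wonderful news anyway.", "joy"),
]

context = context_window(conversation, index=2, size=2)
shifts = emotion_shifts(conversation)
```

An ERC system would predict the emotion field of each utterance rather than read it from annotations. The fixed-size window is only one simple way to supply context.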

Datasets

IEMOCAP, SEMAINE, DailyDialog, and MELD are four widely used datasets in ERC. Of these, MELD contains multiparty dialogues.

Methods

Approaches to ERC include unsupervised, semi-supervised, and supervised methods. Popular supervised methods use or combine pre-defined features, recurrent neural networks, graph convolutional networks, and attention-gated hierarchical memory networks. Most contemporary methods for ERC are based on deep learning and rely on the idea of latent speaker-state modeling.
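
The idea of speaker-state modeling can be illustrated with a toy Python sketch. It is not any published ERC model: the functions update_state and speaker_aware_inputs are hypothetical, the feature vectors are invented, and the hand-written moving-average update stands in for the learned recurrent update that deep learning methods would use. The sketch keeps one state vector per speaker, and each utterance's classifier input is its own features concatenated with the speaker's state before that utterance.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def update_state(state: Vector, features: Vector, decay: float = 0.5) -> Vector:
    """Blend the previous speaker state with the new utterance features.
    Assumption: a fixed exponential moving average replaces a learned update."""
    return [decay * s + (1.0 - decay) * f for s, f in zip(state, features)]


def speaker_aware_inputs(
    turns: List[Tuple[str, Vector]], dim: int, decay: float = 0.5
) -> List[Vector]:
    """For each (speaker, features) turn, return the features concatenated
    with that speaker's state as it was before the turn."""
    states: Dict[str, Vector] = {}
    inputs: List[Vector] = []
    for speaker, features in turns:
        prior = states.get(speaker, [0.0] * dim)  # unseen speakers start at zero
        inputs.append(features + prior)           # list concatenation
        states[speaker] = update_state(prior, features, decay)
    return inputs


# Invented 2-dimensional utterance features, for illustration only.
turns = [("A", [1.0, 0.0]), ("B", [0.0, 1.0]), ("A", [0.0, 0.0])]
inputs = speaker_aware_inputs(turns, dim=2)
```

In this sketch the third input carries a trace of speaker A's first utterance even though the third utterance's own features are all zero. This is the kind of self-influence that a latent speaker state is meant to capture.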