In human-to-human conversations, showing empathy, and thus understanding of the other party's situation, is crucial for a natural and personal conversation. Emotional mimicry, i.e., imitating the facial, vocal, or postural expressions of the person we are interacting with, is one of the basic mechanisms contributing to empathy. State-of-the-art speech dialogue systems still lack the ability to show empathy, which limits their naturalness. We therefore developed EmpathicSDS, a prototype for investigating the potential of lexical and acoustic mimicry to improve empathy in conversational interfaces. Our prototype comprises three modes:
- neutral, where the system's response to a user query is static
- lexical mimicry, where the user's wording is picked up and reused in the system's response
- lexical and acoustic mimicry, which combines lexical mimicry with matching the system's voice tonality to the user's emotional state.
Example of lexical mimicry for the scenario "start navigation to work":
User: Could you navigate me to my office?
Static Response: Sure, the route is calculated.
Mimicry Response: Sure, I will navigate you to the office.
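The implementation of lexical mimicry is not detailed above; as a minimal sketch, such echoing could be realized with a template-based approach that detects which destination word the user actually said and reuses it in the confirmation. The function name, synonym map, and templates below are illustrative assumptions, not the prototype's actual code:

```python
import re

# Hypothetical synonym map: surface forms a user might say, mapped to a
# canonical destination slot (illustrative, not from the prototype).
DESTINATION_SYNONYMS = {"office": "work", "work": "work", "workplace": "work"}

def mimicry_response(user_utterance: str) -> str:
    """Echo the user's own wording instead of a canned confirmation."""
    for word in re.findall(r"[a-z]+", user_utterance.lower()):
        if word in DESTINATION_SYNONYMS:
            # Reuse the user's surface form ("office") rather than a
            # static phrase like "the route is calculated".
            return f"Sure, I will navigate you to the {word}."
    # Fall back to the static response if no known wording was found.
    return "Sure, the route is calculated."

print(mimicry_response("Could you navigate me to my office?"))
# -> Sure, I will navigate you to the office.
```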
For acoustic mimicry, our prototype renders more positive or more negative voice tonalities using Google WaveNet voices, available through Google's text-to-speech service, which allows prosody manipulation via SSML markup. Examples can be found below.
[Audio examples: neutral voice, negative voice, positive voice]
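To illustrate how such tonality shifts can be expressed, the sketch below synthesizes the same confirmation with three SSML prosody variants via the Google Cloud Text-to-Speech Python client. The voice name en-US-Wavenet-D and the concrete pitch and rate values are assumptions for demonstration; the prototype's actual SSML parameters are not given above:

```python
from google.cloud import texttospeech

client = texttospeech.TextToSpeechClient()

# Illustrative prosody settings; the prototype's real SSML values are not specified.
TONALITIES = {
    "neutral":  "<speak>Sure, I will navigate you to the office.</speak>",
    "negative": ('<speak><prosody pitch="-2st" rate="90%">'
                 "Sure, I will navigate you to the office.</prosody></speak>"),
    "positive": ('<speak><prosody pitch="+2st" rate="110%">'
                 "Sure, I will navigate you to the office.</prosody></speak>"),
}

voice = texttospeech.VoiceSelectionParams(
    language_code="en-US",
    name="en-US-Wavenet-D",  # any WaveNet voice works; this one is an example
)
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3
)

# Render one audio file per tonality.
for label, ssml in TONALITIES.items():
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(ssml=ssml),
        voice=voice,
        audio_config=audio_config,
    )
    with open(f"{label}_voice.mp3", "wb") as f:
        f.write(response.audio_content)
```

Lowering pitch and slowing the speaking rate is a common heuristic for conveying a more negative tonality, and raising both for a more positive one; the exact offsets would need to be tuned by listening tests.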