Published in Vol 23, No 11 (2021): November

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/30624.
Effects of Emotional Expressiveness of a Female Digital Human on Loneliness, Stress, Perceived Support, and Closeness Across Genders: Randomized Controlled Trial

Original Paper

1Department of Psychological Medicine, The University of Auckland, Auckland, New Zealand

2Soul Machines Ltd, Auckland, New Zealand

3Auckland Bioengineering Institute, The University of Auckland, Auckland, New Zealand

4Department of Psychiatry, Harvard Medical School, Boston, MA, United States

5Benson-Henry Institute of Mind Body Medicine, Massachusetts General Hospital, Boston, MA, United States

Corresponding Author:

Elizabeth Broadbent, PhD

Department of Psychological Medicine

The University of Auckland

Building 507, Level 3

22-30 Park Avenue

Auckland, 1023

New Zealand

Phone: 64 9 923 6756

Email: e.broadbent@auckland.ac.nz


Background: Loneliness is a growing public health problem that has been exacerbated in vulnerable groups during the COVID-19 pandemic. Social support interventions have been shown to reduce loneliness, including when delivered through technology. Digital humans are a new type of computer agent that show promise as supportive peers in health care. For digital humans to be effective and engaging support persons, it is important that they develop closeness with people. Closeness can be increased by emotional expressiveness, particularly in female relationships. However, it is unknown whether emotional expressiveness improves relationships with digital humans and affects physiological responses.

Objective: The aim of this study is to investigate whether emotional expression by a digital human can affect psychological and physiological outcomes and whether the effects are moderated by the user’s gender.

Methods: A community sample of 198 adults (101 women, 95 men, and 2 gender-diverse individuals) was block-randomized by gender to complete a 15-minute self-disclosure conversation with a female digital human in 1 of 6 conditions. In these conditions, the digital human varied in modality richness and emotional expression on the face and in the voice (emotional, neutral, or no face; emotional or neutral voice). Perceived loneliness, closeness, social support, caring perceptions, and stress were measured after each interaction. Heart rate, skin temperature, and electrodermal activity were assessed during each interaction. Three-way factorial analyses of variance with post hoc tests were conducted.

Results: Emotional expression in the voice was associated with greater perceptions of caring and physiological arousal during the interaction, and unexpectedly, with lower feelings of support. User gender moderated the effect of emotional expressiveness on several outcomes. For women, an emotional voice was associated with increased closeness, social support, and caring perceptions, whereas for men, a neutral voice increased these outcomes. For women, interacting with a neutral face was associated with lower loneliness and subjective stress compared with no face. Interacting with no face (ie, a voice-only black screen) resulted in lower loneliness and subjective stress for men, compared with a neutral or emotional face. No significant results were found for heart rate or skin temperature. However, average electrodermal activity was significantly higher for men while interacting with an emotional voice.

Conclusions: Emotional expressiveness in a female digital human has different effects on loneliness, social, and physiological outcomes for men and women. The results inform the design of digital human support persons and have theoretical implications. Further research is needed to evaluate how more pronounced emotional facial expressions in a digital human might affect the results.

Trial Registration: Australia New Zealand Clinical Trials Registry (ANZCTR) ACTRN12621000865819; https://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?id=381816&isReview

J Med Internet Res 2021;23(11):e30624

doi:10.2196/30624




Background

Loneliness is a growing public health issue that has been exacerbated in vulnerable groups during the COVID-19 pandemic [1]. It is a subjective psychological state in which a person perceives a discrepancy between the quality and quantity of their actual and ideal social relations [2]. It has been described as a syndrome-like condition that reliably affects social cognition and behavior in a manner that perpetuates feelings of loneliness [3-5]. Loneliness has been shown to be most prevalent in young adults (aged 18-29 years); older adults; people with a physical or mental health condition; people with low income; people who live alone; and people who are separated, widowed, or divorced [1,6].

Loneliness is a particularly pressing public health problem because it has been associated with a multitude of ill health effects, which places a burden on health care systems [3,7]. Loneliness produces a greater mortality risk than smoking 15 cigarettes every day [8], and it increases the risk of many physical and mental morbidities. This includes a greater risk of coronary heart disease, stroke [9], and psychiatric conditions, including major depressive disorder, generalized anxiety disorder, and suicide [10,11]. Interventions are needed to alleviate the burden of loneliness on population health and health care systems.

Social Support Interventions

Social support has been shown to be a major protective factor against loneliness in adults [6,12] and an effective intervention strategy for loneliness [4]. In a survey of 38,217 adults in the United Kingdom, a high level of social support was associated with 89% lower odds of developing severe loneliness [12]. Moreover, reporting ≥3 close relationships was associated with 42% lower odds of severe loneliness. This suggests that interventions for improving social support and close relationships may help combat loneliness, which has been found in a systematic review of loneliness interventions [4].

Social support is a construct that refers to the structure or function of interpersonal relationships [13]. Structural social support describes aspects of a person’s social network, including the existence of relationships (eg, number of friends) or the types of connections between people (eg, spousal). Functional social support describes the functions of a particular relationship or interaction for a person (eg, emotional, informational, instrumental, or appraisal support). Both structural and functional support have been associated with reductions in loneliness to varying degrees [5]. However, satisfaction with functional support that has been received has been found to have the strongest relationship with loneliness. This suggests that social support interventions focused on improving functional support could have the strongest effect on loneliness.

Psychological interventions have been shown to increase social support [14]. This may be through one of several techniques including teaching skills to help people strengthen their support networks or to better solicit functional support, or the direct provision of functional support to people (eg, emotional support from a support group). Psychological interventions for improving social support have been shown to reduce loneliness [4] and improve health outcomes [14]. Social support may improve health outcomes indirectly through the provision of functional support (eg, transportation to a health appointment) or directly by exerting a stress-buffering effect [15,16]. Social support has been associated with reduced sympathetic nervous system activation [17] and reduced cortisol and increased oxytocin release [18], which may help to buffer against the effects of loneliness-related stress on the body. Chronic stress has been associated with reduced heart rate variability [19], increased inflammation [17,20], and impaired immune response [21], which increases the risk of physical and mental health conditions.

Social support may have stress-buffering and loneliness-reducing effects because humans have evolved to survive in groups. Owing to this evolution, humans experience separation distress and find relief in their social attachments [22]. Social support is an attachment solution that is encouraged in mammalian brain structure and neural processing loops (eg, the protolimbic and paralimbic cortico-striato-thalamocortical loops) to provide solace and alleviate distress. Attachment solutions may be found in other people, animals, religious figures, and transitional objects (eg, a child’s teddy bear) [22]. In-person social support interventions may not always be available (eg, because of physical restrictions during a pandemic, hospitalization, and living rurally). When faced with the absence of human connection, artificial agents (such as robots or computer agents) might provide digital social support to people that may be salutogenic in the context of loneliness.

Robot and Computer Agent Support

Research has increasingly shown that social robots and computer agents may help to reduce loneliness and improve health outcomes for people by providing functional social support. A social robot is a technology with a body in physical hardware form, which is programmed to perceive and act autonomously within its physical environment and is capable of social interaction [23]. For example, Paro is a fluffy baby harp seal robot that includes artificial intelligence for social interactions and a heater for body warmth. Paro may provide emotional support to people through companionship and physical touch akin to a pet. Paro has been shown to reduce loneliness in older adults, and reduce pulse rate and blood pressure [24-26]. Other robots have reduced loneliness by facilitating interactions between people and their support networks over video calls [27,28].

A computer agent refers to a computer-generated entity that may include a dialog system and an embodiment (eg, an animation of a face or body) [29,30]. Computer agents are often capable of social interaction and may include technologies such as embodied conversational agents, chatbots, game characters, and digital humans. There is limited research investigating the effect of support from a computer agent on loneliness [31]. However, 1 study found that companionship from an animal-like conversational agent reduced loneliness in hospitalized older adults [32]. Other studies have found that Tanya, a humanlike computer agent companion for older adults at home, was effective at improving loneliness and was highly acceptable to users [33,34]. Similarly, studies in young adult populations have found that computer agents show promise for improving loneliness [35,36].

Digital humans are a new form of computer agent that show promise for applications in health care, including acting as a supportive peer. Digital humans are computer-based, autonomous animations in the form of a human face or a full body. They use complex neurobehavioral modeling techniques involving virtual neurotransmitters and a visual computing framework described by Sagar et al [37]. These techniques enable emotional intelligence, personality, and complex social interactions. For example, digital humans can show attachment and separation distress toward a user while in high oxytocin mode. High oxytocin mode is a setting that involves more rapid firing of virtual oxytocin in the digital human’s autonomous brain model when a human is detected in front of the computer’s webcam [38]. This setting influences the digital human’s emotional expressions to become increasingly distressed when the user goes out of view and become more positive when the user returns. As digital humans exist on a screen, they may be better suited to providing emotional and informational support to people as opposed to instrumental support (eg, helping with physical tasks).

Digital humans (and computer agents in general) are a new technology, and research is needed to understand how to design them in a way that is conducive to providing support to people. Research has shown that the perceived effectiveness of social support interventions may be influenced by how socially close people feel to the support partner [39]. This suggests that it may be important to design digital humans to engage in behaviors that build closeness with people while providing support. However, there is limited research on which behaviors increase closeness with computer agents [40].

Improving Relationships With Computer Agents

Psychological research has suggested that closeness can be developed in human relationships using several techniques. These include engaging in reciprocal self-disclosure, undertaking shared activities or interests, expressing the value of the relationship, and showing empathy, among others [41,42]. However, there may be gender differences in the techniques that are most important for building closeness [41,43,44]. Women may place greater importance on emotional self-disclosure in close relationships, whereas men may find shared ideas and hobbies to be more important to closeness [41,44]. It is unclear whether behavioral techniques that increase closeness in human relationships translate to human and computer agent relationships; however, the computers are social actors (CASA) paradigm would suggest so [45].

CASA posits that people interact with computers that provide social cues in a manner similar to how they would with another person [46]. Findings from several experiments support CASA [45], showing that people make personality judgments [47], apply gender stereotypes [48], engage in mindless social behaviors such as reciprocity [45], and exhibit in-group and out-group behaviors toward computers [49]. More recently, studies have shown that CASA applies to advanced technologies. A voice agent in an autonomous vehicle was perceived more positively when its communication style conformed to gender stereotypes (eg, a sociable female voice vs an informative male voice) [50]. Another study found that the similarity-attraction principle applied to a voice agent in a smart-home environment [51]. Extroverted users were found to prefer a talkative voice agent, whereas introverted users preferred an agent with multiple voices because it felt like talking to several less-talkative agents. According to CASA, computer agents that display closeness-building behaviors should form closer relationships with people than agents that do not.

Emotional Expressiveness

Emotional expressiveness is a technique that can increase closeness in human relationships, particularly in female relationships [41,44]. Emotional expressiveness refers to expressing an intimate degree of negative or positive affect (eg, through facial expression, speech, and gaze), usually while engaging in behaviors such as talking at a personal level about fears or personal problems, or while demonstrating care [44]. Emotional expressiveness may help build emotional intimacy, a type of psychological intimacy that is important in close relationships [52]. It may also be a way of providing nurturance, which is one of the 3 types of social attachment–building behaviors in mammals, according to the Mammalian Behavioural Triad theory [53]. As digital humans have a detailed animated face modeled on human musculature and a voice, they are capable of engaging in multimodal emotional expression. This involves delivering congruent emotional cues through different communication modalities, such as facial expressions and speech tone.

Although promising, research has yet to investigate whether emotional expressiveness in a digital human increases feelings of closeness toward it. Related research on social robots has found gender differences in the effect of emotional expressiveness on closeness. Women have been shown to feel closer to a social robot that expresses emotions on the face during interactions (alongside other relational behaviors, eg, self-disclosure, mirrored posture, and speaking rate), whereas men reported feeling closer to robots that expressed fewer emotions and relational behaviors [54]. It is possible that similar effects could be found with computer agents; however, experimental research is needed to investigate this.

Prior research on computer agents has shown that emotional expressiveness can increase social perceptions related to closeness. A computer agent that expressed empathy on the face and in the voice was rated as significantly more supportive, caring, and likable compared with an agent that displayed more self-focused emotion (eg, joy at its own wins in a game) [55]. Similarly, another computer agent that displayed a range of emotions on the face and in the voice was perceived as significantly more warm, caring, cooperative, and trustworthy than an agent with no emotion [56].

These studies show that emotional expressiveness in a computer agent may increase perceptions related to closeness; however, no studies have looked at the effects on closeness directly. In addition, from the existing research, it is unclear whether emotions expressed through the face or voice are more important for closeness building and supportive interactions with a computer agent. Some studies have shown that the presence of a face may help to increase attentional engagement with a computer agent [57], which could improve how well users attend to emotional cues. However, it remains unclear whether the presence of a computer agent face can affect the development of social closeness. Several instruments have been developed to quantify closeness, including the Inclusion of Other in the Self scale [58], the Perceived Interpersonal Closeness Scale [59], and the Relationship Closeness Induction Task (RCIT) measure (for experimentally induced feelings of closeness) [60].

This Study

Research has yet to investigate the effects of emotional expressiveness in a digital human’s face and voice on user outcomes, and how emotional expressiveness interacts with user gender during a supportive interaction. The aim of this study is to investigate the effect of emotional expressiveness in a female digital human on loneliness, closeness, caring perceptions, social support, stress, and physiological arousal in a community sample. It was hypothesized that there would be gender differences in the effect of emotional expressiveness in a female digital human on outcomes. Particularly, it was anticipated that women would report the greatest reductions in loneliness, stress, and physiological arousal and the greatest increases in social support, closeness, and caring perceptions in response to a female digital human with an emotional face and an emotional voice. In contrast, it was predicted that men would report better outcomes in response to a female digital human with a neutral face and a neutral voice.


Study Design

A between-group experimental study was conducted to investigate the effect of multimodal emotional expression and user gender on loneliness after a self-disclosure interaction with a female digital human. Secondary outcomes included social closeness, caring perceptions, social support, stress, and physiological arousal. Participants were block-randomized by gender to one of 6 conditions in which the digital human’s design differed by face type (emotional, neutral, or no face) and voice type (emotional or neutral; described in the Digital Human section).

The study procedures were approved by the University of Auckland Human Participants Ethics Committee on November 1, 2018 (reference number: 022191). Retrospective registration was provided by the Australia New Zealand Clinical Trials Registry (registration number: ACTRN12621000865819). Retrospective registration was sought as the study was conceived of as a human-computer interaction experiment (to understand the effect of a digital human’s design) as opposed to a clinical trial. However, later registration was sought, given the number of psychological and physiological outcomes collected as part of the experiment. This study is reported in keeping with the CONSORT (Consolidated Standards of Reporting Trials) 2010 guidelines [61].

Participant Recruitment

A community sample of 198 adults (101 women, 95 men, and 2 gender-diverse individuals) was recruited to participate in a study on relationships with a digital human. Participants were considered eligible if they were aged ≥18 years and were fluent in English. The study was advertised using flyers distributed in the University of Auckland campuses and through Facebook advertising to the Auckland city area. Compensation of a NZ $20 (US $14.33) shopping voucher was offered for participation. Participants were recruited between February 20, 2019, and July 24, 2019.

A power analysis was conducted using G*Power software to determine the required sample size. This was informed by the results of Brave et al [55], who compared the effects of agent empathy versus no empathy on feelings of support and found an effect size of f=0.32. A total of 198 participants would be needed to detect this effect size with 80% power and an α level of .05 for a 3 (face) by 2 (voice) by 2 (gender) analysis of variance (ANOVA). A recruitment target was set for 100 women and 100 men.
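For readers who want to reproduce this type of calculation outside G*Power, a comparable computation can be sketched in Python with statsmodels. The one-way approximation over 12 cells (3 face × 2 voice × 2 gender) used below is an assumption; the figure it returns depends on the exact design specification entered into G*Power, so it may not match 198 exactly.

    # Approximate sample size calculation (illustrative only; the study used G*Power).
    from statsmodels.stats.power import FTestAnovaPower

    n_total = FTestAnovaPower().solve_power(
        effect_size=0.32,  # Cohen's f reported by Brave et al
        alpha=0.05,
        power=0.80,
        k_groups=12,       # assumed: 3 (face) x 2 (voice) x 2 (gender) cells
    )
    print(round(n_total))  # total sample size under this one-way approximation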

Data from 1.5% (3/198) of participants were excluded from analyses for the following reasons: the software would not start (1/198, 0.5%) or the participant was identified as gender diverse, and this group was not of sufficient size for adequate statistical power in gender analyses (2/198, 1%). As a result, data from 195 participants were included in the analyses. A CONSORT diagram depicts participant flow through the study (Figure 1).

Figure 1. CONSORT (Consolidated Standards of Reporting Trials) flow diagram (emo: emotional; neu: neutral).

Measures

Figure 2 depicts the time points at which the measures were administered and data were collected in the study.

Figure 2. Time points for administration of measures and data collection. RCIT: Relationship Closeness Induction Task; VAS: visual analog scale.
Demographics

Demographic variables including age, gender, ethnicity, marital status, and occupation were assessed by self-report at baseline. Gender was assessed using a single-response, multiple-choice question with response options (male, female, gender-diverse, and prefer not to divulge). Other demographic variables were evaluated using an open-ended self-report item (age) and multiple-choice questions with response options (ethnicity, marital status, and occupation).

Perceived Emotional Expressiveness Manipulation Check

A manipulation check evaluated the extent to which the digital human was perceived as emotionally expressive using a visual analogue scale. Participants marked their answers by placing an X at the appropriate position along a 100-mm line with response anchors at either end (0=not expressive; 100=very expressive). A score was derived by measuring the distance of the X along the line from the left-hand side. A higher score indicated that the digital human was perceived as more emotionally expressive. Perceived emotional expressiveness was measured immediately after the digital human interaction.

Loneliness

General loneliness was measured at baseline using the 20-item Revised UCLA Loneliness Scale [62]. Participants were asked to rate how often they felt the way described in general across a range of statements using a 4-point Likert scale (1=never; 4=often). The Revised UCLA Loneliness Scale has been shown to have good internal consistency reliability (Cronbach α=.94), discriminant validity with mood measures, and concurrent validity with measures of social isolation (eg, feelings of isolation, time spent alone per day, and number of close friends) [62]. Responses were summed to derive a total loneliness score. A higher score indicated greater loneliness.

State loneliness was evaluated using a visual analogue scale with response anchors placed at each end of the scale (0=not at all; 100=extremely). Participants marked how lonely they currently felt by placing an X at the appropriate place along a 100-mm line. Scores were derived by measuring the distance of the X along the line. A higher score indicated that the participant experienced a greater degree of loneliness. A visual analogue scale was chosen as it would be more sensitive to changes in feelings of loneliness within a 1-hour experiment session compared with a general loneliness measure.

Social Closeness

Social closeness with the digital human was assessed using the RCIT follow-up measure [60]. This 4-item scale evaluates the degree of perceived relationship closeness, similarity, liking, and likelihood of a future friendship using a 9-point scale. Responses range from 0 (not at all) to 9 (very). Scores were derived by measuring the distance of an X placed along a 100 mm line and transforming this value to a score from 0 to 9. Scores were summed for the 4 items. A higher score indicated stronger social closeness to the conversation partner. The scale demonstrated good internal consistency reliability in this study (Cronbach α=.86).
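As an illustration of the scoring procedure described above, the minimal sketch below rescales the four measured 100-mm distances to the 0 to 9 range and sums them; the example distances are hypothetical.

    # Minimal sketch of RCIT closeness scoring (example values are hypothetical).
    def rcit_score(distances_mm):
        """Rescale four 0-100 mm marks to the 0-9 range and sum them (higher = closer)."""
        assert len(distances_mm) == 4
        return sum(d / 100 * 9 for d in distances_mm)

    print(rcit_score([80, 55, 70, 62]))  # roughly 24 out of a possible 36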

Perceived Social Support

Perceived social support was measured using a visual analogue scale with response anchors (0=not at all; 100=extremely) [63]. Participants rated their extent of agreement with the statement “I feel supported” by placing an X at the appropriate spot along a 100-mm line. The distance of the X along the line was measured, and a score was derived. Higher scores indicated feeling more supported. A visual analogue scale was chosen as it may be more sensitive to changes in perceived support after a short experiment session than a general measure. Although lower in internal reliability compared with a multidimensional scale, the 1-item visual analogue scale helped to reduce participant burden and was thought to be suitable for assessing feelings of support after a brief interaction.

Caring Perceptions

Caring perceptions are judgments about an agent’s traits (whereas feeling supported pertains to the effect of the interaction on the user). Caring perceptions were assessed using the Caring Perceptions Scale by Brave et al [55]. This is a 5-item scale that evaluates the extent to which a computer agent is perceived as caring. Items are rated on a 10-point scale with semantic differential response anchors (eg, not compassionate to compassionate). Items in the scale evaluated how warm, compassionate, selfish, friendly, and cooperative the digital human was. This scale has been used in prior research looking at relationships with embodied conversational agents and has shown good internal consistency reliability in university student samples (Cronbach α=.88) [55]. Items were summed, and a higher score indicated that the participant perceived the digital human as more caring.

Perceived Stress

Current stress was measured using a 100-mm visual analogue scale with response anchors (1=not at all; 100=extremely) [64,65]. Participants rated their extent of agreement with the phrase “I feel stressed” by placing an X at the appropriate point along the line. The distance from the left edge of the line was measured, and a score was derived. A higher score indicated greater perceived stress. This visual analogue scale was chosen because it was shown to be sensitive to changes in perceived stress after a brief experimental session in previous research [64,65].

Physiological Stress Response

Heart rate, electrodermal activity, and skin temperature are measures of physiological arousal that can indicate stress [66,67]. Heart rate, electrodermal activity, and skin temperature were measured using a wrist-worn sensor device called Empatica E4 (Empatica Inc). Data were continuously collected during the interaction and subsequently processed using a Python script to create average heart rate (converted from blood volume pulse data), electrodermal activity, and skin temperature scores per participant. Higher average scores in heart rate and electrodermal activity and lower scores in wrist skin temperature indicated greater physiological arousal during the interaction with the digital human. Data were analyzed from the beginning to the end of the participant’s interaction with the digital human. Data from only 170 participants were included in the analyses for physiological stress, as 25 data files were lost to processing errors.
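The processing script itself is not reproduced in the paper; the sketch below shows one way the per-participant averages could be derived, assuming the standard Empatica E4 CSV export layout (first row is the start timestamp, second row is the sampling rate, remaining rows are samples) and a simple peak count to estimate heart rate from the blood volume pulse signal. The study's actual script may have differed.

    # Illustrative processing of E4 recordings into per-participant averages.
    import numpy as np
    from scipy.signal import find_peaks

    def load_channel(path):
        raw = np.loadtxt(path, delimiter=",")
        sample_rate = float(raw[1])      # row 2 of the E4 export is the sampling rate
        return raw[2:], sample_rate

    def participant_averages(folder):
        eda, _ = load_channel(f"{folder}/EDA.csv")    # microsiemens, 4 Hz
        temp, _ = load_channel(f"{folder}/TEMP.csv")  # degrees Celsius, 4 Hz
        bvp, fs = load_channel(f"{folder}/BVP.csv")   # blood volume pulse, 64 Hz
        peaks, _ = find_peaks(bvp, distance=fs * 0.4)    # at least 0.4 s between beats
        heart_rate = len(peaks) / (len(bvp) / fs) * 60   # beats per minute
        return {"eda": eda.mean(), "temp": temp.mean(), "hr": heart_rate}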

Procedure

Participants attended a 45-minute experimental session at the University of Auckland Clinical Research Centre. After providing written informed consent, participants secured an Empatica E4 sensor to their wrist and completed a baseline questionnaire on paper. The baseline questionnaire assessed demographic and psychological variables, including loneliness, stress, and social support. Participants were then block-randomized by gender to interact with 1 of 6 versions of a digital human. Block randomization was completed before the session by a member of the research team, who generated a randomization table using Research Randomizer software. Allocations were concealed from the researcher and the participant in opaque envelopes. The researcher remained blinded to the participant’s condition until the envelope was opened immediately before starting the appropriate computer program for the participant. Although the participants were deblinded to their condition upon starting their interaction, they remained unaware of what digital humans in the other experimental conditions were like. The 6 digital humans varied in terms of modality richness and emotional expression (emotional, neutral, or no face; emotional or neutral voice).
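A minimal sketch of gender-blocked allocation over the 6 conditions is shown below. It is illustrative only: the study generated its randomization table with Research Randomizer, and the condition labels here are shorthand.

    # Illustrative blocked randomization by gender (the study used Research Randomizer).
    import random

    CONDITIONS = [
        "emotional face / emotional voice", "emotional face / neutral voice",
        "neutral face / emotional voice",   "neutral face / neutral voice",
        "no face / emotional voice",        "no face / neutral voice",
    ]

    def blocked_schedule(n, seed):
        """Build an allocation list from complete, shuffled blocks of all 6 conditions."""
        rng = random.Random(seed)
        schedule = []
        while len(schedule) < n:
            block = CONDITIONS.copy()
            rng.shuffle(block)           # each block contains every condition once
            schedule.extend(block)
        return schedule[:n]

    women_schedule = blocked_schedule(100, seed=1)  # one concealed list per gender stratum
    men_schedule = blocked_schedule(100, seed=2)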

The researcher provided the participant with verbal and written instructions for interacting with the digital human, started the digital human program on the laptop computer, then closed the door, and exited the room for the duration of the interaction. Participants were instructed to call on the researcher (who was sitting on a chair down the hallway) by ringing a loud desk bell if they encountered any technical difficulties that they could not resolve.

Participants completed the RCIT with a digital human. The RCIT is a 15-minute structured conversation involving reciprocal self-disclosure of personal information, feelings, and memories over a list of 28 personal questions [60]. Questions gradually increase in intimacy and cover topics from “What is your name?” to “Describe the last time you felt lonely.” The RCIT has been shown to reliably induce a sense of closeness among human-stranger dyads in experimental psychology research [60], and it has been shown to promote health benefits found in naturally occurring close relationships, including faster wound healing speed and reduced stress [64]. The RCIT is divided into 3 sections with a time limit for each (2, 5, and 8 minutes). To maintain a private interaction between the participant and the digital human, the participant was required to ring a loud desk bell each time they reached the end of a section. The researcher sat in the hallway running a digital timer that was cleared at the beginning of each section. If a participant went over the time limit for a given section, the researcher knocked on the door and the participant was instructed to move on to the next section. Participants’ language from their interaction with the digital human was recorded and analyzed for emotional content in another paper [68]. Once the RCIT was completed, the participant rang the desk bell, and the researcher returned to the room. The participant removed the wrist sensor and completed a second paper questionnaire on perceived loneliness, closeness, caring perceptions, social support, and stress. Participants also answered a series of open-ended, written qualitative questions on their perceptions of the digital human (reported in another paper; Loveys, unpublished data, March 2021). Once the questionnaire was completed, the participant was provided with a NZ $20 (US $14.33) shopping voucher.

Digital Human

Overview

The digital human was presented on a laptop computer screen in a portrait view with an embodiment of the head and shoulders depicted in front of a black background. The digital human was named Holly, and she was modeled on a young adult female of mixed ethnicity (New Zealand Māori, New Zealand European, Indian, and European). In the no-face conditions, the digital human appeared as a black screen only. Figure 3 illustrates how the digital human appeared in the no-face, neutral face, and emotional face conditions. The digital human functioned autonomously using a finite-state conversation engine and a neurobehavioral modeling and visual computing framework, as described by Sagar et al [37].

Figure 3. The digital human across the 3 visual conditions (1: no face; 2: neutral face; 3: emotional face). Each visual condition could be paired with either an emotional or neutral voice.

Before programming the digital human’s emotional expressions, a member of the research team (KL) recruited 4 psychology master’s students to complete the RCIT in 2 dyads (1 female-female; 1 male-male). Each dyad completed the task separately in a private room with the researcher present. The researcher recorded notes on the types of emotions expressed by the dyads during each of the 28 questions in the conversation task. Notes were also made regarding the types of responses people gave to different questions. This information was used to inform the answers the digital human gave to the RCIT questions and the types of emotions it expressed on the face and in the voice for each question. This approach was taken to improve how natural the digital human’s responses were. The digital human’s language was consistent between participants, as were the emotional expressions on the face and in the voice for individuals in those respective conditions. This design choice was made to maintain experimental control against the potential confounds of different conversation content or amounts of emotional expression, which could have affected the outcomes.

The digital human’s script underwent an iterative development process involving the research team drafting responses to the RCIT questions, programming responses into the digital human as part of a finite-state conversation engine, and then testing the conversation with other members of the research team. The digital human’s responses were refined over several iterations to make them seem natural and convincing. The digital human interacted autonomously with participants.

Emotional Face

The emotional face condition included a mixture of positive and negative expressions, including compassion, joy, fear, and sadness. Compassion and joy were expressed most frequently (113/213, 53% and 82/213, 38.5% of the interactions, respectively). The emotional expressions were subtle in order to seem more natural. The emotional expressions on the face were preprogrammed and triggered based on the phrases that the digital human spoke using a text-to-speech Emotional Markup Language. Programming the Emotional Markup Language required manually classifying each sentence the digital human spoke into compassion, joy, fear, or sadness categories in the digital human program. The result was that each time the digital human spoke a phrase, she displayed the associated emotional expression on her face. Emotional expressions on the face were autonomously generated in real time as the digital human spoke to the participant using neurobehavioral modeling and visual computing techniques. These techniques are described in detail by Sagar et al [37,38]. While listening, the digital human did not express emotions on the face, although she engaged in humanlike facial movements, such as blinking, as described in the Other Behaviors section.
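The markup language itself is specific to the digital human platform and is not documented here; the hypothetical sketch below only illustrates the manual step of assigning one of the four emotion categories to each scripted sentence. The example sentences and data structure are invented, not the actual Soul Machines markup.

    # Hypothetical per-sentence emotion tagging (sentences and structure are illustrative).
    ALLOWED_EMOTIONS = {"compassion", "joy", "fear", "sadness"}

    script_annotations = [
        ("It is lovely to meet you.", "joy"),
        ("That sounds like it was really difficult.", "compassion"),
        # ... one (sentence, emotion) pair for each phrase in the 213-statement script
    ]

    def check_annotations(annotations):
        """Ensure every scripted phrase carries exactly one allowed emotion tag."""
        for sentence, tag in annotations:
            assert tag in ALLOWED_EMOTIONS, f"unknown emotion tag for: {sentence!r}"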

The neutral face condition contained the same digital human face as the emotional face condition; however, no facial expressions were made during the interaction. The only movements the neutral face made were of the mouth to accommodate speech and the natural behaviors described in the Other Behaviors section (eg, blinking and head movements with speech). In the no-face condition, participants interacted with a plain black screen that contained no digital human animation and only a voice.

The digital human was capable of mirroring people’s facial expressions; however, this capability was turned off for the experiment. This was to ensure that the digital human did not provide emotional facial expressions in the neutral face condition and that participants in the emotional face condition received the same amount of emotional expression from the digital human. However, the tradeoff was that the digital human could not return a person’s smile, for example.

Emotional Voice

Participants interacted with a digital human that had either an emotional or neutral voice. The digital human’s voice was recorded by a young adult female voiceover artist with a local New Zealand accent. Emotions expressed in the emotional voice condition were compassion (ie, low arousal positive affect; 113/213 statements, 53% of the conversation), joy (ie, high arousal positive affect; 82/213 statements, 38.5% of the conversation), and sadness or anxiety (ie, low arousal negative affect; 18/213 statements, 8.5% of the conversation). The neutral voice had an absence of positive and negative emotions; however, it maintained normal speech intonation for a New Zealand accent. Each sentence in the digital human’s script was recorded twice: once in a neutral voice and once in an emotional voice. For the emotional voice, the voiceover artist read from a script with annotations to guide when certain emotional expressions should take place. These emotional expressions were tied to phrases to correspond with when emotions were portrayed in the digital human’s face. Recording the voice clips was an iterative process where a member of the research team (KL) and an audio engineer provided subjective feedback to the vocal artist at the time of recording, and the vocal artist rerecorded clips where necessary. Clips were rerecorded when the emotional expression was either too strong or not strong enough or where there appeared to be emotion in a neutral voice clip.

To ensure that the emotional and neutral voices were sufficiently different, an emotion classifier was used to objectively analyze the vocal clips. The classifier operated with 79.9% accuracy and classified vocal clips based on energy and pitch contours, which were modeled using Gaussian mixture models. Analyses revealed that the compassionate and joyful emotional voice clips were significantly different from the neutral voice clips. However, there were insufficient sadness and anxiety voice clips to reach significance. The voice recordings were generated in a voice package that was connected to a finite-state conversation engine.
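The classifier implementation is not described in full in the paper; the sketch below shows one conventional way to build a GMM classifier over energy and pitch contours. The use of librosa and scikit-learn and the specific feature choices are assumptions for illustration, not the actual classifier used in the study.

    # Sketch of a GMM emotion classifier over energy and pitch contours (illustrative).
    import numpy as np
    import librosa
    from sklearn.mixture import GaussianMixture

    def contour_features(wav_path):
        y, sr = librosa.load(wav_path, sr=None)
        energy = librosa.feature.rms(y=y)[0]                  # frame-level energy contour
        f0, _, _ = librosa.pyin(y, fmin=65, fmax=400, sr=sr)  # frame-level pitch contour
        f0 = np.nan_to_num(f0)                                # unvoiced frames -> 0
        n = min(len(energy), len(f0))
        return np.column_stack([energy[:n], f0[:n]])

    def train_models(clips_by_emotion, n_components=4):
        """Fit one Gaussian mixture per emotion on pooled frame-level features."""
        return {
            emotion: GaussianMixture(n_components=n_components).fit(
                np.vstack([contour_features(p) for p in paths])
            )
            for emotion, paths in clips_by_emotion.items()
        }

    def classify(models, wav_path):
        frames = contour_features(wav_path)
        scores = {e: m.score(frames) for e, m in models.items()}  # mean log-likelihood
        return max(scores, key=scores.get)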

Other Behaviors

The digital human engaged in several other behaviors to appear more humanlike during the interaction. She maintained eye contact with people for the majority of the interaction using computer vision technology. Before speaking, she would often look at the top-right of the screen as if thinking. She also engaged in humanlike movements, such as blinking, moving her head, raising her eyebrows with speech, and lightly swaying her body while listening or speaking. If she did not understand the participant, she would say, “I didn’t understand, you can try rephrasing.” The digital human did not perform any other behaviors during the interaction.

Data Analysis

Data were analyzed using SPSS software, version 27 (IBM Corporation). Data were checked for violations of test assumptions, and bootstrapping was applied to tests where data were not normally distributed. Chi-square tests and ANOVA tests were conducted to check for baseline group differences in demographic and psychological variables. As no significant differences were found in the baseline variables, they were not controlled for in subsequent analyses. A series of 3-way factorial ANOVA tests were conducted to evaluate the effect of face type, voice type, and gender on outcomes at time point 2 (T2). Post hoc tests with Sidak correction were applied as follow-up analyses.
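For analysts working outside SPSS, an equivalent 3-way factorial ANOVA can be sketched in Python with statsmodels, as below. The file and column names are hypothetical; sum-to-zero contrasts are specified so that the Type III sums of squares are comparable to the SPSS defaults.

    # Equivalent 3-way factorial ANOVA in Python (the study used SPSS, version 27).
    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    df = pd.read_csv("outcomes_t2.csv")  # hypothetical columns: face, voice, gender, loneliness

    model = smf.ols(
        "loneliness ~ C(face, Sum) * C(voice, Sum) * C(gender, Sum)", data=df
    ).fit()
    print(anova_lm(model, typ=3))        # main effects plus 2- and 3-way interactions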


Participants

Participants were predominantly young adults (mean age 28.31 years, SD 10.97), single (96/195, 49.2%), and university students (132/195, 67.7%). There was an approximately equal number of men (94/195, 48.2%) and women (101/195, 51.8%). Participants represented a mix of ethnicities (80/195, 41.0% Asian; 67/195, 34.4% New Zealand European; 38/195, 19.5% other; 6/195, 3.1% Māori; and 4/195, 2.1% Pacific Peoples). Participants reported a moderate degree of loneliness at baseline (mean 36.97, SD 10.04). There were no significant differences in baseline demographic or psychological variables between the experimental groups, and there were no significant differences in psychological variables between men and women at baseline.

Manipulation Check

A manipulation check revealed that participants perceived the emotional face and voice conditions as more emotionally expressive. There were significant main effects of face type (F2,183=6.97; P=.001) and voice type (F1,183=3.94; P=.049) on perceived emotional expressiveness of the digital human. An emotional face was rated as significantly more emotionally expressive than a neutral face (mean 53.02, SD 27.33, vs mean 40.14, SD 23.06; P=.01). No face was rated as more emotionally expressive than a neutral face (mean 55.17, SD 24.90; P=.002). In terms of voice type, emotional voice was rated as significantly more emotionally expressive than neutral voice (mean 52.79, SD 24.73, vs mean 45.91, SD 26.71; P=.049). There were no other significant main, 2-way, or 3-way interaction effects (all F values <1.60).

Errors with the conversation engine occasionally caused the digital human to interrupt participants during their speech. However, interruptions were equally likely across the experimental conditions (F5,179=1.11; P=.36).

Perceived Loneliness (T2)

Analyses revealed gender differences in the effect of a digital human’s face type on perceived loneliness. There was a significant 2-way interaction effect between face type and user gender (F2,182=4.95; P=.008; Figure 4). A digital human with no face was associated with significantly less loneliness in men than in women (mean 15.76, SD 19.07, vs mean 26.71, SD 24.86; P=.047). A neutral face digital human was associated with significantly less loneliness in women than in men (mean 13.55, SD 17.11, vs mean 26.88, SD 26.64; P=.01). A neutral face digital human was also associated with significantly less loneliness in women than a no-face digital human (P=.04). There were no other significant main, 2-way, or 3-way interaction effects (all F values <1.68).

Figure 4. A significant two-way interaction effect of face type and gender on perceived loneliness after the digital human conversation (*P<.05). Bars indicate the SE.

Social Closeness

An emotional voice increased feelings of closeness toward a digital human but only in women. There was a significant 2-way interaction effect between voice type and gender (F1,181=4.17; P=.04; Figure 5). Women reported significantly greater closeness with an emotional voice digital human than men did (mean 19.18, SD 7.21, vs mean 15.68, SD 7.53; P=.02). There was a trend toward women reporting greater closeness with an emotional voice digital human compared with a neutral voice digital human (mean 16.45, SD 7.06; P=.07). There were no other significant main, 2-way, or 3-way interaction effects (all F values <1.06).

Figure 5. A significant two-way interaction effect of voice type and gender on social closeness with a digital human after the conversation (*P<.05). Bars indicate the SE.

Perceived Support (T2)

A neutral voice was associated with significantly greater feelings of support across men and women. Analyses revealed a significant main effect of voice type on perceived support (F1,182=5.77; P=.02). A neutral voice digital human was associated with significantly higher ratings of perceived support than an emotional voice digital human (mean 65.77, SD 19.09, vs mean 59.23, SD 19.01; P=.02; Figure 6).

Figure 6. A significant main effect of digital human voice type on perceived support after the digital human conversation (*P<.05 and **P<.01). Bars indicate the SE.

Analyses also revealed that there may be gender differences in how supported people felt in response to emotional and neutral voices in digital humans. There was a trend toward a significant 2-way interaction effect between voice type and gender (F1,182=3.66; P=.06). Men felt significantly more supported with a neutral voice digital human than with one with an emotional voice (mean 68.03, SD 18.49, vs mean 56.29, SD 18.38; P=.002), whereas women felt more supported with an emotional voice digital human than men did (mean 62.18, SD 19.25, vs mean 56.29, SD 18.38), although this difference was only a trend toward significance (P=.10). There were no other significant main, 2-way, or 3-way interaction effects (all F values <2.33).

Caring Perceptions

An emotional voice was found to increase caring perceptions in both men and women. Analyses revealed a significant main effect of voice type on caring perceptions (F1,183=4.26; P=.04). An emotional voice digital human was rated as significantly more caring than a neutral voice digital human (mean 29.45, SD 6.56, vs mean 27.42, SD 6.94; P=.04).

There was a trend toward a significant 2-way interaction effect between voice type and gender (F1,183=3.31; P=.07; Figure 7). Women rated the emotional voice digital human as significantly more caring than men did (mean 30.88, SD 5.82, vs mean 27.96, SD 6.99; P=.03). Women also rated the emotional voice digital human as significantly more caring than a neutral voice digital human (mean 27.14, SD 6.83; P=.006). There were no other significant main, 2-way, or 3-way interaction effects (all F values <1.55).

Figure 7. A significant main effect of voice type on caring perceptions, with a trend for an interaction with gender (*P<.05 and **P<.01). Bars indicate the SE.

Perceived Stress (T2)

Gender was found to impact the effect of a digital human’s face type on perceived stress. A significant 2-way interaction effect was found between face type and gender (F2,182=6.01, P=.003; Figure 8). A no-face digital human was associated with significantly lower perceived stress in men than a neutral face digital human (mean 17.55, SD 15.87, vs mean 34.36, SD 22.53; P=.01) or an emotional face digital human (mean 33.13, SD 19.09; P=.02), whereas women reported significantly greater stress with a no-face digital human than men did (mean 35.00, SD 23.90; P=.003). Women also reported lower stress with a neutral face digital human than men did, although this was only a trend toward significance (mean 24.09, SD 23.67; P=.07). There were no other significant main, 2-way, or 3-way interaction effects (all F values <2.68).

Figure 8. A significant two-way interaction effect of face type and gender on perceived stress after the digital human interaction (*P<.05 and **P<.01). Bars indicate the SE.

Physiological Outcomes

There were no significant main, 2-way, or 3-way interaction effects of face type, voice type, or gender on average skin temperature (all F values <0.89) or heart rate (all F values <0.98) over the conversation. However, there were significant main effects of voice type (F1,158=4.72; P=.03) and gender (F1,158=6.54; P=.01) on average electrodermal activity. An emotional voice was associated with significantly greater electrodermal activity over the conversation than a neutral voice (mean 3.07, SD 5.38, vs mean 1.71, SD 2.13; P=.03; Figure 9). In addition, men experienced higher electrodermal activity during the interaction than women (mean 3.19, SD 5.69, vs mean 1.59, SD 2.04; P=.01). There were no other significant main, 2-way, or 3-way interaction effects (all F values <2.38).

Figure 9. Significant main effects of voice type and gender on electrodermal activity during the digital human interaction (*P<.05). Bars indicate the SE.

Principal Findings

This is the first study to investigate whether emotional expressiveness in a female digital human and user gender interact to affect loneliness, social, and physiological outcomes following a self-disclosure conversation.

Several main effects were found for the impact of emotional expressiveness in a female digital human’s voice on physiological arousal and social outcomes. Perceived social support was significantly higher for participants after interacting with a neutral voice digital human than after interacting with an emotional voice digital human. However, an emotional voice digital human was perceived as more caring overall than one with a neutral voice. Average electrodermal activity was higher during interactions with an emotional voice digital human than with a neutral voice one. These findings suggest that emotional expression in a digital human’s voice can increase physiological arousal and perceptions of caring yet reduce feelings of being supported across both genders.

Gender did impact the effect of emotional expressiveness on loneliness, physiological arousal, and social outcomes. Women benefited more from a female digital human with emotional expressions in the voice and with a face. For women, emotional expression in the voice increased closeness with the digital human, caring perceptions, and perceived social support. In addition, for women, the presence of a neutral face was associated with lower loneliness and subjective stress than a digital human with no face.

In contrast, men had better outcomes with a female digital human without emotional expression in the voice and with no face (ie, a black screen). For men, a neutral voice digital human was associated with increased closeness to the digital human, caring perceptions, and perceived social support. A digital human without a face (ie, a black screen) was associated with less loneliness and lower subjective stress in men.

The potential reasons for these effects and how these findings relate to other research are discussed in subsequent sections.

Main Effects

Emotional expression in the voice increased physiological arousal and perceptions of care yet reduced perceptions of support. People may have felt less supported because the emotions were tied to the digital human’s own words, as opposed to varying in response to the emotions expressed by the user. It is likely that the users expected the digital human to respond emotionally to their personal stories. Indeed, prior research has found that a computer agent that consistently delivered other-focused emotions (eg, compassion) in response to users’ behavior in a game was perceived as more caring and was associated with greater feelings of support than a computer agent that delivered self-focused emotion (eg, joy at its own success in the game) [55].

The emotional voice was associated with greater physiological arousal than a neutral voice. High electrodermal activity could indicate either positively or negatively valenced arousal (eg, excitement or fear) [69]. This result is consistent with previous research showing that electrodermal activity increases when listening to emotionally valenced sounds, both pleasant and unpleasant, although the greatest increases occur in response to pleasant sounds [69].

Men were found to have greater electrodermal activity than women during the interaction with the digital human. This may be explained by prior research showing that men have different brain responses to expressed emotions than women, along with different responses to viewing a female face [70,71]. For example, men and women have been shown to differentially activate brain regions in response to observing contempt and disgust facial expressions (eg, the medial frontal gyrus) [69]. This differential activation of brain regions may have downstream physical effects on the body, influencing outcomes such as electrodermal activity.

Gender Differences

The finding that women felt closer to a digital human with an emotional voice but men felt closer to a digital human with a neutral voice is in keeping with prior literature. Emotional expressiveness is an important part of close female relationships [41,43,44], but it is a less important part of close male relationships. For men, shared activities, interests, and ideas are more important aspects of close relationships [41,44].

Similar results have been found in research examining the effect of emotional expressiveness on closeness with other types of artificial agents, including robots. Women felt closer to social robots that expressed emotions during interactions, whereas men felt closer to robots that expressed fewer emotions [54]. In research with computer agents, affective support in language was associated with greater rapport in girls, whereas for boys, an agent with only task-oriented language was associated with higher rapport [72]. The results of this study build upon the literature by showing that emotional expressiveness by computer agents can foster closer relationships for women, especially when emotions are expressed in the voice.

The gendered effects of the emotionality of the voice on perceived closeness may have also had an impact on perceived support. Women felt more supported by the emotional voice and men felt more supported by the neutral voice. This aligns with previous research showing that perceptions of social support are related to perceived closeness toward the support partner in human relationships [39].

This research also found gender effects on the impact of the digital human’s face type on perceived loneliness and stress. Men felt least lonely and stressed after interacting with a no-face digital human. In contrast, women felt the least lonely and stressed with a neutral face digital human. These results were unexpected; we predicted that women would experience the least amount of loneliness and stress with an emotional face and voice digital human. This was based on the existing literature showing lower loneliness and stress after interactions with emotionally expressive computer agents [73,74]. However, the effects of emotional expression on the face and in the voice were not compared in these studies.

This is the first study to investigate the effects of emotional expression in the voice and on the face separately in computer agents. It is possible that the effects of emotional expression are stronger when expressed in the voice than on the face. This is supported by findings that have shown that voice-only communication can increase perceptions of empathic accuracy [42]. However, as the emotional faces were very subtle in their expressions and the neutral face included humanlike movements that could have inadvertently portrayed interest (eg, raising eyebrows with speech), it is possible that the findings for the emotional and neutral face conditions both reflect the presence of a face more than its expressiveness. This study leaves open the question of how the degree of emotional expression matters; different results may have been achieved with more pronounced emotional expressions on the face.

It is unclear why a neutral face was found to be more helpful for loneliness and stress in women than an emotional face, given that a manipulation check revealed that the emotional face was perceived as more emotionally expressive. It is possible that either the emotional expressions were not strong enough or the types of emotions that the digital human expressed in this study influenced outcomes. The digital human expressed a combination of self-focused (eg, joy and sadness) and other-focused (eg, compassion and concern) facial emotions during the interaction. An interaction involving only other-focused emotions could have more accurately mimicked the role of a supportive peer and had stronger effects on loneliness and stress.

Other research has shown that smiling is associated with better perceptions of a health care robot than not smiling [75]. In this study, the emotional face digital human did not smile all the time, and instead had a slight frown when expressing concern and looked sad when expressing its own sad experiences. The presence of one or both of these negative emotions might have negatively affected outcomes.

This study adds to the literature by showing that emotional expressiveness (a behavior that is important for developing close relationships in women but not men) has similar effects in relationships with a computer agent. In addition, the results show that loneliness, stress, and social outcomes that have been associated with close human relationships can also be influenced by conversations with a computer agent. The results provide some supportive evidence for the CASA paradigm [46], which suggests that people engage in the same social behaviors toward computers that display social cues as they do toward other people. It makes sense that behaviors that help to build closeness in human relationships (eg, emotional expressiveness) have similar effects in relationships with a computer agent. However, technological limitations in the digital human’s responsiveness mean that its cues may not have been entirely the same as those in real human conversations, as discussed in the Limitations section.

Design Recommendations

Overall, the results of this study provide several design recommendations for digital humans intended for the role of a supportive peer. First, when designing a supportive peer for female users, incorporating an emotional voice may be important for promoting a closer relationship and feelings of support. For women, including an animated face was also important for reducing loneliness and perceived stress.

In contrast, when designing a supportive peer for men, a neutral voice may help to promote a closer relationship and increase feelings of support. For men, no face (ie, a black screen with a voice) was associated with less loneliness and stress than a female face with some or no emotional expression. However, it is possible that different results may have been found for men had they interacted with a male digital human.
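
To illustrate how these recommendations might be operationalized, the following minimal sketch (in Python) maps a user’s self-reported gender to presentation settings for a supportive-peer digital human. The PresentationConfig type and recommended_presentation function are hypothetical placeholders rather than part of any existing digital human platform, and the defaults reflect only this study’s findings; they would need to be validated with the intended user group.

from dataclasses import dataclass

@dataclass
class PresentationConfig:
    voice_style: str  # "emotional" or "neutral"
    show_face: bool   # render an animated face, or voice only (black screen)

def recommended_presentation(user_gender: str) -> PresentationConfig:
    gender = user_gender.strip().lower()
    if gender == "female":
        # Women: an emotional voice and an animated face were associated with
        # greater closeness and support and with lower loneliness and stress.
        return PresentationConfig(voice_style="emotional", show_face=True)
    if gender == "male":
        # Men: a neutral voice and a voice-only (no face) presentation were
        # associated with better outcomes in this study.
        return PresentationConfig(voice_style="neutral", show_face=False)
    # This study provides no evidence for gender-diverse users; fall back to a
    # conservative default and evaluate with the intended user group.
    return PresentationConfig(voice_style="neutral", show_face=True)

Note that these mappings were derived from interactions with a young female digital human; a male digital human, or users outside the study population, may call for different settings.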

It is important to note that these recommendations were based on results obtained with a predominantly young adult, New Zealand student sample and with a young female digital human. However, previous research has also shown gender differences in the effects of different closeness-building behaviors with other age groups in different geographic locations [41,43,44].

These findings may not generalize to all cultures. People’s preferences regarding different aspects of emotional expressiveness (eg, type and strength) may shift at the intersection of gender and other demographic characteristics, such as culture. Prior research has shown cultural differences in norms for emotional expressiveness, including how emotions are expressed and which types of emotions are the most socially desirable to portray [76,77]. For example, in East Asian cultures, it is more socially desirable to express low arousal positive affect (eg, calm), and negative emotions are less often expressed directly [77]. In contrast, in North American White culture, high arousal positive affect (eg, excitement) is the most socially desirable to express [77].

In this study, 41% (80/195) of participants identified as Asian and 34.4% (67/195) as New Zealand European. Culture may therefore affect how a digital human’s emotional expressiveness is perceived and, subsequently, the effect it has on loneliness, social, and physiological outcomes. Emotional expressiveness in a digital human may need to be tailored to the culture of its intended users alongside their gender, and the results may have been different had the study been conducted in the United States.

Limitations

This study had several limitations that could affect the generalizability of the results. First, the digital human’s conversation engine experienced occasional errors with utterance detection and speech-to-text transcription during its interactions with participants. These errors manifested as behaviors such as interrupting people or asking them to repeat themselves, which could have increased negative mood in some participants [68]. However, the errors did not differ in frequency across the experimental conditions.

Another possible limitation is that the study was conducted with a community sample composed predominantly of young adult students. It is unclear whether the findings would apply to other groups who might benefit from a social support intervention with a digital human, such as older adults (who may have lower digital literacy and more difficulty using a digital human program) or particular patient samples (who may be considerably more stressed at baseline). In addition, this study focused on gender differences between men and women; therefore, it is unknown how gender-diverse persons might respond to emotional expressiveness in a digital human, including the effects on their loneliness, social, and physiological outcomes.

Another consideration is that other features of the digital human could have inadvertently influenced people’s responses to it. For example, research has shown that the proportions of a computer agent’s facial features may influence personality perceptions [78]. A narrower face and smaller eyes have been associated with perceptions of greater aggression in computer agents [78]. Both narrow and wide faces have been associated with lower perceptions of trustworthiness than a standard-width face, and smaller eyes have been rated as less trustworthy than medium-sized eyes in a computer agent [78]. The face design of the digital human in this study was consistent across the neutral and emotional face conditions; however, each of these components of the digital human’s face could have contributed in part to how participants evaluated the digital human. It is also possible that the perceived attractiveness and eeriness of the digital human (ie, the uncanny valley effect [79]), as well as prior experiences with digital humans, could have influenced the responses. However, because participants were randomized into groups, these factors should not confound the results.

An important consideration is that demographic matching (eg, gender and ethnicity) between the participant and the digital human could have influenced the social responses. The digital human in this study was female, and different results may have been found if a male digital human had been used. In prior research, gender matching has been shown to influence preferences toward computer agents [80], and ethnicity matching has increased perceptions of trustworthiness [81], social presence [82], and credibility toward computer agents [83]. Ethnicity matching has also been shown to increase closeness, disclosure comfort, and future relationship expectations with computer agents [84].

In this study, the digital human was of mixed ethnicity (New Zealand Māori, New Zealand European, Indian, and European), and the participants were predominantly New Zealand European or Asian, but all resided in New Zealand. There was no significant difference between groups in the proportions of participants of each ethnicity; therefore, this would not have affected our main results. Nevertheless, across the entire sample, Asian ethnicity was related to greater perceived closeness with the digital human (P=.01). This may be because of the tendency for Asian cultures to be more accepting of robotic technologies in general [85]. Matching the user’s accent and conversational style has also been shown to increase acceptability and trust with robots and computer agents [86-90]. In this study, the digital human spoke in a local New Zealand accent; however, she did not match the conversational style of participants. We do not have data on participants’ accents from this study.

The effects of demographic and conversational style matching may be explained by the similarity-attraction hypothesis, which predicts that people like others who are more similar to themselves [91]. This social phenomenon has been shown to apply to relationships with computers [46]. Matching the digital human’s demographic characteristics to each participant was not feasible in this study; however, future research could investigate how this might affect the results. The similarity-attraction hypothesis may also explain some of the gender effects, as women are generally more emotionally expressive than men.

Finally, the expressions in the emotional face condition may not have been pronounced enough to elicit changes in outcomes, and stronger emotional expressions might have resulted in different findings. In addition, humanlike behaviors in the neutral face condition (eg, head movements while listening and eyebrow raises while speaking) could have been interpreted as showing interest, which may have influenced the results.

Future Research

The results of this study indicate several important areas for future research. First, experimental research is needed to investigate the generalizability of the results at the intersection of gender and other demographic or health characteristics (eg, ethnicity, age, and disability). In addition, this study did not examine the effect of emotional expressiveness in a digital human on individuals who are gender diverse, and this topic should be explored in future research.

Second, the results may have been affected by the degree of emotional expression on the digital human’s face and the types of emotions the digital human expressed during the interaction (ie, a mixture of self- and other-focused emotions and of positive and negative emotions). The expression of these types of emotions has been shown to influence perceptions of computer agents and robots in prior research [55,75]. Future studies could investigate whether expressing more pronounced emotions on the face improves outcomes and could evaluate the effect of other-focused emotional expression in a digital human (eg, compassion and concern) on loneliness, social, and physiological outcomes. This style of emotional expression would likely better mimic a therapeutic or supportive interaction and may result in greater improvements in outcomes than a digital human that expresses mixed or self-focused emotions. Research could also compare the effects of expressing positive emotions, negative emotions, and a mixture of the two.

This study identified emotional expressiveness as an important design feature to include in female digital human support persons for women. However, there are likely many other digital human behaviors that could improve loneliness, social, and physiological outcomes for people [40]. This study found that men experienced better outcomes with a female digital human that had a less emotionally expressive voice and no face. However, this finding should be investigated further, as men (and women) could have responded differently to a male digital human.

Conclusions

Overall, emotional expressiveness in a female digital human’s voice was associated with perceiving her as more caring and with experiencing greater physiological arousal during a self-disclosure conversation. However, emotion in the voice was associated with lower feelings of support after the interaction. Gender was an important moderator of these effects. The findings indicate that when designing a female digital human support person for women, an emotional voice and the presence of a face may be important features to include. In contrast, when designing a female digital human support person for men, a neutral voice and no face may result in better outcomes than an emotionally expressive face and voice. The results provide support for the CASA paradigm and may inform the design of computer agent support persons.

Acknowledgments

The authors wish to thank the engineering team at Soul Machines Ltd for their assistance in building the digital human for this experiment. The authors would also like to thank Andy Ha for assistance with sensor data processing.

Conflicts of Interest

MS is the chief executive officer of Soul Machines (an artificial intelligence company), which supports KL with a PhD stipend, contracts EB for consultancy work, and hired XZ as an engineer at the time of the research.

Multimedia Appendix 1

CONSORT e-HEALTH checklist (V 1.6.1).

PDF File (Adobe PDF File), 397 KB

  1. Bu F, Steptoe A, Fancourt D. Who is lonely in lockdown? Cross-cohort analyses of predictors of loneliness before and during the COVID-19 pandemic. Public Health 2020 Sep;186:31-34 [FREE Full text] [CrossRef] [Medline]
  2. Perlman D, Peplau LA. Toward a social psychology of loneliness. In: Duck S, Gilmour R, editors. Personal Relationships in Disorder. Cambridge, Massachusetts: Academic Press; 1981:31-56.
  3. Cacioppo S, Capitanio JP, Cacioppo JT. Toward a neurology of loneliness. Psychol Bull 2014 Nov;140(6):1464-1504 [FREE Full text] [CrossRef] [Medline]
  4. Masi CM, Chen H, Hawkley LC, Cacioppo JT. A meta-analysis of interventions to reduce loneliness. Pers Soc Psychol Rev 2011 Aug 17;15(3):219-266 [FREE Full text] [CrossRef] [Medline]
  5. Jones WH, Moore TL. Loneliness and social support. J Soc Behav Personal 1987;2(2):145-156 [FREE Full text]
  6. Groarke JM, Berry E, Graham-Wisener L, McKenna-Plumley PE, McGlinchey E, Armour C. Loneliness in the UK during the COVID-19 pandemic: cross-sectional results from the COVID-19 Psychological Wellbeing Study. PLoS One 2020 Sep 24;15(9):e0239698 [FREE Full text] [CrossRef] [Medline]
  7. Gerst-Emerson K, Jayawardhana J. Loneliness as a public health issue: the impact of loneliness on health care utilization among older adults. Am J Public Health 2015 May;105(5):1013-1019. [CrossRef] [Medline]
  8. Holt-Lunstad J, Smith TB, Layton JB. Social relationships and mortality risk: a meta-analytic review. PLoS Med 2010 Jul 27;7(7):e1000316 [FREE Full text] [CrossRef] [Medline]
  9. Valtorta NK, Kanaan M, Gilbody S, Ronzi S, Hanratty B. Loneliness and social isolation as risk factors for coronary heart disease and stroke: systematic review and meta-analysis of longitudinal observational studies. Heart 2016 Jul 01;102(13):1009-1016 [FREE Full text] [CrossRef] [Medline]
  10. Lee SL, Pearce E, Ajnakina O, Johnson S, Lewis G, Mann F, et al. The association between loneliness and depressive symptoms among adults aged 50 years and older: a 12-year population-based cohort study. Lancet Psychiatry 2021 Jan;8(1):48-57 [FREE Full text] [CrossRef] [Medline]
  11. Beutel ME, Klein EM, Brähler E, Reiner I, Jünger C, Michal M, et al. Loneliness in the general population: prevalence, determinants and relations to mental health. BMC Psychiatry 2017;17(1):97 [FREE Full text] [CrossRef] [Medline]
  12. Bu F, Steptoe A, Fancourt D. Loneliness during a strict lockdown: trajectories and predictors during the COVID-19 pandemic in 38,217 United Kingdom adults. Soc Sci Med 2020;265:113521 [FREE Full text] [CrossRef] [Medline]
  13. Uchino BN, Cacioppo JT, Kiecolt-Glaser JK. The relationship between social support and physiological processes: a review with emphasis on underlying mechanisms and implications for health. Psychol Bull 1996;119(3):488-531. [CrossRef] [Medline]
  14. Hogan BE, Linden W, Najarian B. Social support interventions: do they work? Clin Psychol Rev 2002;22(3):381-440. [CrossRef]
  15. Lepore SJ. Problems and prospects for the social support-reactivity hypothesis. Ann Behav Med 1998;20(4):257-269. [CrossRef] [Medline]
  16. Ditzen B, Heinrichs M. Psychobiology of social support: the social dimension of stress buffering. Restor Neurol Neurosci 2014;32(1):149-162. [CrossRef] [Medline]
  17. Kiecolt-Glaser JK, Gouin J, Hantsoo L. Close relationships, inflammation, and health. Neurosci Biobehav Rev 2010;35(1):33-38 [FREE Full text] [CrossRef] [Medline]
  18. Heinrichs M, Baumgartner T, Kirschbaum C, Ehlert U. Social support and oxytocin interact to suppress cortisol and subjective responses to psychosocial stress. Biolog Psychiatry 2003;54(12):1389-1398. [CrossRef]
  19. Kim H, Cheon E, Bai D, Lee YH, Koo B. Stress and heart rate variability: a meta-analysis and review of the literature. Psychiatry Investig 2018;15(3):235-245 [FREE Full text] [CrossRef] [Medline]
  20. Cohen S, Janicki-Deverts D, Doyle WJ, Miller GE, Frank E, Rabin BS, et al. Chronic stress, glucocorticoid receptor resistance, inflammation, and disease risk. Proc Natl Acad Sci 2012;109(16):5995-5999 [FREE Full text] [CrossRef] [Medline]
  21. Glaser R, Kiecolt-Glaser J. How stress damages immune system and health. Discov Med 2005;5(26):165-169 [FREE Full text] [Medline]
  22. Fricchione G. Compassion and Healing in Medicine and Society: On the Nature and Use of Attachment Solutions to Separation Challenges. Baltimore, MD: Johns Hopkins University Press; 2011:1-552.
  23. Hegel F, Muhl C, Wrede B, Hielscher-Fastabend M, Sagerer G. Understanding social robots. In: Proceedings of the Second International Conference on Advances in Computer-Human Interactions. 2009 Presented at: Second International Conference on Advances in Computer-Human Interactions; Feb 1-7, 2009; Cancun, Mexico p. 169-174. [CrossRef]
  24. Robinson H, Macdonald B, Kerse N, Broadbent E. The psychosocial effects of a companion robot: a randomized controlled trial. J Am Med Dir Assoc 2013 Sep;14(9):661-667. [CrossRef] [Medline]
  25. Petersen S, Houston S, Qin H, Tague C, Studley J. The utilization of robotic pets in dementia care. J Alzheimers Dis 2017;55(2):569-574 [FREE Full text] [CrossRef] [Medline]
  26. Robinson H, MacDonald B, Broadbent E. Physiological effects of a companion robot on blood pressure of older people in a residential care facility: a pilot study. Australas J Ageing 2015;34(1):27-32. [CrossRef] [Medline]
  27. Moyle W, Jones C, Cooke M, O'Dwyer S, Sung B, Drummond S. Connecting the person with dementia and family: a feasibility study of a telepresence robot. BMC Geriatr 2014;14(1):7 [FREE Full text] [CrossRef] [Medline]
  28. Niemelä M, van Aerschot L, Tammela A, Aaltonen I, Lammi H. Towards ethical guidelines of using telepresence robots in residential care. Int J of Soc Robotics 2019;13(3):431-439. [CrossRef]
  29. Powers A, Kiesler S, Fussell S, Torrey C. Comparing a computer agent with a humanoid robot. In: Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction. 2007 Presented at: International Conference on Human Robot Interaction; March 10 - 12, 2007; Arlington, VA, USA p. 145-152. [CrossRef]
  30. Cassell J, Sullivan J, Prevost S, Churchill EF. Embodied Conversational Agents. Cambridge, MA: MIT Press; 2000.
  31. Gasteiger N, Loveys K, Law M, Broadbent E. Friends from the future: a scoping review of research into robots and computer agents to combat loneliness in older people. Clin Interv Aging 2021;16:941-971. [CrossRef]
  32. Bott N, Wexler S, Drury L, Pollak C, Wang V, Scher K, et al. A protocol-driven, bedside digital conversational agent to support nurse teams and mitigate risks of hospitalization in older adults: case control pre-post study. J Med Internet Res 2019;21(10):e13440 [FREE Full text] [CrossRef] [Medline]
  33. Ring L, Shi L, Totzke K, Bickmore T. Social support agents for older adults: longitudinal affective computing in the home. J Multimodal User Interfaces 2014;9(1):79-88. [CrossRef]
  34. Vardoulakis LP, Ring L, Barry B, Sidner CL, Bickmore T. Designing relational agents as long term social companions for older adults. In: Proceedings of the International Conference on Intelligent Virtual Agents. 2012 Presented at: International Conference on Intelligent Virtual Agents; September 12-14, 2012; Santa Cruz, CA, USA p. 289-302. [CrossRef]
  35. Kim J, Kim Y, Kim B, Yun S, Kim M, Lee J. Can a machine tend to teenagers' emotional needs?: a study with conversational agents. In: Proceedings of the Extended Abstracts of the 2018 CHI Conference on Human Factors in Computing Systems. 2018 Presented at: CHI '18: CHI Conference on Human Factors in Computing Systems; April 21 - 26, 2018; Montreal, QC, Canada p. 1-6. [CrossRef]
  36. Ta V, Griffith C, Boatfield C, Wang X, Civitello M, Bader H, et al. User experiences of social support from companion chatbots in everyday contexts: thematic analysis. J Med Internet Res 2020;22(3):e16235 [FREE Full text] [CrossRef] [Medline]
  37. Sagar M, Bullivant D, Robertson P, Efimov O, Jawed K, Kalarot R, et al. A neurobehavioural framework for autonomous animation of virtual human faces. In: Proceedings of the SIGGRAPH Asia 2014 Autonomous Virtual Humans and Social Robot for Telepresence. 2014 Presented at: SA'14: SIGGRAPH Asia 2014; December 3 - 6, 2014; Shenzhen China. [CrossRef]
  38. Sagar M, Seymour M, Henderson A. Creating connection with autonomous facial animation. Commun ACM 2016;59(12):82-91. [CrossRef]
  39. Rini C, Schetter CD, Hobel CJ, Glynn LM, Sandman CA. Effective social support: antecedents and consequences of partner support during pregnancy. Pers Relat 2006;13(2):207-229. [CrossRef]
  40. Loveys K, Sebaratnam G, Sagar M, Broadbent E. The effect of design features on relationship quality with embodied conversational agents: a systematic review. Int J of Soc Robotics 2020;12(6):1293-1312 [FREE Full text] [CrossRef]
  41. Parks MR, Floyd K. Meanings for closeness and intimacy in friendship. J Soc Pers Relat 2016;13(1):85-107. [CrossRef]
  42. Burleson BR. The experience and effects of emotional support: what the study of cultural and gender differences can tell us about close relationships, emotion, and interpersonal communication. Pers Relat 2003;10(1):1-23. [CrossRef]
  43. Floyd K. Gender and closeness among friends and siblings. J Psychol 1995;129(2):193-202 [FREE Full text] [CrossRef] [Medline]
  44. Nass C, Moon Y. Machines and mindlessness: social responses to computers. J Soc Isssues 2000;56(1):81-103. [CrossRef]
  45. Kraus MW. Voice-only communication enhances empathic accuracy. Am Psychol 2017;72(7):644-654. [CrossRef] [Medline]
  46. Reeves B, Nass C. The Media Equation: How People Treat Computers, Television, and New Media Like Real People and Places. Cambridge, UK: Cambridge University Press; 1997:1-324.
  47. Nass C, Lee KM. Does computer-synthesized speech manifest personality? Experimental tests of recognition, similarity-attraction, and consistency-attraction. J Exp Psychol Appl 2001;7(3):171-181. [CrossRef]
  48. Nass C, Moon Y, Green N. Are machines gender neutral? Gender-stereotypic responses to computers with voices. J Appl Social Pyschol 1997;27(10):864-876. [CrossRef]
  49. Nass C, Fogg B, Moon Y. Can computers be teammates? Int J Hum Comput Stud 1996;45(6):669-678. [CrossRef]
  50. Lee S, Ratan R, Park T. The voice makes the car: enhancing autonomous vehicle perceptions and adoption intention through voice agent gender and style. Multimodal Technol Interact 2019;3(1):20. [CrossRef]
  51. Moon Y, Kim KJ, Shin DH. Voices of the Internet of Things: an exploration of multiple voice effects in smart homes. In: Proceedings of the International Conference on Distributed, Ambient, and Pervasive Interactions. 2016 Presented at: International Conference on Distributed, Ambient, and Pervasive Interactions; July 17-22, 2016; Toronto, Canada p. 270-278. [CrossRef]
  52. Gaia AC. Understanding emotional intimacy: a review of conceptualization, assessment and the role of gender. Int Soc Sci Rev 2002;77(3/4):151-170 [FREE Full text]
  53. MacLean PD. Brain evolution relating to family, play, and the separation call. Arch Gen Psychiatry 1985;42(4):405-417. [CrossRef] [Medline]
  54. Kory-Westlund JM. Relational AI: creating long-term interpersonal interaction, rapport, and relationships with social robots. PhD Thesis, Massachusetts Institute of Technology. 2019.   URL: https://hdl.handle.net/1721.1/123627 [accessed 2021-11-05]
  55. Brave S, Nass C, Hutchinson K. Computers that care: investigating the effects of orientation of emotion exhibited by an embodied computer agent. Int J Hum-Comput Stud 2005 Feb;62(2):161-178. [CrossRef]
  56. Maldonado H, Lee JR, Brave S, Nass C, Nakajima H, Yamada R, et al. We learn better together: enhancing eLearning with emotional characters. In: Proceedings of the Conference on Computer Support for Collaborative Learning: Learning 2005: The Next 10 Years!. 2005 Presented at: Conference on Computer Support for Collaborative Learning: Learning 2005: The Next 10 Years!; May 30 - June 4, 2005; Taipei Taiwan p. 408-417. [CrossRef]
  57. Koda T, Maes P. Agents with faces: the effect of personification. In: Proceedings 5th IEEE International Workshop on Robot and Human Communication. RO-MAN'96 TSUKUBA. 1996 Presented at: 5th IEEE International Workshop on Robot and Human Communication. RO-MAN'96 TSUKUBA; Nov. 11-14, 1996; Tsukuba, Japan p. 189-194. [CrossRef]
  58. Aron A, Aron EN, Smollan D. Inclusion of Other in the Self Scale and the structure of interpersonal closeness. J Personal Soc Psychol 1992;63(4):596-612. [CrossRef]
  59. Popovic M, Milne D, Barrett P. The scale of perceived interpersonal closeness (PICS). Clin Psychol Psychother 2003 Sep;10(5):286-301. [CrossRef]
  60. Sedikides C, Campbell WK, Campbell GD, Elliot AJ. The relationship closeness induction task. Represent Res Soc Psychol 1999;23:1-4 [FREE Full text]
  61. Schulz KF, Altman DG, Moher D, CONSORT Group. CONSORT 2010 Statement: updated guidelines for reporting parallel group randomised trials. Trials 2010;11(1):32 [FREE Full text] [CrossRef] [Medline]
  62. Russell D, Peplau LA, Cutrona CE. The revised UCLA Loneliness Scale: concurrent and discriminant validity evidence. J Pers Soc Psychol 1980 Sep;39(3):472-480. [CrossRef]
  63. Aitken RC. Measurement of feelings using visual analogue scales. Proc R Soc Med 1969 Oct;62(10):989-993 [FREE Full text] [Medline]
  64. Robinson H, Ravikulan A, Nater UM, Skoluda N, Jarrett P, Broadbent E. The role of social closeness during tape stripping to facilitate skin barrier recovery: preliminary findings. Health Psychol 2017 Jul;36(7):619-629. [CrossRef] [Medline]
  65. Law M, Jarrett P, Nater UM, Skoluda N, Broadbent E. The effects of environmental enrichment on skin barrier recovery in humans: a randomised trial. Sci Rep 2020 Jun 17;10(1):9829 [FREE Full text] [CrossRef] [Medline]
  66. Lazarus RS, Speisman JC, Mordkoff AM. The relationship between autonomic indicators of psychological stress: heart rate and skin conductance. Psychosom Med 1963;25(1):19-30. [CrossRef]
  67. Rimm-Kaufman SE, Kagan J. The psychological significance of changes in skin temperature. Motiv Emot 1996 Mar;20(1):63-78. [CrossRef]
  68. Loveys K, Sagar M, Broadbent E. The effect of multimodal emotional expression on responses to a digital human during a self-disclosure conversation: a computational analysis of user language. J Med Syst 2020 Jul 22;44(9):143. [CrossRef] [Medline]
  69. Bradley MM, Lang PJ. Affective reactions to acoustic stimuli. Psychophysiology 2000 Mar;37(2):204-215. [CrossRef]
  70. Aleman A, Swart M. Sex differences in neural activation to facial expressions denoting contempt and disgust. PLoS One 2008 Nov 5;3(11):e3622 [FREE Full text] [CrossRef] [Medline]
  71. Kveraga K, De Vito D, Cushing C, Im HY, Albohn DN, Adams RB. Spatial and feature-based attention to expressive faces. Exp Brain Res 2019 Apr 25;237(4):967-975 [FREE Full text] [CrossRef] [Medline]
  72. Burleson W, Picard RW. Gender-specific approaches to developing emotionally intelligent learning companions. IEEE Intell Syst 2007 Jul;22(4):62-69. [CrossRef]
  73. Ring L, Barry B, Totzke K, Bickmore T. Addressing loneliness and isolation in older adults: proactive affective agents provide better support. In: Proceedings of the Humaine Association Conference on Affective Computing and Intelligent Interaction. 2013 Presented at: Humaine Association Conference on Affective Computing and Intelligent Interaction; Sept. 2-5, 2013; Geneva, Switzerland p. 61-66. [CrossRef]
  74. Mori J, Prendinger H, Ishizuka M. Evaluation of an embodied conversational agent with affective behavior. In: Proceedings of the AAMAS03 Workshop on Embodied Conversational Characters as Individuals. 2003 Presented at: AAMAS03 Workshop on Embodied Conversational Characters as Individuals; 2003; Melbourne, Australia   URL: http://www.miv.t.u-tokyo.ac.jp/papers/jmoriECA03.pdf
  75. Johanson DL, Ahn HS, Sutherland CJ, Brown B, MacDonald BA, Lim JY, et al. Smiling and use of first-name by a healthcare receptionist robot: effects on user perceptions, attitudes, and behaviours. Paladyn J Behav Robot 2020;11(1):40-51 [FREE Full text] [CrossRef]
  76. Chentsova-Dutton YE, Chu JP, Tsai JL, Rottenberg J, Gross JJ, Gotlib IH. Depression and emotional reactivity: variation among Asian Americans of East Asian descent and European Americans. J Abnorm Psychol 2007;116(4):776-785. [CrossRef]
  77. Tsai JL, Knutson B, Fung HH. Cultural variation in affect valuation. J Pers Soc Psychol 2006 Feb;90(2):288-307. [CrossRef] [Medline]
  78. Ferstl Y, McDonnell R. A perceptual study on the manipulation of facial features for trait portrayal in virtual agents. In: Proceedings of the 18th International Conference on Intelligent Virtual Agents. 2018 Presented at: IVA '18: International Conference on Intelligent Virtual Agents; November 5 - 8, 2018; Sydney NSW Australia p. 281-288. [CrossRef]
  79. Mori M. The uncanny valley. Energy 1970;7(4):33-35. [CrossRef]
  80. Johnson AM, DiDonato MD, Reisslein M. Animated agents in K-12 engineering outreach: preferred agent characteristics across age levels. Comput Hum Behav 2013 Jul;29(4):1807-1815. [CrossRef]
  81. Nass C, Isbister K, Lee E. Truth is beauty: researching embodied conversational agents. In: Cassell J, Sullivan J, Prevost S, Churchill E, editors. Embodied Conversational Agents. Cambridge, MA: MIT Press; 2000:374-402.
  82. Benbasat I, Dimoka A, Pavlou P, Qiu L. Incorporating social presence in the design of the anthropomorphic interface of recommendation agents: insights from an fMRI study. In: Proceedings of the 31st International Conference on Information Systems (ICIS). 2010 Presented at: 31st International Conference on Information Systems (ICIS); 2010; St. Louis, Missouri   URL: https://www.semanticscholar.org/paper/Incorporating-Social-Presence-in-the-Design-of-the-Benbasat-Dimoka/dab618d213b4c8cfe4447308d5c4538f859f13a6
  83. Spence PR, Lachlan KA, Spates SA, Shelton AK, Lin X, Gentile CJ. Exploring the impact of ethnic identity through other-generated cues on perceptions of spokesperson credibility. Comput Hum Behav 2013;29(5):3-11. [CrossRef]
  84. Liao Y, He J. Racial mirroring effects on human-agent interaction in psychotherapeutic conversations. In: Proceedings of the 25th International Conference on Intelligent User Interfaces. 2020 Presented at: IUI '20: 25th International Conference on Intelligent User Interfaces; March 17 - 20, 2020; Cagliari Italy p. 430-442. [CrossRef]
  85. Kaplan F. Who is afraid of the humanoid? Investigating cultural differences in the acceptance of robots. Int J Hum Robot 2012;1(3):465-480. [CrossRef]
  86. Tamagawa R, Watson CI, Kuo IH, MacDonald BA, Broadbent E. The effects of synthesized voice accents on user perceptions of robots. Int J of Soc Robot 2011 Jun 2;3(3):253-262. [CrossRef]
  87. Rheu M, Shin JY, Peng W, Huh-Yoo J. Systematic review: trust-building factors and implications for conversational agent design. Int J Hum–Comput Interact 2020;37(1):81-96. [CrossRef]
  88. Shamekhi A, Czerwinski M, Mark G, Novotny M, Bennett GA. An exploratory study toward the preferred conversational style for compatible virtual agents. In: Proceedings of the International Conference on Intelligent Virtual Agents. 2017 Presented at: International Conference on Intelligent Virtual Agents; August 27-30, 2017; Stockholm, Sweden p. 40-50. [CrossRef]
  89. Thomas P, McDuff D, Czerwinski M, Craswell N. Expressions of style in information seeking conversation with an agent. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2020 Presented at: SIGIR '20: The 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval; July 25 - 30, 2020; Virtual Event China p. 1171-1180. [CrossRef]
  90. Hoegen R, Aneja D, McDuff D, Czerwinski M. An end-to-end conversational style matching agent. In: Proceedings of the 19th ACM International Conference on Intelligent Virtual Agents. 2019 Presented at: IVA '19: ACM International Conference on Intelligent Virtual Agents; July 2 - 5, 2019; Paris France p. 111-118. [CrossRef]
  91. Byrne D. Interpersonal attraction and attitude similarity. J Abnorm Soc Psychol 1961 May;62(3):713-715. [CrossRef] [Medline]


ANOVA: Analysis of Variance
CASA: Computers Are Social Actors
CONSORT: Consolidated Standards of Reporting Trials
RCIT: Relationship Closeness Induction Task


Edited by R Kukafka, G Eysenbach; submitted 21.05.21; peer-reviewed by V Ta, R Zhang; comments to author 30.07.21; revised version received 26.08.21; accepted 04.10.21; published 25.11.21

Copyright

©Kate Loveys, Mark Sagar, Xueyuan Zhang, Gregory Fricchione, Elizabeth Broadbent. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 25.11.2021.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.