Guest blog: Talking with Alexa at home

I imagine that many interaction researchers will have been curious about how a voice-activated, internet-connected device might be integrated (or not) into conversations at home. Martin Porcheron, along with Stuart Reeves, Joel Fischer and Sarah Sharples (all at the University of Nottingham), took the next step and did the research. Here Martin and Stuart explain how the research was done…


Martin Porcheron


Stuart Reeves

Voice-based ‘smartspeaker’ products, such as the Amazon Echo, Google Home, or Apple HomePod have become popular consumer items in the last year or two. These devices are designed for use in the home, and offer a kind of interaction where users may talk to an anthropomorphised ‘intelligent personal assistant’ which responds to things like questions and instructions. The widespread adoption of this new kind of interaction modality (i.e. voice) provided us with a great opportunity to consider how we could bring ethnomethodology and conversation analysis to bear on talk with and around such devices.

In this guest blog post, we wanted to give some background to our study of these devices and to discuss something we think might interest the ROLSI community. We recently published our findings as a paper to be presented at CHI 2018, the ACM conference for Human-Computer Interaction. We also posted a couple of more easily digestible elaborations of our findings and data.

There are many different ways you could study how interactions with such a device unfold; often people run lab studies or observational studies. For us, however, the key considerations were: (1) capturing the most ‘naturalistic’ interactions of people with such a device, and (2) recording the conversational context in which those interactions take place, both before someone says ‘Alexa’ or ‘OK Google’ to wake up the device and in what unfolds thereafter.

To achieve this, we provided a number of households with an Amazon Echo for about a month and also gave them a custom device (built by Martin) to record interactions with the Echo.


Martin’s bespoke device to record interactions with the Echo

This device, the ‘Conditional Voice Recorder’ (CVR), is essentially a Raspberry Pi (a credit-card sized, but functionally complete, computer) with a conference microphone stuck on the top. It is always listening — much like the Amazon Echo — but differs in a range of ways. Firstly, it has lights to show when it is listening and when it is recording. Secondly, it has a button to enable or disable recording, as we wanted participant households to feel comfortable with the study and to have control over the data being collected.

What it records 

To collect contextual information about how an interaction is occasioned, the CVR listens continuously for an interaction with the Amazon Echo (which always starts with the word ‘Alexa’) and keeps the last minute of audio in the device’s memory. When an interaction with the Echo starts, it saves that last minute of audio to an internal memory card, and also records for one further minute. If people use the Echo again within that minute, the recording is extended.
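The pre-roll-plus-post-roll logic described above is essentially a ring buffer with a countdown. The sketch below is our own illustration of that idea, not Martin’s actual code: the wake-word detector is stubbed out as a boolean flag, and audio ‘chunks’ are stand-ins for real buffered audio.

```python
from collections import deque

CHUNK_SECONDS = 1       # size of each buffered audio chunk
PRE_ROLL_SECONDS = 60   # keep the last minute of audio in memory
POST_ROLL_SECONDS = 60  # keep recording for one further minute


class ConditionalVoiceRecorder:
    """Sketch of a CVR: buffer audio continuously, but persist it
    only around detected wake-word events (pre-roll + post-roll)."""

    def __init__(self):
        # ring buffer holding only the most recent minute of audio
        self.ring = deque(maxlen=PRE_ROLL_SECONDS // CHUNK_SECONDS)
        self.post_roll_left = 0  # chunks still to save after a wake word
        self.saved = []          # stands in for the file on the SD card

    def process_chunk(self, chunk, wake_word_detected):
        if wake_word_detected:
            if self.post_roll_left == 0:
                # new interaction: flush the buffered pre-roll first
                self.saved.extend(self.ring)
                self.ring.clear()
            # start (or extend) the one-minute post-roll window
            self.post_roll_left = POST_ROLL_SECONDS // CHUNK_SECONDS

        if self.post_roll_left > 0:
            self.saved.append(chunk)
            self.post_roll_left -= 1
        else:
            self.ring.append(chunk)
```

A wake word arriving while the post-roll window is still open simply resets the countdown, which gives the ‘recording is extended’ behaviour without duplicating any audio.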


We encountered many challenges with the CVR, not least detecting the wake word across different accents: in much the same way that commercial systems struggle with people’s accents, so did ours. Fortunately, we had built in a way to update the CVR remotely once it was deployed in people’s homes, and we could also adjust settings remotely to make the device more (or sometimes less) sensitive to detecting ‘Alexa’. Another concern was what would happen if the recorder crashed, or if people left it turned off without realising (something we found was easy to do during development) — the solution here was to have the device turn itself off and on again automatically overnight.
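The overnight off-and-on cycle is the kind of thing a scheduled job handles well. As a sketch (assuming a Linux-based Pi and that a full reboot is acceptable, which the post does not specify), a root crontab entry along these lines would restart the device every night:

```shell
# Nightly restart of the CVR host at 04:00 (root crontab entry)
# min hour day-of-month month day-of-week  command
0 4 * * * /sbin/shutdown -r now
```

A reboot like this clears a crashed recorder process and brings the device back into a known-good state without anyone in the household having to intervene.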

How to analyse it?

As any CA researcher who works with audio-only recordings will know well, one of the challenges during our research was working out situational relevance, particularly for interactions that seemed not to involve the Echo but were nevertheless occurring alongside it (a video camera would have been useful!). A mitigating factor here, we found, was that participants routinely produced accounts of their interactions with the Echo and of how these related to the current course of their activities in the home. Participants tended to audibly orient to (as well as embed) their interactional work with the Amazon Echo both alongside and ‘inside’ various other typical tasks of home life: cooking, watching TV, or eating dinner together.

To give a flavour of the sort of data we have collected, consider the short (but very rich) fragment below of a family using the Amazon Echo, taken from our paper. Here, Susan (the mother; all names changed) announces to the group her desire to play a particular game (called ‘Beat the Intro’) with the Amazon Echo, with a negative assessment from Liam (the son) overlapping Susan’s almost immediate instigation of the “wake word” to the Amazon Echo. Carl (the father) approves, inserting a quick ‘yeah’ in the pause between the wake word and the subsequent request by Susan:

[Transcript of the fragment, reproduced from the paper]

The request fails, however, but not before Susan turns to Liam and instructs him to keep eating his food, with Carl providing support. Susan then repeats the request, implicating her assessment that the prior request has failed…

By adopting a CA approach, we were very quickly able to draw out some of the nuanced ways in which interaction with the Amazon Echo gets sequentially interleaved with other ongoing activities. Naturally, however, we also learned much about the character of our participants’ home lives just by listening to these interactions with and around the Echo. Through the audio recordings of Amazon Echo use alone, one begins to understand habits such as meal times, music tastes, TV interests, shopping habits, and so on. We have plans to release some of our audio for others to use in research, but ensuring that we can maintain the confidentiality and anonymity of participants makes this a significant challenge.

In summary, we started out with the rather exciting challenge of trying to understand how interaction with new voice-based devices is practically achieved amidst the complex multiactivity setting of the home, and faced a number of difficulties in the process. Some of these we overcame with technology, such as running the collected data through further automatic speech recognition software, and others through analysing the audio recordings and drawing on EMCA.

Hopefully this guest blog provides some of the background to our work. If you’re interested in the outcomes of our analyses, we encourage you to check out the other posts linked to above, or the paper itself.



Guest blog: Melisa Stevanovic and Elina Weiste on impossible content analysis

Two of Finland’s most active and productive young Conversation Analysis researchers, Melisa Stevanovic and Elina Weiste, tried their hand at an intriguing experiment: analysing what people said about doing CA. The result was a thoughtful article (not in ROLSI), but clearly there was more to it than that, so I was delighted when they agreed to do a guest blog here.

The title they suggested was “On the impossibility of conducting content analysis: Back story of our data-session paper”, which sets the scene tantalisingly…


Dr Melisa Stevanovic, Helsinki University


Dr Elina Weiste, Helsinki University

We wanted to do something entirely new for us. Since both of us had started our academic careers more or less straightaway with conversation analysis (CA), neither of us had ever before conducted any ordinary content-analytic interview study.

This was certainly a gap in our research records—given that for some people this method appeared to be the only real way of conducting qualitative (social interaction) research.

Furthermore, during the previous years, we had been pursuing studies in university pedagogy and, in that context, been exposed to the general idea of studying teaching and learning. We were thinking of the various ways in which CA was generally taught in our university and how, despite the many CA courses offered, it was the CA data sessions that ultimately worked to socialize newcomers into the CA research practice. All of this led us to make the decision: we wanted to conduct a study on what CA folks generally think about the CA data sessions from a pedagogical point of view. What a perfect opportunity to fill the content-analysis gap in our lives!

A simple two-group design

Since we thought that the opinions of the CA experts and novices could be essentially different from one another, we decided to conduct focus-group interviews for the CA experts and novices separately. We also decided to use a succession of stimulus materials as interview prompts. To generate the stimulus materials, we audio-recorded one real CA data session. From this recording we selected five (in our opinion) particularly interesting segments, hoping that they would inspire the focus-group participants to talk.

For instance, we selected a segment where someone’s CA observation was met with total silence, and a segment where someone’s observations focused solely on the personal traits and other non-visible qualities of the participants in the data. We had much fun anonymizing the stimulus materials by altering the speakers’ pitch levels in Audacity. It struck us that the same analytic observation generated a very different impression depending on whether the chosen pitch level represented a female or a male voice. Thus, as a prompt to generate talk about possible status hierarchies among CA data-session participants, we created two different audio clips from one speech segment to be played to the focus groups.

How do you actually do interviewing?

Shortly before the first interview, it struck us that we did not necessarily know how to carry out an interview. Melisa, in particular, became anxious when she realized that she could sabotage the whole interview simply by talking too much. Of course, for a large part, our interview protocol consisted of the presentation of stimuli and related open questions such as: “What kinds of thoughts does the segment you just heard elicit in you?”. However, the question of whether and when follow-up questions would be needed appeared trickier — and riskier, since it provided an opportunity for the interviewer to slip into the conversation. So we decided that Elina — who is used to the long silences associated with psychotherapeutic interaction — should always be the one to ask the first follow-up question, while Melisa could ask the second and third if needed.

A rogue anecdote about Paul Drew

The interviews went very well. The stimulus material functioned as we had hoped: after listening to a data-session clip, the participants started to talk — first somewhat hesitantly, but then becoming more and more relaxed and engaged in free-flowing conversation. They also generated talk about the very topics we hoped they would. This was the case, for instance, with the female-male voice prompt described above: after the prompt, the participants discussed the issue of gender only very briefly, but then, without us asking anything, moved on to consider the possibility of other types of status hierarchies that might prevail in CA data sessions. Our division of labor regarding the follow-up questions also worked pretty well. For our part, we managed to keep our own talk to a minimum (except for one anecdote about Paul Drew told by Melisa).

And how do you actually do content analysis?

Then, finally, it was time to engage in proper content analysis! What would be the themes that CA folks talked about? In our data, these appeared to be: the structural organization of the data session, determination of the focus line, the making of notes, the “round” of first analytic observations, and — ultimately — something about the pedagogical function of data sessions. Interesting themes, certainly! Curiously, though, we observed a remarkably high correlation between these “themes” and our interview protocol. Somehow, it did not feel quite right to write a research paper reporting the (equal) occurrence of all these themes in both the expert and novice groups.


Paul Drew, Loughborough University, seemed to crop up a lot

So we thought it might be more worthwhile to consider those less interviewer-led parts of the group members’ talk that came across as spontaneous. What, then, were the topics voluntarily raised by the CA folks? In our data, these included: interjections, intersexuality, California, medical consultations, PhD students, speculations about intentionality, and — evidently — Paul Drew. Again, we had the weird feeling that a paper reporting the occurrence of these themes would raise more questions than it answered. By this point at the latest, we had begun to realize that the idea of sticking to the mere content of the participants’ speech might not be sensible. In the end, we could not come up with any sane way of doing it.

After this realization, we “loosened up” our approach a little. We decided that the following four categories, apparent in the participants’ tellings, would also count as “content”: ostensibly neutral descriptions of practices, stories with an affective stance, personal judgments of practices, and generic normative evaluations of practices. We realized that the CA novices were quite eager to tell narratives of their first data-session experiences, while the tellings of the CA experts focused more on describing the adventures of the international gurus in the field. However, we did not really know what to make of this observation. During subsequent rounds of analysis, we also started to consider the accessibility of the experiences told and the valence of the tellings, and — at the point at which we finally gave in to our old instincts — the reception of the tellings by other participants in the group. This led us to a compromise, which has now been reported in our Learning, Culture and Social Interaction article Conversation-analytic data session as a pedagogical institution.

So, in the end, we were really happy that we managed to find a way to combine content analysis with CA, but we had to concede that the leopard doesn’t change its spots.

Guest blog: Jason Turowetz on “I just thought…”

“I just thought… ” is one of those phrases whose meaning we think we know, but there are intriguing subtleties in what people do with it in conversation. In a recent article for the journal, Jason Turowetz delved into some of its main uses. Here he gives the background to the story. 

Jason Turowetz

My article on ‘I just thought formulations’ has its origins in a study of speed dating I conducted with a colleague, Matthew Hollander, in 2009, when we were graduate students at the University of Wisconsin-Madison. It seems a long way back, but that shows how a phenomenon can lodge in your head and inspire a continuing thread of research.

Guest blog: Gareth Walker on how acoustic data are represented

Quite often a ROLSI article touches on a matter that will interest a very wide range of readers, and Gareth Walker‘s account of how acoustic data are represented is a very good example. The range of representations is wide, and not all are equally good for the same things; some may even be misleading. I’m delighted that Gareth has agreed to go into some of the thinking that prompted him to write the piece.

Guest Blog: The 8th biannual EM/CA Doctoral Network meeting

Twice a year, UK postgraduates meet to thrash out issues in ethnomethodology and Conversation Analysis, generously hosted by staff at a university. The second meeting this year was held at Newcastle. Jack Joyce tells the story, and Marc Alexander muses on the pros and cons of parallel sessions.

Jack Joyce, Loughborough DARG

The 8th biannual EMCA Doctoral Network event was hosted at Newcastle University. It brought the marvellous event to the land of Applied Linguistics, and gave us EMCA researchers a further opportunity to explore the different ways in which EMCA is employed around the UK. The collegial and supportive spirit highlighted at past EMCA Doctoral Networks was again present, giving us the chance to meet old friends and make new connections.

Guest blog: Jack Joyce on Loughborough’s “Resistance Day”

The community of interactional researchers in Loughborough’s Discourse and Rhetoric Group occasionally put on an informal themed day of presentations and data sessions. In September this year the theme was “Resistance”, meant to encompass all kinds of practices. Doctoral student Jack Joyce takes up the story.

Jack Joyce, Loughborough DARG

On 13 September 2017, the first ‘Resistance in Talk-in-Interaction’ seminar day was hosted at Loughborough University as a joint-DARG event, funded by the Loughborough Doctoral College.

Why ROLSI uses double-blind review

Many journals in our field, perhaps most, anonymise the submissions they send out for review, and pass comments back to authors anonymised in turn: a “double-blind” system.  This has always been ROLSI’s practice  (at least, it has been under the editorship of the last five editors). But occasionally a reader or potential reviewer raises the question as to why this is preferable to signed reviews, or indeed submissions with the author’s name attached.


Charles Antaki, ROLSI Editor

I thought readers might be interested in a recent e-mail dialogue with a reader on just these issues.