Guest blog: Jacob Davidsen and Paul McIlvenny on Experiments with Big Video

How good are your video records? One angle? Two? Wide-angle? Was the camera static or did you move to catch things – and miss other things? How good was the sound? All of us have occasionally been frustrated with what we find on the screen when we come to analyse it, but Jacob Davidsen and Paul McIlvenny have some more fundamental concerns. Just how “big” should data be?


Paul McIlvenny (l.) and Jacob Davidsen

At Aalborg University over the last year, we have been experimenting with new technologies and enhanced methods for EMCA and video ethnography, supported by our respective departments and the Digital Humanities Lab 1.0 national infrastructure project. We’ll talk about some of the methodological ups and downs we’ve come across. That should interest anyone thinking of dipping their toes in what we call “big video”.

Very few EMCA papers discuss data collection itself. The general assumption is that once the ‘Tape’ is made – and it is often made fairly automatically, locked into ‘black box’ functionality (eg. auto-gain, auto-focus, auto-white balance) – it is an unquestionably accurate record. There is often a simple faith that the less one interferes with, or even understands, the operation of the technology, the better the recordings will be, despite the algorithmic normativity of default functions. Ashmore & Reed (2000) have indicated how naïve such a realist view can be, especially when considering the relationship between ‘the Tape’ and ‘the Transcript’.

Some history

Intriguingly, in 1987, ROLSI published a paper by Wendy Leeds-Hurwitz that sifted through the unpublished records of some of the early experiments with audio and film recordings as ‘data’ for microanalysis by a multidisciplinary research group started at Stanford University in 1955-56. Publications and records of these experiments and proto-‘data sessions’ (“soaking”) are not easily accessible, eg. the 16-mm family therapy ‘Doris’ film with synced sound that Gregory Bateson brought to the group in 1956. (It is a crying shame that in this digital age there is not a free, open-access, online archive of core and marginal publications, unpublished documents and recordings from the 1950s and 60s – Sacks’ transcribed lectures being a prime exception.)

This multidisciplinary group, which included Ray Birdwhistell, focused on the ‘natural history of an interview’ over a number of years (according to Frederick Erickson, Erving Goffman met with Bateson and other members of the group in the late 1950s). In hindsight, Erickson (2004) suggests that the differing affordances of film (for example, for this Palo Alto group) versus video (and cassette audio) for later scholars may have had an impact on the routine seeing/looking and listening/hearing practices of scholars in each period, a difference that may have privileged sequentiality over simultaneity. We could ask: are we at a similar juncture today?

What’s new

In the past, a few papers were published that brought readers up to speed on new technological developments and best practices, and though they are now dated, they were important pedagogically (eg. Goodwin, 1993 and Modaff & Modaff, 2000). However, the impact of digitalisation is so pervasive now that their specific analogue concerns are mostly irrelevant. From our experiments and reflections, we contend that today there are a set of paradigm shifts that are important to note:

  • From analogue to digital: eg. computationally intensive;
  • From singular to plural: eg. multiple recording devices, such as cameras and microphones;
  • From sound as secondary to sound as covalent, eg. in-built microphones versus spatial audio;
  • From frame to field of vision: eg. 16:9 versus 360°;
  • From flat to depth: eg. 2D versus stereoscopic 3D;
  • From spectator to POV: eg. cinema versus VR.

With the complexity of the recording scenarios, and the increasing use of computational tools and resources, we position ourselves in what we call BIG VIDEO. We use this rather facile term to counter the hype about quantitative big data analytics. Basically, we are hoping to develop an enhanced infrastructure for analysis with innovation in four key areas: 1) capture, storage, archiving and access of digital video; 2) visualisation, transformation and presentation; 3) collaboration and sharing; and 4) qualitative tools to support analysis.

What have we been doing in Aalborg?

One key focus has been to collect richer video and sound recordings in a variety of settings. And this means developing a sense of good camerawork/micwork (with both existing and new technologies) in order to collect analytically adequate recordings. We have used swarm public video (see McIlvenny forthcoming), 4k video cameras, network cameras, stereoscopic 360° cameras, S3D cameras, spatial and ambisonic audio, multi-track video and audio, GPS and heart rate tracking, and multi-track subtitling and video annotation, and we are beginning to work with local positioning systems (LPS) and beacons, as well as biosensing data, to see what is relevant to our EMCA concerns.


A 2-D visualisation comprising 6 video cameras and 10 multi-track audio channels from a recording of a guided nature tour

Since January 2016, we have collected video recordings in a variety of settings, including, in chronological order, dog sledding, architecture student group work, disability mobility scootering, live interactive 360° music streaming to an acquired brain injury centre, mountain bike off-road racing, guided nature tours on foot and mountain bike, home hairdressing (hair extensions), a family cycle camping holiday and just lately, Pokemon Go hunting. Taking the latter of these as a case study, we will elaborate a little on what we did. Unfortunately, given the sensitivity of the recordings, we cannot illustrate with video examples on a public website.

In order to collect some mobility data in new ways, especially in relation to sound, it was desirable to move away from a reliance on the GoPro cameras that many of us now use to collect recordings. GoPros, and other action cameras, are fine for reliable wide-angle video recording in auto-mode, but the in-built audio quality is poor, especially when the camera is encased in its waterproof housing and mounted on a vibrating vehicle or a human body. Also, there are certain constraints when trying to document such a complex activity as a Pokemon Go hunt.

Catch them all!

In the summer of 2016, streets, parks and other public spaces that were previously unnoticed by many people turned into inhabited places for Pokemon Go players around the globe.

In Aalborg, many people gather in parks and close to a previously forgotten part of the harbour area near the city’s railway bridge. At any time of day or night, players congregate at these locations in small groups. Sometimes many people gather as one big group to hunt a rare and desirable ‘Dragonite’ or ‘Pikachu’.

From our perspective, the game and the behaviour of the players serve as an interesting case study for EMCA, but how can we record what each individual is doing with their smartphone and still capture how the whole group is being mobile together? In this case, five players wore body-mounted GoPros pointing at their individual smartphones, as well as lavalier microphones. In addition, one of us (wearing spy glasses) walked around the city with a 360° camera mounted on a long pole.


A 2-D visualisation comprising 5 GoPros, 1 spycam and a stereo image of a 360° recording

With this setup, we captured high-quality video and audio from the individuals and a 360° recording of the surroundings and the mobility of the group. Some passers-by asked what was on the pole and tried to hide when we told them it was a camera! It is getting easier to capture Big Video, but the post-editing is getting more complex, richer and more fun. Questions such as how to transcribe 360° data, or which microphone to privilege in post-editing, become crucial to address. With the Pokemon Go data, we found that the composite stereo image of all the audio sources presented too much information. For instance, the group of five persons in a ‘mobile with’ split into two subgroups at one point, and to make this analysable, multi-track audio channels are needed. Thus, we cannot rely on one microphone to capture such complex data – each microphone gives a highly selective record of the interaction unfolding (as does each video camera). In a recent data session, the general reaction was something like “I don’t know what to make of it” or “it is too complex, I don’t know what to look at”.

But when we started moving around in the 360° recording, with embedded 2D recordings from the GoPros and 2D transcripts, some of the participants started to notice “what one otherwise cannot see”, namely how other people oriented to the Pokemon Go players on the hunt as they walked by. For EMCA researchers the opportunities of 360° recording are thought-provoking on methodological, theoretical and practical levels, eg. what to transcribe, what to select for presentations and what to make of the context.

The pros and cons of this new view

We have ascertained that there are distinct advantages to using some of these new technologies. First, we can capture the situated relevancies of more extreme and complex multi-party practices. Second, there are new phenomena to study that were unavailable to enquiry or unimaginable before. Third, there are new modes of presentation and visualisation. Lastly, we have found it necessary to re-examine and rethink what ‘data’ is. There are also dangers and disadvantages, including the idiosyncrasies of human perception, the illusion of presence and the problem of sensor and computational artefacts, as well as the question of the reliance on a surveillance architecture, and other ethical problems. Methodologically, we are grappling with issues of perspectivation, incommensurability and epistemic adequacy.

The future?

For the future, we speculate that some emerging technologies have potential. For example, virtual reality and augmented/mixed reality are being hyped by the computer and gaming industries at present, which boast of their capability to enhance ‘immersion’. These technologies could be used to visualise, navigate, annotate and share data in new ways – what we call inhabited data. In addition, they provide the means for an investigation of the taken-for-granted in a setting, in a similar vein to Garfinkel’s tutorials for students who tried to follow instructions while wearing inverting lenses. Another example is ‘light field’ technology, which promises a revolution in imaging that virtualises the camera (eg. position, depth of field, focus and frame rate). This could be used to explicitly challenge or problematise the objectivity of the ‘recording’ (or the ‘Tape’). A rediscovered conception of sound in three dimensions – nth-order ambisonics – allows computationally efficient representations of a sound field. This could enhance our sense of where in space an utterance or sound comes from. And lastly, there are bio-sensing devices, developed under the banner of the ‘quantified self’ (we prefer the ‘qualifiable we’), that may bring all the senses and more subtle perceptions of embodiment into our enquiry.

If you are engaged in similar experiments, then contact us, or consider a sabbatical at Aalborg University on the cutting edge of the periphery.


Ashmore, Malcolm & Reed, Darren (2000). Innocence and Nostalgia in Conversation Analysis: The Dynamic Relations of Tape and Transcript. Forum: Qualitative Social Research 1(3). [Online].

Erickson, Frederick (2004). Origins: A Brief Intellectual and Technological History of the Emergence of Multimodal Discourse Analysis. In LeVine, Philip & Scollon, Ron (Eds.), Discourse and Technology: Multimodal Discourse Analysis, Washington, DC: Georgetown University Press.

Goodwin, Charles (1993). Recording Interaction in Natural Settings. Pragmatics 3(2): 181-209.

Leeds‐Hurwitz, Wendy (1987). The Social History of the Natural History of an Interview: A Multidisciplinary Investigation of Social Communication. Research on Language and Social Interaction 20(1-4): 1-51.

McIlvenny, Paul (forthcoming). “Mobilising the Micro-Political Voice: Doing the ‘Human Microphone’ and the ‘Mic-Check’.” Journal of Language and Politics 16(1).

Modaff, John V. & Modaff, Daniel P. (2000). Technical Notes on Audio Recording. Research on Language & Social Interaction 33(1): 101-118.