Guest blog: Gareth Walker on how acoustic data are represented

Quite often a ROLSI article touches on a matter that will interest a very wide range of readers, and Gareth Walker’s account of how acoustic data are represented is a very good example. The range of representations is wide, and not all are equally good for the same things; some may even be misleading. I’m delighted that Gareth has agreed to go into some of the thinking that prompted him to write the piece.

Gareth Walker, University of Sheffield

In a new article in ROLSI I talk about how visual representations of acoustic data (pitch traces, waveforms, spectrograms etc.) have been prepared and used in ROLSI articles, trying to encourage researchers to think about their construction, purpose and use. For me, the main purposes are to provide corroborative evidence for the researcher’s claims, and to allow the reader to independently verify those claims.

I’m interested in the ability of visual representations to efficiently convey relevant information. Along with a dialectologist colleague, Chris Montgomery, I am currently leading a theme within a module for MA students in the School of English at The University of Sheffield on visual representations of linguistic data (graphs, tables, maps, charts etc.). Students bring along to each session visual representations they have found for themselves in published research.

Where things can go wrong

Boo-boos we have identified so far include tables where the numbers don’t add up, graphs which are harder to read than the accompanying tables (and sometimes at odds with them), and other graphs which are fundamentally meaningless because of the way they have been prepared. (I’m pleased to say none were from ROLSI!) We have also noted that redundancy in visual representations – the same data being presented in several different ways – is rife. Remember: part of the point of a visual representation is that it conveys relevant information more efficiently than a textual description. The sample is of course skewed: given the context of the discussions, students are no doubt drawn to visual representations that are deficient in some way. But the point is that it is not at all difficult to find visual representations which are deficient in some, often major, way.

Phoneticians being visual

A funny thing about being a phonetician is that nowadays as well as listening, we spend a lot of our research time looking at pictures of one type or another. In part this is because of advances in desktop computing, and the ready availability of software to assist with phonetic analysis. Praat is the obvious example but there are others: Wavesurfer, for example. A lot of the time we are looking at pictures to see if we can locate corroborative evidence for our auditory impressions. It’s almost inevitable that (hopefully!) corroborative evidence will be provided for our colleagues, audience-members and readers in the form of some kind of visual representation. So we have to do what we can to ensure that the visual representations we offer are doing their job. As I point out in my article it’s through those visual representations that we ‘get at’ the data.

In their excellent article in ROLSI, Steve Clayman and Chase Raymond provide pitch traces and spectrograms of portions of their examples to help the reader ‘get at’ their data and verify their observations. I’ll talk about just one, and quite a specific aspect. The transcription and analysis of their example (10) is accompanied by this visual representation with a labelled pitch trace at the top and a spectrogram at the bottom. (Incidentally, anyone interested in beefing up their knowledge of phonetics, including reading spectrograms, should read Richard Ogden’s wonderful An Introduction to English Phonetics.)

From Clayman and Raymond (2015)

One of the points that they make is that there is no break in voicing between “turkey” and “you know”. They are, as we would expect, exactly right in making this observation. You can listen to the whole call on Talkbank; the utterance in question is at line 133. We don’t normally have the luxury of listening to the data being discussed, and instead have to rely on the visual representations provided by the researchers to verify their claims. But does their image do its job? As a phonetician, when I’m looking for evidence of the continuation of voicing (vibrations of the vocal folds), I look for two main things: periodicity (a repeating pattern) in a waveform, and striations (vertical lines) in a spectrogram. In the case of this particular join I would be looking at a display something like this, listening to the join as I looked.

A portion of the above data, re-presented


From this display of a portion of the same data (“key yih” of “turkey yihknow”) it’s clear that the waveform across the join is periodic, and that there are striations in the spectrogram. All great evidence to support the original claim of continued voicing across the join. But this is not so obvious from the visual representation in the original article: there is no waveform, and the spectrogram presents so much information that discrete striations can’t be identified.
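For readers who like to see the idea in numbers as well as pictures, here is a minimal sketch of what “periodicity in a waveform” can mean computationally. It is not taken from the article and it is not how Praat or any other package works internally. The function name `periodicity`, the 75–400 Hz search range and the synthetic test signals are all assumptions made for illustration. It measures how strongly a short stretch of signal resembles a time-shifted copy of itself, which is roughly what the eye does when it spots a repeating pattern.

```python
import numpy as np


def periodicity(frame, sample_rate, f0_min=75.0, f0_max=400.0):
    """Peak normalised autocorrelation within a plausible pitch range.

    Hypothetical helper for illustration only. Higher values suggest a
    repeating (periodic) pattern, as in voiced speech; values near zero
    suggest an aperiodic signal such as noise. Because the overlap between
    the frame and its shifted copy shrinks as the lag grows, even a
    perfectly periodic frame scores somewhat below 1.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()          # remove any DC offset
    energy = np.dot(frame, frame)
    if energy == 0.0:                     # silence: nothing to measure
        return 0.0

    # Autocorrelation at non-negative lags, scaled so that lag 0 equals 1.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    ac = ac / energy

    # Only consider lags that correspond to the assumed pitch range.
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), len(frame) - 1)
    return float(ac[lag_min:lag_max + 1].max())


if __name__ == "__main__":
    sr = 16000                            # assumed sampling rate (Hz)
    t = np.arange(int(0.040 * sr)) / sr   # one 40 ms frame

    # Synthetic stand-ins, not real speech: a 120 Hz tone with one
    # harmonic, and white noise from a seeded generator.
    voiced_like = np.sin(2 * np.pi * 120 * t) + 0.5 * np.sin(2 * np.pi * 240 * t)
    noise_like = np.random.default_rng(0).standard_normal(t.size)

    print("periodic tone:", round(periodicity(voiced_like, sr), 2))
    print("white noise:  ", round(periodicity(noise_like, sr), 2))
```

Run on successive short frames across a join such as “key yih”, a score that stays high throughout would be consistent with unbroken voicing, though a number like this supplements looking and listening and does not replace them.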

The usefulness of representations

I’ll finish with one last point which might provoke a bit of thought concerning visual representations. Let’s assume, for the sake of argument, that Leonardo da Vinci’s Mona Lisa is an accurate likeness of Lisa Gherardini. If we could travel back to when and where she lived, we would expect to be able to identify her from that visual representation. On the other hand, if what we had to go on was this charming line drawing by Melissa we surely wouldn’t stand a chance. The point is not that all visual representations of acoustic data should be as detailed as possible, but that the nature of the visual representation we are given has a significant impact on its usefulness to end-users. And with that we are back to thinking about the purpose of visual representations: in the case of visual representations of acoustic data in ROLSI, to provide corroborative evidence for the researcher’s claims, and to allow the reader to independently verify them.


Clayman, S. E., & Raymond, C. W. (2015). Modular pivots: A resource for extending turns at talk. Research on Language and Social Interaction, 48(4), 388-405.