Guest blog: Elliott Hoey and Chase Raymond on classic data

When Conversation Analysts gather, they sometimes analyse excerpts from recordings made by the early pioneers in the 1960s and 1970s – hissy audiotapes or scratchy black-and-white videotape. Some bits are so well known now that they’ve become shibboleths for the knowing community, which can feel off-putting to newcomers. Elliott Hoey and Chase Raymond have been looking into the matter for a forthcoming publication, and I’m very pleased that they’re willing to share their thoughts here.

Elliott Hoey

Chase Wesley Raymond

One distinctive research practice in CA is its longstanding reliance on a body of ‘classic’ data. Reading papers, especially from the early days, can sometimes feel like watching a soap opera starring Emma, Virginia, Shane, Geri, and Bud (the show is syndicated in ROLSI, among other places).

We’ve been reflecting on this research practice and came up with a few pros and cons of using classic data. This post is based on a chapter in a forthcoming volume, The Open Handbook of Linguistic Data Management, edited by Andrea Berez-Kroeker, Brad McDonnell, and Eve Koller, so look out for that when it comes out!

What’s good …

To start with the positives, first, the data are convenient. Using classic data precludes the need for the researcher to undertake the laborious work of transcription because transcripts already exist and are of reliably high quality, most having been transcribed by Jefferson herself. And on top of this, these recordings predate review boards, and so no ethical approval is required to use them for research.

Second, classic data enjoy wide familiarity within the CA research community. Many if not most analysts know these materials, either from working with them directly or by encountering them again and again in papers, presentations, and data sessions. They embody a kind of material culture for the discipline; not only are they well known, but particular parts of particular recordings have become shorthand for particular phenomena (‘n I’m up here in the Glen?, Sibbie’s sistuh hadda baby boy, yer line’s been busy, etc.).

Third, and building upon their familiarity, classic data contribute to CA’s empirical rigor. Because the interactions in classic recordings enjoy widespread recognition, analyses based on them may be more readily comprehended and consequently verified/contested.

Fourth, classic data are empirically generative. Contemporary CA research continues to be informed by these materials some half a century later. They are routinely a source for novel findings on their own (e.g., Holt, 2017) and also serve to corroborate analyses that are based on newer data (e.g., Clift, 2014). Methodologically, the continued usability of classic data and the endurance of the findings they have engendered together point to the strength and reliability of CA methods for analyzing interactive language use.

What’s not so good …

The practice of relying on classic data is not unproblematic, however. From a less flattering perspective, they are not ‘classic’ but rather ‘legacy’ data that can inhibit scientific development and exclude particular groups. Overreliance on specific data necessarily guides the kinds of questions that we (can) ask, the places we look for answers, and the shape such answers take. This kind of inertia or myopia isn’t specific to CA, however; a founder effect is likely to appear in any discipline with strong links to its original documents.

Perhaps the most pernicious effect of classic data is their contribution to the English bias in CA. While this is a natural consequence of CA’s historical emergence (anglophones analyzing English data), it can also have an exclusionary effect. Papers using English data will be read and cited more, while research on and researchers of other languages are at a comparative disadvantage. Similarly, those who might otherwise be interested in or convinced by CA may turn elsewhere.

Third, there’s something of an uncritical acceptance of the data. Their importance in the establishment and development of CA is unquestionable, but this doesn’t mean they should be treated as sacrosanct. Treating them that way can obscure the unavoidably political act of rendering speakers into text.

Finally, another way the research practice may exclude emerges from its communal familiarity. The extensive use and recognition of these sources, as well as their cultural importance for the discipline, produce the appearance of communal ownership: the sense that everyone has these recordings and transcripts. But this is a false communality. Access to the classic data is not equal, and in fact appears to be confined to those with connections to CA’s historical centers of gravity.

Final thoughts, for the moment…

We make these observations as a way to encourage reflection on the use of classic data, especially regarding our practices around data sharing. Classic recordings and transcripts are not merely the materials out of which we fashion our findings. Perhaps that is what they were at the time of recording, but today they also stand as objects that mediate our professional relationships and shape our disciplinary culture.