First steps in qualitative data analysis: Transcribing


Qualitative research in primary care deepens understanding of phenomena such as health, illness and health care encounters. Many qualitative studies collect audio or video data (e.g. recordings of interviews, focus groups or talk in consultation), and these are usually transcribed into written form for closer study. Transcribing appears to be a straightforward technical task, but in fact involves judgements about what level of detail to choose (e.g. omitting non-verbal dimensions of interaction), data interpretation (e.g. distinguishing ‘I don’t, no’ from ‘I don’t know’) and data representation (e.g. representing the verbalization ‘hwarryuhh’ as ‘How are you?’). Representation of audible and visual data into written form is an interpretive process which is therefore the first step in analysing data. Different levels of detail and different representations of data will be required for projects with differing aims and methodological approaches. This article is a guide to practical and theoretical considerations for researchers new to qualitative data analysis. Data examples are given to illustrate decisions to be made when transcribing or assigning the task to others.
Family Practice
25:127-131, 2008. First published 27 Feb 2008; Fam. Pract.
Julia Bailey
Bailey J. First steps in qualitative data analysis: transcribing. Family Practice 2008; 25: 127–131.
Keywords. Audio recording, data transcription, data analysis, qualitative research, video recording.
Qualitative research can explore the complexity and meaning of social phenomena, for example patients’ experiences of illness and the meanings of apparently irrational behaviour such as unsafe sex. Data for qualitative study may comprise written texts (e.g. documents or field notes) and/or audible and visual data (e.g. recordings of interviews, focus groups or consultations). Recordings are transcribed into written form so that they can be studied in detail, linked with analytic notes and/or coded.
Word limits in medical journals mean that little detail is usually given about how transcribing is actually done. Authors’ descriptions in papers convey the impression that transcribing is a straightforward technical task, summed up using terms such as ‘verbatim transcription’. However, representing audible talk as written words requires reduction, interpretation and representation to make the written text readable and meaningful. This article unpicks some of the theoretical and practical decisions involved in transcribing, for researchers new to qualitative data analysis.
What are the aims of the research project?
Researchers’ methodological assumptions and disciplinary backgrounds influence what are considered relevant data and how data should be analysed. To take an example, talk between hospital consultants and medical students could be studied in many different ways: the transcript of a teaching session could be analysed thematically, coding the content (topics) of talk. Analysis could also look at the way that developing an identity as a doctor involves learning to use language in particular ways, for example, using medical terminology in genres such as the ‘case history’. The same data could be analysed to explore the construction of ‘truth’ in medicine: for example, a doctor saying ‘the patient’s blood pressure is 120/80’ frames this statement as an objective, quantifiable, scientific truth. In contrast, formulating a patient’s medical history with statements such as ‘she reports a pain in the left leg’ or ‘she denies alcohol use’ frames the patient’s account as less trustworthy than the doctor’s observations. The aims of a project and methodological assumptions have implications for the form and
content of transcripts since different features of data
will be of analytic interest.
What level of detail is required?
Making recordings involves reducing the original data, for example, selecting particular periods of time and/or particular camera angles. Selecting which data have significance reflects underlying assumptions about what count as data for a particular project, for example, whether social talk at the beginning and end of an interview is to be included or the content of a telephone call which interrupts a consultation.
Visual data
Verbal and non-verbal interaction together shape communicative meaning. The aims of the project should dictate whether visual information is necessary for data interpretation, for example, room layout, body orientation, facial expression, gesture and the use of equipment in consultation. However, visual data are more difficult to process since they take a very long time to transcribe, and there are fewer conventions for how to represent visual elements on a transcript.
Capturing how things are said
The meanings of utterances are profoundly shaped by the way in which something is said in addition to what is said. Transcriptions need to be very detailed to capture features of talk such as emphasis, speed, tone of voice, timing and pauses, but these elements can be crucial for interpreting data.
Example 1
The following example shows how the addition of pauses, laughter and body conduct to a transcript invites a different interpretation of an exchange between doctor and patient. The excerpt below is taken from near the end of a consultation, after the doctor has made the diagnosis of a viral infection which does not warrant antibiotics. Transcribing the verbal content alone, it appears that the patient is happily accepting the doctor’s advice:
Dr 9: I would suggest yes paracetamol is a good symptomatic treatment, and you’ll be fine
Pt K: fine, okay, well, thank you very much.
Representing (some) non-verbal features of the interaction on the transcript changes the interpretation of this two-line interaction (see Appendix, transcription conventions):
Dr 9: (..) I would suggest (..) yes paracetamol or ibuprofen is a good (..) symptomatic treatment (..) um (.) (slapping hands on thighs) and you’ll be fine
Pt K: fine (..) okay (.) well (..) (shrugging shoulders and laughing) thank you very much
In the second representation of this interaction, both speakers pause frequently. The doctor slaps his thigh and uses the idiom ‘you’ll be fine’ to wrap up his advice giving. In response, Patient K is hesitant and he uses the mitigation ‘well’, shrugs his shoulders and laughs, suggesting turbulence or difficulty in interaction. Although the patient’s words seem to indicate agreement, the way these words are said seems to indicate the opposite.
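The reduction from the detailed transcript to the verbal-only version in Example 1 can be sketched in code. This is only an illustration in Python, assuming the pause and body-conduct notation listed in the Appendix; the function names `verbal_only` and `count_pauses` are hypothetical and are not part of any transcribing software.

```python
import re

# Patterns assume the Appendix conventions: (.) and (..) for short silences,
# (1.0) for timed silences, and (free text) for body conduct.
PAUSE = re.compile(r"\((?:\.{1,2}|\d+(?:\.\d+)?)\)")
BODY_CONDUCT = re.compile(r"\([^().\d?][^()]*\)")


def verbal_only(turn: str) -> str:
    """Drop pauses and body conduct, leaving only the words spoken."""
    text = PAUSE.sub(" ", turn)
    text = BODY_CONDUCT.sub(" ", text)
    return " ".join(text.split())  # collapse the leftover whitespace


def count_pauses(turn: str) -> int:
    """Count the silences that the verbal-only version throws away."""
    return len(PAUSE.findall(turn))


# Patient K's turn from Example 1, in its detailed form.
detailed = ("fine (..) okay (.) well (..) "
            "(shrugging shoulders and laughing) thank you very much")

print(verbal_only(detailed))   # fine okay well thank you very much
print(count_pauses(detailed))  # 3
```

The stripped output reads as simple agreement, while the pause count and the discarded body conduct are exactly the features that suggested hesitancy in the analysis above.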
Example 2
In another example, the addition of non-verbal features again gives a deeper understanding of the doctor–patient interaction. This patient has consulted on a Saturday morning with sore throat symptoms. In the extract below, the doctor seeks clarification about Patient F’s symptoms:
Dr 5: So let’s just go back to this. So, so you’ve had this for a few weeks
Pt F: yes
Adding in non-verbal features conveys that this is a potentially problematic exchange:
Dr 5: .hhh so let’s just go back to this (.) so (..) so you’ve had this for a few weeks
Pt F: yes (1.0) (left hand on throat, stroking with fingers)
The doctor starts with a prominent in-breath and stresses the word ‘weeks’ in her recapping of the duration of symptoms. Patient F responds, but there is then a prominent pause during which he strokes his throat with his fingers (the site of his sore throat). The 1-second pause is ‘accountable’, in other words something is expected in this space. Patient F does not expand on his answer, but his gesture visibly demonstrates his symptoms. The duration of the symptoms (a few weeks) appears therefore to be accountable, in other words to need explaining. The doctor addresses this accountability directly in her next turn:
Dr 5: I must ask you (.) why have you come in today because it is a Saturday morning (1.0) it’s for urgent cases only that really have just started
Pt F: Yes because it has been troubling me since last last night (left hand still on neck)
This more detailed level of transcribing facilitates analysis of the social relationship between doctor and patient; in this example, the consequences for the doctor–patient interaction of consulting in an urgent surgery with ‘minor’ symptoms.
Data must inevitably be reduced in the process of transcribing, since interaction is hugely complex. Decisions therefore need to be made about which features
of interaction to transcribe: the level of detail necessary depends upon the aims of a research project, and there is a balance to be struck between readability and accuracy of a transcript.
Who should do the transcribing?
Transcribing is often delegated, for example to a junior researcher or medical secretary, but this can be a mistake if the transcriber is inadequately trained or briefed. Transcription involves close observation of data through repeated careful listening (and/or watching), and this is an important first step in data analysis. This familiarity with data and attention to what is actually there rather than what is expected can facilitate realizations or ideas which emerge during analysis.
Transcribing takes a long time (at least 3 hours per hour of talk and up to 10 hours per hour with a fine level of detail including visual detail) and this should be allowed for in project time plans, budgeting for researchers’ time if they will be doing the transcribing.
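As a rough planning aid, the ratios quoted above (3 to 10 hours of transcribing per hour of talk) can be turned into a time budget. The sketch below is a minimal illustration in Python; the function name `transcription_hours` and the example project of 20 interviews of 45 minutes each are assumptions made for illustration, not figures from the study.

```python
def transcription_hours(recorded_hours: float,
                        low_ratio: float = 3.0,
                        high_ratio: float = 10.0) -> tuple[float, float]:
    """Return (minimum, fine-detail) transcribing hours for a set of recordings.

    The default ratios are the ones quoted in the text: at least 3 hours per
    hour of talk, and up to 10 hours per hour for fine detail including
    visual detail.
    """
    if recorded_hours < 0:
        raise ValueError("recorded_hours must not be negative")
    return recorded_hours * low_ratio, recorded_hours * high_ratio


# Hypothetical project: 20 interviews of 45 minutes each.
recorded = 20 * 45 / 60  # 15.0 hours of talk
low, high = transcription_hours(recorded)
print(f"{recorded} hours of talk: {low} to {high} hours of transcribing")
```

For this hypothetical project the budget ranges from 45 to 150 hours, which shows how strongly the chosen level of detail affects a project time plan.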
What contextual detail is necessary to interpret data?
Recordings may be difficult to understand because of the recording quality (e.g. quiet volume, overlaps in speech, interfering noise) and differing accents or styles of speech. Utterances are interpretable through knowledge of their local context (i.e. in relation to what has gone before and what follows), for example, allowing differentiation between ‘I don’t, no’ and ‘I don’t know’. Interaction is also understood in its wider context, such as understanding questions and responses to be part of an ‘interview’ or ‘consultation’ genre with particular expectations for speaker roles and the form and content of talk. For example, the question ‘how are you?’ from a patient in consultation would be interpreted as a social greeting, while the same question from a doctor would be taken as an invitation to recount medical problems. Contextual information about the research helps the transcriber to interpret recordings (if they are not the person who collected the data), for example, details about the project aims, the setting and participants and interview topic guides if relevant.
How should data be represented?
Written language is represented in particular standardized ways which are quite different from audible speech. For example, ‘hwaryuhh’ is much more easily read and understood if represented as separate words, with punctuation and capital letters, as ‘How are you?’. Choosing to use the grammar and spelling conventions of standard UK written English aids readability, but at the same time irons out the linguistic variety which is an important feature of cultural and subcultural identity.
For example, the following extract represents a patient speaking a Cockney English dialect (typically spoken by working class Londoners), in consultation with a doctor speaking English with Received Pronunciation (typically spoken by educated, middle class English people):
Dr 1: so what are your symptoms since yesterday (..) the aches
Pt B: aches ere (..) in me arm (..) sneezing (..) ed-
Dr 1: ummm (..) okay (..) and have you tried anything for this (.) at all?
Pt B: no (..) I ain’t a believer of me- (.) medicine to tell you the truth
Although this attempts to represent linguistic variety, using a more literal spelling is difficult to read and runs the risk of portraying respondents as inarticulate and/or uneducated. Even using standard written English, transcribed talk appears faltering and inarticulate. For example, verbal interaction includes false starts, repetitions, interruptions, overlaps, in- and out-breaths, coughs, laughs and encouraging noises (such as ‘mm’), and these features may be omitted to avoid cluttering the text.
If talk is mediated via an interpreter, decisions must be made about how to represent translation on a transcript, for example, whether to translate ‘literally’, and then to interpret the meaning in terms of the second language and culture. For example, from French to English, ‘j’ai mal au coeur’ translates literally as ‘I have bad in the heart’, interpreted in English as ‘I feel sick’. Translation therefore adds an additional layer of interpretation to the transcribing process.
Written representations reflect researchers’ interpretations. For example, laughter could be transcribed as ‘he he he’, ‘laughter (2 seconds)’, ‘nervous laughter’, ‘quiet laughter’ or ‘giggling’ and these representations convey different interpretations. The layout on paper and labelling also reflect analytic assumptions about data. For example, labelling speakers as ‘patient’ and ‘doctor’ implies that their respective roles in a medical encounter are more salient than other attributes such as ‘man’, ‘mother’, ‘Spanish speaker’ or ‘advice giver’. Talk is often presented in speech turns, with a new line for the next speaker (as in the data examples given), but could also be laid out in a timeline, in columns or in stanzas like poetry, for example. Transcripts are not therefore neutral records of events, but reflect researchers’ interpretations of data.
Presenting quotations in a research paper involves further steps in reduction and representation through the choice of which data to present and what to highlight. There is debate about what counts as relevant context in qualitative research. For example, studies usually describe the setting in which data were collected and demographic features of respondents such as their age and gender, but relevant contextual information could also include historical, political and policy context, participants’ physical appearance, recent news events, details of previous meetings and so on. Authors’ decisions on which data and what contextual information to present will lead to different framing of data.
What equipment is needed?
Decisions about the level of detail needed for a project will inform whether video or audio recordings are needed. Taking notes instead of making recordings is not sufficiently accurate or detailed for most qualitative projects. Digital audio and video recorders are rapidly replacing analogue equipment: digital recordings are generally better quality, but require computer software to store and process, and digital video files take up huge quantities of computer memory. It is usually necessary to play back recordings repeatedly: a foot-controlled transcription machine facilitates this for analogue audio tapes (see Fig. 1) and transcribing software is recommended for digital audio or video files, since this allows synchronous playback and typing (see Fig. 2).
Representation of audible and visible data into written form is an interpretive process which involves making judgments and is therefore the first step in analysing data. Decisions about transcribing are guided by the methodological assumptions underpinning a particular research project, and there are therefore many different ways to transcribe the same data. Researchers need to decide which level of transcription detail is required for a particular project and how data are to be represented in written form.
Transcribing is an interpretive act rather than simply a technical procedure, and the close observation that transcribing entails can lead to noticing unanticipated phenomena. It is impossible to represent the full complexity of human interaction on a transcript and so listening to and/or watching the ‘original’ recorded data brings data alive through appreciating the way that things have been said as well as what has been said.
FIGURE 1 Analogue audio recording equipment: dictaphone with microphone and mini-cassette tape and foot-pedal controlled transcription machine with headphones
FIGURE 2 Digital video recording equipment: video camera with firewire computer lead, mini DV cassette and Transana transcribing software
This paper derives from a PhD thesis written by Julia Bailey entitled ‘Doctor-patient consultations for upper respiratory tract infections: a discourse analysis’, which was supervised by Celia Roberts, Roger Jones and Jane Barlow. Thanks are due to doctors and patients who participated in the project, to practice staff, and to Anne Rouse for her advice on the practicalities of transcribing.
Funding: Primary Care Researcher Development award, Department of Health National Coordinating Centre for Research Capacity Development.
Ethical approval: East London and the City Ethical
Conflict of interest: None.
Transcription Conventions
(?) talk too obscure to transcribe.
Hhhhh audible out-breath
.hhh in-breath
[ overlapping talk begins
] overlapping talk ends
(.) silence, less than half a second
(..) silence, less than one second
(2.8) silence measured in tenths of a second
:::: lengthening of a sound
Becau- cut off, interruption of a sound
he says. Emphasis
= no silence at all between sounds
LOUD sounds
? rising intonation
(left hand on neck) body conduct
[notes, comments]
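
The conventions above amount to a small notation, and a transcript line written in it can be split into labelled events. The following Python sketch is an illustration only: it covers a subset of the symbols (silences, in-breaths, obscure talk and body conduct), the names `Event`, `tokenise` and `silence_seconds` are hypothetical, and the example turn is composed for illustration from elements of Example 2 rather than quoted from the data.

```python
import re
from typing import NamedTuple


class Event(NamedTuple):
    kind: str   # e.g. "word", "timed_pause", "body_conduct"
    value: str  # the text exactly as it appears on the transcript


# Alternatives are tried in order, so the specific bracketed symbols
# come before the general "(free text)" body-conduct pattern.
TOKEN = re.compile(r"""
    (?P<timed_pause>\(\d+(?:\.\d+)?\))   # (2.8) silence in tenths of a second
  | (?P<short_pause>\(\.\.\))            # (..)  silence, less than one second
  | (?P<micro_pause>\(\.\))              # (.)   silence, less than half a second
  | (?P<unclear>\(\?\))                  # (?)   talk too obscure to transcribe
  | (?P<body_conduct>\([^()]+\))         # (left hand on neck)
  | (?P<in_breath>\.h+)                  # .hhh
  | (?P<word>[^\s()]+)                   # anything else counts as a word
""", re.VERBOSE)


def tokenise(turn: str) -> list[Event]:
    """Split one speech turn into labelled events."""
    return [Event(m.lastgroup, m.group()) for m in TOKEN.finditer(turn)]


def silence_seconds(turn: str) -> float:
    """Add up the timed silences, such as (1.0), in a turn."""
    return sum(float(e.value[1:-1])
               for e in tokenise(turn) if e.kind == "timed_pause")


# Hypothetical turn assembled from features seen in Example 2.
turn = ".hhh so (..) you have had this for a few weeks (1.0) (left hand on neck)"
non_verbal = [e.kind for e in tokenise(turn) if e.kind != "word"]
print(non_verbal)
print(silence_seconds(turn))
```

Labelling events in this way does not remove the interpretive work described in this article: someone still has to decide which features to mark on the transcript in the first place.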
