This paper presents a concise introduction to multimodal (inter)action analysis (MIA), which began to be developed in the early 2000s in tandem with technological advances for visual qualitative research. By now, MIA has grown into a fully-fledged research framework, including multimodal philosophy, theory, method and methodology for the study of human action, interaction and identity. With systematic phases from data collection to transcription (including transcription conventions) and data analysis, this framework allows researchers to work in a data-driven and replicable manner, moving past common interpretive paradigms (Norris 2019, 2020).
1. Introduction
Video-based qualitative research is an exciting area to work in. There are many new
developments and perspectives. The one that we work in at the AUT Multimodal
Research Centre is multimodal (inter)action analysis (MIA) (Norris 2004a, b, 2011a,
2019, 2020; Geenen et al. 2015), which has grown from theory and methodology to a
fully-fledged framework. I will take a moment to set out the state of this framework
and discuss the role that audio-visual technology plays in MIA.
2. What is MIA?
First, multimodal (inter)action analysis is a coherent framework for analysing
video-based qualitative data on human action, interaction and identity, a framework
that was made possible by technological advances. Second, and often in connection with
video-based data, this framework can be used to analyse other types of data, from
interviews, images and chats to diaries.
What makes MIA coherent is that all parts from philosophy to theory, method and
methodology are interconnected. On the overarching philosophical stratum, MIA posits
that human action, interaction and identity come about through a primacy of
perception and a primacy of embodiment (Norris, 2019). On the theoretical stratum,
MIA follows MDA (Scollon, 1998, 2001), insisting on the principles of social action
(including communication) and history (Norris, 2020). On this theoretical level, MIA
further follows MDA by arguing that all human actions are mediated, and that
mediated actions with a history are practices. Then, MIA moves on from MDA to
multimodal mediated theory, noting that mediated actions appear on different levels
(lower-level, higher-level and frozen) (Norris, 2004a). Multimodal mediated theory
further suggests that modes are systems of mediated actions (Norris, 2013) and
discourses are practices (mediated actions with a history) with an institutional and/or
ideological dimension (Norris, 2020).
[Figure 1 shows the framework as layers. Philosophy: primacy of perception and primacy of embodiment. Theory: principles of social action (incl. communication) and history. Units of analysis: mediated action (lower-level, higher-level and frozen), mode as system of mediated action, practices and discourses. Method and methodology: Phases I to V, running from data collection to analysing data by engaging with methodological tools such as modal density, scale of actions, levels of action, agency, site of engagement, etc.]
Figure 1 - Multimodal (inter)action analysis is a coherent framework for the analysis of
human action, interaction and identity
When looking at Figure 1, we see how the philosophical and the theoretical strata
seemingly sit above method and methodology. However, a two-dimensional image
does not do this framework justice. Rather, philosophy and theory play an integral
part in all phases depicted in the method and methodology sections.
2.1. The Phases of MIA and the technology needed
The method part is divided into four phases (Norris, 2019). All of them depend on
audio-visual technology, but none requires a specific product. In other words, researchers
can use whatever technology they have easy access to, without having to install
any particular software. The first two phases help us collect and keep track of
multimodal, and often diverse, data pieces in a systematic manner. In these first two
phases, the researcher initially produces data collection tables and then data set tables.
Regular computer software from Word to Excel, free writing software, or some kind
of qualitative data analysis software can be used for this.
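For researchers who prefer a scripted table to a Word or Excel file, the record-keeping idea behind a data collection table can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names (piece_id, data_type, collected_on, participants, notes), the helper functions and the three sample rows are all hypothetical and are not the table layout prescribed in Norris (2019). Any tool that keeps one row per data piece would serve equally well.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class DataPiece:
    """One row of a hypothetical data collection table (columns are assumptions)."""
    piece_id: str
    data_type: str      # e.g. "video", "interview", "image", "chat", "diary"
    collected_on: str   # ISO date string
    participants: str
    notes: str = ""


def write_collection_table(pieces, handle):
    """Write all data pieces as CSV rows, with a header row, to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=[f.name for f in fields(DataPiece)])
    writer.writeheader()
    for piece in pieces:
        writer.writerow(asdict(piece))


def count_by_type(pieces):
    """Count how many pieces of each data type have been collected."""
    counts = {}
    for piece in pieces:
        counts[piece.data_type] = counts.get(piece.data_type, 0) + 1
    return counts


# Invented sample rows, for illustration only.
pieces = [
    DataPiece("V01", "video", "2020-03-02", "P1; P2", "kitchen recording"),
    DataPiece("I01", "interview", "2020-03-03", "P1"),
    DataPiece("V02", "video", "2020-03-09", "P2; P3"),
]

buffer = io.StringIO()
write_collection_table(pieces, buffer)
print(buffer.getvalue())
print(count_by_type(pieces))
```

Writing to a real file instead of the in-memory buffer produces a CSV that opens in Excel or any other spreadsheet program, so the sketch stays in line with the software independence described above.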
The next two phases are the transcription phases. In Phase III, interviews, videos
as well as all other data are transcribed into higher-level action tables and the higher-
level actions are then bundled. By doing this, we systematically work through all of
the data that has been collected in a consistent manner, allowing us to work in a data-
driven way that moves beyond common interpretive paradigms (Norris 2019, 2020).
As with Phases I & II, easily accessible computer software chosen by the researcher
is used.
During and/or after the bundling of higher-level actions in Phase III, we select
video-data pieces for micro analyses, which are then transcribed in detail by using
multimodal transcription conventions (Norris 2004a, 2011, 2019, 2020) in Phase IV.
Here, we again rely on readily accessible audio-visual technology. Rather than relying
on a particular video-editing program, researchers can use whichever one allows
them to examine individual frames and take screenshots.1 As transcription continues
and researchers assemble the screenshots and embed circles, arrows, overlaid
language, etc., they again choose easily accessible software that allows
this kind of assembly. This can be done in writing programs, photo-editing programs,
or even in software such as PowerPoint.
The video excerpts that we transcribe in this detail are usually quite short (often
from a few seconds to less than a minute). The reason for this is that we are interested
in the great detail of an unfolding moment: how does the moment unfold with and
through the multiple modes that are involved? As we transcribe the actions performed
in great detail, we begin to see what we cannot see when watching a video clip. There
are small movements, gaze-changes, etc. that are so very easily missed when watching
a video, but which in fact drive the interaction under scrutiny in the direction that it is
moving. During multimodal transcription, we embed those findings in the transcript
and highlight them through the circles, arrows, etc. mentioned above to make them
clearly visible (Norris, 2016). Because of our transcription conventions, we can ensure
that our process is replicable.
Once we have produced transcripts of pertinent excerpts from our video data, we
engage with methodological tools that are relevant for the data pieces, such as modal
density (Norris 2004a), modal configuration (Norris 2009a), the foreground-background
continuum of attention/awareness (Norris 2004a, 2008),
semantic/pragmatic means (Norris 2004a), levels of action (Norris 2009b), scales of
action (Norris 2017b), agency (Norris 2005; Pirini, 2017), or the site of engagement
(Scollon 1998, 2001; Norris, 2004a, 2019, 2020; Norris and Jones 2005). Here again,
we rely on audio-visual technology without, however, favouring any one kind.
1 As to which frames are selected, see Norris (2004a, 2019).
2.2. Some areas of human action, interaction, and identity where MIA has been
Multimodal (inter)action analysis is thus a coherent and comprehensive research
framework for the analysis of qualitative video-based data.2 All the pieces in this
framework fit together (Norris 2012; Pirini 2014b), allowing the researcher to build a
coherent picture of whatever human action, interaction or identity is being studied. In
this way, we have made strides in examining space and place or children’s acquisition
(Geenen 2013; Geenen 2017, 2018); identity (Norris 2005, 2007, 2008, 2011; Norris
and Makboon 2015; Matelau-Doherty and Norris 2021); video conferences (Norris
2017a; Norris and Pirini 2017); business coaching, high school tutoring and
intersubjectivity (Pirini 2013, 2014a, 2016), to name but a few areas in which the
framework has been used. What we at the AUT Multimodal Research Centre are
finding is that with a coherent framework such as MIA, there is much potential to
discover new insights and knowledge about any kind of human action, interaction, and
identity.
3. In summary
Multimodal (inter)action analysis is a framework that allows us to make new
discoveries about human action, interaction and identity, and while MIA uses and
relies on audio-visual technology, MIA is software-independent. No doubt, as
technology advances, MIA will advance in tandem.
