QUIVIRR, VOL. 2 (2021) DOI: 10.5278/ojs.quivirr.v2.2021.a0004
Multimodal (Inter)action Analysis in a Nutshell: Philosophy,
Theory, Method and Methodology
School of Communication Studies
Auckland University of Technology
This paper presents a concise introduction to multimodal (inter)action analysis
(MIA), which began to be developed in the early 2000s in tandem with technological
advances for visual qualitative research. By now, MIA has grown into a fully-fledged
research framework, including multimodal philosophy, theory, method and
methodology for the study of human action, interaction and identity. With systematic
phases from data collection to transcription (including transcription conventions) and
data analysis, this framework allows researchers to work in a data-driven and
replicable manner moving past common interpretive paradigms (Norris 2019, 2020).
Keywords: Multimodal (inter)action analysis, multimodal philosophy, multimodal theory,
multimodal method, multimodal methodology
Video-based qualitative research is an exciting area to work in. There are many new
developments and perspectives. The one that we work in at the AUT Multimodal
Research Centre is multimodal (inter)action analysis (MIA) (Norris 2004a, b, 2011a,
2019, 2020; Geenen et al. 2015), which has grown from theory and methodology to a
fully-fledged framework. I will take a moment to set out the state of this framework
and discuss the role that audio-visual technology plays in MIA.
2. What is MIA?
First, multimodal (inter)action analysis is a coherent framework for analysing
video-based qualitative data on human action, interaction and identity, which was made
possible through technological advances. Second, and often in connection with
video-based data, this framework can be used to analyse other types of data, from
interviews, images and chats to diaries.
What makes MIA coherent is that all parts from philosophy to theory, method and
methodology are interconnected. On the overarching philosophical stratum, MIA posits
SIGRID NORRIS MIA IN A NUTSHELL
that human action, interaction and identity come about through a primacy of
perception and a primacy of embodiment (Norris, 2019). On the theoretical stratum,
MIA follows MDA (Scollon, 1998, 2001), insisting on the principles of social action
(including communication) and history (Norris, 2020). On this theoretical level, MIA
further follows MDA by arguing that all human actions are mediated, and that
mediated actions with a history are practices. Then, MIA moves on from MDA to
multimodal mediated theory, noting that mediated actions appear on different levels
(lower-level, higher-level and frozen) (Norris, 2004a). Multimodal mediated theory
further suggests that modes are systems of mediated actions (Norris, 2013) and
discourses are practices (mediated actions with a history) with an institutional and/or
ideological dimension (Norris, 2020).
[Figure 1 diagram, text of the boxes: Primacy of Perception & Primacy of Embodiment. MULTIMODAL MEDIATED THEORY: Principles of social action (incl. communication) & history. Units of analysis: mediated action (lower-level, higher-level & frozen), mode as system of mediated action, practices & discourses. Analysing data by methodological tools such as modal density, modal means, scale of actions, levels of action, agency, site of engagement etc.]
Figure 1 - Multimodal (inter)action analysis is a coherent framework for the analysis of
human action, interaction and identity
When looking at Figure 1, we see how the philosophical and the theoretical strata
seemingly sit above method and methodology. However, a two-dimensional image
does not do this framework justice. Rather, philosophy and theory play an integral
part in all phases depicted in the method and methodology sections.
QUALITATIVE VIDEO RESEARCH REPORTS
2.1. The Phases of MIA and the technology needed
The method part is divided into four phases (Norris, 2019), all of which depend on
audio-visual technology, but none are technology-specific. In other words, researchers
can use the kind of technology they have easy access to, without having to download
any kind of specific software. The first two phases help us collect and keep track of
multimodal, and often diverse, data pieces in a systematic manner. In these first two
phases, the researcher initially produces data collection tables and then data set tables.
Regular computer software from Word to Excel, free writing software, or some kind
of qualitative data analysis software, can be used for this.
The next two phases are the transcription phases. In Phase III, interviews, videos
as well as all other data are transcribed into higher-level action tables and the higher-
level actions are then bundled. By doing this, we systematically work through all of
the data that has been collected in a consistent manner, allowing us to work in a data-
driven way that moves beyond common interpretive paradigms (Norris 2019, 2020).
As with Phases I & II, easily accessible computer software chosen by the researcher
can be used.
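Because the framework does not prescribe any particular software, a higher-level action table can be kept in whatever format the researcher prefers. As one possible illustration only, the following Python sketch stores a small table in a spreadsheet-compatible CSV format and groups its rows into bundles. The column names (`data_piece`, `start`, `higher_level_action`, `bundle`), the example rows and the two helper functions are hypothetical and are not part of the MIA conventions; the bundle labels stand in for decisions that the researcher makes while working through the data.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows of a higher-level action table. The column names and
# the entries are illustrative assumptions, not prescribed by MIA.
ROWS = [
    {"data_piece": "video_01", "start": "00:00:05",
     "higher_level_action": "greeting a family member", "bundle": "opening"},
    {"data_piece": "video_01", "start": "00:01:40",
     "higher_level_action": "showing an object to the camera", "bundle": "showing"},
    {"data_piece": "interview_01", "start": "00:00:10",
     "higher_level_action": "greeting the interviewer", "bundle": "opening"},
]


def bundle_actions(rows):
    """Group higher-level actions by their researcher-assigned bundle label."""
    bundles = defaultdict(list)
    for row in rows:
        entry = (row["data_piece"], row["start"], row["higher_level_action"])
        bundles[row["bundle"]].append(entry)
    return dict(bundles)


def to_csv(rows):
    """Serialise the table as CSV text that any spreadsheet program can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Print each bundle with the data pieces it draws on.
    for label, actions in bundle_actions(ROWS).items():
        pieces = sorted({piece for piece, _, _ in actions})
        print(f"{label}: {len(actions)} higher-level action(s) from {pieces}")
```

A table of this kind could equally be kept in Word or Excel; the sketch only shows that bundling amounts to grouping higher-level actions from all data pieces under shared labels, so that every piece of data is worked through in the same way.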
During and/or after the bundling of higher-level actions in Phase III, we select
video-data pieces for micro analyses, which are then transcribed in detail by using
multimodal transcription conventions (Norris 2004a, 2011a, 2019, 2020) in Phase IV.
Here, we again rely on readily accessible audio-visual technology. Rather than relying
on particular video-editing software, researchers can use whichever program allows
them to examine individual frames and take screenshots.1 As transcription continues
and researchers assemble the screenshots and embed circles, arrows, overlaid
language, etc., they again choose easily accessible software that allows
this kind of assembly. This can be done in writing programs, photo-editing programs,
or even in software such as PowerPoint.
The video excerpts that we transcribe in this detail are usually quite short (often
from a few seconds to less than a minute). The reason for this is that we are interested
in the great detail of an unfolding moment: how does the moment unfold with and
through the multiple modes that are involved? As we transcribe the actions performed
in great detail, we begin to see what we cannot see when watching a video clip. There
are small movements, gaze-changes, etc. that are so very easily missed when watching
a video, but which in fact drive the interaction under scrutiny in the direction that it is
moving. During multimodal transcription, we embed those findings in the transcript
and highlight them through the circles, arrows, etc. mentioned above to make them
clearly visible (Norris, 2016). Because of our transcription conventions, we can ensure
that our process is replicable.
Once we have produced transcripts of pertinent excerpts from our video data, we
engage with methodological tools that are relevant for the data pieces such as modal
density (Norris 2004a), modal configuration (Norris 2009a), the foreground-
background continuum of attention/awareness (Norris 2004a; 2008),
semantic/pragmatic means (Norris 2004a), levels of action (Norris 2009b), scales of
action (Norris 2017b), agency (Norris 2005; Pirini, 2017), or the site of engagement
(Scollon 1998, 2001; Norris, 2004a, 2019, 2020; Norris and Jones 2005). Here again,
we rely on audio-visual technology without, however, favouring any one kind.
1 As to which frames are selected, please have a look at Norris (2004a, 2019).
2.2. Some areas of human action, interaction, and identity where MIA has been
Multimodal (inter)action analysis is thus a coherent and comprehensive research
framework for the analysis of qualitative video-based data.2 All the pieces in this
framework fit together (Norris 2012; Pirini 2014b), allowing the researcher to build a
coherent picture of whatever human action, interaction or identity is being studied. In
this way, we have made strides in examining space and place or children’s acquisition
(Geenen 2013, 2017, 2018); identity (Norris 2005, 2007, 2008, 2011a; Norris
and Makboon 2015; Matelau-Doherty and Norris 2021); video conferences (Norris
2017a; Norris and Pirini 2017); business coaching, high school tutoring and
intersubjectivity (Pirini 2013, 2014a, 2016), to name but a few areas in which the
framework has been used. What we at the AUT Multimodal Research Centre are
finding is that with a coherent framework such as MIA, there is much potential to
discover new insights and knowledge about any kind of human action, interaction, and
identity.
3. In summary
Multimodal (inter)action analysis is a framework that allows us to make new
discoveries about human action, interaction and identity, and while MIA uses and
relies on audio-visual technology, MIA is software-independent. No doubt, as
technology advances, MIA will advance in tandem.
References
Geenen, Jarret. 2013. Actionary Pertinence: Space to Place in Kitesurfing.
Multimodal Communication 2 (2): 123–153. https://doi.org/10.1515/mc-2013-
Geenen, Jarret. 2017. Show (and Sometimes) Tell: Identity Construction and the
Affordances of Video‐Conferencing. Multimodal Communication 6 (1): 1–18.
2 MIA also allows the integration of other kinds of data and allows for the same systematic collection,
analysis and transcription of all varied data.
Geenen, Jarret. 2018. The Acquisition of Interactive Aptitudes: A Microgenetic Case
Study. Pragmatics & Society 9 (4): 518-544.
Geenen, Jarret, Sigrid Norris, and Boonyalakha Makboon. 2015. Multimodal
Discourse Analysis. In The International Encyclopedia of Language and
Social Interaction, edited by Karen Tracy, Cornelia Ilie and Todd Sandel, 1-17.
Hoboken, NJ: Wiley‐Blackwell.
Matelau-Doherty, M. Tui, and Sigrid Norris. 2021. A Methodology to Examine
Identity: Multimodal (Inter)action Analysis. In The Cambridge Handbook of
Identity, edited by Michael Bamberg, Carolin Demuth and Meike Watzlawik,
304-323. Cambridge: Cambridge University Press.
Norris, Sigrid. 2004a. Analyzing Multimodal Interaction: A Methodological
Framework. London: Routledge.
Norris, Sigrid. 2004b. Multimodal Discourse Analysis: A Conceptual Framework. In
Discourse and Technology: Multimodal Discourse Analysis, edited by Philip
Levine and Ron Scollon, 101–115. Washington, DC: Georgetown University
Norris, Sigrid. 2005. Habitus, Social Identity, the Perception of Male Domination –
And Agency? In Discourse in Action: Introducing Mediated Discourse
Analysis, edited by Sigrid Norris and Rodney Jones, 183–197. London: Routledge.
Norris, Sigrid. 2006. Multiparty Interaction: A Multimodal Perspective on
Relevance. Discourse Studies 8 (3): 401–421.
Norris, Sigrid. 2007. The Micropolitics of Personal National and Ethnicity Identity.
Discourse and Society 18 (5): 653–674.
Norris, Sigrid. 2008. Some Thoughts on Personal Identity Construction: A
Multimodal Perspective. In Advances in Discourse Studies, edited by Vijay
Bhatia, John Flowerdew and Rodney Jones, 132–149. London: Routledge.
Norris, Sigrid. 2009a. Modal Density and Modal Configurations: Multimodal
Actions. In The Routledge Handbook of Multimodal Analysis, edited by Carey
Jewitt. London: Routledge.
Norris, Sigrid. 2009b. Tempo, Auftakt, Levels of Actions, and Practice: Rhythms in
Ordinary Interactions. Journal of Applied Linguistics 6 (3): 333–356.
Norris, Sigrid. 2011a. Identity in (Inter)action: Introducing Multimodal (Inter)action
Analysis. Berlin: De Gruyter Mouton.
Norris, Sigrid. 2012. Multimodal (Inter)action Analysis: An Integrative
Methodology. In Body – Language – Communication, edited by Cornelia
Müller, Ellen Fricke, Alan Cienki and David McNeill, 275-286. Berlin: De
Norris, Sigrid. 2013. What is a Mode? Smell, Olfactory Perception, and the Notion
of Mode in Multimodal Mediated Theory. Multimodal Communication 2 (2),
Norris, Sigrid. 2016. Concepts in Multimodal Discourse Analysis with Examples
from Video Conferencing. Yearbook of the Poznan Linguistic Meeting 2 (1),
Norris, Sigrid. 2017a. Rhythmus und Resonanz in Internationalen
Videokonferenzen. In Resonanz, Rhythmus & Synchronisierung:
Erscheinungsformen und Effekte, edited by Thiemo Breyer, Michael B.
Buchholz, Andreas Hamburger, Stefan Pfänder and Elke Schumann, 85-102.
Norris, Sigrid. 2017b. Scales of Action: An Example of Driving & Car Talk in
Germany and North America. Text & Talk 37 (1): 117–139.
Norris, Sigrid. 2019. Systematically Working with Multimodal Data: Research
Methods in Multimodal Discourse Analysis. Hoboken, NJ: Wiley Blackwell.
Norris, Sigrid. 2020. Multimodal Theory and Methodology: For the Analysis of
(Inter)action and Identity. London: Routledge.
Norris, Sigrid, and Jones, Rodney H. 2005. Discourse in Action: Introducing
Mediated Discourse Analysis. London: Routledge.
Norris, Sigrid, and Boonyalakha Makboon. 2015. Objects, Frozen Actions, and
Identity: A Multimodal (Inter)action Analysis. Multimodal Communication 4
(1): 43–60. https://doi.org/10.1515/mc-2015-0007
Norris, Sigrid, and Jesse Pirini. 2017. Communicating Knowledge, Getting
Attention, and Negotiating Disagreement via Videoconferencing Technology:
A Multimodal Analysis. Journal of Organizational Knowledge
Communication 3 (1): 23-48. https://doi.org/10.7146/jookc.v3i1.23876
Pirini, Jesse. 2013. Analysing Business Coaching: Using Modal Density as a
Methodological Tool. Multimodal Communication 2 (2): 195–215.
Pirini, Jesse. 2014a. Producing Shared Attention/Awareness in High School
Tutoring. Multimodal Communication 3 (2): 163–179.
Pirini, Jesse. 2014b. Introduction to Multimodal (Inter)action Analysis. In
Interactions, Images and Texts: A Reader in Multimodality, edited by Sigrid
Norris and Carmen D. Maier, 77–92. Berlin: De Gruyter Mouton.
Pirini, Jesse. 2016. Intersubjectivity and Materiality: A Multimodal Perspective.
Multimodal Communication 5 (1): 1–14. https://doi.org/10.1515/mc-2016-
Pirini, Jesse. 2017. Agency and Co‐production: A Multimodal Perspective.
Multimodal Communication 6 (2): 1–20. https://doi.org/10.1515/mc-2016-
Scollon, Ron. 1998. Mediated Discourse as Social Interaction: A Study of News
Discourse. London: Longman.
Scollon, Ron. 2001. Mediated Discourse: The Nexus of Practice. London: