
NVivo and AI: (Semi)-Automatic Coding

Abstract

This chapter explores the built-in Artificial Intelligence tools in NVivo, focusing on semi-automatic and automatic coding features. It discusses how AI can assist in the initial coding phases, potentially accelerating the analytical process in its early stages. The chapter critically examines the implications of AI in qualitative research, debating the balance between automation and researcher oversight. It shows how AI is only partially integrated into the program, is limited to the built-in language libraries and relies on crude statistical associations. NVivo cannot recognise language variants like sarcasm, double negatives, slang, dialect variations, idioms, or ambiguity, implying the researchers need to remain critical of the result of the AI auto-coding tools.
CHAPTER 19
NVivo and AI: (Semi)-Automatic Coding
Key messages in this chapter
• NVivo has both semi-automated and AI-driven automated coding tools.
• In non-AI-driven coding, NVivo uses structures already available in
  your data (like paragraph styles) as an aid to code your data.
• The AI-driven coding identifies themes in your data or uses
  previous coding work to code your data.
Artificial Intelligence (AI) and Automatic
Coding in Qualitative Research
AI is everywhere. The introduction of ChatGPT brought AI to the broad
public and introduced it to daily life and research. In data science, methodol-
ogists were using AI long before the introduction of ChatGPT. It is known as
Text Mining (Jo, 2019; Macanovic, 2022) and is quite advanced and success-
fully applied to large databases of textual data. Some scholars already used the
technology in qualitative research projects (Ciechanowski et al., 2020) and
on a philosophical level, methodologists are discussing the implications of AI
for qualitative research (Christou, 2023a, 2023b). Some AI tools were already
available in older versions of NVivo but only available as an add-on for which
users had to pay an additional license. From version 14, these tools are inte-
grated into the main program at no additional cost. Therefore, we will discuss
© The Author(s) 2025
D. Mortelmans, Doing Qualitative Data Analysis with NVivo, Springer
Texts in Social Sciences, https://doi.org/10.1007/978-3-031-66014-6_19
in this chapter the old semi-automatic coding tools and the new AI-based
coding tools. We define semi-automated coding as coding that does not use
AI (text mining or machine learning) technology but merely exploits
structure already present in the data, converting that structure into
codes. When referring to autocoding tools, we refer to the AI-based tools.
Earlier in this book, we have already discussed three ways of semi-automatic
coding. First, in Section “Aggregating Coding Work” in Chap. 8, we showed
the In Vivo Coding technique: when you select a word, you can let NVivo
automatically create a new code named after it and code the selected
reference with that new code. Second, in paragraph 13.9, we showed how to
code material with queries. Most prominently, the Text Search Query is well
suited to searching for textual phrases and coding them afterwards. Third,
we showed in Section “Importing Databases” in Chap. 18 how to autocode
survey data with cases in step 3 of the Survey Import Wizard.
In the following paragraphs, we start with the introduction of other tech-
niques of semi-automated coding that have not yet been discussed in Chap. 8.
Later, we introduce the AI-driven tools. We end the chapter with autocoding
on social media datasets.
Semi-auto Coding Based on the Paragraph Style
Paragraph styles are a collection of layout features that can be assigned to a
paragraph. The term comes from word processing and is very well established
in, for example, Word, where the styles for titles are called heading styles.
For example, the “Standard” paragraph style in Word includes the basic
characteristics of plain text: what font will that text be in, what font
size, what language, and so on. Each word processor has several built-in
profiles that are very similar to one another. For example, in addition to a
“Standard” layout profile, there are usually one or more layout profiles for
titles in a text, the so-called Heading layout profiles. A title on the first
level then gets “Heading 1” as its profile, a title on the level below it
gets “Heading 2”, and so on. These heading styles are built into both Word
and NVivo.
A first way of semi-automatic coding in NVivo uses these paragraph styles
in a text. The assumption behind this type of autocoding is that NVivo recog-
nises the paragraph styles in a text and uses them as a basis to add codes to
the formatted paragraphs. As a consequence, researchers must prepare their
transcripts in such a way that paragraphs in their transcripts have a particular
paragraph style. Moreover, the text must also be structured so that more or
less the same questions have the same styles attached to them.
We clarify this with an example in Fig. 19.1 from the interview with Barbara
(Navigation View > Data > Files > Interviews).
The researcher has structured the interview of Barbara with paragraph
styles. Plain text has the standard profile Normal. The main questions (Q1
in the example) are labelled with the paragraph style Heading 1. Names of
interviewers or interviewees have received the Heading 2 style. All questions
the interviewer asks (in blue text) are marked with the style Interviewer.
These paragraph styles can be attached to the text in Word (or another word
processor) and imported into NVivo. Or you can import a transcript without
any paragraph styles and add them while your document is in Edit mode.
[Normal] Interview with Barbara on February 19th, 2009, at her home in
Bettie, North Carolina. Barbara writes cooking curriculum materials and does
earth science environmental consulting work for soil scientists.
[Heading 1] Q.1. Connection to Down East
[Heading 2] Henry
[Interviewer] Tell me about your personal and family history in Down East.
How long have you or your family lived Down East full-time or part-time?
[Heading 2] Barbara
[Normal] My family moved here when I was two years old in 1969. My parents
still live here. They live down in Gloucester. But I was raised in Beaufort,
in town, and went to Beaufort Elementary and middle school and high school,
then moved away for college. So I’ve lived here most of my life although
I’ve moved away.
Fig. 19.1 Structuring transcripts with paragraph styles (the paragraph style
of each paragraph is given in square brackets)
To use the paragraph styles as an autocoding tool, you start the Autocode
Wizard in Document > Autocode (Fig. 19.2). In Step 1, you determine
which type of autocoding you want. For this example, we choose the option
Use the style or structure. In the next step, you specify that you want
the coding to be done on Paragraph Style in the option Code by. In Step
3, NVivo asks which paragraph style you want to use to determine (1) code
names and (2) coding ranges. In the interview with Barbara, all questions (Q1,
Q2, etcetera) have the paragraph style Heading 1. Therefore, we move Heading
1 to the right-hand side, into Selected paragraph styles. Note that you do
not need to use all paragraph styles in the document. NVivo has found several
styles, but we only use Heading 1 to autocode. In Step 4, you determine the
location of the new codes and the grouping of these codes. The codes can
either be placed under an existing or new code that will serve as parent code
for the new codes, or they can be placed in an existing or new folder.
Fig. 19.2 Using the autocode wizard with paragraph styles. Panels: Step 1,
selecting the autocode method; Step 2, selecting the type of style (choose
Paragraph Styles); Step 3, choosing the type of classification (the paragraph
style that will be used to determine the name of the new codes); Step 4,
parent code or folder (determine whether the new codes will be placed as
child codes under the parent code Questions or as codes in a separate folder)
In Fig. 19.3, we show the results of the Autocode Wizard. First, NVivo
has created six new codes. It used the same principle as the In Vivo coding: it
selects the paragraph with Heading 1 and uses this paragraph in its entirety
as the name of the new code. That implies that when you use paragraph
styles to code, you should apply the style to titles or short questions
instead of complete paragraphs. If you use this type of coding with long
paragraphs, your code names will be unworkable. In this example, all
questions started with a clear indicator: Q1, Q2, etcetera. As a consequence,
all new codes start with the Q1, Q2 indicators followed by the summary of the
question (or, better, the question module in the questionnaire).
A second effect of autocoding is that NVivo will immediately code your
file with the new codes. The principle it uses is that all text between each
instance of Heading 1 is coded with the new code. In this example, the code
“Q.1. Connection to Down East” is applied to all text starting from the “Q”
in “Q.1.” until the last character before “Q.2.” because there, the coding
with “Q.2. Connection to Down East natural environment” starts. Note in
Fig. 19.1 that this text has been formatted with other paragraph styles like
“Heading 2” or “Interviewer”. None of these are used as we did not select
them in Step 3 of the Autocode Wizard. NVivo has ignored all these paragraph
styles.
Fig. 19.3 Results of semi-auto coding with paragraph styles: document (upper
panel) and codebook (lower panel). Annotations: text before the first
Heading 1 is not coded (outside the scope of any Heading 1 style); text is
coded with the Q1 code up until Q2 starts; the newly created codes appear in
the codebook
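The coding rule described above can be sketched in a few lines of Python. This is only an illustration of the principle, not NVivo's implementation: the function name `autocode_by_style`, the representation of a transcript as (style, text) pairs and the shortened example texts are assumptions made for this sketch.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: each paragraph is a (style, text) pair.
Paragraph = Tuple[str, str]


def autocode_by_style(paragraphs: List[Paragraph],
                      style: str = "Heading 1") -> Dict[str, List[str]]:
    """Create one code per paragraph in `style`; code all text up to the next one."""
    codes: Dict[str, List[str]] = {}
    current: Optional[str] = None  # nothing is coded before the first heading
    for para_style, text in paragraphs:
        if para_style == style:
            current = text           # the complete heading text becomes the code name
            codes[current] = [text]  # the heading itself is part of the coded range
        elif current is not None:
            # Other styles (Heading 2, Interviewer, Normal) do not start a new
            # code; their text is simply added to the running range.
            codes[current].append(text)
    return codes


transcript = [
    ("Normal", "Interview with Barbara on February 19th, 2009."),
    ("Heading 1", "Q.1. Connection to Down East"),
    ("Heading 2", "Henry"),
    ("Interviewer", "Tell me about your personal and family history in Down East."),
    ("Heading 2", "Barbara"),
    ("Normal", "My family moved here when I was two years old in 1969."),
    ("Heading 1", "Q.2. Connection to Down East natural environment"),
    ("Heading 2", "Barbara"),
    ("Normal", "(answer to question 2)"),
]

codes = autocode_by_style(transcript)
for name, references in codes.items():
    print(name, "->", len(references), "paragraphs")
```

The sketch also makes the practical advice visible: because the complete heading paragraph becomes the code name, a long paragraph formatted as Heading 1 would produce an unworkable code name.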
Semi-auto Coding Based on Paragraphs
A second way to code semi-automatically is based on the paragraphs in a
document. You can use this when the questions are very strictly organised
into paragraphs. If every first paragraph contains the answer to question 1
and every second paragraph contains the answer to question 2, then this method
is useful. Again, it comes down to building your transcript so tightly that
these methods work. But note that NVivo is unforgiving here: any deviation
from the structure, such as one extra paragraph, shifts all subsequent
paragraph numbers and leads to wrongly coded text.
To let a paragraph end in a text, you use the ENTER key in your word
processor. The sign ¶ in Word indicates where a paragraph ends.[1] If you
want to start on a new line without starting a new paragraph, you can press
Shift + ENTER. In that case, the text will remain one paragraph, even if part
of it starts on a new line.
[1] When you press the ¶ icon in the menu Home in Word, you will see the
paragraph endings on your screen. We made these signs visible in the text in
Fig. 19.4.
We clarify this with an example in Fig. 19.4 from the interview with Barbara
(Navigation View > Data > Files > Interviews). The interviews in the sample
project have all been prepared to be autocoded with paragraph styles (see
the previous paragraph). To illustrate this method of autocoding, we have
numbered all paragraphs in the interview of Barbara (Fig. 19.4). It is not
necessary to number the paragraphs, but seeing these numbers will help you
understand how the Autocode Wizard will approach this text.
1. Interview with Barbara on February 19th, 2009, at her home in Bettie,
   North Carolina. Barbara writes cooking curriculum materials and does
   earth science environmental consulting work for soil scientists. ¶
2. Q.1. Connection to Down East ¶
3. Henry ¶
4. Tell me about your personal and family history in Down East. How long
   have you or your family lived in Down East, full-time or part-time? ¶
5. Barbara ¶
6. My family moved here when I was two years old in 1969. My parents still
   live here. They live down in Gloucester. But I was raised in Beaufort, in
   town, and went to Beaufort Elementary and middle school and high school,
   then moved away for college. So I’ve lived here most of my life although
   I’ve moved away. ¶
Fig. 19.4 Structuring transcripts based on the paragraphs
Autocoding with paragraphs is also done with the Autocode Wizard. You
start this Wizard in Document > Autocode (Fig. 19.5). The first step is iden-
tical to the previous method: you select the option Use the style or structure. In
Step 2, you now select Paragraph in the option Code by, as this is the method
we will use in this example. In Step 3, NVivo asks for the location of the new
codes and the grouping of these codes. Again, the codes can be placed under
an existing or new code that will serve as a parent code for the new codes.
The new codes can be placed in an existing or new folder. In this example, we
have opted for the folder option as we will generate many codes (it is a long
interview with many paragraphs).
Fig. 19.5 Using the Autocode wizard with paragraphs. Panels: Step 1,
selecting the autocode method; Step 2, selecting the type of style (choose
Paragraph); Step 3, parent code or folder (determine whether the new codes
will be placed as child codes under the parent code Questions or as codes in
a separate folder)
To conclude the example, we again illustrate the effects in both the
interview text and the codebook (upper and lower panel of Fig. 19.6,
respectively). In total, NVivo has created 77 new codes with names “1”, “2”,
etcetera. Labelling of codes is now done based on the paragraph number and
not on the content of the paragraphs. The interview with Barbara is coded
accordingly: the first paragraph is coded with the code “1”, the second with
code “2”, and so on. As said before, these interviews were not prepared to
use this type of autocoding. As a result, the coding is also not very useful.
You can only use this type of autocoding if the first paragraph always refers
to the same type of content, and likewise for the second paragraph, the third
paragraph, and so on. As this is rarely the case, this type of autocoding
will probably be used less than the one with the paragraph styles.
Fig. 19.6 Results of semi-auto coding with paragraphs: document (upper panel)
and codebook (lower panel). Annotations: paragraph 1 is coded with code “1”
and paragraph 4 with code “4”; for each paragraph, NVivo created one code
labelled with the number of the paragraph (there are 77 new codes in total)
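To see why paragraph-based autocoding depends so heavily on identically structured transcripts, consider the following minimal Python sketch. It is an illustration only: the function name `autocode_by_paragraph`, the use of a line break as paragraph ending and the skipping of empty paragraphs are assumptions, and the two miniature transcripts are invented for the example.

```python
from typing import Dict


def autocode_by_paragraph(text: str) -> Dict[str, str]:
    """Create one code per paragraph, named after the paragraph number."""
    # Assumption: a paragraph ends at a line break; empty paragraphs are skipped.
    paragraphs = [p for p in text.split("\n") if p.strip()]
    return {str(number): p for number, p in enumerate(paragraphs, start=1)}


# Two hypothetical transcripts; the second has one extra paragraph at the top.
interview_a = "Q.1. Connection to Down East\nHenry\nTell me about your family history."
interview_b = "Short introduction.\nQ.1. Connection to Down East\nHenry"

codes_a = autocode_by_paragraph(interview_a)
codes_b = autocode_by_paragraph(interview_b)

# Code "1" now refers to different kinds of content in the two interviews.
print(codes_a["1"])
print(codes_b["1"])
```

One extra paragraph in the second transcript is enough to make code “1” refer to an introduction in one interview and to a question in the other.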
Semi-automatic Range Coding
The example in the previous paragraph shows the difficulty of coding with
paragraphs. Also, the result is just a long list of numbered codes that have
been attached both to questions and answers without making any difference
between them. To circumvent the totally automated coding of paragraphs,
Range Coding allows you to choose the paragraphs that need coding and
leave out the others.
For example, we refer to Fig. 19.4, where we have numbered all paragraphs
in the interview with Barbara. With Range Coding, we can now refer to those
paragraphs where the overall question and all the answers of Barbara to that
question are found: paragraphs 2, 6, 10, 14, 18, 22, and 26. Activate the range
coding in Document > Range Code. In the Range Code window, the para-
graph numbers are entered in the option Code (see Fig. 19.7). In the option
Code at, you select the code that needs to be used to code these paragraphs.
Note that this time, there is no option to create a new code. You can only
apply the range coding to existing codes in your codebook. We selected the
code Q.1. Connection to Down East in this example, as the paragraphs all
refer to that answer.
Fig. 19.7 Definition of the range coding. Annotations: the paragraph numbers
that need to be coded; the code to which these paragraphs need to be coded
The result of the Range Coding is shown in Fig. 19.8. All the paragraphs
mentioned earlier are now coded to the code “Q.1. Connection to Down East”.
The paragraphs in between (with the names or the questions of the
interviewer) are not coded. This result makes more sense than the autocoding
with paragraphs. But still, you have much work identifying the right
paragraphs to be coded. In this case, it might have been quicker to manually
code the answers of Barbara (see Sections “Deductive Coding in NVivo” and
“Inductive Coding in NVivo” in Chap. 8) instead of using the Range Coding.
Fig. 19.8 Results of range coding. Annotations: paragraph 2 is coded;
paragraph 6 is coded with the same code
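The logic of Range Coding can be summarised in a short Python sketch. The function `range_code` and the dictionary used as codebook are hypothetical names chosen for this illustration; the sketch only mirrors the two properties described above: you list the paragraph numbers yourself, and the code must already exist.

```python
from typing import Dict, List


def range_code(paragraphs: List[str], numbers: List[int],
               code: str, codebook: Dict[str, List[str]]) -> None:
    """Code the listed paragraphs (1-based numbers) to an existing code."""
    if code not in codebook:
        # Mirrors the behaviour described in the text: no new code is created.
        raise KeyError(f"Code {code!r} does not exist in the codebook")
    for number in numbers:
        codebook[code].append(paragraphs[number - 1])


# Hypothetical document with ten numbered paragraphs.
paragraphs = [f"paragraph {n}" for n in range(1, 11)]
codebook: Dict[str, List[str]] = {"Q.1. Connection to Down East": []}

range_code(paragraphs, [2, 6, 10], "Q.1. Connection to Down East", codebook)
print(codebook["Q.1. Connection to Down East"])
```

The paragraphs in between (3 to 5 and 7 to 9 in this sketch) stay uncoded, which is exactly the difference with the fully automated paragraph coding.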
Semi-auto Coding Based on Speaker
Names (in Focus Groups)
Another way of semi-automated coding that the Autocode Wizard offers is
to code speakers’ names in transcripts. This is excellent for transcripts of
focus groups as you have multiple participants in one focus group, and they
all need their own case to be properly linked to their Case Classification.
One way of obtaining this is by using the Text Search Query and coding the
names of participants one by one (see Section “Using Queries in Focus Groups”
in Chap. 15). However, the Autocode Wizard offers a more efficient way to
obtain the same result by coding Speaker Names. The example project does not
have any examples of focus groups, so we will use the interview with Maria
and Daniel (Navigation View > Data > Files > Interviews). This interview has
two participants (Maria and Daniel), which makes it somewhat resemble the
transcript of a focus group. As shown in Fig. 19.9, three names are mentioned
in the transcript. This example aims to create three cases, one for each
speaker, and to code each speaker’s answers to the respective case.
In the Autocode Wizard in Document > Autocode (Fig. 19.2), the first
step consists of choosing the option Speaker Name to start this type of
autocoding. In Step 2, you need to identify the names of all speakers by adding
them as rows in the upper half of the window. The lower part of the window
is a control for identifying these names. With colours, NVivo shows you which
part of the text will be coded with which participant. Whenever a participant’s
name is found in the text, NVivo will show a green tick in the column Found.
In step three, you either choose to add the cases to an existing Case Classifica-
tion or you decide to let NVivo create a new one for you. You also determine
where the speaker cases will be stored in your project (Fig. 19.10).
Elizabeth
So are there any particular places, aside from maybe the church and the
store, that are important to you?
Daniel
The whole area.
Maria
Especially the waters. I mean, we both like to fish, and I’m a big sheller.
When you live down here, and you see people trash – you know, I just can’t
stand trash blowing, cause I know what it can do to the animals. You know,
when you’re out on the boat or down to Atlantic and watch some of your peers
come in and bring dolphins that swallowed a big plastic bag. You know, you
don’t realize those little things and how important it is to keep these
waters and shores – and the roads; they don’t realize what you throw out on
the road’s gonna blow and get into the water.
Fig. 19.9 A transcript with three speaker names (Elizabeth, Maria and Daniel).
Annotations mark the referral to speaker 1 (Elizabeth), speaker 2 (Daniel)
and speaker 3 (Maria)
Fig. 19.10 Using the autocode wizard with speaker names. Panels: Step 1,
selecting the autocode method (choose Speaker name); Step 2, adding the
speakers’ names (add all speaker names as they are named in the file; NVivo
shows a preview of names found, in different colours); Step 3, parent code or
folder (determine where the new Cases will be created; choose or create the
Case Classification to which the new cases need to be coded)
Figure 19.11 shows the results in both the text and the cases folder. Each
answer is coded with a case code referring to the participant speaking in that
part of the conversation. We have chosen to save our new Cases in Navigation
View > Cases > Speaker Names. In that folder, we find three new cases, one
for each speaker. Each case is also attached to the Case Classification Person
and the data can be filled in using the Case Classification Sheet (see Section
“Using the Classification Sheet for Data Entry” in Chap. 9).
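The principle behind coding on speaker names can again be sketched in Python. This is a simplified illustration, not NVivo's algorithm: the function name `autocode_by_speaker` is invented, and the sketch assumes that each speaker name stands alone on its own line, as in Fig. 19.9.

```python
from typing import Dict, List, Optional


def autocode_by_speaker(lines: List[str], speakers: List[str]) -> Dict[str, List[str]]:
    """Create one case per speaker and code every turn to the speaker's case."""
    cases: Dict[str, List[str]] = {name: [] for name in speakers}
    current: Optional[str] = None  # text before the first name stays uncoded
    for line in lines:
        text = line.strip()
        if text in cases:
            current = text               # a line holding only a name starts a new turn
        elif current is not None and text:
            cases[current].append(text)  # the turn is coded to the active speaker
    return cases


lines = [
    "Elizabeth",
    "So are there any particular places that are important to you?",
    "Daniel",
    "The whole area.",
    "Maria",
    "Especially the waters. I mean, we both like to fish.",
]

cases = autocode_by_speaker(lines, ["Elizabeth", "Daniel", "Maria"])
print(cases["Daniel"])
```

A name that is spelled differently in the transcript than in the list of speakers would not be found in this sketch, which is why the preview with the Found column in Step 2 of the wizard is worth checking.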
Fig. 19.11 Results of autocoding with speaker names: document (upper panel)
and codebook (lower panel). Annotations: each answer is coded with the case
code of the speaker that is talking; newly created cases
AI-Based Auto Coding of Sentiment
Sentiment coding was the first introduction of a Text Mining technique in
NVivo. Sentiment coding tries to identify the sentiment (positive or negative)
in the data. It is important to know that this tool looks at words and not at
sentences or paragraphs (context). The coding is, therefore, done in
isolation, which may result in a positive and a negative sentiment being
coded in the same sentence. Also, NVivo is not able to recognize language
variants like sarcasm, double negatives, slang, dialect variations, idioms, or
ambiguity. So you need to remain critical of the result of the sentiment
coding, but it can identify places where positive or negative emotions are at
play, or at least identify when words are used that might say something about
them. Also, note that only positive or negative sentiments are identified
while neutral sentences remain uncoded. In addition, it can only handle one
language at a time and is limited to the standard languages for which the
program has a library available: Chinese (Simplified), English, French,
German, Japanese, Portuguese, and Spanish. For projects in other languages,
sentiment coding will not work.
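The limitations listed above follow directly from scoring words in isolation. The Python sketch below shows this with a deliberately tiny word list. Both the four-word `LEXICON` and the function `code_sentiment` are invented for this illustration; NVivo's actual language libraries and scoring are not documented here and are certainly more elaborate.

```python
import re
from typing import List

# Hypothetical four-word lexicon; real sentiment libraries are far larger.
LEXICON = {
    "great": "Positive",
    "love": "Positive",
    "bad": "Negative",
    "trash": "Negative",
}


def code_sentiment(sentence: str) -> List[str]:
    """Return the sentiment codes triggered by single words in the sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sorted({LEXICON[word] for word in words if word in LEXICON})


examples = [
    "I love the water but I can't stand trash",  # two sentiments in one sentence
    "Oh great, another storm.",                  # sarcasm is read as positive
    "That is not bad at all.",                   # the negation is ignored
    "My family moved here in 1969.",             # neutral sentences stay uncoded
]
for sentence in examples:
    print(sentence, "->", code_sentiment(sentence))
```

The second and third examples are coded against their actual meaning, which is why the results of sentiment coding always need a critical reading.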
We illustrate the sentiment coding in the interview of Barbara (Navigation
View > Data > Files > Interviews). When you open the document of this
interview, you can start the Autocode wizard from Document > Autocode
(see Fig. 19.12). In the second step, NVivo only asks you what the context
is that needs to be coded. You can choose between sentences or paragraphs.
Clicking on Finish immediately starts the coding process.
Fig. 19.12 Using the autocode wizard to code the sentiment of your data.
Panels: Step 1, selecting the autocode method; Step 2, sentences or
paragraphs?
When the coding is finished, NVivo shows you two visualisations (see upper
panel in Fig. 19.13). The first is a matrix with the source(s) in the rows
and the sentiments in the columns. The second is a hierarchy chart that
represents the weight of each sentiment by the size of rectangles in a box.
Sentiments are coded to your data. That is visible when you open the Coding
Stripes in the interview with Barbara. However, in your codebook, no
sentiment codes are shown. The sentiment codes are fixed (named) codes and
can be found in (Navigation View) > Sentiment. The structure contains two
parent codes (Positive and Negative), each with two child codes underneath
that identify the intensity of the emotion.
Fig. 19.13 Results of autocoding sentiment: visualisations (upper panel) and coding
(lower panel)
AI-Based Auto Coding of Themes
The second AI-driven tool scans your data, looks for themes in the data and
subsequently codes the data with the themes found. NVivo uses noun phrases
to detect the themes so it does not understand the content of your data. It
merely looks at the themes that are most commonly found and groups them.
Again, it can only handle one language at a time and is limited to the standard
languages for which the program has a library available: Chinese (Simplified),
English, French, German, Japanese, Portuguese, and Spanish. For projects in
other languages, the tool will not work.
As an example, we will code all interviews in the sample project. Therefore,
you go to the interview material in Navigation View > Data > Files >
Interviews and select all interviews with CTRL + A. Next, you go to Home >
Autocode to start the Autocode Wizard (see Fig. 19.14). The wizard steps are
analogous to sentiment coding. After selecting the autocode method, NVivo
immediately starts identifying themes (in some cases the program needs to
install a language module but that only takes a few seconds). The second step
presents an overview of the themes identified in this data. You can review
them and unselect themes that you do not want to create in your project. In
the third step, you determine the coding context: paragraphs or sentences (or
cells in case you work on a dataset). The last step asks for the place in
(Navigation View) > Coding > Codes where you want to save the codes from the
themes. They will all be placed under one parent code (NVivo suggests the
name Autocoded Themes).
The results are shown in two visualisations: a matrix with the interviews
in the rows and the main themes in the columns and a hierarchy chart that
also gives an idea of the frequency of occurrence of the new themes (see
Fig. 19.15). These two give a nice overview of the themes identified in the
data and can help you further refine and rename the codes.
A final caveat: this tool cannot replace the thematic coding of your content.
The autocoding tool seems a quick way to get the content of your data into
a codebook. But again, we warn that the AI is not capable of understanding
the content and certainly not the subtleties of content in interview or focus
group transcripts. So use the tool only as an exploratory instrument for this
type of data. For data with more structured language, like literature, policy
documents or legal documents like court cases, the tool may produce much
better results as standardized language is easier to cluster with noun groups
than natural language in face-to-face interviews.
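A rough impression of what frequency-based grouping of noun phrases means is given in the Python sketch below. It is a strong simplification under stated assumptions: the noun phrases are taken as already extracted (the sketch does not detect them), grouping by the last word of the phrase is our own stand-in for whatever grouping NVivo applies, and the function name `group_themes` and the example phrases are invented.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def group_themes(noun_phrases: List[str], min_count: int = 2) -> Dict[str, Dict[str, int]]:
    """Group noun phrases by their last word and keep the frequent groups."""
    groups: Dict[str, Counter] = defaultdict(Counter)
    for phrase in noun_phrases:
        words = phrase.lower().split()
        if not words:
            continue  # skip empty strings
        # Assumption for this sketch: the last word is the head of the phrase.
        groups[words[-1]][" ".join(words)] += 1
    return {
        head: dict(counts)
        for head, counts in groups.items()
        if sum(counts.values()) >= min_count
    }


# Hypothetical noun phrases, as if extracted from interview transcripts.
phrases = ["fishing community", "local community", "water quality",
           "community", "clean water"]

themes = group_themes(phrases)
print(themes)
```

Nothing in this procedure looks at what the speakers mean: a theme appears because words recur, not because the content has been understood.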
Fig. 19.14 Using the autocode wizard to identify themes in your data. Panels:
Step 1, selecting the autocode method; Step 2, preview of the identified
themes; Step 3, sentences or paragraphs?; Step 4, where to save the new codes?
Fig. 19.15 Results of autocoding for themes: visualisations
Machine Learning Based Auto
Coding of Coding Patterns
The third AI-driven tool works with machine learning (Pinheiro & Patetta,
2021). Machine learning differs from automated theme detection in that it
needs training data to learn from to perform its actual task (coding). In
NVivo, the tool to autocode based on coding patterns is based on this
principle. The tool expects you to code part of your data with the thematic
coding tools we discussed earlier (see Chap. 8). You present that coding work
to NVivo and the autocoding tool will use your coding to code new material
based on what it has available.
We start our example by selecting the interview of Barbara (Navigation
View > Data > Files > Interviews). When selected, the button Autocode
becomes accessible and you can start the Autocode wizard from Document >
Autocode (see Fig. 19.16). By selecting the interview with Barbara, we will
code this interview based on coding work found elsewhere. You can also select
multiple files that are not yet coded and have NVivo code a larger number at
once.
Fig. 19.16 Using the autocode wizard to code on existing coding patterns.
Panels: Step 1, selecting the autocode method; Step 2, identifying the
training dataset (choose the codes that will serve as training material;
choose the coded files that need to serve as training material); Step 3,
preview of the coding result; Step 4, sentences or paragraphs?
Fig. 19.17 Result of autocoding for existing patterns
Step 2 is the most important step in this wizard. Here, you identify the
training dataset. Two selections need to be made (see Fig. 19.16). First, you
need to select the codes from your codebook that will serve as example codes
for the machine learning. Second, you need to select the files that have been
coded with these codes to identify the training dataset NVivo can use to
learn how your codes were applied to actual data. A slider in the middle
of the window asks how much coding you want to produce. The choice
“Less” implies a higher threshold to identify “same coding reference” while
the side “More” lowers that threshold and allows the machine learning to find
similarities in the data more easily.
The third step in the wizard gives feedback on the machine learning. You
gain insight into which codes could be used to code new material (No issues
detected) and which codes were more problematic. When reviewing the result,
you might go back one step and move the slider more to the left or the right
to change the threshold of similarity (Fig. 19.17).
The result of the procedure is only a Matrix in this case. In the rows, you
get the files that were coded. In the columns, you get an overview of the
(existing) codes that were applied to this new material. By double-clicking on
a cell, you get the references that were coded with the codes.
A caveat: this procedure is more refined than the autocoding for themes.
Here the machine learning actually uses your scholarly input to do autocoding.
But again, machine learning does not understand the new data and only looks
for similar patterns in the data. When similar references get coded, they might
be erroneous because the interviewee was sarcastic and meant the exact oppo-
site of what was said. The autocoding tool will not recognize this and just code
the reference as similar to earlier references. Keep a critical attitude towards the
coding that is done with this procedure to avoid erroneous conclusions in your
research.
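The role of the threshold and the sarcasm problem can both be illustrated with a small Python sketch. We stress that this is not NVivo's algorithm: as a stand-in, the sketch compares word counts with cosine similarity, and the function names, the training reference, the example passages and the threshold values are all assumptions made for the illustration.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def bag(text: str) -> Counter:
    """Count the words of a passage (a simple bag-of-words representation)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def autocode_by_pattern(training: Dict[str, List[str]], passages: List[str],
                        threshold: float) -> Dict[str, List[str]]:
    """Code a passage to a code if it resembles any of that code's references."""
    result: Dict[str, List[str]] = {code: [] for code in training}
    for passage in passages:
        for code, references in training.items():
            if any(cosine(bag(passage), bag(ref)) >= threshold for ref in references):
                result[code].append(passage)
    return result


# Hypothetical training reference and new passages.
training = {"Fishing": ["we both like to fish in these waters"]}
passages = [
    "I like to fish",
    "Oh sure, I just love to fish in these filthy waters",  # sarcastic
    "The church is important to me",
]

print(autocode_by_pattern(training, passages, threshold=0.6))  # like "Less"
print(autocode_by_pattern(training, passages, threshold=0.5))
print(autocode_by_pattern(training, passages, threshold=0.1))  # like "More"
```

At the middle threshold, the sarcastic passage is coded to Fishing just like the sincere one, because only the overlap in words counts. Lowering the threshold further also pulls in the unrelated passage about the church.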
Semi-auto Coding Social Media and Surveys
In Chap. 17, we discussed how you work with social media data. One
of the possibilities of importing social media is to convert the social media
data to a dataset where all data are, for example, tweets. In Chap. 18, we
discussed working with dataset project items. As these project items are
similar (they are both considered dataset project items), we will discuss the
semi-autocoding of dataset project items in this paragraph. You can also use
the AI-driven autocoding tools, which we will not repeat here. They work
similarly to what was discussed in the previous paragraphs. Only the
semi-automatic coding based on the database structure differs from what we
discussed in Sections “Semi-auto Coding Based on the Paragraph Style” and
“Semi-auto Coding Based on Paragraphs”.
As an example, we use the tweets on Carteret County, stored in a dataset at
Navigation View > Data > Files > Social Media. We developed the example
on a dataset with social media, but the explanation is completely analogous
to datasets containing survey data. Therefore, we only present the autocoding
once and apply it to a dataset with tweets.
When you open the dataset, you can start the Autocode Wizard at Dataset
> Autocode. In the first step, you see all AI-based tools. But for this
example, we choose the semi-automatic tool Use the style or structure
(Fig. 19.18). In Step 2, NVivo presents three ways of coding the data in the
dataset. We show each way in the screenshots below with a short explanation.
Note that in the screenshots, NVivo illustrates with schemes what will be
coded (yellow = coded text) and with which codes (red area = code). These
schemes clearly illustrate the choice you make in your autocoding.
Fig. 19.18 Autocode wizard: start screen
Autocode Wizard CHOICE 1: Code to codes or cases for each value in
predefined Twitter columns
In the first choice (Fig. 19.19), you work within the rows of the dataset.
NVivo will create new codes based on the content of one of the fields in the
database. These codes are used to code text in another field.
In Step 3 of the wizard (not shown in the figure), you typically use the text
in the variable “Hashtags” (or closed questions in a dataset with survey data)
as new codes and apply these codes to the text of the variable “Tweet”. Other
possible fields to be used as codes are the variables “Mention”, “Tweet type”,
and “Location”. At the same time, NVivo will create new cases for each data
line (i.e. each tweet) using the variable “Username” as the name of the case.
The variable “Username” can only be used to create cases and cannot be
selected for the coding part.
In Step 4 (not shown in the figure), you determine the name and location
of both the new (parent) codes (or folder if you wish) and the new cases (can
also be a folder).
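NVivo performs this step through the wizard and does not expose it as code. Purely as an illustration of the logic of Choice 1, the Python sketch below mimics it on a few invented rows. The column names follow the chapter; the sample values, the function name `code_by_column_value`, the assumption that hashtags within one cell are separated by spaces, and the assumption that each case collects the tweet text are all hypothetical.

```python
from collections import defaultdict

# Hypothetical rows standing in for a Twitter dataset. The column names
# follow the chapter; the values are invented for illustration.
rows = [
    {"Username": "anna", "Hashtags": "#flood #coast", "Tweet": "Water is rising downtown."},
    {"Username": "ben", "Hashtags": "#coast", "Tweet": "Beach cleanup on Saturday."},
    {"Username": "anna", "Hashtags": "", "Tweet": "Stay safe everyone."},
]


def code_by_column_value(rows, code_column, text_column, case_column):
    """Conceptual analogue of Choice 1 (not NVivo's implementation).

    Each value found in `code_column` becomes a code, and the text in
    `text_column` of the same row is coded to it. A case is created per
    distinct value in `case_column`.
    """
    codes = defaultdict(list)  # code name -> coded text segments
    cases = defaultdict(list)  # case name -> text belonging to that case
    for row in rows:
        text = row[text_column]
        # Assumption: several values in one cell are separated by whitespace.
        for value in row[code_column].split():
            codes[value].append(text)
        cases[row[case_column]].append(text)
    return dict(codes), dict(cases)


codes, cases = code_by_column_value(rows, "Hashtags", "Tweet", "Username")
```

In this sketch, a tweet without hashtags receives no code but is still attached to the case of its author.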
Autocode Wizard CHOICE 2: Code columns
The second choice allows you to code all the fields in a particular column in the
dataset with the name of that column (see Fig. 19.20). In Step 3 (not shown),
NVivo asks you to select the column that should be coded. Typically,
this is the column of the variable “Tweet”, as this contains the content of
the tweets (or an open question when you autocode surveys). In Step 4 (not
shown), you make the final choices about the name and location of the codes,
cases or folders.
Fig. 19.19 Autocode wizard: choice 1 (schematic overview of this choice)
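The column-based choice can be pictured in the same hedged way. The sketch below is a conceptual analogue only, not NVivo's implementation: the function name `code_columns` and the sample rows are hypothetical, and skipping empty cells is an assumption.

```python
# Hypothetical rows; the column names follow the chapter, the values are invented.
rows = [
    {"Username": "anna", "Tweet": "Water is rising downtown."},
    {"Username": "ben", "Tweet": "Beach cleanup on Saturday."},
    {"Username": "anna", "Tweet": "Stay safe everyone."},
]


def code_columns(rows, columns):
    """Conceptual analogue of Choice 2 (not NVivo's implementation).

    One code is created per selected column, named after that column,
    and every non-empty cell in the column is coded to it.
    """
    return {
        column: [row[column] for row in rows if row.get(column)]
        for column in columns
    }


column_codes = code_columns(rows, ["Tweet"])
```

The result is a single code called "Tweet" that gathers the content of all tweets, regardless of who wrote them.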
Schemac overview of this choice
Fig. 19.20 Autocode wizard—choice 2
19 NVIVO AND AI: (SEMI)-AUTOMATIC CODING 249
Schemac overview of this choice
Fig. 19.21 Autocode wizard—choice 3
Autocode Wizard CHOICE 3: Code rows
The last option (see Fig. 19.21) is the opposite of the second one and codes
rows. Typically, cases are created in Step 3 (not shown) from the variable
“Username” (or the ID variable in a survey dataset). For each user, one case is
created. In the fourth step (also not shown), you determine the content that
needs to be coded with these cases. Again, the most used variable here is
“Tweet”, as this contains the actual social media data (or an open
question in your survey). In the final step, you again choose the location and type
(parent case or folder) to store the new cases.
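As a rough illustration of the row-based choice, the sketch below groups the content of each row under a case named after the user. It is a conceptual analogue under stated assumptions, not NVivo's implementation: the function name `code_rows` and the sample rows are hypothetical.

```python
# Hypothetical rows; the column names follow the chapter, the values are invented.
rows = [
    {"Username": "anna", "Tweet": "Water is rising downtown."},
    {"Username": "ben", "Tweet": "Beach cleanup on Saturday."},
    {"Username": "anna", "Tweet": "Stay safe everyone."},
]


def code_rows(rows, case_column, content_columns):
    """Conceptual analogue of Choice 3 (not NVivo's implementation).

    One case is created per distinct value in `case_column`; the selected
    content columns of every row with that value are coded to the case.
    """
    cases = {}
    for row in rows:
        cell_texts = [row[c] for c in content_columns if row.get(c)]
        cases.setdefault(row[case_column], []).extend(cell_texts)
    return cases


user_cases = code_rows(rows, "Username", ["Tweet"])
```

Here both tweets by the same user end up in one case, which mirrors the statement that one case is created for each user.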
References
Christou, P. A. (2023a). The use of artificial intelligence (AI) in qualitative research for theory development. The Qualitative Report. https://doi.org/10.46743/2160-3715/2023.6536
Christou, P. A. (2023b). How to use artificial intelligence (AI) as a resource, methodological and analysis tool in qualitative research? The Qualitative Report. https://doi.org/10.46743/2160-3715/2023.6406
Ciechanowski, L., Jemielniak, D., & Gloor, P. A. (2020). TUTORIAL: AI research without coding: the art of fighting without fighting: Data science for qualitative researchers. Journal of Business Research, 117, 322–330. https://doi.org/10.1016/j.jbusres.2020.06.012
Jo, T. (2019). Text mining. Concepts, implementation, and big data challenge. Springer.
Macanovic, A. (2022). Text mining for social science – The state and the future of computational text analysis in sociology. Social Science Research, 108, 102784. https://doi.org/10.1016/j.ssresearch.2022.102784
Pinheiro, C. A. R., & Patetta, M. (2021). Introduction to statistical and machine learning methods for data science. SAS Institute Inc.
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium
or format, as long as you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license and indicate if changes were
made.
The images or other third party material in this chapter are included in the chapter’s
Creative Commons license, unless indicated otherwise in a credit line to the
material. If material is not included in the chapter’s Creative Commons license and your
intended use is not permitted by statutory regulation or exceeds the permitted use,
you will need to obtain permission directly from the copyright holder.