Natural Language Processing of Auditory Perceptual Experiences:
A Content-Analytic Approach
Nathan F. Gillespie* and Gregory E. Cox
University at Albany, State University of New York, 1400 Washington Ave, Albany, NY 12222,
U.S.A. Email: Nathangillespie97@gmail.com
Author Note
Nathan F. Gillespie https://orcid.org/0000-0003-0447-445X
Gregory E. Cox https://orcid.org/0000-0002-0602-1545
*Correspondence concerning this article should be addressed to Nathan F. Gillespie,
Dept. of Psychology, 1400 Washington Avenue, Albany, NY 12222, USA. Telephone:
773-648-0268. Email: Nathangillespie97@gmail.com
Abstract
Much research in auditory cognition focuses on quantitative outcome variables that cannot fully
capture individual differences in auditory perceptual experiences. We address this gap by using
topic modeling to investigate the different ways that people perceive and remember a set of
artificial timbre stimuli. We analyzed 779 written responses to three questions regarding: 1) how
people judged similarity between sounds; 2) how people remembered and recognized previously
heard sounds; and 3) how people formed impressions of the sounds they heard. Cross-validation
showed that 20 topics characterized the similarity responses, 16 characterized the recognition
judgements, and 30 characterized people’s impressions. Principal components analysis of the
topic distributions identified latent themes within each set of topics. Similarity strategies
clustered into three themes: Featural Separation, Impression Formation, and Listening Effort.
Recognition strategies clustered into five themes: Featural Irregularity, Contrast, Holistic
Processing, Timbral Impressions, and Featural Change. Impressions clustered into two themes:
Machinery and Electricity. Results are analogous to the output of a traditional content analysis
but were produced in a fraction of the time using a replicable methodology that can scale to large
datasets. Our work represents a new method for triangulating quantitative, qualitative, and
computational methods in auditory research.
Keywords: Natural language processing, auditory perceptual experiences, content
analysis, individual differences, methods, timbre
Natural Language Processing of Auditory Perceptual Experiences:
A Content-Analytic Approach
Introduction
People hear sounds differently. The great variety in what people find enjoyable about
music, for instance, reflects the many ways in which listeners attend to various aspects of
auditory signals. Similarly, when engaged in conversation, different listeners may attend to
different aspects of a speaker's voice (Ishida et al., 2016). Many models of auditory cognition
(e.g., Bharucha & Todd, 1989; Margulis & Beatty, 2008; Pearce, 2018) focus on quantifying and
condensing perceptual experiences from the perspective of statistical learning and information
theory. This comes at a cost—while such models provide insight into certain aspects of auditory
information processing, they do so at the expense of individual differences that are not captured
by quantitative measures. Two questions underlie this problem: 1) “how can we use qualitative
data to help characterize individual differences in auditory cognition?”, and 2) “is there a way to
integrate qualitative and quantitative approaches to studying auditory cognition?”. Our work
highlights a method that addresses the first of these questions and makes it easier to address the
second. Focusing on timbre perception, we introduce a scalable methodology for analyzing free
response text data to characterize individual differences in how people hear sounds. In doing so,
we develop a framework for triangulating quantitative and qualitative methods for enhancing
experimental design and theory building in auditory cognition research.
One important aspect of an auditory signal is its “timbre”. Timbre encompasses a range of
acoustic features other than pitch, duration, and loudness—it is mainly, but not exclusively, a
function of how a listener perceives the relative strengths of the harmonic and non-harmonic
frequencies present in a signal (Randel, 1986). Timbral information facilitates how people
remember and relate different auditory events (e.g., the sound of a violin vs. the sound of a
piano, one person’s voice compared to another, etc.). Like pitch and loudness, timbre is a
perceptual property that emerges from the interactions between the acoustic features in an
auditory signal and an individual’s attentional, sensory, and cognitive capacities and biases. The
way people conceptualize and describe timbre reveals important information about how they
perceive it (Siedenburg et al., 2019).
Prior research indicates that listeners often perceive timbre semantically; people apply verbal
labels to sounds to make sense of them (e.g., von Bismarck, 1974; Zacharakis et al., 2015).
Indeed, a popular method for studying timbre perception involves asking people to rate sounds
using scales anchored with opposing adjectives (e.g., “full” vs. “empty”, etc.; Siedenburg et al.,
2019). Other methods include making similarity ratings for pairs of sounds and analyzing those
ratings using techniques like multidimensional scaling (Shepard, 1962). Although such methods
have provided a great deal of insight into how timbral information is represented
psychologically, the ways in which people conceptualize their auditory perceptual experiences
may carry a great deal of additional meaning that is not fully captured by rarefied quantitative
measures.
One shortcoming of the methods described above is that they render the experience of timbre
entirely in quantitative terms. This reduces the amount of information the data can communicate
about how listeners perceive timbre and why different people attend to different timbral features.
For instance, verbal rating scales are typically used to create empirical measures of the extent to
which a particular sound can be described by an adjective of interest. This approach assumes
people conceptualize the adjective under examination uniformly; everyone thinks of “roughness”
in the same way, for example. While this assumption might hold for words that have concrete
meanings, timbre-adjectives are often abstract and metaphorical (e.g., “dull”, “bright”,
“colorful”, etc.). This problem is particularly acute outside of a Western context, since
metaphorical conceptions of auditory features are extremely diverse and have very culturally
specific meanings (Silpayamanant, 2023). Moreover, the way people talk about timbral
information in real life typically relies on extended descriptions that are part of a larger semantic
context; people describe their auditory experiences over the course of phrases and sentences. It is
this kind of extended description that is needed to understand timbre perception naturalistically.
Thus, the data produced by rating methods paint an incomplete picture of timbre perception and
may not generalize well beyond laboratory settings. How can timbre perception be measured
more naturalistically without sacrificing experimental control?
A straightforward method for assessing naturalistic timbre perception is to ask listeners to
describe auditory signals. For example, after a similarity rating task, one might ask people to
share the strategies they used to compare sounds to one another (e.g., Mobley et al., 2023). The
data could then be content analyzed to assess how aspects of the auditory signals were related to
one another for the purpose of making a similarity rating. This approach would provide insight
into qualitative differences in how people perceive a sound while retaining the information about
the context within which that sound was conceptualized.
Traditional content analytic methods are useful for interpreting text data in a naturalistic
fashion (Hsieh & Shannon, 2005). They enable a researcher to determine the latent factors that
contribute to the structure of a dataset by drawing directly on people’s experiences. Typically,
researchers conduct content analysis by asking multiple independent raters to read through the
entire dataset and identify themes that capture patterns of responses. Themes are compared
across raters to create a more reliable coding structure. These themes are then refined, partitioned
into sub-categories, and organized in a hierarchical structure, if possible. The results allow
researchers to describe a phenomenon naturalistically.
Traditional content analysis has two major limitations. First, it is highly subjective; the
results from a particular content analysis may not replicate from one lab group to another.
Second, it is time intensive. The responses must be read in their entirety and coded manually, one
by one. Raters must then compare their results, making many judgment calls along the way. This
makes content analysis infeasible for larger datasets. As a result, it is rarely employed in timbre
perception research (or auditory cognition, more broadly).
The path to developing more naturalistic theories of timbre perception thus appears to
have several obstacles. Specifically, the methods for studying qualitative differences in how
people conceptualize timbre remain underdeveloped—and while it would be relatively easy to
simply ask people to describe their perceptual experiences, the tools for analyzing the data are
time-consuming and vulnerable to bias. The goal of our work is to begin outlining some
solutions to these problems.
The Present Work
Our work proposes a novel method for collecting and analyzing naturalistic data from
experiments on timbre perception. We demonstrate an approach to gathering data that addresses
the phenomenological experience of timbre and provides tools for analyzing this data in a time-
effective manner. We do this by using computational models of semantics to content analyze data
from a series of studies on perception and memory for a set of artificial timbre stimuli. We use
probabilistic topic models (Blei, 2012) to extract organized clusters of terms (“topics”) that
characterize individual responses to questions about timbral information. We then use principal
components analysis to distill these topics into the smaller number of latent themes responsible
for generating the responses. Finally, we interpret these themes using qualitative content
analysis. This mixed-methods approach overcomes the practical limitations of traditional content
analysis while retaining its benefits. Our approach is flexible and can be used to content analyze
datasets at a scale not feasible for human raters.
The development of this method is still in its early stages, so the present paper serves
mainly as a proof of concept. We made some arbitrary, simplifying assumptions that may need
revision in future research. Nevertheless, these assumptions offer the advantage of allowing us to
implement the basic concepts of our approach in a straightforward manner. By laying out the
reasoning behind our choices, we hope to guide other researchers who wish to apply these
methods to their own data. In the remainder of our report, we describe the results of these
techniques and highlight implications for their use in auditory cognition research.
Method
Data
The data we analyzed were drawn from a series of studies on perception and memory for
a set of artificial timbre stimuli constructed via additive synthesis (Gillespie & Cox, 2023).
Across three experiments, people listened to 5-second clips of these stimuli and completed two
behavioral tasks. The first task involved listening to sequentially presented pairs of stimuli and
making similarity ratings using an unmarked rating scale. The second task involved studying two
sequentially presented stimuli. Following the presentation of the second sound, a third sound was
played which either matched one of the first two or was another sound from the stimulus set.
People were asked to indicate whether the third sound matched one of the first two sounds
(“old”) or not (“new”). Following these tasks, people were asked free-response questions. After
the similarity rating task, people were asked: “what strategies did you use to compare the sounds
you heard when making similarity ratings?”. After the recognition task, people were asked:
“what strategies did you use to make your judgements about whether the sounds were old or
new?”. At the end of the entire experiment, people were asked: “in general, what did the clips
you heard in this study sound like to you?”.
The dataset consisted of 779 responses to these three free-response questions. Responses
ranged in length from 1 to 46 words, with an average length of 6.93 words (SD = 6.68). All three
experiments included the similarity rating task, but only the latter two included the recognition
task and the question about people's impressions of the clips they heard. Thus, we obtained 299
responses to the question about similarity rating
strategies, along with 240 responses to both the recognition strategy question and the impression
question.
The people who provided these responses came from a range of backgrounds: 219 were
undergraduate students from the University at Albany, SUNY, who participated in exchange for
course credit, and 80 were adults from the United Kingdom, Canada, Ireland, and the United States
who were paid $5.00 to complete a survey on Prolific. Both groups participated after providing
informed consent in accord with local Institutional Review Board policy. Altogether, our sample
comprised 299 people (119 men, 174 women, 6 gender non-conforming) aged 18-75 years
(M = 24.15, SD = 11.97). Of these, 146 were White or Caucasian, 53 were Black, 40 were
Hispanic, 42 were Asian or Pacific Islander, 17 were Multiracial, and 1 was Indigenous or Native
American; 158 had at least 1 year of formal musical training.
After aggregating the data, we prepared it for analysis. To facilitate later modeling, we
removed all punctuation, special characters, and numbers from the responses, converted all
letters to lowercase, stripped extraneous whitespace, and
removed words found on the stop word list of the SMART system (Salton, 1971). We did not
attempt to correct potential misspellings for two reasons. First, an apparent misspelling may be
an instance of slang, informal language, or onomatopoeia; this makes interpretation ambiguous.
Second, our methods treat any misspelled terms as functionally equivalent to their correct
counterparts. This is because our analyses depend only on how a term is used in context—not
whether it may be found in a dictionary.
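To illustrate this cleaning step, here is a minimal sketch in R using the tm package (see
Materials). The vector name responses is a hypothetical placeholder for the raw free-response
text, and the two example responses are invented.

```r
library(tm)

# Hypothetical raw input: one free-text response per element
responses <- c("I closed my eyes and listened hard!!",
               "I paid attention to the overtones...")

corpus <- VCorpus(VectorSource(responses))
corpus <- tm_map(corpus, content_transformer(tolower))     # lowercase all letters
corpus <- tm_map(corpus, removePunctuation)                # strip punctuation and symbols
corpus <- tm_map(corpus, removeNumbers)                    # strip numbers
corpus <- tm_map(corpus, removeWords, stopwords("SMART"))  # SMART stop-word list
corpus <- tm_map(corpus, stripWhitespace)                  # collapse extra whitespace

# Document-term matrix used as input to the topic models below
dtm <- DocumentTermMatrix(corpus)
```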
Materials
We accomplished our analyses using several packages in the R programming language.
We cleaned and restructured the data using the tm (Feinerer et al., 2023), dplyr (Wickham et al.,
2023), tidyr (Wickham et al., 2023), and tidytext (Silge & Robinson, 2016) packages. We
performed topic modeling using the topicmodels (Grün & Hornik, 2023) and quanteda (Benoit
et al., 2018) packages. We conducted principal components analysis using the psych (Revelle,
2023) package. We visualized the topic space using the LDAvis (Sievert & Shirley, 2022),
ldatuning (Nikita & Chaney, 2022), and ggplot2 (Wickham, 2016) packages.
Analytical Approach
We analyzed the data using probabilistic topic modeling (Blei, 2012). This style of
analysis belongs to a larger family of natural language processing techniques used to
automatically assign text responses to categories based on their content. A topic model assumes
that each person’s response to a question represents a mixture of a finite number of distinct
topics; a topic governs how often people use different words. The output of a topic model is both
the degree to which specific words are associated with specific topics and the degree to which
any given response is associated with those topics. For example, when asked about their
recognition strategy, one respondent might use words related to effort, describing how they
closed their eyes and paid close attention to what they were hearing (e.g., “I closed my eyes and
listened hard”). These words would be associated with Topic a. Another respondent might talk
about attending to pitches or overtones (e.g., “I paid attention to the overtones”)—these words
would be associated with Topic b. Finally, a third respondent might use words that mix effort and
attention to pitch, blending Topics a and b in their response (e.g., “I closed my eyes and listened
hard to the overtones”).
To select the number of topics, we used 10-fold cross-validation to find how many
topics were necessary to meaningfully distinguish between the types of responses that
people gave without overfitting. Each of the three sets of responses was randomly divided into
10 subsets containing a roughly equal number of responses. For each iteration of 10-fold cross-
validation, one of these subsets served as the “test” set while the other nine were used as
“training” data. We fit topic models to the training data with different numbers of topics ranging
from 1 to 300.
A fitted topic model is characterized by two matrices: a word-by-topic matrix that gives,
for each topic, the probability with which a word is used, and a topic-by-response matrix that
gives, for each response, the probability that that response is associated with each possible topic.
These matrices are constructed using Latent Dirichlet Allocation (Blei, 2012), which finds
matrices that maximize the posterior probability of the training data conditional on the number of
topics. Because more topics will always yield a higher probability for the training data, we
selected the number of topics by computing the “perplexity” of the left-out test data.
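To make these two objects concrete, the sketch below fits a model with the topicmodels
package and extracts both matrices in tidy form with tidytext. The value of k is purely
illustrative, and dtm refers to the hypothetical document-term matrix from the preprocessing
sketch above.

```r
library(topicmodels)
library(tidytext)

k <- 20  # illustrative; in practice k is selected by cross-validation
lda_fit <- LDA(dtm, k = k, control = list(seed = 1))

# Word-by-topic matrix: probability of each word under each topic
word_topic <- tidy(lda_fit, matrix = "beta")

# Topic-by-response matrix: probability of each topic for each response
response_topic <- tidy(lda_fit, matrix = "gamma")
```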
Perplexity is related to the expected log-probability of the test data and is analogous to
residual variance or entropy. It represents how poorly a model can account for variation in the
data. Thus, higher perplexity is worse. A model can have high perplexity either because it is not
sufficiently complex to account for the structured variation in how words are used across
responses (i.e., it fails to capture the “signal”) or because it is too complex and fits idiosyncratic
variation in the training data (i.e., “noise”) that does not generalize to the test data. For varying
numbers of topics between 1 and 300, we fit ten different topic models, one for each subdivision
of the data into training/test sets, and selected the number of topics that resulted in the lowest
average perplexity on the test data. We repeated this procedure three times, once for the set of
responses to the similarity strategies question, once for the set of responses to the recognition
strategies question, and once for the set of responses to the impressions question.
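For held-out responses, perplexity takes its standard form,
exp{ -(Σ_d log p(w_d)) / (Σ_d N_d) }, where w_d denotes the words in response d and N_d is its
length in tokens; lower values indicate better generalization. A compressed sketch of this
selection loop, continuing the naming assumptions above and using a coarse illustrative grid of
candidate topic counts rather than the full 1-300 range:

```r
library(topicmodels)

set.seed(1)
folds <- sample(rep(1:10, length.out = nrow(dtm)))  # assign each response to a fold
candidate_k <- c(2, 5, 10, 20, 50, 100)             # coarse illustrative grid

mean_perplexity <- sapply(candidate_k, function(k) {
  fold_perplexity <- sapply(1:10, function(f) {
    train <- dtm[folds != f, ]
    test  <- dtm[folds == f, ]
    fit   <- LDA(train, k = k, control = list(seed = 1))
    perplexity(fit, newdata = test)  # higher = poorer fit to held-out responses
  })
  mean(fold_perplexity)
})

best_k <- candidate_k[which.min(mean_perplexity)]
```

In practice, any test documents left with no terms after preprocessing would need to be dropped
before computing perplexity.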
Once we determined the number of topics that captured how people answered each of the
three questions, we reduced these topics to smaller groups of themes that captured sets of related
topics. We accomplished this by applying principal components analysis (PCA) to the word-by-
topic matrices from the topic model selected by our cross-validation procedure. Since these
topics were not expected to align with an underlying construct a priori, PCA was more
appropriate than common factor extraction (i.e., exploratory factor analysis; Conway &
Huffcutt, 2003). We then used content analysis to interpret the resulting solutions.
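A sketch of this step with the psych package, under the naming assumptions above (the tidy
word_topic output is reshaped so that rows are words and columns are topics); the choice of two
unrotated components is for illustration only:

```r
library(psych)
library(tidyr)

# Reshape the tidy beta output into a word-by-topic matrix
beta_wide <- pivot_wider(word_topic, names_from = topic,
                         values_from = beta, names_prefix = "topic_")
beta_mat  <- as.matrix(beta_wide[, -1])  # drop the term column

# Parallel analysis and scree plot suggest how many components to retain
fa.parallel(beta_mat, fa = "pc")

# Extract the suggested number of components (two here, unrotated)
pca_fit <- principal(beta_mat, nfactors = 2, rotate = "none")
pca_fit$loadings  # how strongly each topic loads on each component
```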
The combination of topic modeling and principal components analysis allowed us to
rank the topics by the extent to which they loaded onto the components in the three PCA
solutions. We decided to investigate the top ten individual responses identified as being
characteristic of each topic that loaded highly onto a PCA component. We began by reading all
the responses associated with a particular topic from start to finish, as one might read a novel.
During this process, we highlighted words or phrases that best captured a key idea or theme. As
we worked through the first few responses, we consolidated these developing codes wherever
possible. Following open coding of the first several responses within each topic, we decided on
initial codes. Then, we coded the remaining responses within each topic using these codes,
adding new categories when responses did not map onto an existing code. We repeated this
process for all the topics that loaded highly onto a PCA component.
Then, we re-examined all the data. The resulting codes allowed us to interpret and define each
topic. By looking at the pattern of topic loadings, we were able to discern the latent themes
represented by each of the PCA components. We have summarized the preceding analytical
method in Figure 1. [Figure 1 near here]
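As a concrete illustration of the first step, the ten responses most characteristic of each topic can
be pulled from the hypothetical response_topic (gamma) output above:

```r
library(dplyr)

# For each topic, keep the ten responses with the highest topic probability
top_responses <- response_topic %>%
  group_by(topic) %>%
  slice_max(gamma, n = 10, with_ties = FALSE) %>%
  ungroup()
```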
Results
Similarity Rating Strategies
We begin with the results from our analysis of the responses to the similarity strategies
question. Twenty topics minimized perplexity in this set of responses (Figure 2). Based on the
frequency with which particular words were used, we determined the degree of overlap among
the 20 topics. We then applied PCA to the output of the cross-validation procedure to
reduce the dimensionality of the topics. This allowed us to identify latent themes within each set
of responses. [Figure 2 near here]
First, we conducted a parallel analysis and inspected the scree plot using the acceleration
rule to determine the appropriate number of components to extract. As Figure 2 shows, the
acceleration rule and parallel analysis both suggested that the 20 topics can be meaningfully
summarized in terms of two components, each of which represents what we call a “theme”.
Topics 8 and 14 showed strong positive loadings on one component; Topics 4, 15, and 20 also
loaded onto the same component, but in the negative direction. As such, this component seems to
reflect a tradeoff between these two sets of topics. Meanwhile, topics 10 and 18 loaded onto the
other component. The remainder of the topics did not load highly onto either component and
may reflect less prominent themes or sampling variability.
Content analysis of the top ten responses under topics 8 and 14 revealed words that
indicated a strategy of identifying specific acoustic features (e.g., pitches, overtones, vibrations)
and using those features to compare sounds. The top ten responses under topics 4, 15, and 20
contained words that indicated a strategy of forming an overall impression of the sounds and
using that impression to make similarity ratings (e.g., buzzing, cleanliness, grainy). Because
these two sets of topics loaded on the same component with opposite signs, we interpreted this
component as representing two themes which tradeoff with one another—namely, acoustic
feature separation (topics 8 and 14), and formation of a holistic impression (topics 4, 15, and 20).
The top ten responses under topics 10 and 18, which loaded onto the other component, contained
words related to focus (e.g., concentrated, paid attention, listened). We interpreted this second
component as representing a theme of increased effort. Examples of responses characterized by
each theme can be found in Table 1. [Table 1 near here]
Recognition Judgement Strategies
Sixteen topics minimized perplexity in the set of responses to the question about recognition
strategies. Once again, we applied PCA to the output of the cross-validation procedure to reduce
the dimensionality of this solution. Parallel analysis and the acceleration rule suggested
extracting five components (Figure 2). Topic 8 loaded onto the first component, while topic 10
loaded onto the second. Topics 14 and 4 loaded onto the third component. Topics 3 and 2 loaded
onto the fourth component. Topic 6 loaded onto the fifth component. Topics 11 and 12 showed
high negative loadings onto component one and were analyzed together as a stand-alone
outcome variable. Topic 7 showed a high negative loading onto component 2. Topic 9 showed a
high negative loading onto component five.
Content analysis of the top ten responses under topic 8 (i.e., component one) revealed a
general strategy of comparing irregularities between sounds to guide recognition decisions. The
top ten responses under topic 10 (component two) revealed a strategy of comparing contrasting
features (e.g., high vs. low pitch) between sounds. A similar trend emerged for the top ten
responses under topics 14 and 4 (component three). The top ten responses under topics 3 and 2
(component four) indicated a strategy of forming a holistic impression of the sounds and using
that impression to guide recognition. The top ten responses under topic 6 (component five) revealed a
strategy of cuing into timbral features, as did the responses under topics 11 and 12 (the topics
with high negative loadings onto component one) and the responses under topic 7 (the topic with
a high negative loading onto component two). Finally, the top ten responses under topic 9 (the
topic with a high negative loading onto component five) suggested a strategy of tracking featural
changes from one sound to the next.
A re-examination of these strategies led us to conclude that several components shared the
same underlying characteristics. Thus, we determined that recognition judgements were guided
by five latent themes: Featural Irregularity (topic 8), Contrast (topics 10, 14, and 4), Holistic
Processing (topics 3 and 2), Timbral Impressions (topics 6, 11, 12, and 7), and Featural Change
(topic 9). Examples of responses that capture these themes are shown in Table 1.
Impressions
Thirty topics minimized perplexity in the set of responses to the question about people’s impressions
of the sounds from our stimulus set. We applied PCA to the output of the cross-validation
procedure to reduce the dimensionality of this solution. Parallel analysis and the acceleration rule
suggested extracting two components (Figure 2). None of the topics loaded positively onto
component one; however, topic 15 did show a strong negative loading. Topic 15 contained words
related to electric sounds (e.g., static, sine waves, electricity). We therefore interpreted it as a
stand-alone outcome variable that corresponded to a theme of electricity. Topics 12, 13, and 27
loaded onto the second component. These topics contained words that described mechanical
sounds (e.g., machinery, printing sounds, intercom). Thus, the themes that captured how people
formed their impressions of sounds from our stimulus set were Electricity and Machinery
(Table 1).
Discussion
We developed and applied a novel method for parsing naturalistic descriptions of
people’s auditory experiences. Our method used modeling techniques from natural language
processing to extract topics from a relatively large and unstructured set of text data from studies
on perception and memory for artificial timbre stimuli. We then used principal components
analysis to distill these topics into a smaller set of latent themes. We found that people made their
similarity ratings by attending either to individual features or to a holistic impression of each
auditory signal, and by increasing the attentional resources devoted to perceiving and encoding
sounds during the experiment. Meanwhile, people made their recognition judgements by
comparing featural irregularities between sounds, contrasting acoustic elements with one
another, forming overall impressions of the sounds, conceptualizing the sounds in terms of
timbral properties, and tracking changes in features from one sound to the next. Finally, people’s
overall impressions of the stimuli were characterized largely in terms related to electricity or
machinery.
Our methodology produces results that are comparable in form and interpretability to those
of a traditional content analysis, but with three distinct advantages. The first advantage is clarity;
the computational nature of our approach means that every step is laid bare. The second
advantage is replicability; the results of our methods can be easily reproduced by following the
same computational procedure—and choices along the way can easily be revisited and revised.
The third advantage is practicality; instead of a few human raters spending dozens or hundreds of
hours identifying major topics and themes in a manner that may be too subjective to replicate, the present
computational approach takes only a few hours and can be applied to sets of text data far larger
than would be possible with traditional content analyses. We further consider each of these
aspects of our methodology as they may be applied more broadly in the study of auditory
cognition and perception.
The entirety of our analysis, from the selection of the number of topics to the
identification of principal components, was conducted in R. Thus, each step of the process can be
replicated in a straightforward fashion. Additionally, our methods make it easy to explore the
consequences of different choices at various stages of analysis. This differs from traditional
content analysis, which often relies on judgement calls and on-the-fly revisions that may not be
communicated in the final research report—and which may be difficult to replicate. While some
aspects of our work involve a degree of subjectivity (i.e., the interpretation of each PCA
component), these pieces are situated in a framework that is both empirically consistent and
grounded in the data.
The inherent transparency of our approach also makes it more flexible. The outputs
from each step of the process can be used to add greater context to quantitative data. Depending on
the research question, one could easily retain PCA component scores, perplexity values, and
response-topic loadings to use as covariates or outcome measures in regression analyses or other
quantitative models of auditory cognition. For instance, we have correlated the presence of the
themes in our text data with performance metrics in other tasks (e.g., d’, perceptual similarity,
etc.), personality assessments, and parameters of cognitive models to more richly capture the
nature of the perceptual experiences that people have when listening to the sounds in our
stimulus set. Of note, we found that people who scored highly on Featural Irregularity
(component 1 from the recognition strategies PCA solution) tended to perceive increased
similarity between sounds in the study list and test sounds, r = .14, p < .05. These people also
tended to have significantly greater self-reported auditory acuity, r = .15, p < .05. Additionally,
people with greater musical expertise scored higher on Holistic Processing (component 4 from
the recognition strategies PCA solution), r = .14, p < .05. Such analyses would not be possible
with traditional methods. Indeed, conventional content analysis is generally limited to descriptive
reports of the prevalence of a particular topic within a dataset. Thus, this method allows
researchers to draw more explicit connections between qualitative, individual differences in
auditory perception and empirical performance metrics.
Third, and perhaps most critically, this approach scales to datasets far beyond the limits
of traditional content analytic methods. Conventional content analysis of our dataset would have
likely taken from several weeks to several months to complete,¹ depending on the number of
independent raters. In contrast, our method produced results within a single afternoon. Moreover,
while the total number of individual responses in the present report was 779, topic modeling can
process datasets with hundreds of thousands of entries—a figure large enough to accommodate
practically all use cases one might encounter in datasets from cognitive experiments. This
method therefore allows researchers to content analyze an amount of naturalistic text data that
would not be feasible for human raters. In doing so, our approach can enable auditory scientists
to freely ask people about their auditory perceptual experiences without being deterred by the
logistics of analyzing the resulting data.
Conclusions
The present report describes a new computational method for processing and interpreting
naturalistic text data. We outline an analytical pipeline that can distill latent themes from large
sets of auditory perceptual data. The output from this approach is analogous to that of a
traditional content analysis but was produced in a fraction of the time. Although the tools
developed here were used to shed light on how people perceive and recognize timbral
information, they can be easily adapted to analyze text data about other phenomena. Thus, our
work provides a framework for triangulating quantitative, qualitative, and computational
methods to develop more integrative theories of auditory cognition.
References
Benoit, K., Watanabe, K., Wang, H., Nulty, P., Obeng, A., Müller, S., & Matsuo, A. (2018).
quanteda: An R package for the quantitative analysis of textual data. The Journal of Open
Source Software, 3(30), 774.
Bharucha, J. J., & Todd, P. M. (1989). Modeling the perception of tonal structure with neural
nets. Computer Music Journal, 13(4), 44-53.
Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77-84.
Conway, J. M., & Huffcutt, A. I. (2003). A review and evaluation of exploratory factor analysis
practices in organizational research. Organizational Research Methods, 6(2), 147-168.
Gillespie, N. F., & Cox, G. E. (2023). Relating perception and memory for a novel set of
reconfigurable auditory stimuli: A noisy exemplar approach [Manuscript in preparation].
Department of Psychology, University at Albany, SUNY.
Grün, B., & Hornik, K. (2023). topicmodels: Topic models (Version 0.2-15).
https://CRAN.R-project.org/package=topicmodels.
Hsieh, H. F., & Shannon, S. E. (2005). Three approaches to qualitative content analysis.
Qualitative Health Research, 15(9), 1277-1288.
Ishida, M., Samuel, A. G., & Arai, T. (2016). Some people are “more lexical” than others.
Cognition, 151, 68-75.
Margulis, E. H., & Beatty, A. P. (2008). Musical style, psychoaesthetics, and prospects for
entropy as an analytic tool. Computer Music Journal, 32(4), 64-78.
Mobley, F. S., Bowers, G., Ugolini, M., Fox, E., & Gillespie, N. F. (2023). Modeling aircraft
similarity with musical auditory feature extraction. Applied Acoustics, 214, 109689.
Nikita, M., & Chaney, N. (2022). ldatuning: Tuning of the Latent Dirichlet Allocation models
parameters (Version 1.0.2). https://github.com/nikita-moor/ldatuning.
Pearce, M. T. (2018). Statistical learning and probabilistic prediction in music cognition:
Mechanisms of stylistic enculturation. Annals of the New York Academy of Sciences,
1423(1), 378-395.
Randel, D. M. (1986). Tone color. In The New Harvard Dictionary of Music (6th ed., p. 863).
Revelle, W. (2023). psych: Procedures for psychological, psychometric, and personality research
(Version 2.3.12). https://CRAN.R-project.org/package=psych.
Salton, G. (1971). The SMART retrieval system—experiments in automatic document processing.
Prentice-Hall, Inc.
Shepard, R. N. (1962). The analysis of proximities: Multidimensional scaling with an unknown
distance function. I. Psychometrika, 27(2), 125-140.
Siedenburg, K., Saitis, C., McAdams, S., Popper, A. N., & Fay, R. R. (Eds.). (2019). Timbre:
Acoustics, perception, and cognition (Vol. 69). Springer.
Sievert, C., & Shirley, K. (2022). LDAvis: Interactive visualization of topic models (Version 0.3.2).
https://github.com/cpsievert/LDAvis.
Silge, J., & Robinson, D. (2016). tidytext: Text mining and analysis using tidy data principles in R.
The Journal of Open Source Software, 1(3).
Silpayamanant, J. (2023). Musical pitch is not “high” or “low” (Version 2). figshare.
https://doi.org/10.6084/m9.figshare.24002442.v2
Von Bismarck, G. (1974). Timbre of steady sounds: A factorial investigation of its verbal
attributes. Acta Acustica united with Acustica, 30(3), 146-159.
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag New York,
https://ggplot2.tidyverse.org.
Wickham, H., François, R., Henry, L., Müller, K., & Vaughan, D. (2023). dplyr: A grammar of data
manipulation (Version 1.1.4). https://github.com/tidyverse/dplyr.
Wickham, H., Vaughan, D., & Girlich, M. (2023). tidyr: Tidy messy data (Version 1.3.0).
https://github.com/tidyverse/tidyr.
Zacharakis, A., Pastiadis, K., & Reiss, J. D. (2015). An interlanguage unification of musical
timbre: Bridging semantic, perceptual, and acoustic dimensions. Music Perception: An
Interdisciplinary Journal, 32(4), 394-412.
Footnote
1. We determined this estimate by asking four independent raters to content analyze the 240
responses to the question about recognition judgement strategies. This produced an
output similar to the one generated by our topic model; however, it took our raters
(dedicated undergraduate research assistants) three weeks to complete. If they were to
replicate this process to analyze the responses to the two other open-ended questions in
our dataset, it would likely have taken around three months total. This calculation does
not factor in time for training in content analytic methods, or other variables that would
lengthen the analysis timeline.
Table Caption
Table 1. Examples of Major Themes in Free-Response Questions
Figure Captions
Figure 1. Summary of Analytical Approach
Figure 2. Cross-Validation (leftmost column), Inter-Topic Distances (middle column), and PCA
Results (rightmost column) for Responses to Each Free-Response Question.
Table 1
Examples of Major Themes in Free-Response Questions
Question: What strategies did you use to compare the sounds you heard when making similarity
ratings?
Featural Separation
➢ “The pitches and timbre and whether the more percussive grindy sound was present”
➢ “I tried creating a comparison using my hands of where the pitch sounded”
Impression Formation
➢ “I tried to describe each sound (e.g., long buzz, high to low, two sounds)”
➢ “Tried to sound them like a humming noise”
Listening Effort
➢ “Shut my eyes and tried to hear”
➢ “I put my ear to the mic”

Question: What strategies did you use to make your judgements about whether the sounds were
old or new?
Featural Irregularity
➢ “Taking note of irregularities [or] weird pitches in the notes”
➢ “Concentration mainly on pitch and tone differentials”
Contrast
➢ “How strong and fast each sounds vibration [is] and how high or low each sound is”
➢ “There was a distinction in the texture of the tones”
Holistic Processing
➢ “Trying to hold the noise to the front of my memories”
➢ “Determining if I thought the new sound was at all similar to the old ones”
Timbral Impressions
➢ “How smooth and rough each sound was”
➢ “If the sound felt heavy or had sharpness to it”
Featural Change
➢ “Determine[d] if the second sound was higher or lower and then see if the 3rd sound reverted
back in the opposite direction”
➢ “Tried to memorise the sounds and match the volume and pitches”

Question: In general, what did the clips you heard in this study sound like to you?
Machinery
➢ “Sounded like one of those buzzers you get on the intercom for entry into a flat or apartment”
➢ “Electrical razors”
Electricity
➢ “Sine waves being distorted”
➢ “They sounded like electricity generators”
Figure 1
Summary of Analytical Approach
Figure 2
Cross-Validation (leftmost column), Inter-Topic Distances (middle column), and PCA
Results (rightmost column) for Responses to Each Free-Response Question.
Note. Each row corresponds to one of three questions about: 1) the strategies people used to
make similarity ratings between pairs of sounds; 2) the strategies people used to make their
“old”/“new” recognition judgements about a target sound; and 3) how people formed their
impressions of the sounds they heard. Each point in the leftmost column is the perplexity from
one of the 10 runs of cross-validation with the blue line being the average. The middle column
plots topics by their score on the top 2 principal components. The plots in the rightmost column
indicate the number of components to retain based on the point at which the blue "X"s stay
above the red line.