Information content of note transitions in the music of J. S. Bach

Music has a complex structure that expresses emotion and conveys information. Humans process that information through imperfect cognitive instruments that produce a gestalt, smeared version of reality. How can we quantify the information contained in a piece of music? Further, what is the information inferred by a human, and how does that relate to (and differ from) the true structure of a piece? To tackle these questions quantitatively, we present a framework to study the information conveyed in a musical piece by constructing and analyzing networks formed by notes (nodes) and their transitions (edges). Using this framework, we analyze music composed by J. S. Bach through the lens of network science, information theory, and statistical physics. Regarded as one of the greatest composers in the Western music tradition, Bach's work is highly mathematically structured and spans a wide range of compositional forms, such as fugues and choral pieces. Conceptualizing each composition as a network of note transitions, we quantify the information contained in each piece and find that different kinds of compositions can be grouped together according to their information content and network structure. Moreover, using a model for how humans infer networks of information, we find that the music networks communicate large amounts of information while maintaining small deviations of the inferred network from the true network, suggesting that they are structured for efficient communication of information. We probe the network structures that enable this rapid and efficient communication of information—namely, high heterogeneity and strong clustering. Taken together, our findings shed light on the information and network properties of Bach's compositions. More generally, our simple framework serves as a stepping stone for exploring further musical complexities, creativity, and questions therein. Published by the American Physical Society 2024
PHYSICAL REVIEW RESEARCH 6, 013136 (2024)
Editors’ Suggestion Featured in Physics
Suman Kulkarni,1,* Sophia U. David,2,3 Christopher W. Lynn,4,5 and Dani S. Bassett1,2,6,7,8,9,†
1Department of Physics & Astronomy, College of Arts & Sciences, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
2Department of Bioengineering, School of Engineering & Applied Science, University of Pennsylvania,
Philadelphia, Pennsylvania 19104, USA
3Department of Psychology, Yale University, New Haven, Connecticut 06520, USA
4Initiative for the Theoretical Sciences, Graduate Center, City University of New York, New York, New York 10016, USA
5Joseph Henry Laboratories of Physics, Princeton University, Princeton, New Jersey 08544, USA
6Department of Electrical & Systems Engineering, School of Engineering & Applied Science, University of Pennsylvania,
Philadelphia, Pennsylvania 19104, USA
7Department of Neurology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
8Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
9Santa Fe Institute, Santa Fe, New Mexico 87501, USA
(Received 6 September 2023; accepted 6 December 2023; published 2 February 2024)
DOI: 10.1103/PhysRevResearch.6.013136
I. INTRODUCTION
From Tibetan throat singing to Scottish piobaireachd to
modern hip-hop, music is a universal aspect of human culture,
enjoyed by people of all ages from all around the world.
It has even been proposed that music is a fundamental part
of being human [1]. Though styles, sounds, and instruments
vary drastically from one culture and time period to another,
it is indisputable that music has had a substantial impact on
the development of humans and society [2,3]. Through music
we can tell stories [4], convey messages [5], and imbue the
*Corresponding author: sumank@sas.upenn.edu
†Corresponding author: dsb@seas.upenn.edu
Published by the American Physical Society under the terms of the
Creative Commons Attribution 4.0 International license. Further
distribution of this work must maintain attribution to the author(s)
and the published article’s title, journal citation, and DOI.
strongest of emotions [6–8]. It is a common human experience
to feel pensive or despondent after hearing a slow song in a
minor key or to feel carefree or energized after hearing an up-
beat song in a major key. But how does something as abstract
as music communicate so much? Past literature has discussed
music in terms of expectation and surprise [9–13]. To be
evolutionarily successful, our brains are adept at forming ex-
pectations based on prior events. When these expectations
are contradicted by an experience, we feel surprised. With
surprise can come a host of other emotions: We may feel
relief when the dissonant sound we expected was actually
consonant or we may feel distress when the musical resolution
we expected did not occur [12–16]. But how do we quantify
these expectations and surprises? How do we mathematically
formalize and measure the information conveyed by a piece
of music? Fundamentally, music is composed of fleeting and
elusive sounds, and hence may appear hard to measure.
Here, we seek to extract order from music’s complexity
by examining music through the lens of network science. A
network consists of nodes and edges—representing entities
and the connections between them, respectively. Conceptual-
izing each note as a node and each transition between two
notes as an edge, we can build a network for any piece
of music [17–21]. This representation enables us to use
physics-based approaches to quantitatively analyze aspects of
a musical piece. Using music networks, we build a framework
to study the information conveyed by a piece and apply this
framework to provide a comprehensive analysis of Bach’s
compositions. Bach is a natural case study given his prolific
career, the wide appreciation his compositions have garnered,
and the influence he had over contemporaneous and subse-
quent composers. His diverse compositions (from chorales to
fugues) for a wide range of musicians (from singers to orches-
tra members) often share a fundamental underlying structure
of repeated—and almost mathematical—musical themes and
motifs. These features of Bach’s compositions make them par-
ticularly interesting to study using a mathematical framework.
As we listen to music, we form expectations [9,11,12,15].
Upon hearing a particular note, we anticipate which notes
might come next based on past transitions. The less likely
the outcome, the more surprised we are upon hearing it.
This “surprisal” can be quantified by the Shannon information
entropy [22]. Ideas from information theory have led to illumi-
nating insights in a wide range of settings, including language
[23,24], social networks [25,26], transportation patterns [27],
and music [28,29]. We draw upon these ideas to quantify
the information present in the music networks. While prior
research has attempted to quantitatively identify patterns and
features present across different kinds of music [19,30–32],
understanding how humans perceive these patterns is more
nuanced and complex than simply evaluating the structure
of compositions because humans are not perfect learners.
Rather, studies have consistently found that humans assimilate
patterns of information presented to them through imperfect
perceptual systems, resulting in slightly inaccurate representations
of transition structures [33–36]. This observation raises
interesting questions about the information that is perceived
by a human; in particular, how does the inferred structure
relate to, and differ from, the true structure of a musical piece?
Further, are there any patterns in music that particularly shine
through the messy process of human perception and, if so,
how do these patterns vary across different kinds of music?
While these questions are nuanced and can depend on factors
like level of training [37–40], musical cultural conditioning
[38,41–44], and even language [45–48], recent advances in
the study of how humans learn networks of information offer a
valuable framework to address these questions [34,35,49–51].
Here we draw upon ideas from network science, infor-
mation theory, and cognitive science to build a framework
to investigate the information conveyed by music. We then
use this framework to provide a systematic analysis of music
composed by J. S. Bach. We begin in Sec. II with a discussion
of how music can be represented as a network along with
details of the compositions analyzed in our work. Next, in
Sec. III, we study the information present in the networks.
We find that Bach’s music networks contain more information
than expected from typical (or random) transition structures.
Strikingly, we also find that certain composition forms are
clustered together based on their information content. We
investigate how the network structure influences information
content and show that the higher information in these music
networks and the differences observed across musical pieces
within each compositional form can be explained by the het-
erogeneity in node degrees (or the number of distinct pitches
that follow a given note). Next, in Sec. V, we use a computa-
tional model for how humans learn networks of information
to examine how closely the inferred transition structure of a
piece aligns with the true network structure. We find
that the music networks maintain a low deviation between
the inferred and true network, and this property is driven
by tight clustering in the network. Additionally, we find that
certain compositional forms can be distinguished based on the
discrepancies between the original and the inferred network.
Together, our framework introduces a fresh perspective on
music, and sheds light on properties of Bach’s music. By per-
forming a systematic study of how information in a complex
system, like music, is structured and perceived by humans, our
paper provides insights on human creativity and how humans
experience the world around them. Our study also opens up
numerous interesting directions for further inquiry, which we
outline in Sec. VII. Additionally, we emphasize the limitations
of our analysis and discuss how future work could improve
upon this paper to incorporate more realism in Sec. VII.
II. MUSIC AS A NETWORK OF NOTE TRANSITIONS
We note that there have been previous efforts in construct-
ing and analyzing different network representations of music
[17–21]. In our paper, we focus on investigating the information
conveyed by note transitions in music and begin with a
basic representation of the note transitions. We study a wide
range of Bach’s compositions, including preludes, fugues,
inventions, cantatas, English suites, French suites, chorales,
Brandenburg concertos, toccatas, and concertos. The audio
files for these pieces were collected and read in MIDI format,
from which the sequence of notes was extracted (see Methods
Sec. A1 for further details on each compositional type and
the sources for each piece). Each note present in a piece is
represented as a node in the network, with notes from different
octaves represented as distinct nodes. The transitions between
notes are calculated separately for different instruments. If
there is a transition from note $i$ to note $j$, then we draw a
directed edge from node $i$ to node $j$ (see Fig. 1). For chords,
where multiple notes occur at the same time, edges are drawn
between all notes in the first chord to all notes in the second
chord. To simplify our analysis, we remove any self-loops in
the network, thereby restricting ourselves to understanding the
structure of transitions to the next different note in the piece.
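The construction described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' pipeline: the input format (a list of tuples of pitch labels, one tuple per time step), the function names `build_transition_network` and `transition_probabilities`, and the toy piece are all assumptions made for this example. The sketch handles a single voice, and it does not parse MIDI files.

```python
from collections import Counter
from itertools import product


def build_transition_network(events):
    """Count directed note transitions between consecutive events.

    `events` is a list of tuples of pitch labels; a single note is a
    1-tuple and a chord is a longer tuple (an assumed input format).
    """
    edge_counts = Counter()
    for current, following in zip(events, events[1:]):
        # For chords, connect every note in the first event to every
        # note in the second event.
        for i, j in product(current, following):
            if i != j:  # self-loops are removed
                edge_counts[(i, j)] += 1
    return edge_counts


def transition_probabilities(edge_counts, weighted=False):
    """Row-normalize the edges into transition probabilities P[i][j]."""
    rows = {}
    for (i, j), count in edge_counts.items():
        rows.setdefault(i, {})[j] = count if weighted else 1
    return {
        i: {j: w / sum(nbrs.values()) for j, w in nbrs.items()}
        for i, nbrs in rows.items()
    }


# Hypothetical toy piece: C4, D4, E4, the chord (C4, E4), G4, C4.
toy_events = [("C4",), ("D4",), ("E4",), ("C4", "E4"), ("G4",), ("C4",)]
toy_edges = build_transition_network(toy_events)
toy_P = transition_probabilities(toy_edges)
```

Passing `weighted=True` keeps the transition counts as edge weights, which corresponds to the weighted networks discussed later in the text; the default gives every outgoing edge of a node equal probability.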
We begin by examining unweighted networks of note tran-
sitions to focus on how the network structure alone impacts
the information content and perception of a musical piece.
After understanding the skeleton of the transitions, we then
add weights to the edges based on how frequently various
transitions occur. This procedure allows us to disentangle the
effects of the network structure (comprising the set of pos-
sible note transitions) and edge weights (comprising the note
transition probabilities). Examples of note transition networks
constructed using our simplified representation are illustrated
in Fig. 2. Although our emphasis has been on building a
FIG. 1. An example of a network constructed from a musical
piece using the method described in our paper. At the top, we show
a toy musical piece. Below, we show the network in which notes
are nodes and transitions between notes, whether isolated or played
simultaneously as part of a chord, are directed edges. The direction
of the edge matches the temporal direction of the transition.
FIG. 2. Examples of note transition networks derived from select
musical pieces: (a) Chorale BWV 437, (b) Fugue 11 BWV 856, from
the Well Tempered Clavier Book I, (c) Prelude 9 BWV 878, from the
Well Tempered Clavier Book II, and (d) Toccata BWV 916. Here
we display the largest connected component of each network. The
node size and color are based on the sum of each node’s in and out degrees,
such that nodes with a higher degree are larger in size and lighter
in color, while nodes with a lower degree are smaller in size and
darker in color. The edge thickness indicates the relative frequency
of transitions.
FIG. 3. The model of information production using random
walks. (a) An example of a random walk on the network of note
transitions is shown using the blue dotted line. At each node, the
walker chooses an outgoing edge to traverse, each weighted with
equal probability. This walk generates a sequence of notes as shown
below. (b) The amount of information, or the entropy, generated
when a walker traverses an edge from a node depends on the degree
of the node. When traversing nodes with a high versus low degree,
the walker has more choices for which edge to pick and, hence, such
a transition generates more information. Thus, nodes with a higher
degree (right) are said to have higher entropy than nodes with a low
degree (left). (c) To calculate the entropy of the entire network, one
needs to weigh the contribution of each node by the probability that
a walker will occupy it. For networks with the same average degree,
those with a wider range of degrees (right) have a higher entropy than
those with a narrower range of degrees (left).
basic representation of the note transitions present in a mu-
sical piece, it is important to highlight the potential to extend
this representation to capture other essential aspects of music
(such as timbre, rhythm, duration of the notes, as well as tech-
niques like counterpoint). We expand on how future efforts
could incorporate more musical realism and complexity in
Sec. VII.
III. QUANTIFYING THE INFORMATION IN NETWORKS
We seek to measure the amount of information produced
by a sequence of notes. Although note sequences can have
long-range temporal dependencies [52,53] and higher order
structures [54,55], as a first analytical step, we focus on the
Markov transition structure. That is, we study the information
contained in individual note transitions. This information is
quantified by the Shannon entropy of a random walk on the
network [22,56] (Fig. 3; see also the Methods Sec. A2 for
further details). Given a network of transitions, the contribu-
tion of the ith node to the entropy can be written in terms of
FIG. 4. Quantifying the information of Bach’s music using the entropy of random walks on networks of note transitions. (a) Entropy of
Bach’s music networks ($S_{\mathrm{real}}$) compared with random networks of the same size ($S_{\mathrm{rand}}$). We report the entropy of the corresponding random
networks after averaging over 100 independent realizations. The error bars for $S_{\mathrm{rand}}$ indicate the standard error of the sample. (b) The entropy of
Bach’s music networks ($S_{\mathrm{real}}$) compared with random networks that preserve the in and out degrees of each node ($S_{\mathrm{deg}}$). We report the entropy
of the corresponding degree-preserving random networks after averaging over 100 independent realizations. The error bars for $S_{\mathrm{deg}}$ indicate
the standard error of the sample. (c) The entropy of the chorales as a function of the average in-degree heterogeneity $H_{\mathrm{in}} = \mathrm{Var}(k_{\mathrm{in}})/\langle k_{\mathrm{in}}\rangle^{2}$ (top)
and out-degree heterogeneity $H_{\mathrm{out}} = \mathrm{Var}(k_{\mathrm{out}})/\langle k_{\mathrm{out}}\rangle^{2}$ (bottom) of the networks. In (a) and (b), each data point represents a single piece. Color
and marker indicate the type of piece, as shown in the legend. The dashed line represents the line $y = x$. In (c), the dotted line indicates the
best linear fit, the reported $r_s$ value is the Spearman correlation coefficient, and the $p$ value is obtained by performing a permutation test.
the entries of the transition probability matrix $P$ as
$$S_i = -\sum_j P_{ij} \log P_{ij}. \tag{1}$$
In the case of directed unweighted networks, $P_{ij} = 1/k_i^{\mathrm{out}}$,
where $k_i^{\mathrm{out}}$ is the out degree of the node. Here, the base of
the logarithm is 2. Hence, for unweighted networks, the node-level
entropy is $S_i = \log(k_i^{\mathrm{out}})$, which is solely determined by
the out degree.
To calculate the entropy of the entire network, the contributions
of the nodes are weighted by their stationary distribution, that is,
the probability that a walker ends up at node $i$ after infinite time,
which we denote by $\pi_i$ [56]. The entropy of the network is then
$$S = \sum_i \pi_i S_i = -\sum_i \pi_i \sum_j P_{ij} \log P_{ij}. \tag{2}$$
For undirected and unweighted networks, the stationary
distribution has a simple analytical form $\pi_i = k_i/2E$, where
$k_i$ is the degree of node $i$, and $E$ is the total number of edges.
The network entropy is then
$$S = \frac{1}{2E} \sum_i k_i \log k_i. \tag{3}$$
By contrast, for directed networks the stationary distribution
depends on the detailed structure of the network and cannot
be written in closed form. Hence, for our directed music
networks, we calculate the stationary distribution numerically
and use Eq. (2) to compute the entropy of each piece.
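As an illustration of the numerical step, the following sketch estimates the stationary distribution by power iteration and then evaluates Eq. (2). It is not the authors' code. The function names, the dictionary representation of $P$, and the four-note toy network are assumptions made for this example, and the sketch assumes a strongly connected network so that the stationary distribution is unique. A "lazy" step (staying put with probability 1/2) is used because it leaves the stationary distribution unchanged while avoiding oscillation on periodic networks.

```python
import math


def stationary_distribution(P, tol=1e-12, max_iter=100_000):
    """Power iteration for pi = pi P on a strongly connected network.

    P maps each node to a dict {neighbor: transition probability}.
    """
    nodes = list(P)
    pi = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(max_iter):
        # Lazy walk: stay with probability 1/2, otherwise follow P.
        new = {n: 0.5 * pi[n] for n in nodes}
        for i, row in P.items():
            for j, p in row.items():
                new[j] += 0.5 * pi[i] * p
        if max(abs(new[n] - pi[n]) for n in nodes) < tol:
            return new
        pi = new
    return pi


def network_entropy(P):
    """Entropy of a random walk, Eq. (2), in bits."""
    pi = stationary_distribution(P)
    return -sum(
        pi[i] * p * math.log2(p)
        for i, row in P.items()
        for p in row.values()
        if p > 0
    )


# Hypothetical unweighted toy network on four notes.
toy_P = {
    "C4": {"D4": 0.5, "G4": 0.5},
    "D4": {"E4": 1.0},
    "E4": {"C4": 0.5, "G4": 0.5},
    "G4": {"C4": 1.0},
}
S = network_entropy(toy_P)  # analytically 6/11 bits for this toy network
```

For this toy network the stationary distribution can be solved by hand ($\pi_{C4} = 4/11$, $\pi_{E4} = 2/11$), and only those two nodes have more than one outgoing edge, which gives $S = 6/11$ bits.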
To understand the amount of information produced by
the music networks, we compare them to randomized (or
“null”) networks of the same size; that is, networks with
the same number of nodes and edges (see Methods Sec. A5
for details on generating null networks). This helps develop
an intuition for the amount of information that networks of
the same size typically contain. If the note transitions in the
music networks do have distinct properties that allow them to
communicate a large amount of information, then we would
expect Bach’s networks to contain more information than
the null transition structures. By averaging over 100 random
networks for each piece, we find that the real networks have
consistently higher entropy, and thereby contain more information,
than their random counterparts [Fig. 4(a)].
Moreover, by comparing across pieces, we observe that the
different kinds of compositions cluster together based on their
entropy. The chorales, typically meant to be sung by groups in
ecclesiastical settings, are shorter and simpler diatonic pieces
that display a markedly lower entropy than the rest of the
compositions studied. By contrast, the toccatas, character-
ized by more complex chromatic sections that span a wider
melodic range, have a much higher entropy. It is possible
that the chorales’ functions of meditation, adoration, and sup-
plication are best supported by predictability and hence low
entropy, whereas the entertainment functions of the toccatas
and preludes are best supported by unpredictability and hence
high entropy. In the Supplemental Material [57], we perform
comparisons for other information sources as well to provide
more context to our result.
We know that the node-level entropy is defined only by the
out degrees of the nodes. Accordingly, it is useful to assess
differences between the true networks and others wherein the
node-level entropies have been fixed by preserving the true
degree distribution. To perform this assessment, we compare
the entropy of the real networks with another set of null
models: randomized networks which preserve both the in- and
out-degree of each node (see Methods Sec. A5 for details
on generating these networks). We observe that the entropies
of the networks are more or less preserved [see Fig. 4(b)].
Although this preservation is expected for undirected net-
works (where the entropy is determined only by the degree
distribution), it need not exist for directed networks (where
the different stationary distributions contribute to the entropy).
We therefore find that the entropy of music networks is pri-
marily determined by their degree distributions rather than
their stationary distributions.
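One common way to build a degree-preserving null network is repeated edge swapping, sketched below. The procedure actually used is described in Methods Sec. A5, which is not reproduced here, so this should be read as an assumed stand-in rather than the authors' method. The function names, the number of swaps, and the example edge list are hypothetical. Note that swaps of this kind do not guarantee that the randomized network stays strongly connected.

```python
import random
from collections import Counter


def degree_sequences(edges):
    """Return (out-degree, in-degree) counters for a directed edge list."""
    out_deg = Counter(src for src, _ in edges)
    in_deg = Counter(dst for _, dst in edges)
    return out_deg, in_deg


def degree_preserving_shuffle(edges, n_swaps=None, seed=0):
    """Randomize a directed simple graph while keeping every node's
    in degree and out degree fixed, by swapping edge targets."""
    rng = random.Random(seed)
    edges = list(edges)
    edge_set = set(edges)
    if n_swaps is None:
        n_swaps = 10 * len(edges)  # assumed mixing budget
    for _ in range(n_swaps):
        x, y = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[x], edges[y]
        # Proposed rewiring: a->b, c->d becomes a->d, c->b.
        if a == d or c == b:
            continue  # would create a self-loop
        if (a, d) in edge_set or (c, b) in edge_set:
            continue  # would create a duplicate edge
        edge_set -= {(a, b), (c, d)}
        edge_set |= {(a, d), (c, b)}
        edges[x], edges[y] = (a, d), (c, b)
    return edges


# Hypothetical directed network on four nodes.
example_edges = [(0, 1), (1, 2), (2, 0), (0, 3), (2, 3), (3, 0), (1, 3), (3, 2)]
shuffled_edges = degree_preserving_shuffle(example_edges, seed=1)
```

Each accepted swap moves one outgoing edge per source node and one incoming edge per target node, so both degree sequences are unchanged by construction.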
To gain intuition for how the entropy of note transitions de-
pends on network structure, consider the case of unweighted
and undirected networks. The network entropy takes a partic-
ularly simple form, as shown in Eq. (3). Following a Taylor
expansion around the average degree of the network (see
Methods Sec. A2), one obtains
S=logk+Va r ( k)
2k2+..., (4)
where kis the average degree of the network and Var(k)
is the variance of the degrees. To first order, we see that the
entropy increases logarithmically with the average degree of
the network. To second order, the entropy increases with the
variance or the heterogeneity of the degrees, such that more
information will be produced by networks with heterogeneous
(or broader) degree distributions. We define the degree heterogeneity as
$$H = \frac{\mathrm{Var}(k)}{\langle k\rangle^{2}}. \tag{5}$$
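For readers who want the intermediate steps, the expansion leading from Eq. (3) to Eq. (4) can be sketched as follows. The full derivation is in Methods Sec. A2; this sketch assumes a network of $N$ nodes, so that $2E = N\langle k\rangle$, and uses the natural logarithm. With a base-2 logarithm the variance term picks up a factor of $1/\ln 2$.

```latex
% Sketch of the expansion behind Eq. (4); natural logarithm assumed.
\begin{align}
S &= \frac{1}{2E}\sum_i k_i \log k_i
   = \frac{\langle k \log k \rangle}{\langle k \rangle},
   \qquad 2E = N\langle k \rangle, \\
g(k) &= k \log k, \qquad g''(k) = \frac{1}{k}, \\
\langle g(k) \rangle &\approx g(\langle k \rangle)
   + \tfrac{1}{2}\, g''(\langle k \rangle)\, \mathrm{Var}(k)
   = \langle k \rangle \log \langle k \rangle
   + \frac{\mathrm{Var}(k)}{2\langle k \rangle}, \\
S &\approx \log \langle k \rangle
   + \frac{\mathrm{Var}(k)}{2\langle k \rangle^{2}}
   = \log \langle k \rangle + \frac{H}{2}.
\end{align}
```

The term linear in $k - \langle k\rangle$ drops out because its average is zero, which is why the first correction to $\log\langle k\rangle$ is proportional to the variance.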
Many networks that we encounter in our daily lives are
characterized by heterogeneous degree distributions, typically
with few high-degree “hub” nodes and many low-degree
nodes [58–60]. By contrast, regular graphs, which have homogeneous
degrees, produce random walks with the least
entropy [see Fig. 3(c)].
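A quick numerical check of this comparison, using Eq. (3) directly on two degree sequences with the same average degree, might look as follows. Both degree sequences are hypothetical and chosen only for illustration, and the function names are not from the paper.

```python
import math


def undirected_entropy(degrees):
    """Eq. (3): S = (1 / 2E) * sum_i k_i log2 k_i, with 2E = sum_i k_i."""
    two_E = sum(degrees)
    return sum(k * math.log2(k) for k in degrees) / two_E


def heterogeneity(degrees):
    """Eq. (5): H = Var(k) / <k>^2."""
    mean = sum(degrees) / len(degrees)
    var = sum((k - mean) ** 2 for k in degrees) / len(degrees)
    return var / mean ** 2


# Two hypothetical ten-node degree sequences, both with average degree 4.
regular = [4] * 10
heterogeneous = [9, 7, 6, 4, 4, 3, 2, 2, 2, 1]

S_regular = undirected_entropy(regular)       # log2(4) = 2 bits
S_hetero = undirected_entropy(heterogeneous)  # larger than S_regular
H_hetero = heterogeneity(heterogeneous)       # Var(k) = 6, <k>^2 = 16
```

The regular sequence gives exactly $\log_2 4 = 2$ bits, while the heterogeneous sequence gives a larger value, in line with the variance term in Eq. (4).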
Where does Bach’s music fall along this spectrum? We
found in Fig. 4(a) that the music networks analyzed have
consistently higher entropy than null networks with the same
number of nodes and edges (in other words, randomized net-
works with the same average degree). In the Supplemental
Material [57], we show that this higher information content
of Bach’s music networks is due to higher heterogeneity in
their in- and out-degree distribution; that is, the music net-
works are more heterogeneous in their degrees than expected
from transition structures of their size, enabling them to pack
more information into their structure. Since we have focused
our analysis on the first-order sequential relationships among
notes, which are likely common across different kinds of
music, we expect this result to generalize for other kinds of
music as well.
In Fig. 4(a), we also observe that various pieces belong-
ing to certain compositional forms were clustered together
in their entropy. Consistent with this observation, we find
that the pieces which are clustered together in their entropy
have very similar degrees (see Supplemental Material [57]).
Examples include English suites, French suites, and chorales.
In contrast, fugues did not cluster together in their entropy
as much as other composition types and displayed diverse
average degrees. For the compositions that are grouped to-
gether in their entropy, we find that the differences observed
among the pieces in the group can be explained by their
degree heterogeneity (see Supplemental Material [57]). We
can, for example, see this relation in the chorales where the
pieces which have a higher in- and out-degree heterogeneity
tend to have a higher entropy, despite having similar de-
grees [Fig. 4(c)]. To quantitatively verify this relationship,
we compute the Spearman correlation coefficient between the
network entropy and the in- and out-degree heterogeneity in
Fig. 4(c). This coefficient assesses the monotonicity between
two data sets, and ranges from −1 to 1, with 0 implying
no correlation. The positive correlation, as observed in the
figure, suggests that the entropy tends to increase with in- and
out-degree heterogeneity. For additional details on computing
this correlation and the associated $p$ value, refer to Methods
Sec. A7. Lastly, we note that this positive monotonic rela-
tionship between the entropy and degree heterogeneity holds
even in our data set of directed networks, likely because the
in- and out-degrees tend to be correlated.
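To make the statistical test concrete, here is a minimal sketch of a Spearman correlation with a permutation-based $p$ value. The exact procedure is given in Methods Sec. A7, which is not reproduced here, so the choices below (a two-sided test, the number of permutations, and the add-one correction) are assumptions, and the data values are invented for illustration.

```python
import random


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = average_rank
        pos = end + 1
    return result


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation test: shuffle y and count how often the
    shuffled |r_s| is at least as large as the observed |r_s|."""
    rng = random.Random(seed)
    observed = spearman(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(spearman(x, y_perm)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Invented heterogeneity and entropy values for six pieces.
H_values = [0.20, 0.25, 0.31, 0.36, 0.40, 0.47]
S_values = [2.10, 2.18, 2.21, 2.30, 2.33, 2.41]
r_s, p_value = permutation_p_value(H_values, S_values)
```

Because the coefficient is computed on ranks, it equals 1 for any strictly increasing relationship, whether or not that relationship is linear.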
IV. HOW HUMANS PERCEIVE NETWORKS OF INFORMATION
A key aspect of human communication involves receiving
and assimilating information in the form of interconnected
stimuli—ranging from sequences of words in language and
literature to melodic notes of a musical piece, and even ab-
stract concepts. Humans assimilate this information and build
representations of the underlying structure of inter-item rela-
tionships, as depicted in Fig. 5(a). As noted earlier, humans
build these internal network models using imperfect cognitive
instruments that result in slightly distorted versions of true
network structures. The information that is perceived by a
human is the sum of the information present in the system
and the inaccuracies that stem from the imperfect cognitive
processes involved in perception [21]. In the previous section,
we focused on quantifying the actual information present in
the system (see Fig. 3). We will now account for the second
piece: the inaccuracies that arise due to the imperfect cogni-
tive process of perceiving information (see Fig. 5).
To understand how humans learn and represent transition
structures, researchers have conducted a number of exper-
iments and introduced a range of models describing how
humans internally construct transition networks [35,36,49,61–63].
Several of these studies and models consistently highlighted
a shared principle: Humans tend to integrate transition
probabilities over time, relating items that are adjacent to
each other as well as those separated by transitions of longer
lengths—two, three, and beyond—with a declining likelihood
as the transition distance increases [21,36,62,64]. Mathematically,
we can capture this temporal integration and express the
inferred transition structure $\hat{P}$ in terms of the true transition
structure $P$ as
$$\hat{P} = \sum_{t=0}^{\infty} f(t)\, P^{t+1}, \tag{6}$$
where $f(t)$ is a decreasing function of $t$ such that
longer-distance associations contribute less to an individual’s
network representation. This fuzzy temporal integration al-
lows for lower computational costs and better generalizations
about new information at the cost of accuracy. Here, we focus
FIG. 5. How humans process networks of information. (a) A key
aspect of human communication involves receiving and assimilating
information in the form of interconnected stimuli. Humans assimilate
patterns of information presented to them through imperfect percep-
tual systems, which results in slightly inaccurate internal models of
the underlying transition structure. (b) When forming internal net-
work models of the world, humans strike a balance between accuracy
and complexity. The parameter $\eta$ quantifies this trade-off between
accuracy and cost. In panel (i), we see the example network built
when solely maximizing the accuracy ($\eta \to 0$), which forms a perfect
representation of reality. However, building this network requires
perfect memory and is computationally expensive. In panel (iii), we
see the network built when solely minimizing the computational cost
($\eta \to 1$), in which all nodes are connected to all other nodes, unlike
the original network. Constructing this network does not require sig-
nificant cost, but it provides no accuracy in representing the original
information. Humans tend to display intermediate values of η=0.80
[21], thereby constructing networks that preserve some but not all of
the true transition structure, as shown in panel (ii). Figure adapted
with permission from Ref. [51].
on one such model which captures this temporal integration
and inaccuracies in perception [21,35]. The model postulates
that when constructing internal network representations of
information, humans aim to maximize the accuracy of their
internal representation while simultaneously minimizing the
computational cost required for its construction [21,35,51,65].
The learned transition probabilities ($\hat{P}$) can then be written in
terms of the true transition probabilities ($P$) as follows:
$$\hat{P} = (1 - \eta)\, P\, (I - \eta P)^{-1}, \tag{7}$$
where $\eta \in [0, 1]$ captures the errors in representation. A detailed
derivation of this expression is provided in Methods
Sec. A3.
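The link between Eqs. (6) and (7) can be illustrated numerically. Choosing $f(t) = (1 - \eta)\eta^{t}$ in Eq. (6) gives a geometric series in $\eta P$ whose sum is the closed form in Eq. (7). The sketch below evaluates the truncated series for $0 \le \eta < 1$; it is an illustration under that assumed choice of $f(t)$, not the authors' implementation, and the function names and the four-note toy matrix are hypothetical.

```python
def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [
        [sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def inferred_transitions(P, eta=0.8, tol=1e-12, max_terms=10_000):
    """Truncated series P_hat = sum_t (1 - eta) * eta**t * P**(t + 1)."""
    if not 0.0 <= eta < 1.0:
        raise ValueError("this series sketch requires 0 <= eta < 1")
    n = len(P)
    P_hat = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in P]  # holds P**(t + 1), starting at t = 0
    weight = 1.0 - eta             # f(t) at t = 0
    for _ in range(max_terms):
        for i in range(n):
            for j in range(n):
                P_hat[i][j] += weight * power[i][j]
        weight *= eta
        if weight < tol:
            break
        power = matmul(power, P)
    return P_hat


# Hypothetical toy transition matrix (node order: C4, D4, E4, G4).
toy_P = [
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
    [0.5, 0.0, 0.0, 0.5],
    [1.0, 0.0, 0.0, 0.0],
]
P_hat = inferred_transitions(toy_P, eta=0.8)
P_exact = inferred_transitions(toy_P, eta=0.0)  # reduces to toy_P
```

With $\eta = 0$ only the $t = 0$ term survives and the inferred matrix equals $P$. For larger $\eta$, probability mass spreads onto pairs of notes that are connected only through longer paths, while each row of $\hat{P}$ still sums to 1.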
To gain intuition for this model, it is useful to consider
the two extreme limits of $\eta$. In the limit where $\eta \to 0$, the
inferred network is exactly the same as the true network ($\hat{P} = P$).
This scenario corresponds to the case in which the network is
learned with no errors, forming an exact representation of
the transitions [Fig. 5(b)(i)]. However, exactly learning the
network is computationally expensive in that it requires perfect
memory. By contrast, in the limit where $\eta \to 1$, the
inferred network connects each node to every other node, and
all of the structure is lost [Fig. 5(b)(iii)]. Such a representation
is efficient to learn, but completely disregards accuracy. It
turns out that most humans do something in between these
two extremes by recalling the sequence of transitions some-
times accurately and sometimes inaccurately, thereby forming
a fuzzy perception of the true network [Fig. 5(b)(ii)]. For-
mally, the competition between computational complexity and
accuracy can be captured by a free-energy model of people’s
internal representation [35], and the inferred network under
this model can be written in terms of the true network us-
ing Eq. (7). We emphasize the similarity of this form across
multiple different theories of cognition [34,49,50]. By relating
the inferred transition structure to the true network structure,
this framework enables one to explore questions about the
information that a human perceives from a given network.
Given our interest in such questions in the context of music,
we use this model to compute the inferred network for each
musical piece. We emphasize that several empirical studies
of musical expectancy have highlighted the role of statistical
learning as a key mechanism, alongside other factors, in musical
expectancy and knowledge acquisition [12,66–70].
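The two limits of η discussed above can be checked numerically. The sketch below is illustrative only: it reuses a hypothetical three-note chain, and it uses η = 0.999 as a stand-in for the limit η → 1, where Eq. (7) itself is singular. The "row spread" it reports is zero when every note has the same inferred outgoing distribution, that is, when all structure is lost.

```python
import numpy as np

def inferred_transitions(P, eta):
    """P_hat = (1 - eta) * P @ inv(I - eta * P), valid for 0 <= eta < 1."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - eta * P, (1.0 - eta) * P)

# Hypothetical toy chain (irreducible and aperiodic), not taken from the paper's data.
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.5, 0.5, 0.0]])

results = {}
for eta in (0.0, 0.80, 0.999):
    P_hat = inferred_transitions(P, eta)
    deviation = np.abs(P_hat - P).max()                      # distance from the true network
    row_spread = np.abs(P_hat - P_hat.mean(axis=0)).max()    # how much the rows still differ
    results[eta] = (deviation, row_spread)
    print(f"eta={eta:5.3f}  max|P_hat - P|={deviation:.4f}  row spread={row_spread:.4f}")
```

For η = 0 the inferred matrix coincides with `P`, while for η close to 1 every row approaches the same distribution, so the inferred network connects each note to every other note regardless of the true transitions.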
For the rest of our discussion, we use the term “inferred
network” on its own to refer to the network calculated using
the model of perception discussed above. Prior work indicates
that, on average, humans display an η=0.80 in large-scale
online laboratory experiments [21]. Given a network of note
transitions with transition probabilities (P), we use this em-
pirically measured value to calculate the inferred network
($\hat{P}$) using Eq. (7). In the context of music, it is important
to recognize that the inferred structure would naturally ex-
hibit variations, potentially influenced by factors such as an
individual’s level of training or cultural conditioning [41–44].
Nonetheless, this framework provides interesting insights re-
garding the types of structures that could be considered more
effective in accurately communicating information, while tak-
ing into account the limitations of human perceptual systems.
We provide a discussion of how future research could expand
upon our research and improve the study of information per-
ception in music in Sec. VII.
V. QUANTIFYING DISCREPANCIES
IN THE PERCEPTION OF MUSIC NETWORKS
We are now prepared to investigate the extent to which the
inferred music networks deviate from their true structure. Net-
works that display a low deviation between the inferred and
true structure can be regarded as more effective in accurately
communicating information. Hence, this framework provides
insight into the communicative success of a network, from the
point of view of how the network interacts with our imper-
fect perceptual systems. Mathematically, one can quantify the
INFORMATION CONTENT OF NOTE TRANSITIONS IN PHYSICAL REVIEW RESEARCH 6, 013136 (2024)
deviations between the inferred network ($\hat{P}$) and the original
network ($P$) using the Kullback-Leibler (KL) divergence:
$$D_{\mathrm{KL}}(P \,\|\, \hat{P}) = -\sum_i \pi_i \sum_j P_{ij} \log \frac{\hat{P}_{ij}}{P_{ij}}, \tag{8}$$
where $\pi_i$ is the stationary distribution of the original network.
The lower the KL divergence, the closer the inferred network is to
the true network, and hence the network can be considered
more effective in communicating information accurately. Do
Bach’s musical compositions possess distinct features that
result in smaller discrepancies in their perceived structure?
How do pieces differ in these discrepancies? What are the
structural differences between the musical pieces that lead to
such differences?
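To make Eq. (8) concrete, the following sketch computes the stationary distribution of a toy transition matrix and the KL divergence between the true and inferred transition probabilities. It is a minimal illustration under stated assumptions: the function names and the three-note matrix are hypothetical, the logarithm is taken in base 2 (bits), the chain is assumed to be irreducible so that the stationary distribution is unique, and the sum runs only over transitions that exist in the true network.

```python
import numpy as np

def inferred_transitions(P, eta):
    """Eq. (7): P_hat = (1 - eta) * P @ inv(I - eta * P)."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - eta * P, (1.0 - eta) * P)

def stationary_distribution(P):
    """Left eigenvector of P with eigenvalue 1, normalized to sum to 1."""
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return pi / pi.sum()

def kl_divergence(P, P_hat, pi):
    """Eq. (8) in bits; terms with P_ij = 0 contribute nothing."""
    mask = P > 0
    ratio = np.ones_like(P)
    ratio[mask] = P_hat[mask] / P[mask]
    return -np.sum(pi[:, None] * P * np.log2(ratio))

# Hypothetical toy chain, not taken from the paper's data.
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.5, 0.5, 0.0]])
pi = stationary_distribution(P)
d_kl = kl_divergence(P, inferred_transitions(P, eta=0.80), pi)
print(f"stationary distribution: {np.round(pi, 3)},  D_KL = {d_kl:.3f} bits")
```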
To answer these questions, for each musical piece, we
compute the KL divergence between the true transition probabilities
$P$ and the inferred transition probabilities $\hat{P}$. Then, to
understand whether these music networks do indeed maintain
low discrepancies in their inferred structure, we compare them
against random networks with the same number of nodes and
edges. The data confirms our intuition [Fig. 6(a)]: Bach’s
music networks have a lower KL divergence than random
networks of the same size. Even if we compare against null
networks with the same in- and out-degree distributions, we
still see that the music networks have a lower KL divergence
[Fig. 6(b)]. This finding suggests that the lower KL diver-
gence of these networks cannot be explained by their degree
distributions alone. Additionally, we observe interesting varia-
tions in the KL divergence among the different compositional
forms (Fig. 6). The chorales, at one extreme, seem to have
the highest KL divergence, while the preludes and toccatas
have the lowest KL divergence. In what follows, we attempt
to identify and interpret the network properties that underlie
the observed variations in the discrepancies of the inferred
information across compositional forms and pieces.
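One common way to build null networks that preserve the in and out degrees of every node is to apply repeated double-edge swaps. The sketch below illustrates that general technique; it is not necessarily the procedure used for Fig. 6(b), whose details are given in the paper's Methods. The function names, the toy edge list, and the choice to reject swaps that would create self-loops or duplicate edges are assumptions made for illustration, and the sketch does not check that the rewired network stays strongly connected.

```python
import random
from collections import Counter

def degree_sequences(edges):
    """Return (out-degree, in-degree) counters for a directed edge list."""
    out_deg = Counter(src for src, _ in edges)
    in_deg = Counter(dst for _, dst in edges)
    return out_deg, in_deg

def degree_preserving_rewire(edges, n_swaps, seed=0):
    """Randomize a directed edge list with swaps (a->b, c->d) => (a->d, c->b)."""
    rng = random.Random(seed)
    edges = list(edges)
    present = set(edges)
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        # Reject swaps that would create a self-loop or duplicate an existing edge.
        if a == d or c == b or (a, d) in present or (c, b) in present:
            continue
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges

# Hypothetical toy network of note transitions.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 0)]
rewired = degree_preserving_rewire(edges, n_swaps=10)
print(sorted(rewired))
```

Because each swap exchanges only the targets of two edges, every node keeps its out degree and its in degree, while the specific pattern of connections (including any triangles) is randomized.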
A. Transitive clustering coefficient
As seen in the previous section, the discrepancies in the
inferred transition structure for the music networks could
not be explained by the distribution of degrees alone. For
undirected networks, prior research has demonstrated that the
KL divergence between the inferred and true transition struc-
tures decreases with an increase in the density of triangles
within the network [21]. This relationship can be demon-
strated by substituting the expression for the inferred version
of a network [Eq. (7)] into the equation for the KL diver-
gence [Eq. (8)]. We now extend this analysis to our directed
networks, with the aim of generalizing this finding. By per-
forming this substitution, we derive the subsequent expression
for the KL divergence in terms of the original network’s adja-
cency matrix (A):
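$$D_{\mathrm{KL}}(P \,\|\, \hat{P}) = -\log(1 - \eta) - \frac{\eta}{\ln 2} \sum_i \pi_i \sum_j A_{ij} \sum_l \frac{1}{k_i^{\mathrm{out}}} A_{il} \frac{1}{k_l^{\mathrm{out}}} A_{lj} + O(\eta^2). \tag{9}$$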
Here we see that the KL divergence depends on a product
of the form $A_{ij} A_{il} A_{lj}$, which quantifies the transitive
relationships present in the network. More explicitly, it depends
on the number of directed triangles of the form $i \to j \to k$
and $i \to k$.
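The small-η expansion in Eq. (9) can be compared against the exact KL divergence on a toy network. The sketch below is a numerical sanity check under stated assumptions, not a reproduction of the paper's analysis: the four-node adjacency matrix is hypothetical, logarithms are in base 2, and the unweighted transition probabilities are taken to be $P_{ij} = A_{ij}/k_i^{\mathrm{out}}$.

```python
import numpy as np

# Hypothetical strongly connected network containing the transitive
# triangles 0->1->2 with 0->2, and 1->2->3 with 1->3.
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 1],
              [0, 0, 0, 1],
              [1, 0, 0, 0]], dtype=float)
k_out = A.sum(axis=1)
P = A / k_out[:, None]

# Stationary distribution: left eigenvector of P with eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
pi = pi / pi.sum()

def exact_kl(eta):
    """Eq. (8) evaluated with the inferred network of Eq. (7)."""
    P_hat = np.linalg.solve(np.eye(len(P)) - eta * P, (1 - eta) * P)
    mask = A > 0
    return -np.sum((pi[:, None] * P)[mask] * np.log2(P_hat[mask] / P[mask]))

def first_order_kl(eta):
    """Eq. (9) without the O(eta^2) remainder."""
    # sum_j A_ij sum_l (A_il / k_i) (A_lj / k_l) equals sum_j A_ij (P @ P)_ij.
    triangle_term = np.sum(pi * np.sum(A * (P @ P), axis=1))
    return -np.log2(1 - eta) - eta / np.log(2) * triangle_term

for eta in (0.01, 0.05, 0.80):
    print(f"eta={eta:4.2f}  exact={exact_kl(eta):.4f}  first order={first_order_kl(eta):.4f}")
```

The two values should agree closely for small η and drift apart as η grows, which is a reminder that Eq. (9) indicates the direction of the effect of transitive triangles rather than the exact divergence at η = 0.80.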
To quantify the extent to which a network has clusters
of this form, we introduce a measure termed the transitive
clustering coefficient of the network, defined along similar
lines to the clustering coefficient of a network [71,72]. For
each node, this quantity is measured by dividing the number of
transitive triangles that node $i$ is a part of ($T_i$) by the number
of possible directed triangles:
$$C_i^{T} = \frac{T_i}{k_i^{\mathrm{tot}} \left(k_i^{\mathrm{tot}} - 1\right)}. \tag{10}$$
Here $k_i^{\mathrm{tot}}$ is the total degree (in + out) of the node. We average
this quantity over all nodes in the network to report a single
value for each piece. As indicated by Eq. (9), we expect the
KL divergence of the networks to primarily be driven by the
transitive clustering coefficient. This relationship is indeed
evident in Fig. 6(c), where we observe that musical networks
with a higher transitive clustering coefficient tend to exhibit
lower KL-divergence values. In this context, we also observe
that the preludes and toccatas (which demonstrated relatively
lower KL-divergence values) are characterized by a larger
density of transitive triangles compared to other pieces like
the chorales.
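A direct way to evaluate Eq. (10) on a small network is sketched below. Several choices here are assumptions rather than details confirmed by the text: a node is counted as "part of" a transitive triangle whichever of the three roles it plays (source, intermediate, or target), self-loops are ignored, and nodes with total degree below 2 are assigned a coefficient of zero. The triple loop is written for clarity and is only practical for small networks.

```python
import numpy as np

def transitive_clustering(A):
    """Per-node and network-averaged transitive clustering coefficient, Eq. (10)."""
    A = (np.asarray(A) > 0).astype(int)
    np.fill_diagonal(A, 0)            # assumption: self-loops are ignored
    n = len(A)
    T = np.zeros(n)
    for a in range(n):
        for b in range(n):
            for c in range(n):
                # Transitive triangle: a -> b -> c together with the shortcut a -> c.
                if A[a, b] and A[b, c] and A[a, c]:
                    T[[a, b, c]] += 1
    k_tot = A.sum(axis=0) + A.sum(axis=1)      # in degree + out degree
    denom = k_tot * (k_tot - 1)
    C = np.divide(T, denom, out=np.zeros(n), where=denom > 0)
    return C, C.mean()

# Hypothetical toy network with two transitive triangles.
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 1],
              [0, 0, 0, 1],
              [1, 0, 0, 0]])
C, C_mean = transitive_clustering(A)
print(np.round(C, 3), round(C_mean, 3))
```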
A natural question that arises at this point is, What is
the significance of these transitive relationships within the
networks, and why do they contribute to reduced disparities
between the inferred and true structure? From a cognitive
science perspective, this relationship between the KL diver-
gence and clustering arises from the tendency of humans to
count transitions of length two, as discussed previously. In
a scenario where a given node $i$ is connected to node $j$ and
node $j$ links to node $k$, a human learner may erroneously
draw an edge between node $i$ and node $k$ in their mind.
However, if the network originally had a direct link from node
$i$ to node $k$, such an error would reinforce an existing edge,
thereby aligning the inferred network more closely with the
true network. Hence, we expect networks with high clustering
to be more robust to errors made during inference. From a
music perspective, interpreting these triangles is not straight-
forward since the networks are unweighted. Nevertheless, the
presence of a large density of such triangles suggests that if
there is a transition between notes $i$ and $j$ and notes $i$ and $k$,
there is likely also a transition between notes $j$ and $k$. This
could potentially reflect the tendency of music to form tonally
stable sequences of note transitions. Substantiating these
claims would require further efforts, which we elaborate on in
Sec. VII.
Analyzing the transitive clustering further, we find that
the musical networks have a higher transitive clustering co-
efficient than degree-preserving random networks [Fig. 6(d)],
suggesting that this feature is not due to mere coincidence.
From Fig. 6(d), we make an interesting observation: the preludes
appear to have a lower transitive clustering coefficient
than the corresponding null networks that preserve their size
and degree distribution, while the chorale pieces generally
KULKARNI, DAVID, LYNN, AND BASSETT PHYSICAL REVIEW RESEARCH 6, 013136 (2024)
FIG. 6. Quantifying the difference between the actual information and the perceived information in Bach’s music networks by calculating
the KL divergence between the actual and perceived network. (a) KL divergence of the real music networks ($D_{\mathrm{KL}}^{\mathrm{real}}$) compared with random
networks of the same size ($D_{\mathrm{KL}}^{\mathrm{rand}}$). We report the KL divergence of the corresponding random networks after averaging over 100 independent
realizations. The error bars for $D_{\mathrm{KL}}^{\mathrm{rand}}$ indicate the standard error of the sample. (b) KL divergence of the real music networks ($D_{\mathrm{KL}}^{\mathrm{real}}$) compared
with random networks that preserve the in and out degrees of each node ($D_{\mathrm{KL}}^{\mathrm{deg}}$). We report the KL divergence of the corresponding
degree-preserving random networks after averaging over 100 independent realizations. The error bars for $D_{\mathrm{KL}}^{\mathrm{deg}}$ indicate the standard error of the sample.
(c) KL divergence of the real music networks as a function of the transitive clustering coefficient of the network, $C = T_i / [k_i^{\mathrm{tot}} (k_i^{\mathrm{tot}} - 1)]$. (d) The
transitive clustering coefficient of the real music networks compared with random networks that preserve the in and out degrees of each node.
For the degree-preserving random networks, we report the transitive clustering coefficient after
averaging over 100 independent realizations, with error bars denoting the standard error of the sample. In all panels, each data point represents
a single piece. Color and marker indicate the type of piece, as shown in the legend. The dotted lines in panels (a), (b), and (d) represent the
line y=x.
have a higher transitive clustering coefficient than expected
from null networks. We probe this further in the Supplemental
Material [57] and identify mesoscale structures that could
lead to the observed differences between the compositional
forms.
VI. ACCOUNTING FOR NOTE
TRANSITION FREQUENCIES
So far, we have focused our attention on the informa-
tion content and perception of unweighted (or binary) note
FIG. 7. Accounting for the frequencies of the note transitions in our analysis. (a) Entropy of the weighted versions of Bach’s music
networks ($S^{\mathrm{weighted}}$) compared with the corresponding unweighted versions ($S^{\mathrm{unweighted}}$). (b) The KL divergence of the weighted versions of
Bach’s music networks ($D_{\mathrm{KL}}^{\mathrm{real,w}}$) compared with the corresponding unweighted versions ($D_{\mathrm{KL}}^{\mathrm{real}}$). (c) Top: Entropy of the weighted note transition
networks ($S^{\mathrm{real,w}}$) compared with degree-preserving edge-rewired null networks ($S^{\mathrm{deg,w}}$). Bottom: The KL divergence of the weighted note
transition networks ($D_{\mathrm{KL}}^{\mathrm{real,w}}$) compared with degree-preserving edge-rewired null networks ($D_{\mathrm{KL}}^{\mathrm{deg,w}}$). In all panels, each data point represents a
single piece. Color and marker indicate the type of piece, as shown in the legend. The dashed line represents the line y=x. In the top figure of
panel (c), we report the average deviation of the data points from the line y=x.
transition networks created from Bach’s music. These net-
works only captured whether or not a transition exists between
two notes and were not sensitive to how frequently each transi-
tion occurs. The binary networks enabled us to probe how the
structure of the transitions supports effective communication.
However, in many real networks, not all transitions occur with
the same frequency. To reflect the different frequencies with
which transitions may occur, we construct networks in which
transitions are weighted according to their frequency of occurrence. For example, if note
$i$ follows note $j$ 90% of the time and note $k$ follows note $j$
10% of the time, the edge from node $j$ to node $i$ will be more
heavily weighted than the edge from node $j$ to node $k$ (see
Methods Sec. A1 for further details on network construction).
Adding this piece of information to the networks leads us to
new questions about the role that transition weights play in
communicating information to listeners. For example, how is
the information generated by a random walk on the network
altered by differences in the frequencies of transitions? Do
these differences in frequencies reduce the discrepancies in
the inferred network?
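The difference between the binary and weighted networks can be illustrated on a single melodic line. The sketch below is a simplified, hypothetical example: the note sequence is invented, the function name is an assumption, and the paper's actual construction (Methods Sec. A1) handles additional details, such as multiple musical lines, that are not modeled here.

```python
from collections import Counter

def transition_probabilities(notes, weighted=True):
    """Map each observed transition (src, dst) to its probability of following src."""
    counts = Counter(zip(notes[:-1], notes[1:]))
    out_total = Counter()
    for (src, _), c in counts.items():
        out_total[src] += c if weighted else 1
    return {(src, dst): (c if weighted else 1) / out_total[src]
            for (src, dst), c in counts.items()}

# Hypothetical melody: after "C" the line moves to "D" nine times and to "E" once.
notes = ["C", "D"] * 9 + ["C", "E", "C"]

P_weighted = transition_probabilities(notes, weighted=True)
P_binary = transition_probabilities(notes, weighted=False)
print(P_weighted[("C", "D")], P_weighted[("C", "E")])
print(P_binary[("C", "D")], P_binary[("C", "E")])
```

In the weighted network the two edges leaving "C" carry probabilities 0.9 and 0.1, whereas the binary network treats them as equally likely.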
A. Weights reduce the surprisal of transitions
For unweighted networks, the node-level entropy of a
random walk is determined solely by the out degree ($k_i^{\mathrm{out}}$),
since each outgoing edge is traversed with probability $P_{ij} = 1/k_i^{\mathrm{out}}$.
If the edges are weighted by their transition frequencies,
the $P_{ij}$’s will no longer be uniformly distributed, and
each outgoing edge will not have an equal probability of
being traversed. Hence, incorporating the edge weights re-
duces the node-level entropy. This observation is intuitive
since nonuniformities in any distribution lead to decreases
in entropy. However, extending this intuition to the entropy
produced by the entire network is not as straightforward,
since one must weigh the contribution of each node by
the stationary distribution of the random walkers, which
cannot be expressed in closed form for directed networks.
Generally, we find that the entropy of weighted networks
is still lower than the corresponding unweighted networks
[Fig. 7(a)]. This finding suggests that the different weights
do indeed reduce the overall surprisal generated by the
networks.
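The effect of weights on entropy can be illustrated with a small numerical sketch. It assumes that the network-level entropy is the stationary-weighted average of the node-level entropies, $S = -\sum_i \pi_i \sum_j P_{ij} \log_2 P_{ij}$, which matches the description above; the count matrix `W` and the function names are hypothetical, and a single toy example does not establish the general trend reported in Fig. 7(a).

```python
import numpy as np

def stationary_distribution(P):
    """Left eigenvector of P with eigenvalue 1, normalized to sum to 1."""
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return pi / pi.sum()

def entropy(P):
    """Return (node-level entropies, stationary-weighted network entropy) in bits."""
    logP = np.zeros_like(P)
    logP[P > 0] = np.log2(P[P > 0])
    node_entropy = -np.sum(P * logP, axis=1)
    return node_entropy, float(stationary_distribution(P) @ node_entropy)

# Hypothetical transition counts between three notes.
W = np.array([[0, 9, 1],
              [1, 0, 9],
              [9, 1, 0]], dtype=float)
P_weighted = W / W.sum(axis=1, keepdims=True)
A = (W > 0).astype(float)
P_unweighted = A / A.sum(axis=1, keepdims=True)

_, S_weighted = entropy(P_weighted)
node_unweighted, S_unweighted = entropy(P_unweighted)
print(f"S_unweighted = {S_unweighted:.3f} bits, S_weighted = {S_weighted:.3f} bits")
```

In the unweighted case each node has two equally likely outgoing edges, so its node-level entropy is $\log_2 2 = 1$ bit; the uneven 0.9/0.1 split in the weighted case lowers it.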
B. Weights reduce discrepancies between the inferred network
and the original network
Incorporating the transition frequencies also helps us to un-
derstand the role that the weights play in the human inference
of note transitions. We observe that the weighted networks
of note transitions have lower KL divergence than the bi-
nary networks [Fig. 7(b)]. This observation suggests that the
weights aid in forming more accurate internal representations
of the transition structures, thereby reducing the discrepancies
between the inferred and true structure.
In light of these data, we next verify the role that the
network structure plays in the communicative success of
weighted networks by comparing the entropy and KL di-
vergence of the weighted music networks with edge-rewired
null networks. In the analysis on unweighted networks, we
observed that the entropy was primarily driven by the degree
distribution of the network and not sensitive to the precise
connectivity pattern. To make this observation, we compared
the entropy of the real music networks to randomized net-
works that preserved the exact degree distribution of each
node and, hence, held the node-level entropies fixed. Along
similar lines, here we make use of null models that keep
the node-level entropies fixed by preserving the in and out
degrees of each node and the outweights at each node (see
the Methods section for details on the null models). By
comparing the entropy of the weighted music networks to
the degree-preserving weighted null models, we see that the
entropies of real networks are still more or less unchanged,
although the real networks have marginally higher entropies
than the null networks [Fig. 7(c), top]. These results support
our conclusion that the entropy in the real networks is still
primarily driven by their degree distribution. When we com-
pare the KL divergence of the real weighted networks with the
degree-preserving weighted null models, we find that the real
networks have a lower KL divergence than the corresponding
null networks [Fig. 7(c), bottom]. Together, these results sug-
gest that incorporating the weights into our network analysis
does not alter our results on the effects of network structure
qualitatively.
Accounting for the note transition frequencies in our net-
work model leads to several interesting lines of inquiry. For
instance, is it the specific distribution of weights that improves
the accuracy of the inferred music networks? Future work
could evaluate this possibility by comparing the KL diver-
gence of the weighted networks with a class of null models
that preserve the skeleton of the network but permute the
edge weights. It would also be interesting to test whether
higher edge weights are concentrated in triangular clusters
of the network, offering a potential explanation for the lower
KL divergence of the weighted networks compared to the
binary networks. Since the weights correspond to the number
of times a transition occurs, we anticipate that analyzing the
distribution of these edge weights would give us information
about the tonal characteristics of a musical piece. Atonal
music, in particular, can be expected to exhibit a lower hetero-
geneity in the edge weights. Such a quantitative approach to
assessing the tonal characteristics of a piece ties into existing
and ongoing research on tonality in musical compositions as
well as the perception of tonality [73–76]. We elaborate on
how our framework serves as a platform for further research
in this area in Sec. VII.
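The weight-permuting null model proposed above as future work could be sketched as follows. This is only one possible reading of that proposal: the function name and the toy count matrix are hypothetical, the analysis itself was not performed in the paper, and shuffling weights across all edges changes the total outgoing weight of individual nodes, which a more careful null model might want to control.

```python
import numpy as np

def permute_edge_weights(W, seed=0):
    """Shuffle the nonzero weights among the existing edges, keeping the skeleton fixed."""
    rng = np.random.default_rng(seed)
    W = np.asarray(W, dtype=float)
    rows, cols = np.nonzero(W)
    W_null = np.zeros_like(W)
    W_null[rows, cols] = rng.permutation(W[rows, cols])
    return W_null

# Hypothetical transition counts between three notes.
W = np.array([[0, 9, 1],
              [1, 0, 9],
              [9, 1, 0]], dtype=float)
W_null = permute_edge_weights(W)

# Row-normalize to obtain the transition probabilities of the null network.
P_null = W_null / W_null.sum(axis=1, keepdims=True)
print(W_null)
```

Comparing the KL divergence of the real weighted network with an ensemble of such shuffled networks would indicate whether the placement of the weights, rather than the skeleton alone, contributes to accurate inference.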
VII. CONCLUSIONS AND FUTURE DIRECTIONS
Across language, literature, music and even abstract
concepts, humans demonstrate the remarkable ability to
identify patterns and relationships from sequences of
items—an essential aspect of information sharing and
communication [35,51,77–79]. Here, we draw upon ideas
from network science, information theory, and statistical
physics to build a framework that serves as a stepping stone
for studying the information conveyed by a musical piece. We
use this framework to analyze networks of note transitions in a
wide range of music composed by J. S. Bach. For each musical
piece, we construct a network of note transitions by drawing
directed edges between notes that are played consecutively.
We then quantify the amount of information generated by the
network structure and find that different compositional forms
can be grouped together based on their information entropy.
We relate the information content of each piece to its network
structure, enabling us to gain insight into the structural
properties of various pieces. Next, inspired by recent progress
in the field of statistical learning which demonstrates how
humans infer transition structures across visual and auditory
domains [35,66,67,78], we use a computational model [21,35]
for how humans learn networks of information to compute
FIG. 8. Network structures that support effective communication
of information. Networks with a larger variance or heterogeneity in
their node degrees, as shown in panel (i), pack more information into
their structure and have a higher entropy. Clustering in the network,
as shown in panel (ii), makes the structure more resilient to errors
made by humans when building an internal representation of the
information, allowing the network to be inferred more accurately.
Together, these structures convey a large amount of information that
can be learned by humans more accurately, and are hence more
efficient for communication.
the average “inferred” network structure for each piece. We
then quantify the discrepancies between the inferred and true
transition structures under this model. Here too, we observe
interesting differences among the pieces, which we attribute
to differences in the clustering of the networks. Finally,
we study how the frequencies of transitions influence the
information content and perception of the musical pieces
by weighing the transitions by the number of times they
occur. We find that the weights reduce the overall entropy
or surprisal of the transitions, and also reduce the deviations
between the inferred and actual network, suggesting that the
weights aid in accurate inference of these transition structures.
Furthermore, we find that the music networks contain more
information and maintain lower discrepancies in the inferred
structure than expected from typical transition structures of
the same size. This provides us insight into features that
make networks of information effective at communicating
information. In general, networks which are denser (have
a higher average degree) produce more information (have
a higher information entropy). For networks of comparable
average degree, more heterogeneous (higher variance in de-
gree distribution) structures produce more information than
those that are more regular or homogeneous in their degree
[Fig. 8(i)]. Moreover, networks which have a high degree of
clustering maintain a lower divergence from human expec-
tations [Fig. 8(ii)]. Together, these findings suggest that for
networks of a given size, rapid and accurate communication of
information is supported by structures that are simultaneously
heterogeneous and clustered (Fig. 8). Notably, such structures
are widely prevalent across complex systems [58–60,71,80].
We hope that our framework inspires further exchange
between physics, cognitive science, and musicology. On a
broader scale, our study also adds to investigations on how
information in complex systems is structured. To conclude,
we highlight a number of exciting directions for future inquiry
and outline ways in which our framework can be expanded
upon and improved.
A. Future directions
A natural follow-up to this analysis would be to examine
works of other composers—particularly works outside the
Western tradition. This also prompts questions aimed at
assessing how various styles or genres of music differ
[81–83]. In particular, what are the key features by which a
listener distinguishes between music from two eras, say the
Classical and the Romantic eras? How do the differences in
structure then impact how the piece is perceived by a listener?
Consequently, a quantitative assessment of musical
compositions like ours raises the intriguing possibility of
identifying works of a composer or genres that may not be a
priori obvious to musicologists.
Systematically analyzing the information that we extract
from complex systems also provides us with new tools to
understand human creativity and experiences. A question that
often arises in the context of how humans experience music is,
What makes a musical composition appealing to the human
ear? While individual preferences in music can vary widely
and are highly subjective, there is still a general agreement
on certain composers being considered influential or great.
This fact raises the possibility that there may be some inherent
qualities that are common to musical pieces which are widely
considered appealing. Identifying such features might give
us insight into the creative process of composing music and
also complement existing work using AI to generate music
[84,85]. Several attempts have been made to identify such
patterns. For example, Ref. [19] analyzed note transition net-
works in certain compositions by Bach, Chopin, and Mozart
as well as Chinese pop music, and suggested that “good”
music is characterized by the small-world property [71] and
heavy-tailed degree distributions. On the other hand, Ref. [30]
studied selected compositions from Bach’s Well-Tempered
Clavier and found non-heavy-tailed degree distributions, sug-
gesting that such distributions are not necessary for music to
be appealing. It would be interesting to devise future experi-
ments to determine whether our findings relate to the aesthetic
or emotional appeal of a piece. In our study, we found that
Bach’s music networks had a higher number of transitive tri-
angular clusters, enabling them to be learned more efficiently
than arbitrary transition structures. Are pieces with a larger
number of these triangles also more appealing to a listener?
Future work could assess this possibility by conducting ex-
periments that ask people to rate Bach’s compositions and
analyzing whether these ratings correlate with the presence
of triangular clusters.
Additionally, one could explore how the tonal patterns
and characteristics of a musical composition influence its
general aesthetic appeal. As suggested in Sec. VI, examining
the distribution of edge weights might give us information
about the tonal characteristics of a musical piece. Conducting
experiments to explore the relationship between the diversity
of edge weights in a music network and the overall aesthetic
appeal of note sequences generated by the network would
be an interesting avenue for future work. Such a quantitative
approach would complement the current and ongoing studies
in this area [73–75]. More generally, our work focuses not
solely on the information inherent in the transition structure
of music but also on how the information in this transition
structure is perceived by a human listener. This framework
might be useful in studying cognitive aspects of music and
in bridging patterns observed in data with cognitive theories
of music.
In future work, it also would be interesting to extend our
analysis to examine how music networks evolve with time.
There are three potentially interesting lines of inquiry here:
First, how do the entropy and KL divergence of a musical
piece change as the piece progresses? Does this temporal
change differ among the various compositional forms? Sec-
ond, how has the music of a specific composer (whether
Bach or otherwise) changed over the course of their lifetime?
Has it become more intricate and complex, holding more
information? Perhaps as the composer gains experience, their
compositions convey information more efficiently and accu-
rately, as reflected in a reduced KL divergence? If the exact
dates of when each piece was composed were known, then
the framework used in our paper might provide answers to
these questions. Third, how has music of a given genre, say
classical music, changed over the years across composers?
Reference [32], for example, studied the fluctuation in pitch
between adjacent notes in compositions by Bach, Mozart,
Beethoven, Mendelssohn, and Chopin, and found that the
largest pitch fluctuations of a composer gradually increased
over time from Bach to Chopin. As mentioned earlier, it would
be interesting to expand our analysis to different composers,
and see how the information and expectations vary across
composers and time. Tracing how other quantitative network
characteristics (for example, the presence of tonal patterns,
as indicated in Sec. VI) have evolved over time across dif-
ferent music traditions would also be another avenue for
further study.
Finally, the quantitative measures described in our work
could serve as feedback quantities to human composers to aid
their composition process. For example, in music composition
software, one could add a feature that displays the music
network composed thus far and its entropy measures, etc.
Composers can then use these measures as feedback in their
music writing process. Using our framework, the software
could even suggest edits that could alter the surprisal—
potentially suggesting ways to contradict musical expectation
(resulting in a higher entropy) or harmoniously resolve notes
to meet those expectations. Moreover, capturing data from
users as they compose and employing our quantitative frame-
work would add to research mapping out the creative process
of musical composition [86,87]. This approach goes beyond
analyzing only the final product, therefore providing deeper
insight into the workings of human creativity.
Before we conclude, we note limitations of our
analysis that point to directions for further effort. First,
our paper relies on a simplistic representation of music that
could be expanded to incorporate more musical realism and
complexity. For instance, one could account for differences
in timbre, the intervals between notes, or even fused notes
or chords, which are known to play a key role in mu-
sic perception [88,89]. Second, while we have focused on
the information present in first-order sequential relationships
among the notes, future work could capture higher-order cor-
relations, hierarchies, and more intricate structures inherent
in music [52–55,90,91]. Recent advances in studying higher-order
dependencies and structures present in networks offer
a promising approach to capturing this complexity [92–94].
An essential feature to capture in the analysis of polyphonic
music is counterpoint, or in other words, the presence of rela-
tionships between distinct lines of music. Our current analysis
computes the note transitions separately for different lines and
combines them to form a single network, and thereby does not
capture counterpoint. It is important to capture this (among
other techniques used in music) in future work. One potential
method to incorporate this complexity is by extending our
current method using multilayer networks [95–98]. In this
framework, the note transition networks derived from distinct
musical lines would constitute distinct “layers” within the
multilayer network. Then, the relationships across lines (or
layers) could be captured through interlayer edges, potentially
quantified by measuring correlations across musical lines. The
process of information generation could then be modeled us-
ing a coupled process, each running on a different layer but
coupled via the interlayer edges [99]. A similar multilayer
network framework could be used to simultaneously capture
other aspects of music—such as rhythm or timbre—in addi-
tion to note transitions.
Incorporating such subtleties would not only improve our
understanding of how the networks are structured, but also
how they are perceived. Expanding on this understanding,
it would be beneficial to conduct targeted experiments that
specifically address and build models of the perception of
distinct (but interdependent) musical attributes. Further, ex-
ploring the variability of music perception among individuals,
considering factors such as musical training or cultural in-
fluences would also be interesting. We highlight existing
research that investigates deviations from expectation in mu-
sic based on such nuances. Notably, studies referenced in
Refs. [41–44] suggest that cultural knowledge shapes musical
expectancies and perceptions of musical complexity,
particularly when individuals encounter novel musical forms.
Similarly, research in Ref. [100] illustrates that even rhythm
perception can significantly depend on culture. Investigations
in Refs. [45–47] propose that language background,
particularly background in a tonal language, impacts the
perception of musical sequences. Moreover, the level and
type of musical training are also known to influence music
perception [37–40]. While our present framework and
computational model do not account for these nuances, it
provides a benchmark for comparative analysis across cul-
tures and musical training levels that could separate out
elements that are universally easier to anticipate, leaving room
for subsequent research to delve deeper into how the afore-
mentioned specific factors modify these general predictive
mechanisms. Naturally, expanding computational models to
encompass these nuanced factors is a direction for further
inquiry.
The aforementioned and ensuing directions would expand
our capacity to address more specific questions regarding
the composer idiosyncrasies, era characteristics, and genre
discussed earlier. As such, our work offers a flexible frame-
work that can be utilized by a wide range of scholars both
in and outside of physics. Beyond music, our study can also
be extended to a range of complex systems present around
us—such as language and social networks. For example, one
could analyze works of literature and ask, Does the entropy
of noun transitions in various works of Shakespeare differ
based on their genre? More specifically, does the information
content and learnability of noun transitions or relationships
between characters differ between tragedies and comedies?
By providing an example of a systematic and comprehensive
analysis of the actual and perceived information in music, our
study complements and adds to the rich study of language,
music, and art as complex systems [30,101,102]. Finally,
a quantitative treatment of the patterns and motifs present
in music complements research exploring analogies between
music and other fields of science—including understanding
protein structures and designing materials [103–105].
The data and code used to perform the analyses in this
paper are openly available at Ref. [106].
ACKNOWLEDGMENTS
We thank C. Macklin for an early conversation on this topic
and audience members who asked probing questions about
our earlier work in communication networks. These interac-
tions motivated our continued investigation in this space. We
would also like to thank S. Varkey, S. Patankar, and A. Winn
for useful conversations and feedback on earlier versions of
this paper. This particular research was primarily supported by
the Army Research Office Award No. DCIST-W911NF-17-2-
0181 and the National Institutes of Mental Health Award No.
1-R21-MH-124121-01. D.S.B. would also like to acknowl-
edge additional support from the John D. and Catherine T.
MacArthur Foundation, the Alfred P. Sloan Foundation, the
Institute for Scientific Interchange Foundation, and the Army
Research Office (Grafton-W911NF-16-1-0474). The content
is solely the responsibility of the authors and does not nec-
essarily represent the official views of any of the funding
agencies.
APPENDIX A: FURTHER DETAILS ON DATA AND METHODS
1. Data collection and network construction
The music files were collected in the MIDI format from
various sources. The sources for the compositions analyzed
are as follows: preludes [107,108], fugues [107,108], inven-
tions [107,108], cantatas [109], English suites [110], French
suites [110], four-part chorales [108], Brandenburg concertos
[108], toccatas [110], and concertos [110]. The preludes and
fugues are split based on whether they belong to the first or
second part of The Well-Tempered Clavier, and are labeled
1 or 2. Certain compositions consist of different movements
and our data set has separate MIDI files for each movement.
We analyze each movement separately and average our mea-
surements over them to yield a single measured quantity for
each piece, as indexed by a unique Bach–Werke–Verzeichnis
(BWV) number. In the case of the chorales, we analyzed the
186 four-part chorales in Bach-Gesellschaft Ausgabe (BGA)
Vol. 39 with BWV No. 253-438.
The MIDI files were read in MATLAB using the readmidi
function [111] to obtain information about the
notes being played. Different instruments in a piece are stored
in separate channels within each data file. The transitions
between notes are calculated separately for each instrument
or track. We assign each note present in a piece a node in
the network, and notes from different octaves are assigned
distinct nodes. We then draw an edge from note $i$ to note $j$ if
there is a transition between them. If there are multiple notes
being played at a single time $t$ (as is the case with chords),
edges are drawn from the previously played note to all notes
at time $t$, and from all the notes being played at time $t$ to
the subsequent note(s). This procedure gives us a directed
binary network of note transitions. Finally, we restrict our
analysis to the largest connected component of each network.
The code and data used to construct the networks are available
at Ref. [112]. We also construct weighted versions of these
networks, where each edge is weighted by the number of times
the corresponding transition occurs.
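The construction described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' MATLAB code from Ref. [112]: the function names, the `(onset_time, pitch)` event format, the pooling of per-track transitions into one network, and the use of weak connectivity for the "largest connected component" are all assumptions made here.

```python
from collections import Counter, defaultdict
from itertools import groupby, product


def transition_counts(events):
    """Count note-to-note transitions within one track.

    `events` is an iterable of (onset_time, pitch) pairs (hypothetical format).
    Pitches are kept as given, so the same note in two octaves stays distinct.
    Notes sharing an onset time are treated as a chord.
    """
    ordered = sorted(set(events))  # sort by onset, then pitch; drop duplicates
    groups = [[pitch for _, pitch in grp]
              for _, grp in groupby(ordered, key=lambda e: e[0])]
    counts = Counter()
    for prev, curr in zip(groups, groups[1:]):
        # every note at the previous onset links to every note at the next onset
        for i, j in product(prev, curr):
            counts[(i, j)] += 1
    return counts


def largest_weak_component(edges):
    """Node set of the largest weakly connected component (an assumption:
    the text does not say whether weak or strong connectivity is meant)."""
    neighbours = defaultdict(set)
    for i, j in edges:
        neighbours[i].add(j)
        neighbours[j].add(i)
    seen, best = set(), set()
    for start in neighbours:
        if start in seen:
            continue
        component, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for nxt in neighbours[node] - component:
                component.add(nxt)
                stack.append(nxt)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def build_network(tracks):
    """Return (binary edge set, weighted edge dict) pooled over all tracks."""
    weighted = Counter()
    for events in tracks:  # transitions are computed separately per track
        weighted.update(transition_counts(events))
    keep = largest_weak_component(weighted)
    weighted = {e: w for e, w in weighted.items()
                if e[0] in keep and e[1] in keep}
    return set(weighted), weighted
```

For example, `build_network([[(0, 60), (1, 64), (1, 67), (2, 72)]])` yields edges from MIDI pitch 60 to both notes of the chord (64, 67), and from each chord note to 72.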
2. Entropy of random walks on networks
We use random walks to model how a sequence of informa-
tion is generated from an underlying network of information.
Under this model, a walker traverses the network by picking
an outgoing edge to traverse at each node. Given a network
with adjacency matrix $A$ and matrix element $A_{ij}$, the
probability that a walker transitions from node $i$ to node $j$
in a standard Markov random walk is $P_{ij} = A_{ij}/k_i^{\mathrm{out}}$, where
$k_i^{\mathrm{out}} = \sum_j A_{ij}$ is the out degree of a node. We are interested
in quantifying how much information is contained in the
resulting sequence, which is captured by the entropy of the
random walk:
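$$ S = -\sum_i \pi_i \sum_j P_{ij} \log P_{ij}, $$
where $\pi$ is the stationary distribution of the walkers, which
satisfies the condition $\pi P = \pi$ (with $\pi$ a row vector). For the
simplest possible case of an undirected and unweighted network,
$P_{ij} = 1/k_i$ for each neighbor $j$ of node $i$
and $\pi_i = k_i/2E$, where $k_i$ is the degree of the $i$th node and
$E = \sum_{i,j} A_{ij}/2$ is the total number of edges. The entropy in
this case simplifies to
$$ S = \frac{1}{2E} \sum_i k_i \log k_i = \frac{\langle k \log k \rangle}{\langle k \rangle}. \tag{A1} $$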
We can apply a Taylor expansion to this expression around the
average degree of the network, and thereby obtain
S=logk+Var(k)
2k2+.... (A2)
Hence we find that the entropy of random walks increases
logarithmically with the average degree of the network. Addi-
tionally, it grows as the variance of the degrees increases. This
formalization enables us to relate the information content of
various music networks to their network structure. The code
used to measure the entropy of random walks on the networks
analyzed is available at Ref. [112].
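As a rough illustration of the entropy calculation, the sketch below evaluates the random-walk entropy for a small adjacency matrix and compares it with the degree-based form of Eq. (A1). It is not the code of Ref. [112]. The function names, the plain nested-list matrix format, the natural logarithm (swap in `math.log2` for bits), and the lazy-walk iteration used to find the stationary distribution are choices made here. The stationary distribution is computed in the left-eigenvector form $\pi = \pi P$, which is the form that matches the row-normalized $P_{ij}$ defined above.

```python
import math


def transition_matrix(A):
    """Row-normalise an adjacency matrix: P_ij = A_ij / k_i^out."""
    P = []
    for row in A:
        k_out = sum(row)
        if k_out == 0:
            raise ValueError("each node needs at least one outgoing edge")
        P.append([a / k_out for a in row])
    return P


def stationary_distribution(P, tol=1e-12, max_iter=1_000_000):
    """Solve pi = pi P by iterating the lazy walk (I + P) / 2.

    The lazy walk shares the stationary distribution of P but is aperiodic,
    so the iteration also converges on bipartite (periodic) networks.
    Assumes the network is strongly connected.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        step = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        new = [0.5 * (a + b) for a, b in zip(pi, step)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    raise RuntimeError("stationary distribution did not converge")


def random_walk_entropy(A):
    """S = -sum_i pi_i sum_j P_ij log P_ij (natural log)."""
    P = transition_matrix(A)
    pi = stationary_distribution(P)
    return -sum(pi[i] * p * math.log(p)
                for i, row in enumerate(P) for p in row if p > 0)


def degree_entropy(A):
    """Eq. (A1): <k log k> / <k>, valid for undirected unweighted networks."""
    degrees = [sum(row) for row in A]
    return sum(k * math.log(k) for k in degrees) / sum(degrees)


# A four-node star: one hub of degree 3 and three leaves of degree 1.
star = [[0, 1, 1, 1],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0]]
print(random_walk_entropy(star), degree_entropy(star))
```

For the star, both routes give $\tfrac{1}{2}\log 3$: the walker sits on the hub half of the time, where it faces three equally likely choices, and on a leaf otherwise, where the next step is certain.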
3. Model for how humans learn networks
As discussed in the main text, humans do not infer the tran-
sition probabilities of sequences of information with perfect
accuracy due to imperfections in their cognitive processes.
Studies have consistently found that in forming internal repre-
sentations of transition structures, humans integrate transition
probabilities over time [21,36,62,64]. This process results in
humans connecting items in the sequence that are not directly
adjacent to each other. Mathematically, we can express the
inferred transition structure $\hat{P}$ in terms of the true transition
structure $P$ under this model of fuzzy temporal integration as
$$ \hat{P} = \sum_{t=0}^{\infty} f(t)\, P^{t+1}, \tag{A3} $$
where $f(t)$ is the weight given to the higher powers of
$P$ and is a decreasing function of $t$ such that longer-distance
associations contribute less to a person’s network
representation. The functional form of f(t) is obtained
using the free-energy model described in Ref. [35]. This
model suggests that when forming internal representations
of information, each human arbitrates a trade-off between
accuracy and cost. The optimal distribution for f(t)
under this model is then a Boltzmann distribution with
a parameter $\beta$ that quantifies the trade-off between cost
and accuracy in forming an internal representation of the
information,
$$ f(t) = e^{-\beta t}/Z, \tag{A4} $$
where $Z = \sum_{t=0}^{\infty} e^{-\beta t} = (1 - e^{-\beta})^{-1}$ is a normalization
constant. Substituting this expression to simplify Eq. (A3), we
obtain an equation that relates the inferred transition
probabilities $\hat{P}$ to the true transition probabilities $P$,
$$ \hat{P} = (1 - e^{-\beta}) \sum_{t=0}^{\infty} e^{-\beta t} P^{t+1} = (1 - \eta)\, P\, (I - \eta P)^{-1}, \tag{A5} $$
where $\eta = e^{-\beta}$. Prior work has estimated the value of $\eta$ to be
0.8 from large-scale online experiments in humans [21]. Using
this measured value of $\eta$, we use Eq. (A5) to calculate the
inferred network for any given music network (code available
at Ref. [112]).
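A minimal sketch of Eq. (A5) follows. It evaluates the closed form $(1-\eta)P(I-\eta P)^{-1}$ and cross-checks it against a truncated version of the series. It is written for illustration only and is not the code of Ref. [112]. The helper names and the hand-rolled Gauss-Jordan inverse are assumptions made here to keep the example dependency-free; a linear-algebra library would normally be used instead.

```python
def matmul(X, Y):
    """Product of two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def inverse(M):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(M)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-14:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def inferred_transitions(P, eta=0.8):
    """Closed form of Eq. (A5): (1 - eta) P (I - eta P)^(-1)."""
    n = len(P)
    i_minus = [[float(i == j) - eta * P[i][j] for j in range(n)]
               for i in range(n)]
    closed = matmul(P, inverse(i_minus))
    return [[(1 - eta) * v for v in row] for row in closed]


def inferred_transitions_series(P, eta=0.8, terms=400):
    """Truncated sum (1 - eta) * sum_t eta^t P^(t+1), used as a cross-check."""
    n = len(P)
    total = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in P]   # holds P^(t+1), starting at t = 0
    weight = 1 - eta                # holds (1 - eta) * eta^t
    for _ in range(terms):
        total = [[a + weight * b for a, b in zip(r1, r2)]
                 for r1, r2 in zip(total, power)]
        power = matmul(power, P)
        weight *= eta
    return total


# Three-node example; rows of P sum to one.
P = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [1.0, 0.0, 0.0]]
P_hat = inferred_transitions(P, eta=0.8)
```

In this example node 0 never moves directly to node 2, yet the corresponding entry of `P_hat` is positive: the temporal integration links items that are not adjacent in the sequence, as described above. Each row of `P_hat` still sums to one.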
4. KL divergence
To quantify how much the distorted learned transition
structure $\hat{P}$ differs from the original transition structure $P$,
we calculate the KL divergence between the two transition
structures. The KL divergence is a measure of how different a
probability distribution is from a reference distribution and is
given by
$$ D_{\mathrm{KL}}(P \,\|\, \hat{P}) = -\sum_i \pi_i \sum_j P_{ij} \log \frac{\hat{P}_{ij}}{P_{ij}}, \tag{A6} $$
where $\pi$ is the stationary probability distribution of the
transition matrix $P$, obtained by solving $\pi P = \pi$. The KL
divergence between two distributions is always non-negative and
attains the value zero if and only if $P = \hat{P}$. The larger the KL
divergence, the more the inferred network $\hat{P}$ differs from the
original network. Hence, this quantity acts as a measure of the
extent to which a network gets scrambled by the inaccuracies
of human learning—or in other words, how accurately the
network structure is inferred.
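Eq. (A6) translates almost directly into code. The sketch below is illustrative and is not taken from Ref. [112]: the function name and argument layout are hypothetical, the natural logarithm is assumed, and the stationary distribution `pi` is passed in rather than computed. The toy inputs are invented for the example.

```python
import math


def kl_divergence(P, P_hat, pi):
    """D_KL(P || P_hat) = -sum_i pi_i sum_j P_ij log(P_hat_ij / P_ij).

    Terms with P_ij = 0 contribute nothing. If P_hat assigns zero
    probability to a transition that P allows, the divergence is infinite.
    """
    total = 0.0
    for i, (row, row_hat) in enumerate(zip(P, P_hat)):
        for p, q in zip(row, row_hat):
            if p > 0:
                if q <= 0:
                    return math.inf
                total -= pi[i] * p * math.log(q / p)
    return total


# Toy example: a deterministic three-note cycle, whose stationary
# distribution is uniform, and a blurred estimate of it.
P = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0]]
P_blurred = [[0.1, 0.8, 0.1],
             [0.1, 0.1, 0.8],
             [0.8, 0.1, 0.1]]
pi = [1 / 3, 1 / 3, 1 / 3]
print(kl_divergence(P, P_blurred, pi))
```

Here every true transition is assigned probability 0.8 by the blurred estimate, so the divergence equals $-\log 0.8$, and it vanishes when the estimate coincides with `P`.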
KULKARNI, DAVID, LYNN, AND BASSETT PHYSICAL REVIEW RESEARCH 6, 013136 (2024)
FIG. 9. The eight different possible triangles with node $i$ as a
vertex in a directed graph. The triangles which represent transitive
relationships are marked using the letter T.
5. Null models
We aim to identify distinct features in the music networks
that enable them to convey information effectively. To assess
whether our observations are merely due to random chance or
are instead a unique feature of our data set, we compare our
measurements on the real music networks with the following
null network models [113,114]:
(1) Null networks with the same number of nodes and
edges. These are obtained by generating random networks
with the same number of nodes and edges, and enable us
to assess whether the quantity we have measured is to be
expected merely based on network size.
(2) Degree-preserving null networks. These are random-
ized networks of the same size, with the additional constraint
that the in and out degrees of each node in the network
are preserved. Such networks are constructed by swapping
edges between pairs of nodes in the network iteratively, such
that the in and out degrees of each node are preserved but
the connectivity (or topology) of the network is randomized.
This class of null models enables us to evaluate the role
that connectivity or topology plays in the quantity we are
measuring.
We can generalize the degree-preserving null networks to
weighted networks. We are interested in degree-preserving
randomized networks since these keep the node-level en-
tropies fixed and allow us to study the impact of topology
on the quantities we are measuring. In the case of weighted
networks, the node-level entropies are determined by the out
weights and out degrees of the nodes. Hence, our procedure
of swapping edges between pairs of nodes in the network
still works since it preserves the out weights of each node in
addition to the in and out degrees. With these null models, we
can benchmark the presence of the quantities we are interested
in and identify the role that the connectivity pattern or size
plays. The code used to generate the null networks is available
at Ref. [112].
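The two null models can be sketched as follows. This is a hedged illustration rather than the procedure implemented in Ref. [112]: the function names, the default number of swap attempts, and the decision to reject swaps that would create self loops or duplicate edges are assumptions made here.

```python
import random
from collections import Counter


def size_matched_null(nodes, n_edges, seed=None):
    """Random directed network with the given nodes and number of edges
    (self loops excluded, an assumption of this sketch)."""
    rng = random.Random(seed)
    nodes = list(nodes)
    pairs = [(i, j) for i in nodes for j in nodes if i != j]
    return set(rng.sample(pairs, n_edges))


def degree_preserving_null(weights, n_swaps=None, seed=None):
    """Rewire a directed network by repeated edge swaps.

    `weights` maps each edge (i, j) to its weight. A swap turns the pair
    a->b, c->d into a->d, c->b, with each edge keeping its weight, so every
    node keeps its in degree, out degree, and total out weight.
    """
    rng = random.Random(seed)
    edges = list(weights)
    w = dict(weights)
    if len(edges) < 2:
        return w
    if n_swaps is None:
        n_swaps = 10 * len(edges)  # arbitrary default number of attempts
    for _ in range(n_swaps):
        x, y = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[x], edges[y]
        # reject swaps that create a self loop or an edge that already exists
        if a == d or c == b or (a, d) in w or (c, b) in w:
            continue
        w[(a, d)] = w.pop((a, b))
        w[(c, b)] = w.pop((c, d))
        edges[x], edges[y] = (a, d), (c, b)
    return w


def degree_summary(weights):
    """Return (out degrees, in degrees, out weights) as Counters."""
    out_deg, in_deg, out_w = Counter(), Counter(), Counter()
    for (i, j), wt in weights.items():
        out_deg[i] += 1
        in_deg[j] += 1
        out_w[i] += wt
    return out_deg, in_deg, out_w
```

Comparing `degree_summary` before and after rewiring is a quick check that only the connectivity pattern has changed. For an unweighted network, set every weight to 1.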
6. Transitive clustering coefficient
Along the lines of the clustering coefficient of a node
[71,72], we define the transitive clustering coefficient as a
measure of the degree to which nodes in a directed network
tend to form transitive relationships. The transitive clustering
coefficient of a node $i$ (for an unweighted graph with no self
loops) is given by
$$ C_i^{T} = \frac{T_i}{k_i^{\mathrm{tot}} \left( k_i^{\mathrm{tot}} - 1 \right)}, \tag{A7} $$
where $T_i$ denotes the number of transitive triangles that node
$i$ is a part of and $k_i^{\mathrm{tot}}$ is the total degree (in + out) of the node.
The denominator simply counts the number of triangles that
could exist within the neighborhood of node $i$.
The possible directed triangles involving node $i$ can be
divided into two categories: those representing cyclic
relationships and those representing transitive relationships
(Fig. 9). The number of transitive triangles involving node $i$
that actually exist can be expressed in terms of the adjacency
matrix $A$ of the graph, which gives
$$ C_i^{T} = \frac{(A + A^{T})^{3}_{ii} - A^{3}_{ii} - (A^{T})^{3}_{ii}}{2\, k_i^{\mathrm{tot}} \left( k_i^{\mathrm{tot}} - 1 \right)}. \tag{A8} $$
This expression counts a subset of the total number of tri-
angles and is a special case of the expression derived in
Ref. [115]. We use this expression to measure the transitive
clustering coefficient of each music network (code
available at Ref. [112]).
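As an illustration of Eq. (A8), the sketch below computes the transitive clustering coefficient of every node from a 0/1 adjacency matrix with no self loops. The function names and the nested-list matrix format are choices made here, not the implementation of Ref. [112]. Assigning a coefficient of zero to nodes with total degree below 2 is also an assumption.

```python
def matmul(X, Y):
    """Product of two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def cube_diagonal(M):
    """Diagonal of M^3, i.e. the number of closed 3-step walks per node."""
    M3 = matmul(matmul(M, M), M)
    return [M3[i][i] for i in range(len(M))]


def transitive_clustering(A):
    """Eq. (A8) for a directed, unweighted graph without self loops."""
    n = len(A)
    At = [list(col) for col in zip(*A)]                     # transpose
    S = [[A[i][j] + At[i][j] for j in range(n)] for i in range(n)]
    all_triangles = cube_diagonal(S)   # cyclic and transitive patterns
    cycles_out = cube_diagonal(A)      # cycles followed along the edges
    cycles_in = cube_diagonal(At)      # the same cycles, reversed
    coeffs = []
    for i in range(n):
        k_tot = sum(A[i]) + sum(At[i])  # out degree + in degree
        denom = 2 * k_tot * (k_tot - 1)
        numer = all_triangles[i] - cycles_out[i] - cycles_in[i]
        coeffs.append(numer / denom if denom > 0 else 0.0)
    return coeffs


# One transitive triangle (0->1, 1->2, 0->2) and one cyclic triangle.
transitive = [[0, 1, 1],
              [0, 0, 1],
              [0, 0, 0]]
cyclic = [[0, 1, 0],
          [0, 0, 1],
          [1, 0, 0]]
print(transitive_clustering(transitive))  # [0.5, 0.5, 0.5]
print(transitive_clustering(cyclic))      # [0.0, 0.0, 0.0]
```

The cyclic triangle contributes nothing because its two closed walks are exactly the terms subtracted in the numerator, whereas the transitive triangle survives the subtraction.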
7. Spearman correlation coefficient and its associated p value
To assess the correlation between two quantities, we calcu-
late the Spearman correlation coefficient between them. This
coefficient is a nonparametric measure of the monotonicity of
the relationship between two data sets. The Spearman correlation
coefficient ranges from $-1$ to $1$, with 0 implying no
correlation. The sign of the correlation coefficient indicates
the direction of association between the two variables. For two
data sets denoted by $X$ and $Y$, it is computed as follows:
$$ r_s = \frac{\mathrm{cov}(R(X), R(Y))}{\sigma_{R(X)}\, \sigma_{R(Y)}}, \tag{A9} $$
where $R(X)$ and $R(Y)$ denote the ranks of the data points,
$\mathrm{cov}(R(X), R(Y))$ denotes the covariance of the rank variables,
and $\sigma_{R(X)}$ and $\sigma_{R(Y)}$ are the standard deviations of the rank
variables.
The reported p value indicates the probability that uncorrelated
data sets have a Spearman correlation at least as extreme
as the one computed. In instances where our sample size
is not very large, we calculate the p value by performing a
permutation test.
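A small sketch of Eq. (A9) together with a permutation test is given below. The details are assumptions made for illustration, since the text does not specify them: ties receive average ranks, the test is two-sided, 10 000 shuffles are drawn by default, and one is added to both counts so that the estimated p value is never exactly zero.

```python
import math
import random


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of standard deviations."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # undefined if either input is constant


def spearman(x, y):
    """Eq. (A9): the Pearson correlation of the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


def permutation_p_value(x, y, n_perm=10_000, seed=0):
    """Two-sided p value: share of shuffles at least as extreme as observed."""
    rng = random.Random(seed)
    rx, ry = average_ranks(x), average_ranks(y)
    observed = abs(pearson(rx, ry))
    shuffled = ry[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(rx, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Because only ranks enter the calculation, any strictly increasing transformation of one variable leaves $r_s$ unchanged; for instance, `spearman(x, [v ** 3 for v in x])` equals 1 up to rounding for any list `x` of distinct values.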
[1] S. J. Mithen, The Singing Neanderthals: The Origins of Music,
Language, Mind and Body (Harvard University Press, USA,
2005).
[2] G. F. Welch, M. Biasutti, J. MacRitchie, G. E. McPherson, and
E. Himonides, The impact of music on human development
and well-being, Front. Psychol. 11, 1246 (2020).
[3] I. Cross, Music, cognition, culture, and evolution, Ann. N.Y.
Acad. Sci. 930, 28 (2001).
[4] S. McClary, The impromptu that trod on a loaf: Or how music
tells stories, Narrative 5, 20 (1997).
[5] D. Miell, R. A. MacDonald, and D. J. Hargreaves, Musical
Communication (Oxford University Press, Oxford, 2005).
[6] K. R. Scherer and E. Coutinho, How music creates emotion:
A multifactorial process approach, The Emotional Power of
Music: Multidisciplinary Perspectives on Musical Arousal, Ex-
pression, and Social Control (Oxford University Press, Oxford,
2013), pp. 121–145.
[7] S. Koelsch, Brain correlates of music-evoked emotions, Nat.
Rev. Neurosci. 15, 170 (2014).
[8] A. J. Blood and R. J. Zatorre, Intensely pleasurable responses
to music correlate with activity in brain regions implicated in
reward and emotion, Proc. Natl. Acad. Sci. 98, 11818 (2001).
[9] D. Huron, Sweet Anticipation: Music and the Psychology of
Expectation (The MIT Press, USA, 2006).
[10] B. Tillmann, B. Poulin-Charronnat, and E. Bigand, The role
of expectation in music: From the score to emotions and the
brain, WIREs Cognit. Sci. 5, 105 (2014).
[11] M. T. Pearce and G. A. Wiggins, Auditory expectation: The
information dynamics of music perception and cognition, Topics
Cognit. Sci. 4, 625 (2012).
[12] V. N. Salimpoor, D. H. Zald, R. J. Zatorre, A. Dagher, and
A. R. McIntosh, Predictions and the brain: How musical
sounds become rewarding, Trends Cognit. Sci. 19, 86 (2015).
[13] M. A. Rohrmeier and S. Koelsch, Predictive information
processing in music cognition. A critical review, Int. J.
Psychophysiol. 83, 164 (2012).
[14] L. B. Meyer, Emotion and Meaning in Music, ACLS Human-
ities E-Book (University of Chicago Press, Chicago, USA,
1956).
[15] P. Vuust and C. D. Frith, Anticipation is the key to understand-
ing music and the effects of music on emotion, Behav. Brain
Sci. 31, 599 (2008).
[16] H. Egermann, M. T. Pearce, G. A. Wiggins, and S. McAdams,
Probabilistic models of expectation violation predict psy-
chophysiological emotional responses to live concert music,
Cogn. Affect. Behav. Neurosci. 13, 533 (2013).
[17] S. Ferretti, On the modeling of musical solos as complex
networks, Inform. Sci. 375, 271 (2017).
[18] S. Ferretti, On the complex network structure of musical
pieces: Analysis of some use cases from different music gen-
res, Multimedia Tools Appl. 77, 16003 (2018).
[19] X. F. Liu, K. T. Chi, and M. Small, Complex network structure
of musical compositions: Algorithmic generation of appealing
music, Physica A 389, 126 (2010).
[20] M. B. Nardelli, MUSICNTWRK: Data Tools for Music The-
ory, Analysis and Composition, Perception, Representations,
Image, Sound, Music: 14th International Symposium, CMMR
2019, Marseille, France (Springer International Publishing,
Cham, 2021), Vol. 12631, pp. 190–215.
[21] C. W. Lynn, L. Papadopoulos, A. E. Kahn, and D. S. Bassett,
Human information processing in complex networks, Nat.
Phys. 16, 965 (2020).
[22] C. E. Shannon, A mathematical theory of communication, Bell
Syst. Tech. J. 27, 379 (1948).
[23] S. T. Piantadosi, H. Tily, and E. Gibson, Word lengths are
optimized for efficient communication, Proc. Natl. Acad. Sci.
108, 3526 (2011).
[24] J. B. Plotkin and M. A. Nowak, Language evolution and infor-
mation theory, J. Theor. Biol. 205, 147 (2000).
[25] J.-P. Eckmann, E. Moses, and D. Sergi, Entropy of dialogues
creates coherent structures in e-mail traffic, Proc. Natl. Acad.
Sci. 101, 14333 (2004).
[26] K. Zhao, M. Karsai, and G. Bianconi, Entropy of dynamical
social networks, PLoS One 6, e28116 (2011).
[27] M. Rosvall, A. Trusina, P. Minnhagen, and K. Sneppen, Net-
works and cities: An information perspective, Phys. Rev. Lett.
94, 028701 (2005).
[28] J. E. Cohen, Information theory and music, Behav. Sci. 7, 137
(1962).
[29] L. Hiller and C. Bean, Information theory analyses of four
sonata expositions, J. Music Theory 10, 96 (1966).
[30] F. Gomez, T. Lorimer, and R. Stoop, Complex networks of har-
monic structure in classical music, in Nonlinear Dynamics of
Electronic Systems (Springer International Publishing, Cham,
2014), pp. 262–269.
[31] J. P. Boon, A. Noullez, and C. Mommen, Complex dynamics
and musical structure, Interface 19, 3 (1990).
[32] L. Liu, J. Wei, H. Zhang, J. Xin, and J. Huang, A statistical
physics view of pitch fluctuations in the classical music from
Bach to Chopin: Evidence for scaling, PLoS One 8, e58710
(2013).
[33] D. Kahneman, S. P. Slovic, P. Slovic, and A. Tversky, Judg-
ment under Uncertainty: Heuristics and Biases (Cambridge
University Press, Cambridge, 1982).
[34] P. Dayan, Improving generalization for temporal difference
learning: The successor representation, Neural Comput. 5, 613
(1993).
[35] C. W. Lynn, A. E. Kahn, N. Nyema, and D. S. Bassett, Abstract
representations of events arise from mental errors in learning
and memory, Nat. Commun. 11, 2313 (2020).
[36] I. Momennejad, E. M. Russek, J. H. Cheong, M. M. Botvinick,
N. D. Daw, and S. J. Gershman, The successor representation
in human reinforcement learning, Nat. Human Behav. 1, 680
(2017).
[37] J. Sherwin and P. Sajda, Musical experts recruit action-related
neural structures in harmonic anomaly detection: Evidence
for embodied cognition in expertise, Brain Cognit. 83, 190
(2013).
[38] M. Tervaniemi, T. Tupala, and E. Brattico, Expertise in folk
music alters the brain processing of Western harmony, Ann.
N.Y. Acad. Sci. 1252, 147 (2012).
[39] P. Vuust, E. Brattico, M. Seppänen, R. Näätänen, and M.
Tervaniemi, Practiced musical style shapes auditory skills,
Ann. N.Y. Acad. Sci. 1252, 139 (2012).
[40] S. Jentschke and S. Koelsch, Musical training modulates the
development of syntax processing in children, NeuroImage 47,
735 (2009).
[41] M. E. Curtis and J. J. Bharucha, Memory and musical
expectation for tones in cultural context, Music Percept. 26, 365
(2009).
[42] S. M. Demorest and L. Osterhout, ERP responses to cross-
cultural melodic expectancy violations, Ann. N.Y. Acad. Sci.
1252, 152 (2012).
[43] T. Eerola, T. Himberg, P. Toiviainen, and J. Louhivuori, Per-
ceived complexity of western and African folk melodies by
western and African listeners, Psychol. Music 34, 337 (2006).
[44] T. Eerola, J. Louhivuori, and E. Lebaka, Expectancy in Sami
Yoiks revisited: The role of data-driven and schema-driven
knowledge in the formation of melodic expectations, Music.
Sci. 13, 231 (2009).
[45] T. Bent, A. R. Bradlow, and B. A. Wright, The influence of
linguistic experience on the cognitive processing of pitch in
speech and nonspeech sounds, J. Exp. Psychol. Hum. Percept.
Perform. 32, 97 (2006).
[46] B. Chandrasekaran, A. Krishnan, and J. T. Gandour, Rela-
tive influence of musical and linguistic experience on early
cortical processing of pitch contours, Brain Lang. 108, 1
(2009).
[47] P. Q. Pfordresher and S. Brown, Enhanced production and
perception of musical pitch in tone language speakers, Atten.
Percept. Psychophys. 71, 1385 (2009).
[48] M. P. Roncaglia-Denissen, D. A. Roor, A. Chen, and M.
Sadakata, The enhanced musical rhythmic perception in sec-
ond language learners, Frontiers Human Neurosci. 10, 288
(2016).
[49] M. W. Howard and M. J. Kahana, A distributed repre-
sentation of temporal context, J. Math. Psychol. 46, 269
(2002).
[50] S. J. Gershman, C. D. Moore, M. T. Todd, K. A. Norman, and
P. B. Sederberg, The successor representation and temporal
context, Neural Comput. 24, 1553 (2012).
[51] C. W. Lynn and D. S. Bassett, How humans learn and represent
networks, Proc. Natl. Acad. Sci. 117, 29407 (2020).
[52] G. R. Jagari, P. Pedram, and L. Hedayatifar, Long-range cor-
relation and multifractality in Bach’s invention pitches, J. Stat.
Mech.: Theory Exp. (2007) P04012.
[53] J. M. Moore, D. C. Corrêa, and M. Small, Is Bach’s brain a
Markov chain? Recurrence quantification to assess Markov
order for short, symbolic, musical compositions, Chaos 28,
085715 (2018).
[54] F. Lerdahl and R. S. Jackendoff, A Generative Theory of
Tonal Music, Reissue, with a New Preface (MIT Press, USA,
1996).
[55] S. Koelsch, M. Rohrmeier, R. Torrecuso, and S. Jentschke,
Processing of hierarchical syntactic structure in music, Proc.
Natl. Acad. Sci. 110, 15443 (2013).
[56] J. Gomez-Gardenes and V. Latora, Entropy rate of diffusion
processes on complex networks, Phys. Rev. E 78, 065102(R)
(2008).
[57] See Supplemental Material at http://link.aps.org/supplemental/
10.1103/PhysRevResearch.6.013136 for more information.
[58] R. Albert, Scale-free networks in cell biology, J. Cell Sci. 118,
4947 (2005).
[59] K. Lerman, X. Yan, and X.-Z. Wu, The majority illusion in
social networks, PLoS One 11, e0147617 (2016).
[60] A.-L. Barabási and R. Albert, Emergence of scaling in random
networks, Science 286, 509 (1999).
[61] M. M. Garvert, R. J. Dolan, and T. E. J. Behrens, A map
of abstract relational knowledge in the human hippocampal–entorhinal
cortex, eLife 6, e17086 (2017).
[62] F. Meyniel, M. Maheu, and S. Dehaene, Human inferences
about sequences: A minimal transition probability model,
PLoS Comput. Biol. 12, e1005260 (2016).
[63] F. Meyniel and S. Dehaene, Brain networks for confidence
weighting and hierarchical inference during probabilistic
learning, Proc. Natl. Acad. Sci. 114, E3859 (2017).
[64] E. L. Newport and R. N. Aslin, Learning at a distance I. Statis-
tical learning of non-adjacent dependencies, Cognit. Psychol.
48, 127 (2004).
[65] A. E. Kahn, E. A. Karuza, J. M. Vettel, and D. S. Bassett,
Network constraints on learnability of probabilistic motor se-
quences, Nat. Hum. Behav. 2, 936 (2018).
[66] J. R. Saffran, E. K. Johnson, R. N. Aslin, and E. L. Newport,
Statistical learning of tone sequences by human infants and
adults, Cognition 70, 27 (1999).
[67] E. Morgan, A. Fogel, A. Nair, and A. D. Patel, Statistical learn-
ing and gestalt-like principles predict melodic expectations,
Cognition 189, 23 (2019).
[68] V. K. Cheung, P. M. C. Harrison, S. Koelsch, M. T. Pearce,
A. D. Friederici, and L. Meyer, Cognitive and sensory expec-
tations independently shape musical expectancy and pleasure,
Phil. Trans. R. Soc. B 379, 20220420 (2023).
[69] T. Collins, B. Tillmann, F. S. Barrett, C. Delbé, and P. Janata,
A combined model of sensory and cognitive representations
underlying tonal expectations in music: From audio signals to
behavior, Psychol. Rev. 121, 33 (2014).
[70] M. T. Pearce, M. H. Ruiz, S. Kapasi, G. A. Wiggins, and
J. Bhattacharya, Unsupervised statistical learning underpins
computational, behavioural, and neural manifestations of mu-
sical expectation, NeuroImage 50, 302 (2010).
[71] D. J. Watts and S. H. Strogatz, Collective dynamics of ‘small-
world’ networks, Nature (London) 393, 440 (1998).
[72] G. Szabó, M. J. Alava, and J. Kertész, Clustering in complex
networks, Lect. Notes Phys. 650, 139 (2004).
[73] I. Mencke, D. Omigie, M. Wald-Fuhrmann, and E. Brattico,
Atonal music: Can uncertainty lead to pleasure? Front.
Neurosci. 12, 979 (2019).
[74] B. Tillmann and E. Bigand, The relative importance of lo-
cal and global structures in music perception, J. Aesth. Art
Criticism 62, 211 (2004).
[75] D. Butler, Describing the perception of tonality in music:
A critique of the tonal hierarchy theory and a proposal
for a theory of intervallic rivalry, Music Percept. 6, 219
(1989).
[76] P. Toiviainen and C. L. Krumhansl, Measuring and modeling
real-time responses to music: The dynamics of tonality induc-
tion, Perception 32, 741 (2003).
[77] J. R. Saffran, R. N. Aslin, and E. L. Newport, Statis-
tical learning by 8-month-old infants, Science 274, 1926
(1996).
[78] J. Fiser and R. N. Aslin, Statistical learning of higher-order
temporal structure from visual shape sequences, J. Exp.
Psychol. 28, 458 (2002).
[79] A. R. Romberg and J. R. Saffran, Statistical learning and
language acquisition, Wiley Interdiscip. Rev.: Cognit. Sci. 1,
906 (2010).
[80] R. F. I. Cancho and V. S. Richard, The small world
of human language, Proc. R. Soc. London B 268, 2261
(2001).
[81] P. H. R. Zivic, F. Shifres, and G. A. Cecchi, Perceptual basis
of evolving Western musical styles, Proc. Natl. Acad. Sci. 110,
10034 (2013).
[82] C. Pérez-Sancho, D. Rizo, and J. M. Iñesta, Genre classifica-
tion using chords and stochastic language models, Connect.
Sci. 21, 145 (2009).
[83] D. K. Simonton, Melodic structure and note transition prob-
abilities: A content analysis of 15,618 classical themes,
Psychol. Music 12, 3 (1984).
[84] J. Thickstun, Z. Harchaoui, and S. Kakade, Learning features
of music from scratch, arXiv:1611.09827.
[85] I. Liu, B. Ramakrishnan et al., Bach in 2014: Music composi-
tion with recurrent neural network, arXiv:1412.3191.
[86] D. Collins, A synthesis process model of creative thinking in
music composition, Psychol. Music 33, 193 (2005).
[87] D. Collins, Real-time tracking of the creative music composi-
tion process, Digital Creativity 18, 239 (2007).
[88] W. J. Dowling and J. C. Bartlett, The importance of
interval information in long-term memory for melodies,
Psychomusicology 1, 30 (1981).
[89] R. Parncutt, Harmony: A Psychoacoustical Approach, Springer
Science & Business Media, Vol. 19 (Springer, Berlin,
Heidelberg, 2012).
[90] C. J. Stevens, Music perception and cognition: A review
of recent cross-cultural research, Topics Cognit. Sci. 4, 653
(2012).
[91] F. Lerdahl and R. Jackendoff, An overview of hierarchical
structure in music, Music Percept. 1, 229 (1983).
[92] J. Xu, T. L. Wickramarathne, and N. V. Chawla, Representing
higher-order dependencies in networks, Sci. Adv. 2, e1600028
(2016).
[93] R. Lambiotte, M. Rosvall, and I. Scholtes, From networks to
optimal higher-order models of complex systems, Nat. Phys.
15, 313 (2019).
[94] H. Yin, A. R. Benson, and J. Leskovec, Higher-order
clustering in networks, Phys. Rev. E 97, 052306 (2018).
[95] M. Kivelä, A. Arenas, M. Barthelemy, J. P. Gleeson, Y.
Moreno, and M. A. Porter, Multilayer networks, J. Complex
Netw. 2, 203 (2014).
[96] A. Aleta and Y. Moreno, Multilayer networks in a nutshell,
Annu. Rev. Condens. Matter Phys. 10, 45 (2019).
[97] M. De Domenico, A. Solé-Ribalta, E. Cozzo, M. Kivelä, Y.
Moreno, M. A. Porter, S. Gómez, and A. Arenas, Mathe-
matical formulation of multilayer networks, Phys. Rev. X 3,
041022 (2013).
[98] Stefano Boccaletti, Ginestra Bianconi, R. Criado, C. I. Del
Genio, J. Gómez-Gardenes, M. Romance, I. Sendina-Nadal, Z.
Wang, and M. Zanin, The structure and dynamics of multilayer
networks, Phys. Rep. 544, 1 (2014).
[99] M. De Domenico, C. Granell, M. A. Porter, and A. Arenas, The
physics of spreading processes in multilayer networks, Nat.
Phys. 12, 901 (2016).
[100] J. R. Iversen, A. D. Patel, and K. Ohgushi, Perception of
rhythmic grouping depends on auditory experience, J. Acoust.
Soc. Am. 124, 2263 (2008).
[101] J. Stiller, D. Nettle, and R. I. Dunbar, The small world of
Shakespeare’s plays, Human Nature 14, 397 (2003).
[102] Y.-M. Choi and H.-J. Kim, A directed network of greek and
roman mythology, Physica A 382, 665 (2007).
[103] M. J. Buehler, Unsupervised cross-domain translation via deep
learning and adversarial attention neural networks and appli-
cation to music-inspired protein designs, Patterns 4, 100692
(2023).
[104] C.-H. Yu, Z. Qin, F. J. Martin-Martinez, and M. J. Buehler,
A self-consistent sonification method to translate amino acid
sequences into musical compositions and application in pro-
tein design using artificial intelligence, ACS Nano 13, 7471
(2019).
[105] Joyce Y. Wong, John McDonald, Micki Taylor-Pinney,
David I. Spivak, David L. Kaplan, and Markus J. Buehler,
Materials by design: Merging proteins and music, Nano Today
7, 488 (2012).
[106] https://github.com/SumanSKulkarni/Music_Networks
[107] Bach central, http://www.bachcentral.com/
midiindexcomplete.html.
[108] Kern scores, http://kern.humdrum.org/search?s=t&keyword=
Bach+Johann.
[109] Bach cantatas website, https://www.bach-cantatas.com/Mus/
BWV1-Mus.htm.
[110] Suzu MIDI, http://www.suzumidi.com/eng/bach4.htm.
[111] MIDI file tools for MATLAB by Ken Schutte, https://github.com/
kts/matlab-midi.
[112] Information content of note transitions in the music of J. S.
Bach. https://github.com/SumanSKulkarni/Music_Networks.
[113] M. E. J. Newman and M. Girvan, Finding and evaluating
community structure in networks, Phys. Rev. E 69, 026113
(2004).
[114] F. Váša and B. Mišić, Null models in network neuroscience,
Nat. Rev. Neurosci. 23, 493 (2022).
[115] G. Fagiolo, Clustering in complex directed networks, Phys.
Rev. E 76, 026107 (2007).
013136-17
... There are several recent examples of how music scores can be modeled as complex networks, representing notes as nodes, and the temporal sequence as connections between these nodes. In a recent paper (Kulkarni et al., 2024), it has been shown how the study of the topology of networks created from Johann Sebastian Bach's pieces reveals the underlying organization of his music. On the one hand, the authors demonstrate that Bach's compositions exhibit a balance between complexity and simplicity that allows for efficient communication of musical information. ...
... Temporal information is quantized, adjusting every note onset to a precise 16th note and disregarding note duration. Only the quantized starting times of the notes are considered, aligning with previous research in network-based music analysis (Kulkarni et al., 2024). ...
... Additionally, the integration of this network-based modeling with other music analysis and generation methods could lead to further advancements in the field of computational musicology. Indeed, the observation that musical sequences optimized for human perception exhibit certain network topologies (Kulkarni et al., 2024;Lynn et al., 2020) suggests that studying these network properties can lead to a better understanding of how humans relate to music and how music has evolved over time. ...
Article
Full-text available
Complex networks have emerged as a powerful framework for understanding and analyzing musical compositions, revealing underlying structures and dynamics that may not be immediately apparent. This article explores the application of complex network representations to the study of symbolic drum sequences, a topic that has received limited attention in the literature. The proposed methodology involves encoding drum rhythms as directed, weighted complex networks, where nodes represent drum events, and edges capture the temporal succession of these events. This network-based representation allows for the analysis of similarities between different drumming styles, as well as the generation of novel drum patterns. Through a series of experiments, we demonstrate the effectiveness of this approach. First, we show that the complex network representation can accurately classify drum patterns into their respective musical styles, even with a limited number of training samples. Second, we present a generative model based on Markov chains operating on the network structure, which is able to produce new drum patterns that retain the essential features of the training data. Finally, we validate the perceptual relevance of the generated patterns through listening tests, where participants are unable to distinguish the generated patterns from the original ones, suggesting that the network-based representation effectively captures the underlying characteristics of different drumming styles. The findings of this study have significant implications for music research, genre classification, and generative music applications, highlighting the potential of complex networks to provide a transparent and elegant approach to the analysis and synthesis of rhythmic structures in music.
... To fill this gap, we build on the approach of previous studies [33], utilizing Network Science tools to analyze musical compositions and examine how democratization and digital connectivity impact musical complexity and diversity. In particular, we analyze a dataset of approximately 20,000 MIDI files categorized into six macro-genres [34], choosing to represent musical compositions as weighted directed networks where notes are nodes and transitions are edges. ...
... To check the robustness of our results, we repeat the analysis using a different measure, the Network Entropy, previously used in other work to quantify the information contained in musical networks constructed from J.S. Bach pieces [33]. We report the result of the analysis in Supplementary Fig. S2. ...
Preprint
Music has always been central to human culture, reflecting and shaping traditions, emotions, and societal changes. Technological advancements have transformed how music is created and consumed, influencing tastes and the music itself. In this study, we use Network Science to analyze musical complexity. Drawing on approximately 20,000 MIDI files across six macro-genres spanning nearly four centuries, we represent each composition as a weighted directed network to study its structural properties. Our results show that Classical and Jazz compositions have higher complexity and melodic diversity than recently developed genres. However, a temporal analysis reveals a trend toward simplification, with even Classical and Jazz nearing the complexity levels of modern genres. This study highlights how digital tools and streaming platforms shape musical evolution, fostering new genres while driving homogenization and simplicity.
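As a rough illustration of the representation described in this abstract (notes as nodes, transitions as weighted directed edges), the sketch below builds such a network from a list of note names. The function names `transition_network` and `transition_probabilities` and the toy melody are assumptions made for this example; they are not taken from the cited study, which works from MIDI files.

```python
from collections import Counter, defaultdict


def transition_network(notes):
    """Return {source: {target: count}} over consecutive note pairs.

    Each distinct note is a node; each observed transition is a directed
    edge whose weight is the number of times it occurs in the sequence.
    """
    edges = defaultdict(Counter)
    for src, dst in zip(notes, notes[1:]):
        edges[src][dst] += 1
    return {src: dict(targets) for src, targets in edges.items()}


def transition_probabilities(network):
    """Normalize outgoing edge weights so each node's row sums to 1."""
    probs = {}
    for src, targets in network.items():
        total = sum(targets.values())
        probs[src] = {dst: weight / total for dst, weight in targets.items()}
    return probs


# Hypothetical toy melody, used only to exercise the functions.
melody = ["C4", "D4", "E4", "C4", "D4", "G4", "C4"]
net = transition_network(melody)
probs = transition_probabilities(net)
```

Here the edge from C4 to D4 has weight 2 because that transition occurs twice, while D4 splits its outgoing probability evenly between E4 and G4.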
... Inspired by existing research [35,40,60], we propose to introduce information content [62] as the metric to solve this problem. As an important concept of information theory, information content I(·) quantifies the amount of surprise or unexpectedness associated with an observed event A, which is formally computed by: ...
... While Margulis (2021) can be lauded for advocating interdisciplinary collaboration, I feel that such a commentary would be more effective at encouraging such collaborations if it included a more positive representation of González-Espinoza et al. (2020). For an example of a more positive response in a similar situation, see (Kulkarni et al., 2024; Cutts, 2024). ...
Article
The theory of Jesse Berezovsky (2019) is a rare foray of a physicist into the territory of music science. In their follow-up article in Empirical Musicology Review, Ryan Buechele, Alex Cooke, and Jesse Berezovsky (2024) show how the evolution of Western tuning systems and compositions can be rationalized by a theoretical model that describes a trade-off between minimizing sensory dissonance and maximizing compositional variety. From the Renaissance period onwards there was a trend towards more dissonance, and more compositional variety in both tuning systems and compositions. While this historical progression has perhaps been described qualitatively elsewhere, this model provides a more precise quantitative description of the phenomenon. The validity and scope of this model ought to be tested further by comparing its predictions with empirical measurements of tuning systems in both Western and non-Western cultures, alongside predictions of other theories of scale evolution. In the hope of encouraging and facilitating more of these interdisciplinary endeavors, I discuss some of my anecdotal experiences as a physical scientist embedded in the music science community, and offer advice on how to achieve better understanding and communication across disciplines.
... Efficient communication principles posit that more surprising stimuli have a greater amount of resources allocated to them; this principle has been supported by empirical work showing that surprising words tend to be longer [11]. Computational models show that musical scores exhibit a wide range of IC values for musical events such as harmonic [12,13] and pitch transitions [14]. Analyses of corpora of Western Classical music show that composers tend to assign high-IC chords to longer rhythmic values [5,15]. ...
Article
Sensory systems are permanently bombarded with complex stimuli. Cognitive processing of such complex stimuli may be facilitated by accentuation of important elements. In the case of music listening, alteration of some surface features, such as volume and duration, may facilitate the cognitive processing of otherwise high-level information, such as melody and harmony. Hence, musical accents are often aligned with intrinsically salient elements in the stimuli, such as highly unexpected notes. We developed a novel listening paradigm based on an artificial Markov-chain melodic grammar to probe the hypothesis that listeners prefer structurally salient events to be consistent with salient surface properties such as musical accents. We manipulated two types of structural saliency: one driven by Gestalt principles (a note at the peak of a melodic contour) and one driven by statistical learning (a note with high surprisal, or information content [IC], as defined by the artificial melodic grammar). Results suggest that for all listeners, the aesthetic preferences in terms of surface properties are well predicted by Gestalt principles of melodic shape. In contrast, despite demonstrating good knowledge of novel statistical properties of the melodies, participants did not demonstrate a preference for accentuation of high-IC notes. This work is a first step in elucidating the interplay between intrinsic, Gestalt-like and acquired, statistical properties of melodies in the development of expressive musical properties, with a focus on the appreciation of dynamic accents (i.e., a transient increase in volume). Our results shed light on the implementation of domain-general and domain-specific principles of information processing during music listening.
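To make the notion of surprisal under a Markov-chain grammar concrete, the following sketch computes the information content of each note in a short melody using the standard definition, IC = -log2 P(note | previous note). The three-symbol grammar and its probabilities are invented for illustration and are not the grammar used in the study above.

```python
import math


def information_content(prob):
    """Surprisal in bits of an event that occurs with probability `prob`."""
    return -math.log2(prob)


# Hypothetical first-order melodic grammar: grammar[prev][next] is the
# probability of moving from note `prev` to note `next`.
grammar = {
    "A": {"A": 0.5, "B": 0.25, "C": 0.25},
    "B": {"A": 0.125, "B": 0.5, "C": 0.375},
    "C": {"A": 0.5, "B": 0.5},
}


def melody_surprisal(melody, grammar):
    """Return the IC of every note after the first, given its predecessor."""
    return [
        information_content(grammar[prev][nxt])
        for prev, nxt in zip(melody, melody[1:])
    ]


ic = melody_surprisal(["A", "B", "A", "A"], grammar)
```

In this toy grammar the transition from B to A is the rarest of the three (probability 0.125), so it carries the highest information content (3 bits), which is the kind of "high-IC note" the abstract refers to.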
... Kulkarni et al. [20] defined the entropy of undirected and unweighted networks in their work regarding information content of note transitions in the music of J.S. Bach as follows: ...
Article
The analysis of algebraic invariants of algebras induced by appropriated multiset systems called Brauer configurations is a Brauer analysis of the data defining the multisets. Giving a complete description of such algebraic invariants (e.g., giving a closed formula for the dimensions of algebras induced by significant classes of Brauer configurations) is generally a tricky problem. Ringel previously proposed an analysis of this type in the case of Dynkin algebras, for which so-called Dynkin functions were used to study the numerical behavior of invariants associated with such algebras. This paper introduces two additional tools (the entropy and the covering graph of a Brauer configuration) for Brauer analysis, which is applied to Dynkin and Euclidean diagrams to define Dynkin functions associated with Brauer configuration algebras. Properties of graph entropies defined by the corresponding covering graphs are given to establish relationships between the theory of Dynkin functions, the Brauer configuration algebras theory, and the topological content information theory.
Article
Expectation is crucial for our enjoyment of music, yet the underlying generative mechanisms remain unclear. While sensory models derive predictions based on local acoustic information in the auditory signal, cognitive models assume abstract knowledge of music structure acquired over the long term. To evaluate these two contrasting mechanisms, we compared simulations from four computational models of musical expectancy against subjective expectancy and pleasantness ratings of over 1000 chords sampled from 739 US Billboard pop songs. Bayesian model comparison revealed that listeners' expectancy and pleasantness ratings were predicted by the independent, non-overlapping, contributions of cognitive and sensory expectations. Furthermore, cognitive expectations explained over twice the variance in listeners’ perceived surprise compared to sensory expectations, suggesting a larger relative importance of long-term representations of music structure over short-term sensory–acoustic information in musical expectancy. Our results thus emphasize the distinct, albeit complementary, roles of cognitive and sensory expectations in shaping musical pleasure, and suggest that this expectancy-driven mechanism depends on musical information represented at different levels of abstraction along the neural hierarchy. This article is part of the theme issue ‘Art, aesthetics and predictive processing: theoretical and empirical perspectives’.
Article
Taking inspiration from nature about how to design materials has been a fruitful approach, used by humans for millennia. In this paper we report a method that allows us to discover how patterns in disparate domains can be reversibly related using a computationally rigorous approach, the AttentionCrossTranslation model. The algorithm discovers cycle- and self-consistent relationships and offers a bidirectional translation of information across disparate knowledge domains. The approach is validated with a set of known translation problems, and then used to discover a mapping between musical data (based on the corpus of note sequences in J.S. Bach's Goldberg Variations, created in 1741) and protein sequence data (information sampled more recently). Using protein folding algorithms, 3D structures of the predicted protein sequences are generated, and their stability is validated using explicit solvent molecular dynamics. Musical scores generated from protein sequences are sonified and rendered into audible sound.
Article
Humans communicate using systems of interconnected stimuli or concepts—from language and music to literature and science—yet it remains unclear how, if at all, the structure of these networks supports the communication of information. Although information theory provides tools to quantify the information produced by a system, traditional metrics do not account for the inefficient ways that humans process this information. Here, we develop an analytical framework to study the information generated by a system as perceived by a human observer. We demonstrate experimentally that this perceived information depends critically on a system’s network topology. Applying our framework to several real networks, we find that they communicate a large amount of information (having high entropy) and do so efficiently (maintaining low divergence from human expectations). Moreover, we show that such efficient communication arises in networks that are simultaneously heterogeneous, with high-degree hubs, and clustered, with tightly connected modules—the two defining features of hierarchical organization. Together, these results suggest that many communication networks are constrained by the pressures of information transmission, and that these pressures select for specific structural features. The arrangement of a sequence of stimuli affects how humans perceive information. Here, the authors show experimentally that humans perceive information in a way that depends on the network structure of stimuli.
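The two quantities named in this abstract, entropy and divergence from expectations, can be sketched for a random walk on a small undirected, unweighted network. The code below is a minimal illustration under stated assumptions: the entropy is the standard entropy rate of a random walk, the divergence is a Kullback-Leibler rate against an arbitrary expectation matrix `Q`, and the "smeared" `Q` used here (a mixture with uniform transitions, with a made-up `eps`) is a placeholder rather than the human-learning model developed in the paper. The function names and the four-node graph are also hypothetical.

```python
import math


def walk_matrix(adj):
    """Random-walk transition probabilities P[i][j] = adj[i][j] / degree(i)."""
    return [[a / sum(row) for a in row] for row in adj]


def stationary(adj):
    """Stationary distribution of a walk on an undirected graph: degree / (2E)."""
    total = sum(sum(row) for row in adj)
    return [sum(row) / total for row in adj]


def entropy_rate(adj):
    """Entropy in bits per step: -sum_i pi_i sum_j P_ij log2 P_ij."""
    P, pi = walk_matrix(adj), stationary(adj)
    return -sum(
        pi[i] * p * math.log2(p) for i, row in enumerate(P) for p in row if p > 0
    )


def kl_rate(adj, Q):
    """KL divergence in bits per step between the true walk P and expectations Q."""
    P, pi = walk_matrix(adj), stationary(adj)
    return sum(
        pi[i] * p * math.log2(p / Q[i][j])
        for i, row in enumerate(P)
        for j, p in enumerate(row)
        if p > 0
    )


# Hypothetical graph: a triangle (nodes 0, 1, 2) with a pendant node 3 on node 2.
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
P = walk_matrix(adj)
eps = 0.2  # illustrative amount of "smearing", not a fitted parameter
Q = [[(1 - eps) * p + eps / len(adj) for p in row] for row in P]

S = entropy_rate(adj)
D_exact = kl_rate(adj, P)
D_smeared = kl_rate(adj, Q)
```

With perfect expectations (`Q` equal to `P`) the divergence is zero, and any inaccurate `Q` gives a positive value, which is the sense in which a network can be compared on "low divergence from human expectations".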
Article
Humans are adept at uncovering abstract associations in the world around them, yet the underlying mechanisms remain poorly understood. Intuitively, learning the higher-order structure of statistical relationships should involve complex mental processes. Here we propose an alternative perspective: that higher-order associations instead arise from natural errors in learning and memory. Using the free energy principle, which bridges information theory and Bayesian inference, we derive a maximum entropy model of people’s internal representations of the transitions between stimuli. Importantly, our model (i) affords a concise analytic form, (ii) qualitatively explains the effects of transition network structure on human expectations, and (iii) quantitatively predicts human reaction times in probabilistic sequential motor tasks. Together, these results suggest that mental errors influence our abstract representations of the world in significant and predictable ways, with direct implications for the study and design of optimally learnable information sources.
Article
Recent advances in imaging and tracing technology provide increasingly detailed reconstructions of brain connectomes. Concomitant analytic advances enable rigorous identification and quantification of functionally important features of brain network architecture. Null models are a flexible tool to statistically benchmark the presence or magnitude of features of interest, by selectively preserving specific architectural properties of brain networks while systematically randomizing others. Here we describe the logic, implementation and interpretation of null models of connectomes. We introduce randomization and generative approaches to constructing null networks, and outline a taxonomy of network methods for statistical inference. We highlight the spectrum of null models — from liberal models that control few network properties, to conservative models that recapitulate multiple properties of empirical networks — that allow us to operationalize and test detailed hypotheses about the structure and function of brain networks. We review emerging scenarios for the application of null models in network neuroscience, including for spatially embedded networks, annotated networks and correlation-derived networks. Finally, we consider the limits of null models, as well as outstanding questions for the field. Comparisons of real networks with null models enable researchers to test how statistically unexpected a particular network feature is. In this Review, Váša and Mišić describe different null-model approaches and instantiations, as well as their emerging uses and limitations.
Article
Humans receive information from the world around them in sequences of discrete items—from words in language or notes in music to abstract concepts in books and websites on the Internet. To model their environment, from a young age people are tasked with learning the network structures formed by these items (nodes) and the connections between them (edges). But how do humans uncover the large-scale structures of networks when they experience only sequences of individual items? Moreover, what do people’s internal maps and models of these networks look like? Here, we introduce graph learning, a growing and interdisciplinary field studying how humans learn and represent networks in the world around them. Specifically, we review progress toward understanding how people uncover the complex webs of relationships underlying sequences of items. We begin by describing established results showing that humans can detect fine-scale network structure, such as variations in the probabilities of transitions between items. We next present recent experiments that directly control for differences in transition probabilities, demonstrating that human behavior depends critically on the mesoscale and macroscale properties of networks. Finally, we introduce computational models of human graph learning that make testable predictions about the impact of network structure on people’s behavior and cognition. Throughout, we highlight open questions in the study of graph learning that will require creative insights from cognitive scientists and network scientists alike.
Book
A search for a grammar of music with the aid of generative linguistics. This work, which has become a classic in music theory since its original publication in 1983, models music understanding from the perspective of cognitive science. The point of departure is a search for the grammar of music with the aid of generative linguistics. The theory, which is illustrated with numerous examples from Western classical music, relates the aural surface of a piece to the musical structure unconsciously inferred by the experienced listener. From the viewpoint of traditional music theory, it offers many innovations in notation as well as in the substance of rhythmic and reductional theory.
Article
Expectation, or prediction, has become a major theme in cognitive science. Music offers a powerful system for studying how expectations are formed and deployed in the processing of richly structured sequences that unfold rapidly in time. We ask to what extent expectations about an upcoming note in a melody are driven by two distinct factors: Gestalt-like principles grounded in the auditory system (e.g., a preference for subsequent notes to move in small intervals), and statistical learning of melodic structure. We use multinomial regression modeling to evaluate the predictions of computationally implemented models of melodic expectation against behavioral data from a musical cloze task, in which participants hear a novel melodic opening and are asked to sing the note they expect to come next. We demonstrate that both Gestalt-like principles and statistical learning contribute to listeners' online expectations. In conjunction with results in the domain of language, our results point to a larger-than-previously-assumed role for statistical learning in predictive processing across cognitive domains, even in cases that seem potentially governed by a smaller set of theoretically motivated rules. However, we also find that both of the models tested here leave much variance in the human data unexplained, pointing to a need for models of melodic expectation that incorporate underlying hierarchical and/or harmonic structure. We propose that our combined behavioral (melodic cloze) and modeling (multinomial regression) approach provides a powerful method for further testing and development of models of melodic expectation.
Article
We report a self-consistent method to translate amino acid sequences into audible sound, use the representation in the musical space to train a neural network, and then apply it to generate protein designs using artificial intelligence (AI). The sonification method proposed here uses the normal mode vibrations of the amino acid building blocks of proteins to compute an audible representation of each of the 20 natural amino acids, which is fully defined by the overlay of its respective natural vibrations. The vibrational frequencies are transposed to the audible spectrum following the musical concept of transpositional equivalence, playing or writing music in a way that makes it sound higher or lower in pitch while retaining the relationships between tones or chords played. This transposition method ensures that the relative values of the vibrational frequencies within each amino acid and among different amino acids are retained. The characteristic frequency spectrum and sound associated with each of the amino acids represents a type of musical scale that consists of 20 tones, the "amino acid scale". To create a playable instrument, each tone associated with the amino acids is assigned to a specific key on a piano roll, which allows us to map the sequence of amino acids in proteins into a musical score. To reflect higher-order structural details of proteins, the volume and duration of the notes associated with each amino acid are defined by the secondary structure of proteins, computed using DSSP and thereby introducing musical rhythm. We then train a recurrent neural network based on a large set of musical scores generated by this sonification method and use AI to generate musical compositions, capturing the innate relationships between amino acid sequence and protein structure. We then translate the de novo musical data generated by AI into protein sequences, thereby obtaining de novo protein designs that feature specific design characteristics. 
We illustrate the approach in several examples that reflect the sonification of protein sequences, including multihour audible representations of natural proteins and protein-based musical compositions solely generated by AI. The approach proposed here may provide an avenue for understanding sequence patterns, variations, and mutations and offers an outreach mechanism to explain the significance of protein sequences. The method may also offer insight into protein folding and understanding the context of the amino acid sequence in defining the secondary and higher-order folded structure of proteins and could hence be used to detect the effects of mutations through sound.