Fabio Cifariello Ciardi, 1997, "Retrieving Long Term Memory traces in contemporary
music listening: a composer view", Proceedings of the Third International ESCOM Conference,
Uppsala, Sweden, pp. 91-96.
Retrieving Long Term Memory traces in contemporary music listening:
a composer view
Fabio Cifariello Ciardi
Conservatorio "L.Perosi" - Campobasso
via Pietro Giannone, 28 - 00195 - Roma ITALY
Human memory represents a central area of psychological research and, in general, a
fundamental issue for understanding human communication processes. Composers might be
interested in evaluating listeners' memory processes in order to optimize communication of their
aesthetic and poetic ideas. However, composers' consideration of memory is often ambiguous:
while they understand that the possibility of grasping the structural invariance of a piece of music
depends on interactions between short-term memory (STM) and long-term memory (LTM), they
seldom examine the role of LTM in musical experiences closely. LTM refers to knowledge
which is already stored, while composers ideally tend to explore original, unknown territories.
Many reasons might justify composers' attitude towards Tulving's distinction of LTM into
episodic and semantic memory. Episodic memory is scarcely significant in composition
because of its mainly autobiographical nature (i.e., episodic memory is a personal, unique
wisdom which often loses its consistency when applied to large groups of people); on the other
hand, remarks about semantic memory are often avoided in order to refrain from facing the
puzzling issue of musical semantics.
Today, however, reconsidering the influence of LTM on music listening might be useful for
aesthetic, ethical and even political reasons (Cifariello Ciardi, 1994), but even simply for a
better evaluation of the cognitive "weight" of music materials. Furthermore, New Media
urgently call for such an evaluation: a better understanding of storing and retrieving processes is
clearly helpful in the organization of the huge sonic universe now available by means of digital
2. Denotation, connotation, commonness of sonic events
Composers are more interested in the unstable zone between expected and unexpected sonic
events, rather than in the listener's recognition of specific musical items. What composers might
wish to investigate is the possibility to influence, by means of appropriate compositional
techniques, the way listeners retrieve LTM traces. That is, composers are interested in
controlling the semantic quality of sonic events in order to improve the “depth” of their acoustic
space (i.e., a depth defined by a figure-ground perspective). This kind of control might be
helpful in shaping the subject's attention, strategies and expectations while listening to music
genres based on non-traditional materials and syntax.
In order to assign semantic qualities to an acoustic stimulus, sensorial processing must activate
structures of semantic components stored in LTM. As LTM structures and elements may be
represented by semantic networks, activation of previously learned information may be
described as a process of semantic features correlation. Within this framework, the “meaning”
of sonic events is the result of structures of semantic components which may emerge from such
a correlation process. From a semiological point of view, by means of this process, the subject
may interpret a stimulus as a sign (i.e., a mental construct provided with a specific meaning).
The mental representation of the sign will primarily lead the subject to denote the stimulus,
thereby giving it an immediate meaning, and subsequently to connote the same stimulus,
giving it a secondary meaning that arises from the first.
Denotation might refer to the result of the very first correlation of the sign with previously
learned knowledge of the subject, while connotation might be considered as the result of
subsequent correlations. Thus far, denotation and connotation have been used here in a quite
broad sense: they indicate the result of any semantic features correlation with sensory data,
rather than specific goals of semantic encoding (e.g., recognition of the cause that produces the
stimulus, identification of the function of the stimulus within a certain context, recall of an
Strictly speaking, the connotation of a sonic event is quite a subjective attribute; nevertheless, if a
stimulus activates, among others, analogous semantic features within a group of
subjects, we might hypothesize that a commonness in subjective connotations has been elicited. The
eventual emergence of connotation commonness depends on the number of subjects who
associate the stimulus with a common set of semantic components. Moreover, connotation
commonness might be related to the number of interconnected semantic features: the more
interconnected semantic features the stimulus activates, the more evident the strength of the
connotation will be and the stronger the commonalities that will emerge. For example, single tones of
well-known musical instruments might give rise to simple common connotations (e.g., the
common connotation of a piano tone might refer to the musical instrument that produced the
sound), while more complex shared connotations are difficult to envisage.
On the other hand, a stronger convergence towards more interconnected semantic features might
be expected in case of different types of sounds, although immediate correlations might still be
difficult to foresee.
Considering a well-known musical theme, a scream, a car beep, the sounds of wind or rain, a
shared connotation might be predicted. Such connotation might be based on a relatively small
number of common semantic components highly interconnected in some analogous way within
the memory network of each member of the group. Commonalities among subjective
connotations might refer to various types of cognitive activity: abstract or procedural knowledge,
emotional or motor response. However, what remains clear is that different cognitive (and
biological) structures may lead to similar cognitive responses, thus implying somehow
analogous semantic encoding (e.g., different subjects might regard a sonic event as a shared
concept more than as a subjective percept).
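As a purely illustrative sketch, and not a procedure proposed in this paper, connotation commonness within a reference group could be expressed as the share of subjects who associate a stimulus with the same semantic feature. The function name `connotation_commonness`, the feature labels and the response data below are all hypothetical assumptions.

```python
from collections import Counter


def connotation_commonness(responses):
    """Return, for each semantic feature, the fraction of subjects who named it.

    `responses` holds one set of feature labels per subject (hypothetical
    data; the paper does not prescribe a measurement procedure).
    """
    if not responses:
        return {}
    # Count each feature at most once per subject.
    counts = Counter(feature for features in responses for feature in set(features))
    n_subjects = len(responses)
    return {feature: count / n_subjects for feature, count in counts.items()}


# Hypothetical responses of four subjects to a single piano tone.
piano_tone = [
    {"piano", "music"},
    {"piano"},
    {"piano", "lesson"},
    {"piano", "music"},
]
shares = connotation_commonness(piano_tone)
print(shares["piano"], shares["music"], shares["lesson"])
```

In this toy data set the feature "piano" is shared by every subject, which would correspond to a simple common connotation, whereas "lesson" remains a subjective association.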
Our claim is that strong common connotation of any sonic event implies encoding modalities
and semantic effects closely related to those that arise in long-term verbal memory.
Connotated sonic events have hardly been studied as a whole. However, since well-known
melodies may be considered commonly connotated sonic events, results from the literature on
the memory representation of well-known melodies (e.g., Dowling & Harwood, 1986; Halpern, 1984)
may be used to evaluate the aforementioned assumption. Among others, evidence for a
parallelism between the encoding modalities of well-known melodies and word lists comes from
experiments by Gardiner et al. (1990). Their results show that the stable relation between
recognition and recall (Tulving-Wiseman law) is not restricted to verbal material, but
extends to familiar melodic excerpts.
Considering common connotation as an index of a class of sonic events, two distinctive
features ought to be emphasized. First of all, such a class of connotated sonic events will include
a wide range of sounds produced by very different media: sounds from a natural environment as
well as sounds produced by human beings or objects for various reasons. Secondly, connotation
can be described on a relative scale only. That is, connotation refers to commonalities that
emerge within a group of subjects (i.e., a reference group) who agree on specific semantic
qualities of an event.
Consequently, the common connotation of a sonic event might be defined on a statistical basis by
taking into account its stability in the time and space domains: the common connotation of a sonic event
may be strong for a certain period of time and weaker after that time. For instance, the sound of
a horse-drawn coach was probably much more strongly connotated before the Industrial Revolution than in the
present-day world. Nevertheless, despite the uncertain boundaries of a class of commonly connotated
sonic events, defining such a class seems plausible through the analysis of sonic
landscape invariances (cf. Murray Schafer, 1977). A higher rate of constancy and invariance
within a certain time and a certain space will probably lead to a stronger and more common
connotation. In this sense, the contemporary world represents an exceptional ground for sonic
connotation inquiries: common auditory experiences in the present-day multimedia society are
frequent; this means that, today more than in the past, we might expect to encode sonic
experiences in a more tangible common memory network.
3. Cognitive parameters in sonic connotation
Composers might gain several advantages from investigating the cognitive processing of connotated
sonic events. If semantics plays a crucial role in listening to a connotated sonic event, it
means that composers could set up appropriate compositional processes in order to take
advantage of effects related to the semantic encoding of sonic materials. As an example,
experimental evidence on the semantic encoding of verbal items points out that our cognitive
system produces faster responses for semantically encoded information (e.g., a listener's cognitive
processing of recognizable inputs is faster than that of non-recognizable inputs)
(cf. Chang, 1986 for a review). Hence, if known aspects of a sonic stimulus are processed faster
than unknown ones, then composers may control processing time in music perception through
connotation control. That is, composers might use connotation in such a way as to stimulate and
channel the listener's attention towards specific aspects of the musical piece. In order to do this, a
central issue regards the physical correlates of a specific semantic quality. The question then
arises as to what extent parameters could be modified while leaving connotation relatively intact.
The modification of any physical parameter of a connotated sonic event may result in a
variation between two boundaries. On one side, connotation strength may remain almost
unaffected by parameter modification; that is, modified sonic events will activate semantic
features similar to those activated by the original connotated event. On the other side, a more
dramatic modification of the same physical dimension might irreparably corrupt the sonic event's
connotation; this means that closeness among the semantic features of the modified and original
sonic events will not be detected. Therefore, we may argue that a physical parameter might
erase the connotation of a sonic event whenever its modification exceeds a certain threshold.
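The idea of a threshold can be illustrated with a minimal sketch, under the assumption (not made in this paper) that listeners' recognition of a modified event has been measured at several modification levels. The function name `connotation_threshold`, the 0.5 criterion and the data are hypothetical.

```python
def connotation_threshold(levels, recognition_rates, criterion=0.5):
    """Return the smallest modification level at which the share of listeners
    who still recognize the event falls below `criterion`, or None if the
    share never falls below it.

    Both the criterion and the input data are illustrative assumptions.
    """
    for level, rate in sorted(zip(levels, recognition_rates)):
        if rate < criterion:
            return level
    return None


# Hypothetical data: transposition in semitones against the share of
# listeners who still recognize a well-known sonic event.
semitones = [0, 2, 4, 6, 8]
rates = [0.95, 0.90, 0.70, 0.40, 0.10]
threshold = connotation_threshold(semitones, rates)
print(threshold)  # 6
```

Below the returned level the connotation would count as relatively intact; at or beyond it, the modified event would no longer be expected to activate semantic features close to those of the original.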
Connotation control in composition depends on the localization of such connotation
thresholds. Conditions which determine connotation thresholds may refer to the subject's timeless
knowledge as well as to time-dependent factors which influence attention or emotional state.
Since consistent commonalities among subjective variables are unpredictable, composers tend
to ignore the issue. However, connotation thresholds may also depend on subject-independent
factors. Acoustical attributes and distinctive features of the context may contribute to defining
connotation thresholds. Within this framework composers may wish to control those parameters
that seem to be critical for the detection of common connotation; that is, they may want to
control parameters which seem to be perceptually salient for any member of the reference
group. Therefore, we may pose the following question: can we determine a common
“connotative weight” for an acoustic parameter of a sonic event?
Stable relations between connotation and its physical correlates are difficult to establish.
Connotation of a percussion tone depends on amplitude envelope rather than on event length; on
the contrary, considering wind sound connotation, we might tend to assign a higher “weight” to
the event length and a lower “weight” to the amplitude envelope. Moreover, a straightforward
application of theories about music cognition is not always useful to answer the question. For
instance, “connotative weight” of pitch transposition seems to change over different sonic event
categories. Empirical findings show that relative pitch chroma, rather than absolute pitch, plays
the most important role in memory for familiar melodies (see Dowling & Harwood,
1986 for a review). These results suggest a low “connotative weight” for pitch transposition.
However, for other sonic categories (e.g., animal sounds) we may assume that absolute pitch
information is more accurately stored in LTM. If this is the case, absolute pitch transposition
would easily corrupt sonic event connotation.
Generally speaking, “connotative weight” of acoustic parameters may be assigned on a
cognitive-efficiency basis: critical parameters in sonic event connotation might be those which
could lead the subject towards an efficient answer to the expectations usually correlated with
that sonic event. This assumption may explain the reason why less significant parameters in
musical experiences, such as amplitude and sound localization, become critical in natural
Apart from subjective variables, connotation arousal is highly correlated with the subject's exposure
time to the stimulus. Under certain conditions connotation emerges almost immediately, while
in other situations a longer time-span seems to be necessary to guarantee a reliable semantic
qualification. Consequently we may argue that, since connotation stems from semantic
correlation activity, connotation assignment may be considered a time-dependent process: the
longer the exposure to a sonic event, the stronger the connotation will be. This suggests that by
controlling exposure time, composers could control the chance that a connotation
assignment will take place. Moreover, while a sufficiently long exposure to a sonic event may
give rise to shared connotations, a shorter exposure to the same event will probably activate
episodic and semantic encoding which leads to much more subjective responses.
As previously mentioned, connotation assignment is influenced by external conditions.
Context-dependent factors may influence connotation thresholds through a sort of semantic masking.
Under certain conditions, interference in semantic encoding may take place
when a connotated sonic event is preceded by, followed by, or superimposed on another sonic
event. In this case connotation strength might depend on both subject-dependent and
subject-independent factors. Presumably, the more complex and robust the semantic network that
represents the connotated event is, the more interference-resistant the connotation will be. Yet
the possibility of preserving connotation will be related to the subject's attention and expectations:
as the classical cocktail party phenomenon suggests, we may expect that subjects can almost
“shadow” or ignore one sonic event while processing the other one. On the other hand, selective
attention and semantic masking are probably related to subject-independent factors as well. For
instance, structural coherence between sonic events seems to play a central role in connotation
detection. The more acoustic analogies exist between two differently connotated events, the
more difficult the detection of each single connotation will be. From a compositional point of view,
semantic masking is another controllable process that might be used to shape sonic materials
4. Shaping sonic connotations in musical composition
In one of the author's pieces, Altre Tracce for Bb clarinet (Cifariello Ciardi, 1991-2), exposure time
and semantic masking have been considered with the purpose of affecting listeners' semantic
encoding. Two groups of well-known melodic excerpts have been used as connotated sonic
events. The first group includes, among others, the opening theme from W. A. Mozart's Symphony
K. 550, the main theme of G. Rossini's Barbiere di Siviglia and the second theme from
Rossini's La Gazza Ladra overture. The common feature among all excerpts of the first
group is a pattern of two or more repeated notes embellished by a descending and an ascending
minor second. The second melodic group includes the Seguidilla from G. Bizet's Carmen, a
prominent secondary theme from Debussy's Première Rhapsodie for clarinet and piano and the jazz
standard The Lady Is a Tramp. The common feature among all excerpts of the second group is an
embellished segment of a descending chromatic scale. The piece proposes continuous
oscillations between connotated melodic fragments and more “neutral” gestures based on a
12-tone row derived by merging the two groups of excerpts.
In the first section of the piece two sonic streams are presented alternately. The first stream
tends to break over the Barbiere di Siviglia theme four times, while the second stream points at
Bizet's Seguidilla. The first and second (Example 1) appearances of the Barbiere di Siviglia theme are
normally too short to permit recognition or connotation detection. In the subsequent
exposition the same theme is still obscure because of the exclusion of the initial three-note
segment (Example 2). However, in both cases the sonic stream is always broken over in the
same pitch region. The goal of this polarization is to generate listeners' expectations associated with
a specific register.
During the third exposition listeners often recognize the theme. This means that the excerpt
connotation has emerged from nearby sonic materials (Example 3). Here, connotation detection
may depend on three factors. Firstly, the length of the third exposition is such that common
correlations emerge. Secondly, previous Rossini presentations contribute to the subject's
pre-activation. Thirdly, semantic masking is avoided by means of contrasting structures of standing
An important secondary effect of connotation assignment in music listening is a sort of semantic
“dissonance” between a connotated event and other neighboring ones. That is, semantic
encoding of the two events may activate concept nodes which are semantically far from each
other (e.g., interconnections between the memory networks of connotated and “neutral” events may be
difficult to establish). Consequently, a strong loss of global coherence may be detected by the
subject. Hence, where efforts have been made to facilitate connotation assignment, other
compositional techniques have been used to smooth passages between connotated and more
“neutral” events. Firstly, the hidden anticipation of the salient features of Rossini's theme (Examples 1
and 2) has been intended as a “preparation” for the third, recognizable exposition of the excerpt.
Secondly, since the pitches of the connotated sonic materials are embedded in the “neutral”
12-tone row, collage effects might be attenuated if underlying structural links are detected. Once
connotation has been confirmed, it has been faded out by means of a progressive semantic
Finally, according to the author's empirical results, connotation may play a role in stream
segregation. The typical loss of a sense of temporal order across two different streams seems to
increase when the connotations of the two streams differ. This assumption has been tested in
another section of the piece (Example 4). Listener responses suggest that connotation may be used
as an extra cue in the identification of interleaved melodies. This result is consistent with Hartmann and
Johnson's conclusions (Hartmann and Johnson, 1991): although peripheral channeling is the
dominant characteristic in stream segregation, other central processes, such as the semantic encoding of
connotated sonic events, might arise.
This paper is an attempt to point out how and why composers and music theorists should
thoroughly examine the encoding modalities of well-known sonic events. The result of the
author's empirical studies leads to the following conclusions.
1) Connotation can be usefully considered to index semantic encoding of a relatively large class
of sonic events.
2) Commonness among subjective connotations may be predicted through a statistical sonic
landscape analysis, and may lead to the definition of a class of commonly connotated sonic
3) For each acoustic parameter of a connotated event, a threshold may be assumed in order to
define to what extent parameter modification will not corrupt connotation assignment.
4) Digital Sound processing as well as appropriate compositional techniques may supply
creative control on exposure time, masking effects and other factors related with connotation
We are aware that many of the aforementioned assumptions are hypothetical and speculative
and need to be corroborated by a great amount of theoretical evaluation and empirical
research. Nevertheless we hope that they might renew psychologists' interest in the study of
the sonic materials of contemporary and electronic music.
The author would like to thank Prof. Olivetti Berardinelli for her valuable criticism to the
Bregman, A. S. (1990) Auditory Scene Analysis. Cambridge: MIT Press.
Chang, T. M. (1986) Semantic Memory: Facts and Models, Psychological Bulletin, vol. 99, n. 2, 199-220.
Cifariello Ciardi, F. (1991-2) Altre Tracce for Bb clarinet. Rome: Edipan.
Cifariello Ciardi, F. (1994) Sentieri Convergenti? Musica e memoria ai limiti della costellazione
postmoderna, in Cambiare Musica, la filosofia della musica dopo Adorno. Milano: LIM.
Dowling, W. J. & Harwood, D. L. (1986) Music Cognition. New York: Academic Press.
Gardiner, J. M. et al. (1990) The Tulving-Wiseman law and the recognition of recallable
music, Memory & Cognition, vol. 18, n. 6, 632-637.
Halpern, A. R. (1984) Organization in Memory for Familiar Songs, Journal of Experimental Psychology:
Learning, Memory, and Cognition, vol. 10, n. 3, 496-512.
Hartmann, W. M. & Johnson, D. (1991) Stream Segregation and Peripheral Channeling, Music
Perception, vol. 9, n. 2, 155-184.
Murray Schafer, R. (1977) The Tuning of the World. Toronto: McClelland and Stewart.