Collaborative Score Transformations in
Online Music Lessons: the MusiCoLab Toolset
Chrisoula Alexandraki
Dept. Music Technology and Acoustics
Hellenic Mediterranean University
Rethymnon, Greece
chrisoula@hmu.gr
Alexandros Nousias
Dept. Music Technology and Acoustics
Hellenic Mediterranean University
Heraklion, Greece
nousias@hmu.gr
Demosthenes Akoumianakis
Dept. Electrical and Computer Eng.
Hellenic Mediterranean University
Rethymnon, Greece
da@hmu.gr
Yannis Viglis
Dept. Music Technology and Acoustics
Hellenic Mediterranean University
Rethymnon, Greece
viglis@hmu.gr
Dimitris Milios
Dept. Electrical and Computer Eng.
Hellenic Mediterranean University
Heraklion, Greece
dmilios@hmu.gr
Michael Kalochristianakis
Dept. Music Technology and Acoustics
Hellenic Mediterranean University
Rethymnon, Greece
kalohr@hmu.gr
Konstantinos Velenis
School of Music Studies
Aristotle University of Thessaloniki
Thessaloniki, Greece
kvelenis@mus.auth.gr
Maximos Kaliakatsos-Papakostas
Dept. Music Technology and Acoustics
Hellenic Mediterranean University
Rethymnon, Greece
maximoskalpap@gmail.com
Abstract—In the world of music teaching and learning,
music notation artifacts serve as indispensable material that
facilitates effective communication, interpretation, and
manipulation of musical compositions. Through these artifacts,
educators can guide students towards achieving the objectives
of a music lesson, nurturing not only technical proficiency but
also creativity and artistic expression. This article
presents the development of VHVCoLab, a versatile web
application that addresses the needs of music education by
seamlessly integrating practices for synchronous collaboration
and asynchronous interaction. The article presents our efforts
to collect the requirements of music instructors with
respect to music notation artifacts and their transformative
capabilities, our design and implementation approach for
supporting established and novel application affordances, as well
as an informative pilot assessment guiding our ongoing and
future developments. As the domain of online music practicing
has attracted considerable interest in the last decade, we hope that
the VHVCoLab application exemplifies the possibilities of
harnessing online platforms to enrich the musical experience
and propel the growth of aspiring musicians worldwide.
Keywords—online music education, music notation software,
computer supported cooperative work.
I. INTRODUCTION
Despite the large body of research on collaborative
systems for visual/graphic environments (multiuser document
authoring, multiplayer gaming, multiuser VR scenes, etc.), the
collaborative manipulation of music notation has only
recently been considered [1][2]. Of the several popular music
collaboration websites and applications, only Flat.io1 supports
synchronous collaborative manipulation of music notation.
Earlier calls for supporting cooperative music notation editing
had been voiced in the context of specific setups (e.g.,
orchestras and music schools [3], or Network Music
Performance (NMP) [4]).
Such a slow response to supporting collaborative music co-
practicing may be due to the complexity presented by music
scores.
1 https://flat.io/
For instance, it is common that changes to a score are
decided by musicians and recorded on parts by adding
dynamics, expression marks, string bowings, fingerings, and so
on. More complex changes, such as the deletion or addition of
music sections, their movement within the score, music
arrangements, transposition, and the like, may be decided by
a music tutor, a conductor or a composer, and they should be
shown in real time to musicians. Arguably, such specialized
requirements are substantially different from those
encountered in more popular collaborative systems such as
real-time document editors and video-based applications.
This paper presents the current version of VHVCoLab, an
application under ongoing development, as well as the main
results of an experiment conducted to assess its efficiency in
online music learning and teaching. The application integrates
several components [2] to establish a sense of shared space,
accountability and blended learning among peer groups
interested in music education. At the core of the application is
an extended version of the Verovio Humdrum Viewer
(VHV)2, hence the name of the application presented in this
article. VHV has been extended with: collaborative support for
its fundamental score editing functionalities
(e.g., pitch changes and score transpositions); an improved
facility for collaborative chord editing using the chord
notation of jazz standards; a capability for intelligent score
reharmonization; an improved audio rendering engine; an
improved score navigation mechanism through the integration
of an audio-to-score alignment algorithm; and a
teleconferencing facility to support synchronous oral and
video communication.
An attempt has also been made to trace the history of
collaborative exchanges, using a visual interface for obtaining
a step-by-step overview of co-practicing (i.e., what has taken
place during a synchronous session, who initiated which edit,
undo and replay facilities, etc.). This increased traceability
offers useful insights into social aspects of the lesson, such as
anchoring the involvement of peers (e.g., leaders, lurkers) in
specific tasks, identifying complex parts in the score, etc.,
which may be used as learning materials in pre- or post-lesson
study. In the future, it is also expected to facilitate more
complex, analytics-based assessments of co-practicing.
2 https://verovio.humdrum.org/
In the era of big data, ubiquitous computing and the
Internet of Sounds [5], such applications open interesting
perspectives in multiparty collaboration and, perhaps more
importantly, in gathering analytics on how remote
collaborators transform music notation artifacts to achieve
their intended purpose, whether it be teaching and learning
or performance and composition. Monitoring and analyzing
such practices over the Internet can contribute to the creative
perspectives of the IoMusT [5] by providing the raw material
for delivering visionary musical applications, such as machine
listening interfaces [7], artificial musicians [8], conversational
and generative music agents [9][10], and so on.
The remainder of this paper is structured as follows. The
next section provides an overview of the research setting and
the context of this work. Section III presents the application
and its individual components in terms of their design and
implementation. The section that follows presents an
assessment experiment which provided insight into the
efficiency of the VHVCoLab application in supporting online
music learning. The final section summarizes the
contributions of this article and our plans for future
developments.
II. RESEARCH SETTING
MusiCoLab is a research project that aims at the
development of a comprehensive platform for online music
education. Among the project partners are the creators of the
Genius Jamtracks (GJT)3 application series, which aims at
providing jazz musicians with numerous variations of jam
tracks to practice with. For this reason, the implementation of
the MusiCoLab platform emphasizes efficient support for
jazz music learning and teaching.
The subsections that follow provide a concise overview of
the platform under development and the requirements of
music instructors that were collected to inform the
development of the VHVCoLab application.
A. The MusiCoLab Platform
This platform is designed to offer a suite of innovative
tools that may be used to enhance collaboration and
engagement in networked/virtual settings. These tools have
been implemented as web applications and are aimed at both
asynchronous student/teacher interactions (i.e., course
preparation and scheduling, student assignments and self-
practice) as well as synchronous collaborations, i.e., serving
as groupware to facilitate live music lessons by manipulating
intelligent collaborative digital artifacts [1].
The main components of the platform are the following:
• A Learning Management System (LMS), which is
based on Moodle 4 and has been extended to
interconnect its courses with the remaining
components of the platform.
3 https://geniusjamtracks.com/
4 https://moodle.org/
• An Online Data Repository (ODR) that provides
persistent storage of learning material, i.e., audio,
video, music notation files and their associated
metadata and content annotations. This repository
uses the internal authorization policies of Moodle
(i.e., user roles, course enrolments, role-based editing
and authoring capabilities) [11].
• An Audio/Video Conferencing facility. A Jitsi Meet
server5 is hosted by the MusiCoLab platform to
provide teleconferencing facilities, and a Jamulus6
server to provide low-latency, high-quality audio
connections. If NMP communication quality is
desired during lessons, the users of the platform are
instructed to use Jitsi Meet for video and Jamulus for
audio communication.
• The Play-Along-Together (PAT) application, which
provides affordances for collaborative play-along
practices using a backing track, as well as intelligent
audio analysis capabilities applied to recorded tracks
and musical accompaniments.
• The VHVCoLab application presented in this article.
The interconnection of the different components within
the MusiCoLab platform is enabled by appropriate hyperlinks
and associated URL parameters that are generated by
dedicated widgets according to the required context of use,
e.g., collaborative vs. non-collaborative, course name and id,
files to use as learning material, etc.
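The context-carrying hyperlinks described above can be illustrated with a short sketch. The parameter names (courseId, fileUrl, collab) are hypothetical stand-ins; the actual MusiCoLab widgets may use different ones.

```python
from urllib.parse import urlencode

def artifact_link(base_url, course_id, file_url, collaborative):
    """Build a hyperlink to a MusiCoLab tool carrying its context of use."""
    params = {
        "courseId": course_id,                 # hypothetical parameter name
        "fileUrl": file_url,                   # learning material to load
        "collab": str(collaborative).lower(),  # collaborative vs. not
    }
    return f"{base_url}?{urlencode(params)}"
```

A widget would emit, e.g., `artifact_link("https://example.org/vhvcolab", 42, "https://example.org/odr/tune.krn", True)` to open the score collaboratively for course 42.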
B. Requirements for Score Representations and
Manipulations
During the early phases of the project, a survey was
conducted to understand the needs of music instructors in
online music teaching. The survey involved distributing
an online questionnaire to a group of instructors who were
familiar with the practice of teaching music over the
Internet. Notably, the participating instructors had first been
compelled to use the Internet as a teaching medium during the
COVID-19 lockdowns. In the years that followed, a
subset of these individuals continued to employ online
teaching methods, particularly when traveling.
Furthermore, some of them
aspire to extend their teaching to a broader audience through
online platforms, but they intend to do so when the technology
for online music instruction reaches a more advanced and
mature stage.
5 https://jitsi.github.io/handbook/docs/devops-guide/
6 https://jamulus.io/
Fig. 1. The distribution of music genres among the music instructors that
participated in the survey. Three instructors reported that they offer lessons
in two genres.
The questionnaire was divided into five sections
corresponding to five categories of requirement
specifications: a) the type of courses typically offered by the
instructor (genre, instrument, group or individual student
lessons, etc.), b) use of music notation or equivalent digital
representations, c) experience in conducting online music
lessons (network/software, problems, teleconferencing
limitations, etc.), d) familiarity with software for asynchronous
teaching, i.e., LMS capabilities, and e) their potential need for
solo/collaborative track recordings.
Different questions accepted different types of replies, e.g.,
single selection, multiple choice, free-text answer, and ratings
on a scale of 1 to 5.
A total of fourteen (14) replies were received. As shown
in Fig. 1, the responses came from instructors of
different music genres. Some teachers reported that they teach
more than one genre, e.g., jazz and blues, or classical
and pop/rock. Concerning the reported musical instruments,
traditional music referred to percussion instruments; jazz
teaching involved either a solo instrument or group
lessons with several instruments (i.e., piano, bass, drums,
brass, reed instruments and the singing voice); classical music
teaching was reported for piano, violin, acoustic guitar, and
flute; blues for electric guitar; and pop/rock for piano.
Although the number of replies may not suffice to ensure
statistical significance in the reported requirements, it is
believed that the wide range of replies in section (a) of the
questionnaire permits drawing general conclusions.
In the following we report on questions and results related
to music notation, i.e., the (b) category of questions.
• Q1. Do you use music notation? (Reply: Yes/No)
• Q2. How many musical instruments appear on the
scores you use for your lessons? Select all that apply.
(Reply: 1/2/3/more than 3)
• Q3. Which music notation software do you use to
create or edit your scores? Select all that apply.
(Reply: Finale, MuseScore, Sibelius, Other, I don’t
use any software)
• Q4. If, during synchronous online music lessons, you
had the possibility to edit a score, which of the
following would you consider important? Select all
that apply. (Reply: Single note editing, Edit one or
more chords, Transposition of the entire score to a
different tone, Transposition of some segment to
another tone, Tempo changes, Comments and
annotations on the score)
• Q5. Which of the options of Q4 is most important for
your lessons? (Reply: Select one of the options
provided in Q4)
• Q6. How important is sheet music as teaching
material for your online lessons? (Reply: Not at all/
Little/Moderately/Very much/Absolutely necessary)
• Q7. When interpreting a score, how useful is the
indication of the instant performance position on the
score (score following)? (Reply: Not at all/
Little/Moderately/Very much/Absolutely necessary)
• Q8. How useful would it be for students to be able to
record their performance and then synchronize it with
the sheet music, so that you can listen to their
interpretation at specific positions of the score?
(Reply: Not at all/Little/Moderately/Very much/
Absolutely necessary)
All participants replied that they use music notation
(Q1); the majority reported that the notation used
represents the performance of a single instrument, although
two instruments and more than three instruments were
reported by three instructors (Q2). In Q3, Finale, MuseScore
and Sibelius received an equal number of answers. Four
participants stated that they do not always use music notation
software, and one reported using other software.
The answers of Q4 and Q5 are presented in Fig. 2 and Fig. 3
respectively. The most appreciated score manipulation features
are 'comments and annotations', 'transposition of the entire
score', 'single note editing' and 'tempo changes' (Q4). This
was also confirmed in Q5, in which the instructors were asked
to choose one of the available options.
Fig. 2. The importance of different score manipulation actions (Q4). Most
replies were received for the transposition of the entire score and the
capability of writing comments and annotations on top of the score.
Fig. 3. The most important manipulation action (Q5). Commenting and
transposing the score were rated as most important.
Fig. 4. The rating of the importance of a digital score for online music
lessons (Q6) and the affordances of score following (Q7) and
synchronization of the score with an audio recording (Q8).
Finally, in terms of the necessity of a score for online
lessons, the possibilities presented by Q6, Q7 and Q8 were
rated on a scale of 1-5, labeled 'Not at all',
'Little', 'Moderately', 'Very much' and 'Absolutely
Necessary'. The received ratings are depicted in Fig. 4. The
presence of a score was rated as 'Absolutely necessary' for
carrying out online music lessons (Q6) by 57% of the
instructors. The affordance of score following, i.e., providing
an indication of the instant performance position (Q7), was
rated as moderately important by most instructors,
while the possibility of synchronizing a recorded performance
with the score to ease navigation between the two (Q8) received
higher importance ratings, as
nearly 36% of the instructors rated this functionality as
'Absolutely necessary'.
This part of the survey guided the developments of the
VHVCoLab application, which is presented in the following
section.
III. THE VHVCOLAB APPLICATION
The Graphical User Interface (GUI) of the VHVCoLab
application is presented in Fig. 5. Connecting to the
application assumes the prior creation of a course in the
MusiCoLab LMS.
Registered users having an instructor role create courses in
the LMS and invite their students to enroll. The learning
material for each course consists of audio files, score files,
video files and annotation files. These files are uploaded by
instructors to the ODR and made available to all course
members. Instructors may choose a learning artifact, i.e., a
course file, and invite their students to either self-practice or
participate in synchronous lessons using that artifact. If the
chosen artifact corresponds to a score file (i.e., a kern file), the
hyperlink points to the VHVCoLab application, and URL
parameters are attached to provide the necessary information to
the application, which opens in a new browser window.
The remainder of this section describes the various
components of the application in terms of their functionality
and implementation.
A. Score Rendering
The VHVCoLab application is based on the Verovio
Humdrum Viewer (VHV). VHV is an online Humdrum
notation editor and renderer that uses the Verovio typesetting
engine for generating graphical notation, the Ace text editor for
editing the textual representation of the score, and humlib for
digital score processing in the Humdrum file syntax. The
Humdrum syntax represents scores in an ASCII format,
particularly the **kern representation. Changes to the
**kern code are reflected in the graphical score engraving and
vice versa. The VHV code is freely distributed through
GitHub. VHV was chosen for integration into the MusiCoLab
environment for several reasons: it is
open source and therefore easy to extend; it is a web
application and therefore suits the platform's priority for
a web-only environment; and it has built-in
functionalities for basic editing of the graphical engraving,
such as pitch change and transposition, which would require
considerable resources to develop from scratch. Despite these
advantages, VHV has certain limitations, some of which are
being addressed in our ongoing development efforts.
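To give a flavor of the textual representation that VHV edits, here is a minimal **kern sketch of a one-bar C-major line (hand-written for illustration, not taken from the MusiCoLab corpus):

```
**kern
*clefG2
*M4/4
=1
4c
4d
4e
4f
==
*-
```

Each data line holds one token of the single **kern spine: interpretation lines begin with `*`, barlines with `=`, and `*-` terminates the spine. Editing the `4c` token in the text pane immediately updates the engraved quarter note, and vice versa.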
Our VHVCoLab implementation supports various
extensions that accommodate the requirements of music
instructors reported in the previous section and provides
complete support for the audiovisual rendering of jazz
transcriptions. Moreover, certain score manipulation actions
are made collaborative, i.e., replicated to the multiple peers of a
synchronous session, as explained in Section III-E.
The audio rendering capability of VHV (which uses a
limited set of SoundFonts, primarily piano sounds) had to be
replaced with a new audio rendering component, which uses
an improved sound bank (provided by the GJT application
series) and moreover supports features such as the 'Swing'
feel, which is fundamental in jazz performance. This improved
audio rendering capability is implemented in Python and
invoked by the orange play button to the left of the Tempo
field of Fig. 5. Pressing this button initiates an
XMLHttpRequest to the Python script that processes the
kern file and returns its GJT audio representation.
Fig. 5. The GUI of the VHVCoLab application.
A music chart in the jazz-standards idiom commonly comprises
a melody overlaid with chord symbols. VHVCoLab leverages
the GJT application for rendering chord information as a chart
of notes played as accompaniment to a soloist by a typical jazz
trio, namely piano, bass and drums. For the purposes of
VHVCoLab, a 'server version' of the GJT application was
developed, which is employed to generate the musical notes
that correspond to the chord symbols of a given jazz standard.
For the purposes of music education, both the chords and the
generated drums, bass, and piano parts are rendered in the
online interface, as in the example of Fig. 5. The process of
converting jazz standards to three-staff notation and sound-
font rendering is described in more detail in [12].
B. Score Editing and Reharmonization
Regarding the requirements of instructors presented in Fig.
2 and Fig. 3, the functionalities of single note editing and score
transposition were already supported by the original VHV
application. The implemented extensions addressed the
requirements for tempo changes, adding comments, and editing
chord symbols. As shown in Fig. 5, tempo changes are entered
in the text box entitled 'Tempo', and the 'Change!' button applies
the tempo change to the visual score as well as to the
corresponding kern file. Comments may be added by clicking
on a note or selecting a phrase. An extension for multiple-note
(i.e., phrase) selection and highlighting has also been
implemented. This is shown by the purple frame in Fig. 5.
Upon clicking on a note or selecting a phrase, the 'Add
Comment' option appears and users are prompted to enter the
comment text. On the score of Fig. 5, a comment has been
added at the triplet of the highlighted area, as indicated by the
two arrows. These arrows are clickable and toggle the display
of comments. The specific comment is shown at the bottom
right of Fig. 5.
Finally, chord editing is facilitated by clicking on a chord
symbol, which reveals two buttons, as shown in Fig. 6a.
Clicking on the 'Edit' button displays the chord editing
interface shown at the right of Fig. 6a. In this interface, users
may edit the root of the chord, including its accidentals, and
select among a list of possible chord variations. The chord
variations and their notation correspond to the chords found in
the jazz standards. The SVG symbols used in the chord editor
have been designed and provided by our GJT project partners
as custom OpenType Fonts (OTF).
Clicking the 'Edit' or the 'Suggest' button of Fig. 6a
initiates an XMLHttpRequest containing the current version
of the kern file as well as the chord for which an edit or
suggestion was requested. The request is sent to a
separate component of the MusiCoLab platform, i.e., the
reharmonization server.
When the reharmonization server receives an 'Edit'
request, it checks which chord needs to be substituted by
which new chord and forms an appropriate input string for the
GJT server. Upon reception of the response, the
reharmonization server processes the kern file and sends a new
kern file (for displaying) and a new MIDI file (for playback) to
VHVCoLab. Apart from editing, another option is
available to the user for updating the harmonic and surface
content of the score: suggestion. When the user clicks the
'Suggest' button of Fig. 6a after clicking on a chord, the
request sends the currently displayed kern along with the
index of the chord that needs to be replaced by an automatic
suggestion. In this case, the reharmonization server calls a pre-
trained, jazz-standard-focused version of the Chameleon melodic
harmonization assistant [13], which substitutes the user-
specified chord with a new chord based on probabilistic
inference. The harmonic and structural information of the
piece in the initial request now includes the new chord in
place of the old one. This information is sent to the GJT server,
which again produces a kern and a MIDI representation of the
new score, which is subsequently returned to VHVCoLab.
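The 'Edit' vs. 'Suggest' dispatch described above can be sketched as follows. All function names are hypothetical stand-ins: in the actual platform, Chameleon and the GJT server are separate services reached over the network.

```python
def chameleon_suggest(kern, chord_index):
    """Stub for the Chameleon harmonization assistant (hypothetical)."""
    return "F7"  # placeholder for a probabilistically inferred chord

def call_gjt_server(kern, chord_index, chord):
    """Stub for the GJT server: returns new kern and MIDI renderings."""
    new_kern = kern + f"\n!! chord {chord_index} -> {chord}"
    midi_bytes = b""  # placeholder for the rendered MIDI file
    return new_kern, midi_bytes

def handle_reharmonization(kind, kern, chord_index, new_chord=None):
    """Dispatch an 'edit' (user-chosen chord) or 'suggest' request."""
    if kind == "edit":          # the user picked the replacement chord
        chord = new_chord
    elif kind == "suggest":     # Chameleon picks the replacement
        chord = chameleon_suggest(kern, chord_index)
    else:
        raise ValueError(f"unknown request kind: {kind}")
    new_kern, midi = call_gjt_server(kern, chord_index, chord)
    return {"kern": new_kern, "midi": midi}
```

In both branches the response carries a kern file for display and a MIDI file for playback, mirroring the flow in the text.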
C. Audio-to-Score Alignment
Audio-to-Score Alignment (ASA) relates to questions
Q7 and Q8 of the requirements analysis. Q7 investigates the
importance of score following, i.e., depicting the instant score
position while performing. Q8 refers to synchronizing a
recorded performance with the score for the purpose of
navigating within that performance, i.e., by selecting the
corresponding positions on the score. To differentiate, Q7
refers to real-time ASA, while Q8 refers to offline ASA, as a
post-recording process. As shown in Fig. 4, the importance of
these application capabilities was not clear, as ratings varied
considerably. This may be attributed to the fact that the
instructors are not familiar with such application affordances.
It was nevertheless interesting to implement them in the
VHVCoLab application, to investigate whether musicians
would appreciate and enjoy these features.
At the time of this writing, only offline ASA has been
implemented. Fig. 6b shows the user interface for this
functionality. The waveform of a recording is displayed above
the score. The buttons at the top right of the recording
facilitate the ASA. From left to right, the buttons are: 'Go to
Selection', 'Stop and Go to Start', 'Play/Pause', 'Record',
'Pause Recording', 'Stop Recording', 'Synchronize',
'Analyze', 'Download'.
A user records her/his interpretation of the score. This may
be done while listening to the audio representation of the
score, or just by reading the score. When the recording is
completed, the user may press the 'Synchronize' button. This
button posts an asynchronous XMLHttpRequest to the
server. The body of this request contains the recording and a
MIDI representation of the score file. At the server side, the
MIDI representation is converted to audio using the FluidSynth7
SoundFont synthesizer. The two audio files are subsequently
synchronized using their chromagram representations and the
implementation of the Dynamic Time Warping (DTW)
algorithm [14] in the librosa8 Python package. The server then
returns a response containing an array of corresponding
times in the score and the recording. When the reply is
received from the server, the user may select a note on the
graphical score and then press the 'Go to Selection' button to
move the waveform cursor to the position of the recording that
corresponds to the selected note. While playing back the
recording from that point, the user may click the 'Go to
Selection' button multiple times to review the recorded
performance at that point of the score.
Fig. 6. Specific functionalities of the VHVCoLab. (a) Chord editing and
reharmonization request. (b) Audio-to-score alignment.
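The DTW step of the alignment can be sketched from scratch in plain NumPy over two chroma-like feature matrices (features x frames). This is an illustrative re-implementation, not the librosa routine the system actually uses; frame indices on the warping path can then be converted to seconds via frame * hop_length / sample_rate.

```python
import numpy as np

def dtw_path(X, Y):
    """Align two feature sequences (features x frames) with plain DTW.

    Returns the accumulated-cost matrix and the optimal warping path
    as (x_frame, y_frame) pairs from start to end.
    """
    # Pairwise cosine distances between the frames of X and Y
    Xn = X / (np.linalg.norm(X, axis=0, keepdims=True) + 1e-9)
    Yn = Y / (np.linalg.norm(Y, axis=0, keepdims=True) + 1e-9)
    C = 1.0 - Xn.T @ Yn                       # cost matrix, shape (N, M)
    N, M = C.shape
    # Accumulated cost with classic step set {diag, up, left}
    D = np.full((N + 1, M + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, N + 1):
        for j in range(1, M + 1):
            D[i, j] = C[i - 1, j - 1] + min(D[i - 1, j - 1],
                                            D[i - 1, j], D[i, j - 1])
    # Backtrack from the end to recover the warping path
    i, j, path = N, M, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return D[1:, 1:], path[::-1]
```

Aligning a sequence with itself yields the diagonal path at zero cost, which is a convenient sanity check for the implementation.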
D. Video Conferencing and NMP
During the lesson, participants communicate using the
Audio/Video Conferencing facility of the MusiCoLab
platform. This is initiated by pressing the dial button, shown
in Fig. 5 on the right side below the video rendering of
participants. Upon pressing the dial button, users are
presented with the dialogue window shown in Fig. 7. If
users choose the 'Video Only' option, they join the
meeting room without exchanging audio streams and have
the possibility to run the Jamulus application to
connect to the Jamulus server hosted by the platform, which offers
high-quality, low-latency audio communication. Selecting
either option reveals the iFrame component shown at the top
of Fig. 5.
During synchronous music lessons, it is not common for a
teacher and a student to simultaneously perform the same
piece of music. It is rather more common that the teacher
performs and asks the student to imitate her/his interpretation
of a piece or a specific phrase. For this reason, and in
alignment with the project priority for offering a web-only
solution, Jitsi was chosen as the baseline teleconferencing
system of MusiCoLab.
Jitsi uses the Opus codec and may be configured for high
quality audio by setting the appropriate URL parameters of a
meeting. For example, setting the parameter
opusMaxAverageBitrate to its highest value, i.e.,
510kbps, enables full band audio communication, while at the
same time disabling latency-inducing features such as audio
echo cancellation, automatic gain control, noise suppression,
etc. This setup is chosen by default in the MusiCoLab
teleconferencing platform. However, even with the optimal
parameter setup, real-time musical interactions are not possible
through Jitsi or any other WebRTC-based solution.
7 https://www.fluidsynth.org/
8 https://librosa.org/
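Jitsi Meet accepts configuration overrides as URL fragment parameters of the form #config.&lt;key&gt;=&lt;value&gt;. The sketch below builds such a meeting URL; option names vary between Jitsi Meet versions, so treat the override set as illustrative rather than authoritative, and the server/room names as hypothetical.

```python
def jitsi_music_url(server, room):
    """Build a Jitsi Meet URL tuned for music (illustrative overrides)."""
    overrides = {
        "config.opusMaxAverageBitrate": "510000",  # full-band Opus, in bps
        "config.disableAEC": "true",   # acoustic echo cancellation off
        "config.disableAGC": "true",   # automatic gain control off
        "config.disableNS": "true",    # noise suppression off
    }
    fragment = "&".join(f"{k}={v}" for k, v in overrides.items())
    return f"https://{server}/{room}#{fragment}"
```

For example, `jitsi_music_url("meet.example.org", "jazz101")` yields a room link whose fragment carries the audio overrides, so no server-side config.js changes are needed per meeting.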
In the research domain of NMP, it is well known that
WebRTC technology is currently inappropriate for the ultra-low-
delay audio communication that is a fundamental
requirement for network-mediated simultaneous music
performance. Although the VHVCoLab application is not
intended for simultaneous music performance, the 'Video
Only' option is provided for users wishing to have video
communication while at the same time using an NMP
application such as Jamulus.
E. Collaborative Activities and Social Awareness
To facilitate synchronous collaboration among remote
peers, and to allow the activities of one peer to be replicated to the
others, the Yjs9 framework has been adopted. Yjs is a high-
performance shared-editing framework for building
collaborative applications that sync automatically [15]. It is
based on research work on Conflict-free Replicated Data
Types (CRDTs) [16]. In this model, each client maintains a
replica of the shared state and applies modifications locally, which
are then broadcast to remote participants so that they can
update the state of their own replicas, while potential conflicts
are resolved automatically.
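The CRDT principle, replicas that update locally and merge without conflicts, can be illustrated with the simplest CRDT, a grow-only counter. This is not how Yjs represents shared documents internally (Yjs uses sequence CRDTs for text and lists); it is only a minimal sketch of the merge idea.

```python
class GCounter:
    """Grow-only counter CRDT: each replica increments its own slot.

    Merging takes the element-wise maximum, so merges are commutative,
    associative and idempotent -- concurrent updates never conflict, and
    all replicas converge once every update has been exchanged.
    """

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica_id -> local increment total

    def increment(self, n=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other):
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

    def value(self):
        return sum(self.counts.values())
```

Two replicas that increment independently and then exchange state converge to the same value regardless of message order, which is the property the Yjs sync protocol generalizes to shared documents.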
With regard to syncing, Yjs relies on a high-level protocol
consisting of a simple message-exchange sequence among the
connected clients. Clients either send their local state or reply
to remote clients with the state segments that they are missing,
in an efficient manner. The awareness API, part of the Yjs
framework, is utilized to construct update messages
with information about activity awareness (peer actions at a
given moment) and social presence (who is connected), which
are not part of the main shared data structure. In addition,
network connectors are part of the Yjs ecosystem; these
internally implement the protocols/APIs to connect Yjs with
well-established network protocols (e.g., WebSocket or
WebRTC).
Activity awareness involves informing all session participants
of the execution of a user action. Certain user actions were
chosen to be made visible to the participants of a session:
'note selection', 'multiple note selection', 'chord
edit' and 'add comment'. The 'note selection' awareness is
shown in Fig. 5 by a username label at the notes of the score
where users have clicked; note the labels 'dneonakis'
and 'yannis' below the Cm7 chord of the screenshot. The same
applies to the remaining events. For example, when a user
chooses to edit a chord, the chord editor of Fig. 6a is
replicated for every online user.
Social awareness refers to the ability of users to see who
is online. In VHVCoLab, social awareness and social presence
are depicted by the floating black panel on the left side of Fig.
5. If a user disconnects, their profile icon is shown with a
red dot, as opposed to a green one.
9 https://yjs.dev/

Fig. 7. Joining a meeting room to facilitate audio-video communication. The suggested room name corresponds to the unique short name of the LMS course. Users may opt for both video and audio communication through Jitsi by pressing 'Join Call', or they may use the 'Video only' option to disable audio stream transmission through Jitsi and instead use the Jamulus client, which provides low-latency audio communication.

Finally, certain actions have been chosen to be registered as the interaction/collaboration history of the lesson. A dedicated database is used to persist the collaborative exchanges that have taken place during a synchronous session for a specific score of the ODR and a specific course of the LMS. This allows tracing how different courses transform the same score, as well as how different scores are manipulated by the same course and thus its participants. The actions currently stored in this database are ‘change pitch’, ‘transpose’, ‘add comment’, ‘edit chord’, ‘connect’ and ‘disconnect’. Each action is associated with a username and a datetime of occurrence.
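The stored records might take a shape like the following dependency-free sketch, which also shows the two trace queries mentioned above. The field names, course/score identifiers, and helper functions are illustrative assumptions, not the actual MusiCoLab schema.

```javascript
const history = []; // in-memory stand-in for the dedicated database

function logAction(course, score, user, action) {
  // Each record ties an action to a course, a score, a username,
  // and a datetime of occurrence.
  history.push({ course, score, user, action, at: new Date().toISOString() });
}

// Tracing how different courses transform the same score...
const byScore = (score) => history.filter(r => r.score === score);
// ...and how different scores are manipulated by the same course.
const byCourse = (course) => history.filter(r => r.course === course);

logAction('blues101', 'nows-the-time.krn', 'antonis', 'connect');
logAction('blues101', 'nows-the-time.krn', 'yannis', 'transpose');
logAction('jazz201', 'nows-the-time.krn', 'maria', 'edit chord');
console.log(byScore('nows-the-time.krn').length); // → 3
console.log(byCourse('blues101').length);         // → 2
```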
Displaying the action history may be toggled by the ‘Action History’ button of the presence awareness panel (left of Fig. 5). This reveals a panel at the right of the GUI, shown in Fig. 8, which lists the actions that have taken place in reverse chronological order. Users can scroll back to the first action, which is the ‘connect’ action of the first user. For each action, a short description is provided, and undo/redo buttons allow specific actions to be cancelled and reinitiated.
A short video clip has been prepared to demonstrate the
collaborative activities of the VHVCoLab application10.
IV. EXPERIMENTAL SETUP AND TENTATIVE ASSESSMENT
In this section we describe an experiment conducted to assess features of the application. Specifically, the primary task entailed that independent musicians (project collaborators) would organize and conduct a lesson with a small number of students from remote locations on the same date. For the purposes of the experiment, the lesson was described in terms of a lifecycle comprising designated tasks prior to, during, and after the lesson.
A. Prior to the lesson
Prior to the lesson, all tasks are asynchronous and aim at preparing the lesson on behalf of the instructor as well as the students. Specifically, instructors should create a course in the MusiCoLab LMS and enroll students or ask them to self-enroll. Upon student enrollment, the instructors should post an
announcement to their course such as the following: "Please
connect on Friday 21 March at 10.00 at <link_a> and press
the dial (phone) button to join the conferencing room. Please
get prepared by practicing our musical piece to be found at
<link_b>. When joining the lesson, please make sure that you
use headphones". In this announcement, <link_a> corresponds to the synchronous lesson, providing functions for collaborative activities and social awareness, while <link_b> supports the interactions of individual users. Both links point to a kern file of the MusiCoLab repository.

10 https://youtu.be/C1Fmb6kd1bw
B. During the lesson
During the lesson, the learning objectives may span a variety of themes and topics. In this experiment, a short demonstration of the application features was provided to participants before starting the lesson. TABLE I provides a summative description of what the lesson set out to achieve and the sequence of (tentative) tasks which were suggested and arranged during the lesson.
The screenshot shown in Fig. 5 was captured during the experiment. The lesson involved five participants: ‘antonis’ and ‘dneonakis’, who served as instructors, and ‘socratesvak’ and ‘yannis’, who were the students. User ‘chrisoula’ participated as an observer. Socratesvak is a guitarist and yannis is a drummer. The instructors chose to study the piece ‘Now’s the Time’, a twelve-bar blues in F composed by Charlie Parker. All participants used headphones and were connected through their domestic Wi-Fi connections.
C. Post lesson Subjective Assessment
Following the lesson, participants were guided to complete
a short online questionnaire with questions inviting
constructive assessment of the users’ experience. The
questionnaire comprised a total of nine questions concerning specific tasks, the overall sense of teamwork and social awareness, and the quality of the conferencing and communication facilities provided. These questions are presented in TABLE II.
D. Results and Discussion
The experiment was particularly informative in terms of
the actual practice of conducting online music lessons. Several problems had to be dealt with, including network deficiencies, side effects resulting from the use of headphones, as well as some intriguing issues that were not foreseen prior to the experiment.
Fig. 9 shows the answers received for Q1 and Q2, which involved assessing the application’s efficiency in tasks T1-T8 as well as the importance of these tasks for online music lessons. The efficiency of tasks T2-T6 was rated as excellent by two participants. T1 (obtaining an overview of the score) and T7 (ASA) were rated as satisfactory, while T8 (simultaneous music performance) was rated as adequate.
Simultaneous performance was attempted with the
Audio/Video conferencing application. Users did not attempt
to use Jamulus, so the quality of real-time audio
communication was inferior to that offered by NMP tools. It
is well known that WebRTC induces an inevitable amount of latency, on the order of 100-150 ms of buffering delay regardless of the quality of the network connection, which renders web applications inappropriate for multi-peer NMP [17].
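As a rough illustration of why this floor persists even on a fast network, the buffering stages along a WebRTC audio path can simply be summed. The figures below are typical assumed values for illustration, not measurements from this experiment or from [17].

```javascript
// Back-of-the-envelope one-way ("mouth-to-ear") latency budget
// for a WebRTC audio path.
function oneWayLatencyMs({ frameMs, jitterBufferFrames, playoutBufferMs, networkMs, codecMs }) {
  // capture frame + jitter buffer + playout buffering + codec + wire time
  return frameMs + jitterBufferFrames * frameMs + playoutBufferMs + codecMs + networkMs;
}

const typicalWebRTC = oneWayLatencyMs({
  frameMs: 20,            // typical Opus frame size
  jitterBufferFrames: 3,  // assumed adaptive jitter-buffer depth
  playoutBufferMs: 20,
  codecMs: 10,
  networkMs: 15,          // good domestic connection
});
console.log(typicalWebRTC); // → 125 (ms), inside the 100-150 ms range cited
```

Note that only the last term shrinks on a better network; the buffering terms remain, which is why dedicated NMP tools such as Jamulus trade robustness for much smaller buffers.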
Interestingly however, the facility of simultaneous
performance was not rated as necessary for conducting online
music lessons (Q2) by three out of four participants. This
confirms our previous claims that in music learning, it is rather
uncommon for connected peers to simultaneously perform the
same piece of music. It is instead more common that a teacher or a student performs a musical excerpt, and the others are required to imitate or discuss his/her interpretation [1].

Fig. 8. The history of collaborative exchanges is shown by clicking the ‘Action History’ button of the social presence panel. The panel is scrollable and shows the actions in reverse chronological order. Here three snapshots are shown, from the most recent (left) to the oldest (right).

TABLE I. DESCRIPTION OF OBJECTIVES AND TASKS DURING THE SYNCHRONOUS LESSON.

Objective: Obtaining an overview of the learning artifact
  Both: All participants are instructed to use headphones to avoid feedback. The instructor welcomes the students, explains the learning objectives, and directs the students' attention to the currently active score. Both instructor and students have a visual representation of the music piece to be studied.
  Instructor: Ask each of the students to produce an audio rendering of the score.
  Student(s): Produce an audio rendering of the score and discuss issues of performance and interpretation.
  Both: Students get an auditory (in addition to the visual) representation of the music piece to be studied.

Objective: Editing/manipulation of notes and chords
  Instructor: Changes a couple of individual notes of the score, e.g., modifying the melody or the bass line, and asks students to perform an equivalent change (the change is replicated to all participants).
  Student(s): Perform an equivalent change and receive instructor feedback (the change is replicated to all participants).
  Instructor: Change a chord to modify the harmonic content of the piece (the change is replicated to all participants) and use your instrument to perform the modified score.
  Student(s): Students follow (no requirement to do something).
  Instructor: Invite students to perform the modified score.
  Student(s): Perform the modified score using your instrument.
  Both: Both instructor and students discuss whether the new representation is appropriate or requires some further editing.

Objective: Modifying the score as a whole (e.g., tempo, key transpositions)
  Instructor: Invite students to change the tempo of the score, listen to its audio rendering, and perform the potentially difficult segments to arrive at a tempo value at which they can perform confidently.
  Student(s): Perform your part at different tempi and suggest a tempo value appropriate for your performance skills. For more than one student, you may all mute your mic in the Jitsi room until you come up with your preferred tempo, to allow practicing in parallel with your collaborators.
  Instructor: Determines a tempo which is convenient for everyone, e.g., the slowest, and asks students to modify the score tempo to the new value.
  Student(s): Modify the tempo at your score display to the tempo suggested by the instructor.
  Instructor: Transpose the score to a new tone (the change is replicated to all participants) and ask students to perform the new score.
  Student(s): Perform the score with the new tempo at the new tone.

Objective: Annotating/guiding expressive interpretation (comments)
  Instructor: Select a phrase on the score and click the dropdown menu to add comments guiding your students on how to practice or perform.
  Student(s): Click on the user presence panel (arrow on the left) and the Comment button to see the list of comments.
  Both: Discussion on the comments that need to be addressed.

Objective: Assess the improved navigation within the audio recordings facility (audio-to-score alignment)
  Instructor: Instruct the students to interpret the score while recording their performance and then click the synchronize button.
  Student(s): Perform your solo performance and click the synchronize button (other students follow).
  Instructor: Click on the score location where the problem appears, invite the student to click at the same point, and subsequently click the ‘Go to selection’ button.
  Student(s): Click at the score position indicated by the instructor and press the ‘Go to selection’ button to listen to your performance at that location.
  Instructor: Provide feedback to students concerning the interpretation of that specific point of the score.
  Student(s): Practice repeatedly as instructed until you arrive at a satisfactory interpretation.
Concerning the effectiveness in obtaining an overview of the score (T1), discussions with the participants of the experiment revealed that the inferior effectiveness should be attributed to the fact that these tasks are not shared within the group. This was also reported as a problem by one of the participants in Q7. Specifically, as every participant was wearing headphones, it was not possible for the participants to listen to the audio rendering of the score at the same time. To address this problem in future developments, we are considering either making the play button of the score collaborative, i.e., sharing the activity of this button being pressed by one of the users, or using the ‘Share Audio’ functionality of the Jitsi audioconferencing to allow the user listening to the score to share the received sound with his/her team.
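A collaborative play button could be realized by broadcasting a start event scheduled slightly in the future, so that every client begins local playback at the same wall-clock moment. The following sketch is one plausible design under that assumption; all names and the delay value are hypothetical, not part of the current implementation.

```javascript
const START_DELAY_MS = 500; // assumed margin for the event to reach all peers

function makePlayEvent(user, nowMs) {
  // One user presses play; the event carries a shared future start time
  // instead of triggering immediate local playback.
  return { type: 'score-play', user, startAt: nowMs + START_DELAY_MS };
}

function msUntilStart(event, nowMs) {
  // Each receiving client waits this long before starting its local renderer,
  // absorbing differences in event delivery time.
  return Math.max(0, event.startAt - nowMs);
}

const ev = makePlayEvent('antonis', 1000);
console.log(msUntilStart(ev, 1100)); // → 400: this peer received the event 100 ms later
console.log(msUntilStart(ev, 2000)); // → 0: event arrived too late, start immediately
```

A real deployment would additionally need a shared clock reference (e.g., server time offsets), since the participants’ system clocks cannot be assumed to agree.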
Along the same lines, the inferior effectiveness of the ASA facility (T7) may be attributed to the fact that this facility was not implemented as a collaborative feature. In other words, when a user records his/her interpretation of the score, neither the solo recording nor the request for synchronizing the recording to the score (by clicking the ‘Synchronize’ button) is propagated to the rest of the participants. This renders the specific application feature of little use during synchronous lessons. Since this capability was rated as necessary or desirable (with reference to Q2), its implementation as a collaborative activity is addressed in our imminent development plans.
In assessing the application support for team/group work
(Q3-Q5), all participants replied ‘Yes’, which confirms the
added value of the application’s support for social awareness
and collaborative teamwork.
In the category of communication and conferencing, all participants replied that the form of conferencing was perfectly clear, and they did not provide any suggestions (Q6). In Q7,
participants offered various comments and suggestions for
improvements. Specifically, one participant reported that he
experienced frequent problems with his network connectivity,
which affected continuous audio flow and hence the quality of
the received audio streams. Another participant suggested that
the audio rendering of the score should have been shared
among the group (T1). Another participant pointed out that the
perceived sound was limited in brightness and dynamics and
had synchronization problems owing to audio delay. Finally,
the last participant replied that although timbre perception and synchronization were adequate, he received noise at some point during the experiment. It should be noted that all these problems were anticipated beforehand, as users were connected through their domestic Wi-Fi connections and communicated through web-based teleconferencing facilities, which are predominantly aimed at oral communication.

TABLE II. THE QUESTIONNAIRE COMPLETED BY THE PARTICIPANTS OF THE EXPERIMENT.

Questions concerning specific tasks
  Q1. Please rate the VHVCoLab application in terms of its overall effectiveness in the following tasks/affordances:
    • T1. Obtaining an overview of the score to be studied (notation/sound)
    • T2. Score tempo adjustments
    • T3. Using the audio rendering of the score as an accompaniment to your performance
    • T4. Editing notes and chords
    • T5. Score transpositions
    • T6. Adding/viewing comments on the score
    • T7. Recording your performance and synchronizing it with the score
    • T8. Simultaneous performance with the other participants
  Answer: 1. Not effective / 2. Moderate / 3. Adequate / 4. Satisfactory / 5. Excellent
  Q2. How do you assess the above facilities (T1-T8) in terms of their importance for online music education?
  Answer: Superfluous / Desirable / Necessary

Questions about team/group work
  Q3. Did you establish a sense of community/togetherness in the course of the lesson? (Yes/No)
  Q4. Was it easy to connect and understand who is online? (Yes/No)
  Q5. Was it easy to interact with others during the lesson? (Yes/No)

Questions about communication and conferencing
  Q6. Is this form of conferencing clear prior to, during, and after the lesson in terms of collective outcomes? Do you have any suggestions for improvement? (Long answer)
  Q7. How did you perceive the sound from remote collaborators and what kind of problems did you experience, e.g., sound interrupts/distorted sound, dull sound (limited brightness), limited dynamics (could not perceive expressive dynamics), synchronization problems (could not perform simultaneously with others)? Please elaborate. (Long answer)

Overall efficiency in supporting the learning objective
  Q8. Overall, did the application help you to carry out the lesson with others (in comparison with your experience with conventional tools and apps)? (1 = Not at all, to 5 = Absolutely)
  Q9. Please provide your comments and possible suggestions concerning the application. (Long answer)
As for the efficiency of the application in facilitating music learning objectives, all participants replied that the VHVCoLab application was absolutely enabling compared to conventional tools for online lessons (see Q8), but unfortunately they did not provide any comments or suggestions in Q9.
Considering the above results, our overall assessment of
the VHVCoLab application is positive, despite the problems
and deficiencies observed and noted by the participants.
Several of the issues arising during the lesson had been foreseen prior to the experiment, while others only surfaced in its course. Overall, it was made clear that peer
collaboration in music education is meaningful, engaging, and
viable, provided that the supporting infrastructure is in place
and fine-tuned.
V. CONCLUSIONS
This article presents the development of the VHVCoLab
application for supporting online music education by
facilitating both synchronous and asynchronous learning and
teaching practices. VHVCoLab is an extension of VHV, an open-source web application for rendering scores in the kern format. Compared to VHV,
VHVCoLab implements several extensions including the
support of multiuser participation and collaborative score
manipulation activities, improved navigation through audio to
score alignment, chord editing and score rendering of chord
symbols in three-staff notation, improved audio rendering using high-quality sound fonts and replicating the swing feel, as well as intelligent reharmonization of jazz standard compositions based on probabilistic inference.
The application is part of a comprehensive platform that
‘glues’ together various digital materials including third-party
applications, augmented libraries and dedicated software tools
and repositories to establish an engaging environment for
online music education. A core element of the application is the multi-faceted role of music notation artifacts and their novel affordances, which create new possibilities for interaction and collaboration among the participants of a music lesson.
The paper also reports on the results of a pilot experiment,
aimed at assessing VHVCoLab in a blues music lesson which
was crafted and materialized from scratch using the
MusiCoLab platform. Although this experiment was narrow
both in scope and in number of participants, it provided
valuable insights into the actual practice and experience of
real-time computer-mediated music lessons, highlighting both the potential value and the deficiencies of synchronous and asynchronous collaboration. In addition, and equally important, the experiment helped the research team establish benchmarks for future assessments with more participants and more complex learning scenarios.

Fig. 9. The answers received for questions Q1 and Q2 of the assessment questionnaire.
In the future, we foresee several extensions in the core
platform as well as specific components to address the
shortcomings already revealed, but also to step towards a
realization of the capabilities of the IoMusT in music
education. In this context, we expect to make more solid contributions as MusiCoLab matures, primarily in software-intensive areas: the data that need to be shared, formats suitable for processing musical things, and mechanisms for sharing (both asynchronous and synchronous), given the challenges common to real-time audio processing systems, such as latency and quality. We aspire for this article and the availability of the VHVCoLab application to make a meaningful contribution to the advancement of online music education technology, towards realizing the vision of professional musicians who seek to expand their knowledge to a broader audience.
ACKNOWLEDGMENT
This research has been co‐financed by the European
Regional Development Fund of the European Union and
Greek national funds through the Operational Program
Competitiveness, Entrepreneurship and Innovation, under the
call RESEARCH – CREATE – INNOVATE. Project
Acronym: MusiCoLab, Project Code: T2EDK-00353.
The authors would like to thank the musicians Antonis
Tsikandylakis, Dimitris Neonakis, Socrates Vakirtzian and
Yannis Iliakis for their participation in the tentative
assessment and their valuable feedback and suggestions for
improving the efficacy of the application in supporting
desirable affordances.
REFERENCES
[1] C. Alexandraki, D. Akoumianakis, M. Kalochristianakis, P. Zervas, M.
Kaliakatsos-Papakostas and E. Cambouropoulos, “MusiCoLab:
Towards a Modular Architecture for Collaborative Music Learning”,
Web Audio Conference 2022 (WAC 2022), Cannes, France, Jun. 2022.
doi: 10.5281/zenodo.6770559
[2] D. Akoumianakis, C. Alexandraki, D. Milios and A. Nousias,
“Synchronous Collaborative Music Lessons and their digital
materiality”, presented at the Web Audio Conference 2022 (WAC
2022), Cannes, France, Jun. 2022. doi: 10.5281/zenodo.6768537.
[3] P. Bellini, P. Nesi, and M. B. Spinu, “Cooperative Visual Manipulation of Music Notation,” ACM Transactions on Computer-Human Interaction, vol. 9, pp. 194–237, 2002.
[4] D. Akoumianakis and C. Alexandraki, “Collective practices in common information spaces: Insight from two case studies,” Human-Computer Interaction, vol. 27, pp. 311–351, 2012.
[5] L. Turchet et al., “The Internet of Sounds: Convergent Trends, Insights, and Future Directions,” IEEE Internet of Things Journal, vol. 10, pp. 11264–11292, 2023.
[6] L. Turchet, C. Fischione, G. Essl, D. Keller, and M. Barthet, “Internet of Musical Things: Vision and Challenges,” IEEE Access, vol. 6, pp. 61994–62017, 2018.
[7] C. Alexandraki and R. Bader, “Anticipatory Networked Communications for Live Musical Interactions of Acoustic Instruments,” Journal of New Music Research, vol. 45, pp. 68–85, 2016.
[8] C. Vear, C. Rees, and A. Stephenson, “NAUTILUS: A Case Study in How a Digital Score Can Transform Creativity,” Tempo, vol. 77, pp. 33–42, 2023.
[9] S. Chakraborty, S. Dutta, and J. Timoney, “The Cyborg Philharmonic: Synchronizing interactive musical performances between humans and machines,” Humanities and Social Sciences Communications, vol. 8, 2021, doi: 10.1057/s41599-021-00751-8.
[10] M. Chen, G. Mani, and A. Renaud, “Are Musicians Being Replaced by Artificial Intelligence?” Journal of Student Research, vol. 11, 2022, doi: 10.47611/jsrhs.v11i3.3468.
[11] M. Kalochristianakis, P. Zervas, and C. Alexandraki, “An Intelligent Data Repository Consolidating Artifacts of Music Learning,” Web Audio Conference 2022 (WAC 2022), Cannes, France, 2022. doi: 10.5281/zenodo.6768630.
[12] M. Kaliakatsos-Papakostas, K. Velenis, K. Giannos, and E. Cambouropoulos, “Exploring Jazz Standards with Web Visualisation for Improvisation Training,” presented at the Web Audio Conference 2022 (WAC 2022), Cannes, France, Jun. 2022. doi: 10.5281/zenodo.6767946.
[13] M. Kaliakatsos-Papakostas, K. Velenis, L. Pasias, C. Alexandraki, and E. Cambouropoulos, “An HMM-Based Approach for Cross-Harmonization of Jazz Standards,” Applied Sciences, vol. 13, no. 3, p. 1338, Jan. 2023, doi: 10.3390/app13031338.
[14] M. Müller, Fundamentals of Music Processing. Springer International Publishing, 2015.
[15] P. Nicolaescu, K. Jahns, M. Derntl, and R. Klamma, “Near real-time peer-to-peer shared editing on extensible data types,” in Proceedings of the International ACM SIGGROUP Conference on Supporting Group Work, Association for Computing Machinery, Nov. 2016, pp. 39–49.
[16] M. Shapiro, N. Preguiça, C. Baquero, and M. Zawirski, “Conflict-free replicated data types,” in Proceedings of the 13th International Conference on Stabilization, Safety, and Security of Distributed Systems (SSS’11), Grenoble, France, Oct. 2011. Springer-Verlag, Berlin, Heidelberg, pp. 386–400.
[17] M. Sacchetto, P. Gastaldi, C. Chafe, C. Rottondi, and A. Servetti, “Web-Based Networked Music Performances via WebRTC: A Low-Latency PCM Audio Solution,” J. Audio Eng. Soc., vol. 70, no. 11, pp. 926–937, Nov. 2022, doi: 10.17743/jaes.2022.0021.