
Ichiro Fujinaga- Ph.D.
- Professor (Full) at McGill University
About
- Publications: 176
- Reads: 64,515
- Citations: 3,516
Publications (176)
This paper presents a comprehensive review of the advancements in Optical Music Recognition (OMR) driven by Deep Learning (DL) techniques. OMR aims to digitize music scores, transforming them into structured formats to enhance accessibility, facilitate preservation, and enable automated analysis. While early methods were based on heuristic approach...
The automatic analysis of scores has been a research topic of interest for the last few decades and still is, since music databases that include musical scores are currently being created to make musical content available to the public, including scores of ancient music. For the correct analysis of music elements and their interpretation, the identi...
Music Performance Analysis is based on the evaluation of performance parameters such as pitch, dynamics, timbre, tempo and timing. While timbre is the least specific parameter among these and is often only implicitly understood, prominent brass pedagogues have reported that the presence of excessive muscle tension and inefficiency in playing by a m...
In this study, we consider the case of the trumpet to study the role of timbre quality from the perspectives of music pedagogy and music information retrieval. Prominent brass pedagogues have reported that the presence of excessive muscle tension and inefficiency in playing by a musician is reflected in the timbre quality of the sound produced, whi...
Optical music recognition (OMR) is the field that studies how to automatically read music notation from score images. One of the relevant steps within the OMR workflow is the staff-region retrieval. This process is a key step because any undetected staff will not be processed by the subsequent steps. This task has previously been addressed as a sup...
AugmentedNet is a new convolutional recurrent neural network for predicting Roman numeral labels. The network architecture is characterized by a separate convolutional block for bass and chromagram inputs. This layout is further enhanced by using synthetic training examples for data augmentation, and a greater number of tonal tasks to solve simult...
Many areas of the digital humanities (DH) have the potential to benefit greatly from recent advances in machine learning, big data, and statistical analysis. These sophisticated techniques come with pitfalls, however, and their accidental misuse can lead to erroneous results. This article outlines in broad terms our experiences with a large-scale,...
Musicologists and musicians often would like to search by keys in a digital music library. In this paper, we introduce a new key-finding algorithm that can be applied to music in both symbolic and audio formats. The algorithm, which is based on a Hidden Markov Model (HMM), provides two stages of key-finding output; the first one referring to local...
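The two-stage HMM idea described above can be sketched in miniature. Everything below is illustrative rather than the paper's actual model: the Krumhansl-Kessler key profiles, the correlation-based emission scores, and the flat key-switch penalty are generic assumptions.

```python
import numpy as np

# Hidden states are the 24 major/minor keys; emissions score a 12-d pitch-class
# histogram against a rotated key profile, and a self-transition bonus
# discourages spurious key changes between analysis windows.

MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                  2.52, 5.19, 2.39, 3.66, 2.29, 2.88])  # Krumhansl-Kessler major
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                  2.54, 4.75, 3.98, 2.69, 3.34, 3.17])  # Krumhansl-Kessler minor

def emission_scores(pc_hist):
    """Correlation of a pitch-class histogram with all 24 key profiles."""
    scores = []
    for tonic in range(12):
        for profile in (MAJOR, MINOR):
            scores.append(np.corrcoef(pc_hist, np.roll(profile, tonic))[0, 1])
    return np.array(scores)  # state 2*t is t major, 2*t + 1 is t minor

def viterbi_keys(windows, switch_penalty=0.3):
    """Most likely key per analysis window, penalizing key changes."""
    trans = np.full((24, 24), -switch_penalty)
    np.fill_diagonal(trans, 0.0)          # no penalty for staying in a key
    delta = emission_scores(windows[0])
    back = []
    for w in windows[1:]:
        cand = delta[:, None] + trans     # cand[i, j]: best score ending in j via i
        back.append(cand.argmax(axis=0))
        delta = cand.max(axis=0) + emission_scores(w)
    path = [int(delta.argmax())]
    for bp in reversed(back):             # trace back the best state sequence
        path.append(int(bp[path[-1]]))
    return path[::-1]
```

Smoothing the per-window estimates through the transition matrix is what gives the two levels of output the abstract mentions: raw emission scores give local key candidates, while the Viterbi path gives the globally consistent reading.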
Existing systems for searching in symbolic music corpora generally suffer from either of two limitations: they are either limited in power because they accept only simple search patterns, or they are hard for musicologists and musicians to use because they require knowledge of programming and text processing tools. In this paper, we propose a new m...
Vocal polyphonic music from 1280 to 1600 is written in mensural notation and it is typically presented in a layout with separate parts. In this paper, we introduce the Mensural Scoring-up Tool, a set of scripts designed to automatically transform the separate-parts representation of the music into a score by dealing with the context-dependent natur...
In this paper, we discuss how different encodings in symbolic music files can have consequences for music analysis, where a truthful representation, not only of the musical score, but of the semantics of the music, can change the results of music analysis tools. We introduce a series of examples in which different encodings effectively modify the c...
The document analysis of music score images is a key step in the development of successful Optical Music Recognition systems. The current state of the art considers the use of deep neural networks trained to classify every pixel of the image according to the image layer it belongs to. This process, however, involves a high computational cost that p...
There is an increasing interest in the automatic digitization of medieval music documents. Despite efforts in this field, the detection of the different layers of information on these documents still poses difficulties. The use of Deep Neural Networks techniques has reported outstanding results in many areas related to computer vision. Consequently...
Optical music recognition (OMR) describes the process of automatically transcribing music notation from a digital image. Although similar to optical character recognition (OCR), the process and procedures of OMR diverge due to the fundamental differences between text and music notation, such as the two-dimensional nature of the notation system an...
Content-Based Music Retrieval (CBMR) for symbolic music aims to find all similar occurrences of a musical pattern within a larger database of symbolic music. To the best of our knowledge there does not currently exist a distributable CBMR software package integrated with a music analysis toolkit that facilitates extendability with new CBMR methods....
This paper addresses the problem of harmonic analysis by proposing a non-chord tone identification model using a deep neural network (DNN). By identifying non-chord tones, the task of harmonic analysis is much simplified. Trained and tested on a dataset of 140 Bach chorales, an initial DNN was able to identify non-chord tones with an F1-measure of 57.00...
Staff-line detection and removal are important processing steps in most Optical Music Recognition systems. Traditional methods make use of heuristic strategies based on image processing techniques with binary images. However, binarization is a complex process for which it is difficult to achieve perfect results. In this paper we describe a novel st...
Musical scores and manuscripts are essential resources for music theory research. Although many libraries have digitized such documents from their collections, these online resources are dispersed and the functionalities for exploiting their content remain limited. In this paper, we present a qualitative study based on interviews with librarians on the challe...
Content within musical documents not only contains musical notation but can also include text, ornaments, annotations, and editorial data. Before any attempt at automatic recognition of elements in these layers, it is necessary to perform a document analysis process to detect and classify each of its constituent parts. The obstacle for this analysi...
A curious divide characterizes the usage of audio descriptors for timbre research in music information research (MIR) and music psychology. While MIR uses a multitude of audio descriptors for tasks such as automatic instrument classification, only a highly constrained set is used to describe the physical correlates of timbre perception in parts of...
Music information retrieval (MIR) is "a multidisciplinary research endeavor that strives to develop innovative content-based searching schemes, novel interfaces, and evolving networked delivery mechanisms in an effort to make the world's vast store of music accessible to all." MIR was born from computational musicology in the 1960s and has since gr...
Both timbre and dynamics of isolated piano tones are determined exclusively by the speed with which the hammer hits the strings. This physical view has been challenged by pianists who emphasize the importance of the way the keyboard is touched. This article presents empirical evidence from two perception experiments showing that touch-dependent sou...
The increasing variety of digital tools available for medieval musicology research includes the new project Single Interface for Music Score Searching and Analysis (SIMSSA) at McGill University. Currently under development, SIMSSA has begun scanning medieval chant manuscripts and applying optical music recognition (OMR) software to search for music...
Musical scores are the central resource for musicological research. Our project, Single Interface for Music Score Searching and Analysis (SIMSSA), targets digitized music scores to design a global infrastructure for searching and analyzing music scores. Specifically, we seek to provide researchers, musicians, and others with access to the contents and m...
This paper discusses several technical challenges in using crowdsourcing for distributed correction interfaces. The specific scenario under investigation involves the implementation of a crowd-sourced adaptive optical music recognition system (Single Interface for Music Score Searching and Analysis project). We envisage the distribution of correc...
Knowing where listeners are is an important contextual dimension that can be used in context-aware music recommendation systems to improve their performance. This paper presents our research on identifying the time zone where listeners are by analysing their weekly aggregated music listening profiles. We collected a large dataset of full music list...
While analysing large corpora of music, many of the questions that arise involve the proportion of some musical entity relative to one or more similar entities, for example, the relative proportions of tonic, dominant, and subdominant chords. Traditional statistical techniques, however, are fraught with problems when answering such questions. Compo...
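The compositional view hinted at above can be illustrated with a centered log-ratio (clr) transform, a standard tool for data that are proportions of a whole. The chord counts and corpus names below are invented for illustration; the paper's own method may differ.

```python
import numpy as np

# Chord-category counts (tonic, dominant, subdominant) for two hypothetical
# corpora. Only the *proportions* carry meaning; the totals are arbitrary.
corpus_a = np.array([120.0, 80.0, 40.0])
corpus_b = np.array([300.0, 200.0, 100.0])

def clr(counts):
    """Centered log-ratio transform: maps a vector of proportions into an
    unconstrained space where ordinary statistical tools apply safely."""
    p = counts / counts.sum()
    g = np.exp(np.log(p).mean())  # geometric mean of the proportions
    return np.log(p / g)
```

Because the two corpora share the same chord proportions, their clr coordinates coincide even though the raw counts differ by scale, which is exactly the invariance that naive per-count statistics lack.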
In 2008, at the ASA/EAA symposium honouring pioneering scientist of singing, Johan Sundberg, the Advancing Interdisciplinary Research in Singing (AIRS) project was introduced as a major collaborative research initiative on singing [Cohen, Acoustics 08, Paris (2008), 3177-3182]. Over 70 collaborators around the world were to investigate singing from...
For centuries, music has been shared and remembered by two traditions: aural transmission and in the form of written documents normally called musical scores. Many of these scores exist in the form of unpublished manuscripts and hence they are in danger of being lost through the normal ravages of time. To preserve the music some form of typesetting...
This paper introduces the Diva (Document Image Viewer with Ajax) project. Diva is a multi-page image viewer, designed for web-based digital libraries to present documents in a web browser. Key features of Diva include: "lazily loading" only the parts of the document the user is viewing, the ability to "zoom" in and out for viewing high-resolution p...
Medieval music manuscripts pose special challenges for digital processing. Their unique page layouts and sometimes extreme degradation can make it difficult even to identify the portions of an image that correspond to the musical page. This paper addresses the page identification problem for medieval documents, with natural extensions to any type o...
In this paper we present our work towards developing a large-scale web application for digitizing, recognizing (via optical music recognition), correcting, displaying, and searching printed music texts. We present the results of a recently completed prototype implementation of our workflow process, from document capture to presentation on the web....
Purpose
The purpose of this paper is to present a new web‐based cataloguing system for the global music bibliography project, Répertoire International des Sources Musicales (RISM), and discuss the implications for the manipulation and discovery of musical heritage materials.
Design/methodology/approach
The paper is designed to illustrate the workf...
Hardcore, jungle, and drum and bass (HJDB) are fast-paced electronic dance music genres that often employ resequenced breakbeats or drum samples from jazz and funk percussionist solos. We present a style-specific method for downbeat detection specifically designed for HJDB. The presented method combines three forms of metrical information in the pr...
This paper introduces the Automatic Music Performance Analysis and Comparison Toolkit (AMPACT), a MATLAB toolkit for accurately aligning monophonic audio to MIDI scores as well as extracting and analyzing timing-, pitch-, and dynamics-related performance data from the aligned recordings. This paper also presents the results of an analysis perfor...
This paper introduces Neon.js, a browser-based music notation editor written in JavaScript. The editor can be used to manipulate digitally encoded musical scores in square-note notation. This type of notation presents certain challenges to a music notation editor, since many neumes (groups of pitches) are ligatures - continuous graphical symbols th...
This chapter presents ACE XML, a set of file formats that are designed to meet the special representational needs of research in Music Information Retrieval (MIR) in general and automatic music classification in particular. The ACE XML formats are designed to represent a wide range of musical information clearly and simply using formally structured...
Optical music recognition (OMR) and optical character recognition (OCR) have traditionally been used for document transcription - that is, extracting text or symbolic music from page images for use in an editor while discarding all spatial relationships between the transcribed notation and the original image. In this paper we discuss how OCR has sh...
This paper evaluates the utility of the Discrete Cosine Transform (DCT) for characterizing singing voice fundamental frequency (F0) trajectories. Specifically, it focuses on the use of the 1st and 2nd DCT coefficients as approximations of slope and curvature. It also considers the impact of vocal vibrato on the DCT calculations, including the influ...
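The slope/curvature reading of the low-order coefficients can be checked with a small sketch. The DCT-II normalization below is one common convention (conventions vary across implementations), and the F0 values are invented.

```python
import numpy as np

def dct_coeffs(f0, k_max=2):
    """First few DCT-II coefficients of an F0 trajectory (unnormalized)."""
    n = len(f0)
    t = (np.arange(n) + 0.5) * np.pi / n
    return np.array([2.0 * np.sum(f0 * np.cos(k * t)) for k in range(k_max + 1)])

# A rising linear glide: with this sign convention, the 1st coefficient is
# strongly negative (it tracks slope), while the 2nd coefficient (curvature)
# is zero because a straight line has none.
rising = np.linspace(200.0, 210.0, 64)  # Hz, illustrative values
c = dct_coeffs(rising)
```

Coefficient 0 reflects the mean F0, so the vibrato effects the abstract discusses show up mainly as perturbations of coefficients 1 and 2 relative to the underlying note trajectory.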
The work of a humanities e-researcher is scoped by the possibilities offered in digital artefacts: in their ever increasing number and their distribution and access over the Internet. This is recognised through a shift to an increasingly data-intensive method characterised as the "fourth paradigm" of e-Research and enabled by the new computational...
The growing quantity of digital recorded music available in large-scale resources such as the Internet archive provides an important new resource for musical analysis. An e-Research approach has been adopted in order to create a very substantive web-accessible corpus of musical analyses in a common framework for use by music scholars, students and...
This demonstration presents a music structure-based audio/visual interface for the navigation of very large scale music digital libraries. This work is a product of the Structural Analysis of Large Amounts of Music Information (SALAMI) project.
Audio chord recognition has attracted much interest in recent years, but a severe lack of reliable training data-both in terms of quantity and range of sampling-has hindered progress. Working with a team of trained jazz musicians, we have collected time-aligned transcriptions of the harmony in more than a thousand songs selected randomly from the B...
The goal of this study was to examine the possibility of training machine learning algorithms to differentiate between the performance of good notes and bad notes. Four trumpet players recorded a total of 239 notes from which audio features were extracted. The notes were subjectively graded by five brass players. The resulting dataset was used to t...
Music audio structure segmentation has been a task in the Music Information Retrieval Evaluation eXchange (MIREX) since 2009. In 2010, five algorithms were evaluated against two datasets (297 and 100 songs) with an almost exclusive focus on western popular music. A new annotated dataset significantly larger in size and with a more diverse range of...
Recent changes in the Music Encoding Initiative (MEI) have transformed it into an extensible platform from which new notation encoding schemes can be produced. This paper introduces MEI as a document-encoding framework, and illustrates how it can be extended to encode new types of notation, eliminating the need for creating specialized and potentia...
Recorded music offers a wealth of information for studying performance practice. This paper examines the challenges of automatically extracting performance information from audio recordings of the singing voice and discusses our technique for automatically extracting information such as note timings, intonation, vibrato rates, and dynamics. An expe...
In this paper we present our research in the development of a pitch-finding system to extract the pitches of neumes, some of the oldest representations of pitch in Western music, from the Liber Usualis, a well-known compendium of plainchant as used in the Roman Catholic church. Considerations regarding the staff position, staff removal, space- and l...
A music retrieval system is introduced that incorporates tempo, cultural, and beat strength features to help music therapists provide appropriate music for gait training for Parkinson's patients. Unlike current methods available to music therapists (e.g., personal CD/MP3 library search) we propose a domain-specific search engine that utilizes databa...
This paper presents the jWebMiner 2.0 cultural feature extraction software and describes the results of several musical genre classification experiments performed with it. jWebMiner 2.0 is an easy-to-use and open-source tool that allows users to mine the Internet in order to extract features based on both Last.fm social tags and general web search...
This paper discusses two sets of automatic musical genre classification experiments. Promising research directions are then proposed based on the results of these experiments. The first set of experiments was designed to examine the utility of combining features extracted from separate and independent audio, symbolic and cultural sources of musical...
jMIR is a free and open-source software suite designed for applications related to automatic music classification. jMIR includes the jAudio, jSymbolic and jWebMiner feature extractors, the ACE meta-learning framework, the ACE XML information exchange file formats, the jMusicMetaManager musical dataset management software and the Codaich, Bodhidharm...
This presentation will begin by introducing the research fields of music information retrieval and automatic music classification. The core of the presentation will then be divided into two parts, the first dealing with the jMIR software suite, and the second dealing with the ACE XML file formats. jMIR is a set of free and open source software tool...
SALAMI (Structural Analysis of Large Amounts of Music Information) applies computational approaches to the huge and growing volume of digital recorded music that is now available in large-scale resources such as the Internet Archive. It is set to produce a new and very substantive web-accessible corpus of musical analyses in a common framework for...
A new method for reducing parasitic pitch variations in archival audio recordings is presented. The method is intended for analyzing movie soundtracks recorded in optical films. It utilizes image processing for calculating and reducing effects of tape ...
The widespread use of beat- and tempo-tracking methods in music information retrieval tasks has been marginalized due to undesirable sporadic results from these algorithms. While sensorimotor and listening studies have demonstrated the subjectivity and variability inherent to human performance of this task, MIR applications such as recommendation r...
This paper describes the use of fingerprinting-based querying in identifying metadata inconsistencies in music libraries, as well as the updates to the jMusicMetaManager software in order to perform the analysis. Test results are presented for both the Codaich database and a generic library of unprocessed metadata. Statistics were computed in orde...
This paper describes experimental research investigating the genre classification utility of combining features extracted from lyrical, audio, symbolic and cultural sources of musical information. It was found that cultural features consisting of information extracted from both web searches and mined listener tags were particularly effective, with...
This paper introduces ACE XML 2.0, a set of file formats that are designed to meet the special representational needs of research in automatic music classification. Such standardized formats are needed to facilitate the sharing and long-term storage of valuable research data. ACE XML 2.0 is designed to represent a broad range of musi- cal...
Musical documents, that is, documents whose primary content is printed music, introduce interesting design challenges for presentation in an online environment. Considerations for the unique properties of printed music, as well as users' expected levels of comfort with these materials, present opportunities for developing a viewer specifically tailo...
Our work focuses on optically reconstructing the stereo audio signal of a 33 rpm long-playing (LP) record using a white-light interferometry-based approach. Previously, a theoretical framework was presented, alongside the primitive reconstruction result from a few cycles of a stereo sinusoidal test signal. To reconstruct an audible duration of a lo...
This paper presents additions and improvements to the Autonomous Classification Engine (ACE), a framework for using and optimizing classifiers. Given a set of feature values, ACE experiments with a variety of classifiers, classifier parameters, classifier ensembles and dimensionality- reduction techniques in order to arrive at a configuration that...
Optical music recognition (OMR) is one of the most promising tools for generating large-scale, distributable libraries of musical data. Much OMR work has focussed on instrumental music, avoiding a special challenge vocal music poses for OMR: lyric recognition. Lyrics complicate the page layout, making it more difficult to identify the regions of th...
This paper presents a quantitative comparison of different algorithms for the removal of stafflines from music images. It contains a survey of previously proposed algorithms and suggests a new skeletonization based approach. We define three different error metrics, compare the algorithms with respect to these metrics and measure their robustness wi...
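As one plausible instance of comparing an algorithm's output against an ideal staff-removed image, pixel-level precision, recall, and F-measure can be computed as below. The paper defines its own three error metrics differently, so this is only a generic sketch.

```python
import numpy as np

def pixel_error_metrics(reference, result):
    """Pixel-level comparison of ground-truth staff-removed symbols (reference)
    with an algorithm's output (result); both are binary images where True
    marks a foreground (symbol) pixel."""
    ref = reference.astype(bool)
    res = result.astype(bool)
    tp = np.sum(res & ref)    # symbol pixels correctly kept
    fp = np.sum(res & ~ref)   # staff pixels wrongly kept
    fn = np.sum(~res & ref)   # symbol pixels wrongly removed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Metrics of this kind make the survey's comparison quantitative: an algorithm that removes stafflines aggressively trades recall (damaged symbols) for precision (fewer leftover staff fragments).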
Ink bleedthrough is a common problem in early music documents. Even when such bleedthrough does not pose problems for human perception, it can inhibit the performance of optical music recognition (OMR). One way to reduce the amount of bleedthrough is to take into account what is printed on the reverse of the page. In order to do so, the reverse of th...
This paper experimentally investigates the classification utility of combining features extracted from separate audio, symbolic and cultural sources of musical information. This was done via a series of genre classification experiments performed using all seven possible combinations and subsets of the three corresponding types of features. Thes...
Optical music recognition (OMR) applications are predominantly designed for common music notation and as such, are inherently incapable of adapting to specialized notation forms within early music. Two OMR systems, namely Gamut (a Gamera application) and Aruspix, have been proposed for early music. In this paper, we present a novel comparison of th...
A heuristic optimal discrete bit allocation algorithm is proposed for solving the margin maximization problem in discrete multitone (DMT) systems. Starting from an initial equal power assignment bit distribution, the proposed algorithm employs a multistaged ...
Optical music recognition (OMR) enables librarians to digitise early music sources on a large scale. The cost of expert human labour to correct automatic recognition errors dominates the cost of such projects. To reduce the number of recognition errors in the OMR process, we present an innovative approach to adapt the system dynamically, taking adv...
Optical music recognition (OMR) systems are promising tools for the creation of searchable digital music libraries. Using an adaptive OMR system for early music prints based on hidden Markov models, we leverage an edit-distance evaluation metric to improve recognition accuracy. Baseline results are computed with new labeled training and test se...
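An edit-distance metric of the kind mentioned above is conventionally the Levenshtein distance over recognized symbol sequences. A generic sketch follows; the symbol names are invented, and the paper's exact cost model may differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between a reference symbol sequence and a
    recognized one: the minimum number of insertions, deletions, and
    substitutions needed to turn hyp into ref."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                       # delete all remaining ref symbols
    for j in range(n + 1):
        d[0][j] = j                       # insert all remaining hyp symbols
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # match / substitution
    return d[m][n]

# Hypothetical mensural symbol sequences: one substitution, one missed symbol.
ref = ["clef.C", "note.breve", "note.semibreve", "rest.minim"]
hyp = ["clef.C", "note.semibreve", "note.semibreve"]
```

Normalizing this count by the reference length yields a symbol error rate, which gives a single number to drive the adaptive retraining loop the abstract describes.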
This paper presents the first Optical Audio Reconstruction (OAR) approach for the long-term digital preservation of stereo phonograph records. OAR uses precision metrology and digital image processing to obtain and convert groove contour data into digital audio for access and preservation. This contactless and imaging-based approach has considerabl...
Binarisation of greyscale images is a critical step in optical music recognition (OMR) preprocessing. Binarising music documents is particularly challenging because of the nature of music notation, even more so when the sources are degraded, e.g., with ink bleed-through from the other side of the page. This paper presents a comparative evaluation o...
jWebMiner is a software package for extracting cultural features from the web. It is designed to be used for arbitrary types of MIR research, either as a stand-alone application or as part of the jMIR suite. It emphasizes extensibility, generality and an easy-to-use interface. At its most basic level, the software operates by using web servic...
This paper describes the first iteration of a working model for searching heterogeneous distributed metadata repositories for sound recording collections, focusing on techniques used for real-time querying and harmonizing diverse metadata models. The initial model for a metadata infrastructure presented here is the first of its kind for sound recor...
Despite steady improvement in optical music recognition (OMR), early documents remain challenging because of the high variability in their contents. In this paper, we present an original approach using maximum a posteriori (MAP) adaptation to improve an OMR tool for early typographic prints dynamically based on hidden Markov models. Taking advant...
Although automatic chord recognition has generated a number of recent papers in MIR, nobody to date has done a proper cross validation of their recognition results. Cross validation is the most common way to establish baseline standards and make comparisons, e.g., for MIREX competitions, but a lack of labelled aligned training data has rendered it...
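The cross-validation protocol the paper calls for can be sketched minimally: every labelled song lands in exactly one test fold, so each result is produced by a model that never saw that song in training. This is a generic helper, not the actual MIREX infrastructure.

```python
def kfold_indices(n_items, k):
    """Yield (train, test) index lists for k-fold cross-validation over a
    labelled dataset of n_items songs."""
    # Interleaved assignment keeps fold sizes within one item of each other.
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for held_out, test in enumerate(folds):
        train = [i for j, f in enumerate(folds) if j != held_out for i in f]
        yield sorted(train), sorted(test)
```

In practice one would shuffle (or stratify by artist or album) before splitting, since adjacent catalogue entries are often correlated; the sketch omits that for brevity.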
This paper introduces a new metadata data dictionary design to assist in the consistent creation of digital libraries of analog sound recording and to promote their interoperability.
A new method for federated searching of music archives using a grid-based dynamic feature extraction system is proposed.
National Science Foundation and the Institute for Museum and Library Services
Research in automatic genre classification has been producing increasingly small performance gains in recent years, with the result that some have suggested that such research should be abandoned in favor of more general similarity research. It has been further argued that genre classification is of limited utility as a goal in itself because of...
jAudio is an application designed to extract features for use in a variety of MIR tasks. It eliminates the need for re-implementing existing feature extraction algorithms and provides a framework that greatly facilitates the development and deployment of new features. Three classes of features are presented and explained: features, metafeatures, an...
The creation and maintenance of a metadata data dictionary is essential to large-scale digital repositories. It assists the process of data entry, ensures consistency of records, facilitates semantic compatibility and interoperability between systems, and, most importantly, forms the foundation for efficient and effective information retrieval infr...
This paper introduces OMEN (On-demand Metadata Extraction Network), which addresses a fundamental problem in MIR: the lack of universal access to a large dataset containing significant amounts of copyrighted music. This is accomplished by utilizing the large collections of digitized music available at many libraries. Using OMEN, libraries will be a...
By implementing a cached region selection scheme and automatic label completion, we extended an open-source audio editor to become a more convenient audio annotation tool for tasks such as ground-truth annotation for audio and music classification. A usability experiment was conducted with encouraging preliminary results.
Previous work has employed an approach to the evaluation of wrapper feature selection methods that may overstate their ability to improve classification accuracy, because of a phenomenon akin to overfitting. This paper discusses this phenomenon in the context of recent work in machine learning, demonstrates that previous work in MIR has indeed exag...
This paper introduces Codaich, a large and diverse publicly accessible database of musical recordings for use in music information retrieval (MIR) research. The issues that must be dealt with when constructing such a database are discussed, as are ways of addressing these problems. It is suggested that copyright restrictions may be overcome by...