Antonio Pertusa

University of Alicante | UA · University Institute for Computing Research

PhD

About

Publications: 57
Reads: 21,978
Citations: 1,016
Citations since 2017: 717 (27 research items)
Introduction
My research interests are in the fields of signal processing and machine learning (mainly deep learning) techniques applied to computer vision, remote sensing and music information retrieval.

Publications (57)
Preprint
Full-text available
Logo classification is a particular case of image classification, since logos may contain only text, images, or a combination of both. In this work, we propose a system for the multi-label classification and similarity search of logo images. The method makes it possible to retrieve the most similar logos on the basis of their shape, color, business sector, sema...
Preprint
Full-text available
Brain Imaging Data Structure (BIDS) allows the user to organise brain imaging data into a clear and easy standard directory structure. BIDS is widely supported by the scientific community and is considered a powerful standard for management. The original BIDS is limited to images or data related to the brain. Medical Imaging Data Structure (MIDS) w...
Article
We present a labeled large-scale, high resolution chest x-ray dataset for the automated exploration of medical images along with their associated reports. This dataset includes more than 160,000 images obtained from 67,000 patients that were interpreted and reported by radiologists at San Juan Hospital (Spain) from 2009 to 2017, covering six differ...
Article
This work presents an end-to-end method based on deep neural networks for audio-to-score music transcription of monophonic excerpts. Unlike existing music transcription methods, which normally perform pitch estimation, the proposed approach is formulated as an end-to-end task that outputs a notation-level music score. Using an audio file as input,...
Preprint
Full-text available
In this work we present a method for the detection of radiological findings, their location and differential diagnoses from chest x-rays. Unlike prior works that focus on the detection of few pathologies, we use a hierarchical taxonomy mapped to the Unified Medical Language System (UMLS) terminology to identify 189 radiological findings, 22 differe...
Preprint
Full-text available
This paper describes BIMCV COVID-19+, a large dataset from the Valencian Region Medical ImageBank (BIMCV) containing chest X-ray (CXR) images (CR, DX) and computed tomography (CT) imaging of COVID-19+ patients along with their radiological findings and locations, pathologies, radiological reports (in Spanish), DICOM metadata, Polymerase chain reactio...
Preprint
Full-text available
We present a framework based on neural networks to extract music scores directly from polyphonic audio in an end-to-end fashion. Most previous Automatic Music Transcription (AMT) methods seek a piano-roll representation of the pitches, which can be further transformed into a score by incorporating tempo estimation, beat tracking, key estimation or r...
Chapter
Full-text available
The classification of logos is a particular case within computer vision since logos have their own characteristics. They can contain only text, iconic images or a combination of both, and they usually include figurative symbols designed by experts that vary substantially even though they may share the same semantics. This work presents a method for mul...
Article
Full-text available
This work presents a method that can be used for the efficient detection of small maritime objects. The proposed method employs aerial images in the visible spectrum as inputs to train a categorical Convolutional Neural Network for the classification of ships. A subset of those filters that make the greatest contribution to the classification of th...
Article
Full-text available
We present a method to detect maritime oil spills from Side-Looking Airborne Radar (SLAR) sensors mounted on aircraft in order to enable a quick response of emergency services when an oil spill occurs. The proposed approach introduces a new type of neural architecture named Convolutional Long Short Term Memory Selectional AutoEncoders (CMSAE) which...
Preprint
Full-text available
We present a labeled large-scale, high resolution chest x-ray dataset for the automated exploration of medical images along with their associated reports. This dataset includes more than 160,000 images obtained from 67,000 patients that were interpreted and reported by radiologists at San Juan Hospital (Spain) from 2009 to 2017, covering s...
Chapter
Full-text available
In this work, we present a multimodal approach to perform object recognition from photographs taken using smartphones. The proposed method extracts neural codes from the input image using a Convolutional Neural Network (CNN), and combines them with a series of metadata gathered from the smartphone sensors when the picture was taken. These metadata...
Article
Full-text available
In this study, we use unmanned aerial vehicles equipped with multispectral cameras to search for bodies in maritime rescue operations. A series of flights were performed in open‐water scenarios in the northwest of Spain, using a certified aquatic rescue dummy in dangerous areas and real people when the weather conditions allowed it. The multispectr...
Article
Full-text available
We present a hybrid approach to improve the accuracy of Convolutional Neural Networks (CNN) without retraining the model. The proposed architecture replaces the softmax layer by a k-Nearest Neighbor (kNN) algorithm for inference. Although this is a common technique in transfer learning, we apply it to the same domain for which the network was train...
Conference Paper
Full-text available
In this work, we present an end-to-end framework for audio-to-score transcription. To the best of our knowledge, this is the first automatic music transcription approach that directly obtains a symbolic score from audio, instead of performing separate stages for piano-roll estimation (pitch detection and note tracking), meter detection or key est...
Article
Full-text available
This work presents a system for the detection of ships and oil spills using Side-Looking Airborne Radar (SLAR) images. The proposed method employs a two-stage architecture composed of three pairs of Convolutional Neural Networks (CNNs). Each pair of networks is trained to recognize a single class (ship, oil spill and coast) by following two steps:...
Article
Full-text available
The automatic classification of ships from aerial images is a considerable challenge. Previous works have usually applied image processing and computer vision techniques to extract meaningful features from visible spectrum images in order to use them as the input for traditional supervised classifiers. We present a method for determining if an aeri...
Article
Full-text available
Interventional clinical cancer trials are generally too restrictive and cancer patients are often excluded from them on the basis of comorbidity, past or concomitant treatments and the fact that they are over a certain age. The efficacy and safety of new treatments for patients with these characteristics are not, therefore, defined. In this work, w...
Article
Full-text available
In this work, we use deep neural autoencoders to segment oil spills from Side-Looking Airborne Radar (SLAR) imagery. Synthetic Aperture Radar (SAR) has been much exploited for ocean surface monitoring, especially for oil pollution detection, but few approaches in the literature use SLAR. Our sensor consists of two SAR antennas mounted on an aircraf...
Conference Paper
Full-text available
There are large collections of music manuscripts preserved over the centuries. In order to analyze these documents it is necessary to transcribe them into a machine-readable format. This process can be done automatically using Optical Music Recognition (OMR) systems, which typically consider segmentation plus classification workflows. This work is...
Poster
Full-text available
In this work, the main aim is to detect regions that are candidate oil slicks in Side-Looking Airborne Radar (SLAR) images using Deep Learning techniques. The proposed approach is based on Autoencoders to allow us to automatically discriminate oil spills without hand-crafted features or other features extracted from traditional computer vision techniq...
Article
Full-text available
Staff-line removal is an important preprocessing stage for most optical music recognition systems. Common procedures to solve this task involve image processing techniques. In contrast to these traditional methods based on hand-engineered transformations, the problem can also be approached as a classification task in which each pixel is labeled as...
Article
Full-text available
MirBot is a collaborative application for smartphones that allows users to perform object recognition. This app can be used to take a photograph of an object, select the region of interest and obtain the most likely class (dog, chair, etc.) by means of similarity search using features extracted from a convolutional neural network (CNN). The answers...
Conference Paper
Full-text available
This work presents a new spatial verification technique for image similarity search. The proposed algorithm evaluates the geometry of the detected local keypoints by building segments connecting pairs of points and analyzing their intersections in a 2D plane. We show that these intersections remain constant with respect to different geometric trans...
Conference Paper
Full-text available
The automatic music genre classification task is an active area of research in the field of Music Information Retrieval. In this paper we use two different symbolic feature sets for genre classification and combine them using an early fusion approach. Our results show that early fusion achieves better classification accuracy than using any of the i...
Conference Paper
Full-text available
This study presents a multimodal interactive image retrieval system for smartphones (MirBot). The application is designed as a collaborative game where users can categorize photographs according to the WordNet hierarchy. After taking a picture, the region of interest of the target can be selected, and the image information is sent with a set of met...
Article
Full-text available
This study presents efficient techniques for multiple fundamental frequency estimation in music signals. The proposed methodology can infer harmonic patterns from a mixture considering interactions with other sources and evaluate them in a joint estimation scheme. For this purpose, a set of fundamental frequency candidates are first selected at eac...
Conference Paper
Music transcription consists of transforming an audio signal encoding a music performance into a symbolic representation such as a music score. In this paper, a multimodal and interactive prototype to perform music transcription is presented. The system is oriented to monotimbral transcription; its working domain is music played by a single instrumen...
Conference Paper
Full-text available
We present a cartesian ensemble classification system that is based on the principle of late fusion and feature subspaces. These feature subspaces describe different aspects of the same data set. The framework is built on the Weka machine learning toolkit and is able to combine arbitrary feature sets and learning schemes. In our scenario, we use it fo...
Article
Full-text available
The presented onset detection approach is a very simple method described in [1]. An implementation in D2K was already submitted for MIREX 05 [2], yielding a relatively low success rate. However, there were probably some problems in the evaluation, as the mean distance between the detected and actual onsets was too high (about -22 ms, see [3]). There...
Conference Paper
Full-text available
The goal of a polyphonic music transcription system is to extract a score from an audio signal. A multiple fundamental frequency estimator is the main piece of these systems, whereas tempo detection and key estimation complement them to correctly extract the score. In this work, in order to detect the fundamental frequencies that are present in a s...
Article
Full-text available
We propose a novel approach to the task of identifying performers from their playing styles. We investigate how skilled musicians (Jazz saxophone players in particular) express and communicate their view of the musical and emotional content of musical pieces and how to use this information in order to automatically identify performers. We study dev...
Conference Paper
Full-text available
Recent research in music genre classification hints at a glass ceiling being reached using timbral audio features. To overcome this, the combination of multiple different feature sets bearing diverse characteristics is needed. We propose a new approach to extend the scope of the features: We transcribe audio data into a symbolic form using a tran...
Article
Full-text available
Two multiple fundamental frequency estimation systems are presented in this work. In the first one (PI1, PI2), the best fundamental frequency candidates combination is found in a frame-by-frame analysis by applying a set of rules, taking into account the spectral smoothness measure described in this work. The second system (PI3) was used to ext...
Conference Paper
This paper presents a novel Strongly-Typed Genetic Programming approach for building Regression Trees in order to model expressive music performance. The approach consists of inducing a Regression Tree model from training data (monophonic recordings of Jazz standards) for transforming an inexpressive melody into an expressive one. The work presente...
Conference Paper
Full-text available
Standard MIDI files contain data that can be considered as a symbolic representation of music (a digital score), and most of them are structured as a number of tracks. One of them usually contains the melodic line of the piece, while the other tracks contain accompaniment music. The goal of this work is to identify the track that contains the melod...
Conference Paper
Full-text available
Standard MIDI files contain data that can be considered as a symbolic representation of music (a digital score), and most of them are structured as a number of tracks, one of them usually containing the melodic line of the piece, while the other tracks contain the accompaniment. The objective of this work is to identify the track containing the m...
Article
Full-text available
The objective of this work is to find the melodic line in MIDI files. Usually, the melodic line is stored in a single track, while the other tracks contain the accompaniment. The detection of the track that contains the melodic line can be very useful for a number of applications, such as melody matching when searching in MIDI databases. The syst...
Conference Paper
Full-text available
A simple note onset detection system for music is presented in this work. To detect onsets, a 1/12 octave filterbank is simulated in the frequency domain and the band derivatives in time are considered. The first harmonics of a tuned instrument are close to the center frequency of these bands and, in most instruments, these harmonics are those with...
Article
The automatic extraction of the notes that were played in a digital musical signal (automatic music transcription) is an open problem. A number of techniques have been applied to solve it without conclusive results. The monotimbral polyphonic version of the problem is posed here: a single instrument has been played and more than one note can sound...
Article
The automatic extraction of the notes that were played in a digital musical signal (automatic music transcription) is an open problem. A number of techniques have been applied to solve it without conclusive results. This work tries to pose it through the identification of the spectral pattern of a given instrument in the signal spectrogram using ti...
Article
To produce fast, reasonably intelligible and easily correctable translations between related languages, it suffices to use a machine translation strategy which uses shallow parsing techniques to refine what would usually be called word-for-word machine translation. This paper describes the application of shallow parsing techniques (morphological anal...
Conference Paper
Full-text available
The main area of work in computer music related to information systems is known as music information retrieval (MIR). Databases containing musical information can be classified into two main groups: those containing audio data (digitized music) and those that file symbolic data (digital music scores). The latter are much more abstract than the form...
Article
Full-text available
To produce fast, reasonably intelligible and easily corrected translations between related languages, it suffices to use a machine translation strategy which uses shallow parsing techniques to refine what would usually be called word-for-word machine translation. This paper describes the application of shallow parsing techniques (morphological anal...
Article
Full-text available
The automatic extraction of the notes that were played in a digital musical signal (automatic music transcription) is an open problem. A number of techniques have been applied to solve it without conclusive results. This work tries to pose it through the identification of the spectral pattern of a given instrument in the signal spectrogram us...
Article
Full-text available
Evolutionary methods have been widely used in algorithmic music composition due to their ability to explore an immense space of possibilities. The main problem of genetic composition algorithms has always been the implementation of the selection process. In this work, a pattern recognition-based system helped by a number of music analysis...
Article
A multiple fundamental frequency estimator is presented in this work. At each time frame, a set of fundamental frequencies is found in a frame-by-frame analysis taking into account the spectral smoothness measure described in (1) and the information contained in adjacent frames.
Article
Recent research in music genre classification hints at a glass ceiling being reached using timbral audio features. To overcome this, the combination of multiple different feature sets bearing diverse characteristics is needed. We propose a new approach to extend the scope of the features: We transcribe audio data into a symbolic form usi...
Article
Full-text available
Music information retrieval (MIR) is the interdisciplinary science of retrieving information from music. One of the most challenging MIR tasks is the automatic transcription of music, which attempts to extract a human readable representation (a score) from an audio signal. The core of a music transcription system is a multiple fundamental frequency...
Article
Full-text available
The novel approach of combining audio and symbolic features for music classification from audio enhanced previous audio-only based results in MIREX 2007. We extended the approach by including temporal audio features, enhancing the polyphonic audio to MIDI transcription system and including an extended set of symbolic features. Recent research in...
Article
Full-text available
The approach of combining a multitude of audio features and also symbolic features (through transcription of audio to MIDI) for music classification proved useful, as shown previously. We extended the system submitted to MIREX 2008 by including temporal audio features, adding another audio analysis algorithm based on finding templates on music, en...
Article
Full-text available
We describe the process followed to quickly build a Spanish–Portuguese and Portuguese–Spanish machine translation system, starting from an existing system that translates between Spanish and Catalan. In six months, a team of four developers produced an already usable system, with text coverage above 95% and with...

Projects (3)
Archived project
Detection of aerial targets for emergency missions from autonomous aerial vehicles (UAV, etc.)
Project
The HISPAMUS proposal aims at enhancing the Hispanic music heritage from the 15th to the 19th centuries by exploiting the digital resources of these collections. In addition, thousands of oral tradition melodies that were compiled by folklorists in the 1950s are digitized only as images, currently without the possibility of content-based search or study. It is necessary to develop services and tools for the benefit of archives, libraries, scholars, computer scientists and the general public. HISPAMUS tries to provide smart access to archival manuscripts of music scores, allowing their reuse and exploitation. In order to reach this ambitious goal, our group can provide cutting-edge technology in the fields of Machine Learning, Pattern Recognition, and Optical Music Recognition.