
Antonio Pertusa
- PhD
- Professor (Associate) at University of Alicante
About
Publications: 67
Reads: 27,452
Citations: 1,627
Introduction
My research interests are in the fields of signal processing and machine learning (mainly deep learning) techniques applied to computer vision, remote sensing and music information retrieval.
Publications (67)
Radiology report generation (RRG) aims to create free-text radiology reports from clinical imaging. Grounded radiology report generation (GRRG) extends RRG by including the localisation of individual findings on the image. Currently, there are no manually annotated chest X-ray (CXR) datasets to train GRRG models. In this work, we present a dataset...
Medical image datasets are essential for training models used in computer-aided diagnosis, treatment planning, and medical research. However, some challenges are associated with these datasets, including variability in data distribution, data scarcity, and transfer learning issues when using models pre-trained from generic images. This work studies...
Classifying logo images is a challenging task as they contain elements such as text or shapes that can represent anything from known objects to abstract shapes. While the current state of the art for logo classification addresses the problem as a multi‐class task focusing on a single characteristic, logos can have several simultaneous labels, such...
Brain extraction, i.e. the precise removal of MRI signal outside the brain boundaries, is a key step in MRI preprocessing pipelines, typically achieved via masks delineating the region of interest (ROI). Existing automated tools often lack accuracy for rodent MRI due to resolution limitations, so large manual editing efforts are required. This work...
Medical image classification datasets usually have a limited availability of annotated data, and pathological samples are usually much scarcer than healthy cases. Furthermore, data is often collected from different sources with different acquisition devices and population characteristics, making the trained models highly dependent on the data domai...
Logo classification is a particular case of image classification, since logos may contain only text, images, or a combination of both. In this work, we propose a system for the multi-label classification and similarity search of logo images. The method allows obtaining the most similar logos on the basis of their shape, color, business sector, sema...
Brain Imaging Data Structure (BIDS) allows the user to organise brain imaging data into a clear and easy standard directory structure. BIDS is widely supported by the scientific community and is considered a powerful standard for management. The original BIDS is limited to images or data related to the brain. Medical Imaging Data Structure (MIDS) w...
We present a labeled large-scale, high resolution chest x-ray dataset for the automated exploration of medical images along with their associated reports. This dataset includes more than 160,000 images obtained from 67,000 patients that were interpreted and reported by radiologists at San Juan Hospital (Spain) from 2009 to 2017, covering six differ...
This work presents an end-to-end method based on deep neural networks for audio-to-score music transcription of monophonic excerpts. Unlike existing music transcription methods, which normally perform pitch estimation, the proposed approach is formulated as an end-to-end task that outputs a notation-level music score. Using an audio file as input,...
In this work we present a method for the detection of radiological findings, their location and differential diagnoses from chest x-rays. Unlike prior works that focus on the detection of few pathologies, we use a hierarchical taxonomy mapped to the Unified Medical Language System (UMLS) terminology to identify 189 radiological findings, 22 differe...
This paper describes BIMCV COVID-19+, a large dataset from the Valencian Region Medical ImageBank (BIMCV) containing chest X-ray (CXR) images (CR, DX) and computed tomography (CT) imaging of COVID-19+ patients along with their radiological findings and locations, pathologies, radiological reports (in Spanish), DICOM metadata, Polymerase chain reactio...
We present a framework based on neural networks to extract music scores directly from polyphonic audio in an end-to-end fashion. Most previous Automatic Music Transcription (AMT) methods seek a piano-roll representation of the pitches, which can be further transformed into a score by incorporating tempo estimation, beat tracking, key estimation or r...
The classification of logos is a particular case within computer vision since they have their own characteristics. Logos can contain only text, iconic images or a combination of both, and they usually include figurative symbols designed by experts that vary substantially even though they may share the same semantics. This work presents a method for mul...
This work presents a method that can be used for the efficient detection of small maritime objects. The proposed method employs aerial images in the visible spectrum as inputs to train a categorical Convolutional Neural Network for the classification of ships. A subset of those filters that make the greatest contribution to the classification of th...
We present a method to detect maritime oil spills from Side-Looking Airborne Radar (SLAR) sensors mounted on aircraft in order to enable a quick response of emergency services when an oil spill occurs. The proposed approach introduces a new type of neural architecture named Convolutional Long Short Term Memory Selectional AutoEncoders (CMSAE) which...
In this work, we present a multimodal approach to perform object recognition from photographs taken using smartphones. The proposed method extracts neural codes from the input image using a Convolutional Neural Network (CNN), and combines them with a series of metadata gathered from the smartphone sensors when the picture was taken. These metadata...
We present a labeled large-scale, high resolution chest x-ray dataset for the automated exploration of medical images along with their associated reports. This dataset includes more than 160,000 images obtained from 67,000 patients that were interpreted and reported by radiologists at San Juan Hospital (Spain) from 2009 to 2017, covering s...
In this study, we use unmanned aerial vehicles equipped with multispectral cameras to search for bodies in maritime rescue operations. A series of flights were performed in open‐water scenarios in the northwest of Spain, using a certified aquatic rescue dummy in dangerous areas and real people when the weather conditions allowed it. The multispectr...
We present a hybrid approach to improve the accuracy of Convolutional Neural Networks (CNN) without retraining the model. The proposed architecture replaces the softmax layer with a k-Nearest Neighbor (kNN) algorithm for inference. Although this is a common technique in transfer learning, we apply it to the same domain for which the network was train...
In this work, we present an end-to-end framework for audio-to-score transcription. To the best of our knowledge, this is the first automatic music transcription approach that directly obtains a symbolic score from audio, instead of performing separate stages for piano-roll estimation (pitch detection and note tracking), meter detection or key est...
This work presents a system for the detection of ships and oil spills using Side-Looking Airborne Radar (SLAR) images. The proposed method employs a two-stage architecture composed of three pairs of Convolutional Neural Networks (CNNs). Each pair of networks is trained to recognize a single class (ship, oil spill and coast) by following two steps:...
The automatic classification of ships from aerial images is a considerable challenge. Previous works have usually applied image processing and computer vision techniques to extract meaningful features from visible spectrum images in order to use them as the input for traditional supervised classifiers. We present a method for determining if an aeri...
Interventional cancer clinical trials are generally too restrictive, and some patients are often excluded on the basis of comorbidity, past or concomitant treatments, or the fact that they are over a certain age. The efficacy and safety of new treatments for patients with these characteristics are, therefore, not defined. In this work, we built a m...
In this work, we use deep neural autoencoders to segment oil spills from Side-Looking Airborne Radar (SLAR) imagery. Synthetic Aperture Radar (SAR) has been much exploited for ocean surface monitoring, especially for oil pollution detection, but few approaches in the literature use SLAR. Our sensor consists of two SAR antennas mounted on an aircraf...
There are large collections of music manuscripts preserved over the centuries. In order to analyze these documents it is necessary to transcribe them into a machine-readable format. This process can be done automatically using Optical Music Recognition (OMR) systems, which typically consider segmentation plus classification workflows. This work is...
In this work, the main aim is to detect regions that are candidate oil slicks in Side-Looking Airborne Radar (SLAR) images using Deep Learning techniques. The proposed approach is based on Autoencoders to allow us to automatically discriminate oil spills without hand-crafted features or other features extracted from traditional computer vision techniq...
Staff-line removal is an important preprocessing stage for most optical music recognition systems. Common procedures to solve this task involve image processing techniques. In contrast to these traditional methods based on hand-engineered transformations, the problem can also be approached as a classification task in which each pixel is labeled as...
MirBot is a collaborative application for smartphones that allows users to perform object recognition. This app can be used to take a photograph of an object, select the region of interest and obtain the most likely class (dog, chair, etc.) by means of similarity search using features extracted from a convolutional neural network (CNN). The answers...
This work presents a new spatial verification technique for image similarity search. The proposed algorithm evaluates the geometry of the detected local keypoints by building segments connecting pairs of points and analyzing their intersections in a 2D plane. We show that these intersections remain constant with respect to different geometric trans...
The automatic music genre classification task is an active area of research in the field of Music Information Retrieval. In this paper we use two different symbolic feature sets for genre classification and combine them using an early fusion approach. Our results show that early fusion achieves better classification accuracy than using any of the i...
This study presents a multimodal interactive image retrieval system for smartphones (MirBot). The application is designed as a collaborative game where users can categorize photographs according to the WordNet hierarchy. After taking a picture, the region of interest of the target can be selected, and the image information is sent with a set of met...
This study presents efficient techniques for multiple fundamental frequency estimation in music signals. The proposed methodology can infer harmonic patterns from a mixture considering interactions with other sources and evaluate them in a joint estimation scheme. For this purpose, a set of fundamental frequency candidates are first selected at eac...
Music transcription consists of transforming an audio signal encoding a music performance into a symbolic representation such as a music score. In this paper, a multimodal and interactive prototype to perform music transcription is presented. The system is oriented to monotimbral transcription; its working domain is music played by a single instrumen...
We present a Cartesian ensemble classification system that is based on the principle of late fusion and feature subspaces. These feature subspaces describe different aspects of the same data set. The framework is built on the Weka machine learning toolkit and is able to combine arbitrary feature sets and learning schemes. In our scenario, we use it fo...
The presented onset detection approach is a very simple method described in [1]. An implementation in D2K was already submitted for MIREX 05 [2], yielding a relatively low success rate. However, there were probably some problems in the evaluation, as the mean distance between the detected and actual onsets was too high (about -22 ms, see [3]). There...
The goal of a polyphonic music transcription system is to extract a score from an audio signal. A multiple fundamental frequency estimator is the main piece of these systems, whereas tempo detection and key estimation complement them to correctly extract the score. In this work, in order to detect the fundamental frequencies that are present in a s...
We propose a novel approach to the task of identifying performers from their playing styles. We investigate how skilled musicians (Jazz saxophone players in particular) express and communicate their view of the musical and emotional content of musical pieces and how to use this information in order to automatically identify performers. We study dev...
Recent research in music genre classification hints at a glass ceiling being reached using timbral audio features. To overcome this, the combination of multiple different feature sets bearing diverse characteristics is needed. We propose a new approach to extend the scope of the features: We transcribe audio data into a symbolic form using a tran...
Two multiple fundamental frequency estimation systems are presented in this work. In the first one (PI1, PI2), the best fundamental frequency candidates combination is found in a frame-by-frame analysis by applying a set of rules, taking into account the spectral smoothness measure described in this work. The second system (PI3) was used to ext...
This paper presents a novel Strongly-Typed Genetic Programming approach for building Regression Trees in order to model expressive music performance. The approach consists of inducing a Regression Tree model from training data (monophonic recordings of Jazz standards) for transforming an inexpressive melody into an expressive one. The work presente...
Standard MIDI files contain data that can be considered as a symbolic representation of music (a digital score), and most of them are structured as a number of tracks. One of them usually contains the melodic line of the piece, while the other tracks contain accompaniment music. The goal of this work is to identify the track that contains the melod...
Standard MIDI files contain data that can be considered as a symbolic representation of music (a digital score), and most of them are structured as a number of tracks, one of them usually containing the melodic line of the piece, while the other tracks contain the accompaniment. The objective of this work is to identify the track containing the m...
The objective of this work is to find the melodic line in MIDI files. Usually, the melodic line is stored in a single track, while the other tracks contain the accompaniment. The detection of the track that contains the melodic line can be very useful for a number of applications, such as melody matching when searching in MIDI databases. The syst...
A simple note onset detection system for music is presented in this work. To detect onsets, a 1/12 octave filterbank is simulated in the frequency domain and the band derivatives in time are considered. The first harmonics of a tuned instrument are close to the center frequency of these bands and, in most instruments, these harmonics are those with...
The automatic extraction of the notes that were played in a digital musical signal (automatic music transcription) is an open problem. A number of techniques have been applied to solve it without conclusive results. The monotimbral polyphonic version of the problem is posed here: a single instrument has been played and more than one note can sound...
The automatic extraction of the notes that were played in a digital musical signal (automatic music transcription) is an open problem. A number of techniques have been applied to solve it without conclusive results. This work tries to pose it through the identification of the spectral pattern of a given instrument in the signal spectrogram using ti...
To produce fast, reasonably intelligible and easily correctable translations between related languages, it suffices to use a machine translation strategy which uses shallow parsing techniques to refine what would usually be called word-for-word machine translation. This paper describes the application of shallow parsing techniques (morphological anal...
The main area of work in computer music related to information systems is known as music information retrieval (MIR). Databases containing musical information can be classified into two main groups: those containing audio data (digitized music) and those that file symbolic data (digital music scores). The latter are much more abstract than the form...
To produce fast, reasonably intelligible and easily corrected translations between related languages, it suffices to use a machine translation strategy which uses shallow parsing techniques to refine what would usually be called word-for-word machine translation. This paper describes the application of shallow parsing techniques (morphological anal...
The automatic extraction of the notes that were played in a digital musical signal (automatic music transcription) is an open problem. A number of techniques have been applied to solve it without conclusive results. This work tries to pose it through the identification of the spectral pattern of a given instrument in the signal spectrogram us...
Evolutionary methods have been largely used in algorithmic music composition due to their ability to explore an immense space of possibilities. The main problem of genetic related composition algorithms has always been the implementation of the selection process. In this work, a pattern recognition-based system helped by a number of music analysis...
A multiple fundamental frequency estimator is presented in this work. At each time frame, a set of fundamental frequencies is found in a frame-by-frame analysis taking into account the spectral smoothness measure described in (1) and the information contained in adjacent frames.
Recent research in music genre classification hints at a glass ceiling being reached using timbral audio features. To overcome this, the combination of multiple different feature sets bearing diverse characteristics is needed. We propose a new approach to extend the scope of the features: We transcribe audio data into a symbolic form usi...
Music information retrieval (MIR) is the interdisciplinary science of retrieving information from music. One of the most challenging MIR tasks is the automatic transcription of music, which attempts to extract a human-readable representation (a score) from an audio signal. The core of a music transcription system is a multiple fundamental frequency...
The novel approach of combining audio and symbolic features for music classification from audio enhanced previous audio-only based results in MIREX 2007. We extended the approach by including temporal audio features, enhancing the polyphonic audio to MIDI transcription system and including an extended set of symbolic features. Recent research in...
The approach of combining a multitude of audio features and also symbolic features (through transcription of audio to MIDI) for music classification proved useful, as shown previously. We extended the system submitted to MIREX 2008 by including temporal audio features, adding another audio analysis algorithm based on finding templates on music, en...
We describe the process followed to rapidly build a Spanish–Portuguese and Portuguese–Spanish machine translation system, starting from an existing system that translates between Spanish and Catalan. In six months, a team of four developers has produced an already usable system, with text coverage above 95% and with...