Inaki Inza

University of the Basque Country | UPV/EHU · Computer Sciences and Artificial Intelligence

About

128 Publications
105,084 Reads
10,576 Citations


Publications (128)
Preprint
Full-text available
This report addresses, from a machine learning perspective, a multi-class classification problem to predict the first deterioration level of a COVID-19 positive patient at the time of hospital admission. Socio-demographic features, laboratory tests and other measures are taken into account to learn the models. Our output is divided into 4 categorie...
Chapter
Vibration analysis (VA) techniques have attracted great interest in the industrial sector over recent decades. In particular, VA is widely used to detect failures in rotary components such as rolling bearings and gears. In the present work, we propose a novel data-driven methodology to process vibration-related data in order to detect rotat...
Chapter
The COVID-19 pandemic is continuously evolving with drastically changing epidemiological situations which are approached with different decisions: from the reduction of fatalities to even the selection of patients with the highest probability of survival in critical clinical situations. Motivated by this, a battery of mortality prediction models wi...
Article
Full-text available
Background Feature selection is a relevant step in the analysis of single-cell RNA sequencing datasets. Most of the current feature selection methods are based on general univariate descriptors of the data such as the dispersion or the percentage of zeros. Despite the use of correction methods, the generality of these feature selection methods bias...
Preprint
Full-text available
Feature selection is a relevant step in the analysis of single-cell RNA sequencing datasets. Triku is a feature selection method that favours genes defining the main cell populations. It does so by selecting genes expressed by groups of cells that are close in the nearest neighbor graph. Triku efficiently recovers cell populations present in artifi...
Preprint
Full-text available
In recent years, considerable research effort has been put into modelling the ever-growing data in streaming environments. Some of these efforts are related to streaming novelty detection. In this scenario, new classes may emerge, disappear, or drift over time while others are classified normally. Recent works model these events with non-paramet...
Article
Full-text available
In recent years, a variety of research areas have contributed to a set of related problems in which rare event, anomaly, novelty and outlier detection are the main actors. These multiple research areas have created a mix-up between terminology and problems. In some research, similar problems have been named differently, while in some other works,...
Article
Full-text available
Plain Language Summary: Deep neural networks have recently demonstrated great versatility and an unprecedented capacity to model complex problems. In weather modeling, these algorithms have been applied to solve different problems. This is a promising area of research, given the availability of large volumes of weather data and increasingly powerful...
Article
In regression, a predictive model which is able to anticipate the output of a new case is learnt from a set of previous examples. The output or response value of these examples used for model training is known. When learning with aggregated outputs, the examples available for model training are individually unlabeled. Collectively, the aggregated o...
Preprint
Full-text available
Numerical Weather Prediction (NWP) models represent sub-grid processes using parameterizations, which are often complex and a major source of uncertainty in weather forecasting. In this work, we devise a simple machine learning (ML) methodology to learn parameterizations from basic NWP fields. Specifically, we demonstrate how encoder-decoder Convol...
Article
Majority voting is a popular and robust strategy to aggregate different opinions in learning from crowds, where each worker labels examples according to their own criteria. Although it has been extensively studied in the binary case, its behavior with multiple classes is not completely clear, specifically when annotations are biased. This paper att...
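Majority voting, the aggregation strategy studied in this abstract, can be sketched as follows for the multi-class case. This is only a minimal illustration of the baseline strategy and does not reproduce the paper's analysis. The function names and the tie-breaking rule (the first class in a fixed order wins) are assumptions chosen for the example.

```python
from collections import Counter

def majority_vote(annotations, classes):
    """Aggregate one example's crowd labels by plurality (majority) voting.

    annotations: labels given to the example by different workers.
    classes: ordered list of all classes; ties go to the class that
    appears first in this list (an assumed, deterministic tie-break).
    """
    counts = Counter(annotations)          # missing classes count as 0
    best = max(counts[c] for c in classes)
    for c in classes:
        if counts[c] == best:
            return c

def aggregate(dataset, classes):
    """Apply majority voting independently to every example."""
    return [majority_vote(labels, classes) for labels in dataset]

crowd = [["a", "b", "a"], ["c", "c", "b"], ["a", "b"]]
print(aggregate(crowd, ["a", "b", "c"]))  # ['a', 'c', 'a'] (third example is a tie)
```

Majority voting treats every worker as equally reliable. When annotators are biased, as in the abstract's setting, that assumption is exactly what can break down.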
Article
Full-text available
Classifying software defects according to any defined taxonomy is not straightforward. In order to automate the classification of software defects, two sets of defect reports were collected from public issue tracking systems in two different real domains. Due to the lack of a domain expert, the collected defects were categorized b...
Article
This paper describes a suite of tools and a model for improving the accuracy of airport weather forecasts produced by numerical weather prediction (NWP) products, by learning from the relationships between previously modelled and observed data. This is based on a new machine learning methodology that allows circular variables to be naturally incorp...
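Circular variables such as wind direction cannot be averaged or compared with ordinary arithmetic: the naive mean of 350° and 10° is 180°, while the true mean direction is 0°. The abstract's methodology is truncated and is not reproduced here. The sketch below only shows a common, generic way to handle such variables: map each angle to (sin, cos), then compute means and errors on the circle. All function names are hypothetical.

```python
import math

def encode_direction(deg):
    """Map an angle in degrees to a point on the unit circle (sin, cos)."""
    rad = math.radians(deg)
    return math.sin(rad), math.cos(rad)

def circular_mean(degs):
    """Mean direction in [0, 360), computed from summed sin/cos components."""
    s = sum(math.sin(math.radians(d)) for d in degs)
    c = sum(math.cos(math.radians(d)) for d in degs)
    return math.degrees(math.atan2(s, c)) % 360

def angular_error(a, b):
    """Smallest absolute difference between two angles, in [0, 180]."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

print(angular_error(350, 10))  # 20
```

The (sin, cos) pair is also a typical way to feed a direction into a standard learner, because it keeps 359° and 1° close together in feature space.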
Article
In software engineering, associating each reported defect with a category allows, among many other things, for the appropriate allocation of resources. Although this classification task can be automated using standard machine learning techniques, the categorization of defects for model training requires expert knowledge, which is not always availab...
Article
Since many important real-world classification problems involve learning from unbalanced data, the challenging class-imbalance problem has lately received considerable attention in the community. Most of the methodological contributions proposed in the literature carry out a set of experiments over a battery of specific datasets. In these cases, in...
Poster
Full-text available
The aim of this model is to improve the profitability of tuna fishing through fish observation and route optimization, reducing fuel consumption while maintaining catches. The type of surface fishing that accounts for the largest share of the world's skipjack and yellowfin tuna catches is purse-seine fishing...
Article
Full-text available
Although a great methodological effort has been invested in proposing competitive solutions to the class-imbalance problem, little effort has been made in pursuing a theoretical understanding of this matter. In order to shed some light on this topic, we perform, through a novel framework, an exhaustive analysis of the adequacy of the most commo...
Presentation
Full-text available
Probabilistic graphical model (PGM) types; data format and pre-processing; Bayesian networks (BNs): structure and parameters; Bayesian network classifiers; applications of Bayesian networks in environmental sciences; sentiment analysis in social sciences using BNs; multi-dimensional Bayesian network classifiers; flexible classifiers; Inference diagrams...
Article
Weakly supervised classification tries to learn from data sets which are not certainly labeled. Many problems, with different natures of partial labeling, fit this description. In this paper, the novel problem of learning from positive-unlabeled proportions is presented. The provided examples are unlabeled, and the only class information available...
Article
Machine learning techniques have previously been used to assist clinicians in selecting embryos for human assisted reproduction. This work aims to show how appropriate modeling of the problem can help improve machine learning techniques for embryo selection. In this study, a dataset of 330 consecutive cycles (and associated embryos) carrie...
Article
During the last decades several learning algorithms have been proposed to learn probability distributions based on decomposable models. Some of these algorithms can be used to search for a maximum likelihood decomposable model with a given maximum clique size, k. Unfortunately, the problem of learning a maximum likelihood decomposable model given a...
Article
In recent years, the performance of semisupervised learning (SSL) has been theoretically investigated. However, most of this theoretical development has focused on binary classification problems. In this paper, we take it a step further by extending the work of Castelli and Cover to the multiclass paradigm. In particular, we consider the key proble...
Conference Paper
Standard supervised classification learns a classifier from a set of labeled examples. Alternatively, in the field of weakly supervised classification different frameworks have been presented where the training data cannot be certainly labeled. In this paper, the novel problem of learning from positive-unlabeled proportions is presented. The provid...
Article
Full-text available
Performance assessment of a learning method related to its prediction ability on independent data is extremely important in supervised classification. This process provides the information to evaluate the quality of a classification model and to choose the most appropriate technique to solve the specific supervised classification problem at hand. T...
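One standard way to estimate prediction ability on independent data is stratified k-fold cross-validation. The abstract is truncated, so the sketch below is a generic illustration and not the specific estimators the paper compares. The learner is a hypothetical majority-class baseline, and the function names are assumptions. The code requires k to be no larger than the number of examples.

```python
import random
from collections import Counter, defaultdict

def stratified_folds(y, k, seed=0):
    """Split example indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    pos = 0
    for label in sorted(by_class):        # sorted for reproducibility
        idx = by_class[label]
        rng.shuffle(idx)
        for i in idx:                     # deal indices round-robin
            folds[pos % k].append(i)
            pos += 1
    return folds

def cross_validate(X, y, fit, predict, k=5, seed=0):
    """Mean and per-fold accuracy; each fold is held out once for testing."""
    folds = stratified_folds(y, k, seed)
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / k, scores

# Hypothetical baseline learner: always predicts the training majority class.
def fit_majority(X, y):
    return Counter(y).most_common(1)[0][0]

def predict_majority(model, x):
    return model
```

For example, with 8 examples of class "a" and 2 of class "b", `cross_validate(X, y, fit_majority, predict_majority, k=2)` puts one "b" in each fold and reports a mean accuracy of 0.8.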
Article
Learning from crowds is a classification problem where the provided training instances are labeled by multiple (usually conflicting) annotators. In different scenarios of this problem, straightforward strategies show an astonishing performance. In this paper, we characterize the crowd scenarios where these basic strategies show a good behavior. As...
Article
Wind is one of the parameters best predicted by numerical weather models, as it can be directly calculated from the physical equations of pressure that govern its movement. However, local winds are considerably affected by topography, which global numerical weather models, due to their limited resolution, are not able to reproduce. To improve the s...
Article
The effect of different factors (spawning biomass, environmental conditions) on recruitment is a subject of great importance in the management of fisheries, recovery plans and scenarios exploration. In this study, recently proposed supervised classification techniques, tested by the machine-learning community, are applied to forecast the recruitmen...
Article
A fundamental question in the field of approximation algorithms, for a given problem instance, is the selection of the best (or a suitable) algorithm with regard to some performance criteria. A practical strategy for facing this problem is the application of machine learning techniques. However, limited support has been given in the literature to t...
Article
This paper deals with a classification problem known as learning from label proportions. The provided dataset is composed of unlabeled instances and is divided into disjoint groups. General class information is given within the groups: the proportion of instances of the group that belong to each class. We have developed a method based on the Struct...
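The data setting described here can be sketched as follows. Individual labels are hidden, and each disjoint group keeps only its instances and its class proportions. The paper's own method (based on "Struct...", truncated above) is not reproduced. The proportion-consistency loss below is one simple, assumed way to check how well a candidate labeling of a group agrees with the proportions given for it. All names are hypothetical.

```python
from collections import Counter

def bag_proportions(labels, classes):
    """Proportion of each class within one group (bag)."""
    n = len(labels)
    counts = Counter(labels)
    return {c: counts[c] / n for c in classes}

def make_llp_dataset(X, y, groups, classes):
    """Hide individual labels: keep only each group's instances and
    the class proportions of that group."""
    bags = {}
    for x, label, g in zip(X, y, groups):
        xs, ls = bags.setdefault(g, ([], []))
        xs.append(x)
        ls.append(label)
    return {g: (xs, bag_proportions(ls, classes)) for g, (xs, ls) in bags.items()}

def proportion_loss(predicted_labels, target_props, classes):
    """L1 distance between the proportions induced by a candidate labeling
    and the proportions given for the bag (0 means fully consistent)."""
    induced = bag_proportions(predicted_labels, classes)
    return sum(abs(induced[c] - target_props[c]) for c in classes)
```

With groups `[0, 0, 1, 1]` and hidden labels `["p", "n", "p", "p"]`, group 0 is published only as "half positive". Labeling both of its instances "p" then gives a loss of 1.0, while any labeling with one "p" and one "n" gives 0.0.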
Conference Paper
Full-text available
A multi-species approach to fisheries management requires taking into account the interactions between species in order to improve recruitment forecasting. Recent advances in Bayesian networks enable the learning of models with several interrelated variables to be forecasted simultaneously. These are known as multi-dimensional Bayesian network clas...
Conference Paper
This work presents a multidimensional classifier described in terms of interaction factors called multidimensional k-interaction classifier. The classifier is based on a probabilistic model composed of the product of all the interaction factors of order lower or equal to k and it takes advantage of all the information contained in them. The propose...
Conference Paper
Learning from crowds is a recently fashioned supervised classification framework where the true/real labels of the training instances are not available. However, each instance is provided with a set of noisy class labels, each indicating the class-membership of the instance according to the subjective opinion of an annotator. The additional challen...
Article
Biclustering, one of the emerging techniques for analysing DNA microarray data, is the search for subsets of genes and conditions that are coherently expressed. These subgroups provide clues about the main biological processes. Until now, different approaches to this problem have been proposed. Most of them use the mean...
Article
In the information retrieval framework, there are problems where the goal is to recover objects of a particular class from big sets of unlabelled objects. In some of these problems, only examples from the class we want to recover are available. For such problems, the machine learning community has developed algorithms that are able to learn binary...
Article
Full-text available
Sentiment Analysis is defined as the computational study of opinions, sentiments and emotions expressed in text. Within this broad field, most of the work has been focused on either Sentiment Polarity classification, where a text is classified as having positive or negative sentiment, or Subjectivity classification, in which a text is classified as...
Article
Full-text available
Malignancies arising in the large bowel cause the second largest number of deaths from cancer in the Western World. Despite the progress made during the last decades, colorectal cancer remains one of the most frequent and deadly neoplasias in Western countries. A genomic study of human colorectal cancer has been carried out on a total of 31 tumor...
Conference Paper
This paper deals with the problem of multi-instance learning when label proportions are provided. In this classification problem, the instances of the dataset are divided into disjoint groups, where there is no certainty about the labels associated with individual samples. However, in each group the number of instances that belong to each class is...
Article
Full-text available
Progress is continuously being made in the quest for stable biomarkers linked to complex diseases. Mass spectrometers are one of the devices for tackling this problem. The data profiles they produce are noisy and unstable. In these profiles, biomarkers are detected as signal regions (peaks), where control and disease samples behave differently. Mas...
Article
Full-text available
Improving our ability to predict recruitment is a key element in fisheries management. However, the interactions between population dynamics and different environmental factors are complex and often non-linear, making it difficult to produce robust predictions. ‘Machine-learning’ techniques (in particular, supervised classification methods) have be...
Article
The increase in the number and complexity of biological databases has raised the need for modern and powerful data analysis tools and techniques. In order to fulfill these requirements, the machine learning discipline has become an everyday tool in bio-laboratories. The use of machine learning techniques has been extended to a wide spectrum of bioi...
Conference Paper
Full-text available
The proposed pipeline of supervised classification methods is applied to seven fish species of commercial interest in the Bay of Biscay.
Technical Report
Full-text available
‘Machine-learning’ techniques have been proposed as a useful tool to produce robust predictions. In this WD we apply to anchovy the methodology proposed in Fernandes et al. (2009) to build a robust classifier of recruitments and to make early predictions using climatic indices. The methodology consists of a ‘pipeline’ of state-of-the-art machine-le...
Data
Distribution of TaqMan probes in the TaqMan Low Density Array (www.appliedbiosystem.com) (0.05 MB XLS)
Data
DCT data from the TLDA analysis. The data come from the different comparisons: MS (relapse and remitting) vs controls; relapse (Relap) vs controls; remitting (Remitt) vs controls; and relapse vs remitting (0.32 MB DOC)
Data
Target genes studied with their gene ID, the miRNA that binds to the gene, the group in which these genes are expected to be down-regulated and the Geneglobe Assay code. (0.03 MB DOC)
Data
Summary of the PANTHER software methods (0.03 MB DOC)
Data
Clinical description of the patients. Tev: time of evolution (years). EDSS: Expanded Disability Status Scale. Te: time between relapse onset and blood extraction (days) (0.03 MB DOC)
Data
Complete data from the non-parametric statistical analysis (0.15 MB XLS)
Data
Complete list of the miRNA predicted targets (0.05 MB XLS)
Data
Data from the pathway analysis conducted with PANTHER on the predicted target gene lists of each miRNA. Two different groups of miRNAs were studied: those from the experiment and those from the chance group (0.05 MB DOC)
Article
Full-text available
Microarray-based global gene expression profiling, with the use of sophisticated statistical algorithms is providing new insights into the pathogenesis of autoimmune diseases. We have applied a novel statistical technique for gene selection based on machine learning approaches to analyze microarray expression data gathered from patients with system...
Article
When learning Bayesian network-based classifiers, continuous variables are usually handled by discretization or assumed to follow a Gaussian distribution. This work introduces the kernel-based Bayesian network paradigm for supervised classification. This paradigm is a Bayesian network which estimates the true density of the continuous variab...
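The simplest member of a kernel-based Bayesian network family would use a naive Bayes structure, with each class-conditional density estimated by a kernel density estimator instead of discretization or a single Gaussian. The sketch below makes several assumptions that the truncated abstract does not state: it uses a Gaussian kernel and Silverman's rule-of-thumb bandwidth, and it does not cover the richer network structures the paradigm allows. Names are hypothetical.

```python
import math
from collections import defaultdict

def gaussian_kde(sample, h):
    """Return a 1-D Gaussian kernel density estimate with bandwidth h."""
    norm = 1.0 / (len(sample) * h * math.sqrt(2 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample)
    return density

def silverman_bandwidth(sample):
    """Rule-of-thumb bandwidth 1.06 * sd * n^(-1/5) (assumed choice)."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in sample) / max(n - 1, 1))
    return 1.06 * (sd if sd > 0 else 1.0) * n ** (-1 / 5)

class KernelNaiveBayes:
    """Naive Bayes whose per-feature class-conditional densities are KDEs."""

    def fit(self, X, y):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        n = len(y)
        self.priors = {c: len(rows) / n for c, rows in by_class.items()}
        self.densities = {}
        for c, rows in by_class.items():
            columns = [list(col) for col in zip(*rows)]
            self.densities[c] = [gaussian_kde(col, silverman_bandwidth(col))
                                 for col in columns]
        return self

    def predict(self, x):
        def log_posterior(c):
            # small constant avoids log(0) when a density underflows
            return math.log(self.priors[c]) + sum(
                math.log(d(v) + 1e-300) for d, v in zip(self.densities[c], x))
        return max(self.priors, key=log_posterior)
```

Because each density is a sum of kernels centred on the training values, the model can represent skewed or multimodal features that a single Gaussian would smooth away.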
Article
Full-text available
Differences in gene expression patterns have been documented not only in Multiple Sclerosis patients versus healthy controls but also in the relapse of the disease. Recently a new gene expression modulator has been identified: the microRNA or miRNA. The aim of this work is to analyze the possible role of miRNAs in multiple sclerosis, focusing on th...
Article
Full-text available
Evolutionary search algorithms have become an essential asset in the algorithmic toolbox for solving high-dimensional optimization problems across a broad range of bioinformatics problems. Genetic algorithms, the most well-known and representative evolutionary search technique, have been the subject of most such applications. Estima...
Article
Full-text available
Zooplankton biomass and abundance estimation, based on surveys or time-series, is carried out routinely. Automated or semi-automated image analysis processes, combined with machine-learning techniques for the identification of plankton, have been proposed to assist in sample analysis. A difficulty in automated plankton recognition and classificatio...
Article
The main purpose of a gene interaction network is to map the relationships of the genes that are out of sight when a genomic study is tackled. DNA microarrays allow the expression of thousands of genes to be measured at the same time. These data constitute the numeric seed for the induction of the gene networks. In this paper, we propose a new app...
Technical Report
Full-text available
10.1 Introduction. In this chapter, the paradigm known as the classification tree is presented. Based on a recursive partitioning of the domain of the predictor variables, it allows knowledge about the problem to be represented by means of a tree structure. The paradigm presented in this chapter...
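Recursive partitioning can be illustrated with a minimal CART-style tree. Each node picks the binary threshold split on one predictor that most reduces Gini impurity, recurses on the two halves, and turns into a majority-class leaf when no split helps or a depth limit is reached. The splitting criterion, the depth limit and all names are assumptions for illustration. The chapter's own criterion is not shown in this excerpt. Leaves are class labels and internal nodes are tuples, so the sketch assumes labels are not themselves tuples.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())

def best_split(X, y):
    """Return (feature, threshold) minimising weighted Gini, or None."""
    best, best_score, n = None, gini(y), len(y)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(X, y, depth=0, max_depth=3):
    """Recursively partition the predictor space into a binary tree."""
    split = best_split(X, y) if depth < max_depth else None
    if split is None:
        return Counter(y).most_common(1)[0][0]      # leaf: majority class
    f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
            build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth))

def predict(tree, x):
    """Follow threshold tests from the root down to a leaf label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree
```

On `X = [[1], [2], [3], [10], [11], [12]]` with labels `a, a, a, b, b, b`, the tree is the single split `(0, 3, "a", "b")`: feature 0 at most 3 gives "a", otherwise "b".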
Article
Full-text available
Limb-girdle muscular dystrophy type 2A (LGMD2A) is a recessive genetic disorder caused by mutations in calpain 3 (CAPN3). Calpain 3 plays different roles in muscular cells, but little is known about its functions or in vivo substrates. The aim of this study was to identify the genes showing an altered expression in LGMD2A patients and the possible...
Chapter
Within the wide field of classification in the machine learning discipline, Bayesian classifiers are very well-established paradigms. They allow the user to work with probabilistic processes as well as with graphical representations of the relationships among the variables of a problem.
Article
Full-text available
Feature selection techniques have become an apparent need in many bioinformatics applications. In addition to the large pool of techniques that have already been developed in the machine learning and data mining fields, specific applications in bioinformatics have led to a wealth of newly proposed techniques. In this article, we make the interested...
Article
We present a supervised wrapper approach to discretization. In contrast to many classical approaches, the discretization process is multivariate: all variables are discretized simultaneously, and the proposed discretization is evaluated with the Naive-Bayes classifier. The search for the optimal discretization is carried out as an optimization proc...
Conference Paper
Full-text available
This work shows, using bivariate continuous artificial domains, the relation that seems to exist between some information-theoretic measures and the expected classification error. The relations apparently found in this work could be applied to improve classifiers that assign a posteriori probabilities to each class v...
Article
Most Bayesian network-based classifiers can only handle discrete variables. However, most real-world domains involve continuous variables. A common practice to deal with continuous variables is to discretize them, with a subsequent loss of information. This work shows how discrete classifier induction algorithms can be adapte...
Article
This article reviews machine learning methods for bioinformatics. It presents modelling methods, such as supervised classification, clustering and probabilistic graphical models for knowledge discovery, as well as deterministic and stochastic heuristics for optimization. Applications in genomics, proteomics, systems biology, evolution and text mini...
Article
This is a nicely edited volume on Estimation of Distribution Algorithms (EDAs) by leading researchers on this important topic. It covers a wide range of topics in EDAs, from theoretical analysis to experimental studies, from single objective to multi-objective optimisation, and from parallel EDAs to hybrid EDAs. It is a very useful book for everyon...
Book
This is a nicely edited volume on Estimation of Distribution Algorithms (EDAs) by leading researchers on this important topic. It covers a wide range of topics in EDAs, from theoretical analysis to experimental studies, from single objective to multi-objective optimisation, and from parallel EDAs to hybrid EDAs. It is a very useful book for everyon...
Article
The transjugular intrahepatic portosystemic shunt (TIPS) is a treatment for cirrhotic patients with portal hypertension. A subgroup of patients die within the first 6 months, while another subgroup survives for a long period. Currently, no risk factors have been identified to determine how long a patient will survive. An empirical study for pred...
Chapter
Introduction; Genetic Networks; Probabilistic Graphical Models; Inferring Genetic Networks by Means of Probabilistic Graphical Models; Conclusions; Acknowledgements; References