Jesús S. Aguilar-Ruiz

Pablo de Olavide University | UPO · School of Engineering

Data Analytics Science & Engineering
Data Analytics Consulting for Global Businesses

About

255
Publications
104,293
Reads
4,408
Citations
Introduction
Jesús S. Aguilar-Ruiz is a Full Professor of Data Analytics at the Pablo de Olavide University, Seville, Spain. For more than two decades, his teaching and research have focused on Data Mining and Machine Learning. He has been involved in R&D projects on Data Analytics Engineering with several companies. Founding Editor-in-Chief of the BioData Mining journal (2008-14). Founding Dean of the School of Engineering, Pablo de Olavide University (2005-15).
Additional affiliations
September 2005 - December 2015
Pablo de Olavide University
Position
  • Head of Faculty
May 2005 - September 2005
University of Massachusetts Boston
Position
  • Invited Researcher
June 2004 - September 2004
University of Massachusetts Boston
Position
  • Invited Researcher
Education
November 2009 - December 2009
National Institute of Genetics, Japan
Field of study
  • Bioinformatics
February 2008 - February 2008
University of Bologna
Field of study
  • Bioinformatics
May 2005 - September 2005
University of Massachusetts, Boston
Field of study
  • Data Mining

Publications (255)
Article
Full-text available
The application of information encoded in molecular networks for prognostic purposes is a crucial objective of systems biomedicine. This approach has not been widely investigated in the cardiovascular research area. Within this area, the prediction of clinical outcomes after suffering a heart attack would represent a significant step forward. We de...
Conference Paper
The most widespread biclustering algorithms use the Mean Squared Residue (MSR) as the measure for assessing the quality of biclusters. MSR can correctly identify shifting patterns, but fails at discovering biclusters presenting scaling patterns. Virtual Error (VE) is a measure which improves the performance of MSR in this sense, since it is effective a...
Article
Full-text available
The problem of protein structure prediction (PSP) is one of the main challenges in structural bioinformatics. To tackle this problem, PSP can be divided into several subproblems. One of these subproblems is the prediction of disulfide bonds. The disulfide connectivity prediction problem consists in identifying which nonadjacent cysteines would be c...
Article
Full-text available
The evaluation of diagnostic systems is pivotal for ensuring the deployment of high-quality solutions, especially given the pronounced context-sensitivity of certain systems, particularly in fields such as biomedicine. Of notable importance are predictive models where the target variable can encompass multiple values (multiclass), especially when t...
Conference Paper
Biclustering is a powerful tool for analyzing gene expression time series, providing the ability to simultaneously explore both gene and condition dimensions. Unlike traditional clustering methods, which are limited to one dimension, biclustering uncovers local patterns of co-expression, making it particularly well-suited for the analysis of dynami...
Article
Full-text available
Feature selection techniques aim at finding a relevant subset of features that perform equally or better than the original set of features at explaining the behavior of data. Typically, features are extracted from feature ranking or subset selection techniques, and the performance is measured by classification or regression tasks. However, while se...
Preprint
Full-text available
Evaluating the performance of classifiers is critical in machine learning, particularly in high-stakes applications where the reliability of predictions can significantly impact decision-making. Traditional performance measures, such as accuracy and F-score, often fail to account for the uncertainty inherent in classifier predictions, leading to po...
Preprint
Full-text available
Feature Selection techniques aim at finding a relevant subset of features that perform equally or better than the original set of features at explaining the behavior of data. Typically, features are extracted from feature ranking or subset selection techniques, and the performance is measured by classification or regression tasks. However, while se...
Preprint
Full-text available
In today's data-intensive landscape, where high-dimensional datasets are increasingly common, reducing the number of input features is essential to prevent overfitting and improve model accuracy. Despite numerous efforts to tackle dimensionality reduction, most approaches apply a universal set of features across all classes, potentially missing the...
Article
The Multiclass Classification Performance (MCP) curve is an innovative method to visualize the performance of a classifier for multiclass datasets. On the other hand, the Imbalanced Multiclass Classification Performance (IMCP) curve is a novel approach to visualizing classifier performance on multiclass datasets that exhibit class imbalance, i.e. t...
Conference Paper
Full-text available
This paper introduces a machine learning method, Neural Network Ensemble (NNE), which combines ensemble learning principles with neural networks for classification tasks, particularly in the context of gene expression analysis. While the concept of weak learnability equalling strong learnability has been previously discussed, NNE’s unique features,...
Article
Walkability principles are an important part in the planning process of cities that face urban problems such as gentrification, pollution, and decay of their built heritage. The proposed factors – connectivity, proximity, land use mix, and retail density – form a comprehensive framework for evaluating walkability that transcends the boundaries of h...
Article
Full-text available
The COVID-19 pandemic has had a profound impact on various aspects of our lives, affecting personal, occupational, economic, and social spheres. Much has been learned since the early 2020s, which will be very useful when the next pandemic emerges. In general, mobility and virus spread are strongly related. However, most studies analyze the impact o...
Article
Full-text available
Two-dimensional data analysis can have several goals: one may be interested in searching for groups of similar objects (clustering), another may focus on searching for dependencies between a specified variable and other variables (classification, regression, association rule induction), and finally, some may be interested in searchin...
Chapter
In this work, we discuss the most determinant variables in predicting the first-grade university courses (period 2021-I) of students pursuing a degree in the Faculty of Engineering at the Jorge Basadre Grohmann National University in times of pandemic. We used Machine Learning algorithms to determine the relations between high school grades, stud...
Article
Full-text available
The human brain works in such a complex way that we have not yet managed to decipher its functional mysteries. It has five main channels that act as information input: the senses. Sight, hearing, taste, smell, and touch generate information that flows from their corresponding receptors, i.e., the eyes, ears, tongue, nose, and skin, that help us und...
Article
Full-text available
Since the envisioning of the concept of Artificial Intelligence in the 1950s, the interest in making machines emulate human behavior has increased, scientific dedication has grown, and, consequently, new concepts have appeared, with unequal success [...]
Article
Full-text available
The Internet generates large volumes of data at a high rate, in particular, posts on social networks. Although social network data have numerous semantic adulterations and are not intended to be a source of geo-spatial information, in the text of posts we find pieces of important information about how people relate to their environment, which can b...
Article
Full-text available
Quality of predictive models is a critical factor. Many evaluation measures have been proposed for binary and multi-class datasets. However, less attention has been paid to graphical representation of the classification performance, where the ROC curve is extensively used for binary datasets but there is no standard method accepted by the scientifi...
Preprint
Full-text available
The Internet generates large volumes of data at a high rate, in particular, posts on social networks. Although social network data has numerous semantic adulterations, and is not intended to be a source of geo-spatial information, in the text of posts we find pieces of important information about how people relate to their environment, which can be...
Preprint
Full-text available
Biclustering is a powerful approach to search for patterns in data, as it can be driven by a function that measures the quality of diverse types of patterns of interest. However, due to its computational complexity, the exploration of the search space is usually guided by an algorithmic strategy, sometimes introducing random factors that simplify t...
Article
Full-text available
Background: Biclustering algorithms search for groups of genes that share the same behavior under a subset of samples in gene expression data. Nowadays, the biological knowledge available in public repositories can be used to drive these algorithms to find biclusters composed of functionally coherent groups of genes. On the other hand, a distance am...
Article
Full-text available
Background: Biological networks are used to represent interactions involving genes, DNA, RNA and proteins that are able to manipulate many cellular processes. Objective: The aim of this study is to evaluate whether prior knowledge can improve the quality of biological networks, in particular protein-protein interaction networks and gene regulatory...
Article
Full-text available
The prediction of protein structures is a current issue of great significance in structural bioinformatics. More specifically, the prediction of the tertiary structure of a protein consists in determining its three-dimensional conformation based solely on its amino acid sequence. This study proposes a method in which protein fragments are assembl...
Conference Paper
Biclustering is an unsupervised machine learning technique that simultaneously clusters genes and conditions in gene expression data. Gene Ontology (GO) is usually used in this context to validate the biological relevance of the results. However, although the integration of biological information from different sources is one of the research direct...
Conference Paper
In this work, we have extended the experimental analysis about an encoding approach for evolutionary-based algorithms proposed in [1], called probabilistic encoding. The potential of this encoding for complex problems is huge, as candidate solutions represent regions, instead of points, of the search space. We have tested in the context of gene exp...
Conference Paper
Full-text available
Time series are an important type of complex data. Because of their high dimensionality, proven methods from the fields of data mining and pattern recognition are not suitable for handling this type of data. As a result, several time series representations have been developed that are able to achieve a...
Conference Paper
Full-text available
The cultivation and production of tobacco (Nicotiana tabacum), especially in tropical countries, is highly dependent on pesticides. However, pesticide application is often ineffective and is dangerous to humans and the environment. Moreover, it is well known that some secondary metabolites play an e...
Conference Paper
Full-text available
Abstract. Protein structure prediction is one of the most important research areas in bioinformatics. The stability of protein folding is known to be associated with the interaction between amino acids and their neighborhood. Several techniques have been used to improve the prediction of these interactions between resid...
Article
The Regression Network plugin for Cytoscape (RegNetC) implements the RegNet algorithm for the inference of transcriptional association network from gene expression profiles. This algorithm is a model tree-based method to detect the relationship between each gene and the remaining genes simultaneously instead of analyzing individually each pair of g...
Article
In this paper, we introduce FoDT, a new algorithm for the prediction of protein contact maps, one of the great challenges of bioinformatics. The need for more accurate predictions motivates combining classifiers, despite the increase in complexity. The proposed methodology can be considered as a set of cooperative classifiers, which employs a not traina...
Article
Full-text available
Biclustering has become a popular technique for the study of gene expression data, especially for discovering functionally related gene sets under different subsets of experimental conditions. Most biclustering approaches use a measure or cost function that determines the quality of biclusters. In such cases, the development of both a suitable h...
Article
A variety of approaches for protein inter-residue contact prediction have been developed in recent years. However, this problem is far from being solved yet. In this article, we present an efficient nearest neighbor (NN) approach, called PKK-PCP, and an application for the protein inter-residue contact prediction. The great strength of using this a...
Article
Full-text available
Gene expression data analysis is based on the assumption that co-expressed genes imply co-regulated genes. This assumption is being reformulated because the co-expression of a group of genes may be the result of an independent activation with respect to the same experimental condition and not due to the same regulatory regime. For this reason, trad...
Article
Full-text available
A noticeable number of biclustering approaches have been proposed for the study of gene expression data, especially for discovering functionally related gene sets under different subsets of experimental conditions. In this context, recognizing groups of co-expressed or co-regulated genes, that is, genes which follow a similar expression p...
Article
The prediction of contact maps in protein is a challenging topic for the determination of three-dimensional protein structures. In this paper, we introduce Forest of Decision Trees, a methodology for the prediction of protein contact maps based on (1) a divide-and-conquer approach to analyze the prediction problem; (2) a codification vector that co...
Article
The prediction of protein structures remains a challenge for the scientific community. To construct adequate models, several authors have explored complex heuristics such as recurrent neural networks, multilayer support vector machines, bio-inspired algorithms or the combination of classifiers, but all these efforts are not en...
Conference Paper
This work proposes an improvement of the multi-objective evolutionary method for the protein residue-residue contact prediction called MECoMaP. This method bases its prediction on physico-chemical properties of amino acids, structural features and evolutionary information of the proteins. The evolutionary algorithm produces a set of decision rules...
Conference Paper
Full-text available
Protein structure prediction remains one of the most important challenges in molecular biology. Contact maps have been extensively used as a simplified representation of protein structures. In this work, we propose a multi-objective evolutionary approach for contact map prediction. The proposed method bases the prediction on a set of physico-chemic...
Article
Full-text available
Background: Biclustering algorithms for microarray data aim at discovering functionally related gene sets under different subsets of experimental conditions. Due to the problem complexity and the characteristics of microarray datasets, heuristic searches are usually used instead of exhaustive algorithms. Also, the comparison among different techniqu...
Article
Full-text available
Gene networks (GNs) have become one of the most important approaches for modelling gene-gene relationships in Bioinformatics (Hecker et al., 2009). These networks allow us to carry out studies of different biological processes in a visual way. Many GN inference algorithms have been developed as techniques for extracting biological knowledge (Ponzoni...
Conference Paper
Mining data streams has attracted the attention of the scientific community in recent years with the development of new algorithms for processing and sorting data in this area. Incremental learning techniques have been used extensively in these issues. A major challenge posed by data streams is that their underlying concepts can change over time. T...
Article
Full-text available
Protein structure prediction is currently one of the main open challenges in Bioinformatics. The protein contact map is a useful and commonly used representation of protein 3D structure and represents binary proximities (contact or non-contact) between each pair of amino acids of a protein. In this work, we propose a multi-objective evolutionar...
Article
We address the feature subset selection problem for classification tasks. We examine the performance of two hybrid strategies that directly search on a ranked list of features and compare them with two widely used algorithms, the fast correlation based filter (FCBF) and sequential forward selection (SFS). The proposed hybrid approaches provide the...
Article
Full-text available
The prediction of a protein's contact map has become, in recent years, a crucial stepping stone for the prediction of the complete 3D structure of a protein. In this article, we describe a methodology for this problem that was shown to be successful in CASP8 and CASP9. The methodology is based on (i) the fusion of the prediction of a variety of stru...
Conference Paper
Protein contact map prediction is one of the most important intermediate steps of the protein folding prediction problem. In this research we want to know how a decision tree predictor based on short-range interactions can learn the correlation among the covalent structures of a protein's residues. The proposed solution predicts protein contact maps...
Article
Data mining methods in software engineering are becoming increasingly important as they can support several aspects of the software development life-cycle such as quality. In this work, we present a data mining approach to induce rules extracted from static software metrics characterising fault-prone modules. Due to the special characteristics of t...
Conference Paper
We present a multi-objective evolutionary approach to predict protein contact maps. The algorithm provides a set of rules, inferring whether there is contact between a pair of residues or not. Such rules are based on a set of specific amino acid properties. These properties determine the particular features of each amino acid represented in the rul...
Conference Paper
Full-text available
Protein structure prediction consists in determining the three-dimensional conformation of a protein based only on its amino acid sequence. This is currently a difficult and significant challenge in structural bioinformatics because these structures are necessary for drug design. This work proposes a method that reconstructs protein structures...
Conference Paper
Full-text available
In this paper, we focus on protein contact map prediction, one of the most important intermediate steps of the protein folding problem. The objective of this research is to determine how short-range interactions can help a system based on decision trees learn about the correlation among the covalent structures of a protein's residues. We propo...
Article
Full-text available
With the arrival of the European Higher Education Area (EHEA), teaching strategies must change to focus on student learning, making students active participants in their own learning and encouraging their participation so that they feel an active part of the learning process. In the ISG2 course...
Article
Full-text available
ABSTRACT The aim of this article is to present three practical cases, in the context of three courses of the Technical Engineering degree in Management Informatics at Pablo de Olavide University, in which students' autonomous work has been the tool used to address the problems caused by the reduction of ho...
Article
Full-text available
In this paper we propose an approach based on evolutionary computation for the prediction of secondary protein structure motifs. The prediction model consists of a set of rules that predict both the beginning and the end of the regions corresponding to a secondary structure state conformation (α-helix or β-strand). The prediction is based on a set...
Conference Paper
In this paper, we propose a greedy clustering algorithm to identify groups of related genes and a new measure to improve the results of this algorithm. Clustering algorithms analyze genes in order to group those with similar behavior. Instead, our approach groups pairs of genes that present similar positive and/or negative interactions. In order to...
Conference Paper
Full-text available
The Protein Structure Prediction (PSP) problem consists of predicting the structure of a protein from its amino acid sequence, and has received much attention lately. In fact, being able to predict the structure of a protein would make it possible to determine the function of the protein. In this paper, we propose a multi-objective evolutionary algorithm for th...
Article
The identification of regulatory modules is one of the most important tasks in order to discover disease markers. This paper presents a methodology to infer coexpression networks based on local patterns in gene expression data matrix. In the proposed algorithm two steps can clearly be differentiated. Firstly, a Biclustering procedure that uses a Sc...
Article
The majority of the biclustering approaches for microarray data analysis use the Mean Squared Residue (MSR) as the main evaluation measure for guiding the heuristic. MSR has been proven to be ineffective at recognizing several kinds of interesting patterns for biclusters. Transposed Virtual Error (VEt) has recently been discovered to overcome MSR draw...
Conference Paper
Knowledge extraction from gene expression data has been one of the main challenges in the bioinformatics field during the last few years. In this context, a particular kind of data, data retrieved on a temporal basis (also known as time series), provides information about the way a gene can be expressed over time. This work presents an exhaustive...
Conference Paper
Scatter Search is a population-based metaheuristic that emphasizes systematic processes over random procedures. A local search procedure is added to a Scatter Search for Biclustering in order to improve the quality of biclusters. This local search constitutes the existing Improvement Method in most Scatter Search schemes, which intensifies the...
Conference Paper
In the last few years, DNA microarray technology has attained a very important role in biological and biomedical research. It enables analyzing the relations among thousands of genes simultaneously, generating huge amounts of data. The gene networks represent, in a graph data structure, genes or gene products and the functional relationships betwee...
Article
Full-text available
The great amount of biological information provides scientists with an incomparable framework for testing the results of new algorithms. Several tools have been developed for analysing gene-enrichment and most of them are Gene Ontology-based tools. We developed a Kyoto Encyclopedia of Genes and Genomes (KEGG)-based tool that provides a friendly gra...
Article
Full-text available
The prediction of protein structures is a current issue of great significance in structural bioinformatics. More specifically, the prediction of the tertiary structure of a protein consists in determining its three-dimensional conformation based solely on its amino acid sequence. This study proposes a method in which protein fragments are assembled...
Article
Full-text available
This paper presents a new approach to forecast the behavior of time series based on similarity of pattern sequences. First, clustering techniques are used with the aim of grouping and labeling the samples from a data set. Thus, the prediction of a data point is provided as follows: first, the pattern sequence prior to the day to be predicted is ext...
Article
The forecasting process of real-world time series has to deal with especially unexpected values, commonly known as outliers. Outliers in time series can lead to unreliable modeling and poor forecasts. Therefore, the identification of future outlier occurrence is an essential task in time series analysis to reduce the average forecasting error. The...
Article
Full-text available
The Gene Ontology (GO) provides a controlled vocabulary for describing the functions of genes and can be used to evaluate the functional coherence of gene sets. Many functional coherence measures consider each pair of gene functions in a set and produce an output based on all pairwise distances. A single gene can encode multiple proteins that may d...
Article
Full-text available
Binary datasets represent a compact and simple way to store data about the relationships between a group of objects and their possible properties. In the last few years, different biclustering algorithms have been specially developed to be applied to binary datasets. Several approaches based on matrix factorization, suffix trees or divide-and-conqu...
Conference Paper
Full-text available
In this paper we propose a novel representation scheme, called probabilistic encoding. In this representation, each gene of an individual represents the probability that a certain trait of a given problem belongs to the solution. This makes it possible to deal with uncertainty that can be present in an optimization problem, and grant more exploration ca...
Conference Paper
Mining data streams is a field of study that poses new challenges. This research studies the application of different data stream classification techniques and carries out a comparative analysis with a proposal based on similarity, introducing a new form of management of representative data models and policies of insertion and remova...