Sebastian Ventura

University of Cordoba (Spain) | UCO · Department of Computer Sciences and Numerical Analysis

Ph.D.

371
Publications
245,336
Reads
16,662
Citations
Citations since 2016
166 Research Items
11550 Citations
[Chart: citations per year, 2016 to 2022; vertical axis from 0 to 1,500]
Introduction
Sebastián Ventura is a Professor of Computing and Artificial Intelligence at the University of Córdoba. His teaching is devoted to machine learning and artificial intelligence. He heads the "Knowledge Discovery and Intelligent Systems" (KDIS) research group, where his research focuses on data science, data analytics, big data, machine learning, data mining, and their applications.
Additional affiliations
April 2016 - present
University of Cordoba (Spain)
Position
  • Professor (Full)
January 2015 - December 2018
Ministry of Economy and Competitiveness, Spain
Position
  • More Flexible Representations in Data Mining
April 2013 - present
King Abdulaziz University
Position
  • Distinguished Adjunct Professor
Education
September 1993 - July 1996

Publications (371)
Article
Recently, Convolutional Neural Networks have achieved performance levels similar to those achieved by dermatologists. However, the diagnosis of melanoma remains a challenging task, mainly due to the high inter and intra-class variability in images of moles. This paper introduces a new framework to improve the state-of-the-art effective melanoma dia...
Article
Full-text available
In the airline industry, the Revenue and Pricing teams generally spend a considerable amount of time analysing and interpreting the actions of their competitors. Most of the time the analysts have to use their analytical skills to create ad-hoc methods to interpret or find patterns in the fares. In this fi...
Chapter
Students’ performance prediction is one of the essential educational data mining research fields. Predicting students’ performance aims at improving the learning process inside educational institutions. This is achieved by early prediction of at-risk students who are vulnerable to drop out to help them and improve their performance sooner. Therefor...
Conference Paper
Full-text available
In this work, we consider the parameters obtained from the sample analyses carried out by the Army Central Laboratory (Laboratorio Central de Ejército, LCE), whose purpose is to determine the fitness for service of the lubricating oils and hydraulic fluids used in the platforms of the Spanish Army (Ejército de Tierra). Based on these, a study is carried out...
Preprint
Full-text available
In this paper we explore capabilities of spiking neural networks in solving multi-task classification problems using the approach of single-tasking of multiple tasks. We designed and implemented a multi-task spiking neural network (MT-SNN) that can learn two or more classification tasks while performing one task at a time. The task to perform is se...
Article
Full-text available
The multi-label classification task has been widely used to solve problems where each of the instances may be related not only to one class but to many of them simultaneously. Many of these problems usually comprise a high number of labels in the output space, so learning a predictive model from such datasets may turn into a challenging task since...
Article
Full-text available
To provide a good study plan is key to avoid students’ failure. Academic advising based on student’s preferences, complexity of the semester, or even background knowledge is usually considered to reduce the dropout rate. This article aims to provide a good course index to recommend courses to students based on the sequence of courses already taken...
Article
Dysregulation of the splicing machinery is emerging as a hallmark in cancer due to its association with multiple dysfunctions in tumor cells. Inappropriate function of this machinery can generate tumor-driving splicing variants and trigger oncogenic actions. However, its role in pancreatic neuroendocrine tumors (PanNETs) is poorly defined. In this...
Article
Full-text available
Predictive maintenance is a field of study whose main objective is to optimize the timing and type of maintenance to perform on various industrial systems. This aim involves maximizing the availability time of the monitored system and minimizing the number of resources used in maintenance. Predictive maintenance is currently undergoing a revolution...
Article
Full-text available
Students’ engagements reflect their level of involvement in an ongoing learning process which can be estimated through their interactions with a computer-based learning or assessment system. A pre-requirement for stimulating student engagement lies in the capability to have an approximate representation model for comprehending students’ varied (dis...
Article
Full-text available
Peer evaluation consists of the evaluation of students by their peers following criteria or rubrics provided by the teacher, where the way to evaluate students is specified so that they achieve the desired competencies. The quality of the measurement instrument must meet two essential criteria: validity and reliability. In this research, we explore...
Chapter
Knowledge discovery is a complex process involving several phases. Some of them are repetitive and time-consuming, so they are good candidates for automation. As an example, the large number of machine learning algorithms, together with their hyper-parameters, constitutes a vast search space to explore. In this vein, the term AutoML was coined to e...
Preprint
Full-text available
Dysregulation of the splicing machinery is emerging as a hallmark in cancer due to its association with multiple dysfunctions in tumor cells. Inappropriate function of this machinery can generate tumor-driving splicing variants and trigger oncogenic actions. However, its role in pancreatic neuroendocrine tumors (PanNETs) is poorly defined. In this...
Article
Applying data mining for improving the outcomes of the educational process has become one of the most significant areas of research. The most important cornerstone of the educational process is students' performance. Therefore, early prediction of students' performance aims to assist at-risk students by providing appropriate and early support and...
Article
Full-text available
In this paper, we applied a peer assessment scenario at the Technical University of Manabí (Ecuador). Students and professors evaluated some works through rubrics, assigned a numerical score, and provided textual feedback grounding why such a numerical score was determined, to detect inaccuracy between both assessments. The proposed model uses soft...
Article
Full-text available
Background: Pancreatic ductal adenocarcinoma (PDAC) is a highly lethal cancer, requiring novel treatments to target both cancer cells and cancer stem cells (CSCs). Altered splicing is emerging as both a novel cancer hallmark and an attractive therapeutic target. The core splicing factor SF3B1 is heavily altered in cancer and can be inhibited by Plad...
Article
Full-text available
Melanoma is one of the main causes of cancer-related deaths. The development of new computational methods as an important tool for assisting doctors can lead to early diagnosis and effectively reduce mortality. In this work, we propose a convolutional neural network architecture for melanoma diagnosis inspired by ensemble learning and genetic algor...
Article
Full-text available
Skin cancer is one of the most common types of cancer in the world, with melanoma being the most lethal form. Automatic melanoma diagnosis from skin images has recently gained attention within the machine learning community, due to the complexity involved. In the past few years, convolutional neural network models have been commonly used to approach th...
Conference Paper
Full-text available
Abstract: The clustering task consists of finding the best grouping of patterns according to a criterion of similarity or dissimilarity among them. The aim is thus for the patterns within a cluster to be very similar to one another and dissimilar from those in other clusters. Defining the similarity criterion between patterns is...
Article
Full-text available
Multi-label classification has been used to solve a wide range of problems where each example in the dataset may be related either to one class (as in traditional classification problems) or to several class labels at the same time. Many ensemble-based approaches have been proposed in the literature, aiming to improve the performance of traditional...
Preprint
Full-text available
Background: Pancreatic ductal adenocarcinoma (PDAC) remains an appallingly lethal cancer, requiring novel treatments to target both cancer cells and cancer stem cells (CSCs). Altered splicing is emerging as a novel cancer hallmark and attractive therapeutic target. The core splicing factor SF3B1 is heavily altered in cancer and can be inhibited by P...
Article
Full-text available
This paper presents an approach based on emerging pattern mining to analyse cancer through genomic data. Unlike existing approaches, mainly focused on predictive purposes, the proposed approach aims to improve the understanding of cancer in a descriptive way, not requiring either any prior knowledge or hypothesis to be validated. Additionally, it e...
Article
The propositionalization process tries to find distinctive features of the examples in a database to transform such relational data into a simpler representation. More informative features have a positive impact on the classification capabilities of the learning algorithms. In this work, we propose a new propositionalization method, which generates...
Article
Full-text available
In this paper we present a Competitive Rate-Based Algorithm (CRBA) that approximates operation of a Competitive Spiking Neural Network (CSNN). CRBA is based on modeling of the competition between neurons during a sample presentation, which can be reduced to ranking of the neurons based on a dot product operation and the use of a discrete Expectatio...
Article
Full-text available
BACKGROUND: The dataset from genes used for the prediction of HCV outcome was evaluated in a previous study by means of conventional statistical methodology. OBJECTIVE: The aim of this study was to reanalyze this same dataset using the data mining approach in order to find models that improve the classification accuracy of the genes studied. METHO...
Article
Full-text available
The multi-target regression problem comprises the prediction of multiple continuous variables given a common set of input features, unlike traditional regression tasks, where just one output target is available. There are two major challenges when addressing this problem, namely the exploration of the inter-target dependencies and the modeling of compl...
Article
Full-text available
The current state of the art in supervised descriptive pattern mining is very good in automatically finding subsets of the dataset at hand that are exceptional in some sense. The most common form, subgroup discovery, generally finds subgroups where a single target variable has an unusual distribution. Exceptional model mining (EMM) typically finds...
Article
Full-text available
Periodic frequent patterns are sets of events or items that periodically appear in a sequence of events or transactions. Many algorithms have been designed to identify periodic frequent patterns in data. However, most assume that the periodic behavior of a pattern does not change much over time. To address this limitation, this paper proposes to di...
Article
Melanoma is the type of skin cancer with the highest levels of mortality, and it is more dangerous because it can spread to other parts of the body if not caught and treated early. Melanoma diagnosis is a complex task, even for expert dermatologists, mainly due to the great variety of morphologies in moles of patients. Accordingly, the automatic di...
Chapter
Full-text available
Many algorithms have been proposed to find high utility itemsets (sets of items that yield a high profit) in customer transactions. Though this is useful for analyzing customer behavior, it ignores information about item categories. To consider a product taxonomy and find high utility itemsets describing relationships between items and categories, the...
Article
Deregulated splicing machinery components have been shown to be associated with the development of several types of cancer and, therefore, the determination of such alterations can help the development of tumor-specific molecular targets for early prognosis and therapy. Determining such splicing components, however, is not a straightforward task mainly...
Article
Full-text available
The growing demand for eliciting useful knowledge from data calls for techniques that can discover insights (in the form of patterns) that users need. Methodologies for describing intrinsic and relevant properties of data through the extraction of useful patterns, however, work on fixed input data, and the data representation, therefore, constrains...
Article
Full-text available
Glioblastomas remain the deadliest brain tumour, with a dismal ∼12-16-month survival from diagnosis. Therefore, identification of new diagnostic, prognostic and therapeutic tools to tackle glioblastomas is urgently needed. Emerging evidence indicates that the cellular machinery controlling the splicing process (spliceosome) is altered in tumours, l...
Article
Full-text available
Multi-label learning is a challenging task demanding scalable methods for large-scale data. Feature selection has been shown to improve multi-label accuracy while defying the curse of dimensionality of high-dimensional scattered data. However, the increasing complexity of multi-label feature selection, especially on continuous features, requires new app...
Chapter
Recent work on spiking neural networks showed good progress towards unsupervised feature learning. In particular, networks called Competitive Spiking Neural Networks (CSNN) achieve reasonable accuracy in classification tasks. However, two major disadvantages limit their practical applications: high computational complexity and slow convergence. Whi...
Article
We argue that classic citation-based scientific document clustering approaches, like co-citation or bibliographic coupling, fail to leverage the social usage of scientific literature that originates through online information dissemination platforms, such as Twitter. In this paper, we present the methodology Tweet Coupling, which measures the similar...
Article
Full-text available
This survey is an updated and improved version of the previous one published in 2013 in this journal with the title “data mining in education”. It reviews in a comprehensible and very general way how Educational Data Mining and Learning Analytics have been applied over educational data. In the last decade, this research area has evolved enormously...
Article
Full-text available
Multi-view learning analyses the information from several perspectives and has largely been applied to semi-supervised contexts. It has not been extensively analyzed for inducing interpretable rule-based classifiers. We present a multi-view and grammar-based genetic programming model for inducing rules for semi-supervised contexts. It evolves sever...
Article
Existing systems to support the decision-making process based on textual information in clinical reports are insufficient. Currently, there are few systems that unify different subtasks in a single, user-friendly framework, thereby easing clinical work by automating complex and arduous tasks such as the detection of clinical alerts as well as...
Preprint
BACKGROUND: The dataset from genes used for the prediction of HCV outcome was evaluated in a previous study by means of conventional statistical methodology. OBJECTIVE: The aim of this study was to reanalyze this same dataset using the data mining approach in order to find models that improve the classification accuracy of the genes studied. METHODS: W...
Article
Background: The dataset from genes used to predict hepatitis C virus outcome was evaluated in a previous study using a conventional statistical methodology. Objective: The aim of this study was to reanalyze this same dataset using the data mining approach in order to find models that improve the classification accuracy of the genes studied. Methods...
Article
Full-text available
The multi-label classification task has gained a lot of attention in the last decade thanks to its good application to many real-world problems where each object could be attached to several labels simultaneously. Several approaches based on ensembles for multi-label classification have been proposed in the literature; however, the vast majority ar...
Article
Colorectal cancer affects many people and is one of the most frequent causes of cancer-related deaths in many countries. Professionals of the Reina Sofia University Hospital have fed a database about this pathology, with 1516 patients and 126 attributes, for more than 10 years. Finding useful knowledge therein has proven to be a difficult endeavor....
Article
Full-text available
Multi-label learning generalizes traditional learning by allowing an instance to belong to multiple labels simultaneously. This causes multi-label data to be characterized by its large label space dimensionality and the dependencies among labels. These challenges have been addressed by feature selection techniques which improve the final model accu...
Article
The goal of this paper is to introduce LAC, a new Java Library for Associative Classification. LAC is the first tool that covers the full taxonomy of this classification paradigm through 10 well-known proposals in the field. Furthermore, it includes several measures to quantify the quality of the solutions as well as different input/output data for...
Article
Traditional bibliometric techniques gauge the impact of research through quantitative indices based on the citations data. However, due to the lag time involved in the citation-based indices, it may take years to comprehend the full impact of an article. This paper seeks to measure the early impact of research articles through the sentiments expres...
Article
Full-text available
Background: Dysregulation of splicing variants (SVs) expression has recently emerged as a novel cancer hallmark. Although the generation of aberrant SVs (e.g. AR-v7/sst5TMD4/etc.) is associated with prostate-cancer (PCa) aggressiveness and/or castration-resistant PCa (CRPC) development, whether the molecular reason behind such phenomena might be lin...
Article
Full-text available
Frequent itemset mining (FIM) is an essential task within data analysis since it is responsible for extracting frequently occurring events, patterns, or items in data. Insights from such pattern analysis offer important benefits in decision-making processes. However, algorithmic solutions for mining this kind of pattern are not straightforward sin...
Article
Full-text available
To date, the subgroup discovery task has been considered in problems where a target variable is unequivocally described by a set of features, also known as an instance. Nowadays, however, with the increasing interest in data storage, new data structures are being provided, such as multiple-instance data, in which a target variable value is ambiguous...