About
417
Publications
298,789
Reads
20,698
Citations
Introduction
Sebastián Ventura is a Professor of Computing and Artificial Intelligence at the University of Córdoba. His teaching is devoted to machine learning and artificial intelligence. As head of, and a researcher in, the "Knowledge Discovery and Intelligent Systems" (KDIS) research group, he focuses his research on data science, data analytics, big data, machine learning, data mining, and their applications.
Additional affiliations
April 2016 - present
September 1998 - April 2016
January 2015 - December 2018
Ministry of Economy and Competitiveness, Spain
Position
- More Flexible Representations in Data Mining
Education
September 1993 - July 1996
Publications (417)
Sentiment analysis on big data presents unique challenges due to the volume of unstructured data. Traditional single-node systems struggle with this scale, necessitating the use of distributed computing systems like Apache Spark. This study investigates the role of large-scale data preprocessing and feature extraction in sentiment analysis tasks. W...
Alterations in alternative splicing are emerging as a novel hallmark in cancer biology, offering new insights. However, integrative analyses of splicing are still scarce, particularly in rare cancers such as pancreatic neuroendocrine tumors (PanNETs). These tumors are highly heterogeneous, complicating diagnosis and treatment. This study is the fir...
Data stream learning is a highly relevant paradigm because of the increasing number of real-world scenarios that generate data at high velocities and in unbounded sequences. Stream learning aims at developing models that can process instances as they arrive, so that the models constantly adapt to new concepts and to the temporal evolution of the stream. In multi-label data s...
This study aims to explore the possibilities of weak supervision applied to maintenance in the military domain, implementing the proposals of Industry 4.0 and 5.0. We focus on the monitoring of variables, their labeling, and the development of models using weakly supervised learning techniques, in accordance with the...
Machine learning and medical diagnostic studies often struggle with the issue of class imbalance in medical datasets, complicating accurate disease prediction and undermining diagnostic tools. Despite ongoing research efforts, specific characteristics of medical data frequently remain overlooked. This article comprehensively reviews advances in add...
Entrepreneurial activity, a subject of enduring intrigue among scholars, continues to captivate attention, especially in distinct contexts such as Morocco. This study undertakes the formidable task of comprehending and forecasting entrepreneurial activity using the comprehensive Global Entrepreneurship Monitor (GEM) dataset for Morocco. Employing a...
In the mid-20th century, the Illiac Suite for string quartet was composed; it is considered the first work in which a computer was used during the composition process. This event marked a significant milestone for technology as a generator of music through algorithms, thanks to the pioneering work...
Unemployment, a significant economic and social challenge, triggers repercussions that affect individual workers and companies, generating a national economic impact. Forecasting the unemployment rate becomes essential for policymakers, allowing them to make short-term estimates, assess economic health, and make informed monetary policy decisions....
In recent years, significant attention has been paid to fuzzy recommender systems for housing, highlighting their ability to effectively handle the imprecision and uncertainty inherent in the real estate market. With the objective of improving the filtering of recommendations in the real estate sector, the PRISMA 2020 methodology was applied to per...
Sequential pattern mining is a dynamic and thriving research field that aims to extract recurring sequences of events from complex datasets. Traditionally, focusing solely on the order of events often falls short of providing precise insights. Consequently, incorporating the temporal intervals between events has emerged as a vital necessity across...
Hyper-parameter tuning of machine learning models has become a crucial task in achieving optimal results in terms of performance. Several researchers have explored the optimisation task during the last decades to reach a state-of-the-art method. However, most of them focus on batch or offline learning, where data distributions do not change arbitra...
Background: Lung neuroendocrine neoplasms (LungNENs) comprise a heterogeneous group of tumors ranging from indolent lesions with good prognosis to highly aggressive cancers. Carcinoids are the rarest LungNENs, display low to intermediate malignancy and may be surgically managed, but show resistance to radiotherapy/chemotherapy in case of metastasis....
Predictive maintenance has been an important milestone in the way industrial systems are analyzed in order to detect operating anomalies and possible failures before they occur. This work presents an Advanced Sustainment Tool (Herramienta de Sostenimiento Avanzado, HSA) of the Spanish Army (Ejército de Tierra) that helps improve the planning...
This paper introduces a spiking neural network able to learn multiple tasks by exploiting a unique characteristic of such networks, namely, that their behavior can be changed by modulating the firing threshold of spiking neurons. We designed and tested a threshold-modulated spiking neural network (TM-SNN) to solve multiple classification tasks using the ap...
Super-resolution is an area of Computer Vision comprising various techniques to recover a high-resolution image from a low-resolution counterpart. These techniques can also be used to enhance a low-resolution input image without a native high-resolution original. Single Image Super-Resolution (SISR) techniques aim to do this in a picture-by-picture...
Clustering is an unsupervised learning task that groups objects in a multi-dimensional space based on similarity criteria. The goal is to form groups whose objects are similar to each other and different from those in other groups. This work proposes a novel genetic algorithm to solve the clustering problem based on partitions and estimate aut...
Knowledge extraction through machine learning techniques has been successfully applied in a large number of application domains. However, apart from the required technical knowledge and background in the application domain, it usually involves a number of time-consuming and repetitive steps. Automated machine learning (AutoML) emerged in 2014 as an...
The use of the backpropagation-through-time learning rule has enabled the supervised training of deep spiking neural networks to process temporal neuromorphic data. However, their performance is still below that of non-spiking neural networks. Previous work pointed out that one of the main causes is the limited number of neuromorphic data currently available,...
Higher education institutions currently face several challenges with computerized evaluation systems in reaching the knowledge embedded in unstructured text. Applying sentiment analysis through machine learning supports the exploration of unstructured text for educational...
Peer assessment can be useful at all existing educational levels; one tool used for this type of assessment is the rubric, an instrument whose main purpose is to share the criteria for carrying out learning and assessment tasks with students and among teaching staff. The purpose of this research...
The task of detection of common and unique characteristics among different cancer subtypes is an important focus of research that aims to improve personalized therapies. Unlike current approaches mainly based on predictive techniques, our study aims to improve the knowledge about the molecular mechanisms that descriptively led to cancer, thus not r...
Mining high utility itemsets is an emerging and very active research area in data mining. The goal is to mine all itemsets with a utility value, in terms of importance to the user, no less than a predefined threshold value. Setting an appropriate threshold value is not trivial, requiring not only multiple trials but also the know-how in the applica...
Abstract: There are numerous classification problems of growing current interest in which a pattern can be assigned several classes simultaneously. Problems of this kind, called multi-label classification problems, must be addressed with specific techniques that generate more accurate classification models than those obtained with...
Early melanoma diagnosis is the most important factor in the treatment of skin cancer and can effectively reduce mortality rates. Recently, Generative Adversarial Networks have been used to augment data, prevent overfitting and improve the diagnostic capacity of models. However, their application remains a challenging task due to the high levels of i...
Teacher evaluation is a subject of great interest in which multiple efforts converge to build models by associating heterogeneous data from academic actors. One of these actors is the student community, which stands out for contributing rich data to teacher evaluation in higher...
Recently, Convolutional Neural Networks have achieved performance levels similar to those achieved by dermatologists. However, the diagnosis of melanoma remains a challenging task, mainly due to the high inter and intra-class variability in images of moles. This paper introduces a new framework to improve the state-of-the-art effective melanoma dia...
In the airline industry, the Revenue and Pricing teams generally spend a considerable amount of time analysing and interpreting the actions of their competitors. Most of the time the analysts have to use their analytical skills to create ad-hoc methods to interpret or find patterns in the fares. In this fi...
Students’ performance prediction is one of the essential research fields in educational data mining. Predicting students’ performance aims at improving the learning process inside educational institutions. This is achieved by the early identification of at-risk students who are vulnerable to dropping out, so that they can be helped to improve their performance sooner. Therefor...
This work considers the parameters obtained from the sample analyses carried out by the Laboratorio Central de Ejército (LCE, the Army Central Laboratory), whose purpose is to determine the serviceability of the lubricating oils and hydraulic fluids used in the platforms of the Spanish Army (Ejército de Tierra). From these, a study is carried out...
The maintenance of industrial facilities has always been a critical task for ensuring that systems work properly and remain available. Traditional maintenance strategies have been characterized by corrective and preventive approaches. However, the latest advances in sensing and machine learning have driven...
In this paper we explore the capabilities of spiking neural networks in solving multi-task classification problems using the approach of single-tasking of multiple tasks. We designed and implemented a multi-task spiking neural network (MT-SNN) that can learn two or more classification tasks while performing one task at a time. The task to perform is se...
The multi-label classification task has been widely used to solve problems where each of the instances may be related not only to one class but to many of them simultaneously. Many of these problems usually comprise a high number of labels in the output space, so learning a predictive model from such datasets may turn into a challenging task since...
Providing a good study plan is key to avoiding student failure. Academic advising based on students’ preferences, the complexity of the semester, or even background knowledge is usually considered as a way to reduce the dropout rate. This article aims to provide a good course index to recommend courses to students based on the sequence of courses already taken...
Dysregulation of the splicing machinery is emerging as a hallmark in cancer due to its association with multiple dysfunctions in tumor cells. Inappropriate function of this machinery can generate tumor-driving splicing variants and trigger oncogenic actions. However, its role in pancreatic neuroendocrine tumors (PanNETs) is poorly defined. In this...
Predictive maintenance is a field of study whose main objective is to optimize the timing and type of maintenance to perform on various industrial systems. This aim involves maximizing the availability time of the monitored system and minimizing the number of resources used in maintenance. Predictive maintenance is currently undergoing a revolution...
In the first part of this work, we present the results of a survey conducted among Andalusian companies and self-employed workers on their knowledge and use of this technology in the business domain. This study has revealed some results that should be considered by the various stakeholders and institutions with a view to promoting the adoption of...
Students’ engagement reflects their level of involvement in an ongoing learning process, which can be estimated through their interactions with a computer-based learning or assessment system. A prerequisite for stimulating student engagement lies in the capability to have an approximate representation model for comprehending students’ varied (dis...
Peer evaluation consists of the evaluation of students by their peers following criteria or rubrics provided by the teacher, where the way to evaluate students is specified so that they achieve the desired competencies. The quality of the measurement instrument must meet two essential criteria: validity and reliability. In this research, we explore...
Knowledge discovery is a complex process involving several phases. Some of them are repetitive and time-consuming, so they are amenable to automation. As an example, the large number of machine learning algorithms, together with their hyper-parameters, constitutes a vast search space to explore. In this vein, the term AutoML was coined to e...
Applying data mining to improve the outcomes of the educational process has become one of the most significant areas of research. The most important cornerstone of the educational process is students' performance. Therefore, early prediction of students' performance aims to assist at-risk students by providing appropriate and early support and...
In this paper, we applied a peer assessment scenario at the Technical University of Manabí (Ecuador). Students and professors evaluated some works through rubrics, assigned a numerical score, and provided textual feedback justifying that score, so that inaccuracies between both assessments could be detected. The proposed model uses soft...
Background: Pancreatic ductal adenocarcinoma (PDAC) is a highly lethal cancer, requiring novel treatments to target both cancer cells and cancer stem cells (CSCs). Altered splicing is emerging as both a novel cancer hallmark and an attractive therapeutic target. The core splicing factor SF3B1 is heavily altered in cancer and can be inhibited by Plad...
Melanoma is one of the main causes of cancer-related deaths. The development of new computational methods as an important tool for assisting doctors can lead to early diagnosis and effectively reduce mortality. In this work, we propose a convolutional neural network architecture for melanoma diagnosis inspired by ensemble learning and genetic algor...
Skin cancer is one of the most common types of cancer in the world, with melanoma being the most lethal form. Automatic melanoma diagnosis from skin images has recently gained attention within the machine learning community, due to the complexity involved. In the past few years, convolutional neural network models have been commonly used to approach th...
Abstract: The clustering task consists of finding the best grouping of patterns according to a similarity or dissimilarity criterion between them. The aim is for the patterns within a cluster to be very similar to each other and dissimilar from those in other clusters. Defining the similarity criterion between patterns is...
Multi-label classification has been used to solve a wide range of problems where each example in the dataset may be related either to one class (as in traditional classification problems) or to several class labels at the same time. Many ensemble-based approaches have been proposed in the literature, aiming to improve the performance of traditional...
Background: Pancreatic ductal adenocarcinoma (PDAC) remains an appallingly lethal cancer, requiring novel treatments to target both cancer cells and cancer stem cells (CSCs). Altered splicing is emerging as a novel cancer hallmark and attractive therapeutic target. The core splicing factor SF3B1 is heavily altered in cancer and can be inhibited by P...
This paper presents an approach based on emerging pattern mining to analyse cancer through genomic data. Unlike existing approaches, mainly focused on predictive purposes, the proposed approach aims to improve the understanding of cancer in a descriptive way, not requiring either any prior knowledge or hypothesis to be validated. Additionally, it e...
The propositionalization process tries to find distinctive features of the examples in a database to transform such relational data into a simpler representation. More informative features have a positive impact on the classification capabilities of the learning algorithms. In this work, we propose a new propositionalization method, which generates...
In this paper we present a Competitive Rate-Based Algorithm (CRBA) that approximates the operation of a Competitive Spiking Neural Network (CSNN). CRBA is based on modeling the competition between neurons during a sample presentation, which can be reduced to ranking the neurons based on a dot product operation and the use of a discrete Expectatio...