Josep Salvador Sánchez
Jaume I University | UJI · Department of Computer Languages and Systems
PhD
About
170
Publications
56,834
Reads
4,754
Citations
Additional affiliations
September 2006 - August 2007
July 1999 - January 2009
October 1992 - June 1999
Education
October 1992 - January 1998
September 1984 - July 1989
Publications (170)
Cash flow forecasting is an important task for any organization, but it becomes crucial for self-employed workers. In this paper, we model the cash flow of three real self-employed workers as a time series problem and compare the performance of conventional parametric methods against two types of fuzzy inference systems in terms of both prediction...
Accurate prediction and grading of gliomas play a crucial role in evaluating brain tumor progression, assessing overall prognosis, and treatment planning. In addition to neuroimaging techniques, identifying molecular biomarkers that can guide the diagnosis, prognosis and prediction of the response to therapy has aroused the interest of researchers...
An innovative strategy for organizations to obtain value from their large datasets, allowing them to guide future strategic actions and improve their initiatives, is the use of machine learning algorithms. This has led to a growing and rapid application of various machine learning algorithms with a predominant focus on building and improving the pe...
At the beginning of 2023, there was a reform for the calculation of Social Security contributions for self-employed workers in Spain. It replaced the previous system where these workers could freely choose the amount of their contributions. Under this new policy, contribution amounts are determined according to the actual revenue of self-employed w...
The availability of rich data sets from several sources poses new opportunities to develop pattern recognition systems in a diverse array of industry, government, health, and academic areas [...]
The purpose of this paper is to present the results of a systematic literature review regarding the development of fuzzy-based models for time series forecasting in the period 2017–2021. The study was conducted using a well-established review protocol and a couple of powerful tools for bibliometric analysis to identify and analyse the main approaches a...
The use of face masks in public places has emerged as one of the most effective non-pharmaceutical measures to lower the spread of COVID-19 infection. This has led to the development of several detection systems for identifying people who do not wear a face mask. However, not all face masks or coverings are equally effective in preventing virus tra...
In machine learning, a natural way to represent an instance is by using a feature vector. However, several studies have shown that this representation may not accurately characterize an object. For classification problems, the dissimilarity paradigm has been proposed as an alternative to the standard feature-based approach. Encoding each object by...
Clustering in transaction databases can find potentially useful patterns to gain some insight into the structure of the data, which can help for effective decision-making. However, one of the critical tasks in clustering is to identify the appropriate number of clusters, which will determine the performance of any process further applied to the tra...
For many real-world problems (such as industrial applications, chemistry models, and social network analysis, among others), a solution can be obtained by expressing the problem in terms of vertices and edges, that is to say, using graph theory. Data Science applications are characterized by processing large volumes of data, in some cases, the dat...
Resampling methods are among the most popular strategies to address the class imbalance problem. The objective of these methods is to compensate for the imbalanced class distribution by over-sampling the minority class and/or under-sampling the majority class. In this paper, a new under-sampling method based on the DBSCAN clustering algorithm is intro...
The class imbalance problem occurs when one class far outnumbers the other classes, causing most traditional classifiers to perform poorly on the minority classes. To tackle this problem, a plethora of techniques have been proposed, especially centered around resampling methods. This paper introduces a two-stage method that combines the DBSCAN cluster...
Class overlap and class imbalance are two data complexities that challenge the design of effective classifiers in Pattern Recognition and Data Mining as they may cause a significant loss in performance. Several solutions have been proposed to face both data difficulties, but most of these approaches tackle each problem separately. In this paper, we...
Over the last decades, the academic and professional communities have paid much attention to the use of multi-criteria decision-making methods in a range of business and financial problems due to the variety and complexity of their decisions. Within this branch of operations research, the value-based and outranking relations approaches stand as...
Things are the core of the Internet of Things (IoT) and must be properly characterized according to the different functions they accomplish. Identifying their capabilities and combining them as sets provides a view on the single or joint properties of existing things and guides the proper design and construction of new things while maximizing their pot...
Although various algorithms have widely been studied for bankruptcy and credit risk prediction, conclusions regarding the best performing method are divergent when using different performance assessment metrics. As a solution to this problem, the present paper suggests the employment of two well-known multiple-criteria decision-making (MCDM) techni...
Data plays a key role in the design of expert and intelligent systems and therefore, data preprocessing appears to be a critical step to produce high-quality data and build accurate machine learning models. Over the past decades, increasing attention has been paid to the issue of class imbalance and this is now a research hotspot in a variety...
Instance selection is one of the most successful solutions to the low noise tolerance of the nearest neighbor classifier. Many algorithms have been proposed in the literature, but further research in this area is still needed to complement the existing findings. Here we intend to go beyond a simple comparison of instance selection methods and correspon...
Quality in a manufacturing process implies that the performance characteristics of the product and the process itself are designed to meet specific objectives. Thus, accurate quality prediction plays a principal role in delivering high-quality products to further enhance competitiveness. In tubing extrusion, measuring of the inner and outer diamete...
Research carried out by the scientific community has shown that the performance of classifiers depends not only on the learning rule, but also on the complexities inherent in the data sets. Some traditional classifiers have been commonly used in the context of classification problems (three Neural Networks, C4.5, SVM, among others). However,...
Credit risk and corporate bankruptcy prediction has widely been studied as a binary classification problem using both advanced statistical and machine learning models. Ensembles of classifiers have demonstrated their effectiveness for various applications in finance using data sets that are often characterized by imperfections such as irrelevant fe...
In single-pixel imaging, a series of illumination patterns are projected onto an object and the reflected or transmitted light from the object is integrated by a photodetector (the single-pixel detector). Then, from the set of received photodetector signals, the image of the object can ultimately be reconstructed. However, this reconstruction is no...
In general, gene expression microarrays consist of a vast number of genes and very few samples, which represents a critical challenge for disease prediction and diagnosis. This paper develops a two-stage algorithm that integrates feature selection and prediction by extending a type of hetero-associative neural networks. In the first level, the algo...
This paper introduces a feature extraction scheme for offline handwritten math symbol recognition. It is a hybrid model that involves the basic ideas of the wavelet and zoning techniques so as to define the feature vectors with both statistical and geometrical properties of the symbols, with the aim of overcoming some limitations of the individual...
Bankruptcy prediction has acquired great relevance for financial institutions due to the complexity of global economies and the growing number of corporate failures, especially since the world financial crisis of 2008. In this paper, the problem of corporate bankruptcy prediction is addressed by means of four linear classifiers (Fisher’s linear discrim...
This 2-volume set constitutes the refereed proceedings of the 9th Iberian Conference on Pattern Recognition and Image Analysis, IbPRIA 2019, held in Madrid, Spain, in July 2019.
The 99 papers in these volumes were carefully reviewed and selected from 137 submissions. They are organized in topical sections named:
Part I: best ranked papers; machine...
The renowned k-nearest neighbor decision rule is widely used for classification tasks, where the label of any new sample is estimated based on a similarity criterion defined by an appropriate distance function. It has also been used successfully for regression problems where the purpose is to predict a continuous numeric label. However, some altern...
In recent years, researchers have increased their interest in deep learning for data mining and pattern recognition applications. This is mainly due to its high processing capability and good performance in feature selection, prediction and classification tasks. In general, deep learning algorithms have demonstrated their great potential in handli...
In gene-expression microarray data sets, each sample is defined by hundreds or thousands of measurements. High-dimensional data spaces have been reported as a significant obstacle to applying machine learning algorithms, owing to the associated phenomenon called the 'curse of dimensionality'. Therefore the analysis (and interpretation) of these data set...
This paper analyzes the effect of the high-dimensional, low-sample size problem in cancer classification using gene-expression microarrays. Here the two key questions addressed are: (i) What is the percentage of genes that can ensure highly accurate classification?, and (ii) Does this percentage differ from one classifier to another? Both these iss...
This book constitutes the refereed proceedings of the 8th Iberian Conference on Pattern Recognition and Image Analysis, IbPRIA 2017, held in Faro, Portugal, in June 2017. The 60 regular papers presented in this volume were carefully reviewed and selected from 86 submissions. They are organized in topical sections named: Pattern Recognition and Mach...
Associative memories have emerged as a powerful computational neural network model for several pattern classification problems. Like most traditional classifiers, these models assume that the classes share similar prior probabilities. However, in many real-life applications the ratios of prior probabilities between classes are extremely sk...
Many models have been explored for financial distress prediction, but no consistent conclusions have been drawn on which method shows the best behavior when different performance evaluation measures are employed. Accordingly, this paper proposes the integration of the ranking scores given by two popular multiple-criteria decision-making tools as an...
This paper presents an alternative technique for financial distress prediction systems. The method is based on a type of neural network, which is called hybrid associative memory with translation. While many different neural network architectures have successfully been used to predict credit risk and corporate failure, the power of associative memo...
This paper compares the behaviour of three linear classifiers modelled on both the feature space and the dissimilarity space when the class imbalance of data sets interweaves with small disjuncts and noise. To this end, experiments are carried out over three synthetic databases with different imbalance ratios, levels of noise and complexity of the...
In Spain, the new Medicine degrees of the European Higher Education Area (EHEA) have incorporated Information Technologies (IT) subjects to develop horizontal competences in their curriculum. Medicine studies are costly for Higher Education (HE) institutions and, therefore, every cost-effective educational innovation is welcome. There are different gen...
Often, it is necessary to construct training sets. If we have only a small number of labelled objects and a large group of unlabelled objects, we can build the training set by simulating a data stream of unlabelled objects from which it is necessary to learn and which are later incorporated into the training set. In order to prevent deterioration of the train...
Over the last years, an increasing interest has been observed among the finance and business communities in any application tool related to the prediction of credit and bankruptcy risk, probably due to the need for more robust decision-making systems capable of managing and analyzing complex data. As a result, plentiful techniques have been develope...
In pattern recognition, it is well known that classifier performance depends on the classification rule and the complexities present in the data sets (such as class overlapping, class imbalance, outliers, and high-dimensional data sets, among others). The issue of class imbalance arises when one class is less represented with re...
The last years have seen the development of many credit scoring models for assessing the creditworthiness of loan applicants. Traditional credit scoring methodology has involved the use of statistical and mathematical programming techniques such as discriminant analysis, linear and logistic regression, linear and quadratic programming, or decision...
In real-life credit scoring applications, the case in which the class of defaulters is under-represented in comparison with the class of non-defaulters is a very common situation, but it has still received little attention. The present paper investigates the suitability and performance of several resampling techniques when applied in conjunction wi...
Two intrinsic data characteristics that arise in many domains are the class imbalance and the high dimensionality, which pose new challenges that should be addressed. When using gait for gender classification, benchmarking public databases and renowned gait representations lead to these two problems, but they have not been jointly studied in depth....
In the dissimilarity representation approach, the dimension reduction of the dissimilarity space is addressed by using instance selection methods. Several studies have shown that these methods work well on small data sets. Also, the uniformity of the instances distribution can be obtained when the classes are evenly spread and balanced. However, ma...
In practical applications to credit risk evaluation, most prediction models often make inaccurate decisions because of the lack of sufficient default data. The challenging issue of highly skewed class distribution between defaulters and non-defaulters is here addressed by means of an algorithmic solution based on cost-sensitive learning. The present stud...
Imbalanced credit data sets refer to databases in which the class of defaulters is heavily under-represented in comparison to the class of non-defaulters. This is a very common situation in real-life credit scoring applications, but it has still received little attention. This paper investigates whether data resampling can be used to improve the pe...
A wide range of classification models have been explored for financial risk prediction, but conclusions on which technique behaves better may vary when different performance evaluation measures are employed. Accordingly, this paper proposes the use of multiple criteria decision making tools in order to give a ranking of algorithms. More specificall...
In the dissimilarity representation paradigm, several prototype selection methods have been used to cope with the topic of how to select a small representation set for generating a low-dimensional dissimilarity space. In addition, these methods have also been used to reduce the size of the dissimilarity matrix. However, these approaches assume a re...
This paper introduces a new approach for gait-based gender classification in which some key biomechanical poses of a gait pattern are represented by partial Gait Energy Images (GEIs). These pose-based GEIs can more accurately represent the shape of the body parts and some dynamic features with respect to the usually blurred depiction provided by a...
Many traditional approaches to pattern classification assume that the problem classes share similar prior probabilities. However, in many real-life applications, this assumption is grossly violated. Often, the ratios of prior probabilities between classes are extremely skewed. This situation is known as the class imbalance problem. One of the strat...
The class imbalance problem has been reported as an important challenge in various fields such as Pattern Recognition, Data Mining and Machine Learning. A less explored research area is related to how to evaluate classifiers on imbalanced data sets. This work analyzes the behaviour of performance measures widely used on imbalanced problems, as well...
Various machine learning techniques have been explored for credit scoring and management, but no consistent conclusions have been drawn on which method shows the best behaviour. This paper presents an experimental analysis involving five real-world databases with several credit scoring models, including logistic regression, neural networks, support...
Apart from human recognition, gait has lately become a promising biometric feature also useful for prediction of gender. One of the most popular methods to represent gait is the well-known Gait Energy Image (GEI), which leads to a high-dimensional Euclidean space where many features are irrelevant. In this paper, the problem of selecting the mos...
This paper analyzes a generalization of a new metric to evaluate the classification performance in imbalanced domains, combining some estimate of the overall accuracy with a plain index about how dominant the class with the highest individual accuracy is. A theoretical analysis shows the merits of this metric when compared to other well-known measu...
A realistic appearance-based representation of side-view gait sequences is here introduced. It is based on a prior method where a set of appearance-based features of a gait sample is used for gender recognition. These features are computed from parameter values of ellipses that fit body parts enclosed by regions previously defined while ignoring we...
In many real world data applications, objects may have missing attributes. Conventional techniques used to classify this kind of data are represented in a feature space. However, usually they need imputation methods and/or changing the classifiers. In this paper, we propose two classification alternatives based on dissimilarities. These techniques...
The present paper studies the influence of two distinct factors on the performance of some resampling strategies for handling imbalanced data sets. In particular, we focus on the nature of the classifier used, along with the ratio between minority and majority classes. Experiments using eight different classifiers show tha