About
57
Publications
19,853
Reads
1,612
Citations
Introduction
Additional affiliations
October 1997 - present
Publications (57)
Beef derived from grass-fed cattle is a specific quality criterion. The effect of grass silage intake on quality characteristics, i.e., fatty acids, fat-soluble vitamins, and lipid-derived volatile composition of intramuscular and perirenal fat from fattening bull weaners, was studied. Visible (VIS) and near-infrared (NIR) spectra were also obtaine...
Recent studies have shown that phishers are using phishing kits to deploy phishing attacks faster, more easily and on a larger scale. Detecting phishing kits in deployed websites might help to detect phishing campaigns earlier. To the best of our knowledge, there are no datasets providing a set of phishing kits that are used in websites that were attacked by...
Spam emails have been traditionally seen as just annoying and unsolicited emails containing advertisements, but they increasingly include scams, malware or phishing. In order to ensure the security and integrity for the users, organisations and researchers aim to develop robust filters for spam email detection. Recently, most spam filters based on...
In this paper, we propose Ranksum, an approach for extractive text summarization of single documents based on the rank fusion of four multi-dimensional sentence features extracted for each sentence: topic information, semantic content, significant keywords, and position. The Ranksum obtains the sentence saliency rankings corresponding to each featu...
This study evaluates several feature ranking techniques together with some classifiers based on machine learning to identify relevant factors regarding the probability of contracting breast cancer and improve the performance of risk prediction models for breast cancer in a healthy population. The dataset with 919 cases and 946 controls comes from t...
Cybercriminals have increasingly used spam email to send scams, phishing, malware and other frauds to organisations and people. They design sophisticated and contextualised emails to make them look trustworthy to users, with the sender address being an essential part. Although cybersecurity agencies and companies develop products and organise courses...
Producing or sharing Child Sexual Exploitation Material (CSEM) is a serious crime fought vigorously by Law Enforcement Agencies (LEAs). When an LEA seizes a computer from a potential producer or consumer of CSEM, they need to analyze the suspect's hard disk's files looking for pieces of evidence. However, a manual inspection of the file content loo...
Retrieving text embedded within images is a challenging task in real-world settings. Multiple problems such as low-resolution and the orientation of the text can hinder the extraction of information. These problems are common in environments such as Tor Darknet and Child Sexual Abuse images, where text extraction is crucial in the prevention of ill...
Face recognition is a valuable forensic tool for criminal investigators since it certainly helps in identifying individuals in scenarios of criminal activity like fugitives or child sexual abuse. It is, however, a very challenging task as it must be able to handle low-quality images of real world settings and fulfill real time requirements. Deep le...
Nowadays, children have access to the Internet on a regular basis. Just like the real world, the Internet has many unsafe locations where kids may be exposed to inappropriate content in the form of obscene, aggressive, erotic or rude comments. In this work, we address the problem of detecting erotic/sexual content on text documents using Natural Langua...
Feature selection is a key step when dealing with high-dimensional data. In particular, these techniques simplify the process of knowledge discovery from the data by selecting the most relevant features out of the noisy, redundant and irrelevant features. A problem that arises in many of these practical applications is that the outcome of the featu...
The food industry requires automatic methods to establish authenticity of food products. In this work, we address the problem of the certification of suckling lamb meat with respect to the rearing system. We evaluate the performance of neural network classifiers as well as different dimensionality reduction techniques, with the aim of categorizing...
The quick development of communication through new technology media such as social networks and mobile phones has improved our lives. However, this also produces collateral problems such as the presence of insults and abusive comments. In this work, we address the problem of detecting violent content on text documents using Natural Language Process...
Background and objective: Risk prediction models aim at identifying people at higher risk of developing a target disease. Feature selection is particularly important to improve the prediction model performance avoiding overfitting and to identify the leading cancer risk (and protective) factors. Assessing the stability of feature selection/ranking...
In this work we propose a new online, low cost and fast approach based on computer vision and machine learning to determine whether cutting tools used in edge profile milling processes are serviceable or disposable based on their wear level. We created a new dataset of 254 images of edge profile cutting heads which is, to the best of our knowledge,...
Risk prediction models for colorectal cancer play an important role to identify people at higher risk of developing this disease as well as the risk factors associated with it. Feature selection techniques help to improve the prediction model performance and to gain insight in the data itself. The assessment of the stability of feature selection/ra...
In this paper, a new system based on combinations of a shape descriptor and a contour descriptor has been proposed for classifying inserts in milling processes according to their wear level following a computer vision based approach. To describe the wear region shape we have proposed a new descriptor called ShapeFeat and its contour has been charac...
This study aimed to determine and compare the conformation of suckling lamb carcasses from three sheep breeds produced in Spain, two local and one non-native. A total of 52 and 55 carcasses from the local breeds Churra and Castellana, respectively, and 54 from the non-native breed Assaf (92 male and 69 female in total) were evaluated for their conf...
In this paper, we present a new approach to categorize the wear of cutting tools used in edge profile milling processes. It is based on machine learning and computer vision techniques, specifically using B-ORCHIZ, a novel shape-based descriptor computed from the wear region image. A new Insert dataset with 212 images of tool wear has been created t...
Class distribution estimation (quantification) plays an important role in many practical classification problems. Firstly, it is important in order to adapt the classifier to the operational conditions when they differ from those assumed in learning. Additionally, there are some real domains where the quantification task is itself valuable due to t...
The automated assessment of the sperm quality is an important challenge in the veterinary field. In this paper, we explore how to describe the acrosomes of boar spermatozoa using image analysis so that they can be automatically categorized as intact or damaged. Our proposal aims at characterizing the acrosomes by means of texture features. The text...
The field of dataset shift has received a growing amount of interest in the last few years. The fact that most real-world applications have to cope with some form of shift makes its study highly relevant. The literature on the topic is mostly scattered, and different authors use different names to refer to the same concepts, or use the same name fo...
Quantification, or proportion estimation, plays an important role in many practical classification problems. On the one hand, a machine that automatically classifies an element into a group of predefined classes will make suboptimal decisions if the class distribution in the test (real) domain differs from the one assumed in learni...
Feature selection and ranking techniques play an important role in the analysis of high-dimensional data. In particular, their stability becomes crucial when the feature importance is later studied in order to better understand the underlying process. The fact that a small change in the dataset may affect the outcome of the feature selection/rankin...
We consider the problem of classification in environments where training and test data may come from different probability distributions. When the fundamental stationary distribution assumption made in supervised learning (and often not satisfied in practice) does not hold, the classifier performance may significantly deteriorate. Several proposals...
Feature selection plays an important role in applications with high dimensional data. The assessment of the stability of feature selection/ranking algorithms becomes an important issue when the dataset is small and the aim is to gain insight into the underlying process by analyzing the most relevant features. In this work, we propose a graphical ap...
Advances in image analysis make possible the automatic semen analysis in the veterinary practice. The proportion of sperm cells with damaged/intact acrosome, a major aspect in this assessment, depends strongly on several factors, including animal diversity and manipulation/conservation conditions. For this reason, the class proportions have to be q...
This paper presents a method to perform a surface finish control using a computer vision system. The goal pursued was to design an acceptance criterion for the control of surface roughness of steel parts, dividing them into those with low roughness (acceptable class) and those with high roughness (defective class). We have used 143 images obtained...
Many types of nonlinear classifiers have been proposed to automatically generate land-cover maps from satellite images. Some are based on the estimation of posterior class probabilities, whereas others estimate the decision boundary directly. In this paper, we propose a modular design able to focus the learning process on the decision boundary by u...
This paper analyzes the application of a particular class of Bregman divergences to design cost-sensitive classifiers for multiclass problems. We show that these divergence measures can be used to estimate posterior probabilities with maximal accuracy for the probability values that are close to the decision boundaries. Asymptotically, the proposed...
Classifier performance evaluation, which typically yields a vast number of results, may be approached as a problem of analyzing high dimensional data. Conducting an exploratory analysis of visual representations of this evaluation data enables us to exploit the advantages of the powerful human visual capabilities. This allows us to gain insight int...
The fundamental assumption that training and operational data come from the same probability distribution, which is the basis of most learning algorithms, is often not satisfied in practice. Several algorithms have been proposed to cope with classification problems where the class priors may change after training, but they can show a poor performan...
Fourier transform mid-infrared (FT-IR) spectroscopy was evaluated as a tool to discriminate between carcasses of suckling lambs according to the rearing system. Fat samples (39 perirenal and 67 omental) were collected from carcasses of lambs from up to three sheep dairy farms, reared on either ewes milk (EM) or milk replacer (MR). Fatty acid compos...
Tool replacement operations have a great influence over the cost of machined parts. At present, the common criteria used to determine the tool life do not optimize the use of tools and lead to significant economic losses. The main objective of this work is to define a new procedure to improve the decision about the time for tool replacement. The ap...
Classifier performance evaluation typically gives rise to vast numbers of results that are difficult to interpret. On the one hand, a variety of different performance metrics can be applied; and on the other hand, evaluation must be conducted on multiple domains to get a clear view of the classifier's general behaviour. In this paper, we present a...
Insemination techniques in the veterinary field demand more objective methods to control the quality of sperm samples. In particular, different factors may damage a number of sperm cells that is difficult to predict in advance. This paper addresses the problem of quantifying the proportion of damaged/intact sperm cells in a given sample based on co...
Classifier performance evaluation typically gives rise to a multitude of results that are difficult to interpret. On the one hand, a variety of different performance metrics can be applied, each adding a little bit more information about the classifiers than the others; and on the other hand, evaluation must be conducted on multiple domains to get...
The purpose of this paper is to test the hypothesis that simple classifiers are more robust to changing environments than complex ones. We propose a strategy for generating artificial, but realistic domains, which allows us to control the changing environment and test a variety of situations. Our results suggest that evaluating classifiers on s...
Bagging as well as other classifier ensembles have made possible a performance improvement in many pattern recognition problems for the last decade. A careful analysis of previous work points out, however, that the most significant advance of bagged neural networks is achieved for multiclass problems, whereas the binary classification problems seld...
A new method based on computer vision and a neural network classifier is proposed to estimate the wear of metal cutting inserts in order to identify the time for their replacement. Classification of wear level in two classes (low and too high wear) is possible following a supervised approach, so that tool replacement is carried out before the wear re...
This paper presents a method to perform a surface finish control using a computer vision system. The goal pursued was to design an acceptance criterion for the control strategy. Class 1 would contain those parts with low roughness (acceptable) and class 2 those with high roughness (defective). We have used 140 images obtained from AISI 303 stainless...
Unpredictable topology changes, energy constraints and link unreliability make the information transmission a challenging problem in wireless sensor networks (WSN). Taking some ideas from machine learning methods, we propose a novel geographic routing algorithm for WSN, named Q-probabilistic routing (Q-PR), that makes intelligent routing decisions...
The design of a minimum risk classifier based on data usually stems from the stationarity assumption that the conditions during training and test are the same: the misclassification costs assumed during training must be in agreement with real costs, and the same statistical process must have generated both training and test data. Unfortunately, in...
The design of structures and algorithms for non-MAP multiclass decision problems is discussed in this paper. We propose a parametric family of loss functions that provides accurate estimates for the posterior class probabilities near the decision regions. Moreover, we discuss learning algorithms based on the stochastic gradient minimization of thes...
The problem of designing a classifier when prior probabilities are not known or are not representative of the underlying data distribution is discussed in this paper. Traditional learning approaches based on the assumption that class priors are stationary lead to sub-optimal solutions if there is a mismatch between training and future (real) priors...
The goal of this work is to automatically determine the level of tool insert wear based on images acquired using a vision system. Experimental wear was carried out by machining AISI SAE 1045 and 4140 steel bars in a precision CNC lathe and using Sandvik inserts of tungsten carbide. A Pulnix PE2015 B/W with an optic composed by an industrial zoom 70...
In some real applications, such as medical diagnosis or remote sensing, available training data often do not reflect the true a priori probabilities of the underlying data distribution. The classifier designed from these data may be suboptimal. Building classifiers that are robust against changes in prior probabilities is possible by applying a min...
Decision theory shows that the optimal decision is a function of the posterior class probabilities. More specifically, in binary classification, the optimal decision is based on the comparison of the posterior probabilities with some threshold. Therefore, the most accurate estimates of the posterior probabilities are required near these decision th...
Many supervised learning algorithms are based on the assumption that the training data set reflects the underlying statistical model of the real data. However, this stationarity assumption may be partially violated in practice: for instance, if the cost of collecting data is class dependent, the class priors of the training data set may be differen...
Most supervised learning algorithms are based on the assumption that the training data set reflects the underlying statistical model of the real data. However, this stationarity assumption is not always satisfied in practice: quite frequently, class prior probabilities are not in accordance with the class proportions in the training data set. The m...
Classifying damaged-intact cells in a semen sample presents the peculiarity that the test class distribution is unknown. This paper (a) studies under which design conditions the misclassification rate is minimum for the uncertainty region of interest (ratio of damaged cells lower than 20%) and (b) deals with quantifying the proportion of damaged/intact...