Richard Weber

University of Chile · Department of Industrial Engineering

PhD

About

146
Publications
43,312
Reads
4,020
Citations
Introduction
Current research interests: data science, (dynamic) data mining, and applications to crime analytics, learning analytics, and business analytics.
Additional affiliations
October 2003 - December 2003
The University of Tokyo
Position
  • Professor
January 1999 - July 2020
University of Chile
Position
  • Professor (Full)
Education
July 1988 - July 1992
RWTH Aachen University
Field of study
  • Machine Learning

Publications

Publications (146)
Preprint
Full-text available
A growing number of universities worldwide use various forms of online and blended learning as part of their academic curricula. Furthermore, the recent changes caused by the COVID-19 pandemic have led to a drastic increase in importance and ubiquity of online education. Among the major advantages of e-learning is not only improving students' learn...
Article
A growing number of universities worldwide use various forms of online and blended learning as part of their academic curricula. Furthermore, the recent changes caused by the COVID-19 pandemic have led to a drastic increase in importance and ubiquity of online education. Among the major advantages of e-learning is not only improving students’ learn...
Article
Full-text available
Rough-Fuzzy Support Vector Clustering (RFSVC) is a novel soft computing derivative of the classical Support Vector Clustering (SVC) algorithm, which has been used already in many real-world applications. RFSVC’s strengths are its ability to handle arbitrary cluster shapes, identify the number of clusters, and effectively detect outliers by the means...
Article
Full-text available
Demand forecasting and capacity management are complicated tasks for emergency healthcare services due to the uncertainty, complex relationships, and high public exposure involved. Published research does not show integrated solutions to these tasks. Thus, the objective of this paper is to present results from three hospitals that show the feasibil...
Experiment Findings
Experiment Findings
Construct decision trees using linguistic variables.
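Building a decision tree over linguistic variables first requires mapping each numeric attribute onto fuzzy terms such as "low", "medium" and "high". The sketch below shows only that fuzzification step, assuming triangular membership functions; the term names, the breakpoints in `TERMS` and the helper names are hypothetical, and the tree-construction procedure itself is not reproduced.

```python
from typing import Dict, Tuple

# Hypothetical linguistic terms for one numeric attribute (e.g. income in thousands).
# Each term is a triangular fuzzy set given as (left foot, peak, right foot).
TERMS: Dict[str, Tuple[float, float, float]] = {
    "low": (-1.0, 0.0, 50.0),
    "medium": (0.0, 50.0, 100.0),
    "high": (50.0, 100.0, 101.0),
}


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Membership of x in a triangular fuzzy set with feet a < c and peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzify(x: float, terms: Dict[str, Tuple[float, float, float]] = TERMS) -> Dict[str, float]:
    """Map a crisp value to its membership degree in every linguistic term."""
    return {name: triangular(x, *abc) for name, abc in terms.items()}


# In a fuzzy decision tree, an example can descend into every branch whose term
# has non-zero membership, carrying that membership degree as a weight.
print(fuzzify(25.0))  # {'low': 0.5, 'medium': 0.5, 'high': 0.0}
```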
Article
One of the most common methods used in the social network analysis of criminal groups is node importance evaluation, which focuses on the links between network members to identify likely crime suspects. Because such traditional node evaluators do not take full advantage of group members' individual criminal propensities, a new evaluator called the...
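To illustrate the difference between a purely link-based evaluator and one that also uses individual propensities, here is a hypothetical scoring rule that blends normalized degree centrality with a propensity score in [0, 1]. The weighting parameter `alpha`, the linear blend and the example network are assumptions for illustration only; the abstract does not specify the evaluator it proposes.

```python
from typing import Dict, List, Tuple


def degree_centrality(nodes: List[str], edges: List[Tuple[str, str]]) -> Dict[str, float]:
    """Normalized degree: fraction of the other members each node is linked to."""
    degree = {v: 0 for v in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(nodes)
    return {v: d / (n - 1) for v, d in degree.items()}


def combined_importance(
    propensity: Dict[str, float], edges: List[Tuple[str, str]], alpha: float = 0.5
) -> Dict[str, float]:
    """Hypothetical blend of structural position and individual propensity.

    alpha = 1.0 reduces to plain degree centrality (a purely link-based evaluator).
    """
    nodes = list(propensity)
    centrality = degree_centrality(nodes, edges)
    return {v: alpha * centrality[v] + (1 - alpha) * propensity[v] for v in nodes}


# Hypothetical star-shaped group: A is the hub, B has the highest propensity.
edges = [("A", "B"), ("A", "C"), ("A", "D")]
propensity = {"A": 0.1, "B": 0.9, "C": 0.2, "D": 0.2}
scores = combined_importance(propensity, edges)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking[0])  # B
```

With `alpha=1.0` the hub A ranks first; adding propensity moves B ahead, which is the kind of effect a propensity-aware evaluator is meant to capture.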
Article
Understanding criminal groups as social networks has led to the design of powerful systems for decision support in criminal investigative work. Tools using the methods of social network analysis have proven particularly effective in the identification of associations between individuals whose relationships are not otherwise evident. This identifica...
Conference Paper
Networks can be extracted from a wide range of real systems, such as online social networks, communication networks and biological systems. Detection of cohesive groups in these graphs, primarily based on link information, is the goal of community detection. Community structures emerge when a group of nodes is more likely to be linked to each other...
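Label propagation is one standard heuristic for the community detection task described above: every node repeatedly adopts the label most common among its neighbors until the labels stabilize. It is swapped in here for illustration and is not the method of the paper. The sketch uses synchronous updates with a deterministic tie-break so that results are reproducible (the usual formulation randomizes update order and ties); because synchronous updates can oscillate on some graphs, `max_iter` caps the loop. The example graph is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set


def label_propagation(adj: Dict[int, Set[int]], max_iter: int = 100) -> Dict[int, int]:
    labels = {v: v for v in adj}  # every node starts in its own community
    for _ in range(max_iter):
        new_labels = {}
        for v in sorted(adj):
            counts = Counter(labels[u] for u in adj[v])
            if not counts:  # isolated node keeps its label
                new_labels[v] = labels[v]
                continue
            top = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == top]
            # Deterministic tie-break: keep the current label if it is tied, else take the smallest.
            new_labels[v] = labels[v] if labels[v] in candidates else min(candidates)
        if new_labels == labels:  # converged
            break
        labels = new_labels
    return labels


def communities(labels: Dict[int, int]) -> List[List[int]]:
    groups: Dict[int, List[int]] = defaultdict(list)
    for v, lab in labels.items():
        groups[lab].append(v)
    return sorted(sorted(g) for g in groups.values())


# Two hypothetical 4-member cliques joined by a single bridge edge (3, 4).
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7), (3, 4)]
adj: Dict[int, Set[int]] = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

print(communities(label_propagation(adj)))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```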
Article
Full-text available
Transportation Research (TR) was established in 1967 with the vision of promoting multi-disciplinary (economics, engineering, sociology, psychology) research on transport systems. The journal has continuously expanded its wings becoming a world-leading journal, now publishing research work through six parts, A to F, respectively addressing Policy a...
Article
In addition to scheduling trains, drivers, and security personnel, many subway companies worldwide are faced with the challenge of determining the staffing levels and shift schedules required to operate their fare-collection systems. Subway companies typically deal with a highly variable demand for fares, several operational requirements, and incre...
Article
Credit scoring is a crucial task within risk management for any company in the financial sector. On the one hand, it is in the self-interest of banks to avoid approving credits to customers who probably default. On the other hand, regulators require strict risk management systems from banks to protect their customers and, from “too big to fail inst...
Conference Paper
Car insurance is a highly competitive business line in the insurance industry. Companies face a very large number of claims year-on-year and must decide carefully when to doubt a given claim, or when to simply cover the accident without question. In this presentation, we will show the results of a decision support system that ranks claims by their...
Article
Data analysis has gained strategic importance for virtually any organization. It covers areas like business analytics, big data, business intelligence, and data mining, among others. The past decades have also witnessed increasing efforts to capture, analyze, and interpret dynamic data instead of just static snapshot data. This is due to the fact t...
Article
Information Sciences is a leading international journal in computer science launched in 1968, so becoming fifty years old in 2018. In order to celebrate its anniversary, this study presents a bibliometric overview of the leading publication and citation trends occurring in the journal. The aim of the work is to identify the most relevant authors, i...
Article
“Data is the new oil” is just one of the sayings that describe the importance of data for today's society. We have witnessed a rapid development of methods to analyze such data; starting with Statistics in the early 18th century, followed by Artificial Intelligence and Machine Learning, and finally leading to Data Science incorporating classical me...
Article
Clustering is one of the main data mining tasks with many proven techniques and successful real-world applications. However, in changing environments, the existing systems need to be regularly updated in order to describe in the best possible way an observed phenomenon at each point in time. Since changes lead to uncertainty, the respective systems...
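One simple way to keep a partition current as new observations arrive is the sequential (online) k-means update sketched below. It is a standard technique substituted for illustration, not the dynamic clustering method of the paper, and it only moves existing centroids; it does not create, merge or delete clusters. The function names and example values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(point: Sequence[float], centroids: List[Point]) -> int:
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def update_online(point: Point, centroids: List[Point], counts: List[int]) -> int:
    """Assign a newly arrived point and move the winning centroid toward it with step 1/n."""
    j = nearest(point, centroids)
    counts[j] += 1
    eta = 1.0 / counts[j]
    centroids[j] = tuple(c + eta * (p - c) for c, p in zip(centroids[j], point))
    return j


centroids = [(0.0, 0.0), (10.0, 10.0)]
counts = [1, 1]
update_online((2.0, 0.0), centroids, counts)
print(centroids[0])  # (1.0, 0.0)
```

If each centroid starts as the mean of its `counts` initial members, the step size 1/n keeps it equal to the mean of all points assigned to it so far.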
Article
Full-text available
In adversarial classification, the interaction between classifiers and adversaries can be modeled as a game between two players. It is natural to model this interaction as a dynamic game of incomplete information, since the classifier does not know the exact intentions of the different types of adversaries (senders). For these games, equilibrium st...
Article
Full-text available
Clustering is one of the most relevant data mining tasks. Its goal is to group similar objects in one cluster while dissimilar objects should belong to different clusters. Many extensions have been developed based on traditional cluster algorithms. Recently, approaches for dynamic as well as for granular clustering have been of particular interest....
Article
Support Vector Clustering (SVC) is an important density-based clustering algorithm which can be applied in many real world applications given its ability to handle arbitrary cluster silhouettes and detect the number of classes without any prior knowledge. However, if outliers are present in the data, the algorithm leaves them unclassified, assignin...
Book
This book constitutes the refereed proceedings of the International Joint Conference on Rough Sets, IJCRS 2016, held in Santiago de Chile, Chile, in October 2016. The 46 revised full papers presented together with 7 keynotes, tutorials and expert papers were carefully reviewed and selected from 108 submissions. The papers are grouped in topical sec...
Article
Recently, many security-related problems have gained increasing attention from a quantitative perspective. In this paper, we propose a game-theoretical approach to model the interaction between police forces and delinquents in public places. In the well-known Stackelberg game, a leader is faced with only one follower. However, in our application, t...
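The single-follower Stackelberg game that the abstract extends can be made concrete with a small security game: the leader (police) commits to coverage probabilities over targets, and the follower observes them and attacks the target with the highest expected payoff, with ties resolved in the leader's favor (the strong Stackelberg convention). The two-target grid search, the function names and all payoff values below are hypothetical, and the multi-follower model of the paper is not reproduced.

```python
from typing import List, Tuple


def attacker_response(
    coverage: List[float],
    att_reward: List[float], att_penalty: List[float],
    def_reward: List[float], def_penalty: List[float],
    tol: float = 1e-9,
) -> Tuple[int, float]:
    """Follower's best response to a coverage vector; ties go to the leader."""
    att_eu = [c * p + (1 - c) * r for c, r, p in zip(coverage, att_reward, att_penalty)]
    best = max(att_eu)
    tied = [t for t, u in enumerate(att_eu) if u >= best - tol]

    def def_eu(t: int) -> float:
        return coverage[t] * def_reward[t] + (1 - coverage[t]) * def_penalty[t]

    target = max(tied, key=def_eu)
    return target, def_eu(target)


def best_coverage_two_targets(att_reward, att_penalty, def_reward, def_penalty, steps=100):
    """Grid search over how one police unit splits its presence between two targets."""
    best_c, best_u, best_t = 0.0, float("-inf"), -1
    for i in range(steps + 1):
        c = i / steps
        t, u = attacker_response([c, 1 - c], att_reward, att_penalty, def_reward, def_penalty)
        if u > best_u:
            best_c, best_u, best_t = c, u, t
    return best_c, best_u, best_t


# Hypothetical payoffs: target 0 is more attractive to offenders and costlier to lose.
c, u, t = best_coverage_two_targets(
    att_reward=[5.0, 1.0], att_penalty=[-1.0, -1.0],
    def_reward=[1.0, 1.0], def_penalty=[-5.0, -1.0],
)
print(c, u)  # 0.75 -0.5
```

With these payoffs the leader's optimum is the coverage that makes the offender indifferent between the two targets (5 - 6c = -1 + 2c, so c = 0.75).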
Article
Full-text available
This special issue on “Data Analysis and Intelligent Optimization with Applications” follows a previous special issue of this journal on the interplay of Machine Learning and Optimization, “Model Selection and Optimization in ML” (Machine Learning 85:1-2, October 2011). This time we shift our focus to applications of data analysis and optimization...
Article
Full-text available
One of the major challenges of Data Mining applied to Business Analytics is the selection of attributes for a classification model. Most attribute selection techniques rely on statistical validation criteria, in many cases losing sight of the business objective itself, which does not necessarily lead to model...
Conference Paper
Full-text available
We present Rough-Fuzzy Support Vector Domain Description (RFSVDD), a novel data description algorithm that provides a rough-fuzzy characterization of a data set and shows its potential for outlier detection. Its resulting data structure is characterized by two components: a crisp lower-approximation and a fuzzy boundary. While the lower-approximati...
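The two-part structure described above (a crisp lower approximation plus a fuzzy boundary) can be illustrated once a sphere has been fitted. In the sketch below the center and radius are given directly, as if obtained from solving the SVDD problem with a linear kernel; the linear decay of membership outside the sphere, the `spread` parameter and the function name are assumptions, not the membership function defined in the paper.

```python
import math
from typing import Sequence, Tuple


def rough_fuzzy_membership(
    point: Sequence[float], center: Sequence[float], radius: float, spread: float
) -> Tuple[float, str]:
    """Classify a point against a sphere (center, radius) with a fuzzy boundary of width `spread`."""
    d = math.dist(point, center)
    if d <= radius:
        return 1.0, "lower"  # inside the sphere: crisp lower approximation
    mu = max(0.0, 1.0 - (d - radius) / spread)  # assumed linear decay outside the sphere
    return mu, "boundary" if mu > 0.0 else "outlier"


center, radius = (0.0, 0.0), 1.0  # would come from solving the SVDD problem (not shown)
for p in [(0.5, 0.0), (1.5, 0.0), (3.0, 0.0)]:
    print(p, rough_fuzzy_membership(p, center, radius, spread=1.0))
```

Points with zero membership fall outside the fuzzy boundary and are the natural outlier candidates.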
Article
Churn prediction is an important application of classification models that identify those customers most likely to attrite based on their respective characteristics described by e.g. socio-demographic and behavioral variables. Since nowadays more and more of such features are captured and stored in the respective computational systems, an appropria...
Article
We present an unsupervised method that selects the most relevant features using an embedded strategy while maintaining the cluster structure found with the initial feature set. It is based on the idea of simultaneously minimizing the violation of the initial cluster structure and penalizing the use of features via scaling factors. As the base metho...
Article
Full-text available
In the last decade, advances in teaching management systems and in the systematization of data in higher education have motivated the use of data mining tools to understand learning processes and the contexts in which they occur. In the English-speaking world, communities around learning analytics or educational dat...
Article
Feature selection and classification of imbalanced data sets are two of the most interesting machine learning challenges, attracting growing attention from both industry and academia. Feature selection addresses the dimensionality reduction problem by determining a subset of available features to build a good model for classification or predicti...
Article
One of the main tasks of conjoint analysis is to identify consumer preferences about potential products or services. Accordingly, different estimation methods have been proposed to determine the corresponding relevant attributes. Most of these approaches rely on the post-processing of the estimated preferences to establish the importance of such va...
Article
This paper presents a new approach for consumer credit scoring, by tailoring a profit-based classification performance measure to credit risk modeling. This performance measure takes into account the expected profits and losses of credit granting and thereby better aligns the model developers’ objectives with those of the lending company. It is bas...
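To illustrate what aligning a performance measure with a lender's profits can mean, the sketch below scores a cutoff by the realized profit of the accepted portfolio instead of by accuracy. The constant per-loan `gain` and `loss`, the example scores, the function names and the exhaustive cutoff search are simplifying assumptions; the profit-based measure actually proposed in the paper is not reproduced here.

```python
from typing import Sequence


def portfolio_profit(probs: Sequence[float], defaults: Sequence[bool],
                     cutoff: float, gain: float, loss: float) -> float:
    """Realized profit when every applicant with predicted default probability < cutoff is accepted."""
    profit = 0.0
    for p, defaulted in zip(probs, defaults):
        if p < cutoff:
            profit += -loss if defaulted else gain
    return profit


def best_cutoff(probs: Sequence[float], defaults: Sequence[bool], gain: float, loss: float) -> float:
    # Each distinct score is a candidate cutoff; +inf means "accept everyone".
    candidates = sorted(set(probs)) + [float("inf")]
    return max(candidates, key=lambda c: portfolio_profit(probs, defaults, c, gain, loss))


probs = [0.05, 0.10, 0.30, 0.60]
defaults = [False, False, True, True]
cut = best_cutoff(probs, defaults, gain=1.0, loss=5.0)
print(cut, portfolio_profit(probs, defaults, cut, gain=1.0, loss=5.0))  # 0.3 2.0
```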
Article
We present a methodology for improving credit scoring models by distinguishing two forms of rational behaviour of loan defaulters. It is common knowledge among practitioners that there are two types of defaulters, those who do not pay because of cash flow problems ("Can't Pay"), and those that do not pay because of lack of willingness to pay ("Won'...
Chapter
There are many definitions of Operations Research (OR) and what most of them have in common is that OR uses quantitative models to optimize decision-making. This requires understanding real-world decision-making processes, modeling skills, use (or at least understanding) of advanced IT systems, and interactions within multidisciplinary teams, amon...
Article
The performance of classification methods, such as Support Vector Machines, depends heavily on the proper choice of the feature set used to construct the classifier. Feature selection is an NP-hard problem that has been studied extensively in the literature. Most strategies propose the elimination of features independently of classifier constructio...
Article
Policing plays an important role in combating street crime. Though policing actions have a dissuasive impact on criminal behavior, they can also have unpredictable and even undesirable effects such as displacement of crime hot-spots. This paper presents an agent-based simulation model that generates artificial street-crime data which can be used to...
Article
Full-text available
Clustering is one of the most widely used approaches in data mining with real life applications in virtually any domain. The huge interest in clustering has led to a possibly three-digit number of algorithms with the k-means family probably the most widely used group of methods. Besides classic bivalent approaches, clustering algorithms belonging t...
Article
We present a methodology to grant and follow-up credits for micro-entrepreneurs. This segment of grantees is very relevant for many economies, especially in developing countries, but shows a behavior different to that of classical consumers where established credit scoring systems exist. Parts of our methodology follow a proven procedure we have ap...
Article
Dynamic data mining has gained increasing attention in the last decade. It addresses changing data structures which can be observed in many real-life applications, e.g. buying behavior of customers. As opposed to classical, i.e. static data mining where the challenge is to discover patterns inherent in given data sets, in dynamic data mining the cha...
Article
Full-text available
This exercise introduces students to the topics of modeling and mixed-integer programming and motivates them to investigate and understand the complexity of day-to-day problems. The activity consists of putting together a sheet of a newspaper using news and advertising items taken from a real newspaper. The objective is to arrive at a layout that h...
Article
Clustering methods are one of the most popular approaches to data mining. They have been successfully used in virtually any field covering domains such as economics, marketing, bioinformatics, engineering, and many others. The classic cluster algorithms require static data structures. However, there is an increasing need to address changing data pa...
Chapter
Full-text available
Economies are characterized by constant change. This change has several facets, ranging from long-term effects like economic cycles to short-term financial distortions caused by rumors. It also includes socio-economic and technological trends, seasonal alterations, and many others. “The only constant is change”, the famous saying often credited to the G...
Article
Full-text available
During the last decades, the disciplines of Data Mining and Operations Research have been working mostly independently of each other. However, the increasing complexity of today's applications in areas such as business, medicine, and science requires more and more interaction between both disciplines. On the one hand, several data mining algorithms a...
Conference Paper
Full-text available
Constrained clustering addresses the problem of creating minimum variance clusters with the added complexity that there is a set of constraints that must be fulfilled by the elements in the cluster. Research in this area has focused on “must-link” and “cannot-link” constraints, in which pairs of elements must be in the same or in different clusters...
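The pairwise constraints mentioned above can be enforced during the assignment step of k-means, as in the well-known COP-KMeans algorithm: each point goes to the nearest centroid whose cluster does not break a must-link or cannot-link constraint with points already assigned. The sketch below shows only that step under this assumption; it is not the approach proposed in the paper, and the function names and example data are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Pair = Tuple[int, int]


def _partner(i: int, pair: Pair) -> Optional[int]:
    """Return the other element of a constraint pair involving i, or None."""
    a, b = pair
    if a == i:
        return b
    if b == i:
        return a
    return None


def violates(i: int, cluster: int, assignment: Dict[int, int],
             must_link: List[Pair], cannot_link: List[Pair]) -> bool:
    for pair in must_link:
        other = _partner(i, pair)
        if other is not None and other in assignment and assignment[other] != cluster:
            return True
    for pair in cannot_link:
        other = _partner(i, pair)
        if other is not None and assignment.get(other) == cluster:
            return True
    return False


def constrained_assign(points: List[Sequence[float]], centroids: List[Sequence[float]],
                       must_link: List[Pair], cannot_link: List[Pair]) -> Optional[Dict[int, int]]:
    assignment: Dict[int, int] = {}
    for i, x in enumerate(points):
        # Try clusters from nearest to farthest; take the first that keeps all constraints.
        order = sorted(range(len(centroids)), key=lambda j: math.dist(x, centroids[j]))
        for j in order:
            if not violates(i, j, assignment, must_link, cannot_link):
                assignment[i] = j
                break
        else:
            return None  # no feasible cluster for point i
    return assignment


points = [(0.0, 0.0), (0.5, 0.0), (10.0, 0.0)]
centroids = [(0.0, 0.0), (10.0, 0.0)]
print(constrained_assign(points, centroids, must_link=[], cannot_link=[(0, 1)]))
# {0: 0, 1: 1, 2: 1}
```

A `None` result signals that the greedy assignment found no feasible cluster for some point; COP-KMeans reports failure in that case rather than backtracking.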
Conference Paper
Recently, databases in all areas of knowledge have grown in size, both in the number of instances and in the number of attributes. Current data sets may contain hundreds of thousands of variables with a high level of redundancy and/or irrelevancy. This amount of data may cause several problems for many data mining algorithms in terms of performance...
Article
Full-text available
Demand forecasting and capacity management are complicated tasks for certain healthcare services due to the inherent uncertainty, complex relationships, and typically high public exposure involved. Health service demand in three Chilean hospitals has been studied concluding that it can be forecast with high accuracy using Neural Networks and Suppor...
Article
Partitive algorithms, like cluster algorithms, are frequently used methods in data mining. Most of them are static in the sense that they detect patterns in stable data structures, i.e. the data structure remains unchanged over time. However, many real-life situations are characterized by changing data environments that require an adaptation of the...
Article
We introduce an embedded method that simultaneously selects relevant features during classifier construction by penalizing each feature’s use in the dual formulation of support vector machines (SVM). This approach called kernel-penalized SVM (KP-SVM) optimizes the shape of an anisotropic RBF Kernel eliminating features that have low relevance for t...
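In an anisotropic RBF kernel each feature has its own width, so driving a feature's scaling factor to zero removes that feature from the kernel entirely; this is the mechanism that lets a kernel-penalized SVM discard features. The sketch below assumes the parametrization K(x, y) = exp(-Σ_j s_j (x_j - y_j)^2) with one non-negative scale s_j per feature; the gradient-based optimization of the scales and the SVM dual are not shown, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def anisotropic_rbf(x: Sequence[float], y: Sequence[float], scale: Sequence[float]) -> float:
    """K(x, y) = exp(-sum_j scale_j * (x_j - y_j)^2), one non-negative scale per feature."""
    return math.exp(-sum(s * (a - b) ** 2 for a, b, s in zip(x, y, scale)))


def active_features(scale: Sequence[float], tol: float = 1e-6) -> List[int]:
    """Features whose scaling factor has not been driven (numerically) to zero."""
    return [j for j, s in enumerate(scale) if s > tol]


x, y = (1.0, 5.0), (1.0, 0.0)
print(anisotropic_rbf(x, y, scale=(1.0, 0.0)))  # 1.0: feature 1 no longer influences the kernel
print(active_features([0.8, 0.0, 0.3]))         # [0, 2]
```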
Article
Constrained clustering addresses the problem of creating minimum variance clusters with the added complexity that there is a set of constraints that must be fulfilled by the elements in the cluster. Research in this area has focused on “must-link” and “cannot-link” constraints, in which pairs of elements must be in the same or in different clusters...
Chapter
Operations research methods and applications have played a crucial role in the development of several Chilean industries as well as the government. This paper provides a brief overview of the history and current developments of the Chilean Institute of Operations Research, ICHIO. Keywords: operations research; Chile
Conference Paper
Full-text available
This paper presents a novel feature selection approach (KP-SVR) that determines a non-linear regression function with minimal error and simultaneously minimizes the number of features by penalizing their use in the dual formulation of SVR. The approach optimizes the width of an anisotropic RBF Kernel using an iterative algorithm based on the gradie...
Conference Paper
Full-text available
Uncertainty plays an important role in clustering. For example, in customer segmentation we may be faced with the situation that a certain customer does not necessarily belong to just one segment, i.e. his/her class assignment is uncertain. Several cluster algorithms have been proposed that employ uncertainty modeling in different ways. The most frequen...
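Fuzzy c-means is one widely used way to express an uncertain class assignment: a customer receives a membership degree in every segment, and the degrees sum to one. The sketch below computes the standard fuzzy c-means memberships u_i = 1 / Σ_k (d_i / d_k)^(2/(m-1)) for one point given fixed cluster centers; the centers, the fuzzifier m = 2 and the function name are illustrative assumptions, and in the full algorithm the centers are re-estimated in alternation with the memberships.

```python
import math
from typing import List, Sequence


def fcm_memberships(x: Sequence[float], centers: List[Sequence[float]], m: float = 2.0) -> List[float]:
    """Fuzzy c-means membership of point x in each cluster, for fuzzifier m > 1."""
    d = [math.dist(x, c) for c in centers]
    at_center = [i for i, di in enumerate(d) if di == 0.0]
    if at_center:  # x coincides with a center: share full membership among those centers
        return [1.0 / len(at_center) if i in at_center else 0.0 for i in range(len(centers))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((d[i] / d[k]) ** p for k in range(len(centers))) for i in range(len(centers))]


u = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)])
print(u)  # [0.8, 0.2]
```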
Data
Table S1. Polymerase chain reaction primers used in this study.
Data
Full-text available
Figure S1. (A) Gene structure of FBW2 showing the location of primers used in real time polymerase chain reaction and the abundance of FBW2 messenger RNA in sqn-1 and mutants doubly mutant for sqn-1 and different alleles of fbw2. All target genes were normalized to EIF4. (B) Alignments of the Arabidopsis F-box genes most closely related to At4g0898...
Conference Paper
Modeling market dynamics is a challenge that has attracted the interest of practitioners and researchers alike. This problem has been addressed from the perspective of Game Theory, in models that explicitly include profit-maximization schemes for the companies, and also from the point of view of Data Mining, with models that consider multivariate f...
Conference Paper
Full-text available
Data Mining is a widely used discipline with methods that are heavily supported by statistical theory. Game theory, instead, develops models with solid economical foundations but with low applicability in companies so far. This work attempts to unify both approaches, presenting a model of price competition in the credit industry. Based on game theo...
Conference Paper
Full-text available
Phishing email fraud has been considered one of the main cyber-threats over the last years. Its development has been closely related to social engineering techniques, where different fraud strategies are used to deceive a naïve email user. In this work, a latent semantic analysis and text mining methodology is proposed for the characterisation o...
Article
Full-text available
Abstract: All financial institutions that offer credit to their customers must address the problem of estimating how much of the money granted will return to the institution and to which customers credit should be offered. Credit scoring systems have been developed successfully to determine the probability that a given customer fails to repay the cred...
Article
Full-text available
This paper addresses the problem of probability estimation in Multiclass classification tasks combining two well-known data mining techniques: Support Vector Machines and Neural Networks. We present an algorithm which uses both techniques in a two-step procedure. The first step employs Support Vector Machines within a One-vs-All reduction from mult...
Article
Traditional methodologies for time series prediction take the series to be predicted and split it into training, validation, and test sets. The first one serves to construct forecasting models, the second set for model selection, and the third one is used to evaluate the final model. Different time series approaches such as ARIMA and exponential sm...
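The traditional split described above keeps temporal order: the earliest observations train the models, the next block is used for model selection, and the most recent block evaluates the final model. A minimal sketch, with hypothetical 60/20/20 fractions and a hypothetical function name:

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(
    series: Sequence[T], train_frac: float = 0.6, val_frac: float = 0.2
) -> Tuple[List[T], List[T], List[T]]:
    """Split a time series into consecutive train / validation / test blocks (no shuffling)."""
    n = len(series)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = list(series[:n_train])
    val = list(series[n_train:n_train + n_val])
    test = list(series[n_train + n_val:])
    return train, val, test


train, val, test = chronological_split(list(range(10)))
print(train, val, test)  # [0, 1, 2, 3, 4, 5] [6, 7] [8, 9]
```

Shuffling before splitting would leak future observations into training, which is why the blocks stay contiguous.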
Article
We introduce a novel wrapper algorithm for feature selection, using Support Vector Machines with kernel functions. Our method is based on a sequential backward selection, using the number of errors in a validation subset as the measure to decide which feature to remove in each iteration. We compare our approach with other algorithms like a filter m...
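The backward search described above can be sketched independently of the classifier by passing in a function that returns the number of validation errors for a given feature subset. In the sketch, `validation_errors` stands in for training a kernel SVM and counting its errors on the validation subset, and `toy_errors` is a hypothetical placeholder; the stopping rule (stop as soon as every removal would increase the error count) and the tie-break (remove the lowest-numbered feature) are assumptions, since the abstract does not state them.

```python
from typing import Callable, List, Sequence, Tuple


def backward_selection(
    features: Sequence[int], validation_errors: Callable[[List[int]], int]
) -> Tuple[List[int], int]:
    current = list(features)
    current_err = validation_errors(current)
    while len(current) > 1:
        # Error count obtained by removing each remaining feature in turn.
        trials = [(validation_errors([f for f in current if f != r]), r) for r in current]
        err, drop = min(trials)  # fewest errors; ties -> lowest feature index
        if err > current_err:
            break  # every removal would hurt: stop (assumed rule)
        current = [f for f in current if f != drop]
        current_err = err
    return current, current_err


def toy_errors(subset: List[int]) -> int:
    """Hypothetical stand-in for 'train an SVM on `subset`, count validation errors'.

    Only features 0 and 2 carry signal in this toy example.
    """
    return (0 if 0 in subset else 3) + (0 if 2 in subset else 2)


print(backward_selection([0, 1, 2, 3], toy_errors))  # ([0, 2], 0)
```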
Conference Paper
In adversarial systems, the performance of a classifier decreases after it is deployed, as the adversary learns to defeat it. Recently, adversarial data mining was introduced, where the classification problem is viewed as a game mechanism between an adversary and an intelligent and adaptive classifier. Over the last years, phishing fraud throug...
Article
Many real-life applications are characterised by changing data structures. For example, the buying patterns of retail customers may change due to changing economic parameters (increasing oil prices motivate customers to buy smaller cars) or a technological breakthrough (replacement of analogue by digital cameras). In such dynamic environments the paramete...
Article
Full-text available
The vast majority of data mining projects that use the KDD methodology in real life deliver only static solutions, which over time lose the ability to explain the phenomena for which they were originally built. We present a theoretical and practical framework that allows close monitoring of the...
Conference Paper
Many projects in data mining face, besides others, the following two challenges. On the one hand concepts to deal with uncertainty - like probability, fuzzy set or rough set theory - play a major role in the description of real life problems. On the other hand many real life situations are characterized by constant change - the structure of the dat...
Conference Paper
This paper addresses the problem of probability estimation in Multiclass classification tasks combining two well-known data mining techniques: Support Vector Machines and Neural Networks. We present an algorithm which uses both techniques in a two-step procedure. The first step employs Support Vector Machines within a One-vs-All reduction from multi...
Article
Support from the Chilean Fondecyt project 1040926 and the Millennium Science Institute “Complex Engineering Systems” (www.sistemasdeingenieria.cl) is gratefully acknowledged.
Conference Paper
Over the last decade, workflow management has become a significant tool in the effort of organizations to improve the efficiency of their processes. However, the scope for its adoption has been constrained by the variability, uncertainty and impreciseness that is inherent in business process execution. A number of attempts have been made by the wor...
Article
Recently, clustering algorithms based on rough set theory have gained increasing attention. For example, Lingras et al. introduced a rough k-means that assigns objects to lower and upper approximations of clusters. The objects in the lower approximation surely belong to a cluster while the membership of the objects in an upper approximation is unce...
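The rough k-means idea described above can be sketched as follows: an object whose distances to its nearest centroid and to some other centroid differ by at most a threshold goes into the upper approximations (boundary) of all those clusters; otherwise it goes into the lower approximation of its nearest cluster. Centroids are then updated as a weighted combination of the lower-approximation and boundary means. The threshold, the weights 0.7/0.3, the function names and the example points are hypothetical values, not settings from the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def mean(points: Sequence[Point]) -> Point:
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def rough_assign(points: List[Point], centroids: List[Point], threshold: float):
    """Split objects into lower approximations and boundary regions (upper minus lower)."""
    k = len(centroids)
    lower: List[List[Point]] = [[] for _ in range(k)]
    boundary: List[List[Point]] = [[] for _ in range(k)]
    for x in points:
        d = [math.dist(x, c) for c in centroids]
        i = min(range(k), key=d.__getitem__)
        close = [j for j in range(k) if j != i and d[j] - d[i] <= threshold]
        if close:  # ambiguous: in the upper approximation of several clusters
            for j in [i] + close:
                boundary[j].append(x)
        else:  # surely belongs to cluster i
            lower[i].append(x)
    return lower, boundary


def update_centroids(centroids, lower, boundary, w_lower=0.7, w_upper=0.3) -> List[Point]:
    new = []
    for c, lo, bd in zip(centroids, lower, boundary):
        if lo and bd:
            new.append(tuple(w_lower * a + w_upper * b for a, b in zip(mean(lo), mean(bd))))
        elif lo:
            new.append(mean(lo))
        elif bd:
            new.append(mean(bd))
        else:
            new.append(c)  # empty cluster: keep the old centroid
    return new


points = [(0.0,), (1.0,), (5.0,), (9.0,), (10.0,)]
centroids = [(0.5,), (9.5,)]
lower, boundary = rough_assign(points, centroids, threshold=1.0)
print(lower)     # [[(0.0,), (1.0,)], [(9.0,), (10.0,)]]
print(boundary)  # [[(5.0,)], [(5.0,)]]
print(update_centroids(centroids, lower, boundary))
```

Iterating assignment and update until the centroids stop moving gives the full clustering loop.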
Chapter
Supply Chain Management relies heavily on forecasts, e.g. of future demand or future prices. Most applications, however, use static forecasting models in the sense that past data is used for model construction and evaluation without being updated adequately when new data becomes available. We propose a dynamic forecasting methodology and show its ef...
Conference Paper
Demand prediction plays a crucial role in advanced systems for supply chain management. Having a reliable estimation for a product’s future demand is the basis for the respective systems. Various forecasting techniques have been developed, each one with its particular advantages and disadvantages compared to other approaches. This motivated the dev...
Chapter
Contents:
  • Introduction
  • Review of Literature Related to Dynamic Clustering
  • Recent Approaches for Dynamic Fuzzy Clustering
  • Applications
  • Future Perspectives and Conclusions
  • Acknowledgement
  • References
Article
Full-text available
Demand forecasts play a crucial role for supply chain management. The future demand for a certain product is the basis for the respective replenishment systems. Several forecasting techniques have been developed, each one with its particular advantages and disadvantages compared to other approaches. This motivates the development of hybrid systems...
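Hybrid forecasting systems combine several individual forecasts. A simple, widely used combination rule, substituted here for illustration and not necessarily the scheme used in the paper, weights each model by the inverse of its historical mean squared error. The function names and error histories are hypothetical; a model with zero historical error would need special handling because its inverse weight is undefined.

```python
from typing import List, Sequence


def inverse_mse_weights(errors_by_model: Sequence[Sequence[float]]) -> List[float]:
    """Weight each model by 1/MSE of its past forecast errors, normalized to sum to 1."""
    mses = [sum(e * e for e in errs) / len(errs) for errs in errors_by_model]
    inverse = [1.0 / m for m in mses]  # assumes every MSE > 0
    total = sum(inverse)
    return [w / total for w in inverse]


def combine(forecasts: Sequence[float], weights: Sequence[float]) -> float:
    return sum(w * f for w, f in zip(weights, forecasts))


# Hypothetical past forecast errors of two individual models.
weights = inverse_mse_weights([[1.0, -1.0], [2.0, -2.0]])
print(weights)  # [0.8, 0.2]
print(combine([100.0, 110.0], weights))
```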