Luis Moreira-Matias

Kreditech

PhD

About

51
Publications
32,820
Reads
1,311
Citations
Introduction
My major research interests are Data Mining and (Automated) Machine Learning. Specifically, I am interested in solving complex knowledge discovery problems from real-world data. I am currently interested in Regression, Concept Drift and AutoML, and in application areas such as Finance and Credit Risk Modelling, Mobility and Retail.
Additional affiliations
August 2018 - present
Kreditech
Position
  • Head of Department
Description
  • Lead a Data Science team that both researches and deploys in production state-of-the-art Machine Learning pipelines for Credit Scoring, Pricing and Affordability.
December 2014 - July 2018
NEC Laboratories Europe
Position
  • Senior Researcher
Description
  • To develop novel Machine Learning frameworks in order to anticipate Control events on Transportation Networks (e.g. transportation demand, travel time).
July 2014 - November 2014
University of Porto
Position
  • Research Assistant
Description
  • To develop online recommendation models that predict the most profitable taxi stand to head to at each moment.
Education
September 2003 - March 2009
University of Porto
Field of study
  • Informatics Engineering

Publications (51)
Article
Full-text available
Autonomous vehicles are soon to become ubiquitous in large urban areas, encompassing cities, suburbs and vast highway networks. In turn, this will bring new challenges to the existing traffic management expert systems. Concurrently, urban development is causing growth, thus changing the network structures. As such, a new generation of adaptive algo...
Conference Paper
Learning from data streams is a challenge faced by data science professionals from multiple industries. Most of them struggle when applying traditional Machine Learning algorithms to these problems. They resort to such algorithms because of their high availability in ready-to-use software libraries for big data technologies (e.g. SparkML). Nevertheless, most...
Conference Paper
Long-term travel time predictions are crucial for tactical and operational public transport planning in schedule design and resource allocation tasks. As in any regression task, their success depends considerably on an adequate feature selection framework. In this paper, we approach the myopia of the State-of-the-Art method RReliefF on mining r...
Article
Recent advances in telecommunications created new opportunities for monitoring public transport operations in real-time. This paper presents an Automatic Control framework to mitigate the Bus Bunching phenomenon in real-time. The framework is a powerful combination of distinct Machine Learning principles and methods to extract valuable informa...
Conference Paper
Full-text available
Credit scoring models support loan approval decisions in the financial services industry. Lenders train these models on data from previously granted credit applications, where the borrowers’ repayment behavior has been observed. This approach creates sample bias. The scoring model is trained on accepted cases only. Applying the model to screen appl...
Preprint
Full-text available
Credit scoring models support loan approval decisions in the financial services industry. Lenders train these models on data from previously granted credit applications, where the borrowers' repayment behavior has been observed. This approach creates sample bias. The scoring model (i.e., classifier) is trained on accepted cases only. Applying the r...
Poster
Full-text available
The dynamic behavior of urban mobility patterns makes matching taxi supply with demand one of the biggest challenges in this industry. Recently, the increasing availability of massive broadcast GPS data has encouraged the exploration of this issue under different perspectives. One possible solution is to build a data-driven real-time taxi-dispat...
Patent
A method for providing dispatching services for an on-demand transportation (ODT) service includes determining that a predictive assignment message should be transmitted to a vehicle, generating, in response to the determining that a predictive assignment should be transmitted to a vehicle, the predictive assignment message, and transmitting, to th...
Chapter
Methods for learning heterogeneous regression ensembles have not yet been proposed on a large scale. Hitherto, in classical ML literature, stacking, cascading and voting are mostly restricted to classification problems. Regression poses distinct learning challenges that may result in poor performance, even when using well established homogeneous en...
Article
Full-text available
Massive data broadcast by GPS-equipped vehicles provide unprecedented opportunities. One of the main tasks in order to optimize our transportation networks is to build data-driven real-time decision support systems. However, the dynamic environments where the networks operate invalidate the traditional assumptions required to put into practice many off...
Article
Recent technological advances in telecommunications have created a new reality in mobility sensing. Digital devices are now ubiquitous and able to broadcast rich information about human mobility in real-time. This has exponentially increased the availability of large-scale mobility data (i.e. Big Data) which has been popularized in the media as the...
Article
Full-text available
Floating car data (FCD) denotes the type of data (location, speed, and destination) produced and broadcasted periodically by running vehicles. Increasingly, intelligent transportation systems take advantage of such data for prediction purposes as input to road and transit control and to discover useful mobility patterns with applications to transpo...
Article
Full-text available
The development of shared buses is urgently needed to alleviate urban traffic congestion by improving road resource utilization and to provide a new type of transportation mode with a good user experience. The key to shared bus implementation lies in accurately predicting travel requirements and planning dynamic routes. However, the spar...
Article
Today, we live in an era where pervasive sensor networks both collect and broadcast rich digital footprints about human mobility. However, much of this data is incomplete and/or inaccurate. In this paper, we propose a knowledge discovery framework to handle such issues in the context of automatic incident detection system...
Article
Full-text available
Ensembles are popular methods for solving practical supervised learning problems. They reduce the risk of having underperforming models in production-grade software. Although critical, methods for learning heterogeneous regression ensembles have not been proposed at large scale, whereas in classical ML literature, stacking, cascading and voting are...
Article
Clustering consists of grouping together samples given their similar properties. The problem of simultaneously modeling groups of samples and features is known as Co-Clustering. This paper introduces ROCCO - a Robust Continuous Co-Clustering algorithm. ROCCO is a scalable, hyperparameter-free, easy and ready to use algorithm to address Co-Clusteri...
Conference Paper
Full-text available
A traffic incident is defined as an event which disrupts the normal (free) flow condition of a highway. Such incidents are caused either by a recurrent excessive demand or, alternatively, by a series of possible stochastic occurrences which may suddenly reduce the road capacity (e.g. car accidents, extreme weather changes). This pap...
Research
Full-text available
With the prevalence of ubiquitous computing devices (smartphones, wearable devices, etc.) and social network services (Facebook, Twitter, etc.), humans are generating massive digital traces continuously in their daily life. Considering the invaluable crowd intelligence residing in these pervasive and social big data, a spectrum of opportunities are...
Conference Paper
This paper presents the proof-of-concept evaluation of a Resource Aware VNF Agnostic (RAVA) NFV orchestration method that is designed to enhance the Quality of Decision (QoD) of a cloud controller by optimizing the life cycle management decisions that it takes in order to manage the resources in a cloud infrastructure (e.g., a data center). The RAV...
Conference Paper
Full-text available
The efficiency of Public Transportation (PT) Networks is a major goal of any urban area authority. Advances in both location and communication devices have drastically increased the availability of the data generated by their operations. Adequate Machine Learning methods can thus be applied to identify patterns useful to improve the Schedule Plan. In th...
Article
The rapid increase in automated data collection in the public transport industry facilitates the adjustment of operational planning and real-time operations based on the prevailing traffic and demand conditions. In contrast to automated passenger counts systems, automated vehicle location (AVL) data are often available for the entire public transpo...
Conference Paper
Full-text available
Traffic congestion is a major problem in today’s urban mobility. This paper introduces a novel model for Automatic Incident Prediction (AID) on freeways: Drift3Flow. This stepwise methodology produces flow/occupancy rate predictions using an online weighted ensemble schema of two well-known time series analysis techniques: Autoregressive Integrated...
Conference Paper
Full-text available
Road transport solutions depend on the quality of the measurements of the underlying traffic state. This paper introduces quality indicators that aim to identify the presence of traffic measurement anomalies. The proposed method seeks inconsistency in the traffic measures by statistically evaluating the variability of measures. The computation of t...
Article
Full-text available
Intelligent transportation systems based on automated data collection frameworks are widely used by the major transit companies around the globe. This paper describes the current state of the art in improving both planning and control at public road transportation companies using automatic vehicle location (AVL) data. By surveying this topic, the e...
Conference Paper
Full-text available
Nowadays, the major Public Transportation Companies around the world use intelligent transportation systems based on automated data collection frameworks. The existence of these data has driven the development of new approaches to the operational planning of public transportation. These approaches, commonly known as ADC-based operational plannin...
Conference Paper
Full-text available
In this paper, we present a probabilistic framework to predict Bus Bunching (BB) occurrences in real-time. It uses both historical and real-time data to approximate the headway distributions at the subsequent stops of a given route by employing both offline and online supervised learning techniques. Such approximations are incrementally calculated b...
Conference Paper
Full-text available
Private car commuting is heavily dependent on the subsidisation that exists in the form of available free parking. However, the public funding policy of such free parking has been changing over the last years, with a substantial increase of meter-charged parking areas in many cities. To help to increase the sustainability of car transportation, a n...
Conference Paper
Full-text available
Taxi services play a central role in the mobility dynamics of major urban areas. Advanced communication devices such as GPS (Global Positioning System) and GSM (Global System for Mobile Communications) made it possible to monitor the drivers' activities in real-time. This paper presents an online learning approach to predict profitability in taxi s...
Conference Paper
Full-text available
Agent scheduling in call centers is a major management problem as the optimal ratio between service quality and costs is hardly achieved. In the literature, regression and time series analysis methods have been used to address this problem by predicting the future arrival counts. In this paper, we propose to discretize these target variables into f...
Conference Paper
Full-text available
Nowadays, transportation vehicles are equipped with intelligent sensors. Together, they form collaborative networks that broadcast real-time data about mobility patterns in urban areas. Online intelligent transportation systems for taxi dispatching, time-saving route finding or automatic vehicle location are already exploring such information in th...
Conference Paper
Full-text available
Informed driving is becoming a key feature to increase the sustainability of taxi companies. Some recent works are exploring the data broadcasted by each vehicle to provide live information for decision making. In this paper, we propose a method to employ a learning model based on historical GPS data in a real-time environment. Our goal is to predi...
Article
Full-text available
Informed driving is increasingly becoming a key feature for increasing the sustainability of taxi companies. The sensors that are installed in each vehicle are providing new opportunities for automatically discovering knowledge, which, in return, delivers information for real-time decision making. Intelligent transportation systems for taxi dispatc...
Article
Full-text available
Rising fuel costs are ruling out random cruising strategies for finding passengers. Here, a recommendation model that suggests the most passenger-profitable urban area/stand is presented. This framework is able to combine 1) the underlying historical patterns on passenger demand and 2) the current network status to decide which is the best zone to...
Conference Paper
Full-text available
Nowadays, Informed Driving is crucial to the transportation industry. We present an online recommendation model to help the driver decide which stand is best to head to at each moment, minimizing the waiting time. Our approach uses time series forecasting techniques to predict the spatiotemporal distribution in real-time. Then, we combine this inf...
Conference Paper
Full-text available
In recent years, both companies and researchers have been exploring intelligent data analysis to increase the profitability of the taxi industry. Intelligent systems for online taxi dispatching and time saving route finding have been built to do so. In this paper, we propose a novel methodology to produce online predictions regarding the spatial di...
Conference Paper
Full-text available
In the last decade, real-time vehicle location systems have attracted widespread attention because of the new kind of rich spatio-temporal information they provide. The fast processing of this large amount of information is a growing and explosive challenge. Taxi companies are already exploring such information in efficient taxi dispatching and time-saving route finding....
Conference Paper
Full-text available
Mining public transportation networks is a growing and explosive challenge due to the increasing amount of information available. In highly populated urban zones, vehicles often fail to meet the schedule. Such failures cause headway deviations (HD) between high-frequency bus pairs. In this paper, we propose to identify systematic HD which usually prov...
Conference Paper
Full-text available
In highly populated urban zones, it is common to notice headway deviations (HD) between pairs of buses. When these events occur in a bus stop, they often cause bus bunching (BB) in the following bus stops. Several proposals have been suggested to mitigate this problem. In this paper, we propose to find BBS (Bunching Black Spots) – sequences of bus...
Conference Paper
Full-text available
Text Categorization (TC) has attracted the attention of the research community in the last decade. Algorithms like Support Vector Machines, Naïve Bayes or k Nearest Neighbors have been used with good performance, confirmed by several comparative studies. Recently, several ensemble classifiers were also introduced in TC. However, many of those can o...
Conference Paper
Full-text available
Vehicular sensing is emerging as a powerful means to collect information using the variety of sensors that equip modern vehicles. These sensors range from simple speedometers to complex video capturing systems capable of performing image recognition. The advent of connected vehicles makes such information accessible nearly in real-time and creates a...
Article
Full-text available
The increasing use of wind power as a source of electricity poses new challenges with regard to both power production and load balance in the electricity grid. This new source of energy is volatile and highly variable. The only way to integrate such power into the grid is to develop reliable and accurate wind power forecasting systems. Electricity...
Conference Paper
Full-text available
It is well known that the definition of bus schedules is critical for the service reliability of public transport. Several proposals have been suggested, using data from Automatic Vehicle Location (AVL) systems, in order to enhance the reliability of public transport. In this paper we study the optimum number of schedules and the days covered by...
Conference Paper
Full-text available
In the business world, there are several software tools to generate reports automatically. We can do it with software like Crystal Reports and we can use different options to configure the reports. However, we observed that such software is not flexible in several respects, like the structure of the database or the changing of report’s parameters after its gene...

Questions (3)
Question
The idea is to do online pruning using a continuous timestamped dataset...and I want to train my model on some data and then improve it during the day with other information that I may receive (e.g. active learning, weather conditions, etc.). This could be done by pruning (i.e. removing some tree branches) or by adding more branches to the current leaves. Are there any R packages that support such an implementation? What would be the best way to do so? Thank you in advance.
Question
I want to do survival analysis over a dataset of repairs. However, I do not know of any R package that supports this kind of analysis. Can you help? Thank you.
Question
I am looking to explore the M5 algorithm for some regression tasks...and I wanted to know which is the best implementation of M5 in R so far, and why. Thank you.

Projects (2)
Project
Recent advances in telecommunications created new opportunities for monitoring public transport operations in real-time. This project intends to develop an automatic control framework to mitigate some operational problems in real-time. The framework is a powerful combination of distinct Machine Learning principles and methods to extract valuable information from raw location-based data. State-of-the-art tools and methodologies such as Regression Analysis, Probabilistic Reasoning and Perceptron learning with Stochastic Gradient Descent constitute the building blocks of this predictive methodology. The prediction output is then used to select and deploy corrective actions to automatically prevent problems. The proposed system could be embedded in a decision support system to improve control room operations.
Project
Build an automated feature selection pipeline that combines filter and wrapper methods and works with most off-the-shelf regression algorithms.