
Olatz Arbelaitz
- PhD
- Professor (Associate) at University of the Basque Country
About
96 Publications
15,056 Reads
2,403 Citations
Publications (96)
In the era where digital technologies are becoming increasingly prevalent, it is anticipated that a majority of the global population will have at least a basic understanding of informatics. However, empirical evidence suggests that a significant portion of the global population remains digitally illiterate. This phenomenon is particularly pronounc...
Parkinson’s disease (PD) is a neurodegenerative disorder marked by motor and cognitive impairments. The early prediction of cognitive deterioration in PD is crucial. This work aims to predict the change in the Montreal Cognitive Assessment (MoCA) at years 4 and 5 from baseline in the Parkinson’s Progression Markers Initiative database. The predicto...
The partial consolidated tree bagging (PCTBagging) was presented as a multiple classifier that, based on a parameter, the consolidation percentage, can exploit more the possibilities of the inner ensembles, and obtain higher levels of interpretability, or can exploit more the possibilities of the ensembles, and obtain higher discriminant capacity. T...
Objectives:
The aim of this research is to study how games and game development can help students learn introductory concepts of a very abstract topic such as Computer Architecture.
Methodology:
In a quasi-experimental scenario, quantitative and qual...
Contribution:
Adaptation and application of a methodology to introduce Informatics from an early age to students living in disadvantaged areas in Peru and analysis of its effects.
Background: On the one hand, during the COVID-19 pandemic, students livi...
Within the field of social reintegration and re-education, this paper presents an educational experience carried out at the Iquitos Penitentiary Center, Lima, Peru, with the aim of providing an introduction to informatics to 25 inmates who volunteered to take part in the project. Twenty students and a teacher from the Scientific University of the S...
Non-motor manifestations of Parkinson’s disease (PD) appear early and have a significant impact on the quality of life of patients, but few studies have evaluated their predictive potential with machine learning algorithms. We evaluated 9 algorithms for discriminating PD patients from controls using a wide collection of non-motor clinical PD featur...
[open access]
The use of decision trees considerably improves the discriminating capacity of ensemble classifiers. However, this process results in the classifiers no longer being interpretable, although comprehensibility is a desired trait of decision trees. Consolidation (consolidated tree construction algorithm, CTC) was introduced to improve th...
It is said that with great power comes great responsibility. Nowadays, we rely on machine learning systems to make decisions. Unfortunately these systems suffer from algorithmic biases; they often produce results that are systemically prejudiced due to erroneous assumptions in the machine learning process. Consequently these systems can contribute...
The current importance of digital competence makes it essential to enable people with disabilities to use digital devices and applications and to automatically adapt site interactions to their needs. Although most of the current adaptable solutions make use of predefined user profiles, automatic detection of user abilities and disabilities is the f...
Deep learning techniques are being increasingly used in the scientific community as a consequence of the high computational capacity of current systems and the increase in the amount of data available as a result of the digitalisation of society in general and the industrial world in particular. In addition, the immersion of the field of edge compu...
Contribution: A learning-by-teaching methodology through games can be used to promote informatics (computer science) in primary and secondary education. Applying the proposed activities can change students' perception of informatics from seeing it as merely using computers to seeing its relationship with mathematics. The experience can also help st...
In medical software, as in many other fields, usability is a major challenge, on the one hand because the data are abundant and complex and, on the other, because the context of use is critical. It is known that clinicians demand the right amount of information to carry out their tasks and, in that sense, adaptive user interfaces ... information...
The digital divide in Europe has not yet been bridged and thus more contributions towards understanding the factors affecting the different dimensions involved are required. This research offers some insights into the topic by analyzing the e-Government adoption or practical use of e-Government across Europe (26 EU countries). Based on the data pro...
Tracking urban mobility with current heterogeneous sensing capabilities has opened a wide research area on analytical and predictive data-driven models for improvements in transport operations and planning. These improvements are applicable for individual users, service providers and decision-makers. People, vehicles and goods move along the city a...
Computer applications, and especially the Internet, provide many people with disabilities with unique opportunities for interpersonal communication, social interaction, and active participation (including access to labor and entertainment). Nevertheless, rigid user interfaces often present accessibility barriers to people with physical, sensory, or...
Computer applications provide people with disabilities with unique opportunities for interpersonal communication, social interaction and active participation. However, rigid user interfaces often present accessibility barriers to people with physical, sensory or cognitive impairments. User interface personalization is crucial to overcome these barr...
Objective:
To characterise the use of an electronic medication safety dashboard by exploring and contrasting interactions from primary users (i.e. pharmacists) who were leading the intervention and secondary users (i.e. non-pharmacist staff) who used the dashboard to engage in safe prescribing practices.
Materials and methods:
We conducted a 10-...
The importance of digital competences is now undeniable, especially for people with functional diversity. At the same time, websites should adapt automatically to the user's needs. This work focuses on the latter, namely detecting navigation problems automatically. Firstly, the device used by the user will be detected by proposing two l...
The monitoring of small houses and rooms has become possible due to the advances in IoT sensors, actuators and low power communication protocols in the last few years. As buildings are one of the biggest energy consuming entities, monitoring them has great interest for trying to avoid non-necessary energy waste. Moreover, human behaviour has been r...
This work analyses the navigation in the enrolment web information area of the University of the Basque Country. A complete data mining process shows that successful and failure navigation behaviors can be modeled using machine learning techniques. Unsupervised learning algorithms have been applied on two different domains: URLs visited by the user...
The PART rule-induction algorithm creates rulesets by iteratively creating partial decision trees and extracting a rule from each tree. A recent study showed that growing trees further and combining it with pruning created classifiers with better discriminating capacity and less structural complexity. In this work we propose an algorithm that works...
The development of IoT devices has made it possible to install large numbers of sensors in different environments. Consequently, monitoring small houses and entire buildings has become possible. In addition, buildings are one of the biggest energy consumers, so the monitoring of energy waste, and its sources, is gaining attention. Human behaviour has be...
(its size and complexity) and its context of use. This results in user interfaces with a high-density of data that do not support optimal decision-making by clinicians. Anecdotal evidence indicates that clinicians demand the right amount of information to carry out their tasks. This suggests that adaptive user interfaces could be employed in order...
Novel ubiquitous traffic sensors such as floating car data (FCD) are becoming widespread due to the use of smartphones connected 24 hours a day and global positioning systems. Road conditions such as travel speeds in each road link and mobility demand can be monitored by measurements coming from moving vehicles consisting of geolocation and speed information wi...
In this work we have analyzed the enrollment eService navigation of the UPV/EHU and using data
mining techniques we have attempted to automatically perform navigation sessions classification. The
results show that we are able to detect the defined success and failure navigation behaviours. For
example, more than 90 % of the sessions of the clusters...
Nowadays the importance of digital competences makes it important to train people with disabilities in the use of digital devices and applications and, moreover, to adapt site interaction to their needs automatically. This work focuses on the latter, identifying important user characteristics and the device being used to interact with t...
In the unsupervised learning environment, the correct partition of data is not available, making it difficult to evaluate the performance of clustering algorithms. Therefore, one of the biggest challenges in this area is the validation of the results obtained by the algorithms. Amongst the various proposals currently under discussion, one of the mo...
Digital competences are considered basic nowadays, which makes it important to familiarize people with disabilities with digital devices and applications, to train them in their use, and to adapt site interaction to their needs. Most of the current adaptable systems are linked to predefined user profiles. However, the automatic detection of user ch...
In supervised classification, decision tree and rule induction algorithms possess the desired ability to build understandable models. The PART algorithm creates partially developed C4.5 decision trees and extracts a rule from each tree. Some of the criteria used by this algorithm can be modified to yield better results. In this work, we propose and...
The information gathered in this document was produced as a task of the eGovernAbility (A framework for building user tailored accessible services for eAdministration) research project (TIN2014-52665-C2-1-R), funded by the Spanish Government's Ministry of Economy and Competitiveness and the European Regional Development Fund.
The project's...
In this work, we improved the link prediction part of a web mining system developed for a tourism website (Bidasoa Turismo, BTw). First, we replaced the PAM clustering algorithm used in the profiling part of the system with an adaptation of another system, SEP/COP, which is based on a hierarchical clustering algorithm and is able to automatically ad...
In the original article we published the results of the CTC algorithm using a coverage-based strategy and compared its results against a set of genetics based and classical algorithms for rule induction. Shortly before submission we changed the storage system for the results of our experiments and a recent review of the process has discovered a mis...
The dramatic increase in the amount of information stored on the web makes it more important to familiarize people with disabilities and elderly people with digital devices and applications and to adapt websites to enable their use by these users. Discapnet is a website mainly aimed at visually disabled people, and navigation is a challenging task...
Online communities are becoming an important tool in the communication and participation processes in our society. However, the most widespread applications are difficult to use for people with disabilities, or may involve some risks if no previous training has been undertaken. This work describes a novel social network for cognitively disabled peo...
The class imbalance problem has attracted a lot of attention from the data mining community recently, becoming a current trend in machine learning research. The Consolidated Tree Construction (CTC) algorithm was proposed as an algorithm to solve a classification problem involving a high degree of class imbalance without losing the explaining capaci...
Web personalization is becoming essential in industry, especially for users with special needs such as visually impaired people. Adaptation may greatly speed up navigation for visually impaired people and help narrow the existing technological gap. This work is the first stage of a web mining process carried out in disca...
This paper presents an analysis of introducing active methodologies in the Computer Architecture course taught in the second year of the Computer Engineering Bachelor's degree program at the University of the Basque Country (UPV/EHU), Spain. The paper reports the experience from three academic years, 2011–2012, 2012–2013, and 2013–2014, in which th...
The J48Consolidated class for WEKA implements the CTC algorithm, which builds a single decision tree based on a set of samples. This package has recently become an “official” Weka package and is accessible in the central package repository of Weka. It can be installed through the package management system of the development version 3.7.11 of Weka.
htt...
http://www.sc.ehu.es/acwaldap/weka-ctc/
The tourism industry has experienced a shift from offline to online travellers and this has made the use of intelligent systems in the tourism sector crucial. These information systems should provide tourism consumers and service providers with the most relevant information, more decision support, greater mobility and the most enjoyable travel...
User-adapted interaction based on user modelling is a well known methodology that can be applied to enhance accessibility. To this respect, the structure and content of the user model is a key issue to ensure adequate adaptation. This paper describes the use of Web mining techniques to create user models that allow adaption of the Web interaction p...
Much effort has recently gone into proposing new intelligent resampling methods as a way to solve class imbalance problems, one of the main challenges facing the machine learning community today. Usually the purpose of these methods is to balance the classes. However, there are works in the literature showing that those methods can also be suitabl...
Personalised electronic tourist guides (PETs) are mobile hand-held devices able to create tourist routes matching tourists' preferences. Transportation information has been identified as one of the most appreciated functionalities of a PET. We model the tourist planning problem, integrating public transportation, as the time-dependent team orientee...
http://www.sc.ehu.es/acwaldap/weka-ctc
The validation of the results obtained by clustering algorithms is a fundamental part of the clustering process. The most used approaches for cluster validation are based on internal cluster validity indices. Although many indices have been proposed, there is no recent extensive comparative study of their performance. In this paper we show the resu...
Class imbalance problems have lately become an important area of study in machine learning and are often solved using intelligent resampling methods to balance the class distribution. The aim of this work is to show that balancing the class distribution is not always the best solution when intelligent resampling methods are used, i.e. there is ofte...
Websites are important tools for tourism destinations. The adaptation of the websites to the users' preferences and requirements will turn the websites into more effective tools. Using machine learning techniques to build user profiles allows us to take into account their real preferences. This paper presents the first approach of a system that, ba...
There is a need to facilitate access to the required information in the web and adapting it to the users' preferences and requirements. This paper presents a system that, based on a collaborative filtering approach, adapts the web site to improve the browsing experience of the user: it generates automatically interesting links for new users. The sy...
Personalised Electronic Tourist Guides (PETs) provide an integrated solution for route generation based on the profiles and constraints of tourists, up-to-date Points Of Interest (POIs) and destination information. In this paper we present the result of an evaluation of a PET prototype that applies an advanced algorithm to generate personalised tou...
Over the last decades, the amount of information on the web has increased drastically, but larger quantities of data do not provide added value for web visitors; there is a need for easier access to the required information and for adaptation to their preferences or needs. The use of machine learning techniques to build user models makes it possible to take into account th...
The evaluation and comparison of internal cluster validity indices is a critical problem in the clustering area. The methodology used in most of the evaluations assumes that the clustering algorithms work correctly. We propose an alternative methodology that does not make this often false assumption. We compared 7 internal cluster validity indices...
In real world problems solved using data mining techniques, it is very common to find data in which the number of examples of one of the classes is much smaller than the number of examples of the rest of the classes. Much work has been done to deal with these problems, known as class imbalance problems. Most of them focus their effort on data resam...
Hierarchical clustering algorithms provide a set of nested partitions called a cluster hierarchy. Since the hierarchy is usually too complex it is reduced to a single partition by using cluster validity indices. We show that the classical method is often not useful and we propose SEP, a new method that efficiently searches in an extended partition...
In some real-world problems solved by machine learning it is compulsory for the solution provided to be comprehensible so
that the correct decision can be made. It is in this context that this paper compares bagging (one of the most widely used
multiple classifier systems) with the consolidated trees construction (CTC) algorithm, when the learning...
When tourists are at a destination, they typically search for information in the Local Tourist Organizations. There, the staff
determines the profile of the tourists and their restrictions. Combining this information with their up-to-date knowledge
about the local attractions and public transportation, they suggest a personalized route for the tour...
The Time Dependent Orienteering Problem with Time Windows (TDOPTW) consists of a set of locations with associated time windows and scores. Visiting a location collects its score as a reward. Travel time between locations varies depending on the departure time. The objective is to obtain a route that maximizes the obtained score within a lim...
When using machine learning to solve real world problems, the class distribution used in the training set is important; not
only in highly unbalanced data sets but in every data set. Weiss and Provost suggested that each domain has an optimal class
distribution to be used for training. The aim of this work was to analyze the truthfulness of this hy...
SAHN is a widely used agglomerative hierarchical clustering method. Nevertheless, it is not an incremental algorithm and therefore it is not suitable for many real application areas where not all data is available at the beginning of the process. Some authors proposed incremental variants of SAHN. Their goal was to obtain the same results in increme...
When tourists are at a destination, they typically search for information in the Local Tourist Organizations. There, the staff categorizes the tourists' profiles and restrictions. Combining this information with their up-to-date knowledge about the local attractions, weather and public transportation, they suggest a personalised route for the tourist agenda. This paper pres...
Generating personalized tourist routes based on a tourist's interests and constraints is an upcoming trend in tourist applications. This problem can be directly related to the Multi Constrained Team Orienteering Problem with Time Windows (MCTOPTW). The MCTOPTW consists of a set of locations, each of them with a certain score, a time window and one...
The popularity of computer networks broadens the scope for network attackers and increases the damage these attacks can cause. In this context, Intrusion Detection Systems (IDS) are included as part of any complete security package. This work focuses on nIDSs which work by scanning the network traffic. A service-independent payload processing ap...
Malware detection is an important problem today. New malware appears every day and in order to be able to detect it, it is important to recognize families of existing malware. Data mining techniques will be very helpful in this context; concretely unsupervised learning methods will be adequate. This work presents a comparison of the behaviour of tw...
Being aware of the importance of classifiers to be comprehensible when using machine learning to solve real world problems,
bagging needs a way to be explained. This work compares Consolidated Tree’s Construction (CTC) algorithm with the Combined
Multiple Models (CMM) method proposed by Domingos when used to extract explanation of the classificatio...
This work describes the Consolidated Tree Construction (CTC) algorithm: a single tree is built based on a set of subsamples. This way the explaining capacity of the classifier is not lost even if many subsamples are used. We show how CTC algorithm can use undersampling to change class distribution without loss of information, building more accurate...
When different subsamples of the same data set are used to induce classification trees, the structure of the built classifiers
is very different. The stability of the structure of the tree is of capital importance in many domains, such as illness diagnosis,
fraud detection in different fields, customer’s behaviour analysis (marketing), etc, where c...
This paper gives an extensive description of a real fuel distribution problem. It includes the description of the elements, process, constraints and difficulties of the real problem, as well as the description of the efficient system that has been designed to solve it. This system combines a global strategy working with the whole set of orders and...
This paper presents an analysis of the behaviour of Consolidated Trees, CT (classification trees induced from multiple subsamples
but without loss of explaining capacity). We analyse how CT trees behave when used to solve a fraud detection problem in a
car insurance company. This domain has two important characteristics: the explanation given to th...
In real world problems solved with machine learning techniques, achieving small error rates is important, but there are situations
where an explanation is compulsory. In these situations the stability of the given explanation is crucial. We have presented
a methodology for building classification trees, Consolidated Trees Construction Algorithm (CT...
This paper describes a parallelizable system based on Simulated Annealing for solving VRPTW problems. The system consists of two optimization phases, a global one and a local one, both based on Simulated Annealing and parallelizable. For the first phase different parallelization strategies are presented and evaluated. The importance of the co-ope...
This paper presents the design and analysis of several systems to solve Vehicle Routing Problems with Time Windows (VRPTW) limiting the search to a small number of solutions explored. All of them combine a metaheuristic technique with a route building heuristic. Simulated Annealing, different evolutionary approaches and hybrid methods have been tri...
This paper presents a new methodology for building decision trees, the Consolidated Trees Construction algorithm, that improves the behavior of C4.5. It reduces the error and the complexity of the induced trees, with the differences in complexity being statistically significant. The advantage of this methodology with respect to other techniques such as ba...
The aim of this PhD thesis has been to design a fast and robust system to solve Vehicle Routing Problems with Time Windows. As a practical result, a real application has been built for the oil delivery company Vda. de Londaiz y sobrinos de Mercadaiz.
The starting point has been a bibliographical review in order to analyse the methods used by other...
This paper describes a parallelizable system based on simulated annealing for solving vehicle routing problems with time windows (VRPTW). The system consists of two optimization phases, a global one and a local one, both based on simulated annealing and parallelizable. For the first phase different parallelization strategies are presented and...
This position paper tackles the problem of automatic web personalization using machine learning techniques to model the users' behavior. The target population is people with physical, sensory or cognitive restrictions. In this paper we present our plans to study the possibility of creating user models using the information extracted from web naviga...
The popularity of computer networks broadens the scope for network attackers and increases the damage these attacks can cause. In this context, any complete security package includes a network Intrusion Detection System (nIDS). This work focuses on nIDSs which work by scanning the network traffic. We present a service-independent payload processi...
This paper presents the design and analysis of a system to solve Vehicle Routing Problems with Time Windows (VRPTW), emphasising the speed and effectiveness of the algorithm. The solution consists of two optimisation phases: a global one and a local one. Both are based on Simulated Annealing. The method has been implemented and tested on...
In many pattern recognition problems, explaining the classification made becomes as important as the classifier's discriminating capacity. For this kind of problem we can use the Consolidated Trees Construction (CTC) algorithm, which uses several subsamples to build a single tree. This paper presents a wide a...
Due to the high complexity of the required calculations, Intelligent Routing Systems have to apply the latest Operations Research techniques to be able to create routes efficiently. This paper proposes a solution to the Multi Path Orienteering Problem with Time Windows (MPOPTW), which includes multiple paths to move between locations. The main characte...