# Álvar Arnaiz-González's research while affiliated with Universidad de Burgos and other places

Technological advances together with machine learning techniques give health science disciplines tools that can improve the accuracy of evaluation and diagnosis. The objectives of this study were: (1) to design a web application based on cloud technology (eEarlyCare-T) for creating personalized therapeutic intervention programs for children aged 0–...
An inherent requirement of teaching using online learning platforms is that the teacher must analyze student activity and performance in relation to course learning objectives. Therefore, all e-learning environments implement a module to collect such information. Nevertheless, these raw data must be processed to perform e-learning analysis and to h...
The prediction of multiple numeric outputs at the same time is called multi-target regression (MTR), and it has gained attention during the last decades. This task is a challenging research topic in supervised learning because it poses additional difficulties to traditional single-target regression (STR), and many real-world problems involve the pr...
Background: Approximately 40–70% of people with Parkinson’s disease (PD) fall each year, causing decreased activity levels and quality of life. Current fall-prevention strategies include the use of pharmacological and non-pharmacological therapies. To increase the accessibility of this vulnerable population, we developed a multidisciplinary telemedi...
Background: The use of telemedicine has increased to address the ongoing healthcare needs of patients with movement disorders. Objective: We aimed to describe the technical and basic security features of the most popular telemedicine videoconferencing software. Methods: We conducted a systematic review of articles/websites about “Telemedicine,” “Cy...
This paper studies the effects that combinations of balancing and feature selection techniques have on wide data (many more attributes than instances) when different classifiers are used. For this, an extensive study is done using 14 datasets, 3 balancing strategies, and 7 feature selection algorithms. The evaluation is carried out using 5 classifi...
One of the main goals of Big Data research is to find new data mining methods that are able to process large amounts of data in acceptable times. In Big Data classification, as in traditional classification, class imbalance is a common problem that must be addressed, in the case of Big Data also looking for a solution that can be applied in an acc...
Datasets are growing in size and complexity at a pace never seen before, forming ever larger datasets known as Big Data. A common problem for classification, especially in Big Data, is that the numerous examples of the different classes might not be balanced. Some decades ago, imbalanced classification was therefore introduced, to correct the tende...
The Rotation Forest classifier is a successful ensemble method for a wide variety of data mining applications. However, the way in which Rotation Forest transforms the feature space through PCA, although powerful, penalizes training and prediction times, making it unfeasible for Big Data. In this paper, a MapReduce Rotation Forest and its implement...
Background: Technological advances together with artificial intelligence and data mining techniques give health science disciplines tools that can improve the accuracy of evaluation and diagnosis. Objective: The objective of this study is to present a pilot project carried out as part of the eEarlyCare project. It involves a computer application wit...
Phytoliths can be an important source of information related to environmental and climatic change, as well as to ancient plant use by humans, particularly within the disciplines of paleoecology and archaeology. Currently, phytolith identification and categorization is performed manually by researchers, a time-consuming task liable to misclassificat...
In classification problems, the purpose of feature selection is to identify a small, highly discriminative subset of the original feature set. In many applications, the dataset may have thousands of features and only a few dozen samples (sometimes termed 'wide'). This study is a cautionary tale demonstrating why feature selection in such cases...
Over the past few decades, the remarkable prediction capabilities of ensemble methods have been used within a wide range of applications. Maximization of base-model ensemble accuracy and diversity are the keys to the heightened performance of these methods. One way to achieve diversity for training the base models is to generate artificial/syntheti...
The detection of faulty machinery and its automated diagnosis is an industrial priority because efficient fault diagnosis implies efficient management of the maintenance times, reduction of energy consumption, reduction in overall costs and, most importantly, the availability of the machinery is ensured. Thus, this paper presents a new intelligent...
The application of Industry 4.0 to the field of Health Sciences facilitates precise diagnosis and therapy determination. In particular, its effectiveness has been proven in the development of personalized therapeutic intervention programs. The objectives of this study were (1) to develop a computer application that allows the recording of the obser...
Random Balance strategy (RandBal) has been recently proposed for constructing classifier ensembles for imbalanced, two-class data sets. In RandBal, each base classifier is trained with a sample of the data with a random class prevalence, independent of the a priori distribution. Hence, for each sample, one of the classes will be undersampled while...
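The sampling step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `random_balance_sample`, the `min_per_class` parameter, and oversampling by plain random replication are all hypothetical choices made here, since the abstract is truncated before it specifies how the oversampled class is generated.

```python
import random


def random_balance_sample(X, y, rng, min_per_class=2):
    """Draw a sample of the same size as (X, y) with a random class prevalence.

    Assumption: oversampling is done by random replication, which is only a
    stand-in for whatever oversampling method the original work uses.
    """
    labels = sorted(set(y))
    if len(labels) != 2:
        raise ValueError("this sketch handles two-class data only")
    n = len(y)
    if n < 2 * min_per_class:
        raise ValueError("not enough instances for the requested minimum")

    idx_a = [i for i, lab in enumerate(y) if lab == labels[0]]
    idx_b = [i for i, lab in enumerate(y) if lab == labels[1]]

    # Random target size for the first class, independent of the prior.
    n_a = rng.randint(min_per_class, n - min_per_class)
    n_b = n - n_a

    def resize(indices, target):
        if target <= len(indices):
            # Undersample without replacement.
            return rng.sample(indices, target)
        # Oversample: keep all originals and replicate random ones.
        extra = [rng.choice(indices) for _ in range(target - len(indices))]
        return indices + extra

    chosen = resize(idx_a, n_a) + resize(idx_b, n_b)
    rng.shuffle(chosen)
    return [X[i] for i in chosen], [y[i] for i in chosen]
```

Because the total sample size is kept equal to the original size, shrinking one class below its original count necessarily grows the other above its own, which matches the undersample/oversample behaviour the abstract describes. An ensemble would call this function once per base classifier with a shared `random.Random` instance.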
A novel approach to prototype selection for multi-output regression data sets is presented. A multi-objective evolutionary algorithm is used to evaluate the selections using two criteria: training data set compression and prediction quality expressed in terms of root mean squared error. A multi-target regressor based on k-NN was used for that purpo...
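The two criteria named in this abstract, compression and root mean squared error of a k-NN multi-target regressor, can be computed for a candidate selection roughly as below. This is a sketch under assumptions: the names `knn_predict` and `evaluate_selection`, the use of a separate validation set, and pooling the squared errors over all targets before taking the root are illustrative choices, not details taken from the paper.

```python
import math


def knn_predict(train_X, train_Y, x, k=1):
    """Average the target vectors of the k nearest training instances."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )[:k]
    n_targets = len(train_Y[0])
    return [sum(train_Y[i][j] for i in order) / len(order) for j in range(n_targets)]


def evaluate_selection(mask, X, Y, val_X, val_Y, k=1):
    """Return (compression, rmse) for the instances kept by a boolean mask."""
    kept = [i for i, keep in enumerate(mask) if keep]
    if not kept:
        raise ValueError("the selection must keep at least one instance")
    sel_X = [X[i] for i in kept]
    sel_Y = [Y[i] for i in kept]

    # Fraction of the training set that was removed.
    compression = 1 - len(kept) / len(X)

    # Squared error pooled over all validation instances and all targets.
    squared_error = 0.0
    count = 0
    for x, target in zip(val_X, val_Y):
        pred = knn_predict(sel_X, sel_Y, x, k)
        squared_error += sum((p - t) ** 2 for p, t in zip(pred, target))
        count += len(target)
    return compression, math.sqrt(squared_error / count)
```

A multi-objective search would then prefer selections with higher compression and lower RMSE, keeping those that are not dominated on both criteria.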
The digital age in early care environments for ages 0–6 facilitates both the evaluation process and the intervention process. For this reason, a desktop application has been designed using widespread technologies such as JavaFX or .NET. These technologies allow data entry and processing, with only few machine requirement...
In this paper, the focus is on the application of prototype selection to multi-label data sets as a preliminary stage in the learning process. There are two general strategies when designing Machine Learning algorithms that are capable of dealing with multi-label problems: data transformation and method adaptation. These strategies have been succes...
Variscite is an aluminium phosphate mineral widely used as a gemstone in antiquity. Knowledge of the ancient trade in variscite has important implications on the historical appreciation of the commercial and migratory movements of human population. The mining complex of Gavà, which dates from the Neolithic, is one of the oldest underground mine sit...
The theoretical background to automata and formal languages represents a complex learning area for students. Computer tools for interacting with the algorithm and interfaces to visualize its different steps can assist the learning process and make it more attractive. In this paper, we present a web application for learning some of the most common a...
A natural way of handling imbalanced data is to attempt to equalise the class frequencies and train the classifier of choice on balanced data. For two-class imbalanced problems, the classification success is typically measured by the geometric mean (GM) of the true positive and true negative rates. Here we prove that GM can be improved upon by inst...
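For reference, the geometric mean mentioned in this abstract is the square root of the product of the true positive rate and the true negative rate. The short sketch below computes it from label lists; the function name `geometric_mean` and the `positive` parameter are illustrative, and returning 0 when a class is absent is an assumption made here to avoid division by zero.

```python
import math


def geometric_mean(y_true, y_pred, positive=1):
    """Geometric mean of the true positive rate and the true negative rate."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    # Assumption: a rate is taken as 0 when its class has no instances.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(tpr * tnr)
```

Unlike accuracy, this measure drops to zero for a classifier that always predicts the majority class, which is why it is commonly used for two-class imbalanced problems.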
The multi-label classification problem is an extension of traditional (single-label) classification, in which the output is a vector of values rather than a single categorical value. The multi-label problem is therefore a very different and much more challenging one than the single-label problem. Recently, multi-label classification has attracted i...
Large numbers of data streams are today generated in many fields. A key challenge when learning from such streams is the problem of concept drift. Many methods, including many prototype methods, have been proposed in recent years to address this problem. This paper presents a refined taxonomy of instance selection and generation methods for the cla...
The use of biosolids for soil improvement and for the reduction of inorganic fertilization costs has been a common practice in recent decades and is being used more and more often as inorganic fertilization cost increases. This practice is useful because it can be effective for the recovery of low fertility soils and to recycle urban and industrial...
Instance selection is a popular preprocessing task in knowledge discovery and data mining. Its purpose is to reduce the size of data sets while maintaining their predictive capabilities. The usual emerging problem at this point is that these methods quite often suffer from high computational complexity, which becomes highly inconvenient for processing huge...
Over recent decades, database sizes have grown considerably. Larger sizes present new challenges, because machine learning algorithms are not prepared to process such large volumes of information. Instance selection methods can alleviate this problem when the size of the data set is medium to large. However, even these methods face similar problems...
Machine Learning has two central processes of interest that captivate the scientific community: classification and regression. Although instance selection for classification has shown its usefulness and has been researched in depth, instance selection for regression has not followed the same path and there are few published algorithms on the subjec...
Ensembles are learning methods the operation of which relies on a combination of different base models. The diversity of ensembles is a fundamental aspect that conditions their operation. Random Feature Weights ($\mathcal{RFW}$) was proposed as a classification-tree ensemble construction method in which diversity is introduced into each tree b...
An important step in building expert and intelligent systems is to obtain the knowledge that they will use. This knowledge can be obtained from experts or, nowadays more often, from machine learning processes applied to large volumes of data. However, for some of these learning processes, if the volume of data is large, the knowledge extraction pha...
Data pre-processing is a very important aspect of data mining. In this paper we discuss instance selection used for prediction algorithms, which is one of the pre-processing approaches. The purpose of instance selection is to improve the data quality by data size reduction and noise elimination. Until recently, instance selection has been applied m...
Industrial demand for models and simulation tools that can predict dimensional errors in manufacturing processes is vigorous. One example of these processes is ball-end finishing of inclined surfaces, which is a very complex task, due to the high number of variables that may influence dimensional errors during a cutting process and their different...
Binarization techniques deal with multiclass classification problems by combining several binary classifiers. They were originally introduced for dealing with multiclass problems with methods that were only able to deal with two classes (e.g., SVM). Nevertheless, it has been shown that they can also be useful with classification methods able to deal di...
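As an illustration of combining several binary classifiers, the sketch below uses one-vs-one decomposition with majority voting. This is one common binarization scheme chosen here as an example; the abstract does not say which schemes the paper studies. The toy nearest-centroid learner `fit_centroid_binary` and the wrapper `fit_one_vs_one` are hypothetical names introduced only for this sketch.

```python
from collections import Counter
from itertools import combinations


def fit_centroid_binary(X, y):
    """Toy binary learner: predict the class whose centroid is nearest."""
    centroids = {}
    for lab in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == lab]
        centroids[lab] = tuple(sum(col) / len(rows) for col in zip(*rows))

    def predict(x):
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
        )

    return predict


def fit_one_vs_one(X, y, fit_binary=fit_centroid_binary):
    """Train one binary model per pair of classes and combine them by voting."""
    models = []
    for a, b in combinations(sorted(set(y)), 2):
        pairs = [(x, t) for x, t in zip(X, y) if t in (a, b)]
        models.append(fit_binary([x for x, _ in pairs], [t for _, t in pairs]))

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict
```

Any learner that returns a prediction function can be passed as `fit_binary`, so the same wrapper works with a two-class-only method or with a method that could handle multiple classes directly.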

