# Álvar Arnaiz-González's research while affiliated with Universidad de Burgos and other places

## Publications (36)

Article
Full-text available
Technological advances together with machine learning techniques give health science disciplines tools that can improve the accuracy of evaluation and diagnosis. The objectives of this study were: (1) to design a web application based on cloud technology (eEarlyCare-T) for creating personalized therapeutic intervention programs for children aged 0–...
Article
Full-text available
An inherent requirement of teaching using online learning platforms is that the teacher must analyze student activity and performance in relation to course learning objectives. Therefore, all e-learning environments implement a module to collect such information. Nevertheless, these raw data must be processed to perform e-learning analysis and to h...
Article
Full-text available
The prediction of multiple numeric outputs at the same time is called multi-target regression (MTR), and it has gained attention during the last decades. This task is a challenging research topic in supervised learning because it poses additional difficulties to traditional single-target regression (STR), and many real-world problems involve the pr...
Article
Full-text available
Background: Approximately 40–70% of people with Parkinson’s disease (PD) fall each year, causing decreased activity levels and quality of life. Current fall-prevention strategies include the use of pharmacological and non-pharmacological therapies. To increase accessibility for this vulnerable population, we developed a multidisciplinary telemedi...
Article
Full-text available
Background: The use of telemedicine has increased to address the ongoing healthcare needs of patients with movement disorders. Objective: We aimed to describe the technical and basic security features of the most popular telemedicine videoconferencing software. Methods: We conducted a systematic review of articles/websites about “Telemedicine,” “Cy...
Article
This paper studies the effects that combinations of balancing and feature selection techniques have on wide data (many more attributes than instances) when different classifiers are used. For this, an extensive study is done using 14 datasets, 3 balancing strategies, and 7 feature selection algorithms. The evaluation is carried out using 5 classifi...
Article
One of the main goals of Big Data research is to find new data mining methods that are able to process large amounts of data in acceptable times. In Big Data classification, as in traditional classification, class imbalance is a common problem that must be addressed, and in the case of Big Data the solution must also be one that can be applied in an acc...
Article
Datasets are growing in size and complexity at a pace never seen before, forming ever larger datasets known as Big Data. A common problem for classification, especially in Big Data, is that the numerous examples of the different classes might not be balanced. Some decades ago, imbalanced classification was therefore introduced, to correct the tende...
Article
The Rotation Forest classifier is a successful ensemble method for a wide variety of data mining applications. However, the way in which Rotation Forest transforms the feature space through PCA, although powerful, penalizes training and prediction times, making it unfeasible for Big Data. In this paper, a MapReduce Rotation Forest and its implement...
Preprint
Background: Technological advances together with artificial intelligence and data mining techniques give health science disciplines tools that can improve the accuracy of evaluation and diagnosis. Objective: The objective of this study is to present a pilot project carried out as part of the eEarlyCare project. It involves a computer application wit...
Article
Full-text available
Phytoliths can be an important source of information related to environmental and climatic change, as well as to ancient plant use by humans, particularly within the disciplines of paleoecology and archaeology. Currently, phytolith identification and categorization is performed manually by researchers, a time-consuming task liable to misclassificat...
Preprint
Full-text available
In classification problems, the purpose of feature selection is to identify a small, highly discriminative subset of the original feature set. In many applications, the dataset may have thousands of features and only a few dozen samples (sometimes termed 'wide'). This study is a cautionary tale demonstrating why feature selection in such cases...
Article
Over the past few decades, the remarkable prediction capabilities of ensemble methods have been used within a wide range of applications. Maximization of base-model ensemble accuracy and diversity are the keys to the heightened performance of these methods. One way to achieve diversity for training the base models is to generate artificial/syntheti...
Article
The detection of faulty machinery and its automated diagnosis is an industrial priority because efficient fault diagnosis implies efficient management of the maintenance times, reduction of energy consumption, reduction in overall costs and, most importantly, the availability of the machinery is ensured. Thus, this paper presents a new intelligent...
Article
Full-text available
The application of Industry 4.0 to the field of Health Sciences facilitates precise diagnosis and therapy determination. In particular, its effectiveness has been proven in the development of personalized therapeutic intervention programs. The objectives of this study were (1) to develop a computer application that allows the recording of the obser...
Article
Random Balance strategy (RandBal) has been recently proposed for constructing classifier ensembles for imbalanced, two-class data sets. In RandBal, each base classifier is trained with a sample of the data with a random class prevalence, independent of the a priori distribution. Hence, for each sample, one of the classes will be undersampled while...
Article
Full-text available
A novel approach to prototype selection for multi-output regression data sets is presented. A multi-objective evolutionary algorithm is used to evaluate the selections using two criteria: training data set compression and prediction quality expressed in terms of root mean squared error. A multi-target regressor based on k-NN was used for that purpo...
Conference Paper
Full-text available
The digital age in early care environments at ages 0–6 facilitates both the evaluation process and the intervention process. For this reason, a desktop application has been designed using widely used technologies such as JavaFX or .NET. These technologies allow data entry and processing, with only a few machine requirement...
Article
Full-text available
In this paper, the focus is on the application of prototype selection to multi-label data sets as a preliminary stage in the learning process. There are two general strategies when designing Machine Learning algorithms that are capable of dealing with multi-label problems: data transformation and method adaptation. These strategies have been succes...
Article
Full-text available
Variscite is an aluminium phosphate mineral widely used as a gemstone in antiquity. Knowledge of the ancient trade in variscite has important implications on the historical appreciation of the commercial and migratory movements of human population. The mining complex of Gavà, which dates from the Neolithic, is one of the oldest underground mine sit...
Article
The theoretical background to automata and formal languages represents a complex learning area for students. Computer tools for interacting with the algorithm and interfaces to visualize its different steps can assist the learning process and make it more attractive. In this paper, we present a web application for learning some of the most common a...
Article
Full-text available
A natural way of handling imbalanced data is to attempt to equalise the class frequencies and train the classifier of choice on balanced data. For two-class imbalanced problems, the classification success is typically measured by the geometric mean (GM) of the true positive and true negative rates. Here we prove that GM can be improved upon by inst...
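As an illustration of the measure named in the abstract above, the following minimal sketch computes the geometric mean (GM) of the true positive and true negative rates for a two-class problem. The function name `geometric_mean` and the toy labels are assumptions made for this example; they do not come from the paper, and the sketch does not reproduce the paper's result.

```python
import math


def geometric_mean(y_true, y_pred, positive=1):
    """Return sqrt(TPR * TNR) for a two-class problem (illustrative sketch)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Guard against a class that is absent from y_true.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(tpr * tnr)


# Hypothetical labels: 2 positives, 4 negatives.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
# TPR = 1/2 and TNR = 3/4, so GM = sqrt(0.375), roughly 0.612.
gm = geometric_mean(y_true, y_pred)
```

A classifier that always predicts the majority class has a true positive rate of 0 and therefore a GM of 0, however high its accuracy, which is why the measure suits imbalanced data.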
Article
Full-text available
The multi-label classification problem is an extension of traditional (single-label) classification, in which the output is a vector of values rather than a single categorical value. The multi-label problem is therefore a very different and much more challenging one than the single-label problem. Recently, multi-label classification has attracted i...
Article
Full-text available
Large numbers of data streams are today generated in many fields. A key challenge when learning from such streams is the problem of concept drift. Many methods, including many prototype methods, have been proposed in recent years to address this problem. This paper presents a refined taxonomy of instance selection and generation methods for the cla...
Article
The use of biosolids for soil improvement and for the reduction of inorganic fertilization costs has been a common practice in recent decades and is being used more and more often as inorganic fertilization cost increases. This practice is useful because it can be effective for the recovery of low fertility soils and to recycle urban and industrial...
Article
Full-text available
Instance selection is a popular preprocessing task in knowledge discovery and data mining. Its purpose is to reduce the size of data sets while maintaining their predictive capabilities. The usual emerging problem at this point is that these methods quite often suffer from high computational complexity, which becomes highly inconvenient for processing huge...
Article
Full-text available
Over recent decades, database sizes have grown considerably. Larger sizes present new challenges, because machine learning algorithms are not prepared to process such large volumes of information. Instance selection methods can alleviate this problem when the size of the data set is medium to large. However, even these methods face similar problems...
Article
Machine Learning has two central processes of interest that captivate the scientific community: classification and regression. Although instance selection for classification has shown its usefulness and has been researched in depth, instance selection for regression has not followed the same path and there are few published algorithms on the subjec...
Article
Ensembles are learning methods the operation of which relies on a combination of different base models. The diversity of ensembles is a fundamental aspect that conditions their operation. Random Feature Weights ($\mathcal{RFW}$) was proposed as a classification-tree ensemble construction method in which diversity is introduced into each tree b...
Article
An important step in building expert and intelligent systems is to obtain the knowledge that they will use. This knowledge can be obtained from experts or, nowadays more often, from machine learning processes applied to large volumes of data. However, for some of these learning processes, if the volume of data is large, the knowledge extraction pha...
Article
Data pre-processing is a very important aspect of data mining. In this paper we discuss instance selection used for prediction algorithms, which is one of the pre-processing approaches. The purpose of instance selection is to improve the data quality by data size reduction and noise elimination. Until recently, instance selection has been applied m...
Article
Full-text available
Industrial demand for models and simulation tools that can predict dimensional errors in manufacturing processes is vigorous. One example of these processes is ball-end finishing of inclined surfaces, which is a very complex task, due to the high number of variables that may influence dimensional errors during a cutting process and their different...
Conference Paper
Binarization techniques deal with the multiclass classification problem by combining several binary classifiers. They were originally introduced for dealing with multiclass problems with methods that were only able to deal with two classes (e.g., SVM). Nevertheless, it has been shown that they can also be useful with classification methods able to deal di...

## Citations

... High dimensionality can even lead to class overlapping, which makes the design of discriminative rules extremely difficult in imbalanced data scenarios [44]. Ramos-Pérez et al. analyzed the combined effects of resampling and feature selection techniques on high-dimensional, low-instance imbalanced data, also determining whether resampling should be applied before or after feature selection [45]. The contribution of feature selection to specific preterm labor prediction from imbalanced data remains unclear. ...
... The SMOTE was adopted because it is a reference algorithm to solve the class disequilibrium learning problem [63]. The SMOTE algorithm has the dynamics of generating new synthetic examples in the neighborhood of small groups of nearby instances, using the k-nearest neighbor [64]. The function was implemented from the smotefamily package. ...
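The interpolation step described in the snippet above can be sketched as follows. This is a minimal pure-Python illustration of the general SMOTE idea (a synthetic point placed on the segment between a minority instance and one of its k nearest minority neighbours); it is not the `smotefamily` implementation the citing paper used, and the function name `smote_sketch` and its parameters are hypothetical.

```python
import math
import random


def smote_sketch(minority, k=3, n_new=5, seed=0):
    """Generate n_new synthetic minority points by neighbour interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority class: the four corners of the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_sketch(minority, k=2, n_new=10, seed=1)
```

Because every synthetic point is a convex combination of two existing minority points, the new examples stay inside the region already spanned by the minority class.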
... We conduct experiments on two widely used repositories of imbalanced classification data: the KEEL Repository and the UCI Repository. Following [7,34,70], we use 43 datasets from the KEEL and UCI repositories, the main information of which is summarized in Table 2. The number of instances in these datasets ranges from 92 to 85,871. ...
... However, recent research has demonstrated effective and lightweight learning techniques that require relatively few labeled training examples for the purpose of automatic segmentation and denoising [36,37]. The ability to analyze data sets on the basis of limited training data, as often encountered in microscopy [38,39], is an important frontier in materials and data science. Unfortunately, even one labeled example oftentimes includes tedious manual annotation of thousands or even millions of pixels. ...
... Mechanical failures are the most common in IM, which derive from high load torque because of their continuous starting and stopping [5]. Broken rotor bars (BRB) are among these failures, and they are one of the most difficult to detect in early stages because of the harmonic-component mixture between the nominal power frequency and the characteristic fault frequencies [6–11]. In addition, IM with this faulty condition work in apparent normality, which might cause further rotor bar ruptures, and in the worst scenario, the total collapse of the machine [12–14], provoking unscheduled shutdowns and rising production costs critically. ...
... In addition, using these techniques will improve the cost-effectiveness of both personal and material resources, and will improve the prognosis in various pathologies. In recent years, these technologies have been used via cloud-based applications which make them much easier for early care professionals to use [17,18]. An example of this procedure of working is given in Figure 1. ...
... Experiments were carried out on 34 benchmark small and medium sized regression data sets that are commonly used in some papers on regression [27]–[29]. Table 1 summarizes the main characteristics of these data sets, which were taken from the following sources: 1) Torgo repository (https://www.dcc.fc.up.pt/~ltorgo/ ...
... In addition, the numbers of samples belonging to different categories are imbalanced (see Supplementary Table S1 and Supplementary Table S2). According to these features, our problem can be characterized as a multi-class imbalanced classification problem (Rodriguez et al. 2020; Sleeman IV and Krawczyk 2021; Thabtah et al. 2020). ...
... Therefore, implementing the F vib as an important supplementary parameter [39], owing to its link with the response and damage of structures [26,28,64], is essential for safe and economical blasting [32,42,49,51,53]. This implies great interest in developing optimized multiobjective predictive models on intelligent-system platforms, where the selections are evaluated using different criteria to find an appropriate trade-off between objectives in complex models, thus offering more possibilities for decision-making problems [3,4,40]. Since multi-objective models can be developed under different approaches and assumptions, uncertainty analysis, which deals with situations involving imperfect or unknown information, is often required to evaluate robustness and accuracy [43]. ...
... A range of healthcare professionals work in this field (neonatologists, neurologists, pediatricians, physical therapists, psychiatrists, physiologists, physiotherapists, occupational therapists, speech therapists, etc.). They need computer-based tools that allow them to make effective differential diagnoses, as a good diagnosis is the beginning of a good intervention [9,15,16]. Computer-based tools, together with artificial intelligence and machine learning techniques integrated in web applications, will assist this process of personalizing interventions and thus enhance therapeutic success [9]. In addition, using these techniques will improve the cost-effectiveness of both personal and material resources, and will improve the prognosis in various pathologies. ...