Alberto Cano

Virginia Commonwealth University | VCU · Department of Computer Science

Ph.D. in Computer Science

About

101 Publications
167,679 Reads
2,256 Citations
Citations since 2016: 60 research items, 2,105 citations
Introduction
Alberto Cano is an Associate Professor in the Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, United States, where he heads the High-Performance Data Mining laboratory. His research is focused on machine learning, big data, data streams, concept drift, continual learning, GPUs and distributed computing. He is the Faculty Director of the High Performance Research Computing Core Facility at VCU.
Additional affiliations
July 2021 - present
Virginia Commonwealth University
Position
  • Associate Professor
August 2015 - June 2021
Virginia Commonwealth University
Position
  • Assistant Professor
July 2010 - July 2015
University of Cordoba (Spain)
Position
  • PhD in Computer Science
Education
August 2011 - February 2014
University of Granada
Field of study
  • Computer Science

Publications

Publications (101)
Article
Full-text available
General-purpose computation using Graphics Processing Units (GPUs) is a well-established research area focusing on high-performance computing solutions for massively parallelizable and time-consuming problems. Classical machine learning and data mining methodologies cannot handle the processing of massive, high-speed volumes of information in the...
Article
Full-text available
Learning from data streams in the presence of concept drift is among the biggest challenges of contemporary machine learning. Algorithms designed for such scenarios must take into account the potentially unbounded size of data, its constantly changing nature, and the requirement for real-time processing. Ensemble approaches for data stream minin...
Article
Full-text available
Data streams are potentially unbounded sequences of instances arriving over time to a classifier. Designing algorithms that are capable of dealing with massive, rapidly arriving information is one of the most dynamically developing areas of machine learning. Such learners must be able to deal with a phenomenon known as concept drift, where the data...
Preprint
Full-text available
Class imbalance poses new challenges when it comes to classifying data streams. Many algorithms recently proposed in the literature tackle this problem using a variety of data-level, algorithm-level, and ensemble approaches. However, there is a lack of standardized and agreed-upon procedures on how to evaluate these algorithms. This work presents a...
Article
Full-text available
Multi-label data streams are sequences of multi-label instances arriving over time to a multi-label classifier. The properties of the stream may continuously change due to concept drift. Therefore, algorithms must constantly adapt to the new data distributions. In this paper we propose a novel ensemble method for multi-label drifting streams named...
Article
Full-text available
A wide range of applications based on sequential data, known as time series, has become increasingly popular in recent years, especially those based on the Internet of Things (IoT). Several machine learning algorithms exploit the patterns extracted from sequential data to support multiple tasks. However, this data can suffer from unreliable rea...
Article
Full-text available
The pH level of oceans has been extensively monitored and studied to ensure that aquatic ecosystems are thriving. However, the pH level of other large bodies of water, such as rivers, has largely been overlooked. Many rivers contain very sensitive underwater ecosystems, and as a result even small pH changes can greatly impact the relative biodive...
Article
Many research areas depend on group anomaly detection, which can help maintain the security and privacy of the data involved. This research addresses a gap in the existing outlier detection literature by proposing a novel hybrid framework for group anomaly detection in sequence data...
Article
Full-text available
This paper investigates the semantic modeling of smart cities and proposes two ontology matching frameworks, called Clustering for Ontology Matching-based Instances (COMI) and Pattern mining for Ontology Matching-based Instances (POMI). The goal is to discover the relevant knowledge by investigating the correlations among smart city data based on c...
Article
Full-text available
Drifting data streams and multi-label data are both challenging problems. Multi-label instances may simultaneously be associated with many labels and classifiers must predict the complete set of labels. Learning from data streams requires algorithms able to learn from potentially unbounded data that is constantly changing. When multi-label data arr...
Chapter
Classification of imbalanced data is one of the most challenging aspects of machine learning. Despite over two decades of progress, there is still a need for new techniques capable of overcoming the numerous difficulties embedded in the nature of imbalanced datasets. In this paper, we propose Locally Linear Support Vector Machines (LL-SVMs) for eff...
Article
Full-text available
This paper explores the joint use of decomposition methods and parallel computing for solving constraint satisfaction problems and introduces a framework called Parallel Decomposition for Constraint Satisfaction Problems (PD-CSP). The main idea is that the set of constraints is first clustered using a decomposition algorithm in which highly correl...
Article
Full-text available
This paper explores five pattern mining problems and proposes a new distributed framework called DT-DPM: Decomposition Transaction for Distributed Pattern Mining. DT-DPM addresses the limitations of the existing pattern mining problems by reducing the enumeration search space. Thus, it derives the relevant patterns by studying the different correla...
Article
This paper addresses the taxi fraud problem and introduces a new solution to identify trajectory outliers. The presented approach identifies both individual and group outliers and is based on a two-phase algorithm. The first phase determines the individual trajectory outliers by computing the distance of each point in each trajecto...
Article
Full-text available
Cyber-epidemics, the widespread dissemination of fake news or propaganda through social media, can cause devastating economic and political consequences. A common countermeasure against cyber-epidemics is to disable a small subset of suspected social connections or accounts to effectively contain the epidemics. An example is the recent shutdown of 125,000 ISIS-r...
Article
Full-text available
Multi-label learning is a challenging task demanding scalable methods for large-scale data. Feature selection has been shown to improve multi-label accuracy while countering the curse of dimensionality of high-dimensional scattered data. However, the increasing complexity of multi-label feature selection, especially on continuous features, requires new app...
Article
Detecting abnormal trajectories is an important task in research and industrial applications, which has attracted considerable attention in recent decades. This work studies the existing trajectory outlier detection algorithms in different industrial domains and applications, including maritime, smart urban transportation, video surveillance, and c...
Article
Full-text available
Computational prediction of ion channels facilitates the identification of putative ion channels from protein sequences. Several predictors of ion channels and their types have been developed over the last fifteen years. While they offer reasonably accurate predictions, they also suffer a few shortcomings including lack of availability, parallel predictio...
Article
Full-text available
This paper addresses the hashtag recommendation problem using high average-utility pattern mining. We introduce a novel framework called PM-HRec (Pattern Mining for Hashtag Recommendation). It consists of two main stages. First, offline processing transforms the corpus of tweets into a transactional database considering the temporal information of...
Article
Full-text available
Hashtags are an iconic feature for retrieving the hot topics of discussion on Twitter and other social networks. This paper incorporates pattern mining approaches to improve the accuracy of retrieving relevant information and to speed up search. A novel algorithm called PM-HR (Pattern Mining for Hashtag Retrieval) is designed to f...
Article
Full-text available
Multi-label learning generalizes traditional learning by allowing an instance to belong to multiple labels simultaneously. This causes multi-label data to be characterized by its large label space dimensionality and the dependencies among labels. These challenges have been addressed by feature selection techniques which improve the final model accu...
Conference Paper
Full-text available
Learning from data streams is one of the most promising and challenging domains in modern machine learning. Proliferating online data sources provide us access to real-time knowledge we have never had before. At the same time, new obstacles emerge and we have to overcome them in order to fully and effectively utilize the potential of the data. Proh...
Poster
BACKGROUND A wider view of the interaction between different omic-domains is needed to identify potential biomarkers of low- and high-grade gliomas. Using an interactomic approach, we analyzed the correlation between radiological data, IDH mutation, gene expression profiling and metabolic signature in glioma samples. MATERIAL AND METHODS Tumor bio...
Article
Full-text available
In multi-label learning, data may simultaneously belong to more than one class. When multi-label data arrives as a stream, the challenges associated with multi-label learning are joined by those of data stream mining, including the need for algorithms that are fast and flexible, able to match both the speed and evolving nature of the stream. This p...
Article
Full-text available
Multi-label classification is one of the most dynamically growing fields of machine learning, due to its numerous real-life applications in solving problems that can be described by multiple labels at the same time. While most works in this field focus on proposing novel and accurate classification algorithms, the issue of the computational comp...
Conference Paper
Full-text available
Learning from data streams is among the most vital contemporary fields in machine learning and data mining. Streams pose new challenges to learning systems, due to their volume and velocity, as well as their ever-changing nature caused by concept drift. The vast majority of works for data streams assume a fully supervised learning scenario, having an unrestr...
Conference Paper
Full-text available
Multi-label classification has attracted increasing attention of the scientific community in recent years, given its ability to solve problems where each of the examples simultaneously belongs to multiple labels. From all the techniques developed to solve multi-label classification problems, Classifier Chains has been demonstrated to be one of the...
Chapter
Full-text available
Apache Spark has become a popular framework for distributed machine learning and data mining. However, it lacks support for operating with Attribute-Relation File Format (ARFF) files in a native, convenient, transparent, efficient, and distributed way. Moreover, Spark does not support advanced learning paradigms represented in the ARFF definition i...
Article
Full-text available
Early warning systems have been progressively implemented in higher education institutions to predict student performance. However, they usually fail at effectively integrating the many information sources available at universities to make more accurate and timely predictions, they often lack decision-making reasoning to motivate the reasons behind...
Article
Full-text available
Designing efficient algorithms for mining massive high-speed data streams has become one of the contemporary challenges for the machine learning community. Such models must display the highest possible accuracy and the ability to swiftly adapt to any kind of change, while at the same time being characterized by low time and memory complexities. However, l...
Article
Full-text available
This paper reviews the use of outlier detection approaches in urban traffic analysis. We divide existing solutions into two main categories: flow outlier detection and trajectory outlier detection. The first category groups solutions that detect flow outliers and includes statistical, similarity, and pattern mining approaches. The second category c...
Article
Full-text available
Outlier detection is an extensive research area that has been intensively studied in several domains such as biological sciences, medical diagnosis, surveillance, and traffic anomaly detection. This paper explores advances in the outlier detection area by finding anomalies in spatio-temporal urban traffic flow. It proposes a new approach by consider...
Conference Paper
Full-text available
Learning from multi-label data streams is a highly challenging task involving drifts in features and labels. Classifiers must automatically adapt to changes while maintaining competitive accuracy in a real-time dynamic environment where the frequencies of the labelsets are non-stationary and highly imbalanced. This paper presents a multi-label k Nearest Neighbor (...
Article
Full-text available
Mining data streams is among the most vital contemporary topics in machine learning. Such a scenario requires adaptive algorithms that are able to process constantly arriving instances, adapt to potential changes in data, use limited computational resources, and be robust to any atypical events that may appear. Ensemble learning has proven itself...
Article
Full-text available
This paper considers discovering frequent itemsets in transactional databases and addresses the time complexity problem by using high performance computing (HPC). Three HPC versions of the Single Scan (SS) algorithm are proposed. The first one (GSS) implements SS on a GPU (Graphics Processing Unit) architecture using an efficient mapping between th...
Conference Paper
Full-text available
Learning from imbalanced data is a challenge that the machine learning community has faced over recent decades, due to its ever-growing presence in real-life problems. While a significant number of works address the issue of handling binary skewed datasets, its multi-class counterpart has not received as much attention. This problem is m...
Conference Paper
Full-text available
High-speed data streams are potentially infinite sequences of rapidly arriving instances that may be subject to the concept drift phenomenon. Hence, dedicated learning algorithms must be able to update themselves with new data and provide accurate predictions in a limited amount of time. This requirement was considered prohibitive for using evolut...
Article
Full-text available
Due to the ever-growing nature of dataset sizes, the need for scalable and accurate machine learning algorithms has become evident. Stochastic gradient descent methods are popular tools used to optimize large-scale learning problems because of their generalization performance, simplicity, and scalability. This paper proposes a novel stochastic, als...
Article
Full-text available
Markerless Motion Capture is the problem of determining the pose of a person from images captured by one or several cameras simultaneously without using markers on the subject. Evaluation of the solutions is frequently the most time-consuming task, making most of the proposed methods inapplicable in real-time scenarios. This paper presents an effic...
Article
Full-text available
Locally weighted regression adapts the regression model to the data near a query example. In this paper, a locally weighted regression method for the multi-target regression problem is proposed. A novel way of weighting data based on a data gravitation-based approach is presented. The process of weighting data does not need to decompose...
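The idea of locally weighted regression can be illustrated with a minimal sketch. It assumes a Gaussian kernel for the weights and a single input feature for brevity; it is not the data gravitation-based weighting proposed in the paper, and the function name `lwr_predict` and the bandwidth `tau` are hypothetical. The same neighborhood weights are shared by every target, so the multi-target problem is not split into separately weighted single-target problems.

```python
import math
from typing import Sequence, Tuple


def lwr_predict(xs: Sequence[float], ys: Sequence[Sequence[float]],
                query: float, tau: float = 1.0) -> Tuple[float, ...]:
    """Locally weighted linear regression with several targets (sketch).

    xs: one input value per example (single feature, for brevity).
    ys: one tuple of target values per example (multi-target).
    Gaussian kernel weighting is an illustrative assumption, not the
    data gravitation-based weighting described in the paper.
    """
    # Weight each training example by its closeness to the query point.
    w = [math.exp(-((x - query) ** 2) / (2.0 * tau ** 2)) for x in xs]
    sw = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, xs)) / sw
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, xs))
    preds = []
    for j in range(len(ys[0])):
        # Weighted least-squares line for target j, reusing the shared weights.
        col = [y[j] for y in ys]
        y_bar = sum(wi * y for wi, y in zip(w, col)) / sw
        sxy = sum(wi * (x - x_bar) * (y - y_bar) for wi, x, y in zip(w, xs, col))
        slope = sxy / sxx
        preds.append(y_bar + slope * (query - x_bar))
    return tuple(preds)
```

For example, with `xs = [0.0, 1.0, ..., 10.0]` and two exactly linear targets `(2x + 1, -x + 3)`, `lwr_predict(xs, ys, 4.5)` recovers both lines and returns approximately `(10.0, -1.5)`; on nonlinear data, a smaller `tau` makes the fit more local.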
Article
Full-text available
Multi-target regression is a challenging task that consists of creating predictive models for problems with multiple continuous target outputs. Despite the increasing attention to multi-label classification, there are fewer studies concerning multi-target (MT) regression. The current leading MT models are based on ensembles of regressor chains, whe...
Article
Full-text available
Large-scale optimization is an active research area in which many algorithms, benchmark functions, and competitions have been proposed to date. However, extremely high-dimensional optimization problems comprising millions of variables demand new approaches that are effective in solution quality and efficient in time. Memetic algorithms are pop...
Article
Full-text available
Multi-view learning combines data from multiple heterogeneous sources and employs their complementary information to build more accurate models. Multi-instance learning represents examples as labeled bags containing sets of instances. Data fusion of different multi-instance views cannot be simply concatenated into a single set of features due to th...
Conference Paper
Full-text available
Multi-label learning is a challenging problem that has received growing attention in the research community in recent years. Hence, there is a growing demand for effective and scalable multi-label learning methods for larger datasets, both in terms of number of instances and number of output labels. The use of ensemble classifiers is a popular...
Conference Paper
Full-text available
Twitter has become one of the most dynamically developing areas of social media. Due to the concise nature of messages, rapid publication and high outreach, people share more and more of their opinions, thoughts and commentaries using this medium. Sentiment analysis is a specific subsection of natural language processing that concentrates on automatically...
Article
Full-text available
Feature extraction transforms high-dimensional data into a new subspace of lower dimensionality while preserving classification accuracy. Traditional algorithms do not consider the multi-objective nature of this task. Data transformations should improve the classification performance on the new subspace, as well as facilitate data visualization...
Article
Full-text available
The growing interest in data storage has caused data sizes to increase exponentially, hampering the process of knowledge discovery from these large volumes of high-dimensional and heterogeneous data. In recent years, many efficient algorithms for mining data associations have been proposed to address time and main memory requirements. Neverthe...
Conference Paper
Full-text available
Currently, many industrial and scientific problems involve a large number of decision variables. Classic metaheuristics have shown excellent search abilities on bounded problems, but they often lose their efficacy when applied to large ones. This is known as the curse of dimensionality. Added to this issue is the simple fact that the solut...
Article
Full-text available
Cellular automata are mathematical models of dynamic systems that evolve in discrete steps. This paper presents an application of cellular automata for encrypting information and sharing secrets. A didactic example of their application to encrypting secrets using digital images is shown. This work has served as an academic activity aimed at stu...
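The notion of a cellular automaton evolving in discrete steps, and one way such evolution can drive encryption, can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's image-based secret-sharing scheme: it uses an elementary (one-dimensional, two-state) automaton with Wolfram's Rule 30, takes the center cell at each step as a keystream bit, and XORs that keystream with the message. The seed row plays the role of the key. The function names are hypothetical, and the scheme is not cryptographically secure.

```python
from typing import List


def ca_step(cells: List[int], rule: int = 30) -> List[int]:
    """One synchronous update of an elementary cellular automaton.

    The row wraps around at both ends. The rule number (0-255) encodes the
    new state for each of the 8 possible (left, center, right) neighborhoods.
    """
    n = len(cells)
    nxt = []
    for i in range(n):
        left, center, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        pattern = (left << 2) | (center << 1) | right  # neighborhood as 0..7
        nxt.append((rule >> pattern) & 1)
    return nxt


def keystream(seed: List[int], nbytes: int, rule: int = 30) -> bytes:
    """Collect the center cell over successive steps, 8 steps per byte."""
    cells = list(seed)
    mid = len(cells) // 2
    out = bytearray()
    for _ in range(nbytes):
        byte = 0
        for _ in range(8):
            cells = ca_step(cells, rule)
            byte = (byte << 1) | cells[mid]
        out.append(byte)
    return bytes(out)


def xor_crypt(data: bytes, seed: List[int]) -> bytes:
    """Encrypt or decrypt (XOR is its own inverse) with the CA keystream."""
    ks = keystream(seed, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))
```

With a 31-cell seed containing a single live center cell, `xor_crypt(b"secret", seed)` yields scrambled bytes, and applying `xor_crypt` again with the same seed restores `b"secret"`, because both sides regenerate the identical keystream from the shared initial configuration.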
Conference Paper
Full-text available
Processing data quickly and efficiently is an important functionality in machine learning, especially given the growing interest in data storage. This exponential increase in data size has hampered traditional techniques for data analysis and data processing, giving rise to a new set of methodologies under the term Big Data. Many efficient algo...
Article
Full-text available
Association rule mining is one of the most common data mining techniques used to identify and describe interesting relationships between patterns from large datasets, the frequency of an association being defined as the number of transactions that it satisfies. In situations where each transaction includes an undetermined number of instances (custo...
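The abstract defines the frequency of an association as the number of transactions that it satisfies. A minimal sketch of that definition for ordinary single-instance transactions follows, together with the standard confidence measure derived from it. The multi-instance setting the paper targets, where a transaction holds an undetermined number of instances, is not modeled here, and the function names are hypothetical.

```python
from typing import Iterable, List, Set


def support_count(itemset: Iterable[str], transactions: List[Set[str]]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[Set[str]]) -> float:
    """Share of transactions with the antecedent that also hold the consequent."""
    ante = set(antecedent)
    rule_items = ante | set(consequent)
    ante_count = support_count(ante, transactions)
    if ante_count == 0:
        return 0.0
    return support_count(rule_items, transactions) / ante_count
```

For the transactions `{bread, milk}`, `{bread, butter}`, `{bread, milk, butter}`, and `{milk}`, the association {bread, milk} is satisfied by two transactions, and the rule bread → milk has confidence 2/3, since two of the three transactions containing bread also contain milk.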