About
26 Publications
6,884 Reads
569 Citations
Publications (26)
Neural Network (NN) models provide potential to speed up the drug discovery process and reduce its failure rates. The success of NN models requires uncertainty quantification (UQ) as drug discovery explores chemical space beyond the training data distribution. Standard NN models do not provide uncertainty information. Some methods require changing...
Neural Network (NN) models provide potential to speed up the drug discovery process and reduce its failure rates. The success of NN models requires uncertainty quantification (UQ) as drug discovery explores chemical space beyond the training data distribution. Standard NN models do not provide uncertainty information. Methods that combine Bayesian m...
To enable personalized cancer treatment, machine learning models have been developed to predict drug response as a function of tumor and drug features. However, most algorithm development efforts have relied on cross-validation within a single study to assess model accuracy. While an essential first step, cross-validation within a biological data s...
To enable personalized cancer treatment, machine learning models have been developed to predict drug response as a function of tumor and drug features. However, most algorithm development efforts have relied on cross-validation within a single study to assess model accuracy. While an essential first step, cross-validation within a biological data s...
Gene expression profiles have been widely used to characterize patterns of cellular responses to diseases. As data becomes available, scalable learning toolkits become essential to processing large datasets using deep learning models to model complex biological processes. We present an autoencoder to capture nonlinear relationships recovered from g...
There has been an increasing interest in recent years in the mining of massive data sets whose sizes are measured in terabytes. However, there are some problems where collecting even a single data point is very expensive, resulting in data sets with only tens or hundreds of samples. One such problem is that of building code surrogates, where a comp...
The autoencoder is an artificial neural network model that learns hidden representations of unlabeled data. With a linear transfer function, it is similar to principal component analysis (PCA). While both methods use weight vectors for linear transformations, the autoencoder does not come with any indication similar to the eigenvalues in PCA tha...
The move toward exascale computing for scientific simulations is placing new demands on compression techniques. The I/O system is not expected to be able to support the volume of data that will be written out. To enable quantitative analysis and scientific discovery, we are interested in techniques that compress high-dimensional...
Ramp events, which are significant changes in wind generation over a short interval, make it difficult to schedule wind energy on the power grid. Predicting the occurrences of these events can help control room operators ensure that the load and generation on the power grid are in balance at all times. In this paper, we focus on predicting up-ramp...
In this paper, we formulate the problem of predicting wind generation as one of streaming data analysis. We want to understand if it is possible to use the weather data in a time window just before the current time to gain insight into how the wind generation might behave in a time interval just after the current time. Specifically, we use a singul...
Wind energy is scheduled on the power grid using 0–6 h ahead forecasts generated from computer simulations or historical data. When the forecasts are inaccurate, control room operators use their expertise, as well as the actual generation from previous days, to estimate the amount of energy to schedule. However, this is a challenge, and it would be...
Data mining is the process of uncovering patterns, associations, anomalies, and statistically significant structures and events in data. It borrows and builds on ideas from many disciplines, ranging from statistics to machine learning, mathematical optimization, and signal and image processing. Data mining techniques are becoming an integral part o...
Wind energy is scheduled on the power grid using 0-6 hour ahead forecasts generated from computer simulations or historical data. When the forecasts are inaccurate, control room operators use their expertise, as well as the actual generation from previous days, to estimate the amount of energy to schedule. However, this is a challenge, and it would...
As renewable resources, such as wind, start providing an increasingly larger percentage of our energy needs, we need to improve our understanding of these resources, so we can manage them better. The intermittent nature of the power generation makes it challenging for control room operators to schedule wind energy while balancing the load on the gr...
Many dimension reduction methods have been proposed to discover the intrinsic, lower dimensional structure of a high-dimensional dataset. However, determining critical features in datasets that consist of a large number of features is still a challenge. In this paper, through a series of carefully designed experiments on real-world datasets, we inv...
There is an urgent need for a quick screening process that could help neurologists diagnose and determine whether a patient is epileptic versus simply demonstrating symptoms linked to epilepsy but actually stemming from a different illness. An inaccurate diagnosis could have fatal consequences, particularly in operating rooms and intensive care uni...
In this paper, we propose a new optimization framework for improving feature selection in medical data classification. We call this framework Support Feature Machine (SFM). The use of SFM in feature selection is to find the optimal group of features that show strong separability between two classes. The separability is measured in terms of inter-cl...
Identifying abnormalities or anomalies by visual inspection of neurophysiologic signals, such as ElectroEncephaloGrams (EEGs), is extremely challenging. We propose a novel Multi-Dimensional Time Series (MDTS) classification technique, called Connectivity Support Vector Machines (C-SVMs), that integrates brain connectivity networks with SVMs. To alter...
This paper proposes a new classification technique, called support feature machine (SFM), for multidimensional time-series data. The proposed technique was applied to the classification of abnormal brain activity represented in electroencephalograms (EEGs). First, the dynamical properties of EEGs from each electrode were extracted. These dynamical...
This chapter is focused on recent advances in mathematical programming methodologies in data mining research, which is a rapidly emerging interdisciplinary research area. The main focus of this review chapter lies on classification (supervised learning) and clustering (unsupervised learning), which are among the most studied data mining tasks. We...
Deterministic Optimization Models
Support Vector Machines
Robust LP for SVM
Feature Selection with SVM
Hybrid LP Discriminant Model
MIP Discriminant Model
Multi-hyperplane Classification
Support Feature Machines
Probabilistic Optimization Models
Bayesian-Based Mathematical Program
Probabilistic Models for Classification
MIP Formulation for Anderson...
Epilepsy is one of the most common brain disorders, but the dynamical transitions to neurological dysfunctions of epilepsy are not well understood in current neuroscience research. Uncontrolled epilepsy poses a significant burden to society due to the healthcare costs associated with treating and controlling the unpredictable and spontaneous occurrence of seizur...
In this study, a novel multidimensional time series classification technique, namely support feature machine (SFM), is proposed. SFM is inspired by the optimization model of support vector machine and the nearest neighbor rule to incorporate both spatial and temporal properties of the multidimensional time series data. This paper also describes an appl...