Ya Ju Fan

  • PhD
  • Researcher at Lawrence Livermore National Laboratory

About

26 Publications
6,884 Reads
569 Citations
Current institution
Lawrence Livermore National Laboratory
Current position
  • Researcher
Additional affiliations
November 2010 - present
Lawrence Livermore National Laboratory
Position
  • PostDoc Position
Description
  • Nonlinear dimension reduction, anomaly detection for multiple sensor streams, and pattern recognition for wind generation.
August 2006 - June 2010
Rutgers, The State University of New Jersey

Publications

Article
Full-text available
Neural network (NN) models have the potential to speed up the drug discovery process and reduce its failure rates. The success of NN models requires uncertainty quantification (UQ) as drug discovery explores chemical space beyond the training data distribution. Standard NN models do not provide uncertainty information. Some methods require changing...
Preprint
Full-text available
Neural network (NN) models have the potential to speed up the drug discovery process and reduce its failure rates. The success of NN models requires uncertainty quantification (UQ) as drug discovery explores chemical space beyond the training data distribution. Standard NN models do not provide uncertainty information. Methods that combine Bayesian m...
Article
Full-text available
To enable personalized cancer treatment, machine learning models have been developed to predict drug response as a function of tumor and drug features. However, most algorithm development efforts have relied on cross-validation within a single study to assess model accuracy. While an essential first step, cross-validation within a biological data s...
Preprint
Full-text available
To enable personalized cancer treatment, machine learning models have been developed to predict drug response as a function of tumor and drug features. However, most algorithm development efforts have relied on cross-validation within a single study to assess model accuracy. While an essential first step, cross-validation within a biological data s...
Preprint
Full-text available
Gene expression profiles have been widely used to characterize patterns of cellular responses to diseases. As more data becomes available, scalable learning toolkits become essential for processing large datasets with deep learning models that capture complex biological processes. We present an autoencoder to capture nonlinear relationships recovered from g...
Article
Full-text available
In recent years, there has been increasing interest in mining massive data sets whose sizes are measured in terabytes. However, in some problems collecting even a single data point is very expensive, resulting in data sets with only tens or hundreds of samples. One such problem is that of building code surrogates, where a comp...
Article
Full-text available
The autoencoder is an artificial neural network model that learns hidden representations of unlabeled data. With a linear transfer function, it is similar to principal component analysis (PCA). While both methods use weight vectors for linear transformations, the autoencoder does not come with any indication similar to the eigenvalues in PCA tha...
Article
Full-text available
The move toward exascale computing for scientific simulations is placing new demands on compression techniques. The I/O system is not expected to be able to support the volume of data that will be written out. To enable quantitative analysis and scientific discovery, we are interested in techniques that compress high-dimensional...
Article
Ramp events, which are significant changes in wind generation over a short interval, make it difficult to schedule wind energy on the power grid. Predicting the occurrences of these events can help control room operators ensure that the load and generation on the power grid are in balance at all times. In this paper, we focus on predicting up-ramp...
Article
In this paper, we formulate the problem of predicting wind generation as one of streaming data analysis. We want to understand whether it is possible to use the weather data in a time window just before the current time to gain insight into how the wind generation might behave in a time interval just after the current time. Specifically, we use a singul...
Article
Full-text available
Wind energy is scheduled on the power grid using 0–6 h ahead forecasts generated from computer simulations or historical data. When the forecasts are inaccurate, control room operators use their expertise, as well as the actual generation from previous days, to estimate the amount of energy to schedule. However, this is a challenge, and it would be...
Chapter
Data mining is the process of uncovering patterns, associations, anomalies, and statistically significant structures and events in data. It borrows and builds on ideas from many disciplines, ranging from statistics to machine learning, mathematical optimization, and signal and image processing. Data mining techniques are becoming an integral part o...
Conference Paper
Wind energy is scheduled on the power grid using 0-6 hour ahead forecasts generated from computer simulations or historical data. When the forecasts are inaccurate, control room operators use their expertise, as well as the actual generation from previous days, to estimate the amount of energy to schedule. However, this is a challenge, and it would...
Article
As renewable resources, such as wind, start providing an increasingly large percentage of our energy needs, we need to improve our understanding of these resources so we can manage them better. The intermittent nature of the power generation makes it challenging for control room operators to schedule wind energy while balancing the load on the gr...
Article
Full-text available
Many dimension reduction methods have been proposed to discover the intrinsic, lower dimensional structure of a high-dimensional dataset. However, determining critical features in datasets that consist of a large number of features is still a challenge. In this paper, through a series of carefully designed experiments on real-world datasets, we inv...
Article
Full-text available
There is an urgent need for a quick screening process that could help neurologists diagnose and determine whether a patient is epileptic versus simply demonstrating symptoms linked to epilepsy but actually stemming from a different illness. An inaccurate diagnosis could have fatal consequences, particularly in operating rooms and intensive care uni...
Article
Full-text available
In this paper, we propose a new optimization framework for improving feature selection in medical data classification. We call this framework Support Feature Machine (SFM). In feature selection, SFM is used to find the optimal group of features that show strong separability between two classes. The separability is measured in terms of inter-cl...
Article
Full-text available
Identifying abnormalities or anomalies by visual inspection of neurophysiologic signals such as electroencephalograms (EEGs) is extremely challenging. We propose a novel multidimensional time series (MDTS) classification technique, called Connectivity Support Vector Machines (C-SVMs), that integrates brain connectivity networks with SVMs. To alter...
Article
Full-text available
This paper proposes a new classification technique, called support feature machine (SFM), for multidimensional time-series data. The proposed technique was applied to the classification of abnormal brain activity represented in electroencephalograms (EEGs). First, the dynamical properties of EEGs from each electrode were extracted. These dynamical...
Article
Full-text available
This chapter focuses on recent advances in mathematical programming methodologies in data mining research, a rapidly emerging interdisciplinary research area. The main focus of this review chapter lies in classification (supervised learning) and clustering (unsupervised learning), which are among the most studied data mining tasks. We...
Chapter
Contents: Deterministic Optimization Models; Support Vector Machines; Robust LP for SVM; Feature Selection with SVM; Hybrid LP Discriminant Model; MIP Discriminant Model; Multi-hyperplane Classification; Support Feature Machines; Probabilistic Optimization Models; Bayesian-Based Mathematical Program; Probabilistic Models for Classification; MIP Formulation for Anderson...
Article
Full-text available
Epilepsy is one of the most common brain disorders, but the dynamical transitions to neurological dysfunction in epilepsy are not well understood in current neuroscience research. Uncontrolled epilepsy poses a significant burden to society due to the associated healthcare costs to treat and control the unpredictable and spontaneous occurrence of seizur...
Conference Paper
Full-text available
In this study, a novel multidimensional time series classification technique, namely support feature machine (SFM), is proposed. SFM is inspired by the optimization model of the support vector machine and the nearest neighbor rule, and incorporates both the spatial and temporal properties of the multidimensional time series data. This paper also describes an appl...
