Ensemble of Multi-headed Machine Learning
Architectures for Time-series Forecasting of Healthcare
Expenditures
Shruti Kaushik1,a, Abhinav Choudhury1,b, Nataraj Dasgupta2,c, Sayee Natarajan2,d,
Larry A. Pickett2,e, and Varun Dutt1,f
1 Applied Cognitive Science Laboratory, Indian Institute of Technology Mandi, Himachal
Pradesh, India 175005
2 RxDataScience, Inc., USA - 27709
a shruti_kaushik@students.iitmandi.ac.in, b abhinav_choudhury@students.iitmandi.ac.in,
c nd@rxdatascience.com, d sayee@rxdatascience.com,
e larry@rxdatascience.com, and f varun@iitmandi.ac.in
Abstract. Machine learning (ML) is increasingly being used in the healthcare
domain for time-series predictions. However, the development of multi-headed
ML architectures for multivariate time-series predictions has been less explored
in the literature concerning the healthcare domain. Multi-headed neural network
architectures work on the idea that each independent variable (input series) can
be handled by a separate neural network model (head) and the output of each of
these models (heads) can be combined before a prediction is made about a
dependent variable. In this paper, we develop three multi-headed ML
architectures and their ensemble to predict patients’ weekly average
expenditures on certain pain medications. A multi-headed long short-term
memory (LSTM) model, a multi-headed convolutional neural network long
short-term memory (CNN-LSTM) model, a multi-headed convolutional long
short-term memory (ConvLSTM) model, and an ensemble model combining
predictions of the three multi-headed models were calibrated to predict patients’
weekly average expenditures on two pain medications. The ensemble model
combined the predictions of different multi-headed models using an exponential
weight algorithm. Results revealed that the ensemble model outperformed the
multi-headed models and the multi-headed LSTM model outperformed the
multi-headed CNN-LSTM and ConvLSTM models across both pain
medications. We highlight the utility of developing multi-headed ML
architectures and their ensembles for the prediction of patient-related expenditures
in the healthcare domain.
Keywords: Time-series forecasting, long short-term memory (LSTM),
convolutional neural network (CNN), ensemble, healthcare, multivariate time-
series.
1 Introduction
Predicting expenditures on medications is a crucial problem worldwide, as such
predictions may help patients better manage their spending on healthcare [1]. They
may also help pharmaceutical companies optimize their manufacturing processes and
determine attractive pricing for their medications [2]. Machine learning (ML) offers a
wide range of techniques to predict future medicine expenditures using data on
historical expenditures as well as other healthcare variables. For example, the
literature has developed autoregressive integrated moving average (ARIMA), multi-
layer perceptron (MLP), long short-term memory (LSTM), and convolutional neural
network (CNN) models to predict future medicine expenditures and other healthcare
outcomes [1, 3-5]. Researchers have also utilized traditional approaches such as
k-nearest neighbor and support vector machine frameworks for time-series
predictions [6]. However, the non-stationary and non-linear dynamics of a time-
series pose major challenges in predicting the underlying time-series accurately [7].
The literature has shown the advantages of using LSTM and CNN models for
performing time-series predictions when the underlying time-series depicts non-linear
and/or non-stationary behavior [7]. However, prior research has not utilized the
benefits of convolutional long short-term memory (ConvLSTM) models and
convolutional neural network models combined with long short-term memory (CNN-
LSTM) models for predicting the non-linear and non-stationary time-series in
healthcare domains.
Second, multi-headed neural network architectures involving LSTM,
ConvLSTM, and CNN-LSTM models have not been investigated for predicting
time-series in healthcare domains. In these multi-headed architectures, each
independent variable (input series) is handled by a separate neural network model
(head), and the outputs of these models (heads) are combined before a
prediction is made about a dependent variable [16].
Third, prior research in the field of time-series forecasting suggests that
significant improvements in accuracy can be attained by combining predictions
from various models [8, 21]. Prior research has used weighted approaches to train
ensemble models [18, 21]. However, the combination of multi-headed neural network
architectures via ensemble models has been less explored in the literature for
predicting healthcare outcomes.
Fourth, prior research has shown the advantages of shuffling the supervised
mini-batches while training the univariate time-series LSTM models [21]. Shuffling is
helpful because it avoids the model getting trapped in local minima during training
due to the repeated presentation of data in the same order. In addition, prior research
has shown that a regularization technique called dropout helps reduce
overfitting in neural network architectures [9]. However, the effects of shuffling and
dropout have not yet been evaluated on multi-headed architectures for time-series
prediction problems.
Overall, building upon these gaps in literature, the primary objective of this
research is to evaluate multi-headed architectures involving LSTM, ConvLSTM, and
CNN-LSTM models as well as evaluate an ensemble of these multi-headed models.
The ensemble model combines the predictions of the LSTM, ConvLSTM, and CNN-
LSTM models to predict patients' expenditures on certain pain medications (pain
medications were chosen as they cut across a number of patient-related ailments).
Moreover, we also compare four different variations across different multi-headed
neural architectures: shuffle with dropout, shuffle without dropout, no-shuffle with
dropout, and no-shuffle without dropout. When shuffling is present (shuffle), smaller
supervised sets (mini-batches) containing attributes corresponding to the chosen look-
back (lag) period are created and shuffled across the time-series during network
training. However, when shuffle is not present (no-shuffle), the mini-batches are
created in the order they occur in data and inputted into the network without shuffling.
Also, when dropout is present, a certain proportion of nodes in the network is
randomly discarded during model training. When dropout is absent, no nodes are
discarded during training.
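To make the look-back and shuffle variations concrete, the sketch below builds supervised samples from a multivariate series with a fixed lag and optionally permutes them. It is a minimal NumPy sketch under our own naming (make_supervised and the toy data are illustrative, not from the paper).

```python
import numpy as np

def make_supervised(series, lag):
    """Convert a (timesteps, variables) array into supervised samples.

    X gets shape (samples, lag, variables); y is the next value of the
    dependent variable, assumed here to be the last column.
    """
    X, y = [], []
    for t in range(len(series) - lag):
        X.append(series[t:t + lag, :])   # look-back window over all variables
        y.append(series[t + lag, -1])    # next value of the dependent variable
    return np.array(X), np.array(y)

# Toy series: 100 timesteps, 21 variables, look-back (lag) of 3.
data = np.random.rand(100, 21)
X, y = make_supervised(data, lag=3)

# Shuffle variation: permute the supervised samples before training;
# the no-shuffle variation keeps X, y in their original temporal order.
idx = np.random.permutation(len(X))
X_shuf, y_shuf = X[idx], y[idx]
```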
In what follows, we first provide a brief review of related literature. Next, we
explain the methodology of applying different multi-headed architectures consisting
of LSTM, CNN-LSTM, ConvLSTM, and their ensemble for multivariate time-series
prediction of healthcare outcomes. In Section 4, we present our experimental results,
where we compare different model predictions. Finally, we conclude our paper and
provide a discussion on the implications of this research and its future scope.
2 Background
In recent years, ML algorithms (e.g., LSTM, CNN) have gained a lot of attention in
almost every domain [4, 7, 13-15]. ML neural networks can automatically learn
complex and arbitrary mappings from inputs to outputs [7]. Neural network models
like LSTMs can handle the ordering of observations (important for time-series
problems), which multi-layer perceptron and CNN models do not offer [4].
Recently, LSTMs and their variants ConvLSTM and CNN-LSTM have been
used by researchers to solve different problems across a number of domains. For
example, Zhao et al. have used LSTMs for traffic forecasting [13]. These authors
used temporal-spatial information and proposed an LSTM network for short-term
traffic forecasting. Xingjian et al. have used convolutional LSTM (ConvLSTM) to
predict rainfall intensity over a short period of time [14]. They showed that the
ConvLSTM performed better than a fully connected LSTM in forecasting rainfall
intensity.
Researchers have also developed cascades of CNN and LSTM models to
learn the upward and downward trend in a time series [15]. For example, reference
[15] showed that the CNN-LSTM (a cascade model) outperformed the standalone
CNN and LSTM models in learning the trend in a time-series.
Moreover, there have been multi-headed neural network models proposed in recent
literature, where a head (a neural network) is used for each independent variable and
outputs of each head are combined to give the final prediction for the dependent
variable [16-17]. Researchers have developed multi-headed recurrent neural networks
in certain identification tasks [16] and clustering tasks [17]. However, to the best of
our knowledge, multi-headed architectures of LSTM, ConvLSTM, and CNN-LSTM
have not yet been evaluated for multivariate time-series forecasting in the
healthcare domain. Moreover, shuffling the mini-batches while training neural
networks and adding regularization to reduce overfitting have proven to be effective
training mechanisms in the literature [9, 21].
Additionally, researchers have used ensemble techniques to further improve
the performance of the individual models [8, 18-19]. Prior research has shown that the
ensemble approach performs better than individual ML models [8]. Thus, we also
create an ensemble model using the predictions of the three multi-headed models and
compare its performance with the individual models for multivariate time-series
prediction of patients' expenditures. We expect the ensemble model to perform better
than the individual models; one likely reason is that ensemble methods tend to give
more weight to model predictions that are more accurate. Also, prior research shows
that the ConvLSTM and CNN-LSTM architectures performed better than individual
LSTM or CNN models on image datasets, where both spatial and temporal
information contribute towards future predictions [14]. In this research, we deal only
with temporal information; thus, we expect the multi-headed LSTM to perform on
par with the ConvLSTM and CNN-LSTM models. We also expect that shuffling the
supervised mini-batches and adding dropout will help obtain good predictions on the
test dataset.
Overall, the main contribution of this research is to evaluate the performance
of three multi-headed architectures, i.e., LSTM, ConvLSTM, and CNN-LSTM, to
predict the weekly average expenditure on certain medications. The second
contribution is to compare the individual multi-headed models with their ensemble
model. The third contribution is to evaluate the performance of shuffle and dropout
variations while training the multi-headed architectures.
3 Method
3.1 Data
In this paper, we selected two pain medications (named “A” and “B”) from the
Truven MarketScan dataset for our analyses [10] (to maintain privacy, the actual
names of the two pain medications have not been disclosed). These two pain medications were
among the top-ten most prescribed pain medications in the US [11]. Data for both
medications range between 2nd January 2011 and 15th April 2015 (1565 days). Every
day, on average, about 1,428 patients refilled medicine A and about 550 patients
refilled medicine B. For both medicines, we prepared a multivariate time-series
containing the daily average expenditures by patients on these medications,
respectively. We used 20 attributes for performing multivariate time-series analyses.
These attributes provide information regarding the number of patients of a particular
gender (male, female), age group (0-17, 18-34, 35-44, 45-54, and 55-65), region
(south, northeast, north central, west, and unknown), health-plan (two types of health
plans), and different diagnoses and procedure codes (six ICD-9 codes) who consumed
medicine on a particular day. These six ICD-9 codes were selected via frequent-
pattern mining using the Apriori algorithm [20]. The 21st attribute was the average
expenditure per patient for a medicine on a day \(t\), which was defined as per the
following equation:

\[ \text{Daily Average Expenditure}_t = \frac{m_t}{n_t} \tag{1} \]

where \(m_t\) was the total amount spent in a day \(t\) on the medicine across all
patients, and \(n_t\) was the total number of patients who refilled the medicine in day
\(t\). This daily average expenditure along with the 20 attributes was used to compute
the weekly average expenditure for both medicines, where the weekly average
expenditure was used to evaluate model performance.
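As an illustration of this preprocessing, the sketch below computes the daily average expenditure per patient (total spent divided by the number of refilling patients, as in Eq. 1) and then sums the daily averages over 7-day blocks. It is a minimal pandas sketch with hypothetical column names; the actual MarketScan schema differs.

```python
import pandas as pd

# One row per refill claim: date of refill and amount paid (illustrative data).
claims = pd.DataFrame({
    "date": pd.to_datetime(["2011-01-02", "2011-01-02", "2011-01-03"]),
    "amount": [55.0, 45.0, 60.0],
})

# Daily average expenditure per patient: m_t / n_t from Eq. (1).
daily = claims.groupby("date")["amount"].agg(total="sum", patients="count")
daily["avg_exp"] = daily["total"] / daily["patients"]

# Weekly blocks: sum the daily averages over non-overlapping 7-day windows.
weekly = daily["avg_exp"].resample("7D").sum()
```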
3.2 Evaluation Metrics
All the models were fit to data at a weekly level using the following metrics: Root
Mean Squared Error (RMSE; error) and R-square (R2; trend) [12]. As weekly average
expenditure predictions were of interest, the RMSE and R2 scores and visualizations
for weekly average expenditures were computed in blocks of 7 days. Thus,
the daily average expenditures per patient were summed across seven days in a block
for both training and test datasets. This resulted in the weekly average expenditure
across 186 blocks of training data and 37 blocks of test data. We calibrated all models
to reduce error and capture trend in data. Thus, all models were calibrated using an
objective function that was defined as follows: RMSE/10 + (1 − R2). (RMSE was
divided by 10 in order to bring both RMSE and 1 − R2 onto the same scale; RMSE
captures the error and R2 captures the trend.) This
objective function ensured that the obtained parameters minimized the error (RMSE)
and maximized the trend (R2) on the weekly average expenditure per patient between
model and actual data.
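A direct NumPy rendering of this objective function is sketched below; the function name is ours, and the division by 10 follows the scaling note above.

```python
import numpy as np

def calibration_objective(actual, predicted):
    """Objective minimized during calibration: RMSE/10 + (1 - R2)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rmse = np.sqrt(np.mean((actual - predicted) ** 2))   # error term
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot                           # trend term
    return rmse / 10.0 + (1.0 - r2)
```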
3.3 Experiment Design for Multi-headed LSTM
Fig. 1(a) shows the multi-headed LSTM architecture used in this paper. The first layer
across all heads is the input layer where mini-batches of each feature in data are put
into a separate head. As shown in Fig. 1(a), for training the multi-headed LSTM on a
medicine, each variable (20 independent variables and 1 dependent variable) for the
medicine was put into a separate LSTM model (head) to produce a single combined
concatenated output. The dense (output) layer contained 1 neuron which gave the
expenditure prediction about the medicine for a time-period. We used a grid search
procedure to tune the different hyper-parameters in the LSTM block in each head. The
hyper-parameters used and their range of variation in the grid search were the
following: hidden layers (1, 2, 3, and 4), number of neurons in a layer (4, 8, 16, 32,
and 64), batch size (5, 10, 15, and 20), number of epochs (8, 16, 32, 64, 128, 256, and
512), lag/look-back period (2 to 8), activation function (tanh) with optimizer (adam
or adagrad), and dropout rate (20% or 30%; a 20% dropout rate means that 20% of
the connections will be dropped randomly from one layer to the next). Additionally,
the training was done in four ways:
shuffle with dropout, shuffle without dropout, no-shuffle with dropout, and no-shuffle
without dropout. When shuffling was present, the mini-batches were shuffled
randomly across time-series for each medicine. When shuffling was absent, we did
not shuffle the mini-batches and presented these batches in sequential order to the
neural network. For dropout present conditions, we put a dropout layer after each
hidden layer in the LSTM block. For dropout absent conditions, we did not apply any
dropout layer.
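The sketch below shows how such a multi-headed LSTM could be wired up with the Keras functional API, using the best medicine-A settings reported in Section 4.1 (two 64-neuron LSTM layers with 30% dropout per head). The paper does not name its framework, so this is an assumed Keras rendering rather than the authors' exact code.

```python
from tensorflow.keras.layers import Input, LSTM, Dropout, Dense, concatenate
from tensorflow.keras.models import Model

n_vars, lag = 21, 3  # 20 independent variables + 1 dependent; look-back of 3

inputs, heads = [], []
for _ in range(n_vars):
    inp = Input(shape=(lag, 1))                  # one variable per head
    x = LSTM(64, activation="tanh", return_sequences=True)(inp)
    x = Dropout(0.3)(x)                          # dropout-present variation
    x = LSTM(64, activation="tanh")(x)
    x = Dropout(0.3)(x)
    inputs.append(inp)
    heads.append(x)

merged = concatenate(heads)                      # combine all head outputs
output = Dense(1)(merged)                        # expenditure prediction

model = Model(inputs=inputs, outputs=output)
model.compile(loss="mse", optimizer="adam")
# Training (X shaped (samples, lag, n_vars); one slice per head):
# model.fit([X[:, :, i:i + 1] for i in range(n_vars)], y,
#           epochs=64, batch_size=20, shuffle=True)
```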
Fig. 1. (a) Multi-headed LSTM and (b) Multi-headed ConvLSTM
3.4 Experiment Design for Multi-headed ConvLSTM
The multi-headed ConvLSTM architecture was trained in exactly the same manner as
the multi-headed LSTM. Fig. 1(b) shows an example of a multi-headed ConvLSTM
architecture, in which the first layer across all heads is the input layer, where mini-
batches of each feature in data are put into a separate head. Here also, each feature
variable was put into a separate model (head) to produce a single combined
concatenated output. The ConvLSTM differs from the LSTM in that, in ConvLSTM
layers, the internal matrix multiplications (present in the LSTM) are replaced with
convolution operations [14]. Convolution is a mathematical operation that is
performed on the input data with the use of a filter (a matrix) to produce a feature
map [14]. The ConvLSTM block in each head first contained the ConvLSTM layer,
in which we used 32, 64, or 128 filters with different kernel sizes (1, 3, 5, and 7). The
output of this ConvLSTM layer was passed to different fully connected or dropout
layers. At last, the outputs from all heads were concatenated to predict the
expenditure (the 21st feature) on a medicine on a day. The dense (output) layer
contained 1 neuron, which gave the expenditure prediction for the medicine for a
time-period.
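A single ConvLSTM head might look as follows in Keras. ConvLSTM2D expects 5D input, so each univariate look-back window is reshaped into a one-row "image" whose width equals the lag (a common workaround for 1D series). This is a hedged sketch under assumed settings, not the authors' exact configuration.

```python
from tensorflow.keras.layers import Input, ConvLSTM2D, Dropout, Flatten

lag = 3
# Input shape per sample: (time, rows, cols, channels) = (1, 1, lag, 1),
# i.e., the look-back window laid out as a single-row image.
inp = Input(shape=(1, 1, lag, 1))
x = ConvLSTM2D(filters=64, kernel_size=(1, 3), padding="same",
               activation="tanh")(inp)  # convolutions replace matrix products
x = Dropout(0.3)(x)
x = Flatten()(x)                         # head output, concatenated later
```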
3.5 Experiment Design for Multi-headed CNN-LSTM
The multi-headed CNN-LSTM architecture was also trained in the same way as the
other two multi-headed architectures. An example of the multi-headed CNN-LSTM
architecture is shown in Fig. 2. As shown in Fig. 2, the CNN-LSTM architecture
contained both a CNN model as well as an LSTM model in each head. The CNN
block is used for feature extraction, and it is followed by the LSTM block for the
sequence prediction of data [15]. There was a flatten layer just after the CNN block
to flatten the 3D output from the convolution layer into a 1D vector. The outputs
from each LSTM head were concatenated and passed through a dense layer, which
then produced the daily expenditures for a medicine. Different numbers of filters (32,
64, or 128) and different kernel sizes (1, 3, 5, and 7) were varied as hyper-parameters
in the CNN block. The hyper-parameters in the LSTM block were varied similarly to
those in the multi-headed LSTM model.
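One head of such a CNN-LSTM could be sketched in Keras as below, splitting the look-back window into subsequences so the CNN output can be flattened before the LSTM. The subsequence split and the kernel size of 1 are our assumptions to fit the short (lag-3) window, not the authors' exact settings.

```python
from tensorflow.keras.layers import (Input, Conv1D, Flatten, LSTM,
                                     Dropout, TimeDistributed)

n_seq, n_steps = 1, 3                     # lag-3 window as one subsequence
inp = Input(shape=(n_seq, n_steps, 1))
x = TimeDistributed(Conv1D(filters=64, kernel_size=1,
                           activation="relu"))(inp)  # CNN feature extraction
x = TimeDistributed(Flatten())(x)         # flatten conv output for the LSTM
x = LSTM(32, activation="tanh")(x)        # sequence-prediction block
x = Dropout(0.3)(x)                       # head output, concatenated later
```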
Fig. 2. Multi-headed CNN-LSTM architecture
3.6 Ensemble Model
We used the normalized exponential weighted algorithm (EWS) [8] to ensemble the
predictions of the multi-headed LSTM, ConvLSTM, and CNN-LSTM architectures.
The working of the EWS algorithm is presented in the box below. Given a set of
predictions on the training data from different models, the EWS algorithm starts with
equal weights for all predictions (i.e., a weight of 1/3 each for the multi-headed
LSTM, ConvLSTM, and CNN-LSTM model predictions) and computes the ensemble
prediction. Then, for the first training sample (i.e., the first point out of 1306 points),
the squared error between each model's prediction and the actual data is computed
(line 4). Next, the weights are updated using the squared errors of the different model
predictions (line 5), where the parameter \(\eta\) is the learning-rate parameter.
Finally, we normalize the weight obtained for each model's predictions by dividing it
by the total sum of the weights across all models (line 6). This process continues
until all the 1306 training samples are covered. We calibrated the value of the
\(\eta\) parameter by varying its values from 0.1 to 1.0 in steps of 0.01.
Normalized Exponential Weighted Algorithm
1. Input: predictions \(f_{i,t}\) of each model \(i = 1, \ldots, N\) for each training sample \(t\), and a parameter \(\eta\).
2. \(w_i^1 \leftarrow 1/N\) for \(i = 1, \ldots, N\) (set initial weights of each model i's predictions to 1/N)
3. for training samples \(t = 1, 2, \ldots\) do
4. \(\ell(f_{i,t}, y_t) = (f_{i,t} - y_t)^2\) (calculation of squared error, where \(y_t\) is the actual value)
5. \(w_i^{t+1} \leftarrow w_i^t \exp(-\eta\,\ell(f_{i,t}, y_t))\)
6. \(w_i^{t+1} \leftarrow w_i^{t+1} \big/ \sum_{j=1}^{N} w_j^{t+1}\) (normalization of weights)
4 Results
4.1 Multi-headed LSTM Model
Table 1 shows the multi-headed LSTM model’s RMSE and R2 on training and test
data for different shuffle and dropout combinations on medicines A and B (shuffle
with dropout, shuffle without dropout, no-shuffle with dropout, and no-shuffle
without dropout). As shown in Table 1, the best RMSE (= USD 310.47 per patient)
on test data was obtained for the shuffle with dropout combination for medicine A,
and this model was trained with a lag period of 3, 64 epochs, a batch size of 20, and
the tanh activation function. The architecture contained 2 hidden layers, 2 dropout
layers, and an output layer. The architecture description is as follows: first hidden
layer with 64 neurons, dropout layer with a 30% dropout rate, second hidden layer
with 64 neurons, another dropout layer with a 30% dropout rate, and finally the dense
(output) layer with 1 neuron. On medicine B, we obtained the best RMSE (= USD
38.71 per patient) on test data for the no-shuffle with dropout combination. The
corresponding LSTM architecture contained 4 hidden layers, 4 dropout layers, and a
dense layer at the end. The detailed architecture, in sequence, was: LSTM layer with
64 neurons, dropout layer with a 30% dropout rate, second LSTM layer with 32
neurons, dropout layer with a 20% dropout rate, third LSTM layer with 32 neurons,
dropout layer with a 20% dropout rate, fourth LSTM layer with 32 neurons, dropout
layer with a 20% dropout rate, and finally the dense layer with 1 neuron. This
architecture was trained with a lag period of 3, 64 epochs, a batch size of 10, and the
tanh activation function. Fig. 3 shows the LSTM model fits for medicine A (Fig. 3A)
and medicine B (Fig. 3B) in test data for the best performing model combinations. As
shown in Fig. 3, the LSTM model fits were reasonably accurate for medicine B.
Table 1. Multi-headed LSTM results during training and test

Medicine  Combinations of Shuffle and Dropout  Train RMSE  Train R2  Test RMSE  Test R2
A         Shuffle with dropout                 181.91      0.66      310.47*    0.05
A         Shuffle without dropout              145.58      0.79      320.39     0.04
A         No-shuffle with dropout              164.82      0.72      315.83     0.01
A         No-shuffle without dropout           153.84      0.75      326.70     0.01
B         Shuffle with dropout                 43.68       0.98      39.18      0.92
B         Shuffle without dropout              44.65       0.98      42.59      0.92
B         No-shuffle with dropout              44.13       0.98      38.71*     0.93
B         No-shuffle without dropout           43.29       0.98      39.21      0.92

Note. * marks the variation with the lowest RMSE on test data for each medicine.
Fig. 3. Average expenditure (in USD per patient) from the multi-headed LSTM model for
medicine A (A) and for medicine B (B) in test data
4.2 Multi-headed ConvLSTM Model
Table 2 shows the multi-headed ConvLSTM model’s RMSE and R2 on training and
test data for different shuffle and dropout combinations on medicines A and B
(shuffle with dropout, shuffle without dropout, no-shuffle with dropout, and no-
shuffle without dropout). As shown in Table 2, the best RMSE (= USD 326.89 per
patient) on test data was obtained for the no-shuffle with dropout combination for
medicine A, and this model was trained with a lag period of 3, 128 epochs, a batch
size of 20, and the tanh activation function. The model contained 2 hidden layers, 2
dropout layers, and one dense layer in the following sequence: first ConvLSTM
hidden layer with 128 neurons, dropout layer with a 30% dropout rate, second
ConvLSTM layer with 64 neurons, dropout layer with a 30% dropout rate, and at last
the dense layer with 1 neuron. On medicine B, we obtained the best RMSE (= USD
41.02 per patient) on test data for the shuffle with dropout combination. The
corresponding model was trained with a lag period of 3, 64 epochs, a batch size of
15, and the tanh activation function. The architecture possessed 2 hidden layers, 1
dropout layer, and a dense layer in the following sequence: first hidden layer with 32
neurons, second hidden layer with 32 neurons, dropout layer with a 30% dropout
rate, and the dense layer at last with 1 neuron. Fig. 4 shows the ConvLSTM model
fits for medicine A (Fig. 4A) and medicine B (Fig. 4B) in test data for the best
performing model combinations. As shown in Fig. 4, the ConvLSTM model fits were
reasonably accurate for medicine B.
Table 2. Multi-headed ConvLSTM results during training and test

Medicine  Combinations of Shuffle and Dropout  Train RMSE  Train R2  Test RMSE  Test R2
A         Shuffle with dropout                 130.65      0.86      330.19     0.04
A         Shuffle without dropout              121.75      0.87      344.79     0.04
A         No-shuffle with dropout              124.71      0.85      326.89*    0.02
A         No-shuffle without dropout           124.53      0.86      331.33     0.04
B         Shuffle with dropout                 46.91       0.98      41.02*     0.91
B         Shuffle without dropout              45.46       0.97      41.96      0.91
B         No-shuffle with dropout              51.19       0.98      44.63      0.90
B         No-shuffle without dropout           47.77       0.98      47.07      0.90

Note. * marks the variation with the lowest RMSE on test data for each medicine.
Fig. 4. Average expenditure (in USD per patient) from the multi-headed ConvLSTM model for
medicine A (A) and for medicine B (B) in test data
4.3 Multi-headed CNN-LSTM Model
Table 3 shows the multi-headed CNN-LSTM model’s RMSE and R2 on training and
test data for different shuffle and dropout combinations on medicines A and B
(shuffle with dropout, shuffle without dropout, no-shuffle with dropout, and no-
shuffle without dropout). As shown in Table 3, the best RMSE (= USD 329.35 per
patient) on test data was obtained for the no-shuffle without dropout combination for
medicine A, and this architecture contained a 1D convolutional layer followed by
one LSTM layer and a dense layer. The model comprised a 1D convolution layer
with 32 filters of kernel size 5, followed by an LSTM layer with 64 neurons, and a dense
(output) layer having 1 neuron. This architecture was trained with a lag period of 3,
64 epochs, a batch size of 15, and the tanh activation function. On medicine B, we
obtained the best RMSE (= USD 44.23 per patient) on test data for the shuffle with
dropout combination. The corresponding CNN-LSTM model possessed a 1D
convolution layer followed by 2 LSTM layers, 2 dropout layers, and finally the dense
layer. The architecture contained a 1D convolution layer having 64 filters with
kernel size 5, followed by an LSTM layer with 32 neurons, a dropout layer with a
30% dropout rate, another LSTM layer with 32 neurons, a dropout layer with a 30%
dropout rate, and the dense (output) layer at last. This architecture was trained with a
lag period of 3, 64 epochs, a batch size of 20, and the tanh activation function. Fig. 5
shows the CNN-LSTM model fits for medicine A (Fig. 5A) and medicine B (Fig. 5B)
in test data for the best performing model combinations. As shown in Fig. 5, the
CNN-LSTM model fits were reasonably accurate for medicine B.
Table 3. Multi-headed CNN-LSTM results during training and test

Medicine  Combinations of Shuffle and Dropout  Train RMSE  Train R2  Test RMSE  Test R2
A         Shuffle with dropout                 135.35      0.82      336.93     0.03
A         Shuffle without dropout              120.89      0.87      364.27     0.09
A         No-shuffle with dropout              134.06      0.82      347.43     0.04
A         No-shuffle without dropout           127.39      0.84      329.35*    0.09
B         Shuffle with dropout                 44.98       0.98      44.23*     0.90
B         Shuffle without dropout              53.47       0.97      46.62      0.91
B         No-shuffle with dropout              49.15       0.98      49.49      0.88
B         No-shuffle without dropout           52.66       0.97      71.50      0.78

Note. * marks the variation with the lowest RMSE on test data for each medicine.
Fig. 5. Average expenditure (in USD per patient) from the multi-headed CNN-LSTM model
for medicine A (A) and for medicine B (B) in test data
4.4 Ensemble Model
Table 4 shows the ensemble model's RMSE and R2 on training and test data for both
medicines. These results were obtained using the best predictions (the results marked
with * in Tables 1, 2, and 3) from each of the LSTM, ConvLSTM,
and CNN-LSTM models. The ensemble results on medicine A were obtained with
the following weights: 0.33 for LSTM, 0.33 for ConvLSTM, and 0.33 for CNN-
LSTM, with \(\eta = 0.01\) as the learning-rate parameter from the normalized
exponential weighted algorithm. On medicine B, the ensemble results were obtained with the
following weights: 0.95 for LSTM, 0.0 for ConvLSTM, and 0.05 for CNN-LSTM.
The ensemble model weights for medicine B were obtained with \(\eta = 0.05\) using
the normalized exponential weighted algorithm. Fig. 6 shows the model fits from the
ensemble model for medicine A (Fig. 6A) and medicine B (Fig. 6B). As shown in
Table 4, the ensemble model's test RMSE improved over the stand-alone models for
both medicines.
Table 4. Ensemble model results during training and test

Ensemble Model for Medicines  Train RMSE  Train R2  Test RMSE  Test R2
A                             135.69      0.83      305.08     0.09
B                             43.67       0.98      38.62      0.93
Fig. 6. Average expenditure (in USD per patient) from the ensemble model for medicine A
(A) and for medicine B (B) in test data
5 Discussion and Conclusions
Time-series architectures have gained popularity among researchers across various
disciplines [4, 13-15]. Specifically, the recurrent architectures such as long short-term
memory (LSTM) have been utilized for time-series predictions [4, 21]. However, the
variants of LSTM such as ConvLSTM and CNN-LSTM have not been evaluated for
time-series predictions in the healthcare domain. Additionally, researchers have
mostly utilized single-headed neural network architectures to predict future time-
series [21]. However, the potential of multi-headed neural network architectures
needs to be explored for multivariate time-series predictions. In multi-headed
architectures, each head takes one variable as input, and finally the output from each
head (model) is merged to provide a single output for the variable of interest. In
addition, prior research has shown that ensemble of different architectures can
improve the overall performance [18]. Therefore, the primary objective of this
research was to evaluate the performance of multi-headed LSTM, ConvLSTM, CNN-
LSTM, and ensemble of all these three multi-headed models to predict the weekly
average expenditure by patients on two pain medications. Another objective of this
paper was to evaluate the advantages of shuffling of training data and adding of
dropouts while training the neural network models.
First, as per our expectation, we found that the best values of test RMSE and
test R2 were obtained from the ensemble model for both medications. Second, we
found that the multi-headed LSTM performed better than the multi-headed
ConvLSTM and CNN-LSTM. The performance of the multi-headed LSTM was
followed by the ConvLSTM and CNN-LSTM in terms of test RMSE and test R2. In
fact, the test RMSE and test R2 obtained by the multi-headed LSTM on medicine B
were more or less the same as those obtained by the ensemble model. The likely
reason why the multi-headed LSTM performed better than the fusion architectures,
i.e., the multi-headed ConvLSTM and CNN-LSTM, is that convolution architectures
are known for learning feature representations in images, whereas the LSTM is
mainly known for learning temporal patterns in data. In this paper, each feature was
handled by a separate head (model), so each head dealt with a single variable.
Therefore, the LSTM alone may be sufficient to learn the temporal patterns in the
data, and the presence of convolution operations did not further improve the
prediction accuracy.
Third, we found that all the models performed better on test data in the
dropout-present condition for both medicines (except the multi-headed CNN-LSTM
in the case of medicine A). A likely reason is that neural network architectures tend
to over-fit easily; the multi-headed architectures are complex models with many
weights and parameters and hence have a greater tendency to over-fit.
Fourth, we found that the multi-headed neural network models showed
mixed behaviour with respect to shuffling or not shuffling the mini-batches of input
data while training the neural networks for time-series predictions. However, in
previous research, shuffling the supervised vectors (mini-batches) was helpful for
LSTM models in univariate time-series forecasting [21]. Therefore, researchers
should experiment with how they present data to the input layer (shuffle/no-shuffle).
Overall, we believe that the multi-headed LSTM and weighted ensemble
approaches could help caregivers, patients, and pharmaceutical companies predict
per-patient expenditures by utilizing patients' demographic details and other
variables. In the future, we plan to perform long-range bi-weekly or monthly
predictions to evaluate the capacity of
multi-headed neural network architectures. Also, we plan to evaluate other networked
architectures (e.g., generative adversarial networks) and their ensembles for time-
series forecasting of healthcare expenditure data.
Acknowledgement. The project was supported by grants (awards:
#IITM/CONS/PPLP/VD/03 and # IITM/CONS/RxDSI/VD/16) to Varun Dutt.
References
1. Pham, T., Tran, T., Phung, D., & Venkatesh, S.: Deepcare: A deep dynamic memory
model for predictive medicine. In Pacific-Asia Conference on Knowledge Discovery and
Data Mining (pp. 30-41). Springer, Cham (2016).
2. Hunter, J.: Adopting AI is essential for a sustainable pharma industry. Drug Discov.
World, pp. 69-71 (2016).
3. Xing, Y., Wang, J., & Zhao, Z.: Combination data mining methods with new medical data
to predicting outcome of coronary heart disease. In 2007 International Conference on
Convergence Information Technology (ICCIT 2007) (pp. 868-872). IEEE (2007).
4. Kaushik, S., Choudhury, A., Dasgupta, N., Natarajan, S., Pickett, L. A., & Dutt, V.: Using
LSTMs for Predicting Patient's Expenditure on Medications. In 2017 International
Conference on Machine Learning and Data Science (MLDS) (pp. 120-127). IEEE (2017).
5. Feng, Y., Min, X., Chen, N., Chen, H., Xie, X., Wang, H., & Chen, T.: Patient outcome
prediction via convolutional neural networks based on multi-granularity medical concept
embedding. In 2017 IEEE International Conference on Bioinformatics and Biomedicine
(BIBM) (pp. 770-777). IEEE (2017).
6. Huang, Z., & Shyu, M. L.: Long-term time series prediction using k-NN based LS-SVM
framework with multi-value integration. In Recent Trends in Information Reuse and
Integration (pp. 191-209). Springer, Vienna (2012).
7. Gamboa, J. C. B.: Deep learning for time-series analysis. arXiv preprint arXiv:1701.01887
(2017).
8. Adhikari, R., Verma, G., & Khandelwal, I.: A model ranking based selective ensemble
approach for time series forecasting. Procedia Computer Science, 48, pp. 14-21 (2015).
9. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R.: Dropout: a
simple way to prevent neural networks from overfitting. The Journal of Machine Learning
Research, 15(1), pp. 1929-1958 (2014).
10. Danielson, E.: Health research data for the real world: the MarketScan® Databases. Ann
Arbor, MI: Truven Health Analytics (2014).
11. Scott, G. (2014). Top 10 Painkillers in the US. MD magazine. Retrieved from
https://www.mdmag.com/medical-news/top-10-painkillers-in-us.
12. Yilmaz, I., Erik, N. Y., & Kaynar, O.: Different types of learning algorithms of artificial
neural network (ANN) models for prediction of gross calorific value (GCV) of coals.
Scientific Research and Essays, 5(16), pp. 2242-2249 (2010).
13. Zhao, Z., Chen, W., Wu, X., Chen, P. C., & Liu, J.: LSTM network: a deep learning
approach for short-term traffic forecast. IET Intelligent Transport Systems, 11(2), pp. 68-
75 (2017).
14. Xingjian, S. H. I., Chen, Z., Wang, H., Yeung, D. Y., Wong, W. K., & Woo, W. C.:
Convolutional LSTM network: A machine learning approach for precipitation nowcasting.
In Advances in neural information processing systems, pp. 802-810 (2015).
15. Lin, T., Guo, T., & Aberer, K.: Hybrid neural networks for learning the trend in time
series. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial
Intelligence (No. CONF, pp. 2273-2279) (2017).
16. Bagnall, D.: Author identification using multi-headed recurrent neural networks. arXiv
preprint arXiv:1506.04891 (2015).
17. Bagnall, D.: Authorship clustering using multi-headed recurrent neural networks. arXiv
preprint arXiv:1608.04485 (2016).
18. Adhikari, R., & Agrawal, R. K.: Performance evaluation of weights selection schemes for
linear combination of multiple forecasts. Artificial Intelligence Review, 42(4), pp. 529-
548 (2014).
19. Jose, V. R. R., & Winkler, R. L.: Simple robust averages of forecasts: Some empirical
results. International Journal of Forecasting, 24(1), pp. 163-169 (2008).
20. Kaushik, S., Choudhury, A., Dasgupta, N., Natarajan, S., Pickett, L. A., & Dutt, V.:
Evaluating Frequent-Set Mining Approaches in Machine-Learning Problems with Several
Attributes: A Case Study in Healthcare. In International Conference on Machine Learning
and Data Mining in Pattern Recognition, pp. 244-258. Springer, Cham (2018).
21. Kaushik, S., Choudhury, A., Sheron, P. K., Dasgupta, N., Natarajan, S., Pickett, L. A., &
Dutt, V.: AI in Healthcare: Time-Series Forecasting using Statistical, Neural, and
Ensemble Architectures. In Frontiers in Big Data (Under Review) (2019).