Deep Regressor Stacking for Air Ticket Prices Prediction
Everton Jose Santana, State University of Londrina (UEL), Electrical Engineering Department, Londrina, Brazil. santana.everton@ieee.org
Saulo Martiello Mastelini, State University of Londrina (UEL), Computer Department, Londrina, Brazil. mastelini@uel.br
Sylvio Barbon Jr., State University of Londrina (UEL), Computer Department, Londrina, Brazil. barbon@uel.br
ABSTRACT
Purchasing air tickets at the lowest price is a challenging task for consumers, since prices may fluctuate over time under the influence of several factors. In order to support users' decisions, some price prediction techniques have been developed. Considering that this problem could be solved by multi-target approaches from Machine Learning, this work proposes a novel method aiming at an improvement in air ticket price prediction. The method, called Deep Regressor Stacking (DRS), applies a naive deep learning methodology to reach more accurate predictions. To evaluate the contribution of DRS, it was compared with single-target regression and two state-of-the-art multi-target regression methods (Stacked Single Target and Ensemble of Regressor Chains). All four approaches were performed based on the Random Forest and Support Vector Machine algorithms over two real-life airfare datasets. From the results, it was concluded that DRS outperformed the other three methods, being the most indicated (most predictive) to assist air passengers in the prediction of flight ticket prices.
CCS Concepts
• Information systems → Decision support systems; Data mining; • Computing methodologies → Machine learning; Model verification and validation; • Applied computing → Online shopping; • Mathematics of computing → Regression analysis; Information theory;
Keywords
Decision Support System, Multi-Target Regression, Airfare Prediction, Price Mining
1. INTRODUCTION
The International Air Transport Association's 2016 review reported that more than 3.5 billion passenger segments were flown in 2015, and that by 2034 the number of air passengers is forecast to increase to seven billion annually [1].
Besides the growth in the number of airplane travellers, access to air ticket prices has become more available on the Internet, allowing consumers to compare prices over time more easily and to identify pricing tendencies, in order to choose the right moment to purchase a ticket. However, identifying the patterns in the pricing mechanism is a complex process involving many factors, such as different classes of seats on the same flight, diverse sellers, seasonality, the number of seats available and the prices of other companies, which cause a high variability in the price over time [8]. In fact, the large dispersion in airfares for seats on the same flight, and the questioning of its reasonableness, was already indicated in [15]. The prediction becomes even harder because some variables are not accessible to consumers [8, 9] and the companies keep improving yield management algorithms to optimize their profits [17].
Intending to support consumers in predicting price changes, existing data-mining methods (Ripper, Q-learning and Time Series) were combined to propose the stacking generalizer algorithm Hamlet [8]. Another approach was the representation of price series by marked point processes [17]. There was also the proposal of a method that includes the preferences of passengers regarding the number of stops in the itinerary or the specific airline to use [9], further developed into a technique called Developer-Guided Feature Selection [10].
Observing the multiple-output characteristic of some datasets, and the possible mutual dependence among the outputs, Spyromitros-Xioufis et al. [14] proposed two multi-target regression methods: Stacked Single Target (SST) and Ensemble of Regressor Chains (ERC). Both techniques use target predictions as additional input variables in order to increase the prediction accuracy. Among many validation datasets, the referred work used the one presented in [9].
Motivated by SST, we propose in this paper a novel multi-target technique denominated Deep Regressor Stacking (DRS), looking forward to obtaining an improvement in air ticket price prediction. This model could be implemented in a customer decision support system and, consequently, prevent the user from buying a high-priced ticket on the searching date if the price is expected to decrease in the following days.
This paper compares the performance of single-target regression (ST) and the three multi-target approaches (SST, ERC and DRS) for predicting air ticket prices, each one with the regressors Random Forest (RF) and Support Vector Machine (SVM), and evaluates which method brings the biggest improvement to the area.
This paper is organized as follows: Section 2 exposes existing multi-target regression methods and describes our new proposal. Section 3 portrays the experimental configuration, followed by Section 4, which analyses the results. Lastly, Section 5 contains the final considerations of this paper and suggests future work.
2. MULTI-TARGET REGRESSION
2.1 Literature Review
Traditionally, multi-target problems have been solved through two broad approaches: algorithm adaptation and problem transformation methods [4].
The algorithm adaptation approach changes some single-target regression technique to deal with multiple outputs and to address the possible statistical dependence among targets. This strategy alters the original technique's modelling method, like the optimization function (SVMs) [13, 20, 19] or the node splitting criteria (regression trees) [11], among others. Several algorithm-adaptation-based techniques have been proposed [4] and used in diverse tasks [12, 18, 4]. Indeed, multi-output adapted algorithms have achieved satisfactory predictive performance, bringing the advantages of internally exploring target dependence and of generating a unique model to deal with all outputs. Nevertheless, algorithm adaptation methods can be more challenging because these techniques aim not only to predict the multiple targets at once but also to interpret the dependencies among outputs.
The other approach to model multi-target tasks, problem transformation, manipulates the training data in some manner, adopting well-known regression techniques to predict single-target problems separately. A simply derived approach is to predict each target variable independently, as a single-target (ST) problem.
In many cases the ST method has outperformed multi-target techniques (based both on algorithm adaptation and on problem transformation) [4, 14], and it has been used as a performance baseline. In contrast, ST does not explore the expected dependencies among targets, so the use of a multi-target strategy should lead to better results.
Some techniques have been proposed in recent years to address multi-target problems as separate single-output tasks, but with the exploration of inter-target properties. Zhang et al. [20] proposed the modification of the problem's input space through a virtualization procedure, so that the task could be represented as a wider single-target problem. The authors used a Support Vector Regression (SVR) machine and achieved results comparable to the ST strategy. Tsoumakas et al. [16] proposed the use of random linear target combinations to explore the relations between target values. The original feature space dimension is increased, and multiple ST problems are solved in the transformed space. At the end of the process, the predicted values are used to solve a linear system to obtain the original targets' predictions.
Inspired by the related area of multi-label classification, Spyromitros-Xioufis et al. [14] proposed two techniques: SST (Stacked Single Target), also called MTRS (Multi-Target Regressor Stacking), and ERC (Ensemble of Regressor Chains).
The SST method consists of separately training ST models and using their outputs as additional prediction features. Thus, considering a dataset composed of X = {x_1, x_2, ..., x_n} input features and Y = {y_1, y_2, ..., y_m} target variables, SST uses the ST predictions Y' = {y'_1, y'_2, ..., y'_m} as new features, forming a new training dataset X' = {x_1, x_2, ..., x_n, y'_1, y'_2, ..., y'_m}. The transformed input is used along with the Y values to train another layer of regressors, inducing new ST models, whose outputs are the final predictions. New incoming instances are first subjected to the first layer of predictors to obtain target approximations and compose an augmented testing set, which is then subjected to the second level of predictors. Although it relies on ST estimations, the inter-target relationships are modelled and explored, consequently increasing the task's description capability and the prediction performance.
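To make the stacking procedure concrete, the following R sketch outlines SST/MTRS training and prediction with Random Forest as the base learner. It is a minimal illustration under our own naming (X is an input matrix with named columns, Y a matrix with one column per target), not the implementation of [14]:

library(randomForest)

sst_train <- function(X, Y) {
  m <- ncol(Y)
  # Layer 1: one independent single-target model per output
  layer1 <- lapply(1:m, function(i) randomForest(X, Y[, i]))
  # In-sample layer-1 predictions become additional features
  Y_hat <- sapply(layer1, function(mod) predict(mod, X))
  colnames(Y_hat) <- paste0("yhat", 1:m)
  # Layer 2: single-target models over the augmented input space
  X_aug <- cbind(X, Y_hat)
  layer2 <- lapply(1:m, function(i) randomForest(X_aug, Y[, i]))
  list(layer1 = layer1, layer2 = layer2)
}

sst_predict <- function(model, X_new) {
  Y_hat <- sapply(model$layer1, function(mod) predict(mod, X_new))
  colnames(Y_hat) <- paste0("yhat", seq_along(model$layer1))
  sapply(model$layer2, function(mod) predict(mod, cbind(X_new, Y_hat)))
}

Note that this sketch feeds in-sample first-layer predictions to the second layer; [14] also discusses variants that use out-of-sample estimates of the targets to mitigate the resulting train/test discrepancy.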
The ERC method consists of using a set of randomly chosen target chains to build ST models, following the generated sequence. For each chain, at first, an ST model is induced using the first output variable of the sequence. New models are then trained following the chain order. Each new regressor uses an extended input dataset formed by the combination of the original input variables and the previous models' predictions. The described training process repeats until the end of the chain sequence. After training all models, new incoming instances are subjected to the set of chains. The final prediction for a target y is the average of the values predicted for y over all chains. Since the output variable predictions come from the composition of values in different chain positions, multiple levels of combination and interdependence among targets are investigated. In the original formulation, ERC explores all possible target permutations if their number is less than 10; otherwise, exactly ten random combinations are selected.
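A corresponding R sketch of ERC follows, again purely illustrative: the base learner, the number of chains and the use of random permutations (rather than the exhaustive enumeration applied for fewer than 10 targets) are assumptions, and X is again a matrix with named columns:

library(randomForest)

erc_train <- function(X, Y, n_chains = 10) {
  m <- ncol(Y)
  chains <- replicate(n_chains, sample(1:m), simplify = FALSE)
  models <- lapply(chains, function(chain) {
    X_aug <- X
    mods <- vector("list", m)
    for (t in chain) {
      mods[[t]] <- randomForest(X_aug, Y[, t])  # ST model for target t
      # Append the prediction of target t as a feature for later targets
      X_aug <- cbind(X_aug, predict(mods[[t]], X_aug))
      colnames(X_aug)[ncol(X_aug)] <- paste0("yhat", t)
    }
    mods
  })
  list(chains = chains, models = models)
}

erc_predict <- function(model, X_new) {
  m <- length(model$models[[1]])
  acc <- matrix(0, nrow(X_new), m)
  for (k in seq_along(model$chains)) {
    X_aug <- X_new
    for (t in model$chains[[k]]) {
      p <- predict(model$models[[k]][[t]], X_aug)
      acc[, t] <- acc[, t] + p
      X_aug <- cbind(X_aug, p)
      colnames(X_aug)[ncol(X_aug)] <- paste0("yhat", t)
    }
  }
  acc / length(model$chains)  # final prediction: per-target average over chains
}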
Although more than one predictor is used to represent the multiple-target problem, decreasing the model's interpretability and increasing the computational training cost, this type of modelling offers several advantages. The possibility of using any base learner, even a hybrid set, could lead to better predictive performance and to the exploration of particular task characteristics. Besides that, problem transformation techniques improve the solution's modularity and conceptual simplicity, having obtained significantly better accuracy than state-of-the-art methods [16, 14].
2.2 Deep Regressor Stacking - DRS
Our proposed technique applies the MTRS idea of using target approximations as additional predicting features in a naive deep learning fashion. It is based on the hypothesis that the interaction among targets that happens in deeper layers could outperform the predictions obtained with none or only one prediction layer as input (ST or MTRS, respectively).
In this sense, Figure 1 presents the concept of the Deep Regressor Stacking (DRS) multi-target regression. In the ST method, the dataset's original attributes A are used to compute the prediction of the N targets (T^1_{1..N}). In its turn, MTRS predicts the targets using as input A and T^1_{1..N}, which means that the output predictions T^2_{1..N} depend, simultaneously, on the dataset attributes and on the target predictions of layer 1. Following the same logic, the DRS method originates the prediction of the (j+1)-th layer using as input the attributes A and all the predictions from the j previous layers (T^{1..j}_{1..N}).

Figure 1: Comparison of ST, MTRS and DRS approaches.
MTRS is a particular case of DRS in which the maximum number of prediction layers used as input, j, is equal to 1. By definition, ST uses no prediction layer as input, as already mentioned.
Algorithm 1 demonstrates how to compute a price prediction based on DRS regression. The parameters of the price function are the modelling dataset (Data_mod), the number of targets N, and the number of desired layers (λ). For it to be reasonable to refer to DRS, λ should be a natural number greater than or equal to 3.
Algorithm 1 Price prediction algorithm
1: function PRICE(Data_mod, N, λ)
2:   price ← {}
3:   {Data_tr, Data_val} ← split(Data_mod)
4:   Targets ← {t_1, t_2, ..., t_N}
5:   repeat
6:     train ← DRS(Data_tr, λ)
7:     P ← predict(train, Data_val_input, λ)
8:     {l, t_x} ← MIN_RMSE(P, Data_val_output)
9:     model ← DRS(Data_mod, l)
10:    T_x ← predict(model, Data_mod, t_x)
11:    Data_mod ← {Data_mod, T_x}
12:    Targets ← Targets \ {t_x}
13:    price ← {price, model}
14:  until Targets = {}
15:  return price
At the beginning of the algorithm, the dataset is split into two subsets: one for training (Data_tr) and the other for validation (Data_val). Targets is a vector representing the outputs, initially set with the N targets.
The following procedure is then repeated N times:

- DRS models are obtained with the training set, performed for λ layers. With these models, a prediction set P is computed over Data_val.
- In possession of P, the RMSE (Root Mean Square Error) between the target predictions and the output values is determined. The target with the minimum RMSE value (t_x) and its corresponding layer (l) are identified.
- The next step is training a DRS structure on the dataset again, up to layer l. The prediction is calculated for target t_x and incorporated into the input set. In this training phase, if the length of Targets is smaller than N, besides using the predictions of the 1 to j previous layers as attributes, the predictions of the layers of the targets already combined into the input are also used as features.
- Afterwards, the target that presented the smallest error is removed from the Targets set and the model is saved to the price set, which will contain N models in the end.

Once the predictions of all targets have been combined into the input (i.e., Targets is empty), the algorithm finishes.
Due to the stacking process, the dimensionality of this method increases significantly as λ and the number of outputs of the dataset increase, demanding considerable processing time to obtain the final model.
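As a minimal illustration of the deep stacking idea, the R sketch below builds λ successive layers, each trained on the original attributes plus the predictions of all previous layers. It is a simplified sketch under our own naming: it omits the per-target layer selection by validation RMSE that Algorithm 1 performs, and assumes X has named columns:

library(randomForest)

drs_train <- function(X, Y, lambda = 10) {
  m <- ncol(Y)
  layers <- vector("list", lambda)
  X_aug <- X
  for (j in 1:lambda) {
    # Layer j: one ST model per target over attributes + all previous layers
    layers[[j]] <- lapply(1:m, function(i) randomForest(X_aug, Y[, i]))
    preds <- sapply(layers[[j]], function(mod) predict(mod, X_aug))
    colnames(preds) <- paste0("L", j, "_t", 1:m)
    X_aug <- cbind(X_aug, preds)  # stack layer-j predictions as new features
  }
  layers
}

drs_predict <- function(layers, X_new) {
  X_aug <- X_new
  for (j in seq_along(layers)) {
    preds <- sapply(layers[[j]], function(mod) predict(mod, X_aug))
    colnames(preds) <- paste0("L", j, "_t", seq_len(ncol(preds)))
    X_aug <- cbind(X_aug, preds)
  }
  preds  # predictions of the deepest layer
}

Algorithm 1 refines this scheme by validating, for each target, which layer yields the minimum RMSE and retraining only up to that depth before incorporating the target's prediction into the input.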
3. EXPERIMENTAL SETUP
This section describes the datasets used to compare the performance of the four different techniques, the base regression algorithms, and the software libraries employed.
3.1 Dataset
Two benchmark datasets of multi-target regression were explored in this work: ATP1D and ATP7D¹. A summary of their attributes can be consulted in Table 1.
Name    Observations    Features    Targets
ATP1D   337             411         6
ATP7D   296             411         6

Table 1: Name, number of observations, features and outputs of the ATP1D and ATP7D datasets.
ATP stands for Air Ticket Prices, and both datasets have 6 target variables that represent flight preferences: any airline with any number of stops, any airline non-stop only, Delta Airlines, Continental Airlines, Airtrain Airlines and United Airlines. The main difference between ATP1D and ATP7D is that the first represents the target price on the next day, while the latter represents the minimum price observed over the next 7 days. Among the input variables are the number of days between the observation date and the departure date, the searching day of the week, and the minimum price, mean price, and number of quotes from all airlines and from each airline quoting more than 50% of the observation days, for non-stop, one-stop, and two-stop flights, for the current day, the previous day, and the two previous days [14].
These datasets were collected from a search website between February 22 and June 10, 2011, for 7 different origin-destination pairs (including major cities in different parts of the United States and some international destinations). The web spider used to extract the information is representative since it used the same information a customer would have for acquiring the data [10].
With the goal of motivating the application of MT solutions to address airline ticket price prediction, we used two methods of statistical correlation assessment among the target variables of the analysed datasets: the Pearson and Spearman correlation coefficients [3].
The Pearson coefficient measures linear relationships between continuous variables. A relationship between two outputs is linear when a change in one target is associated with a proportional alteration in the other.
The Spearman coefficient measures monotonic relationships between continuous or ordinal variables. In a monotonic relationship, the targets change together, but not necessarily at a constant rate.
Both metrics are equal to 1 when there is a perfect relationship between two variables (-1 if a perfect inverse relationship is observed). When the coefficients are near 0, there is no evidence of correlation between the observed variables. Comparing a target with itself will always generate a correlation coefficient equal to 1, for both methods.

¹ The datasets can be downloaded from http://mulan.sourceforge.net/datasets-mtr.html
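In R, both coefficients can be computed directly with the base cor function; as a sketch, assuming Y holds the six target columns of one of the datasets:

# Pairwise correlation matrices among the targets, as in Figures 2 and 3
pearson  <- cor(Y, method = "pearson")
spearman <- cor(Y, method = "spearman")
round(pearson, 2)
round(spearman, 2)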
Figure 2 shows the results of the correlation tests performed for the ATP1D dataset. Observing the coefficients, it is possible to perceive high levels of linear and monotonic dependency among the target variables in most cases, which is an indication of an MT problem.
Pearson:
      T1    T2    T3    T4    T5    T6
T1  1.00  0.84  0.80  0.74  0.94  0.71
T2        1.00  0.84  0.80  0.80  0.78
T3              1.00  0.88  0.81  0.87
T4                    1.00  0.74  0.99
T5                          1.00  0.73
T6                                1.00

Spearman:
      T1    T2    T3    T4    T5    T6
T1  1.00  0.61  0.49  0.46  0.90  0.42
T2        1.00  0.79  0.75  0.53  0.74
T3              1.00  0.87  0.51  0.85
T4                    1.00  0.47  0.97
T5                          1.00  0.44
T6                                1.00

Figure 2: Pearson and Spearman correlation coefficients among the ATP1D target variables.
The same analysis was performed for the ATP7D dataset, whose correlation results are presented in Figure 3. For this dataset, it is possible to observe a decrease in both correlation coefficients when compared with the ATP1D results, which is a clue that the target outputs are less correlated or that there are levels of non-linear relationship among the output values.
Pearson:
      T1    T2    T3    T4    T5    T6
T1  1.00  0.57  0.63  0.43  0.81  0.42
T2        1.00  0.74  0.64  0.52  0.64
T3              1.00  0.79  0.57  0.78
T4                    1.00  0.50  0.99
T5                          1.00  0.52
T6                                1.00

Spearman:
      T1    T2    T3    T4    T5    T6
T1  1.00  0.48  0.33  0.14  0.91  0.14
T2        1.00  0.66  0.50  0.35  0.48
T3              1.00  0.72  0.35  0.70
T4                    1.00  0.17  0.98
T5                          1.00  0.17
T6                                1.00

Figure 3: Pearson and Spearman correlation coefficients among the ATP7D target variables.
3.2 Parameters and Regression Algorithms
For the computation of DRS, the parameter λ should be pre-determined. For the tests in this work, the number of desired layers was set to 10: a number big enough to explore the deep dependencies, but not so big as to demand a long computation time.
Two regression algorithms from Machine Learning were used in the experiments: Support Vector Machine (SVM) and Random Forest (RF). Their wide use and different theoretical foundations motivated our choice. All regression algorithms used in this work were implemented in the R programming language, version 3.3.0, and used with their standard parameter settings. The packages e1071 and randomForest were used for SVM and RF, respectively.
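For reference, a minimal invocation of both base learners with their default settings could look as follows (the toy data and variable names are illustrative, not taken from the experiments):

library(e1071)         # provides svm()
library(randomForest)  # provides randomForest()

set.seed(1)
X <- matrix(rnorm(200), ncol = 4, dimnames = list(NULL, paste0("a", 1:4)))
y <- rnorm(50)  # a single target column

svm_model <- svm(X, y)            # eps-regression with a radial kernel by default
rf_model  <- randomForest(X, y)   # 500 trees by default

pred_svm <- predict(svm_model, X)
pred_rf  <- predict(rf_model, X)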
3.2.1 Support Vector Machine
The Support Vector Machine (SVM) is a classification and regression method belonging to the general category of kernel-based methods. Its approach is based on maximizing the separation margin between classes, or on minimizing the regression prediction error among training samples. Through kernel space transformation, this technique has the flexibility to model varied data sources [2], increasing the input dimensionality of the data to a space where separability is also increased.
3.2.2 Random Forest
The Random Forest algorithm consists in independently growing decision trees based on different subsets of the training data, formed by random sampling with replacement (bagging). Each tree uses a randomly chosen subset of features. These procedures allow exploring different aspects of the data, increasing the generalization capacity. The RF predictions are formed by taking the average result over all trees in the forest [5].
3.3 Performance Metrics
To evaluate the models trained during the experiments, three different performance metrics were used: Coefficient of Determination (R²), average Relative Root-Mean-Square Error (aRRMSE), and Relative Performance (RP). Besides that, the multi-target techniques were performed using 10-fold cross-validation.
The RRMSE (Relative Root Mean Square Error) is calculated as the Root Mean Square Error (RMSE) of the predictions for a target divided by the RMSE of the average value of this output. The latter acts as a baseline in the metric and allows measuring the improvement over a shallow predictor. This metric is very useful to compare non-homogeneous target distributions and has been used in several multi-target works [4, 14]. The aRRMSE is defined as the average of the RRMSE over the d targets:

aRRMSE = \frac{1}{d} \sum_{i=1}^{d} \sqrt{ \frac{ \sum_{l=1}^{N_{test}} (y_i^l - \hat{y}_i^l)^2 }{ \sum_{l=1}^{N_{test}} (y_i^l - \bar{y}_i)^2 } }    (1)
The Coefficient of Determination (R²) expresses the amount of the total variation associated with the use of the independent variables. Its values range from 0 to 1. The closer R² is to one, the greater the portion of the total variation in the output that is explained by the independent variables in the regression model [6].
The Relative Performance (RP) compares the aRRMSE of a single-target model with the aRRMSE of another MT method M (in our case, MTRS, ERC and DRS) on dataset d. Thus, it measures whether there was an increase (RP greater than 1) or a decrease (RP lower than 1) relative to the single-target results [14].

RP_d(M) = \frac{aRRMSE(ST)}{aRRMSE(M)}    (2)
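Equation (2) is a simple ratio; a one-line R sketch with an illustrative call:

# RP > 1 indicates the multi-target method M improved over the ST baseline
rp <- function(arrmse_st, arrmse_m) arrmse_st / arrmse_m
rp(0.50, 0.25)  # e.g., 2.0: method M halved the ST error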
The aRRMSE allows assessing possible technique superiority through the application of the Friedman statistical test with significance level α = 0.05. The null hypothesis states that the performances of all compared multi-target techniques are equivalent regarding the averaged RRMSE per dataset. When the null hypothesis is discarded, the Nemenyi post hoc test can be applied, stating that the performances of two different models are significantly different when the corresponding average ranks differ by at least a Critical Difference (CD) value. When multiple models are compared, a Critical Difference (CD) diagram can be used to represent the comparisons, as previously proposed in [7].
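As an illustration of this testing step (with placeholder scores, not our experimental values): base R provides the Friedman test, while the Nemenyi post hoc test is available in contributed packages such as PMCMR/PMCMRplus:

# Rows: dataset/fold configurations; columns: the eight compared methods
methods <- c("ST-RF", "ST-SVM", "MTRS-RF", "MTRS-SVM",
             "ERC-RF", "ERC-SVM", "DRS-RF", "DRS-SVM")
scores <- matrix(runif(10 * length(methods)), nrow = 10,
                 dimnames = list(NULL, methods))  # placeholder aRRMSE values
friedman.test(scores)  # H0: all methods perform equivalently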
4. RESULTS AND DISCUSSION
After running the price prediction algorithm on the ATP1D and ATP7D datasets, and also ST, MTRS and ERC, statistical metrics were applied to the results. The relative performance of the single-target method in relation to each multi-target method is registered in Table 2.
Table 2: Relative performance for ATP1D and ATP7D of ST in relation to the MTRS, ERC and DRS methods.

Dataset  Regressor   MTRS     ERC      DRS
ATP1D    RF          1.0383   0.9549   1.9543
ATP1D    SVM         1.2835   1.0256   1.8245
ATP7D    RF          0.9280   1.0700   1.6649
ATP7D    SVM         1.0680   1.0548   1.7207
In ten out of the twelve values, the RP was greater than 1. In other words, multi-target techniques outperformed single-target in 83.3% of the occurrences. DRS had the best results among all combinations of methods, datasets and regressors. In particular cases, the DRS aRRMSE was reduced to almost half of the ST aRRMSE.
To provide a better comprehension of the results, the average coefficient of determination for the four methods was determined, as Table 3 shows.
Table 3: Average coefficient of determination for ATP1D and ATP7D using the ST, MTRS, ERC and DRS methods.

Dataset  Regressor   ST       MTRS     ERC      DRS
ATP1D    RF          0.8535   0.8478   0.8436   0.9464
ATP1D    SVM         0.7996   0.7960   0.8081   0.9315
ATP7D    RF          0.7701   0.7756   0.7735   0.8612
ATP7D    SVM         0.6430   0.6335   0.6634   0.8826
The differences among ST, MTRS and ERC were subtle, on the order of 10^-2. In contrast, the mean R² of DRS for the two datasets and regression algorithms was higher than the others on the order of 10^-1, which is a relevant difference since the possible R² values lie in an interval of span 1. For ATP1D, the overall R² was higher than for ATP7D. This was expected, since the correlations among targets for the first dataset (Figure 2) already indicated it.
[Critical Difference diagram, CD = 2.347, comparing DRS-RF, DRS-SVM, ST-RF, ERC-RF, MTRS-SVM, ST-SVM, ERC-SVM and MTRS-RF]

Figure 4: Comparison of the aRRMSE values per dataset for each CV fold configuration, according to the Nemenyi test. Groups of methods that are not significantly different (at α = 0.05) are connected.
The Nemenyi test was performed to verify whether the discussed differences were statistically significant. According to Figure 4, with a significance level of 5%, the performance of DRS using RF as regressor presented no difference in relation to DRS with Support Vector Machine. DRS with RF was the first in the rank, meaning that it had the best outcomes in relation to the others. In its turn, the critical distance of DRS with SVM showed that this method is comparable to ST-RF, ERC-RF and MTRS-RF.
On account of what was evaluated, both DRS-RF and DRS-SVM outperformed ERC-SVM, ST-SVM and MTRS-SVM. This fact shows that RF was the best regression algorithm to interpret these datasets, and that only DRS was able to obtain superior performance with SVM.
Apart from the presented performance advantages, the DRS method had a particular drawback: the training phase required plenty of time, even using 10 layers. However, after the final model is complete, its application has linear complexity. Additionally, in an ordinary system, the model is supposed to be created just once (with possible re-training due to changes in the input information or modification in the data behaviour).
Figure 5 exemplifies how the RMSE value can decrease with the stacking of multiple layers, as provided by DRS. For this, we used the predictions P recorded during training for the target LBL+ALLminp0+fut_001 (ATP1D) in a specific cross-validation fold, with random forest.
In the single-target prediction, the RMSE value was slightly below 0.042. In the second predictive layer, the RMSE dropped to around 0.019. In the third output layer, this value was even lower, below 0.01. This value continued decreasing in layers 4, 5 and 6. In the seventh layer the RMSE of this target increased, followed by a decrement in layer 8, an increment in layer 9, and again a decrement in the tenth layer, where the RMSE dropped below 0.005, the lowest value among all layers.
The layers in which growth in the RMSE amplitude occurs are not necessarily the same. To exemplify this, the RMSE behavior in another fold, for the same target, dataset and regressor, is represented in Figure 6. In this fold, layer 8 interrupts the decreasing monotonicity, instead of layer 7, as verified in the previous case.
It is questionable whether stopping the training at the fifth or at the tenth layer, for instance, would imply extreme differences in the final results, since their RMSE differences are not so significant. Thus, depending on the required accuracy of a problem, the choice of an optimal λ would be crucial to obtain the fastest model computation without affecting the quality of the prediction.
[Plot of RMSE versus layers 1-10]

Figure 5: RMSE behavior for target LBL+ALLminp0+fut_001, from ATP1D, during training with random forest. Blue segments represent that the RMSE value of a higher layer was greater than the RMSE value of its immediate previous layer.
[Plot of RMSE versus layers 1-10 in another CV fold]

Figure 6: RMSE behavior for target LBL+ALLminp0+fut_001, from ATP1D, during training with random forest, in another CV fold. Blue segments represent that the RMSE value of a higher layer was greater than the RMSE value of its immediate previous layer.
5. CONCLUSIONS
In this paper, we described a novel method (DRS) for improving the prediction of air ticket prices. Our original contribution to Multi-Target prediction outperformed state-of-the-art methods on two real-life datasets with two different learning algorithms.
The next step would be the implementation of a system with DRS as its kernel. The output screen would essentially display to the user 3 columns: one for the current price, another for the next-day price and a final one for the minimum price over the following 7 days. The system would also be capable of extracting some features automatically, for example, the day of the week and the ticket prices of all airlines present in the dataset output, similar to current systems, although more accurate.
The authors affirm that DRS could also be used for prediction in other Multi-Target scenarios. Besides the choice of λ, another suggestion of future work is testing DRS with different problems that involve price prediction.
References
[1] International Air Transport Association. Annual Review 2016. Technical report, Dublin, 2016.
[2] Asa Ben-Hur and Jason Weston. A user's guide to support vector machines. Data Mining Techniques for the Life Sciences, pages 223–239, 2010.
[3] Douglas G. Bonett and Thomas A. Wright. Sample size requirements for estimating Pearson, Kendall and Spearman correlations. Psychometrika, 65(1):23–28, 2000.
[4] Hanen Borchani, Gherardo Varando, Concha Bielza, and Pedro Larrañaga. A survey on multi-output regression. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 5(5):216–233, 2015.
[5] Leo Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
[6] J. A. Cornell and R. D. Berger. Factors that influence the value of the coefficient of determination in simple linear and nonlinear regression models. Phytopathology, 77(1):63–70, 1987.
[7] Janez Demšar. Statistical comparisons of classifiers over multiple data sets. The Journal of Machine Learning Research, 7:1–30, 2006.
[8] Oren Etzioni, Rattapoom Tuchinda, Craig A. Knoblock, and Alexander Yates. To buy or not to buy: Mining airfare data to minimize ticket purchase price. Proceedings of the Ninth International Conference on Knowledge Discovery and Data Mining, pages 119–128, 2003.
[9] William Groves and Maria Gini. A regression model for predicting optimal purchase timing for airline tickets. Technical report, 2011.
[10] William Groves and Maria Gini. On optimizing airline ticket purchase timing. ACM Transactions on Intelligent Systems and Technology, 7(1), Article 3:1–28, 2015.
[11] Dragi Kocev, Celine Vens, Jan Struyf, and Sašo Džeroski. Ensembles of multi-objective decision trees. In European Conference on Machine Learning, pages 624–631. Springer, 2007.
[12] Dragi Kocev, Sašo Džeroski, Matt D. White, Graeme R. Newell, and Peter Griffioen. Using single- and multi-target regression trees and ensembles to model a compound index of vegetation condition. Ecological Modelling, 220(8):1159–1168, 2009.
[13] Guangcan Liu, Zhouchen Lin, and Yong Yu. Multi-output regression on the output manifold. Pattern Recognition, 42(11):2737–2743, 2009.
[14] Eleftherios Spyromitros-Xioufis, Grigorios Tsoumakas, William Groves, and Ioannis Vlahavas. Multi-target regression via input space expansion: treating targets as inputs. Machine Learning, 104(1):55–98, 2016.
[15] Joanna Stavins. Price discrimination in the airline market: The effect of market concentration. The Review of Economics and Statistics, 83(1):200–202, 2001.
[16] Grigorios Tsoumakas, Eleftherios Spyromitros-Xioufis, Aikaterini Vrekou, and Ioannis Vlahavas. Multi-target regression via random linear target combinations. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 225–240. Springer, 2014.
[17] Till Wohlfarth, Stéphan Clémençon, François Roueff, and Xavier Casellato. A data-mining approach to travel price forecasting. ICMLA, 2011.
[18] Tao Xiong, Yukun Bao, and Zhongyi Hu. Multiple-output support vector regression with a firefly algorithm for interval-valued stock price index forecasting. Knowledge-Based Systems, 55:87–100, 2014.
[19] Shuo Xu, Xin An, Xiaodong Qiao, Lijun Zhu, and Lin Li. Multi-output least-squares support vector regression machines. Pattern Recognition Letters, 34(9):1078–1084, 2013.
[20] Wei Zhang, Xianhui Liu, Yi Ding, and Deming Shi. Multi-output LS-SVR machine in extended feature space. CIMSA 2012 - IEEE International Conference on Computational Intelligence for Measurement Systems and Applications, Proceedings, pages 130–144, 2012.
An important consideration in conservation and biodiversity planning is an appreciation of the condition or integrity of ecosystems. In this study, we have applied various machine learning methods to the problem of predicting the condition or quality of the remnant indigenous vegetation across an extensive area of south-eastern Australia—the state of Victoria. The field data were obtained using the ‘habitat hectares’ approach. This rapid assessment technique produces multiple scores that describe the condition of various attributes of the vegetation at a given site. Multiple sites were assessed and subsequently circumscribed with GIS and remote-sensed data.We explore and compare two approaches for modelling this type of data: to learn a model for each score separately (single-target approach, a regression tree), or to learn one model for all scores simultaneously (multi-target approach, a multi-target regression tree). In order to lift the predictive performance, we also employ ensembles (bagging and random forests) of regression trees and multi-target regression trees. Our results demonstrate the advantages of a multi-target over a single-target modelling approach. While there is no statistically significant difference between the multi-target and single-target models in terms of model performance, the multi-target models are smaller and faster to learn than the single-target ones. Ensembles of multi-target models, also, improve the spatial prediction of condition.The usefulness of models of vegetation condition is twofold. First, they provide an enhanced knowledge and understanding of the condition of different indigenous vegetation types, and identify possible biophysical and landscape attributes that may contribute to vegetation decline. Second, these models may be used to map the condition of indigenous vegetation, in support of biodiversity planning, management and investment decisions.