Application of dimensionality reduction methods
for eye movement data classification
Aleksandra Gruca, Katarzyna Harezlak, Pawel Kasprowski
Institute of Informatics, Silesian University of Technology
Akademicka 16, 44-100, Gliwice, Poland
aleksandra.gruca@polsl.pl
Abstract. In this paper we apply two dimensionality reduction methods to an eye movement dataset and analyse how feature reduction improves classification accuracy. Due to the specificity of the recording process, eye movement datasets are characterized by both large size and high dimensionality, which makes them difficult to analyse and classify using standard classification approaches. Here, we analyse eye movement data from the BioEye 2015 competition and, to deal with the problem of high dimensionality, we apply an SVM combined with PCA feature extraction and random forest wrapper variable selection. Our results show that reducing the number of variables improves classification results. We also show that some classes (participants) can be classified (recognised) with high accuracy while others are very difficult to identify correctly.⋆
1 Introduction
Recent advances in computer science, the development of new technologies and data processing algorithms have provided new tools and methods used to control access to numerous resources. Some of these resources are widely available, while others should be protected against unauthorized access. For the latter case, various security methods have been developed, such as PINs, passwords and tokens; however, biometric solutions such as gait, voice, mouse stroke or eye movement are becoming more and more popular due to their convenience.
In the field of eye movement research, biometric identification plays an important role, as the information included in such a signal is difficult to imitate or forge.
⋆ This is the accepted version of:
Gruca A., Harezlak K., Kasprowski P.: Application of Dimensionality Reduction Methods for Eye Movement Data Classification. In: Gruca A. et al. (Eds.), AISC, vol. 391, 2015, pp. 291-303.
The final publication is available at Springer via
http://link.springer.com/chapter/10.1007%2F978-3-319-23437-3 25
Acquisition of such data is realized using various types of cameras and can provide, depending on the recording frequency of the eye tracker used, from 25 to 2000 samples per second. The total number of samples obtained depends on the registration time: data may be recorded only during a login phase or continuously during the whole session. In the latter case, the obtained dataset is characterized by both large size and high dimensionality of the features describing it. Analysis of such a big dataset is therefore a challenging task.
It may seem that the more information we include in our analysis, the better decisions we can make; however, it is possible to reach a point beyond which data analysis becomes very difficult and sometimes impossible. In numerous cases objects in a dataset are characterized by many features which are redundant or have no impact on the final result. Taking them into consideration may not improve the quality of results and may even make it worse. Additionally, two problems may arise: the complexity of the classifier used and the so-called curse of dimensionality. In the latter case we need to deal with an exponential growth of the required data size to ensure the same quality of the classification process as dimensionality grows. Moreover, collecting such an amount of data may be expensive and difficult to perform. To solve this problem, additional methods are necessary that allow finding relationships existing in the data, filtering redundant information and selecting only those features which are relevant to the studied area and the classification outcome. Data dimensionality reduction methods have been successfully applied in machine learning in many different fields such as industrial data analysis [3], computer vision [10], geospatial [17] or biomedical data analysis [20]. Removing unimportant features has the following advantages:
– reduces bias by removing features that are unimportant from the classifier's point of view,
– simplifies calculations, saving resources,
– improves learning performance,
– reveals real relationships in data,
– improves classifier accuracy.
In the research described in this paper, the problem of dimensionality reduction has been applied to the analysis of eye movement data for the purpose of biometric classification [19]. Two methods have been considered: the PCA method [6] combined with an SVM classifier, and a random-forest based procedure [5]. Data from eye movement sessions were transformed into a form suitable for analysis by a classifier using the Dynamic Time Warping (DTW) distance measure. The main contribution of this paper is the application of the DTW metric to eye movement data analysis and a comparison of the effect of two different dimensionality reduction methods, feature selection and feature extraction, on classification results.
The paper is organized as follows: the first two sections provide a description of the data used in the research and its pre-processing phase. The next section describes the two dimensionality reduction methods. Then, the results of the analysis and final conclusions are presented.
2 Description of data
The data used in the presented research is a part of the dataset available for the BioEye competition (www.bioeye.info). The purpose of the competition was to establish methods enabling human identification using the eye movement modality. Eye movement is known to reveal a lot of interesting information about a human being, and eye movement based identification is yet another biometric possibility, initially proposed about 10 years ago [13]. Since then much research has been done in this field, and the BioEye competition follows the previously announced EMVIC2012 [12] and EMVIC2014 [11] competitions.
The dataset used in this research consisted of eye movement recordings of 37 participants. During each session the participant had to follow with their eyes a point displayed on a screen. As the point changed its position abruptly, the eye movement data consisted of fixations on the stable point and sudden saccades to the subsequent location. The point position changed every 1 second and there were 100 random point locations during each session, so the whole session lasted 1 minute and 40 seconds. Eye movement data was initially recorded with a frequency of 1000 Hz and then down-sampled to 250 Hz with the usage of a noise removal filter. Finally, there were 25 000 recordings available for every session. Each recording was additionally described by a "Validity" flag. Validity equal to 0 meant that the eye tracker lost the eye position and the recorded data is not valid.
There were two sessions available for every participant, referred to later as the first session and the second session. The task was to build a classification model using data from the first session as training samples and then use it to classify the second session for every subject.
3 Data preprocessing
There were 37 first (training) sessions and 37 second (testing) sessions available. Initially, every session was divided into segments during which the displayed point location was stable. This gave 100 segments for each session. The first segment of each session was removed. Every segment consisted of 250 recordings, but some of those recordings were invalid (with the validity flag set to 0). Segments with fewer than 200 valid recordings were removed from the set. This resulted in 6885 segments, each consisting of 200–250 eye movement recordings. 3425 segments extracted from the first sessions were used as training samples and 3460 segments from the second sessions were used as test samples. The segments were divided into four groups: NW, NE, SE and SW, based on the direction of the point location change. There were 823 training segments in the NE direction, 869 in SW, 925 in SE and 808 in NW, respectively.
In the next step, pairwise distances among all training segments were calculated. As the segments varied in length and we were more interested in shape comparison than in point-to-point comparison, we used Dynamic Time Warping to calculate distances among samples [4]. The distance calculation was done for each of nine different signal features: velocity, acceleration and jerk in the vertical direction, velocity, acceleration and jerk in the horizontal direction, and absolute velocity, acceleration and jerk values. The distances were calculated separately for every group (NW, NE, SE and SW). The result of these calculations was 9 x 4 = 36 matrices containing distances among training samples.
These distances were treated as features, in a way similar to [18]. For every test sample, DTW distances from this sample to every training sample of the same direction were calculated, and these distances were treated as features describing the sample.
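A minimal sketch of this step in R is given below; it assumes the dtw package for the DTW computation (the implementation used is not specified in the paper), and train_segs is a hypothetical list of numeric vectors holding one signal feature (e.g. horizontal velocity) for every training segment of one direction.

# Sketch only: pairwise DTW distances for one signal feature and one direction.
library(dtw)

dtw_distance_matrix <- function(train_segs) {
  n <- length(train_segs)
  d <- matrix(0, nrow = n, ncol = n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i < j) {
        d[i, j] <- dtw(train_segs[[i]], train_segs[[j]])$distance
        d[j, i] <- d[i, j]   # the distance matrix is symmetric
      }
    }
  }
  d
}

# For a test segment, its DTW distances to all training segments of the same
# direction form its feature vector.
test_features <- function(test_seg, train_segs) {
  sapply(train_segs, function(tr) dtw(test_seg, tr)$distance)
}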
Finally, the full dataset consisted of 36 sets. The nine NE sets had 1689 samples (including 823 training samples) with 823 features, the nine SW sets had 1769 samples (incl. 869 training) with 869 features, the nine NW sets had 1597 samples (incl. 808 training) with 808 features, and finally the nine SE sets consisted of 1830 samples (incl. 925 training) with 925 features.
4 Dimensionality reduction methods
Methods for decreasing dimensionality may be divided into two main groups: feature extraction and feature selection. In the first group of methods, original features are transformed to obtain their linear or non-linear combinations; as a result, the data are represented in another feature space. The second technique relies on choosing the features that best discriminate the analysed data. The task of feature selection is to reduce redundancy while maximizing the quality of the final classification outcome. An extension of feature selection is wrapper variable selection, in which the learning algorithm and the training set interact during the feature selection process.
In this section we present the application of two data dimensionality reduction methods to the BioEye 2015 dataset: PCA for feature extraction and random forest for wrapper variable selection.
4.1 Feature extraction with Principal Component Analysis
One of the methods utilized in the research was PCA (Principal Component Analysis), which is an example of a feature extraction method. It has been successfully used in many classification problems (pattern recognition, bioinformatics), including the field of eye movement data processing [2][19].
The task of PCA is to reveal the covariance structure of the data dimensions in order to find differences and similarities between them. As a result, a transformation of correlated variables into uncorrelated ones is possible. These uncorrelated variables are called principal components. They are constructed in a way ensuring that the first component accounts for as much of the variability in the data as possible. The same applies to each succeeding component, which explains as much of the remaining variability as possible.
In the presented research the feature extraction was done with the prcomp() function available in the R language in the default stats package. As the function input, a matrix representing DTW distances calculated based on one of the previously described features was provided. Data from this matrix was limited only to the first sessions of recordings. The center and scale parameters of the prcomp() function were used to (1) shift the data to be zero centered and (2) scale it to have unit variance. Data transformed this way served as a training set for an SVM classifier [1][24], which has been successfully used in the field of machine learning and pattern recognition. SVM performs classification tasks by constructing hyperplanes in a multidimensional space that separate objects with different class labels. It uses a set of mathematical functions called kernels to map the original data from one feature space to another. The method is very popular because it solves a variety of problems and has been shown to provide good classification accuracy even for relatively small data sets. For this reason it seems to be suitable for the analysis of an eye movement signal, which is often gathered during short sessions.
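A minimal sketch of this step, under the assumption that train_dist and test_dist are hypothetical matrices of DTW distances (first-session rows used for training) for one signal feature and one direction, might look as follows:

# Sketch only: PCA feature extraction with prcomp() (stats package).
pca_model <- prcomp(train_dist, center = TRUE, scale. = TRUE)

# Number of components needed for a chosen proportion of explained variance,
# e.g. 99.9%.
explained <- cumsum(pca_model$sdev^2) / sum(pca_model$sdev^2)
n_comp    <- which(explained >= 0.999)[1]

# Training and test data projected onto the selected principal components.
train_pca <- pca_model$x[, 1:n_comp, drop = FALSE]
test_pca  <- predict(pca_model, newdata = test_dist)[, 1:n_comp, drop = FALSE]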
There are different types of kernel mappings, such as the polynomial kernel and the Radial Basis Function (RBF) kernel. The latter was applied in the presented research using the svm() R function from the e1071 package with the settings C = 2^15 and gamma = 2^9. The classification model was verified using the predict() function. Its input parameter was a test set constructed on the basis of the PCA model, applied to the part of the samples, in the form of a distance matrix, obtained from the second recording session. Because prediction probabilities were evaluated for each sample in a distance matrix, they were subsequently summed up and normalized with regard to the samples related to one user. As a result, one probability vector for each user was provided for one distance matrix. This procedure was repeated for all 36 distance matrices, thus 36 user probability vectors were obtained, which were finally averaged a second time.
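The corresponding classification step could be sketched as follows, reusing train_pca and test_pca from the previous sketch (e1071 package; train_labels and test_user_ids are hypothetical vectors with the participant identifiers of the training and test segments, and the C and gamma values follow the settings quoted above):

# Sketch only: RBF-kernel SVM trained on the PCA-transformed features.
library(e1071)

model <- svm(x = train_pca, y = train_labels,
             kernel = "radial", cost = 2^15, gamma = 2^9,
             probability = TRUE)

pred  <- predict(model, test_pca, probability = TRUE)
probs <- attr(pred, "probabilities")   # one row of class probabilities per segment

# Sum and normalize the per-segment probabilities of each user, giving one
# probability vector per user for this distance matrix; the 36 per-matrix
# vectors are then averaged.
user_probs <- rowsum(probs, group = test_user_ids)
user_probs <- user_probs / rowSums(user_probs)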
4.2 Wrapper variable selection with random forest
Another data dimensionality reduction method used was a random-forest based procedure for wrapper variable selection [14]. Unlike feature extraction, feature selection methods allow improving classifier accuracy by selecting the most important attributes. Therefore, the resulting subset of attributes may be further used not only for classification purposes but also for data description and interpretation [15][21].
The wrapper variable selection approach can be used with any machine learning algorithm; however, we decided to choose random forest due to the fact that this method is particularly suitable for high-dimensionality problems and is known to be hard to over-train, relatively robust to outliers and noise, and fast to train [23]. The wrapper variable selection method is based on the idea of a measure of importance, which ranks variables from the most to the least important. Then, in several iterations, the less important variables are removed, the random forest is trained on the remaining set of variables and its performance is analysed.
The random forest method [5] is based on the ensemble learning idea: it combines a number of decision trees in such a way that each tree is learned (grown) on a bootstrap sample drawn from the original data. Therefore, during the learning process an ensemble (forest) of decision trees is generated. The final classification result is obtained using a simple voting strategy. Typically, one-third of the cases are left out of the bootstrap sample and not used to generate the tree. The objects that are left out are later used to estimate the so-called out-of-bag (OOB) error.
An additional feature of the random forest method is the possibility of obtaining a measure of importance of the predictor variables. In the literature one can find various methods to compute importance measures, and these methods typically differ in two ways: how the error is estimated and how the importance of variables is updated during the learning process [8]. Here, we focus on the so-called permutation importance, which is estimated in such a way that, for a particular variable, its values are permuted in the OOB cases and then it is checked how much the prediction error increases. The more the error increases, the more important the variable is or, in other words, if the variable is not important, then rearranging its values will not decrease prediction accuracy. The final importance value for an attribute is computed as an average over all trees.
There are two backward strategies that can be applied when using the importance ranking. The first one is called Non Recursive Feature Elimination (NRFE) [7][22]; in this approach the variable ranking is computed only once, at the beginning of the learning process. Next, the less important variables are removed from the ranking and the random forest is learned on the remaining set of variables. This step is repeated in several iterations until no further variables remain. The second approach is called Recursive Feature Elimination (RFE) [9] and it differs from the NRFE method in that the importance ranking is updated (recomputed) at each iteration. Then, similarly to NRFE, the less important variables are removed and the random forest is learned. In the work of Gregorutti et al. [8] an extensive simulation study was performed comparing these two approaches. Based on their analysis we decided to choose the RFE approach, as it may be more reliable than NRFE: the ranking given by the permutation importance measure is likely to change at each step, and by recomputing the permutation importance measure we ensure that the ranking is consistent with the current forest [8].
The final procedure used to learn the random forest classifier was as follows (a code sketch is given after the list):
1. Train the random forest.
2. Compute permutation measure of importance for each attribute.
3. Remove half of the less relevant variables.
4. Repeat steps 1–3 until there are fewer than 10 variables in the remaining attribute set.
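A sketch of this procedure, using the randomForest package with the settings reported in Section 5.2 (x is a hypothetical matrix of DTW-distance features with named columns and y a factor of participant identifiers), might look as follows:

# Sketch only: recursive feature elimination (RFE) driven by the unscaled
# permutation importance (mean decrease in accuracy).
library(randomForest)

rfe_random_forest <- function(x, y, min_vars = 10) {
  vars   <- colnames(x)
  forest <- NULL
  while (length(vars) >= min_vars) {
    forest <- randomForest(x[, vars, drop = FALSE], y,
                           ntree = 1500, importance = TRUE)
    imp  <- importance(forest, type = 1, scale = FALSE)[, 1]
    # Keep the more important half and recompute the ranking next iteration.
    vars <- names(sort(imp, decreasing = TRUE))[1:ceiling(length(vars) / 2)]
  }
  forest   # forest grown on the smallest subset with at least min_vars variables
}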
As in the case of the PCA analysis, the learning procedure was repeated for all 36 distance matrices, which resulted in 36 random forests.
5 Results
5.1 Combined SVM and PCA results
To obtain the best possible prediction result, several cases concerning various cumulative proportions of explained variance – 95%, 97%, 99% and 99.9% – have been analysed. The most interesting issue in the first step of the analysis was to check how dimensionality reduction influenced the accuracy of data classification and what degree of reduction could provide the best possible classification. The results were compared to the classification based on all dimensions. Note that for one user recording there were 36 sets of samples. Each set included a DTW distance matrix calculated for all users taking one signal feature into consideration. The number of dimensions related to each set, depending on the eye movement direction, varied from 808 to 925 elements. The performance of the classification was assessed using two quality indices (sketched in code after the list):
– Accuracy – the ratio of the number of correctly assigned attempts to the
number of all genuine identification attempts.
– FAR – the ratio calculated by dividing the number of false acceptances by
the number of identification attempts.
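A minimal sketch of both indices (true_id and assigned_id are hypothetical vectors with the genuine and the assigned participant identifier for every identification attempt; NA marks a rejected attempt):

# Sketch only: quality indices as defined above.
accuracy <- sum(assigned_id == true_id, na.rm = TRUE) / length(true_id)
far      <- sum(assigned_id != true_id, na.rm = TRUE) / length(true_id)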
First, we classified the whole dataset using the SVM method; the classification accuracy obtained with all dimensions was 24%, and the FAR ratio was 4%. Then we applied the PCA method; Table 1 presents the classification results for all levels of explained variability considered in the research. They are complemented by information about the number of principal components required to account for a given variability. Because this number was calculated independently for each set of samples (36 times), the final result is presented in the form of the average, minimal and maximal number of components utilized.
Table 1. Classification results for various levels of explained variability

Proportion of        Accuracy   FAR   Average number   Maximal number   Minimal number
explained variance                    of components    of components    of components
95%                    11%       5%        1.9               5                1
97%                    11%       5%        3.08              8                1
99%                    22%       4%        8.81             18                1
99.9%                  54%       2%       91.81            203               19
These outcomes clearly indicate that applying the PCA dimensionality reduction method had a significant influence on classification accuracy. This is visible for both explained variance percentages of 99% and 99.9%. While in the former case the accuracy is comparable with the full dimensionality calculations, in the latter it was improved more than twofold. It is worth emphasizing that both results were obtained for a remarkably smaller number of dimensions (on average 8.81 and 91.81 components respectively, compared to about 900 features in the primary sets).
Analysing the classification results, we noticed that some samples obtained very similar probabilities for two or more of the analysed classes. To deal with these similar results, an appropriate acceptance threshold was defined and another step of data analysis was introduced.
If we denote by:
– p_i the probability that sample s belongs to class i,
– p_j the maximal probability obtained for sample s, indicating that this sample belongs to class j,
– where j ≠ i and i, j ∈ {1, ..., 37},
and if
p_j − p_i ≤ acceptance threshold,
then sample s is treated as equally likely to belong to class i and class j.
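In terms of a per-sample probability vector this rule can be sketched as:

# Sketch only: classes whose probability lies within 'threshold' of the
# maximum are treated as equally likely for sample s. 'p' is a hypothetical
# vector of the 37 class probabilities of one sample.
accepted_classes <- function(p, threshold = 0.001) {
  which(max(p) - p <= threshold)
}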
Four threshold values, defined as the difference between calculated probabilities: 0.000, 0.001, 0.0025 and 0.005, were studied. As expected, the bigger the threshold value, the higher the accuracy obtained. However, increasing the threshold value resulted in an increase of the FAR ratio as well (Table 2 and Figure 1). The last column of the table presents the ratio of accuracy to FAR for a particular threshold. It can be noticed that for the first two proportions of explained variability the best ratio was obtained for a threshold equal to 0.001, while in the third case both thresholds 0.000 and 0.001 provided similar ratio values. For the last variability proportion, threshold 0.000 significantly surpassed the others. The ratio of accuracy to FAR for all parameters is presented in Figure 2.
Fig. 1. Classification results for various threshold values (Accuracy and FAR for each explained variance proportion and similarity threshold)
Table 2. Classification results for various threshold values

Explained variability   Similarity                      Ratio of
proportion              threshold    Accuracy   FAR     Accuracy and FAR
0.95                    0.000          11%       5%          2.18
0.95                    0.001          27%       8%          3.33
0.95                    0.0025         43%      19%          2.23
0.95                    0.005          70%      46%          1.52
0.97                    0.000          11%       5%          2.18
0.97                    0.001          24%       8%          2.92
0.97                    0.0025         50%      20%          2.45
0.97                    0.005          70%      47%          1.5
0.99                    0.000          22%       4%          4.97
0.99                    0.001          32%       7%          4.5
0.99                    0.0025         50%      15%          3.35
0.99                    0.005          68%      35%          1.92
0.999                   0.000          54%       2%         21.76
0.999                   0.001          62%       4%         16.37
0.999                   0.0025         70%       7%         10.46
0.999                   0.005          76%      13%          5.92
5.2 Random forest results
The recursive feature elimination procedure described in subsection 4.2 was repeated for all 36 datasets. Therefore, we finally obtained 36 different random forests that were used to classify examples from the test dataset (the presented results do not include the OOB error). During classification, objects from the test dataset were presented to each of the 36 classifiers and the final decision was made using a voting strategy.
Analyses were performed using the R Project random forest implementation from the randomForest package [16]. The number of trees in each forest was set empirically to 1500 (ntree=1500) and the importance measure was computed as the mean decrease of accuracy (importance parameter type=1). As we expected that correlations might exist among variables, the importance values were not scaled, that is, they were not divided by their standard errors (importance parameter scale=FALSE). The classification accuracy obtained for selected numbers of important features is presented in Table 3. As each direction (NE, NW, SE and SW) was characterized by a different number of attributes, the number of selected features is presented separately for each direction.
Analysis of the results presented in Table 3 shows that reducing the number of attributes improves classification accuracy. The best results were obtained for around 50 attributes. Further reduction of the number of attributes decreased the performance of the classifier, as too much of the important information is removed from the data description.
Fig. 2. The ratio of accuracy and FAR for all thresholds
Table 3. Classification accuracy obtained for selected number of important features

Number of attributes
 NE    NW    SE    SW    Accuracy
825   810   927   871      42%
413   405   464   436      46%
207   203   232   218      42%
104   102   116   109      46%
 52    51    58    55      49%
 26    26    29    28      40%
 13    13    15    14      31%
  7     7     8     7      31%
In addition, we can notice that with a reasonably reduced number of attributes (more than 50), the classification accuracy is around 40%-46%. This is different from the SVM analysis, where with the full set of attributes the classification results were very poor.
Finally, for the random forest classifier, we analysed the classification accuracy for each of the 37 classes separately. The results were quite surprising: we noticed that some of the classes, such as 7, 15, 30 and 35, are classified with quite high accuracy, whereas for others we were able to correctly classify only several objects. This is something that we did not expect and it requires further investigation. The classification accuracy obtained separately for each class is presented in Figure 3.
6 Conclusions
The aim of the research presented in this paper was to elaborate a procedure for the classification of individuals based on data obtained from their eye movement signal.
Fig. 3. Classification accuracy computed for each class separately
The data for the studies was acquired from a publicly accessible competition, which makes the results obtained in the research comparable with other prospective explorations.
To prepare the data for classification, the set of features was built based on dissimilarities among training samples. The dissimilarities were calculated with the DTW metric. The drawback of such an approach to data preprocessing is that it produces a dataset in which each object is described by a huge number of attributes. The high dimensionality of the obtained dataset makes it difficult to analyse, therefore some additional preprocessing steps are required before the selected classification method is applied. The obtained results show that combining the DTW data preprocessing method with a dimensionality reduction approach provides better classification accuracy.
Due to the different philosophies of feature extraction and feature selection, it is difficult to directly compare both methods. In the case of the combined SVM and PCA method, different ranges of data size reduction and their influence on the final result were studied. These outcomes confirmed that it is possible to decrease the data size meaningfully without decreasing classification accuracy, and even to improve it. Applying a threshold parameter allows obtaining better classification results; however, it is a trade-off between the accuracy of the classifier and the false acceptance rate of a security system. The second data dimensionality reduction method used in our analysis was the random forest procedure for wrapper variable selection. Comparing random forest with the SVM method, we can see that for the full set of features the random forest classifier gives better results than the SVM method. This is expected, as the random forest method is known to be suitable for high-dimensionality problems. However, by reducing the number of attributes we can still improve the accuracy of our random forest classifier.
Another interesting result observed during our analyses is that the classification accuracy differs greatly among classes. Currently, we are not able to say whether this is due to differences among the examined individuals or to some bias introduced during the data acquisition phase.
Summarizing the results, it must be emphasized that their quality is not sufficient for application in a real authentication process, yet they indicate promising directions for future work.
Acknowledgment. The work was partially supported by the National Science Centre (decision DEC-2011/01/D/ST6/07007) (A.G.). Computations were performed with the use of the infrastructure provided by the NCBIR POIG.02.03.01-24-099/13 grant: GCONiI - Upper-Silesian Center for Scientific Computations.
References
1. Aggarwal, C.C.: Data Classification: Algorithms and Applications. Data Mining
and Knowledge Discovery Series, Chapman and Hall/CRC (2014)
2. Bednarik, R., Kinnunen, T., Mihaila, A., Fränti, P.: Eye-movements as a biometric.
In: Image analysis, pp. 780–789. Springer (2005)
3. Bensch, M., Schroder, M., Bogdan, M., Rosenstiel, W.: Feature selection for high-
dimensional industrial data. In: ESANN 2005, 13th European Symposium on Ar-
tificial Neural Networks. pp. 375–380 (2005)
4. Berndt, D.J., Clifford, J.: Using dynamic time warping to find patterns in time
series. In: KDD workshop. vol. 10, pp. 359–370. Seattle, WA (1994)
5. Breiman, L.: Random forests. Machine Learning 45(1), 5–32 (2001)
6. Burges, C.J.C.: Dimension reduction: A guided tour. Foundations and Trends in
Machine Learning 2(4) (2010)
7. Díaz-Uriarte, R., Alvarez de Andrés, S.: Gene selection and classification of
microarray data using random forest. BMC Bioinformatics 7, 3 (2006)
8. Gregorutti, B., Michel, B., Saint Pierre, P.: Correlation and variable importance
in random forests. ArXiv:1310.5726. (2015)
9. Guyon, I., Weston, J., Barnhill, S., Vapnik, V.: Gene selection for cancer classifi-
cation using support vector machines. Machine Learning 46(1-3), 389–422 (2002)
10. Holzer, S., Ilic, S., Tan, D., Navab, N.: Efficient learning of linear predictors us-
ing dimensionality reduction. In: Lee, K., Matsushita, Y., Rehg, J., Hu, Z. (eds.)
Computer Vision ACCV 2012, Lecture Notes in Computer Science, vol. 7726, pp.
15–28. Springer Berlin Heidelberg (2013)
11. Kasprowski, P., Harezlak, K.: The second eye movements verification and identifi-
cation competition. In: Biometrics (IJCB), 2014 IEEE International Joint Confer-
ence on. pp. 1–6. IEEE (2014)
12. Kasprowski, P., Komogortsev, O.V., Karpov, A.: First eye movement verification
and identification competition at BTAS 2012. In: Biometrics: Theory, Applications
and Systems (BTAS), 2012 IEEE Fifth International Conference on. pp. 195–202.
IEEE (2012)
13. Kasprowski, P., Ober, J.: Eye movements in biometrics. In: Biometric Authentica-
tion, pp. 248–258. Springer (2004)
14. Kohavi, R., John, G.: Wrappers for feature subset selection. Artificial Intelligence
97(1-2), 273–324 (1997)
15. Kursa, M., Jankowski, A., Rudnicki, W.: Boruta - a system for feature selection.
Fundamenta Informaticae 101(4), 271–285 (2010)
16. Liaw, A., Wiener, M.: Classification and regression by randomForest. R News 2(3),
18–22 (2002)
17. Miller, R., Chen, C., Eick, C., Bagherjeiran, A.: A framework for spatial feature
selection and scoping and its application to geo-targeting. In: Spatial Data Mining
and Geographical Knowledge Services (ICSDM), 2011 IEEE International Confer-
ence on. pp. 26–31 (2011)
18. Pekalska, E., Duin, R.P., Paclik, P.: Prototype selection for dissimilarity-based
classifiers. Pattern Recognition 39(2), 189–208 (2006)
19. Saeed, U.: A survey of automatic person recognition using eye movements. Inter-
national Journal of Pattern Recognition and Artificial Intelligence 28(08), 1456015
(2014)
20. Sikora, M.: Redefinition of decision rules based on the importance of elementary
conditions evaluation. Fundamenta Informaticae 123(2), 171–197 (2013)
21. Sikora, M., Gruca, A.: Quality improvement of rule-based gene group descriptions
using information about go terms importance occurring in premises of determined
rules. Applied Mathematics and Computer Science 20(3), 555–570 (2010)
22. Svetnik, V., Liaw, A., Tong, C., Wang, T.: Application of Breiman's random forest
to modeling structure-activity relationships of pharmaceutical molecules. In: Roli,
F., Kittler, J., Windeatt, T. (eds.) Multiple Classifier Systems, Lecture Notes in
Computer Science, vol. 3077, pp. 334–343. Springer Berlin Heidelberg (2004)
23. Touw, W., Bayjanov, J., Overmars, L., Backus, L., Boekhorst, J., Wels, M., van
Hijum, S.: Data mining in the life sciences with random forest: a walk in the park
or lost in the jungle? Brief. Bioinformatics 14(3), 315–326 (2013)
24. Vapnik, V., Golowich, S.E., Smola, A.: Support vector method for function ap-
proximation, regression estimation, and signal processing. Advances in neural in-
formation processing systems pp. 281–287 (1997)