Journal of Research and Applications in Agricultural Engineering, 2017, Vol. 62(1)
Joanna GRABSKA-CHRZĄSTOWSKA1, Joanna KWIECIEŃ1, Michał DROŻDŻ1, Zbigniew BUBLIŃSKI1,
Ryszard TADEUSIEWICZ1, Jan SZCZEPANIAK2, Józef WALCZYK3, Paweł TYLEK3
1 AGH University of Science and Technology, Cracow, Poland
e-mail: buba@agh.edu.pl ; drozdzmic@gmail.com ; asior@agh.edu.pl ; kwiecien@agh.edu.pl ; rtad@agh.edu.pl
2 Industrial Institute of Agricultural Engineering, Poznan, Poland
e-mail: janek@pimr.poznan.pl
3 University of Agriculture in Cracow, Poland
e-mail: rltylek@cyf-kr.edu.pl ; rlwalczy@cyf-kr.edu.pl
COMPARISON OF SELECTED CLASSIFICATION METHODS IN AUTOMATED
OAK SEED SORTING
Summary
This paper presents the results of automated, vision-based classification of oak seed viability, i.e. the seeds' ability to germinate. In the first stage, using a photograph of the seed cross-section, a set of feature vectors was determined that describes the anatomical structure of the cross-section independently of the shape and size of individual seeds. Three classification methods were then examined: k-nearest neighbours (kNN), artificial neural networks (ANN) and support vector machines (SVM). Finally, a precision of 73.1% was obtained for kNN with a 64-bin histogram, 78.5% for ANN with a 4-bin histogram and 78.7% for SVM with a 64-bin histogram.
Key words: acorn classification, automated acorn sorting, image processing and analysis, kNN, ANN, SVM
1. Introduction
The preparation of seedlings with high germination ability is a very important issue in oak growing. Traditionally, this process involves seed scarification (cutting off the top part of the acorn) and visual evaluation by a trained expert [5]. Due to the large number of oak seeds that require this kind of analysis and the limited time available for the task, significant human resources have to be involved, which is both expensive and inconvenient.
An automaton able to perform scarification, followed by visual evaluation of the cross-section and finally seed sorting, could be a solution to the problem outlined above. This paper presents research results on automated, vision-based analysis of seed cross-sections. The proposed solution is a classic example of a vision system consisting of the following steps: image preprocessing, feature extraction (analysis) and classification. It should be emphasized that each of these steps is very important for the final system performance. In particular, the two key elements are the extraction of a feature vector, which should describe the seeds' "quality" (usually the extent of mummification changes), and the classification itself. During the study, several different options for feature vector construction and three classifiers were examined: k-nearest neighbours (kNN), artificial neural networks (ANN) and support vector machines (SVM). In similar work on the classification of seeds and grains, neural networks were used [2, 7, 8].
The performance of the particular classification methods was evaluated for greyscale image histograms and HSV (Hue, Saturation, Value) colour features of the seeds' cross-sections.
2. Feature vector computation
The proposed feature vector computation process consisted of the following stages: image acquisition, image normalization and feature extraction. In the experiments, a set of 400 images of oak acorn cross-sections was used.
The preliminary analysis showed large lighting variance across the image dataset. These differences included both uneven lighting within a single image and variable lighting between different images. Therefore, it was necessary to develop methods to compensate for these disparities.
Two solutions were proposed: one for compensating the non-uniformity of luminance and a second for colour normalization. Both approaches were based on a similar mechanism and made use of the uniform white background present in the image. In the first approach, a brightness correction function was determined from the lower and upper parts of the image (without the acorn) and applied to unify the lighting across the entire image. An example of this procedure is presented in Figure 1a and 1b. In the second case, the colour components Cb and Cr (after initial image conversion from RGB to YCbCr) were measured in nominally white areas, where both values should equal 128. Then, using the computed difference between the expected and actual values, a correction was applied to all pixel values. A sample result of combined brightness and colour compensation is presented in Figure 1c and 1d.
Fig. 1. Image normalization example. Input image (a), image after brightness normalization (b), input image (c), image after brightness and colour normalization (d)
Source: own work
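The two normalization steps can be summarized with the following sketch. It is only an illustration of the mechanism described above, not the authors' implementation: the height of the background strips, the per-column gain, the OpenCV routines and the file name are assumptions made for this example.

```python
import cv2
import numpy as np

def normalize_brightness(img_bgr, strip=30):
    """Equalize lighting using the white background above and below the acorn."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    # Background brightness estimated per column from the top and bottom strips.
    background = np.vstack([gray[:strip, :], gray[-strip:, :]])
    profile = background.mean(axis=0)                  # one value per column
    gain = profile.max() / np.maximum(profile, 1.0)    # brighten darker columns
    corrected = np.clip(img_bgr.astype(np.float32) * gain[None, :, None], 0, 255)
    return corrected.astype(np.uint8)

def normalize_colour(img_bgr, strip=30):
    """Shift Cb/Cr so the nominally white background becomes neutral (128)."""
    ycrcb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2YCrCb).astype(np.float32)
    bg = np.vstack([ycrcb[:strip, :, :], ycrcb[-strip:, :, :]])
    # Channel order in OpenCV is Y, Cr, Cb; neutral chroma is 128 for 8-bit images.
    ycrcb[..., 1] += 128.0 - bg[..., 1].mean()
    ycrcb[..., 2] += 128.0 - bg[..., 2].mean()
    return cv2.cvtColor(np.clip(ycrcb, 0, 255).astype(np.uint8), cv2.COLOR_YCrCb2BGR)

normalized = normalize_colour(normalize_brightness(cv2.imread("acorn.png")))
```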
During the preliminary analysis, a number of different features of the acorn cross-sections were considered. They were mainly based on brightness and colour, as these two properties were mentioned by the experts (who used them in the manual visual evaluation of seeds). All operations described below were performed only for the area covering the cross-section, i.e. the cotyledons and the shell. Therefore, a manually determined mask, which defined the region of interest (ROI), was used. An example of the mask and its application is shown in Figure 2.
Fig. 2. Determining the region of interest. Input image (a), manually-chosen mask (b), the selected ROI (c)
Source: own work
Among the considered features were the mean, median and variance of greyscale and colour images, as well as histograms and cumulative histograms of greyscale images. These values can be calculated for the whole ROI as well as for its different parts (for example, rings), so the number of features used in the classification stage can be increased.
After an initial analysis of the results, it was decided to carry out more extensive experiments with greyscale histograms (with 4, 8, 16, 32 or 64 bins), as well as the mean and median of brightness and colour (in the HSV colour space).
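As a sketch of how such feature vectors can be built, the snippet below computes an n-bin greyscale histogram and basic brightness statistics restricted to the masked cross-section. The 8-bit greyscale range, the histogram normalization and the function names are assumptions made for the example.

```python
import cv2
import numpy as np

def histogram_features(img_bgr, roi_mask, bins=4):
    """n-bin greyscale histogram restricted to the ROI (cotyledons and shell only)."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    pixels = gray[roi_mask > 0]
    hist, _ = np.histogram(pixels, bins=bins, range=(0, 256))
    return hist / hist.sum()        # normalize so ROIs of different sizes are comparable

def brightness_stats(img_bgr, roi_mask):
    """Mean and median brightness inside the ROI."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    pixels = gray[roi_mask > 0]
    return np.array([pixels.mean(), np.median(pixels)])
```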
In the study of acorn viability classification, a set of 400 images of oak seeds after scarification was used. This set was divided into two parts: one for classifier training and one for quality evaluation (i.e. the training and test sets). Both contained images of varying difficulty ("obvious" and "difficult" cases). There were 240 samples in the training set and 160 in the test set. For both groups the "ground truth", i.e. the information whether a given seed germinated, was available.
3. The used classifiers
3.1. K-nearest neighbours
The concept of k-nearest neighbours (kNN) is based on finding the k feature vectors in the training set which are most similar, in terms of a certain measure, to the currently considered sample (e.g. from the test set or an input device) [3, 6]. If each component of a sample in the training set is considered as a dimension in a certain space, and the particular values are coordinates in that dimension, then the similarity of two samples (described by these coordinates) is defined by a metric.
The selection of parameters, i.e. an appropriate number of neighbours and a distance measure, has an impact on the classification performance. The value of k should minimize the probability of misclassifying the considered sample by taking into account only sufficiently close neighbours. During the experiments, the following k values were used: 1, 3, 5, 7, 9 and 11. The distance between two samples was computed with the Euclidean metric. All experiments were performed in Matlab.
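The experiments themselves were run in Matlab; purely as an illustration, the sketch below reproduces an equivalent setup (Euclidean metric, k in {1, 3, 5, 7, 9, 11}) with scikit-learn. The randomly generated arrays only stand in for the real 240 training and 160 test feature vectors.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import precision_score

rng = np.random.default_rng(0)
# Placeholders for the real data: 64-bin histograms, label 1 = seed germinated.
X_train, y_train = rng.random((240, 64)), rng.integers(0, 2, 240)
X_test, y_test = rng.random((160, 64)), rng.integers(0, 2, 160)

for k in (1, 3, 5, 7, 9, 11):
    knn = KNeighborsClassifier(n_neighbors=k, metric="euclidean")
    knn.fit(X_train, y_train)
    print(k, precision_score(y_test, knn.predict(X_test)))
```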
3.2. Artificial neural networks
Artificial neural networks, which model (at different levels of specificity) the behaviour of the biological neural networks present in the nervous systems of animals and humans, are a well-known and proven classifier [4, 10]. The processing elements (i.e. neurons) are arranged in connected layers. The parameters of each connection (so-called weights) are determined via a learning process using a set of examples. This set is divided into a training, validation and test part. The first two are used in network training (directly and in a supporting role, respectively) and the third is used only to evaluate the classifier's performance. In the experiments, the StatSoft Statistica® software was used. Since two classes were considered, i.e. "grown" and "not grown", the output layer of the network contained only one neuron. The value "0" was assigned to the first class and "1" to the other. Because a single neuron with a continuous activation function was used, it was necessary to apply a threshold; the value 0.5 was used to separate elements of class "0" from class "1".
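The network experiments were performed in Statistica; the following scikit-learn sketch mirrors the same idea for the 4-bin histogram case: a network with 4 inputs, 4 hidden neurons and a single continuous output thresholded at 0.5. The data arrays are placeholders and the solver settings are assumptions made here.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X_train, y_train = rng.random((240, 4)), rng.integers(0, 2, 240)   # stand-ins for 4-bin histograms
X_test = rng.random((160, 4))

net = MLPClassifier(hidden_layer_sizes=(4,), activation="logistic",
                    max_iter=2000, random_state=0)
net.fit(X_train, y_train)

# Single continuous output: probability of class "1" thresholded at 0.5.
prob_grown = net.predict_proba(X_test)[:, 1]
y_pred = (prob_grown >= 0.5).astype(int)
```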
3.3. Support Vector Machines (SVM)
Support vector machines are binary linear classifiers [1, 9]. They separate two classes with a hyperplane whose parameters (defined by the support vectors) are determined in a supervised learning procedure. Among the many possible hyperplanes, the one that best separates the two classes (according to a particular cost function) is selected. The classification of a particular sample then consists of determining its location with respect to the hyperplane. For non-linear problems the so-called kernel trick is used; it implicitly increases the dimensionality of the feature space and usually improves performance. In the experiments, the SVM implementation available in Matlab was used.
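Again, the study used the Matlab SVM implementation; a comparable scikit-learn sketch with an RBF kernel (the kernel reported to work best in Section 4.1) is shown below. Data arrays and hyperparameter values are placeholders.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train, y_train = rng.random((240, 64)), rng.integers(0, 2, 240)   # stand-ins for 64-bin histograms
X_test = rng.random((160, 64))

# The RBF kernel realizes the "kernel trick": an implicit, higher-dimensional feature space.
svm = SVC(kernel="rbf", C=1.0, gamma="scale")
svm.fit(X_train, y_train)
y_pred = svm.predict(X_test)
```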
4. The obtained results
The goal of the described research was to determine the feature vector (Section 2) and classification method (Section 3) with the best performance in vision-based oak seed evaluation. In the case of a binary classification there are four possible outcomes: TP (true positive – predicted to grow and did grow), TN (true negative – predicted not to grow and did not grow), FP (false positive – predicted to grow but did not grow) and FN (false negative – predicted not to grow but did grow). On their basis the following measures can be determined: sensitivity (TP / (TP + FN)), specificity (TN / (TN + FP)), precision (TP / (TP + FP)) and accuracy ((TP + TN) / (TP + FP + FN + TN)). All these values can be expressed as percentages. It was decided that for the automaton the most important measure is precision, as the main observed problem was the distinction between true and false positives.
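For reference, the four measures translate directly into code; the function below is a plain transcription of the formulas above.

```python
def measures(tp, tn, fp, fn):
    """Classification quality measures computed from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision":   tp / (tp + fp),                 # the measure emphasized in this study
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
    }
```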
4.1. Greyscale histogram based classification
The results obtained for the pre-selected histogram-based features and the kNN, ANN and SVM classifiers are summarized in Table 1. For the ANN, the best results were achieved with the 4-bin histogram. The corresponding network had 4 neurons in the input layer, 4 in the hidden layer and one in the output layer. These results can be explained by the size of the training set: for larger histograms, and therefore larger networks, the set was too small to allow effective learning (weight determination). For the kNN method, the best precision was obtained with k = 7 and the 64-bin histogram, and for the SVM with an RBF (Radial Basis Function) kernel the best performance was also observed for the 64-bin histogram.
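A hypothetical sketch of the selection procedure behind these observations is given below: the same classifier is evaluated on histograms with 4, 8, 16, 32 and 64 bins and the bin count with the highest test precision is kept. The randomly generated histograms only stand in for the real features.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import precision_score

rng = np.random.default_rng(0)
y_train, y_test = rng.integers(0, 2, 240), rng.integers(0, 2, 160)

results = {}
for bins in (4, 8, 16, 32, 64):
    # Stand-ins for the real n-bin histogram features of the training and test images.
    X_train, X_test = rng.random((240, bins)), rng.random((160, bins))
    clf = SVC(kernel="rbf").fit(X_train, y_train)
    results[bins] = precision_score(y_test, clf.predict(X_test))

best_bins = max(results, key=results.get)   # bin count with the highest precision
```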
Table 1. Classification performance for histogram-based features

  METHOD                      ANN     kNN     SVM
  NUMBER OF HISTOGRAM BINS      4      64      64
  SENSITIVITY [%]            81.0    90.5    76.2
  SPECIFICITY [%]            85.6    78.4    86.6
  PRECISION [%]              78.5    73.1    78.7
  ACCURACY [%]               83.8    83.1    82.5

Source: own work
4.2. HSV colour space based classification
The results obtained for the selected HSV colour space based features and the kNN, ANN and SVM classifiers are summarized in Table 2. In the experiments, the mean and median values of the individual HSV colour components, a six-value combination (mean and median of H, S and V), and additionally the mean and median of the GR (green) and GRN (normalized green) components were analysed.
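A sketch of how the full 10-parameter colour feature vector could be assembled is shown below. The exact definition of the normalized green component is not given in the paper; GRN = G / (R + G + B) is an assumption made here, as are the function names.

```python
import cv2
import numpy as np

def colour_features(img_bgr, roi_mask):
    """Mean and median of H, S, V, GR and GRN inside the ROI (10 parameters)."""
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    b, g, r = cv2.split(img_bgr.astype(np.float32))
    grn = g / np.maximum(b + g + r, 1.0)      # normalized green component (assumed definition)
    feats = []
    for channel in (hsv[..., 0], hsv[..., 1], hsv[..., 2], g, grn):
        vals = channel[roi_mask > 0]
        feats += [vals.mean(), np.median(vals)]
    return np.array(feats)
```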
Table 2. Classification performance for HSV colour space based features

  FEATURE             V            10 param.*   10 param.*
  METHOD              kNN          ANN          SVM
  SENSITIVITY [%]     65.1         68.3         44.4
  SPECIFICITY [%]     74.2         78.4         89.7
  PRECISION [%]       62.1         67.2         73.7
  ACCURACY [%]        70.6         74.4         71.9

  * GR+GRN+H+S+V, median and average
Source: own work
The precision of the kNN method was in the range [50.5%, 62.1%]; the best performance was obtained for a single feature, the median of the V component, with k = 5. Both the ANN and the SVM achieved their best results with 10 features. In the first case the precision was in the range [50.5%, 67.2%] and in the second in the range [50.0%, 73.7%].
5. Summary
A comparison of the three analysed classification methods is presented in Table 3. The features and classifier parameters shown are those that achieved the best precision.
Table 3. Best feature vectors and classification performance for each of the considered classifiers

  FEATURE             64-bin histogram   4-bin histogram   64-bin histogram
  METHOD              kNN                ANN               SVM
  SENSITIVITY [%]     90.5               81.0              76.2
  SPECIFICITY [%]     78.4               85.6              86.6
  PRECISION [%]       73.1               78.5              78.7
  ACCURACY [%]        83.1               83.8              82.5

Source: own work
In the case of the greyscale histogram based features, the best results were obtained by the SVM method; however, a very similar precision was achieved by the ANN with the 4-bin histogram. The kNN precision was about 5 percentage points lower (for the 64-bin histogram). It can be concluded that, for the considered problem, ANN and SVM achieve comparable results, with a slight advantage for the latter.
6. References
[1] Cortes C., Vapnik V.: Support-Vector Networks. Machine Learning, 1995, Vol. 20, 273-297.
[2] Ducournau S., Feutry A., Plainchault P., Revollon P., Vigouroux B., Wagner M.H.: An image acquisition system for automated monitoring of the germination rate of sunflower seeds. Computers and Electronics in Agriculture, 2004, 44, 3, 189-202.
[3] Duda R.O., Hart P.E., Stork D.G.: Pattern Classification, 2nd edition. John Wiley & Sons, 2001.
[4] Dudek-Dyduch E., Tadeusiewicz R., Horzyk A.: Neural Network Adaptation Process Effectiveness Dependent of Constant Training Data Availability. Neurocomputing, 2009, 72, 3138-3149.
[5] Jabłoński M., Tylek P., Walczyk J., Tadeusiewicz R., Piłat A.: Colour-Based Binary Discrimination of Scarified Quercus robur Acorns under Varying Illumination. Sensors, 2016, 16(8), 1319.
[6] Jain A.K., Duin R.P.W., Mao J.: Statistical pattern recognition: a review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22, 1, 4-37.
[7] Jayas D.S., Paliwal J., Visen N.S.: Review Paper (AE - Automation and Emerging Technologies): Multi-layer Neural Networks for Image Analysis of Agricultural Products. Journal of Agricultural Engineering Research, 2000, 77, 2, 119-128.
[8] Kubiak A., Mikrut Z.: Application of Neural Networks and Two Representations of Color Components for Recognition of Wheat Grains Infected by Fusarium Culmorum Fungi. In: Rutkowski L. et al. (eds), Proc. of the 7th ICAISC, Zakopane, Poland, 2004, Springer.
[9] Abe S.: Support Vector Machines for Pattern Classification. Springer-Verlag London, 2010.
[10] Tadeusiewicz R., Chaki R., Chaki N.: Exploring Neural Networks with C#. CRC Press, 2015.
Acknowledgments
The work presented was supported by the National Centre for Research and Development of the Republic of Poland (NCBiR) within
the project "Functional model of automaton, comprising machine vision system, for scarification and assessment of acorn
viability by means of automatic recognition of topography of mummification changes", grant no.: PBS3/A8/134/2015.
The authors would like to thank Tomasz Kryjak for his support in preparing this manuscript.