© 2018, IJCSE All Rights Reserved 451
International Journal of Computer Sciences and Engineering Open Access
Research Paper Vol.-6, Issue-9, Sept. 2018 E-ISSN: 2347-2693
Effects of Varying Resolution on Performance of CNN based Image
Classification: An Experimental Study
Suresh Prasad Kannojia1, Gaurav Jaiswal2*
1,2ICT Research Lab, Department of Computer Science, University of Lucknow, Lucknow, India
*Corresponding Author: gauravjais88@gmail.com, Tel.: +91-8299837473
Available online at: www.ijcseonline.org
Accepted: 22/Sept/2018, Published: 30/Sept/2018
Abstract— Convolutional neural network (CNN) based image classifiers take an image as input, automatically learn its
features, and classify it into a predefined output class. If the input image resolution varies, it hinders the classification performance of
the CNN based image classifier. This paper proposes a methodology (training-testing methods TOTV and TVTV) and presents an
experimental study of the effects of varying resolution on CNN based image classification for the standard image datasets MNIST
and CIFAR10. The experimental results show that degradation of resolution from higher to lower decreases the performance scores
(accuracy, precision and F1 score) of CNN based image classification.
Keywords— Varying Resolution, Convolutional Neural Network, Image Classification, Feature Learning, Classification
I. INTRODUCTION
Nowadays, a large amount of image data is generated and
processed in real-world applications [1]. The resolution of an
image varies due to different input sources and different imaging
devices. Variation in image resolution alters the visual
information of images [2][3]. Simple visual information does
not vary significantly, but complex visual information varies
drastically with the reduction of image resolution. Figure 1
shows images at progressively reduced resolution and their
correspondingly reduced visual information. This visual
information plays a vital role in determining the class to
which an image belongs.
Figure 1 Reduction of resolution and visual information (images are
taken from MNIST [14] and CIFAR10 [15] dataset)
With the advent of deep learning technology and the growth
of computing power, the convolutional neural network (CNN)
has emerged as one of the most successful image classification
models [1][4][5]. A CNN based image classifier consists of
convolutional layers, pooling layers and a soft-max layer. It
takes an image as input, automatically learns its spatial
information and preserves these spatial feature maps through
its layers [5]. Constraints on spatial-visual information
affect the performance of a CNN based image classifier; image
quality factors (such as resolution, noise, contrast, blur and
compression) are physical barriers to the spatial-visual
information of an image. In the literature, most of these
quality factors have been studied experimentally to show how
they affect image classification performance. Dodge et al. [6]
explore the effect of image quality distortions (blur, noise,
contrast, JPEG and JPEG 2000 compression) on deep
neural networks (VGG-16, VGG-CNN-S, GoogleNet) but do not
explore the effect of image resolution. Basu et al. [7]
present a modified MNIST dataset using motion blur, noise and
contrast variation, and successfully handle these image
distortions with a probabilistic quadtree DBN framework.
Dejean et al. [8] show the impact of compression on CNN
classification performance. They also suggest that an image
can be compressed by factors of 7, 16 and 40 for JPEG and JPEG2000
compression while still maintaining correct classification.
Sanchez et al. [9] analyse the impact of contrast in large-
scale recognition by estimating different illumination qualities.
The effect of image resolution on classification has also been
considered. Chevalier et al. [10] propose the LR-CNN model and
analyse the effect of varying resolution on fine-grained
classification. Ullman et al. [11] explore the effects of image
resolution on classification by humans and by DNNs. Chen et
al. [12] examine the effect of spatial resolution and texture
window size on the performance of a maximum likelihood
classifier for urban land use/cover mapping. However, the
effects of varying resolution on CNN based image classifiers
have not been explored from the perspective of different
training-testing methods.
The primary goal of this paper is to visualise and analyse,
through an experimental study, the effects of varying image
resolution on the performance of a CNN based image classifier.
For this, an experimental methodology is proposed which
comprises two separate training-testing experiments. In the first
experiment, the classifier is trained on the original-resolution
training image dataset and evaluated on a set of varying-
resolution test image datasets. In the second experiment, the
same classifier is trained on each varying-resolution training
image dataset and evaluated on the test image dataset of the
corresponding resolution.
The remainder of this paper is organised as follows: Section II
describes the experimental methodology for studying the effects
of varying resolution on the performance of a CNN based image
classifier. Section III implements and evaluates this methodology
on the performance metrics accuracy, precision and F1 score
[13] for the standard image datasets MNIST [14] and CIFAR10
[15]. Section IV concludes the paper.
II. METHODOLOGY
The main components of the proposed methodology are the
preparation of varying-resolution images, the implementation of
the CNN based image classifier, the training-testing methods
(TOTV, TVTV) and the evaluation on performance metrics. The
flow diagram of this methodology is shown in Figure 2.
A. Preparing Varying-Resolution Images
To generate varying-resolution sets from an original image, a
rescale/resize operation is performed on the original image for
a defined set of lower resolutions. For a 32x32 resolution
image, the varying-resolution image set contains 8x8, 16x16,
24x24 and 32x32 pixel resolution images. Afterwards, each
image is resized back to the original size to match the input
tensor of the convolutional neural network. An original image
with its varying-resolution images is shown in Figure 3. In this
methodology, varying-resolution versions of the standard image
datasets MNIST and CIFAR10 are prepared.
Figure 2 Flow Diagram of the Proposed Methodology
Figure 3 Original images and their varying-resolution versions (images are taken from the MNIST [14] and CIFAR10 [15] datasets)
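The preparation step described above can be sketched in Python. This is a minimal illustration rather than the authors' implementation; the helper name `make_varying_resolution` and the choice of Pillow with bilinear resampling are assumptions, since the paper does not specify the resize filter.

```python
import numpy as np
from PIL import Image

def make_varying_resolution(img_array, resolutions):
    """Downsample an image to each target resolution, then resize it
    back to the original size so the CNN input tensor is unchanged."""
    original = Image.fromarray(img_array)
    variants = {}
    for r in resolutions:
        low = original.resize((r, r), Image.BILINEAR)          # discard detail
        restored = low.resize(original.size, Image.BILINEAR)   # restore size
        variants[r] = np.asarray(restored)
    return variants

# For a 32x32 image, the varying-resolution set is 8x8, 16x16, 24x24, 32x32
img = np.zeros((32, 32, 3), dtype=np.uint8)
variants = make_varying_resolution(img, [8, 16, 24, 32])
```

Every variant keeps the original 32x32 shape, so a single network input size serves all four resolutions.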
B. Implementation of the CNN based Image Classifier
A convolutional neural network consists of mainly three types
of layers: convolutional layers, pooling layers and a softmax
layer. In a convolutional layer, the input image is convolved
with multiple kernels; the CNN preserves the spatial
information and generates multiple feature maps. A pooling layer
reduces the size of a feature map through a spatially invariant
average or maximum operation. Together, the convolutional and
pooling layers compose the feature extraction module. In the
softmax layer, the softmax activation function is used to classify
the input feature map into a class value. A traditional CNN
architecture is given in Figure 4. In this methodology, a CNN is
implemented for the image classification task, with a separate
CNN based image classifier for the MNIST and CIFAR10 datasets
respectively.
Figure 4 Traditional architecture of CNN
C. Training-Testing Methods
We apply different training-testing strategies to the CNN based
image classifier to analyse the effects of varying resolution on
its performance. These allow us to compare a CNN classifier
trained on the original-resolution dataset with classifiers
trained on the varying-resolution datasets, and to evaluate their
predictions on varying-resolution data. For this, two training
and testing methods are adopted, described as follows:
1) Training with the original-resolution image dataset and
testing with varying-resolution image datasets (TOTV):
This training and testing method is used to analyse how the
reduction of image resolution affects the performance of a
classifier trained on higher-resolution images. In this method,
the classifier is trained on the original-resolution training
image dataset and evaluated on a set of varying-resolution
test image datasets. For a 32x32 image dataset, the classifier is
trained on the 32x32 resolution image dataset and evaluated
separately on the 8x8, 16x16, 24x24 and 32x32 resolution image
datasets.
2) Training and testing with each varying-resolution image
dataset separately (TVTV):
This training-testing method analyses the performance of the
CNN based image classifier when both trained and tested on
lower-resolution images. In this method, a classifier is separately
trained on each varying-resolution training image dataset and
evaluated on the test image dataset of the corresponding
resolution. For a 32x32 image dataset, a classifier is trained
separately on the 8x8, 16x16, 24x24 and 32x32 resolution image
datasets and evaluated on the corresponding 8x8, 16x16, 24x24
and 32x32 resolution test image datasets.
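The two training-testing methods can be outlined as follows. This is an illustrative sketch, not the authors' code: `build_classifier`, the per-resolution dataset dictionaries `train[r]` / `test[r]`, and the `fit` / `evaluate` interface are hypothetical placeholders.

```python
# Sketch of the TOTV and TVTV training-testing methods for a 32x32
# dataset. train[r] and test[r] are assumed to hold (images, labels)
# pairs at resolution r, already resized back to the input size.

RESOLUTIONS = [8, 16, 24, 32]
ORIGINAL = 32

def totv(build_classifier, train, test):
    """TOTV: train once on the original resolution, then test the same
    classifier on every varying-resolution test set."""
    clf = build_classifier()
    clf.fit(*train[ORIGINAL])
    return {r: clf.evaluate(*test[r]) for r in RESOLUTIONS}

def tvtv(build_classifier, train, test):
    """TVTV: train and test a separate classifier for each resolution."""
    scores = {}
    for r in RESOLUTIONS:
        clf = build_classifier()
        clf.fit(*train[r])
        scores[r] = clf.evaluate(*test[r])
    return scores
```

TOTV thus measures robustness of one fixed model to degraded inputs, while TVTV measures how much information each resolution retains when the model is allowed to adapt to it.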
D. Performance Evaluation
For the performance evaluation of the CNN based image
classifier, the standard classification performance metrics
accuracy, precision and F1 score are used in this experimental
methodology. Accuracy is the fraction of correctly predicted
instances over all predictions; it works better for a balanced
image dataset than for an imbalanced one. Precision is the ratio
of the number of correctly classified positive instances to the
number of instances labelled positive by the classifier; it
indicates how many of the predicted positives are actual
positives. Recall is the ratio of the number of correctly
classified positive instances to the number of actual positive
instances. F1 score is the harmonic mean of precision and
recall. Here, precision and F1 score are calculated as per-class
(macro) averages.
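These definitions can be written out directly. The sketch below implements accuracy and per-class (macro) averaged precision, recall and F1 in plain Python for illustration; the experiments themselves rely on standard library implementations (scikit-learn is named in Section III).

```python
def accuracy(y_true, y_pred):
    """Fraction of correctly predicted instances over all predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_precision_recall_f1(y_true, y_pred):
    """Per-class precision, recall and F1, averaged over classes."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec  = tp / (tp + fn) if tp + fn else 0.0
        f1   = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

# Small worked example with three classes
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 0]
acc = accuracy(y_true, y_pred)                        # 5/6
prec, rec, f1 = macro_precision_recall_f1(y_true, y_pred)
```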
III. EXPERIMENT AND ANALYSIS
A. Experimental setup
The standard benchmark image datasets MNIST and CIFAR10 are
chosen for this experimental study. The MNIST dataset contains
8-bit 28x28 resolution images of handwritten numerical digits
(0-9). These digit images are very simple and carry little
visual information. The CIFAR10 dataset contains 32x32
resolution colour images; these images are complex and carry
rich visual information. To prepare the varying resolutions,
the MNIST dataset is rescaled into 7x7, 14x14, 21x21 and 28x28
pixel resolution image datasets, while the CIFAR10 dataset is
rescaled into 8x8, 16x16, 24x24 and 32x32 pixel resolution
image datasets.
The CNN based image classifiers are implemented with the
Python libraries scikit-learn and Keras on a TensorFlow
backend. A different convolutional neural network architecture
is implemented for each standard dataset, MNIST and CIFAR10.
The layer-wise architectural details of each CNN based image
classifier are shown in Table 1. Both CNN based classifiers are
trained and tested using the TOTV and TVTV methods.
B. Result and Analysis
The experiments have been evaluated on three standard
performance metrics: accuracy, precision and F1 score. The
detailed performance scores of the CNN based image classifiers
with varying image resolution for both training-testing methods
(TOTV, TVTV) on the MNIST and CIFAR10 datasets are shown in
Table 2 and Table 3 respectively.
Table 1 Layer-wise architectural details of CNN for MNIST
and CIFAR10 datasets

CNN Architecture for MNIST
| Layer        | Layer Parameters | Activation Function |
|--------------|------------------|---------------------|
| Conv2D       | 32, size=(3,3)   | ReLU                |
| Conv2D       | 32, size=(3,3)   | ReLU                |
| MaxPooling2D | size=(2,2)       |                     |
| Conv2D       | 64, size=(3,3)   | ReLU                |
| Conv2D       | 64, size=(3,3)   | ReLU                |
| MaxPooling2D | size=(2,2)       |                     |
| Dense        | 512              | ReLU                |
| Dropout      | 0.2              |                     |
| Dense        | 10               | Softmax             |

CNN Architecture for CIFAR10
| Layer        | Layer Parameters | Activation Function |
|--------------|------------------|---------------------|
| Conv2D       | 32, size=(3,3)   | ReLU                |
| Conv2D       | 32, size=(3,3)   | ReLU                |
| Conv2D       | 32, size=(3,3)   | ReLU                |
| MaxPooling2D | size=(2,2)       |                     |
| Dropout      | 0.25             |                     |
| Conv2D       | 64, size=(3,3)   | ReLU                |
| Conv2D       | 64, size=(3,3)   | ReLU                |
| Conv2D       | 64, size=(3,3)   | ReLU                |
| MaxPooling2D | size=(2,2)       |                     |
| Dropout      | 0.25             |                     |
| Dense        | 512              | ReLU                |
| Dropout      | 0.5              |                     |
| Dense        | 10               | Softmax             |
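Table 1's MNIST architecture can be expressed with the Keras Sequential API, which the paper names as its implementation library. This is a sketch under assumptions: the paper does not specify padding, the optimizer, the loss, or the Flatten step between the last pooling layer and the dense layer, so those choices here are illustrative; the CIFAR10 network follows analogously.

```python
from tensorflow.keras import layers, models

def build_mnist_cnn(input_shape=(28, 28, 1), n_classes=10):
    """MNIST CNN following Table 1; compilation settings are assumed."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),                   # assumed; not listed in Table 1
        layers.Dense(512, activation="relu"),
        layers.Dropout(0.2),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

With default "valid" padding the feature maps shrink 28 → 26 → 24 → 12 → 10 → 8 → 4, so the flattened vector feeding the dense layer has 4 x 4 x 64 = 1024 elements.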
Table 2 Performance result of the experimental study on MNIST dataset
(TOTV: trained on the original-resolution dataset (28x28) and tested on each varying-resolution dataset (28x28, 21x21, 14x14, 7x7). TVTV: trained and tested on each varying-resolution dataset (28x28, 21x21, 14x14, 7x7) separately.)

| Resolution | TOTV Precision | TOTV F1 Score | TVTV Accuracy | TVTV Precision | TVTV F1 Score |
|------------|----------------|---------------|---------------|----------------|---------------|
| 28x28      | 0.99269        | 0.99263       | 0.9927        | 0.99269        | 0.99263       |
| 21x21      | 0.99062        | 0.99045       | 0.9924        | 0.99247        | 0.99234       |
| 14x14      | 0.97828        | 0.97749       | 0.9854        | 0.98586        | 0.98544       |
| 7x7        | 0.79071        | 0.68767       | 0.7770        | 0.84433        | 0.78416       |
Table 3 Performance result of the experimental study on CIFAR10 dataset
(TOTV: trained on the original-resolution dataset (32x32) and tested on each varying-resolution dataset (32x32, 24x24, 16x16, 8x8). TVTV: trained and tested on each varying-resolution dataset (32x32, 24x24, 16x16, 8x8) separately.)

| Resolution | TOTV Accuracy | TOTV Precision | TOTV F1 Score | TVTV Accuracy | TVTV Precision | TVTV F1 Score |
|------------|---------------|----------------|---------------|---------------|----------------|---------------|
| 32x32      | 0.8752        | 0.87652        | 0.87548       | 0.8752        | 0.87652        | 0.87548       |
| 24x24      | 0.6409        | 0.72365        | 0.65320       | 0.6204        | 0.70501        | 0.63220       |
| 16x16      | 0.3166        | 0.48415        | 0.29897       | 0.4233        | 0.62030        | 0.40654       |
| 8x8        | 0.1855        | 0.27090        | 0.13986       | 0.3020        | 0.54599        | 0.24262       |
Comparison graphs of each performance metric are generated
for a detailed analysis of the effects of varying resolution on
the CNN image classifier for both datasets. The performance
comparison graphs of the classifier with varying resolution on
the MNIST and CIFAR10 datasets are shown in Figure 5 and
Figure 6 respectively.
Figure 5 Comparison of performance of CNN based image classifier with varying resolution on MNIST dataset
Figure 6 Comparison of performance of CNN based image classifier with varying resolution on CIFAR10 dataset
From the performance comparison graphs for both datasets, it
is noticeable that the performance scores decrease as image
resolution decreases. For the MNIST dataset, which contains
images of simple visual information, the performance curve
falls only slightly down to 14x14 pixel resolution, after which
it falls significantly. However, for the CIFAR10 dataset, which
contains images of complex visual information, the performance
curve falls immediately with the reduction of image resolution
for both training-testing methods. Therefore, the effects of
varying resolution on classification performance are greater
for images with complex visual information than for images
with simple visual information. It is also noticeable that the
CNN based image classifier is less affected under the TVTV
training-testing method than under TOTV. The precision scores
of both methods are higher than the accuracy and F1 scores for
both datasets; these higher precision scores show that the
classifiers assign images to relevant classes more often than to
irrelevant ones.
IV. CONCLUSION
This paper proposed a methodology and implemented it on
standard image datasets (MNIST, CIFAR10) to study the
effects of varying resolution on the performance of CNN based
image classification. The experimental results and analysis
show that the performance of the classifier depends mainly on
the visual information and resolution of the images:
degradation of image resolution from higher to lower decreases
the performance scores (accuracy, precision and F1 score) of
CNN based image classification.
ACKNOWLEDGMENT
We are thankful to the Central Facility of Computational
Research, University of Lucknow, for providing access to the
KRISHNA cluster. The second author is also thankful to the UGC
for the UGC-SRF fellowship that sustains his research.
REFERENCES
[1] Guo, Yanming, et al. "Deep learning for visual understanding: A
review." Neurocomputing, Vol.187, pp.27-48, 2016.
[2] Sheikh, H. R., and A. C. Bovik. "Image information and visual
quality.", IEEE Transactions on Image Processing,Vol.15, Issue.2,
pp.430-444, 2006.
[3] Lu, Dengsheng, and Qihao Weng. "A survey of image
classification methods and techniques for improving classification
performance.", International Journal of Remote sensing, Vol.28,
Issue.5, pp.823-870, 2007.
[4] Hoo-Chang, Shin, et al. "Deep convolutional neural networks for
computer-aided detection: CNN architectures, dataset
characteristics and transfer learning.", IEEE transactions on
medical imaging, Vol.35, Issues.5, pp.1285, 2016.
[5] Deng, Li, and Dong Yu. "Deep learning: methods and
applications.", Foundations and Trends® in Signal Processing,
Vol.7, Issue.3-4, pp.197-387, 2014.
[6] Dodge, Samuel, and Lina Karam. "Understanding how image
quality affects deep neural networks.", Quality of Multimedia
Experience (QoMEX), 2016 Eighth International Conference on.
IEEE, 2016.
[7] Basu, Saikat, et al. "Learning sparse feature representations using
probabilistic quadtrees and deep belief nets.", Neural Processing
Letters, Vol.45, Issue.3, pp.855-867, 2017.
[8] Dejean-Servières, Mathieu, et al. “Study of the Impact of Standard
Image Compression Techniques on Performance of Image
Classification with a Convolutional Neural Network”, Diss. INSA
Rennes; Univ Rennes; IETR; Institut Pascal, 2017.
[9] Sanchez, Angel, et al. "Analyzing the influence of contrast in
large-scale recognition of natural images.", Integrated Computer-
Aided Engineering, Vol.23, Issue.3, pp.221-235, 2016.
[10] Chevalier, Marion, et al. "LR-CNN for fine-grained classification
with varying resolution.", Image Processing (ICIP), 2015 IEEE
International Conference on. IEEE, 2015.
[11] Ullman, Shimon, et al. "Atoms of recognition in human and
computer vision.", Proceedings of the National Academy of
Sciences, Vol.113, Issue.10, pp.2744-2749, 2016.
[12] Chen, D., D. A. Stow, and P. Gong. "Examining the effect of
spatial resolution and texture window size on classification
accuracy: an urban environment case.", International Journal of
Remote Sensing, Vol.25, Issue.11, pp.2177-2192, 2004.
[13] Sokolova, Marina, and Guy Lapalme. "A systematic analysis of
performance measures for classification tasks.", Information
Processing & Management, Vol.45, Issue.4,pp. 427-437,2009.
[14] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. "Gradient-based
learning applied to document recognition.", Proceedings of the
IEEE, Vol.86, Issue.11, pp.2278-2324, 1998.
[15] Krizhevsky, Alex, and Geoffrey Hinton. "Learning multiple
layers of features from tiny images", Technical report, University
of Toronto, Vol.1, Issue.4, 2009.
Authors Profile
Suresh Prasad Kannojia has been working as an
Assistant Professor in the Department of
Computer Science, University of Lucknow,
Lucknow, since 2005. He completed his Ph.D.
in 2013 at the University of Lucknow. His
current research interests include pattern
recognition, image security, software quality,
system security/reliability, data warehousing and data mining.
He has organized three national conferences and one national
research scholars meet, and has published thirteen research
papers in national and international journals/conferences.
Gaurav Jaiswal is a Senior Research Fellow
and Research Scholar in the Department of
Computer Science, University of Lucknow,
Lucknow, India. He received his Master's
degree in Computer Science from BHU,
Varanasi, India, and his Bachelor's degree in
Computer Science from the University of
Allahabad, Allahabad, India. His research
interests include Computer Vision, Deep Learning, Machine
Learning and Artificial Intelligence.
... The input dimension of a network objectively affects the accuracy of the network. For example, in convolutional neural networks (CNNs), degradation in the image resolution reduces model performance [30]. Hence, the input dimension can be addressed by inserting more features into the input vector to enhance the PINN. ...
Preprint
Physics-informed neural networks (PINNs) have been widely applied in different fields due to their effectiveness in solving partial differential equations (PDEs). However, the accuracy and efficiency of PINNs need to be considerably improved for scientific and commercial use. To address this issue, we systematically propose a novel dimension-augmented physics-informed neural network (DaPINN), which simultaneously and significantly improves the accuracy and efficiency of the PINN. In the DaPINN model, we introduce inductive bias in the neural network to enhance network generalizability by adding a special regularization term to the loss function. Furthermore, we manipulate the network input dimension by inserting additional sample features and incorporating the expanded dimensionality in the loss function. Moreover, we verify the effectiveness of power series augmentation, Fourier series augmentation and replica augmentation, in both forward and backward problems. In most experiments, the error of DaPINN is 1$\sim$2 orders of magnitude lower than that of PINN. The results show that the DaPINN outperforms the original PINN in terms of both accuracy and efficiency with a reduced dependence on the number of sample points. We also discuss the complexity of the DaPINN and its compatibility with other methods.
... The Conv2d module has a key value that calculates the size of the output channel according to the size and factor of the input channel. The Conv2d module uses the convolution layer, which is typically used in image classification, which has a key value that calculates the size of the output channel according to the size and factor of the input channel [29]. ...
Article
Full-text available
Along with the importance of digital literacy, the need for SW(Software) education is steadily emerging. Programming education in public education targets a variety of learners from elementary school to high school. This study was conducted for the purpose of judging the proficiency of low school-age learners in programming education. To achieve the goal, a tool to collect data on the entire programming learning process was developed, and a machine learning model was implemented to judge the proficiency of learners based on the collected data. As a result of determining the proficiency of 20 learners, the model developed through this study showed an average accuracy of approximately 75%. Through the development of programming-related data collection tools and programming proficiency judging models for low school-age learners, this study is meaningful in that it presents basic data for providing learner-tailored feedback.
... Based on the TensorFlow platform, a convolutional neural network model with two convolutional layers was established and trained through MNIST data sets (Liang et al., 2019). Two training methods of TOTV and TVTV convolutional neural networks are proposed and validated with MNIST and CIFAR10 standard image data sets (Kannojia and Jaiswal, 2018). An extended CNN model is constructed and successfully applied for handwritten digit recognition of Mnist data sets (Lei et al., 2019). ...
Article
Full-text available
Soybean is an important oil crop and plant protein source, and phenotypic traits' detection for soybean diseases, which seriously restrict yield and quality, is of great significance for soybean breeding, cultivation, and fine management. The recognition accuracy of traditional deep learning models is not high, and the chemical analysis operation process of soybean diseases is time-consuming. In addition, artificial observation and experience judgment are easily affected by subjective factors and difficult to guarantee the accuracy of the objective. Thus, a rapid identification method of soybean diseases was proposed based on a new residual attention network (RANet) model. First, soybean brown leaf spot, soybean frogeye leaf spot, and soybean phyllosticta leaf spot were used as research objects, the OTSU algorithm was adopted to remove the background from the original image. Then, the sample dataset of soybean disease images was expanded by image enhancement technology based on a single leaf image of soybean disease. In addition, a residual attention layer (RAL) was constructed using attention mechanisms and shortcut connections, which further embedded into the residual neural network 18 (ResNet18) model. Finally, a new model of RANet for recognition of soybean diseases was established based on attention mechanism and idea of residuals. The result showed that the average recognition accuracy of soybean leaf diseases was 98.49%, and the F1-value was 98.52 with recognition time of 0.0514 s, which realized an accurate, fast, and efficient recognition model for soybean leaf diseases.
... The CNN training was done along hundreds of cycles named epochs, as shown in Figure 6, in which the entire dataset is passed through the CNN. The images are downsampled to the resolution of 512 x 512, such as suggested by Sabotke & Spieler (2020) and Kannojia & Jaiswal (2018), since the full resolution causes more computational burden than benefit in terms of accuracy. In order to input a random factor to mitigate the effects of the sensor position and its particular sensibility for the light, a random brightness variation of 10% as an image augmentation technique was used (Mikołajczyk & Grochowski 2018). ...
Article
Full-text available
Urbanization brought a lot of pollution-related issues that are mitigable by the presence of urban vegetation. Therefore, it is necessary to map vegetation in urban areas, to assist the planning and implementation of public policies. As a technology presented in the last decades, the so-called Terrestrial Mobile Mapping Systems - TMMS, are capable of providing cost and time effective data acquisition, they are composed primarily by a Navigation System and an Imaging System, both mounted on a rigid platform, attachable to the top ofa ground vehicle. In this context, it is proposed the creation of a low-cost TMMS, which has the feature of imaging in the near-infrared (NIR) where the vegetation is highly discriminable. After the image acquisition step, it becomes necessary for the semantic segmentation of vegetation and non-vegetation. The current state of the art algorithms in semantic segmentation scope are the Convolutional Neural Networks - CNNs. In this study, CNNs were trained and tested, reaching a mean value of 83% for the Intersection Over Union (IoU) indicator. From the results obtained, which demonstrated good performance for the trained neural network, it is possible to concludethat the developed TMMS is suitable to capture data regarding urban vegetation.
Chapter
Deep learning has become a widely practiced approach in research arenas related to civil infrastructures. Monitoring concrete structures is time-consuming, costly, unsafe, and laborious. Instead of manual inspection, the deep learning approach increases more possibility to automate this inspection process helping to mitigate future risk. This study introduces an automatic concrete surface crack detection and classification technique using a deep learning architecture, namely Xception to alleviate the risks due to deteriorating structure conditions. At first, the Xception model was trained and tested on a public dataset consisting of cracked and non-cracked images, and the model has shown superior accuracy in two-class classification. Afterward, the cracked sub-dataset was split into two classes–horizontally cracked and vertically cracked using a traditional computer vision approach to determine the inclination angle of a crack. The proposed deep learning model was trained on the newly formed dataset and performed remarkably in three-class classification as well. This paper demonstrates the proposed model's effectiveness, performance, and findings, providing a reference for concrete surface crack detection and classification for related domains.
Conference Paper
Full-text available
Una parte importante del coste de las explotaciones forestales madereras se deriva del método de cubicación empleado. En este sentido, la cubicación manual de madera apilada resulta ineficiente e imprecisa, siendo sus alternativas demasiado costosas. Por ello, han surgido tecnologías basadas en sensores ópticos de bajo coste que aplican algoritmos de Visión Artificial e Inteligencia Artificial para obtener estimaciones del volumen de madera. En esta investigación, hemos aplicado una Red Neuronal Convolucional (CNN) para detectar y segmentar testas de Pinus radiata D. Don cargadas sobre camiones. Para ello, hemos entrenado el algoritmo Mask R-CNN usando una base de datos de 135 imágenes de cargamentos de madera (5.381 trozas) tomadas con orientación, iluminación y resolución variables. Estas imágenes se procesaron con el fin de incrementar la cantidad de datos disponibles de 135 a 418 imágenes, utilizándose el 60% de estas para entrenar el modelo, y el 40% restante para validarlo. Nuestros resultados preliminares muestran que el modelo ha detectado más del 95% de las testas con un error en la estimación de su superficie inferior al 3.6%.
Article
Full-text available
Learning sparse feature representations is a useful instrument for solving an unsupervised learning problem. In this paper, we present three labeled handwritten digit datasets, collectively called n-MNIST. Then, we propose a novel framework for the classification of handwritten digits that learns sparse representations using probabilistic quadtrees and Deep Belief Nets. On the MNIST and n-MNIST datasets, our framework shows promising results and significantly outperforms traditional Deep Belief Networks.
Article
Full-text available
The purpose of this paper is to evaluate spatial resolution effects on image classification. Classification maps were generated with a maximum likelihood (ML) classifier applied to three multi-spectral bands and variance texture images. A total of eight urban land use/cover classes were obtained at six spatial resolution levels based on a series of aggregated Colour Infrared Digital Orthophoto Quarter Quadrangle (DOQQ) subsets in urban and rural fringe areas of the San Diego metropolitan area. The classification results were compared using overall and individual classification accuracies. Classification accuracies were shown to be influenced by image spatial resolution, window size used in texture extraction and differences in spatial structure within and between categories. The more heterogeneous are the land use/cover units and the more fragmented are the landscapes, the finer the resolution required. Texture was more effective for improving the classification accuracy of land use classes at finer resolution levels. For spectrally homogeneous classes, a small window is preferable. But for spectrally heterogeneous classes, a large window size is required.
Article
Full-text available
This paper presents a systematic analysis of twenty four performance measures used in the complete spectrum of Machine Learning classification tasks, i.e., binary, multi-class, multi-labelled, and hierarchical. For each classification task, the study relates a set of changes in a confusion matrix to specific characteristics of data. Then the analysis concentrates on the type of changes to a confusion matrix that do not change a measure, therefore, preserve a classifier’s evaluation (measure invariance). The result is the measure invariance taxonomy with respect to all relevant label distribution changes in a classification problem. This formal analysis is supported by examples of applications where invariance properties of measures lead to a more reliable evaluation of classifiers. Text classification supplements the discussion with several case studies.
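One invariance property of the kind the survey formalizes can be illustrated concretely: measures defined as ratios of confusion-matrix cells are unchanged when every cell is scaled by the same factor. A sketch with made-up counts:

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from a binary confusion matrix."""
    acc = (tp + tn) / (tp + fp + fn + tn)
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * prec * rec / (prec + rec)
    return acc, prec, rec, f1

m1 = binary_metrics(10, 2, 3, 85)
m2 = binary_metrics(20, 4, 6, 170)   # every cell doubled
print(m1 == m2)  # True: ratio-based measures are invariant to uniform scaling
```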
Article
This paper analyzes both the isolated influence of illumination quality in 2D facial recognition and the influence of contrast measures in large-scale recognition of low-resolution natural images. First, using the Yale Face Database B, we have shown that by separately estimating the illumination quality of facial images (through a fuzzy inference system that combines average brightness and global contrast of the patterns) and by recognizing the same images using a multilayer perceptron, there exists a nearly-linear correlation between illumination and recognition results. Second, we introduced a new contrast measure, called Harris Points Measured Contrast (HPMC), which assigns contrast values to images in a form more consistent with their recognition rate than other global and local contrast analysis methods we compared against. For our experiments on image contrast analysis, we have used the CIFAR-10 dataset with 60,000 images and convolutional neural networks as classification models. Our results can be used to decide whether a given test image is worth using for further recognition tasks, according to its contrast as calculated with the proposed HPMC metric.
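The abstract does not define HPMC, so the following is only a hedged sketch of the general idea of local versus global contrast: a global RMS contrast, and a hypothetical local variant that averages patch contrast around supplied interest points (HPMC's actual definition differs).

```python
import numpy as np

def global_rms_contrast(img):
    """Root-mean-square contrast of a grayscale image in [0, 1]."""
    return float(img.std())

def keypoint_contrast(img, points, radius=1):
    """Hypothetical local measure: mean RMS contrast of patches centred
    on detected interest points (not the paper's HPMC formula)."""
    vals = []
    for (r, c) in points:
        patch = img[max(r - radius, 0):r + radius + 1,
                    max(c - radius, 0):c + radius + 1]
        vals.append(patch.std())
    return float(np.mean(vals))

img = np.zeros((5, 5))
img[2, 2] = 1.0  # a single bright spot
# Measuring only around the interest point yields a higher contrast value
# than averaging over the mostly-flat image.
print(global_rms_contrast(img) < keypoint_contrast(img, [(2, 2)]))  # True
```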
Article
Deep learning algorithms are a subset of the machine learning algorithms, which aim at discovering multiple levels of distributed representations. Recently, numerous deep learning algorithms have been proposed to solve traditional artificial intelligence problems. This work aims to review the state-of-the-art in deep learning algorithms in computer vision by highlighting the contributions and challenges from over 210 recent research papers. It first gives an overview of various deep learning approaches and their recent developments, and then briefly describes their applications in diverse vision tasks, such as image classification, object detection, image retrieval, semantic segmentation and human pose estimation. Finally, the paper summarizes the future trends and challenges in designing and training deep neural networks.
Article
This book is aimed to provide an overview of general deep learning methodology and its applications to a variety of signal and information processing tasks. The application areas are chosen with the following three criteria: 1) expertise or knowledge of the authors; 2) the application areas that have already been transformed by the successful use of deep learning technology, such as speech recognition and computer vision; and 3) the application areas that have the potential to be impacted significantly by deep learning and that have gained concentrated research efforts, including natural language and text processing, information retrieval, and multimodal information processing empowered by multi-task deep learning. In Chapter 1, we provide the background of deep learning, as intrinsically connected to the use of multiple layers of nonlinear transformations to derive features from the sensory signals such as speech and visual images. In the most recent literature, deep learning is embodied also as representation learning, which involves a hierarchy of features or concepts where higher-level representations of them are defined from lower-level ones and where the same lower-level representations help to define higher-level ones. In Chapter 2, a brief historical account of deep learning is presented. In particular, selected chronological development of speech recognition is used to illustrate the recent impact of deep learning that has become a dominant technology in speech recognition industry within only a few years since the start of a collaboration between academic and industrial researchers in applying deep learning to speech recognition. In Chapter 3, a three-way classification scheme for a large body of work in deep learning is developed. We classify a growing number of deep learning techniques into unsupervised, supervised, and hybrid categories, and present qualitative descriptions and a literature survey for each category. 
From Chapter 4 to Chapter 6, we discuss in detail three popular deep networks and related learning methods, one in each category. Chapter 4 is devoted to deep autoencoders as a prominent example of the unsupervised deep learning techniques. Chapter 5 gives a major example in the hybrid deep network category, which is the discriminative feed-forward neural network for supervised learning with many layers initialized using layer-by-layer generative, unsupervised pre-training. In Chapter 6, deep stacking networks and several of the variants are discussed in detail, which exemplify the discriminative or supervised deep learning techniques in the three-way categorization scheme. In Chapters 7-11, we select a set of typical and successful applications of deep learning in diverse areas of signal and information processing and of applied artificial intelligence. In Chapter 7, we review the applications of deep learning to speech and audio processing, with emphasis on speech recognition organized according to several prominent themes. In Chapter 8, we present recent results of applying deep learning to language modeling and natural language processing. Chapter 9 is devoted to selected applications of deep learning to information retrieval including Web search. In Chapter 10, we cover selected applications of deep learning to image object recognition in computer vision. Selected applications of deep learning to multi-modal processing and multi-task learning are reviewed in Chapter 11. Finally, an epilogue is given in Chapter 12 to summarize what we presented in earlier chapters and to discuss future challenges and directions.
Conference Paper
Measurement of image quality is crucial for many image-processing algorithms. Traditionally, image quality assessment algorithms predict visual quality by comparing a distorted image against a reference image, typically by modeling the human visual system (HVS), or by using arbitrary signal fidelity criteria. We adopt a new paradigm for image quality assessment. We propose an information fidelity criterion that quantifies the Shannon information that is shared between the reference and distorted images relative to the information contained in the reference image itself. We use natural scene statistics (NSS) modeling in concert with an image degradation model and an HVS model. We demonstrate the performance of our algorithm by testing it on a data set of 779 images, and show that our method is competitive with state of the art quality assessment methods, and outperforms them in our simulations.
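In the scalar Gaussian special case, an information fidelity criterion of the kind described above reduces to a ratio of two mutual informations: the information about the reference that survives the distortion channel, over the information extracted from the reference itself. A toy sketch of that ratio (variable names and the scalar simplification are ours, not the paper's model):

```python
import math

def fidelity(var_ref, gain, var_dist_noise, var_hvs_noise):
    """Scalar-Gaussian sketch of an information fidelity ratio:
    mutual information through the distortion-plus-HVS channel,
    relative to mutual information through the HVS channel alone."""
    info_dist = 0.5 * math.log2(
        1 + gain ** 2 * var_ref / (var_dist_noise + var_hvs_noise))
    info_ref = 0.5 * math.log2(1 + var_ref / var_hvs_noise)
    return info_dist / info_ref

print(fidelity(4.0, 1.0, 0.0, 0.1))        # undistorted image -> 1.0
print(fidelity(4.0, 0.5, 1.0, 0.1) < 1.0)  # distortion lowers fidelity
```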