Reverse Image Search for Scientific Data within and
beyond the Visible Spectrum
Flavio H. D. de Araujo(a,b,c,d,*), Romuere R. V. e Silva(a,b,c,d), Fatima N. S. de Medeiros(c), Dilworth D. Parkinson(b), Alexander Hexemer(b), Claudia M. Carneiro(e,f), Daniela M. Ushizima(a,b)

a University of California, Berkeley, CA, USA
b Lawrence Berkeley National Laboratory, Berkeley, CA, USA
c Federal University of Ceará, Fortaleza, CE, Brazil
d Federal University of Piauí, Picos, PI, Brazil
e Federal University of Ouro Preto, Ouro Preto, MG, Brazil
f University of Manchester, England
The recent explosion in the rate, quality and diversity of image acquisition systems has propelled the development of tools to organize and find pictures more efficiently. This paper introduces our software tool pyCBIR for content-based image retrieval, which enables fast searches and ranking of samples from large pictorial datasets. The reverse image search module within pyCBIR exploits Convolutional Neural Networks, such as LeNet and ResNet, constructed with TensorFlow and wrapped with a user-friendly Qt interface. We test pyCBIR search capabilities on scientific datasets across and beyond the visible spectrum, with examples from atomic diffraction patterns, cell microscopy, and materials science imagery. In addition, we demonstrate the versatility and accuracy of pyCBIR when handling both binary and multiclass tasks, retrieving consistent items with accuracy over 95% in most cases. Our experimental results include both simulation-based datasets and experimental observational data ranging from thousands to millions of image samples.
∗Corresponding author: flavio86@ufpi.edu.br; phone: +1 510 4864061
Email addresses: email@example.com (Romuere R. V. e Silva), firstname.lastname@example.org
(Fatima N. S. de Medeiros), DYParkinson@lbl.gov (Dilworth D. Parkinson),
email@example.com (Alexander Hexemer), firstname.lastname@example.org (Claudia M. Carneiro),
email@example.com (Daniela M. Ushizima)
Preprint submitted to Expert Systems with Applications March 12, 2018
Keywords: Reverse image search, content-based image retrieval, scientific images, Convolutional Neural Network.
1. Introduction

With the increased availability of large data repositories, a substantial amount of time is spent searching for pictures, which has lately become an inefficient and cumbersome procedure. Recent reports (Bethel et al., 2015; GE Digital, 2016; Evans, 2016) point out that the growth in data size, rates and variety is significant; they also suggest that scientific data will grow twice as quickly as any other sector, while less than 3% of that data will be tagged for meaningful use. Several research imaging facilities will soon be generating 1 to 50 petabytes of data per year. This amount of data poses several challenges: (a) inadequate or insufficient metadata describing experimental records; (b) the impracticality of manual curation of massive datasets; and (c) the lack of tools adapted to the new data acquisition modes.
A critical need is to create accurate methods to query and retrieve images,
since “everyone searches all the time” (Eckstein, 2011). As an important step,
some tools for photo organization have improved to include operations such as
sorting and categorization by dates or media types, in addition to referencing
images through user annotations and tags. However, manual curation is seldom
achievable at scale, and even impossible in some scenarios, such as with high-
throughput imaging instruments. What is needed are more sophisticated tools
that can retrieve relevant items based on a data-driven taxonomy.
Recommender systems, also known as reverse image search tools, represent an excellent opportunity for data reduction, in which image acquisition, data collection and storage can be tailored based on a desired sample pattern.
tern. For example, the inability to adjust experimental parameters fast enough
for optimal data collection leads to dire challenges at imaging facilities, with
scientists distressed by overwhelming amounts of useless data.
2. pyCBIR: an environment for scientific image retrieval
In this paper, we introduce our recommender tool, pyCBIR, which provides a ranking system that enables humans to quickly interact with images from diverse domains and experiments, and offers mechanisms to incorporate labeled data when available, as discussed in other recommendation systems (Yu et al., 2016; Kang et al., 2016). Here, we focus on data flows using three different architectures of Convolutional Neural Networks (CNN) to extract image signatures. pyCBIR delivers the classification accuracy supported by LeNet (Lecun et al., 1998) and Inception-ResNet-v2 (Szegedy et al., 2016) architectures, exploiting optimized routines from TensorFlow (Abadi et al., 2015).
The main contributions of this paper are as follows:
1. We design and develop pyCBIR, a visual image search engine able to learn compact similarity-preserving signatures for recovering image relevance based on approximate ranking, including 10 schemes to measure distance based on different sets of features;
2. We deliver, test and compare three CNN implementations that achieve highly accurate results, with instances that rely upon very deep and complex networks (Zhang et al., 2015a; Hong et al., 2015; Zeng et al., 2015). In addition, pyCBIR contains algorithms that help to generalize and quickly extend training datasets. This paper describes CNN models with increasing complexities tested against four science problems, and reports on accuracy and time consumption given different architectural choices;
3. pyCBIR provides an interactive system to find images from diverse domains through an intuitive IDE, helpful to users with diverse science backgrounds, and relevant to science questions using labeled and unlabeled data;
4. This paper improves reproducibility and supports code benchmarks by performing tests using publicly available image repositories, and constructing software based on open-source tools.¹
Figure 1: Diagram showing how data moves through pyCBIR modules during reverse image
search: gray box emphasizes the starting point given a trained state.
The two main sets of algorithms at the core of pyCBIR that drive reverse image search of scientific data are CBIR and CNN, which target material recognition at scale using massive image collections, as described in the following sections.
3. Related work
Previous efforts to optimize image search tasks employing content-based image retrieval (CBIR) systems (Zhang et al., 2015a; Hong et al., 2015; Zhang et al., 2016) exploit computer vision and machine learning algorithms to represent images in terms of more compact primitives. Given an image as an input query, instead of keywords or metadata, this approach allows matching samples by similarity. Several free engines for CBIR are thriving at e-commerce tasks (Yu et al., 2016; Shamoi et al., 2015), but their underlying code remains closed and has scarcely been deployed for use with scientific images.

¹Source codes/data to be published upon paper acceptance at camera.lbl.gov
The term CBIR was introduced in 1992 by Kato (Kato, 1992; Hirata and Kato, 1992), and it has been associated with systems that provide image matching and retrieval for queries performed by visual example. A quarter of a century later, most image retrieval systems available for scientific image search still rely on keyword-based image retrieval (van den Broek et al., 2005), although most image collections generated by humans lack proper annotations (Bethel et al., 2015).
With the advent of deep learning (Goodfellow et al., 2016; Guo et al., 2016)
and the ability to extrapolate annotations from curated datasets (Ushizima
et al., 2016), new systems promise to move human experience from hardly rele-
vant retrievals to broadly useful results that can achieve over 85% accuracy. For
example, the application Google Photos has provided automated and custom-
labeling features since 2015, so that users can quickly organize large photo
collections (JR Raphael, 2015), with high retrieval accuracies for face detection.
These are significant strides toward automating image catalogs, and they motivate our efforts to construct TensorFlow-based CNNs to organize scientific data.
TensorFlow (Abadi et al., 2015), an open-source software library for machine intelligence, presents advantages regarding flexibility, portability, performance, and GPU compatibility. In order to deliver high-performance C++ code, TensorFlow uses the Eigen linear algebra library in addition to CUDA numerical libraries, such as cuDNN, to accelerate core computations and scale to large datasets. In pyCBIR, we use Python to model the dataflow graph, which is responsible for coordinating the execution of operations that transform inputs into ranked image samples, whose labels can be used in the calculation of uncertainty.
A typical CNN pipeline is shown in Figure 2, consisting of three main "neural" layers: convolutional layers, pooling layers, and fully connected layers. This algorithm requires two stages for training the network: (a) the forward stage, which represents the input image in each layer and outputs a prediction used to compute the loss cost based on the curated data (labeled samples), and (b) the backward stage, which computes the gradients of the layer parameters to drive the cost function to very low values (Guo et al., 2016; Goodfellow et al., 2016).

Figure 2: Main components of the CNN layers for feature extraction: each step illustrates the fiber dataset transformation, although the same CNN architecture applies to the other datasets.
By exploring algorithms that learn features at multiple levels of abstraction from large datasets, CBIR systems can benefit from complex non-linear functions within CNNs that map unprocessed input data to the results, bypassing human-designed features reliant on domain knowledge. Wan et al. (Wan et al., 2014) investigated different deep learning frameworks for CBIR applied to natural images, such as the ILSVRC2012 (ImageNet Large Scale Visual Recognition Challenge) dataset. That paper reported a mean average precision of 0.4711 using a massive image collection with 10,000,000 hand-labeled images depicting 10,000+ object categories as training data.
Apart from natural scenes, recent work on recognizing material categories from images (Bell et al., 2014; Liu et al., 2010; Sharan et al., 2014; Zhang et al., 2015b) includes experiments using the Flickr Material Dataset (FMD) and/or the Materials in Context Database (MINC). Sharan et al. (Sharan et al., 2014) explored low- and mid-level features, such as color, SIFT (Scale-Invariant Feature Transform) and HOG (Histogram of Oriented Gradients), combined with an augmented Latent Dirichlet Allocation model under a Bayesian generative perspective, achieving a 44.6% recognition rate on FMD. Using a CNN-based feature extraction mechanism, Bell et al. (Bell et al., 2014) designed materials recognition frameworks employing two Caffe-based (Jia et al., 2014) architectures: AlexNet and GoogLeNet trained on material patches from the MINC database, achieving accuracies of 79.1% and 83.3%, respectively. Additionally, attempts to train CNNs with public datasets such as FMD were less favorable since FMD alone contains a small number of samples. Moreover, those authors noticed that the AlexNet trained on the MINC database performed better on FMD classification, showing 66.5% accuracy.
In (Zhang et al., 2015b), different authors continued investigating material categories, fine-tuning a Caffe VGG-D network pre-trained on two datasets, MINC and ILSVRC2012 (Russakovsky et al., 2015), for image classification. They resorted to a customized feature selection and integration method to concatenate the values of the 7th layer for both networks. Finally, the integrated features were input to a support vector machine (SVM) with radial basis function kernels for training and testing; this improved accuracy to 82.3% on the FMD images.
The next section describes our experiments on materials databases such as FMD, as well as novel databases of scientific images, and different CNN architectures with increasing levels of complexity.
4. Description of Scientific Databases
We have explored deep learning and considered samples from diverse imaging systems, which vary in terms of their electromagnetic wave interaction with the samples and in spatial scale. These sample collections require nontrivial mathematical methods to index and recover relevant results, taking into account image composition and structure. We summarize the four databases under investigation in Table 1, describing the main characteristics shown by fibers, films, cells and other materials.
4.1. Fiber profiles
The fiber database consists of volume cross-sections, based on hard X-ray microtomography (microCT), from ceramic matrix composites (CMC). This imaging technique allows for inspection of structural properties and quality control,
Table 1: Scientific data under investigation: experimental specifications, number of samples and respective image size.

Specimen           | Modality    | #Samples  | Target data analysis
Ceramic composite  | X-ray       |           | Detection of fiber profiles from 3D cross-sections. Sec. 4.1, Fig. 2.
Thin films         | GISAXS      | 4,024,789 | Classification of simulated scattering patterns into space groups. Sec. 4.2.
Pap smears         | Light       |           | Inspection of cervical cell morphology for cancer detection. Sec. 4.3, Fig. 4.
Materials patterns | Photography |           | Inspection of molecular structure with 2D orientation classification. Sec. 4.4.
while exposing CMC samples to high temperature and tensile forces, causing
multiple deformations (Bale et al., 2012; Ushizima et al., 2014).
Figure 2 illustrates a CMC sample cross-section. The sample is 1 mm in diameter and 55 mm in length, reinforced with hundreds of ceramic fibers of approximately 10 µm diameter. Each fiber is coated with a boron nitride layer, which has a lower X-ray absorption coefficient; therefore, fiber cross-sections appear as dark rings. Frequently, the 3D images are examined manually, slice by slice (2D), in order to identify defects. As an alternative, there is increasing interest in automation by designing "inspecting bots", i.e., algorithms that detect mechanical deformations and sort the experimental instances (e.g. image stacks) according to the structural damage. The first step in Figure 2 highlights a fiber cross-section, a structure commonly used as a guiding pattern to detect other fibers (Ushizima et al., 2014, 2016). A major demand is the ability to perform pattern ranking, which can steer the data management needed by beamline scientists.
Our paper reports results on labeled samples that went through a triage including both automated segmentation methods based on traditional computer vision (Bale et al., 2012; Alegro et al., 2016) and visual inspection by domain scientists. To the best of our knowledge, all the images contain accurate labels, which are used to determine the success rate of the CNNs. Among the ≈3 million available labeled images, half contain fibers and the other half present areas with no fibers. Figure 2 shows how to obtain these fiber profile images. Additional fiber profile samples are illustrated in Appendix A.
4.2. Thin films (GISAXS)

Grazing Incidence Small Angle X-ray Scattering (GISAXS) is a method for
characterizing the nanostructural features of materials, especially at surfaces
and interfaces, which would otherwise be impossible using standard transmission-
based scattering techniques. As a surface-sensitive tool for simultaneously prob-
ing the electron density of the sample, this imaging modality supports measure-
ments of the size, shape, and spatial organization of nanoscale objects located at
the top of surfaces or embedded in mono- or multi-layered thin film materials.
Individual GISAXS images serve as static snapshots of nanoscale structure,
while successive images provide a means to monitor and probe dynamical pro-
cesses. Although microscopy techniques provide valuable local information on
the structure, GISAXS is the only method to provide statistical information at
the nanometer level (Hexemer and Müller-Buschbaum, 2015). A major bottleneck preventing GISAXS from reaching its full potential has been the availability
of curated data, analysis methods and modeling resources for interpreting the
experimental data (Chourou et al., 2013).
In order to advance GISAXS diffraction image understanding and usability, we have used GISAXS simulation codes to generate more complete catalogs of potential experimental outcomes. Our paper takes data from HipGISAXS,
Figure 3: GISAXS diffraction patterns of crystal lattices: (a-d) a sample of each structure class (cubic, bcc, fcc, hcp), (e-h) images representing the sample average (µ), and (i-l) the standard deviation (σ) of a subset of 1,000 randomly selected samples from each class.
a massively parallel simulator developed in C++, augmented with MPI, NVIDIA CUDA, OpenMP, and parallel-HDF5 libraries, to take advantage of large-scale clusters of multi/many-core and graphics processors. HipGISAXS currently supports Linux, Mac and Windows based systems. It is able to harness computational power from any general-purpose CPU, including state-of-the-art multicores, as well as NVIDIA GPUs and Intel processors, delivering experimental simulations at high resolutions (Hexemer and Müller-Buschbaum, 2015; Hexemer, 2016). Using the HipGISAXS code, beamline scientists can create sample image data with scattering patterns corresponding to four different
Figure 4: Examples of cell images from the CRIC database: abnormal cells (a-d) and normal cells (e-h).
crystal unit cell structures or lattices: Cubic, simple cubic (8 atoms on the corners of a cube); BCC, body-centered cubic (8 on corners, one in the center of the cube); FCC, face-centered cubic (8 on corners, one in the center of each face of the cube); and HCP, hexagonal close-packed (non-cubic, but one of the most commonly occurring lattices).
4.3. Cervical Cells
Cervical cancer is the third most frequent type of tumor in the female population, especially affecting developing and populous countries, such as Brazil (Lowy, 2011), China and India. According to the National Cancer Institute (NCI) (Nci, 2017), Pap tests are an essential mechanism to detect abnormal cervical cells (Lu et al., 2017) as part of regular screenings, which can reduce cervical cancer rates and mortality by 80 percent. However, cervical cancer continues to be the fourth leading cause of cancer deaths in Brazil (Inca, 2016), where most of the female population depends on visually-screened cervical cytology from routine conventional Pap smears. Although more than 80% of exams in the U.S. use the liquid-based Pap test, this protocol is more than 50% more expensive than conventional Pap smears, and remains unavailable for most of the population.
As a viable alternative, we have improved the analysis of Pap smears by using computer vision allied to CNNs, targeting an increase in the number of fields of view and a reduction of false negatives during the inspection of microscopic slides. Sparking more interest in supporting cell analysis using conventional Pap smears, pathologists within our team were granted access to an anonymized image database from the Brazilian Public Health System (of Health, 2016), containing samples from a heterogeneous population across age, race, and socio-economic status. A large portion of these images is available through the Cell Recognition for Inspection of Cervix (CRIC) database, which catalogs numerous cases of cervical cells, classified according to the Bethesda System as atypical squamous cells of high risk and undetermined significance (#ASCH=470 and #ASCUS=116), normal (#Normal=343), low-grade and high-grade squamous intra-epithelial lesions (#LSIL=115 and #HSIL=1,018), and invasive carcinoma (#IC=60). This paper uses a subset of the CRIC database², previously classified by at least three cyto-pathologists, comprising 169 digitized Pap smear glass slides, which results in 3,393 cervical cells with normal or abnormal morphology, including overlapping cells.

Figure 4 displays image samples from the CRIC collection digitized from conventional Pap smears, whose special characteristics include the broad racial diversity that is a trait of the Brazilian population.
4.4. Public images: the Flickr Material Database

The Flickr Material Database (FMD) (Sharan et al., 2014) was designed to facilitate progress in material recognition, and it contains real-world snapshots of ten common material categories: fabric, foliage, glass, leather, metal, paper, plastic, stone, water, and wood, as illustrated in Figure 5. According to Sharan et al. (Sharan et al., 2014), each image in this database (100 images per category) was selected manually from Flickr.com to ensure a variety of illumination conditions, compositions, colors, texture surface shapes, and material sub-types, among other properties.
The intentional diversity of FMD reduces the chances that simple or low-level information descriptors, e.g., color or first-order intensity features, are enough to distinguish material categories. Strategies to construct middle-level features

²Original images will be posted at http://cricdatabase.com.br/ upon paper acceptance.
Figure 5: Examples from the Flickr Material Database, one for each of its classes: (a) Fabric, (b) Foliage, (c) Glass, (d) Leather, (e) Metal, (f) Paper, (g) Plastic, (h) Stone, (i) Water, (j) Wood.
have enabled accuracy improvements in materials recognition problems (Cimpoi et al., 2014), especially when including larger materials databases, such as MINC (Bell et al., 2014). This previous research on FMD description and learning schemes points out limitations of using FMD alone as the training data source. We address some of these gaps using two model training approaches, as discussed in the next section.
5. Methods in pyCBIR
One of the main challenges in image recognition consists in performing tasks that are easy for humans to do intuitively, but hard to describe formally (Goodfellow et al., 2016). This situation happens frequently among domain scientists (Donatelli et al., 2015), who are visually trained to identify complex patterns in their experimental data, although many times they are unable to describe mathematically the primitives that construct the motif. Data-driven algorithms that learn from accumulated experience, such as those in pyCBIR, can support software tools to rank image sets in the face of (a) the difficulties of obtaining the specific knowledge needed for modeling, (b) limitations in learning the intricacies of every science domain, and (c) the restricted generalization of hard-coded knowledge.
This section describes how pyCBIR uses CNNs in order to provide data reduction by automatically learning compact signatures that represent each image. An essential step required to organize the database is to construct models for different science problems in conjunction with algorithms for an enhanced search experience. The next sections explain how we enable CNNs to obtain image characteristics and rank images by similarity. Although we omit results using classic feature extraction methods in pyCBIR, they are also available and include Gray-Level Co-Occurrence Matrix (GLCM), Histogram of Oriented Gradients (HOG), histogram features, Local Binary Pattern (LBP) and Daisy (van der Walt et al., 2014).
5.1. Neural network, CNN and topology
The way in which artificial neurons connect to each other specifies the topology of the neural network (NN), a preponderant attribute of its overall mode of operation and scalability. In supervised learning, the traditional topology is the fully connected, three-layer, feed-forward network; in computer vision problems, this has been mostly replaced by NN layers that explore local connectivity. Independently of the topology, learning involves modifying/updating the weights of the network connections (Miikkulainen, 2010) through the processing of large amounts of data to optimize a specific model. However, different NN topologies determine structural compositions that define the learning process within a NN architecture, e.g. with backpropagation, to enhance feature selection, recurrent memory, abstraction, or generalization.
Among the several NN design options, this paper explores two different CNN architectures: (a) LeNet (Lecun et al., 1998), a neuronal arrangement that switches from fully connected to sparsely connected neurons, allowing real-time feedback for most applications, particularly when considering graphics card units for computation. Due to its simplicity, training often performs well with a smaller number of examples in comparison with deeper NNs; however, it is mostly inaccurate when dealing with complex recognition problems; and (b) Inception-ResNet-v2 (Szegedy et al., 2016), a deeper and wider architecture formed by multiple sub-networks, in which hierarchical layers promote the many levels of non-linearity needed for more elaborate pattern classification. This model requires roughly twice as much memory and computation as the previous version (Inception v3 (Szegedy et al., 2015)), but it has proven more accurate than previous state-of-the-art models, particularly on tasks such as the Top-1 and Top-5 recommendations using the ILSVRC2012 benchmark.
5.2. Query with signatures from CNN layer
The goal of reverse image search engines is to enable high-level queries using pictures instead of, or in addition to, metadata, keywords or watermarks as the main mechanism to retrieve relevant samples that match the query. As in any data-driven application, the image database quality highly influences the retrieved results, but several other components play a major role, such as the image properties represented as feature vectors, or signatures, and the respective similarity metrics.
Typically, a CNN forms a hierarchical feature extractor that maps the input image into increasingly refined features, which serve as input to a fully connected layer that solves the classification. Figure 2 illustrates the alternating convolutional and pooling layers, which transform and reduce the input data before it reaches deeper stages such as the fully connected layer. Notice that we bypass the classification layer to use the features as signatures to drive the retrieval process, so pyCBIR can search and match the current image-query to the most similar samples in the database using "machine-designed" features.
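The feature-extraction idea above can be illustrated with a toy NumPy sketch (not pyCBIR's actual TensorFlow code; the image size, kernel values and number of feature maps are made up for illustration): a convolution, a ReLU, and a pooling stage map an image into a flat vector, bypassing any classification layer, and that vector plays the role of the retrieval signature.

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Single-channel 'valid' cross-correlation with a 2D kernel."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; trims edges that do not fit."""
    h, w = (fmap.shape[0] // size) * size, (fmap.shape[1] // size) * size
    fmap = fmap[:h, :w]
    return fmap.reshape(h // size, size, w // size, size).max(axis=(1, 3))

def signature(img, kernels):
    """Conv -> ReLU -> pool for each kernel, then flatten into one vector:
    the 'machine-designed' signature used for retrieval."""
    maps = [max_pool(np.maximum(conv2d_valid(img, k), 0.0)) for k in kernels]
    return np.concatenate([m.ravel() for m in maps])

rng = np.random.default_rng(0)
image = rng.random((16, 16))
kernels = [rng.standard_normal((3, 3)) for _ in range(4)]
sig = signature(image, kernels)
print(sig.shape)  # 4 feature maps of 7x7 -> a 196-dimensional signature
```

A real CNN stacks many such stages and learns the kernels by backpropagation; the sketch only shows why the output of an intermediate layer is a fixed-length vector suitable for similarity search.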
With the purpose of suggesting a category based on a pre-defined taxonomy, we associate Y classes with an image database X, reduced into signatures, consisting of n images x_i, for 1 ≤ i ≤ n. Searches occur in an h-dimensional space, obtained by transforming each x_i throughout the CNN. Alternatively, principal component analysis (PCA) can be used to reduce the signature dimensionality and improve the computational cost, while preserving the retrieval quality.

The pyCBIR retrieval module recognizes similar samples through a similarity function S, so that S(x_i, x_q) returns relevant items as well as the respective uncertainty value. In other words, the engine returns the top-k most similar images and their respective labels y_j for a query-image x_q ∈ R^h, with S(x_i, x_q) defined as the cosine similarity

    S(x_i, x_q) = (x_i · x_q) / (||x_i|| ||x_q||),    (1)

where · is the dot product.
Although we report our results using the cosine similarity metric, other distance metrics are available through the pyCBIR graphical user interface, including the Euclidean distance, infinity distance, Pearson correlation, Chi-square dissimilarity, Kullback-Leibler divergence, Jeffrey divergence, Kolmogorov-Smirnov divergence, Cramer divergence, and Earth mover's distance (Jones et al., 2001).
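Ranking by the cosine similarity of Eq. (1) can be sketched in a few lines of NumPy (the database and query values below are made up for illustration; pyCBIR's internals may differ):

```python
import numpy as np

def cosine_rank(X, x_q, k=3):
    """Rank database signatures X (n x h) against a query x_q using
    S(x_i, x_q) = (x_i . x_q) / (||x_i|| * ||x_q||); higher is more similar."""
    scores = (X @ x_q) / (np.linalg.norm(X, axis=1) * np.linalg.norm(x_q))
    order = np.argsort(-scores)      # indices sorted by descending similarity
    return order[:k], scores[order[:k]]

# Three toy 2-D signatures; the first two point roughly along the query.
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
idx, s = cosine_rank(X, np.array([1.0, 0.0]), k=2)
print(idx, s)  # the two signatures closest in angle to the query come first
```

Note that the cosine score depends only on the angle between signatures, so it is insensitive to the overall magnitude of the feature vectors.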
5.3. Indexing and searching methods
Quick feedback when searching images by similarity is essential, but a linear search through a database with millions of items may lead to unacceptable waiting times. Therefore, after computing the image signatures, we map them to a lower-dimensional space using PCA, and use the most significant components as input to an indexing algorithm.
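The PCA projection step can be sketched with an SVD-based implementation in NumPy (a minimal illustration, assuming signatures are rows of a matrix; the dimensions below are arbitrary and this is not pyCBIR's actual code):

```python
import numpy as np

def pca_reduce(signatures, n_components):
    """Project signatures (n x h) onto their top principal components."""
    mu = signatures.mean(axis=0)
    Xc = signatures - mu
    # SVD of the centered data: rows of Vt are the principal axes,
    # ordered by decreasing explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

rng = np.random.default_rng(1)
sigs = rng.random((100, 64))             # 100 signatures of dimension 64
reduced = pca_reduce(sigs, n_components=8)
print(reduced.shape)                     # (100, 8) -> input to the indexer
```

Because the data are centered before projection, each reduced component has zero mean, and keeping only the leading components shrinks both the index size and the per-query distance cost.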
We propose an indexing routine that employs Locality Sensitive Hashing Forest (LSH Forest) (Bawa et al., 2005), whose polynomial cost and sub-linear query time considerably speed up the information retrieval. This algorithm delivers efficient approximate nearest-neighbor queries by improving the original LSH (Indyk and Motwani, 1998) scheme, which otherwise would require tuning parameters, such as the number of features and the distance radius, as a function of the data domain. Moreover, it enables fast signature insertion and deletion while ensuring minimal use of storage.
By using LSH, our routine selects a small set of potential images to be compared against the image-query; this happens because similar images, according to some metric, e.g., cosine similarity, are more likely to hash to the same bucket. We used random projection as the hash function to approximate the cosine distance between vectors, so that pyCBIR can translate CNN-calculated signatures into 32-bit fixed-length hash values. In addition, we store pre-computed versions of the LSH on disk for future analogous search requests using the same trained model.
Algorithm 1 shows the steps of the LSH-based indexing scheme before returning the result R of ranked outputs, where |·| is the length of a set, s is an image signature within the set S, and s' is the transformed (hashed) signature. The LSH-based indexing function retrieves the approximate nearest neighboring items from the hash table, and its output is the k most similar images to each image in the query set Q.
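The random-projection hashing described above can be illustrated with a single-table toy in NumPy (a simplification of the LSH Forest actually used: one hash table, no tree structure; the class name and bucket layout are our own): each of 32 random hyperplanes contributes one sign bit, giving the 32-bit fixed-length hash mentioned in the text, and vectors with a small cosine angle tend to land in the same bucket.

```python
import numpy as np

class CosineLSH:
    """Toy random-projection LSH for the cosine distance."""
    def __init__(self, dim, n_bits=32, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((n_bits, dim))
        self.buckets = {}

    def _hash(self, x):
        # One sign bit per hyperplane, packed into a 32-bit integer.
        bits = (self.planes @ x) >= 0
        return int(np.packbits(bits).view(np.uint32)[0])

    def add(self, key, x):
        self.buckets.setdefault(self._hash(x), []).append(key)

    def query(self, x):
        """Candidate keys whose signatures likely share a small angle with x."""
        return self.buckets.get(self._hash(x), [])

lsh = CosineLSH(dim=16)
rng = np.random.default_rng(2)
v = rng.standard_normal(16)
lsh.add("img_0", v)
lsh.add("img_1", v + 0.01 * rng.standard_normal(16))  # a near-duplicate
print(lsh.query(v))
```

An exact-duplicate query always hashes to the same bucket; a near-duplicate usually does, with a small chance of a sign flip near a hyperplane, which is why production schemes such as LSH Forest use multiple trees/tables to recover missed neighbors.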
5.4. Database augmentation
CNN-based recognition systems often require a large number of examples in order to fine-tune models during the training stage and deliver accurate classification results. A few strategies are commonly devised to deal with relatively small datasets, for example, modifications of the original observations to generate new ones, following expected distortions given certain degrees of freedom. In this context, data augmentation consists in applying mathematical transformations to typical samples in order to generate new images that are slightly different, yet similar enough that they belong to the same class. The most common image transformations are scaling, translation, rotation, noise addition and blurring.
We include augmentation as an extra processing step in the CNN training for the cell and FMD databases. Both databases presented a limited number of samples, here fewer than 2,000 per class. Therefore, we performed 12 translations (three magnitudes in each direction: cells 7, 14 and 20 pixels; FMD 8, 16 and 24 pixels) and 3 rotations (every 90°) on each image, augmenting the dataset by a factor of 51.
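A minimal NumPy sketch of this augmentation follows. Our reading of the factor 51 is that the original plus its 12 translations, each kept in 4 rotation states (0°, 90°, 180°, 270°), yields 13 × 4 = 52 versions, i.e., 51 new images per original; the use of np.roll (wrap-around shifting) is a simplification of whatever padding/cropping the actual pipeline applies.

```python
import numpy as np

def augment(img, shifts=(7, 14, 20)):
    """Original + 12 translations (3 magnitudes x 4 directions), each in
    4 rotation states (0/90/180/270 degrees) -> 52 versions, 51 new."""
    translated = [img]
    for s in shifts:
        for axis, sign in ((0, 1), (0, -1), (1, 1), (1, -1)):
            # Wrap-around shift; a real pipeline would pad or crop instead.
            translated.append(np.roll(img, sign * s, axis=axis))
    out = []
    for t in translated:
        for rot in range(4):                  # 0, 90, 180, 270 degrees
            out.append(np.rot90(t, rot))
    return out

variants = augment(np.arange(64 * 64, dtype=float).reshape(64, 64))
print(len(variants))  # 52 versions per input image
```

Since every transformation preserves the image content up to a rigid motion, all 52 versions inherit the class label of the original sample.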
Algorithm 1: LSH-based indexing.
Input:
  - S: set of database image signatures.
  - Q: set of signatures from query-images.
  - k: number of image matches.
Output:
  - R: ranked output (top-k matches).
if ∃ LSH previously computed then
    read the LSH from file;
else
    for s ∈ S do
        s' = Random_projection_hash(s);
        LSH.add(s');
    save the LSH to file;
create R with |Q| lines and k columns;
for q ∈ Q do
    R[q, 1..k] = LSH.similarity(q, k);
return R;
5.5. Evaluation metrics
We used the Mean Average Precision (MAP) (Wang et al., 2015) metric to evaluate the quality of the retrieved images. To compute MAP, the Average Precision score AP(Q) is defined for each query image Q as:

    AP(Q) = (1/N) Σ_{n=1}^{M} P(n) f(n),    (2)

where P(n) is the precision at cut-off n in the rank, f(n) is equal to 1 if the image at rank n belongs to the same class as the query, and 0 otherwise. M is the number of images in the rank and N is the number of images of the same class as the query. The MAP score is obtained by averaging the AP score over all query images. The higher the MAP score, the better the retrieval.
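The metric can be sketched directly from its definition (a minimal illustration; here we normalize by the number of relevant items that actually appear in the ranked list, a simplification of N above, and the example rankings are made up):

```python
def average_precision(relevant_flags):
    """AP for one ranked list: relevant_flags[n-1] is 1 if the item at
    rank n belongs to the query's class (f(n) in the text), else 0."""
    hits, score = 0, 0.0
    for n, f in enumerate(relevant_flags, start=1):
        if f:
            hits += 1
            score += hits / n          # P(n) * f(n): precision at cut-off n
    return score / hits if hits else 0.0

def mean_average_precision(ranked_lists):
    """Average the AP score over all queries."""
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)

# Two toy queries: a perfect top-2 ranking, and one with a miss at rank 2.
print(mean_average_precision([[1, 1, 0], [1, 0, 1]]))
```

The second query pays for the irrelevant item at rank 2, since the precision at the second hit drops to 2/3, which is why MAP rewards rankings that place relevant items early.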
We also computed the classification accuracy rate for each class in the databases. This accuracy was calculated using the k-nearest neighbor classifier (Altman, 1992) for different values of k.
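One plausible reading of this k-accuracy computation, with a majority vote over the labels of the top-k retrieved images; the exact voting rule is our assumption.

```python
from collections import Counter

def k_accuracy(rankings, query_labels, k):
    """Fraction of queries whose majority label among the top-k
    retrieved images matches the query's own class."""
    correct = 0
    for ranked, true_label in zip(rankings, query_labels):
        # Majority vote among the k nearest retrieved labels.
        majority = Counter(ranked[:k]).most_common(1)[0][0]
        correct += (majority == true_label)
    return correct / len(query_labels)
```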
6. Experimental results
This section describes how to run pyCBIR using diﬀerent datasets as well as
how to compare the output from diﬀerent network topologies when carrying out
reverse image retrieval experiments. The results refer to the databases described
in Section 4 and the algorithms discussed in Section 5.
6.1. CNN training
When deriving PCA-based signatures from LeNet or Inception-ResNet-v2
outputs, both schemes require only two parameters: the initial learning rate
and the decay factor. We set the initial learning rate as 0.1 and the decay factor
as 0.04 in the LeNet. The ﬁne-tuning operation requires an initial learning rate
smaller than the one used to train the CNN with random initialization. Here,
we report experiments setting this parameter to 0.008 and the decay factor as
0.0004. Experiments showed that slightly different values affect the training time, but may lead to similar classification accuracy results.
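As a sketch of how these two parameters interact, assuming an exponential schedule of the kind provided by TensorFlow's exponential-decay helper; the schedule form is our assumption, since the paper specifies only the two values.

```python
def decayed_learning_rate(initial_rate, decay, step, decay_steps=1):
    """Exponential decay: the rate shrinks by the decay factor every
    decay_steps training steps. The paper reports initial_rate=0.1,
    decay=0.04 for LeNet trained from scratch, and initial_rate=0.008,
    decay=0.0004 for Inception-ResNet-v2 fine-tuning."""
    return initial_rate * decay ** (step / decay_steps)
```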
The number of epochs varies across databases due to the image inter-class variation, the number of classes, and the image size, and is set according to the loss function: when the loss remains constant between epochs and is lower than 0.1, the CNN training automatically stops.
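The stopping rule can be sketched as follows; the tolerance used to decide that the loss "remains constant" is our assumption.

```python
def should_stop(loss_history, threshold=0.1, tol=1e-4):
    """True when the last loss is below `threshold` and the change
    from the previous epoch is within `tol` (i.e. the loss has
    effectively stopped moving)."""
    if len(loss_history) < 2:
        return False
    return (loss_history[-1] < threshold
            and abs(loss_history[-1] - loss_history[-2]) < tol)
```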
Table 2 shows the number of epochs and processing time to train the LeNet and the longer fine-tuning step needed by the Inception-ResNet-v2.
Table 2: Number of epochs and processing time to train both LeNet and Inception-ResNet-v2
neural networks using diﬀerent image databases.
Database   Epochs   Processing Time
LeNet
  Fibers        5         9 min
  GISAXS        5        45 min
  Cells         6        70 min
  FMD          20        42 min
Inception-ResNet-v2
  Fibers       21       336 min
  GISAXS       13       273 min
  Cells        17       289 min
  FMD          51       153 min
6.2. Performance evaluation
In order to evaluate the Fibers and GISAXS databases, we used 10% of the images from both databases for LeNet training, Inception-ResNet-v2 fine-tuning and eigenvalue estimation for the PCA. We used the other 90% of both databases
to calculate the MAP and k-accuracy. For the Cells and FMD databases, we
used 50% of the image sets for the LeNet training, Inception-ResNet-v2 ﬁne-
tuning and PCA transformation, and the other half to calculate the MAP and
k-accuracy measures. Such diﬀerent splits were necessary given the sizes of
these databases, for example, the cells and FMD databases have a few thousand
samples, which is a limited amount of data for the CNN training. Furthermore,
we augmented both training subsets using aﬃne image transformations as a
preprocessing step. To illustrate how this step impacts our pipeline, we performed experiments with and without data augmentation for the cell database. The MAP value without augmentation was 0.85, as opposed to 0.94 with the augmentation (preprocessing) step; this increase indicates that the CNN converged to a model that better classifies the presented samples.
Figure 6 shows the MAP results for all databases in comparison with the
number of PCA components, when considering the LeNet and Inception-ResNet-
v2 pre-trained and after ﬁne-tuning. The curves show that it is possible to
reduce the number of features in the retrieval process by using 16 or less com-
ponents. We notice that this reduction improves the indexing and searching
procedures when applied to the four diﬀerent image sets.
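The reduction step might look like the following scikit-learn sketch (scikit-learn is among pyCBIR's dependencies; the function name is ours):

```python
import numpy as np
from sklearn.decomposition import PCA

def reduce_signatures(signatures, n_components=16):
    """Fit PCA on the CNN signatures and keep only the first
    n_components before indexing and searching."""
    pca = PCA(n_components=n_components)
    reduced = pca.fit_transform(signatures)
    return reduced, pca
```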
Surprisingly, the LeNet achieved better results than the Inception-ResNet-v2 on the Fibers, GISAXS and Cells databases. One reason this CNN outperformed the Inception-ResNet-v2 for these databases is the size of the images: the Inception-ResNet-v2 expects an input image of 299x299, so the current pipeline resizes the images from 100x100 (Cells and GISAXS) and 16x16 (Fibers) to that representation, an operation that might distort important aspects of the data. We also tested zero-padding operations instead of resizing, but the accuracy remained the same. The lower MAP and k-accuracy values obtained with the pre-trained Inception-ResNet-v2 are most likely due to the learned model, which depended on a broad object database (ILSVRC2012) that poorly correlates with our image databases.
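The two input-preparation strategies can be sketched in NumPy for a single-channel image; nearest-neighbor resizing stands in here for the interpolation actually used by the pipeline.

```python
import numpy as np

def to_network_input(image, size=299):
    """Fit a small single-channel image into the 299x299 input of
    Inception-ResNet-v2: nearest-neighbor resizing, plus the
    zero-padding alternative that was also tested."""
    h, w = image.shape
    rows = (np.arange(size) * h) // size     # nearest-neighbor row indices
    cols = (np.arange(size) * w) // size
    resized = image[np.ix_(rows, cols)]
    padded = np.zeros((size, size), dtype=image.dtype)
    top, left = (size - h) // 2, (size - w) // 2
    padded[top:top + h, left:left + w] = image    # center the original
    return resized, padded
```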
Regarding the FMD images, the Inception-ResNet-v2 outperformed the LeNet, which might be explained by the similarity between the FMD and ILSVRC2012 databases. Recall that the FMD samples contain complex patterns resembling ILSVRC2012 data points. Fine-tuning the Inception-ResNet-v2 further improves the results relative to the pre-trained model, thanks to tuned layers customized to extract features from the FMD data.
We also computed the classification k-accuracy for each class of all databases, where k ∈ {1, 5, 10, 20, Ω} and Ω is the total number of images in a particular class. Tables 3, 4, 5 and 6 confirm the MAP results presented in Figure 6. Based
on these results, when there are more than a couple of thousand images and
a few classes to train for, we observed that the LeNet outperforms the deeper
network for our databases. In contrast, when there are millions of images to
ﬁne-tune the network, the Inception-ResNet-v2 performed better for a single
database, FMD, characterized by several classes and larger images. When no
classes/labels are available, the pre-trained Inception-ResNet-v2 is a promising
starting point for tasks such as image sorting.
Figure 6: MAP values in relation to the number of PCA components for the LeNet and the Inception-ResNet-v2. (a) Fibers database. (b) GISAXS database. (c) Cells database. (d) FMD.
6.3. Time during image-query-based retrieval
In addition to accuracy, the computational cost to search images given a query is another measure of the value of recommendation systems. Table 7 shows the computational time to retrieve k images (k equal to the database size) given a query. We used the value k because it is the worst case of image searching
Table 3: Accuracy rate of the LeNet and the Inception-ResNet-v2 for the Fibers database
using different k values, where Ω is the number of images in the class.
              k      1      5     10     20      Ω
LeNet
  No-Fibers       0.973  0.977  0.979  0.978  0.963
  Fibers          0.975  0.988  0.991  0.990  0.996
Inception-ResNet-v2 fine-tuned
  No-Fibers       0.781  0.777  0.797  0.746  0.667
  Fibers          0.925  0.978  0.980  0.983  0.975
Inception-ResNet-v2 pre-trained
  No-Fibers       0.727  0.743  0.736  0.743  0.167
  Fibers          0.825  0.890  0.927  0.933  1.000
Table 4: Accuracy rate of the LeNet and the Inception-ResNet-v2 for the GISAXS database
using different k values, where Ω is the number of images in the class.
              k      1      5     10     20      Ω
LeNet
  bcc             1.000  1.000  1.000  1.000  1.000
  fcc             1.000  1.000  1.000  1.000  0.999
  cubic           1.000  1.000  1.000  1.000  1.000
  hpc             1.000  1.000  1.000  1.000  1.000
Inception-ResNet-v2 fine-tuned
  bcc             1.000  0.996  0.990  0.980  0.974
  fcc             0.998  1.000  1.000  1.000  0.998
  cubic           1.000  0.998  1.000  0.998  0.972
  hpc             1.000  1.000  1.000  1.000  0.996
Inception-ResNet-v2 pre-trained
  bcc             0.980  0.975  0.972  0.959  0.599
  fcc             1.000  1.000  1.000  1.000  1.000
  cubic           0.993  0.995  0.995  0.991  0.993
  hpc             0.995  0.996  0.997  0.995  0.698
Table 5: Accuracy rate of the LeNet and the Inception-ResNet-v2 for the Cells database using
different k values, where Ω is the number of images in the class.
              k      1      5     10     20      Ω
LeNet
  Normal          0.962  0.970  0.975  0.977  0.969
  Abnormal        0.969  0.979  0.983  0.981  0.984
Inception-ResNet-v2 fine-tuned
  Normal          0.938  0.932  0.935  0.924  0.810
  Abnormal        0.971  0.985  0.991  0.986  0.984
Inception-ResNet-v2 pre-trained
  Normal          0.846  0.861  0.857  0.851  0.728
  Abnormal        0.893  0.929  0.947  0.952  0.957
(sort all images of the database given a query). We also computed the time for the 1st execution (before LSH creation) and the 2nd execution (after LSH creation).
Although the LSH-based module computes 32-bit hashes, the computational cost also increases dramatically with the dimension of the input vector; therefore we use only the most significant principal components. Our motivations
to keep only the 16 most signiﬁcant PCA components are: (a) this subset of
components explains more than 98% of the data for each scientiﬁc domain
under investigation; (b) our results showed that the Mean Average Precision
remains constant around 16 components, as illustrated in Figure 6; (c) this rep-
resentation is more than 3 times faster than using the raw signatures with the
Inception-ResNet-v2, as Table 7 illustrates.
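Criterion (a) can be checked with a short sketch based on the singular values of the centered signature matrix (names are ours):

```python
import numpy as np

def components_for_variance(signatures, target=0.98):
    """Smallest number of principal components whose cumulative
    explained-variance ratio reaches `target`; the paper reports that
    16 components explain more than 98% of each scientific dataset."""
    centered = signatures - signatures.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    cumulative = np.cumsum(s ** 2) / np.sum(s ** 2)
    # First index where the cumulative ratio reaches the target.
    return int(np.searchsorted(cumulative, target) + 1)
```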
All computational experiments involving pyCBIR ran on a Deep Learning Ma-
chine (DevBox) with six cores of Intel Xeon E5-2643 @ 3.40 GHz, four graphics
processors GeForce GTX Titan-X and 251 GB of memory. We have also been able to run pyCBIR on standard laptops, but at a much higher computing time and restricted to smaller subsets. Software dependencies include Ubuntu 14.04, CUDA 7.5, cuDNN v4, Google TensorFlow v0.11.0 and Python 3.5.2 through Anaconda 4.1.1. Also, pyCBIR relies on an assortment of packages
Table 6: Accuracy rate of the LeNet and the Inception-ResNet-v2 for the FMD using diﬀerent
k values, where Ω is the number of images in the class.
              k      1      5     10     20      Ω
LeNet
  Fabric          0.140  0.200  0.200  0.280  0.320
  Foliage         0.240  0.360  0.440  0.440  0.360
  Glass           0.120  0.200  0.260  0.140  0.040
  Leather         0.140  0.300  0.200  0.080  0.100
  Metal           0.360  0.540  0.540  0.480  0.520
  Paper           0.200  0.280  0.240  0.340  0.260
  Plastic         0.140  0.160  0.220  0.300  0.280
  Stone           0.200  0.340  0.320  0.280  0.340
  Water           0.220  0.340  0.320  0.280  0.280
  Wood            0.500  0.700  0.760  0.700  0.700
Inception-ResNet-v2 fine-tuned
  Fabric          0.80   0.88   0.88   0.90   0.88
  Foliage         0.96   0.90   0.90   0.92   0.92
  Glass           0.82   0.86   0.82   0.78   0.80
  Leather         0.90   0.86   0.82   0.82   0.82
  Metal           0.70   0.76   0.70   0.70   0.66
  Paper           0.86   0.92   0.92   0.94   0.92
  Plastic         0.66   0.60   0.58   0.54   0.50
  Stone           0.78   0.80   0.80   0.76   0.74
  Water           0.94   0.94   0.94   0.94   0.94
  Wood            0.82   0.86   0.90   0.94   0.92
Inception-ResNet-v2 pre-trained
  Fabric          0.680  0.800  0.820  0.760  0.740
  Foliage         0.760  0.760  0.740  0.800  0.700
  Glass           0.840  0.900  0.900  0.860  0.900
  Leather         0.520  0.680  0.620  0.600  0.460
  Metal           0.920  0.900  0.860  0.860  0.820
  Paper           0.520  0.620  0.600  0.700  0.740
  Plastic         0.620  0.780  0.840  0.860  0.820
  Stone           0.660  0.740  0.600  0.560  0.620
  Water           0.780  0.800  0.800  0.700  0.560
  Wood            0.860  0.880  0.880  0.840  0.820
Table 7: Time in seconds to retrieve all database images given a query by using the LeNet
and the Inception-ResNet-v2. In the PCA results we used 16 components.
                          Fibers   GISAXS   Cells   FMD
LeNet                1st   12.08   61.42*   0.020   0.015
                     2nd   10.35   61.75*   0.017   0.008
Inception-ResNet-v2  1st   19.41*  75.11*   0.130   0.028
                     2nd   19.63*  75.17*   0.032   0.009
Using PCA            1st   10.27   22.29    0.017   0.009
                     2nd    9.28   19.75    0.014   0.005
* brute-force search used; 1st means the first execution and 2nd means the second execution and beyond.
within the Python ecosystem, such as: numpy, scipy, scikit-learn, scikit-image,
PyQt5 and matplotlib.
7. Conclusions and future works
Being able to detect materials properties in real time will add an entirely
new level of experimental capability, including triage, quality control and prior-
itization. Tying this capability to the control systems at imaging instruments,
such as at synchrotron beamlines, promises to enable scientists to automatically
steer the machine in response to speciﬁc structures present in the sample with
minimum human interference.
Visual exploration of image microstructures drives many tasks performed
by cyto-pathologists and material scientists, who are able to manually curate
just a small portion of the collected experimental data. This paper showed how
our new data-driven recommendation system leveraged such curated datasets to
provide functions to automatically organize catalogs of scientiﬁc images. Taking
into account that human curated data is often limited, we also designed image
augmentation routines that allow increasing the number of samples following
typical image transformations. Next, the underlying inferential engine ranks
images using CNN for multiple data representations, and allows fast retrieval of
the top matches within each particular image set. The algorithms behind our
software tool pyCBIR enable optimization of key parameters to control perfor-
mance, such as the number of CNN layers, CNN epochs, and image sizes.
Our results showed the importance of feature reduction in the searching process, and indicate a promising direction for improving the system. Current limitations are the restriction of the LSH Forest option to retrieval experiments using the Cell and FMD databases, with brute force used otherwise. Future work will include more scalable hashing mechanisms that circumvent the maximum size of the search tree available in scikit-learn. Another challenge will be to expand the automated data curation capability to also extrapolate metadata to unseen samples using visual attributes combined with natural language processing.
Acknowledgments

This work was supported by the Office of Science of the U.S. Department of Energy (DOE) under Contract No. DE-AC02-05CH11231, the Moore-Sloan
Foundation, CNPq (304673/2011-0, 472565/2011-7, 401120/2013-9, 401442/2014-
4,444784/2014-4, 306600/2016-1) and Fapemig (APQ-00802-11). This work
is partially supported by the DOE Advanced Scientiﬁc Computing Research
(ASCR) Early Career Award, and partially supported by the Center for Ap-
plied Mathematics for Energy Research Applications (CAMERA), which is a
partnership between Basic Energy Sciences (BES) and ASCR within DOE. We
are especially grateful to Fernando Perez for incentivizing exploration of Python packages and deployment of open-source code. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of DOE or the University of California.
References

Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado,
G. S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A.,
Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Leven-
berg, J., Man´e, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M.,
Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V.,
Vasudevan, V., Vi´egas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke,
M., Yu, Y., Zheng, X., 2015. TensorFlow: Large-scale machine learning on
heterogeneous systems. Software available from tensorﬂow.org.
Alegro, M., Amaro-Jr, E., Loring, B., Heinsen, H., Alho, E., Zollei, L., Ushizima,
D., Grinberg, L. T., 2016. Multimodal whole brain registration: Mri and high
resolution histology. In: Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition Workshops. pp. 194–202.
Altman, N. S., 1992. An introduction to kernel and nearest-neighbor nonpara-
metric regression. The American Statistician 46 (3), 175–185.
Bale, H. A., Haboub, A., et al., 2012. Real-time quantitative imaging of failure
events in materials under load at temperatures above 1,600C. Nat Mater 12,
Bawa, M., Condie, T., Ganesan, P., 2005. LSH forest: Self-tuning indexes for
similarity search. In: Fourteenth International World Wide Web Conference
Bell, S., Upchurch, P., Snavely, N., Bala, K., 2014. Material recognition in the
wild with the materials in context database. CoRR abs/1412.0623.
Bethel, W., Greenwald, M., Nowell, L., 2015. Management, visualization, and
analysis of experimental and observational data (EOD) - the convergence of
data and computing. In: DOE ASCR Workshop. DOE, pp. 2–30.
Chourou, S., Sarje, A., Li, X., Chan, E., Hexemer, A., 2013. Hipgisaxs: A high
performance computing code for simulating grazing incidence x-ray scattering
data. Journal of Applied Crystallography (6), 1781–1795.
Cimpoi, M., Maji, S., Kokkinos, I., Mohamed, S., Vedaldi, A., 2014. Describing
textures in the wild. In: Proceedings of the 2014 IEEE Conference on Com-
puter Vision and Pattern Recognition. CVPR ’14. IEEE Computer Society,
Washington, DC, USA, pp. 3606–3613.
Donatelli, J., Haranczyk, M., Hexemer, A., Krishnan, H., Li, X., Lin, L., Maia,
F., Marchesini, S., Parkinson, D., Perciano, T., Shapiro, D., Ushizima, D.,
Yang, C., Sethian, J., 2015. Camera: The center for advanced mathematics
for energy research applications. Synchrotron Radiation News 28 (2), 4–9.
Eckstein, M. P., 2011. Visual search: A retrospective. Journal of Vision 11 (5),
Evans, D., 2016. The internet of things: how the next evolution of the internet
is changing everything. http://www.cisco.com/c/dam/en_us/about/ac79/
docs/innov/IoT_IBSG_0411FINAL.pdf, accessed on Dec 13, 2016.
GE Digital, 2016. Predix: the industrial internet platform. predix-platform-brief-ge-digital.pdf, accessed on Dec 13, 2016.
Goodfellow, I., Bengio, Y., Courville, A., 2016. Deep Learning. MIT Press.
Guo, Y., Liu, Y., Oerlemans, A., Lao, S., Wu, S., Lew, M. S., 2016. Deep
learning for visual understanding: A review. Neurocomputing 187, 27–48.
Hexemer, A., 2016. HipGISAXS: A massively-parallel high-performance x-ray
scattering data analysis code. "http://www.camera.lbl.gov/gisaxs", ac-
cessed on Dec 13, 2016.
Hexemer, A., M¨uller-Buschbaum, P., 2015. Advanced grazing-incidence tech-
niques for modern soft-matter materials analysis. IUCrJ 2 (1), 106–125.
Hirata, K., Kato, T., 1992. Query by visual example - content based image
retrieval. In: Proceedings of the 3rd International Conference on Extend-
ing Database Technology: Advances in Database Technology. EDBT ’92.
Springer-Verlag, London, UK, pp. 56–71.
Hong, C., Yu, J., Wan, J., Tao, D., Wang, M., Dec 2015. Multimodal deep
autoencoder for human pose recovery. IEEE Trans. Image Process. 24 (12),
Inca, 2016. Instituto Nacional de Cancer. http://www2.inca.gov.br/wps/
wcm/connect/tiposdecancer/site/home/colo_utero visited on 2017-11-
Indyk, P., Motwani, R., 1998. Approximate nearest neighbors: Towards remov-
ing the curse of dimensionality. In: Proceedings of the Thirtieth Annual ACM
Symposium on Theory of Computing. STOC ’98. ACM, New York, NY, USA,
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadar-
rama, S., Darrell, T., 2014. Caﬀe: Convolutional architecture for fast feature
embedding. arXiv preprint arXiv:1408.5093.
Jones, E., Oliphant, T., Peterson, P., et al., 2001. SciPy: Open source scientiﬁc
tools for Python. http://www.scipy.org/, accessed on Dec 13, 2016.
JR Raphael, 2015. How google photos new custom-labeling feature can
help clean up your collection. https://www.computerworld.com/article/
2988232/android/google-photos-custom-labeling.html, accessed on
Oct 15, 2017.
Kang, Z., Peng, C., Cheng, Q., 2016. Top-n recommender system via matrix
completion. In: Proceedings of the Thirtieth AAAI Conference on Artiﬁcial
Kato, T., 1992. Database architecture for content-based image retrieval. In:
Proc. of SPIE Image Storage and Retrieval Systems. Vol. 1662. San Jose,
CA, USA, pp. 112–123.
Lecun, Y., Bottou, L., Bengio, Y., Haﬀner, P., 1998. Gradient-based learning
applied to document recognition. In: Proceedings of the IEEE. pp. 2278–2324.
Liu, C., Sharan, L., Adelson, E. H., Rosenholtz, R., 2010. Exploring features in
a bayesian framework for material recognition. In: CVPR. IEEE Computer
Society, pp. 239–246.
Lowy, I., 2011. A Woman’s Disease: The history of cervical cancer. Oxford.
Lu, Z., Carneiro, G., Bradley, A. P., Ushizima, D., Nosrati, M. S., Bianchi,
A. G. C., Carneiro, C. M., Hamarneh, G., March 2017. Evaluation of three
algorithms for the segmentation of overlapping cervical cells. IEEE Journal
of Biomedical and Health Informatics 21 (2), 441–450.
Miikkulainen, R., 2010. Topology of a Neural Network. Springer US, Boston,
MA, pp. 988–989.
Nci, 2017. National Cancer Institute. http://www.cancer.gov/types/
cervical/hp/cervical-screening-pdq visited on 2017-11-17.
Ministry of Health, 2016. Brazilian Unified Health System. http://portalsaude.
saude.gov.br/index.php/cidadao/principal/english, accessed on Dec
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang,
Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A. C., Fei-Fei, L., 2015.
ImageNet Large Scale Visual Recognition Challenge. International Journal of
Computer Vision (IJCV) 115 (3), 211–252.
Shamoi, P., Inoue, A., Kawanaka, H., 2015. Deep color semantics for e-commerce
content-based image retrieval. In: Conf. Fuzzy Logic in Artiﬁcial Intelligence.
Sharan, L., Rosenholtz, R., Adelson, E. H., 2014. Accuracy and speed of material
categorization in real-world images. Journal of Vision 14 (9), 12.
Szegedy, C., Ioﬀe, S., Vanhoucke, V., 2016. Inception-v4, inception-resnet and
the impact of residual connections on learning. Computing Research Repository.
Szegedy, C., Vanhoucke, V., Ioﬀe, S., Shlens, J., Wojna, Z., 2015. Rethinking
the inception architecture for computer vision. Computer Vision and Pattern Recognition.
Ushizima, D., Perciano, T., Krishnan, H., Loring, B., Bale, H., Parkinson, D.,
Sethian, J., Oct. 2014. Structure recognition from high resolution images of
ceramic composites. IEEE International Conference on Big Data.
Ushizima, D. M., Bale, H. A., Bethel, E. W., Ercius, P., Helms, B. A., Krish-
nan, H., Grinberg, L. T., Haranczyk, M., Macdowell, A. A., Odziomek, K.,
Parkinson, D. Y., Perciano, T., Ritchie, R. O., Yang, C., Sep 2016. Ideal:
Images across domains, experiments, algorithms and learning. The Journal of
The Minerals, Metals & Materials Society, 1–10.
van den Broek, E. L., van Rikxoort, E. M., Schouten, T. E., 2005. Human-
Centered Object-Based Image Retrieval. Springer Berlin Heidelberg, Berlin,
Heidelberg, pp. 492–501.
van der Walt, S., Sch¨onberger, J. L., Nunez-Iglesias, J., Boulogne, F., Warner,
J. D., Yager, N., Gouillart, E., Yu, T., the scikit-image contributors, 6 2014.
scikit-image: image processing in Python. PeerJ 2, e453.
Wan, J., Wang, D., Hoi, S. C., Wu, P., Zhu, J., Zhang, Y., Li, J., 2014. Deep
learning for content-based image retrieval: A comprehensive study. In: Pro-
ceedings of the ACM International Conference on Multimedia, MM ’14, Or-
lando, FL, USA. pp. 157–166.
Wang, B., Brown, D., Gao, Y., Salle, J. L., 2015. March: Multiscale-arch-height
description for mobile retrieval of leaf images. Information Sciences 302, 132
Yu, Q., Liu, F., SonG, Y.-Z., Xiang, T., Hospedales, T., Loy, C. C., 2016. Sketch
me that shoe. In: Computer Vision and Pattern Recognition. pp. 799–807.
Zeng, Y., Xu, X., Fang, Y., Zhao, K., 2015. Traﬃc Sign Recognition Using
Deep Convolutional Networks and Extreme Learning Machine. Springer In-
ternational Publishing, Cham, pp. 272–280.
Zhang, L., Shum, H. P. H., Shao, L., 2016. Discriminative semantic subspace
analysis for relevance feedback. IEEE Trans. Image Processing 25 (3), 1275–
Zhang, R., Lin, L., Zhang, R., Zuo, W., Zhang, L., Dec 2015a. Bit-scalable deep
hashing with regularized similarity learning for image retrieval and person
re-identiﬁcation. IEEE Trans. Image Process. 24 (12), 4766–4779.
Zhang, Y., Ozay, M., Liu, X., Okatani, T., 2015b. Integrating deep features for
material recognition. Computing Research Repository abs/1511.06522.
Appendix A. Graphical user interface for reverse image search
pyCBIR also provides a graphical user interface (GUI) (Figure 7). The main advantages of the pyCBIR GUI are: (a) the visual output shows both correct and misclassified results when ground truth is available; (b) pyCBIR allows the user to choose feature extraction methods other than CNNs, which do not require training or labeled samples; (c) pyCBIR contains ten different similarity metrics for evaluating the search results; (d) one can load the database from a comma-separated values (CSV) file or simply query the file system. As a result, pyCBIR shows the k ranked outputs (k = 10 in Figure 7), where the first column represents the query images. Each output has a bounding box that marks correctly retrieved images (green box) and misclassified images (red box). For each execution pyCBIR saves the result as a portable network graphics (PNG) file, as Figures 8, 9 and 10 show. These figures illustrate the use of the LeNet and the Cell database for 11 query images chosen randomly and their corresponding top-10 ranked outputs.
Figure 7: pyCBIR interface: retrieval options (left) with feature extraction, searching method,
retrieval number, and data paths, and retrieval results (right) with query images (ﬁrst column)
and top matches; green border indicates match, and red, misclassiﬁed images.
Figure 8: Results using the LeNet and ﬁbers database for 6 query images chosen randomly
and their corresponding top-6 ranked outputs.
Figure 9: Results using the LeNet and GISAXS database for 6 query images chosen randomly
and their corresponding top-6 ranked outputs.
Figure 10: Results using the Inception-ResNet-v2 and FMD database for 6 query images chosen randomly and their corresponding top-6 ranked outputs.