Notice: the final version of this paper can be found at https://www.sciencedirect.com/science/article/pii/S0957417418302987
Reverse Image Search for Scientific Data within and
beyond the Visible Spectrum
Flavio H. D. de Araujo (a,b,c,d,*), Romuere R. V. e Silva (a,b,c,d), Fatima N. S. de
Medeiros (c), Dilworth D. Parkinson (b), Alexander Hexemer (b), Claudia M.
Carneiro (e,f), Daniela M. Ushizima (a,b)
(a) University of California, Berkeley, CA, USA
(b) Lawrence Berkeley National Laboratory, Berkeley, CA, USA
(c) Federal University of Ceará, Fortaleza, CE, Brazil
(d) Federal University of Piauí, Picos, PI, Brazil
(e) Federal University of Ouro Preto, Ouro Preto, MG, Brazil
(f) University of Manchester, England
Abstract
The recent explosion in the rate, quality and diversity of image acquisition sys-
tems has propelled the development of tools to organize and find pictures more
efficiently. This paper introduces our software tool pyCBIR for content-based
image retrieval that enables fast searches and ranking of samples from large pic-
torial datasets. The reverse image search module within pyCBIR exploits
Convolutional Neural Networks, such as LeNet and ResNet, constructed with
TensorFlow and wrapped in a user-friendly Qt interface. We test pyCBIR
search capabilities applied to scientific datasets across and beyond the visible
spectrum, with examples from atomic diffraction patterns, cell microscopy, and
materials science imagery. In addition, we demonstrate the versatility and accu-
racy of pyCBIR when handling both binary and multiclass tasks during recovery
of consistent items with accuracy over 95% in most cases. Our experimental
results include both simulation-based datasets as well as experimental observa-
tional data ranging from thousands to millions of image samples.
Corresponding author: flavio86@ufpi.edu.br; phone: +1 510 4864061
Email addresses: romuere@ufpi.edu.br (Romuere R. V. e Silva), fsombra@ufc.br
(Fatima N. S. de Medeiros), DYParkinson@lbl.gov (Dilworth D. Parkinson),
ahexemer@lbl.gov (Alexander Hexemer), carneirocm@gmail.com (Claudia M. Carneiro),
dushizima@lbl.gov (Daniela M. Ushizima)
Preprint submitted to Expert Systems with Applications March 12, 2018
Keywords: Reverse image search, content-based image retrieval, scientific
images, Convolutional Neural Network.
1. Introduction
With the increased availability of large data repositories, a substantial amount
of time is spent searching for pictures, a procedure that has become inefficient
and cumbersome. Recent reports (Bethel et al., 2015; GE Digital, 2016;
Evans, 2016) point out that the growth in data size, rates and variety is sig-
nificant; they also suggest that scientific data will grow twice as quickly as any
other sector while less than 3% of that data will be tagged to be used in a
meaningful way. Several research imaging facilities will soon be generating 1 to
50 petabytes of data per year. This amount of data poses several challenges: (a)
inadequate or insufficient metadata describing experimental records; (b) the
impracticality of manual curation of massive datasets; and (c) the lack of tools
adapted to the new data acquisition modes.
A critical need is to create accurate methods to query and retrieve images,
since “everyone searches all the time” (Eckstein, 2011). As an important step,
some tools for photo organization have improved to include operations such as
sorting and categorization by dates or media types, in addition to referencing
images through user annotations and tags. However, manual curation is seldom
achievable at scale, and even impossible in some scenarios, such as with high-
throughput imaging instruments. What is needed are more sophisticated tools
that can retrieve relevant items based on a data-driven taxonomy.
Recommender systems, also known as reverse image search tools, repre-
sent an excellent opportunity for data reduction, in which the imaging acquisi-
tion, data collection and storage can be tailored based on a desired sample pat-
tern. For example, the inability to adjust experimental parameters fast enough
for optimal data collection leads to dire challenges at imaging facilities, with
scientists distressed by overwhelming amounts of useless data.
2. pyCBIR: an environment for scientific image retrieval
In this paper, we introduce our recommender tool, pyCBIR, which provides
a ranking system that enables humans to quickly interact with images from di-
verse domains and experiments, and offers mechanisms to incorporate labeled
data when available, as discussed in other recommendation systems (Yu et al.,
2016; Kang et al., 2016). Here, we focus on data flows using three different
architectures of Convolutional Neural Networks (CNNs) to extract image
signatures. pyCBIR delivers the classification accuracy supported by the LeNet
(Lecun et al., 1998) and Inception-ResNet-v2 (Szegedy et al., 2016) architectures,
exploiting optimized routines from TensorFlow (Abadi et al., 2015).
The main contributions of this paper are as follows:
1. We design and develop pyCBIR, a visual image search engine able to learn
compact similarity-preserving signatures for recovering image relevance
based on approximate ranking, and including 10 schemes to measure dis-
tance based on different sets of features;
2. We deliver, test and compare three CNN implementations that achieve
highly accurate results, with instances that rely upon very deep and com-
plex networks (Zhang et al., 2015a; Hong et al., 2015; Zeng et al., 2015). In
addition, pyCBIR contains algorithms that help to generalize and quickly
extend training datasets. This paper describes CNN models with increas-
ing complexities tested against four science problems, and reports on ac-
curacy and time consumption given different architectural choices;
3. pyCBIR provides an interactive system to find images from diverse do-
mains through an intuitive IDE, helpful to users with diverse science
backgrounds, and relevant to science questions using labeled and unla-
beled datasets.
4. This paper improves reproducibility and supports code benchmarking by
performing tests on publicly available image repositories and by constructing
software based on open-source tools.¹
Figure 1: Diagram showing how data moves through pyCBIR modules during reverse image
search: gray box emphasizes the starting point given a trained state.
The two main sets of algorithms at the core of pyCBIR that drive reverse image
search of scientific data are CBIR and CNN, which together target material
recognition at scale using massive image collections, as described in the
following sections.
3. Related work
Previous efforts to optimize image search tasks employing content-based im-
age retrieval (CBIR) systems (Zhang et al., 2015a; Hong et al., 2015; Zhang
et al., 2016) exploit computer vision and machine learning algorithms to repre-
sent images in terms of more compact primitives. Given an image as an input
query, instead of keywords or metadata, such an approach allows matching sam-
ples by similarity. Several free engines for CBIR are thriving at e-commerce
¹ Source codes/data to be published upon paper acceptance at camera.lbl.gov
tasks (Yu et al., 2016; Shamoi et al., 2015), but the underlying code remains
closed and has scarcely been deployed for use with scientific images.
The term CBIR was introduced in 1992 by Kato (Kato, 1992; Hirata and
Kato, 1992), and it has been associated with systems that provide image match-
ing and retrieval for queries performed by visual example. A quarter of a century
later, most image retrieval systems available for scientific image search still rely
on keyword-based image retrieval (van den Broek et al., 2005), although most
of the image collections generated by humans lack proper annotations (Bethel
et al., 2015).
With the advent of deep learning (Goodfellow et al., 2016; Guo et al., 2016)
and the ability to extrapolate annotations from curated datasets (Ushizima
et al., 2016), new systems promise to move human experience from hardly rele-
vant retrievals to broadly useful results that can achieve over 85% accuracy. For
example, the application Google Photos has provided automated and custom-
labeling features since 2015, so that users can quickly organize large photo
collections (JR Raphael, 2015), with high retrieval accuracies for face detection.
These are significant strides toward automating image catalogs, and they motivate
our efforts to construct TensorFlow-based CNNs to organize scientific data.
TensorFlow (Abadi et al., 2015), an open-source software library for Machine
Intelligence, presents advantages regarding flexibility, portability, performance,
and GPU compatibility. In order to deliver high-performance C++ code,
TensorFlow uses the Eigen linear algebra library in addition to CUDA numerical
libraries, such as cuDNN to accelerate core computations and scale to large
datasets. In pyCBIR, we use Python to model the dataflow graph, which is
responsible for coordinating execution of operations that transform inputs into
ranked image samples, whose labels can be used in the calculation of uncertainty
values.
A typical CNN pipeline is shown in Figure 2, consisting of three main “neu-
ral” layers: convolutional layers, pooling layers, and fully connected layers. This
algorithm requires two stages for training the network: (a) forward stage, which
represents the input image in each layer and outputs a prediction used to com-
Figure 2: Main components of the CNN layers for feature extraction: each step illustrates
the fiber dataset transformation, although the same CNN architecture applies to the other
datasets.
pute the loss cost based on the curated data (labeled samples), and (b) backward
stage, which computes the gradients of layer parameters to drive the cost func-
tion to very low values (Guo et al., 2016; Goodfellow et al., 2016).
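The forward/backward loop can be sketched in a few lines of code. The following is a minimal illustration written in modern tf.keras rather than the TensorFlow 0.11 graph API used when pyCBIR was built; the layer sizes are placeholders, not the paper's exact architecture, while the 0.1 learning rate follows the LeNet setting reported in Section 6.1.

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 5, activation="relu", input_shape=(16, 16, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),  # features reused as signatures
    tf.keras.layers.Dense(2),                      # classification layer
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

def train_step(images, labels):
    with tf.GradientTape() as tape:
        logits = model(images, training=True)      # (a) forward stage
        loss = loss_fn(labels, logits)             # loss on curated labels
    grads = tape.gradient(loss, model.trainable_variables)  # (b) backward stage
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss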
By exploring algorithms that learn features at multiple levels of abstrac-
tion from large datasets, CBIR systems can benefit from complex non-linear
functions within CNNs that map unprocessed input data to the results, bypass-
ing human-designed features reliant on domain knowledge. Wan et al. (Wan
et al., 2014) investigated different deep learning frameworks for CBIR when ap-
plied to natural images, such as the ILSVRC2012 (ImageNet Large Scale Visual
Recognition Challenge) dataset. That paper reported mean average precision
of 0.4711 using a massive image collection with 10,000,000 hand-labeled images
depicting 10,000+ object categories as training.
Apart from natural scenes, recent work on recognizing material categories
from images (Bell et al., 2014; Liu et al., 2010; Sharan et al., 2014; Zhang et al.,
2015b) includes experiments using the Flickr Material Dataset (FMD) and/or
the Materials in Context Database (MINC). Sharan et al. (Sharan et al., 2014)
explored low- and mid-level features, such as color, SIFT (Scale-Invariant
Feature Transform), and HOG (Histogram of Oriented Gradients), combined with
an augmented Latent Dirichlet Allocation model under a Bayesian generative
perspective, achieving a 44.6% recognition rate on FMD. Using a CNN-based
feature extraction mechanism, Bell et al. (Bell et al., 2014) designed materials
recognition frameworks employing two Caffe-based (Jia et al., 2014) architec-
tures: AlexNet and GoogLenet trained on materials patches from the MINC
database, achieving accuracy of 79.1% and 83.3%, respectively. Additionally,
attempts to train CNNs with public datasets such as FMD were less favorable
since FMD alone contains a small number of samples. Moreover, those authors
noticed that the AlexNet trained on the MINC database performed better on
FMD classification, showing 66.5% accuracy.
In (Zhang et al., 2015b), different authors continued investigating material
categories, tuning a Caffe VGG-D model pre-trained on two datasets, MINC
and ILSVRC2012 (Russakovsky et al., 2015), for image classification. They
resorted to a customized feature selection and integration method to concatenate
the values of the 7th layer for both networks. Finally, the integrated features
were input to a support vector machine (SVM) with radial basis function kernels
for training and testing; they improved accuracy to 82.3% for the FMD images.
The next section describes our experiments on materials databases such
as FMD, as well as novel databases of scientific images, and different CNN
architectures with increasing levels of complexity.
4. Description of Scientific Databases
We have explored deep learning and considered samples from diverse imag-
ing systems, varying in terms of their electromagnetic wave interaction with the
samples and spatial scale. These sample collections require nontrivial mathemat-
ical methods to index and recover relevant results, taking into account image
composition and structure. We summarize the four databases under investiga-
tion in Table 1, describing the main characteristics shown by fibers, films, cells
and other materials.
4.1. Fiber profiles
The fiber database consists of volume cross-sections, based on hard X-ray
microtomography (microCT) of ceramic matrix composites (CMC). This imaging
technique allows for inspection of structural properties and quality control,
Table 1: Scientific data under investigation: experimental specifications, number of samples
and respective image size.

Specimen           | Modality                | #Samples and Size  | Target data analysis
Ceramic composite  | X-ray microCT           | 1,013,550; 16×16   | Detection of fiber profiles from 3D cross-sections. Sec. 4.1, Fig. 2.
Thin films         | GISAXS                  | 4,024,789; 100×100 | Classification of simulated scattering patterns into space groups. Sec. 4.2, Fig. 3.
Pap smears         | Light microscopy        | 3,393; 100×100     | Inspection of cervical cell morphology for cancer detection. Sec. 4.3, Fig. 4.
Materials patterns | Photography, public DB  | 1,000; 512×384     | Inspection of molecular structure with 2D orientation classification. Sec. 4.4, Fig. 5.
while exposing CMC samples to high temperature and tensile forces, causing
multiple deformations (Bale et al., 2012; Ushizima et al., 2014).
Figure 2 illustrates a CMC sample cross-section. The sample is 1 mm in
diameter and 55 mm in length, reinforced with hundreds of ceramic fibers of
approximately 10 µm diameter. Each fiber is coated with a boron nitride layer,
which has a lower X-ray absorption coefficient; therefore, fiber cross-sections
appear as dark rings. Frequently, the 3D images are examined manually, slice by
slice (2D), in order to identify defects. As an alternative, there is increasing
interest in automation by designing “inspecting bots”, i.e., algorithms that de-
tect mechanical deformations and sort the experimental instances (e.g. image
stacks) according to the structural damage. The first step in Figure 2 highlights
a fiber cross-section, a structure commonly used as a guiding pattern to detect
other fibers (Ushizima et al., 2014, 2016). A major demand is the ability to
perform pattern ranking which can steer data management needed by beamline
scientists.
Our paper reports results on labeled samples that went through a triage
including both automated segmentation methods based on traditional computer
vision (Bale et al., 2012; Alegro et al., 2016) and visual inspection by domain
scientists. To the best of our knowledge, all the images contain accurate labels,
which are used to determine the success rate of the CNNs. Among the 3
million available labeled images, half contain fibers and the other half present
areas with no fibers. Figure 2 shows how to obtain these fiber profile images.
Additional fiber profile samples are illustrated in the Appendix A.
4.2. GISAXS
Grazing Incidence Small Angle X-ray Scattering (GISAXS) is a method for
characterizing the nanostructural features of materials, especially at surfaces
and interfaces, which would otherwise be impossible using standard transmission-
based scattering techniques. As a surface-sensitive tool for simultaneously prob-
ing the electron density of the sample, this imaging modality supports measure-
ments of the size, shape, and spatial organization of nanoscale objects located at
the top of surfaces or embedded in mono- or multi-layered thin film materials.
Individual GISAXS images serve as static snapshots of nanoscale structure,
while successive images provide a means to monitor and probe dynamical pro-
cesses. Although microscopy techniques provide valuable local information on
the structure, GISAXS is the only method to provide statistical information at
the nanometer level (Hexemer and Müller-Buschbaum, 2015). A major bottle-
neck preventing GISAXS from reaching its full potential has been the availability
of curated data, analysis methods and modeling resources for interpreting the
experimental data (Chourou et al., 2013).
In order to advance GISAXS diffraction image understanding and usability,
we have used GISAXS simulation codes to generate more complete catalogs
of potential experimental outcomes. Our paper takes data from HipGISAXS,
Figure 3: GISAXS diffraction patterns of crystal lattices: (a-d) a sample of each structure
class (cubic, bcc, fcc, hcp); (e-h) images representing the per-class average (µ); and (i-l) the
standard deviation (σ) of a subset of 1,000 randomly selected samples from each class.
a massively parallel simulator, developed using C++, augmented with MPI,
NVIDIA CUDA, OpenMP, and parallel-HDF5 libraries, to take advantage of
large-scale clusters of multi/many-cores and graphics processors. HipGISAXS
currently supports Linux, Mac and Windows systems. It is able to harness
computational power from general-purpose CPUs, including state-of-the-art
multicores, as well as NVIDIA GPUs and Intel processors, delivering
experimental simulations at high resolutions (Hexemer and Müller-Buschbaum,
2015; Hexemer, 2016). Using the HipGISAXS code, beamline scientists can cre-
ate sample image data with scattering patterns corresponding to four different
Figure 4: Examples of cell images from the CRIC database: abnormal cells (a-d) and normal
cells (e-h).
crystal unit cell structures or lattices: simple cubic (atoms on the 8 corners
of a cube); BCC, Body-Centered Cubic (8 on the corners, one in the center of
the cube); FCC, Face-Centered Cubic (8 on the corners, one in the center of
each face of the cube); and HCP, Hexagonal Close-Packed (non-cubic, but one
of the most commonly occurring lattices).
4.3. Cervical Cells
Cervical cancer is the third most frequent type of tumor in the female
population, especially affecting developing and populous countries such as
Brazil (Lowy, 2011), China and India. According to the National Cancer Insti-
tute (NCI) (Nci, 2017), Pap tests are an essential mechanism to detect abnormal
cervical cells (Lu et al., 2017) as part of regular screenings, which can reduce
cervical cancer rates and mortality by 80 percent. However, cervical cancer
continues to be the fourth leading cause of cancer deaths in Brazil (Inca, 2016),
where most of the female population depends on visually-screened cervical cytol-
ogy from routine conventional Pap smears. Although more than 80% of exams
in the U.S. use the liquid-based Pap test, this protocol is more than 50% more
expensive than conventional Pap smears and remains unavailable to most of the
world's population.
As a viable alternative, we have improved the analysis of Pap smears by
using computer vision allied to CNNs, targeting an increase in the number of
fields of view and a reduction of false negatives during the inspection of
microscope slides. Sparking more interest in supporting cell analysis using conventional Pap
smears, pathologists within our team were granted access to an anonymized
image database from the Brazilian Public Health System (Ministry of Health, 2016),
containing samples from a heterogeneous population across age, race, and
socioeconomic status. A large portion of these images is available through the Cell
Recognition for Inspection of Cervix (CRIC) database that catalogs numerous
cases of cervical cells, classified according to the Bethesda System as atypical
squamous cells of high risk and undetermined significance (#ASCH=470 and
#ASCUS=116), normal (#Normal=343), low-grade and high-grade squamous
intra-epithelial lesions (#LSIL=115 and #HSIL=1,018), and invasive carcinoma
(#IC=60). This paper uses a subset of the CRIC database², previously classified
by at least three cyto-pathologists, comprising 169 digitized Pap smear glass
slides, which results in 3,393 cervical cells with normal or abnormal morphology,
including overlapping cells.
Figure 4 displays image samples of the CRIC collection digitized from
conventional Pap smears, reflecting the broad racial diversity that is a
trait of the Brazilian population.
4.4. Public images: the Flickr Material Database
The Flickr Material Database (FMD) (Sharan et al., 2014) was designed to
facilitate progress in material recognition, and it contains real world snapshots
of ten common material categories: fabric, foliage, glass, leather, metal, paper,
plastic, stone, water, and wood, as illustrated in Figure 5. According to Sharan
et al. (Sharan et al., 2014), each image in this database (100 images per cate-
gory) was selected manually from Flickr.com to ensure a variety of illumination
conditions, compositions, colors, texture surface shapes, material sub-types, and
object associations.
The intentional diversity of FMD reduces the chances that simple or low-level
information descriptors, e.g., color or first order intensity features, are enough
to distinguish material categories. Strategies to construct middle-level features
² Original images will be posted at http://cricdatabase.com.br/ upon paper acceptance.
Figure 5: Examples from the Flickr Material Database, one image per class: (a) fabric, (b)
foliage, (c) glass, (d) leather, (e) metal, (f) paper, (g) plastic, (h) stone, (i) water, and (j) wood.
have enabled accuracy improvements in materials recognition problems (Cim-
poi et al., 2014), especially when including larger materials databases, such as
MINC (Bell et al., 2014). This previous research on FMD description and learn-
ing schemes points out limitations of using FMD alone as the training data
source. We address some of these gaps, using two model training approaches as
discussed in the next section.
5. Methods in pyCBIR
One of the main challenges in image recognition consists of performing tasks
that are easy for humans to do intuitively, but hard to describe formally (Good-
fellow et al., 2016). This situation happens frequently among domain scien-
tists (Donatelli et al., 2015), who are visually trained to identify complex pat-
terns in their experimental data, although they are often unable to describe
mathematically the primitives that construct the motif. Data-driven algorithms
that learn from accumulated experience, such as those in pyCBIR, can support
software tools to rank image sets in the face of (a) the difficulty of obtaining specific
knowledge needed for modeling, (b) limitations in learning the intricacies of ev-
ery science domain, and (c) restricted generalization of hard-coded knowledge
rules.
This section describes how pyCBIR uses CNNs to provide data reduc-
tion by automatically learning compact signatures that represent each image.
An essential step required to organize the database is to construct models for
different science problems in conjunction with algorithms for enhanced search
experience. The next sections explain how we use CNNs to obtain image
characteristics and rank images by similarity. Although we omit results using
classic feature extraction methods in pyCBIR, they are also available and include
Gray-Level Co-Occurrence Matrix (GLCM), Histogram of Oriented Gradients
(HOG), Histogram features, Local Binary Pattern (LBP) and Daisy (van der
Walt et al., 2014).
5.1. Neural network, CNN and topology
The way in which artificial neurons connect to each other specifies the topol-
ogy of the neural network (NN), a key attribute of its overall mode
of operation and scalability. In supervised learning, the traditional topology
is the fully connected, three-layer, feed-forward network; in computer vision
problems, this has been mostly replaced by NN layers that explore local con-
nectivity. Independently of the topology, learning involves modifying/updating
the weights of the network connections (Miikkulainen, 2010) through the pro-
cessing of large amounts of data to optimize a specific model. However, different
NN topologies determine structural compositions that define the learning pro-
cess within a NN architecture, e.g. with backpropagation, to enhance feature
selection, recurrent memory, abstraction, or generalization.
Among the several NN design options, this paper explores two different CNN
architectures: (a) LeNet (Lecun et al., 1998), a neuronal arrangement that
switches from fully connected to sparsely connected neurons, allowing real-time
feedback for most applications, particularly when using graphics cards for
computation. Due to its simplicity, training often performs well with a smaller
number of examples than deeper NNs require, but it is often inaccurate
when dealing with complex recognition problems; and (b) Inception-
ResNet-v2 (Szegedy et al., 2016), a deeper and wider architecture formed by
multiple sub-networks, in which hierarchical layers promote many levels of non-
linearity needed for more elaborated pattern classification. This model requires
roughly twice as much memory and computation as the previous version (In-
ception v3 (Szegedy et al., 2015)), but it has been shown to be more accurate
than previous state-of-the-art models, particularly when considering tasks such
as the Top-1 and Top-5 recommendations using the ILSVRC2012 benchmark
database.
5.2. Query with signatures from CNN layer
The goal of reverse image search engines is to enable high-level queries using
pictures instead of, or in addition to, metadata, keywords or watermarks as
the main mechanism to retrieve relevant samples that match the query. As
in any data-driven application, the image database quality highly influences
the retrieved results, but several other components play a major role, such as
image properties represented as feature vectors, or signatures, and their respective
dimensionality.
Typically, a CNN forms a hierarchical feature extractor that maps the input
image into increasingly refined features, which serve as input to a fully con-
nected layer that solves the classification. Figure 2 illustrates the alternating
convolutional and pooling layers, which transform and reduce the input data be-
fore it reaches deeper stages such as the fully connected layer. Notice that we bypass
the classification layer to use the features as signatures to drive the retrieval
process, so pyCBIR can search and match the current image-query to the most
similar samples in the database using “machine-designed” features.
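As a concrete illustration of bypassing the classification layer, the sketch below re-wires a tf.keras model so that its output is the penultimate layer; the model variable is assumed to be one like the training sketch in Section 3, and this is an idiom of current TensorFlow rather than pyCBIR's original TensorFlow 0.11 graph code, which differs in mechanics but not in idea.

import tensorflow as tf

def extract_signatures(model, images):
    # Re-wire the network so its output is the layer before the classifier.
    feature_model = tf.keras.Model(inputs=model.inputs,
                                   outputs=model.layers[-2].output)
    return feature_model.predict(images)  # array of shape (n_images, h)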
With the purpose of suggesting a category based on a pre-defined taxonomy,
we assign Y classes to an image database X, reduced into signatures, consisting
of n images x_i, for 1 ≤ i ≤ n. Searches will occur in an h-dimensional space,
obtained by transforming each x_i throughout the CNN. Alternatively, principal
component analysis (PCA) can be used to reduce the signature dimensionality
and improve the computational cost, while preserving the retrieval success rate.
The pyCBIR retrieval module will recognize similar samples through a similarity
function S, so that S(x_i, x_q) returns relevant items as well as the respective
uncertainty value. In other words, the engine returns the top k most similar
images and their respective y_j for a query-image x_q ∈ R^(h×n), where S(x_i, x_q)
is defined as follows:

$$S(x_i, x_q) = \frac{x_i \cdot x_q}{\|x_i\|\,\|x_q\|}, \tag{1}$$

where · is the dot product.
Although we report our results using the cosine similarity metric, other dis-
tance metrics are available through the pyCBIR graphical user interface, includ-
ing Euclidean distance, Infinity distance, Pearson correlation, Chi-square dis-
similarity, Kullback-Leibler divergence, Jeffrey divergence, Kolmogorov-Smirnov
divergence, Cramer divergence, and Earth mover’s distance (Jones et al., 2001).
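For reference, Eq. (1) amounts to a few lines of numpy. The following sketch ranks a database of signatures against one query; it is illustrative rather than pyCBIR's exact implementation, and the small epsilon guards against zero-norm vectors.

import numpy as np

def cosine_rank(X, x_q, k=10):
    # X: (n, h) array of signatures; x_q: (h,) query signature.
    sims = (X @ x_q) / (np.linalg.norm(X, axis=1) * np.linalg.norm(x_q) + 1e-12)
    return np.argsort(-sims)[:k]  # indices of the most similar images first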
5.3. Indexing and searching methods
Quick feedback when searching images by similarity is essential, but a linear
search through a database with millions of items may lead to unacceptable
waiting times. Therefore, after the computation of the image signatures, we
map them to a lower dimensional space using PCA, and use the most significant
components as input to an indexing algorithm.
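A minimal sketch of this reduction step with scikit-learn's PCA follows; the signature array is a random placeholder, and the choice of 16 components anticipates the setting justified in Section 6.

import numpy as np
from sklearn.decomposition import PCA

signatures = np.random.rand(1000, 512)   # placeholder for CNN signatures
pca = PCA(n_components=16)               # 16 components, as in Section 6
reduced = pca.fit_transform(signatures)  # input to the indexing algorithm
query = pca.transform(signatures[:1])    # queries are mapped the same way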
We propose an indexing routine that employs the Locality Sensitive Hash-
ing Forest (LSH) (Bawa et al., 2005), whose polynomial cost and sub-linear
query time considerably speed up the information retrieval. This algorithm de-
livers efficient approximate nearest-neighbor queries by improving the original
LSH (Indyk and Motwani, 1998) scheme, which otherwise would require tun-
ing parameters, such as the number of features and the distance radius, as a
function of the data domain. Moreover, it enables fast signature insertion and
deletion while ensuring minimal use of storage.
By using LSH, our routine allows selection of a small set of potential im-
ages to be compared against the image-query; this happens because similar
images, according to some metric, e.g., cosine similarity, are more likely to hash
to the same bucket. We used random projection as the hash function to ap-
proximate the cosine distance between vectors, so that pyCBIR can translate
CNN-calculated signatures into 32-bit fixed-length hash values. In addition,
we store pre-computed versions of the LSH on disk for future analogous search
requests using the same trained model.
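The random-projection hash can be sketched as follows: each of 32 random hyperplanes contributes one sign bit, so vectors at small cosine distance tend to agree on most bits and hash to the same bucket. The function name and the 16-dimensional input (matching the PCA-reduced signatures) are assumptions for illustration.

import numpy as np

rng = np.random.default_rng(0)
planes = rng.standard_normal((32, 16))   # 32 hyperplanes over 16-D inputs

def hash32(v):
    bits = (planes @ v) > 0              # one sign bit per hyperplane
    return int(np.packbits(bits).view(np.uint32)[0])  # 32-bit hash value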
Algorithm 1 shows the steps of the LSH-based indexing scheme before returning
the result R of ranked outputs, where |·| is the length of a set and s is
an image signature within the set S. The LSH-based indexing function retrieves
the approximate nearest neighboring items from the hash table, and its output
is the k most similar images for each signature in the query set Q.
5.4. Database augmentation
CNN-based recognition systems often require a large number of examples in
order to fine-tune models during the training stage and deliver accurate classi-
fication results. A few strategies are commonly devised to deal with relatively
small datasets, for example, modifications of the original observations to gener-
ate new ones, following expected distortions given certain degrees of freedom.
In this context, data augmentation consists of applying mathematical transfor-
mations to typical samples in order to generate new images that are slightly
different, but relatively similar so that they will belong to the same class. The
most common image transformations are scaling, translations, rotations, noise
addition and blurring.
We include the augmentation as an extra processing step as part of the CNN
training using the cell and FMD databases. Both databases presented a limited
amount of samples, here fewer than 2,000 per class. Therefore, we applied 12
translations (three offsets in each direction: cells 7, 14 and 20 pixels; FMD 8,
16 and 24 pixels) and 3 rotations (every 90°) to each image, augmenting the
dataset by a factor of 51, as sketched below.
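The sketch below implements one reading of this scheme with scipy: combining the 13 positions (original plus 12 translations) with 4 rotations (0° plus the 3 rotations) yields 52 variants per image, i.e., 51 new samples, matching the factor above. The offsets follow the cell-database values, a grayscale image is assumed, and treating translations and rotations combinatorially is our interpretation.

import numpy as np
from scipy.ndimage import rotate, shift

def augment(image, offsets=(7, 14, 20)):
    shifted = [image]                            # original position
    for d in offsets:                            # 3 offsets x 4 directions = 12
        for dy, dx in ((d, 0), (-d, 0), (0, d), (0, -d)):
            shifted.append(shift(image, (dy, dx), mode="nearest"))
    out = []
    for img in shifted:                          # 13 positions x 4 rotations
        for angle in (0, 90, 180, 270):
            out.append(rotate(img, angle, reshape=False, mode="nearest"))
    return out                                   # 52 images, 51 of them new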
Algorithm 1: LSH-based indexing.
Input:
  - S: signature database.
  - Q: set of signatures from query-images.
  - k: number of image matches.
Output:
  - R: ranked output (top-k matches).
begin
  if LSH previously computed then
    read the LSH from file;
  else
    for each s ∈ S do
      LSH.add(Random_projection_hash(s));
    end
  end
  create R with |Q| rows and k columns;
  for each q ∈ Q do
    R[q, :] ← LSH.similarity(q, k);
  end
end
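In Python, Algorithm 1 might look like the sketch below: a single-table simplification of the LSH Forest, with the hash buckets pickled to disk so that a second run skips the build step (the "LSH previously computed" branch). It reuses hash32 and cosine_rank from the earlier sketches; the cache filename is hypothetical, and S is assumed to be an (n, 16) numpy array of PCA-reduced signatures.

import os
import pickle
from collections import defaultdict

INDEX_FILE = "lsh_index.pkl"                     # hypothetical cache path

def build_or_load_index(S):
    if os.path.exists(INDEX_FILE):               # "LSH previously computed"
        with open(INDEX_FILE, "rb") as f:
            return pickle.load(f)
    index = defaultdict(list)
    for i, s in enumerate(S):                    # for s in S: LSH.add(hash(s))
        index[hash32(s)].append(i)
    with open(INDEX_FILE, "wb") as f:
        pickle.dump(dict(index), f)
    return dict(index)

def query_index(index, S, q, k):
    candidates = index.get(hash32(q), [])        # same-bucket candidates only
    if not candidates:                           # empty bucket: brute force
        candidates = list(range(len(S)))
    order = cosine_rank(S[candidates], q, k)     # exact ranking of candidates
    return [candidates[i] for i in order]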
5.5. Evaluation metrics
We used the Mean Average Precision (MAP) (Wang et al., 2015) metric to
evaluate the quality of the retrieved images. To compute MAP, the Average
Precision score AP(Q) is defined for each image Q in the rank as:

$$AP(Q) = \frac{\sum_{n=1}^{M} P(n)\, f(n)}{N}, \tag{2}$$

where P(n) is the precision at cut-off n in the rank, f(n) is equal to 1 if the
image at rank n belongs to the same class as the query, and 0 otherwise; M
is the number of images in the rank and N is the number of images of the
same class given by the query. The MAP score is obtained by averaging the AP
score over all images in the rank. The higher the MAP score is, the better the
performance.
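Eq. (2) translates directly into code. The sketch below computes AP(Q) for one ranked result list, accumulating P(n) at the relevant positions, and then takes the mean of the per-query AP scores; names are illustrative.

import numpy as np

def average_precision(ranked_labels, query_label, n_relevant):
    hits, total = 0, 0.0
    for n, label in enumerate(ranked_labels, start=1):
        if label == query_label:      # f(n) = 1 for same-class items
            hits += 1
            total += hits / n         # P(n), the precision at cut-off n
    return total / n_relevant         # divide by N, the class size

def mean_average_precision(ap_scores):
    return float(np.mean(ap_scores))  # mean of the per-query AP scores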
We also computed the classification accuracy rate for each class in the
databases. This accuracy was calculated using the k-nearest neighbor classifier
(Altman, 1992) for different values of k.
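A sketch of this k-accuracy computation with scikit-learn follows; using cosine as the neighbor metric matches the retrieval setting, though the paper does not state the exact implementation.

from sklearn.neighbors import KNeighborsClassifier

def knn_accuracy(train_X, train_y, test_X, test_y, k):
    # train_X/test_X: (n, 16) PCA-reduced signatures; *_y: class labels.
    clf = KNeighborsClassifier(n_neighbors=k, metric="cosine")
    clf.fit(train_X, train_y)
    return clf.score(test_X, test_y)  # per-class rates use class-wise masks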
6. Experimental results
This section describes how to run pyCBIR using different datasets as well as
how to compare the output from different network topologies when carrying out
reverse image retrieval experiments. The results refer to the databases described
in Section 4 and the algorithms discussed in Section 5.
6.1. CNN training
When deriving PCA-based signatures from LeNet or Inception-ResNet-v2
outputs, both schemes require only two parameters: the initial learning rate
and the decay factor. We set the initial learning rate as 0.1 and the decay factor
as 0.04 in the LeNet. The fine-tuning operation requires an initial learning rate
smaller than the one used to train the CNN with random initialization. Here,
we report experiments setting this parameter to 0.008 and the decay factor as
0.0004. Experiments showed that slightly different values affect the training
time, but may lead to similar classification accuracy results, as illustrated in
Figure 6.
The number of epochs varies across databases due to the image inter-class
variation, the number of classes, and the image size; it was set according to the
loss function: when the loss remains constant between epochs and is lower than
0.1, the CNN training automatically stops. Table 2 shows the number of epochs
and processing time to train the LeNet and the longer fine-tuning step needed
by the Inception-ResNet-v2.
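In code, the stopping rule reads roughly as below; the tolerance for "remains constant" is an assumption, since the paper states only the 0.1 threshold.

def should_stop(loss_history, tol=1e-4):  # tol is an assumed tolerance
    # Stop when the loss is flat between epochs and already below 0.1.
    if len(loss_history) < 2:
        return False
    flat = abs(loss_history[-1] - loss_history[-2]) < tol
    return flat and loss_history[-1] < 0.1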
Table 2: Number of epochs and processing time to train both LeNet and Inception-ResNet-v2
neural networks using different image databases.

Network             | Database | Epochs | Processing Time
LeNet               | Fibers   | 5      | 9 min
LeNet               | GISAXS   | 5      | 45 min
LeNet               | Cells    | 6      | 70 min
LeNet               | FMD      | 20     | 42 min
Inception-ResNet-v2 | Fibers   | 21     | 336 min
Inception-ResNet-v2 | GISAXS   | 13     | 273 min
Inception-ResNet-v2 | Cells    | 17     | 289 min
Inception-ResNet-v2 | FMD      | 51     | 153 min
6.2. Performance evaluation
In order to evaluate the fibers and GISAXS databases, we used 10% of the
images from both databases for LeNet training, Inception-ResNet-v2 fine-tuning
and eigenvalue estimation for the PCA. We used the other 90% of both databases
to calculate the MAP and k-accuracy. For the Cells and FMD databases, we
used 50% of the image sets for the LeNet training, Inception-ResNet-v2 fine-
tuning and PCA transformation, and the other half to calculate the MAP and
k-accuracy measures. Such different splits were necessary given the sizes of
these databases; for example, the cells and FMD databases have only a few thousand
samples, which is a limited amount of data for the CNN training. Furthermore,
we augmented both training subsets using affine image transformations as a
preprocessing step. To illustrate how strongly this step impacts our pipeline,
we performed experiments with and without data augmentation for the cell
database. The MAP value without augmentation yielded 0.85 as opposed to 0.94
when considering the augmentation step (preprocessing). The mean average
also increased, which indicates that the CNN converged to a model that better
classifies the presented samples.
Figure 6 shows the MAP results for all databases in comparison with the
number of PCA components, when considering the LeNet and Inception-ResNet-
v2 pre-trained and after fine-tuning. The curves show that it is possible to
reduce the number of features in the retrieval process by using 16 or fewer com-
ponents. We notice that this reduction improves the indexing and searching
procedures when applied to the four different image sets.
Surprisingly, the LeNet achieved better results than Inception-ResNet-v2 on
the fibers, GISAXS and cell databases. One of the reasons this CNN
outperformed Inception-ResNet-v2 for these databases is the size of the
images. The Inception-ResNet-v2 expects an input image of 299×299; therefore
the current pipeline resizes the images from 100×100 (Cells and GISAXS) and
16×16 (Fibers) to that representation, an operation that might distort impor-
tant aspects of the data. We also tested zero-padding operations instead of
resizing, but the resulting accuracy remained the same. The lower MAP and k-
accuracy values obtained with the pre-trained Inception-ResNet-v2 are most
likely due to the learned model, which depended on a broad object database
(ILSVRC2012) that poorly correlates with our image databases.
Regarding the FMD images, the Inception-ResNet-v2 outperformed the LeNet,
which might be explained by the similarity between FMD and the ILSVRC2012
database. Recall that the FMD samples contain complex patterns, resem-
bling ILSVRC2012 data points. Fine-tuning the Inception-ResNet-v2 further
improves the results relative to the pre-trained model, owing to layers customized
to extract features from the FMD data.
We also computed the classification k-accuracy for each class of all databases,
where k ∈ {1, 5, 10, 20, Ω}, and Ω is the total number of images in a particular
class. Tables 3, 4, 5 and 6 confirm the MAP results presented in Figure 6. Based
on these results, when there are more than a couple of thousand images and
a few classes to train for, we observed that the LeNet outperforms the deeper
network for our databases. In contrast, when there are millions of images to
fine-tune the network, the Inception-ResNet-v2 performed better for a single
database, FMD, characterized by several classes and larger images. When no
classes/labels are available, the pre-trained Inception-ResNet-v2 is a promising
starting point for tasks such as image sorting.
Figure 6: MAP values in relation to the number of PCA components for the LeNet and
Inception-ResNet-v2: (a) Fibers database, (b) GISAXS database, (c) Cells database, (d) FMD.
6.3. Time during image-query-based retrieval
In addition to accuracy, the computational cost to search images given a
query is another measure of the value of recommendation systems. Table 7
shows the computational time to retrieve k images (with k equal to the database
size) given a query. We used this value of k because it is the worst case of image
searching
Table 3: Accuracy rate of the LeNet and the Inception-ResNet-v2 for the Fibers database
using different k values, where Ω is the number of images of the class.

LeNet
k         | 1     | 5     | 10    | 20    | Ω
No-Fibers | 0.973 | 0.977 | 0.979 | 0.978 | 0.963
Fibers    | 0.975 | 0.988 | 0.991 | 0.990 | 0.996

Inception-ResNet-v2
No-Fibers | 0.781 | 0.777 | 0.797 | 0.746 | 0.667
Fibers    | 0.925 | 0.978 | 0.980 | 0.983 | 0.975

Inception-ResNet-v2 pre-trained
No-Fibers | 0.727 | 0.743 | 0.736 | 0.743 | 0.167
Fibers    | 0.825 | 0.890 | 0.927 | 0.933 | 1.000
Table 4: Accuracy rate of the LeNet and the Inception-ResNet-v2 for the GISAXS database
using different k values, where Ω is the number of images of the class.

LeNet
k     | 1     | 5     | 10    | 20    | Ω
bcc   | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
fcc   | 1.000 | 1.000 | 1.000 | 1.000 | 0.999
cubic | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
hcp   | 1.000 | 1.000 | 1.000 | 1.000 | 1.000

Inception-ResNet-v2
bcc   | 1.000 | 0.996 | 0.990 | 0.980 | 0.974
fcc   | 0.998 | 1.000 | 1.000 | 1.000 | 0.998
cubic | 1.000 | 0.998 | 1.000 | 0.998 | 0.972
hcp   | 1.000 | 1.000 | 1.000 | 1.000 | 0.996

Inception-ResNet-v2 pre-trained
bcc   | 0.980 | 0.975 | 0.972 | 0.959 | 0.599
fcc   | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
cubic | 0.993 | 0.995 | 0.995 | 0.991 | 0.993
hcp   | 0.995 | 0.996 | 0.997 | 0.995 | 0.698
Table 5: Accuracy rate of the LeNet and the Inception-ResNet-v2 for the Cells database using
different k values, where Ω is the number of images of the class.

LeNet
k        | 1     | 5     | 10    | 20    | Ω
Normal   | 0.962 | 0.970 | 0.975 | 0.977 | 0.969
Abnormal | 0.969 | 0.979 | 0.983 | 0.981 | 0.984

Inception-ResNet-v2
Normal   | 0.938 | 0.932 | 0.935 | 0.924 | 0.810
Abnormal | 0.971 | 0.985 | 0.991 | 0.986 | 0.984

Inception-ResNet-v2 pre-trained
Normal   | 0.846 | 0.861 | 0.857 | 0.851 | 0.728
Abnormal | 0.893 | 0.929 | 0.947 | 0.952 | 0.957
(sort all images of the database given a query). We also computed the time for
the 1st (before LSH creation) and 2nd execution (after LSH creation).
Although the LSH-based module computes 32-bit hashes, the computational
cost also increases dramatically with the dimension of the input vector; there-
fore, we use only the most significant principal components. Our motivations
for keeping only the 16 most significant PCA components are: (a) this subset of
components explains more than 98% of the data variance for each scientific domain
under investigation; (b) our results showed that the Mean Average Precision
remains constant around 16 components, as illustrated in Figure 6; (c) this rep-
resentation is more than 3 times faster than using the raw signatures with the
Inception-ResNet-v2, as Table 7 illustrates.
All computational experiments involving pyCBIR ran on a Deep Learning Ma-
chine (DevBox) with six cores of Intel Xeon E5-2643 @ 3.40 GHz, four graphics
processors GeForce GTX Titan-X and 251 GB memory. However, we have been
able to run pyCBIR on standard laptops as well, albeit at much higher com-
puting times and restricted to smaller subsets. Software dependencies include
Ubuntu 14.04, CUDA 7.5, cuDNN v4, Google TensorFlow v0.11.0 and Python
3.5.2 through Anaconda 4.1.1. Also, pyCBIR relies on an assortment of packages
Table 6: Accuracy rate of the LeNet and the Inception-ResNet-v2 for the FMD using different
k values, where Ω is the number of images of the class.

LeNet
k       | 1     | 5     | 10    | 20    | Ω
Fabric  | 0.140 | 0.200 | 0.200 | 0.280 | 0.320
Foliage | 0.240 | 0.360 | 0.440 | 0.440 | 0.360
Glass   | 0.120 | 0.200 | 0.260 | 0.140 | 0.040
Leather | 0.140 | 0.300 | 0.200 | 0.080 | 0.100
Metal   | 0.360 | 0.540 | 0.540 | 0.480 | 0.520
Paper   | 0.200 | 0.280 | 0.240 | 0.340 | 0.260
Plastic | 0.140 | 0.160 | 0.220 | 0.300 | 0.280
Stone   | 0.200 | 0.340 | 0.320 | 0.280 | 0.340
Water   | 0.220 | 0.340 | 0.320 | 0.280 | 0.280
Wood    | 0.500 | 0.700 | 0.760 | 0.700 | 0.700

Inception-ResNet-v2
Fabric  | 0.80  | 0.88  | 0.88  | 0.90  | 0.88
Foliage | 0.96  | 0.90  | 0.90  | 0.92  | 0.92
Glass   | 0.82  | 0.86  | 0.82  | 0.78  | 0.80
Leather | 0.90  | 0.86  | 0.82  | 0.82  | 0.82
Metal   | 0.70  | 0.76  | 0.70  | 0.70  | 0.66
Paper   | 0.86  | 0.92  | 0.92  | 0.94  | 0.92
Plastic | 0.66  | 0.60  | 0.58  | 0.54  | 0.50
Stone   | 0.78  | 0.80  | 0.80  | 0.76  | 0.74
Water   | 0.94  | 0.94  | 0.94  | 0.94  | 0.94
Wood    | 0.82  | 0.86  | 0.90  | 0.94  | 0.92

Inception-ResNet-v2 pre-trained
Fabric  | 0.680 | 0.800 | 0.820 | 0.760 | 0.740
Foliage | 0.760 | 0.760 | 0.740 | 0.800 | 0.700
Glass   | 0.840 | 0.900 | 0.900 | 0.860 | 0.900
Leather | 0.520 | 0.680 | 0.620 | 0.600 | 0.460
Metal   | 0.920 | 0.900 | 0.860 | 0.860 | 0.820
Paper   | 0.520 | 0.620 | 0.600 | 0.700 | 0.740
Plastic | 0.620 | 0.780 | 0.840 | 0.860 | 0.820
Stone   | 0.660 | 0.740 | 0.600 | 0.560 | 0.620
Water   | 0.780 | 0.800 | 0.800 | 0.700 | 0.560
Wood    | 0.860 | 0.880 | 0.880 | 0.840 | 0.820
Table 7: Time in seconds to retrieve all database images given a query using the LeNet
and the Inception-ResNet-v2. In the PCA results we used 16 components.

Method              | Run | Fibers | GISAXS | Cells | FMD
LeNet               | 1st | 12.08  | 61.42* | 0.020 | 0.015
LeNet               | 2nd | 10.35  | 61.75* | 0.017 | 0.008
Inception-ResNet-v2 | 1st | 19.41* | 75.11* | 0.130 | 0.028
Inception-ResNet-v2 | 2nd | 19.63* | 75.17* | 0.032 | 0.009
Using PCA           | 1st | 10.27  | 22.29  | 0.017 | 0.009
Using PCA           | 2nd | 9.28   | 19.75  | 0.014 | 0.005

* Uses brute-force search; "1st" is the first execution and "2nd" the second
execution and beyond.
within the Python ecosystem, such as: numpy, scipy, scikit-learn, scikit-image,
PyQt5 and matplotlib.
7. Conclusions and future work
Being able to detect material properties in real time will add an entirely
new level of experimental capability, including triage, quality control and prior-
itization. Tying this capability to the control systems at imaging instruments,
such as at synchrotron beamlines, promises to enable scientists to automatically
steer the machine in response to specific structures present in the sample with
minimum human interference.
Visual exploration of image microstructures drives many tasks performed
by cyto-pathologists and material scientists, who are able to manually curate
just a small portion of the collected experimental data. This paper showed how
our new data-driven recommendation system leveraged such curated datasets to
provide functions to automatically organize catalogs of scientific images. Taking
into account that human curated data is often limited, we also designed image
augmentation routines that allow increasing the number of samples following
typical image transformations. Next, the underlying inferential engine ranks
images using CNN for multiple data representations, and allows fast retrieval of
the top matches within each particular image set. The algorithms behind our
software tool pyCBIR enable optimization of key parameters to control perfor-
mance, such as the number of CNN layers, CNN epochs, and image sizes.
Our results showed the importance of feature reduction in the search pro-
cess, and they indicate a promising direction for system improvement. Current
limitations are the restriction of the LSH Forest option to retrieval experiments
using the Cell and FMD databases, and brute force otherwise. Future work will
include more scalable hashing mechanisms that circumvent the maximum size of
the search tree available in scikit-learn. Another challenge will be to expand
the automated data curation capability to also extrapolate metadata to unseen
samples using visual attributes combined with natural language processing.
8. Acknowledgments
This work was supported by the Office of Science, of the U.S. Department
of Energy (DOE) under Contract No. DE-AC02-05CH11231, the Moore-Sloan
Foundation, CNPq (304673/2011-0, 472565/2011-7, 401120/2013-9, 401442/2014-4,
444784/2014-4, 306600/2016-1) and Fapemig (APQ-00802-11). This work
is partially supported by the DOE Advanced Scientific Computing Research
(ASCR) Early Career Award, and partially supported by the Center for Ap-
plied Mathematics for Energy Research Applications (CAMERA), which is a
partnership between Basic Energy Sciences (BES) and ASCR within DOE. We
are especially grateful to Fernando Perez for encouraging the exploration of Python
packages and the deployment of open-source code. Any opinions, findings, and con-
clusions or recommendations expressed in this material are those of the authors
and do not necessarily reflect the views of DOE or the University of California.
References
Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado,
G. S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A.,
Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Leven-
berg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M.,
Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V.,
Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke,
M., Yu, Y., Zheng, X., 2015. TensorFlow: Large-scale machine learning on
heterogeneous systems. Software available from tensorflow.org.
URL http://tensorflow.org/
Alegro, M., Amaro-Jr, E., Loring, B., Heinsen, H., Alho, E., Zollei, L., Ushizima,
D., Grinberg, L. T., 2016. Multimodal whole brain registration: Mri and high
resolution histology. In: Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition Workshops. pp. 194–202.
Altman, N. S., 1992. An introduction to kernel and nearest-neighbor nonpara-
metric regression. The American Statistician 46 (3), 175–185.
Bale, H. A., Haboub, A., et al., 2012. Real-time quantitative imaging of failure
events in materials under load at temperatures above 1,600 °C. Nat Mater 12,
40–46.
Bawa, M., Condie, T., Ganesan, P., 2005. LSH forest: Self-tuning indexes for
similarity search. In: Fourteenth International World Wide Web Conference
(WWW 2005).
Bell, S., Upchurch, P., Snavely, N., Bala, K., 2014. Material recognition in the
wild with the materials in context database. CoRR abs/1412.0623.
Bethel, W., Greenwald, M., Nowell, L., 2015. Management, visualization, and
analysis of experimental and observational data (EOD) - the convergence of
data and computing. In: DOE ASCR Workshop. DOE, pp. 2–30.
Chourou, S., Sarje, A., Li, X., Chan, E., Hexemer, A., 2013. Hipgisaxs: A high
performance computing code for simulating grazing incidence x-ray scattering
data. Journal of Applied Crystallography (6), 1781–1795.
Cimpoi, M., Maji, S., Kokkinos, I., Mohamed, S., Vedaldi, A., 2014. Describing
textures in the wild. In: Proceedings of the 2014 IEEE Conference on Com-
puter Vision and Pattern Recognition. CVPR ’14. IEEE Computer Society,
Washington, DC, USA, pp. 3606–3613.
Donatelli, J., Haranczyk, M., Hexemer, A., Krishnan, H., Li, X., Lin, L., Maia,
F., Marchesini, S., Parkinson, D., Perciano, T., Shapiro, D., Ushizima, D.,
Yang, C., Sethian, J., 2015. Camera: The center for advanced mathematics
for energy research applications. Synchrotron Radiation News 28 (2), 4–9.
Eckstein, M. P., 2011. Visual search: A retrospective. Journal of Vision 11 (5),
14.
Evans, D., 2016. The internet of things: how the next evolution of the internet
is changing everything. http://www.cisco.com/c/dam/en_us/about/ac79/
docs/innov/IoT_IBSG_0411FINAL.pdf, accessed on Dec 13, 2016.
GE Digital, 2016. Predix: the industrial internet plat-
form. https://www.ge.com/digital/sites/default/files/
predix-platform-brief-ge- digital.pdf, accessed on Dec 13, 2016.
Goodfellow, I., Bengio, Y., Courville, A., 2016. Deep Learning. MIT Press.
URL http://www.deeplearningbook.org
Guo, Y., Liu, Y., Oerlemans, A., Lao, S., Wu, S., Lew, M. S., 2016. Deep
learning for visual understanding: A review. Neurocomputing 187, 27–48.
Hexemer, A., 2016. HipGISAXS: A massively-parallel high-performance x-ray
scattering data analysis code. "http://www.camera.lbl.gov/gisaxs", ac-
cessed on Dec 13, 2016.
Hexemer, A., Müller-Buschbaum, P., 2015. Advanced grazing-incidence tech-
niques for modern soft-matter materials analysis. IUCrJ 2 (1), 106–125.
Hirata, K., Kato, T., 1992. Query by visual example - content based image
retrieval. In: Proceedings of the 3rd International Conference on Extend-
ing Database Technology: Advances in Database Technology. EDBT ’92.
Springer-Verlag, London, UK, pp. 56–71.
Hong, C., Yu, J., Wan, J., Tao, D., Wang, M., Dec 2015. Multimodal deep
autoencoder for human pose recovery. IEEE Trans. Image Process. 24 (12),
5659–5670.
Inca, 2016. Instituto Nacional de Cancer. http://www2.inca.gov.br/wps/
wcm/connect/tiposdecancer/site/home/colo_utero visited on 2017-11-
17.
Indyk, P., Motwani, R., 1998. Approximate nearest neighbors: Towards remov-
ing the curse of dimensionality. In: Proceedings of the Thirtieth Annual ACM
Symposium on Theory of Computing. STOC ’98. ACM, New York, NY, USA,
pp. 604–613.
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadar-
rama, S., Darrell, T., 2014. Caffe: Convolutional architecture for fast feature
embedding. arXiv preprint arXiv:1408.5093.
Jones, E., Oliphant, T., Peterson, P., et al., 2001. SciPy: Open source scientific
tools for Python. http://www.scipy.org/, accessed on Dec 13, 2016.
JR Raphael, 2015. How google photos new custom-labeling feature can
help clean up your collection. https://www.computerworld.com/article/
2988232/android/google-photos-custom-labeling.html, accessed on
Oct 15, 2017.
Kang, Z., Peng, C., Cheng, Q., 2016. Top-n recommender system via matrix
completion. In: Proceedings of the Thirtieth AAAI Conference on Artificial
Intelligence (AAAI-16).
Kato, T., 1992. Database architecture for content-based image retrieval. In:
Proc. of SPIE Image Storage and Retrieval Systems. Vol. 1662. San Jose,
CA, USA, pp. 112–123.
Lecun, Y., Bottou, L., Bengio, Y., Haffner, P., 1998. Gradient-based learning
applied to document recognition. In: Proceedings of the IEEE. pp. 2278–2324.
Liu, C., Sharan, L., Adelson, E. H., Rosenholtz, R., 2010. Exploring features in
a bayesian framework for material recognition. In: CVPR. IEEE Computer
Society, pp. 239–246.
Lowy, I., 2011. A Woman’s Disease: The history of cervical cancer. Oxford.
Lu, Z., Carneiro, G., Bradley, A. P., Ushizima, D., Nosrati, M. S., Bianchi,
A. G. C., Carneiro, C. M., Hamarneh, G., March 2017. Evaluation of three
algorithms for the segmentation of overlapping cervical cells. IEEE Journal
of Biomedical and Health Informatics 21 (2), 441–450.
Miikkulainen, R., 2010. Topology of a Neural Network. Springer US, Boston,
MA, pp. 988–989.
Nci, 2017. National Cancer Institute. http://www.cancer.gov/types/
cervical/hp/cervical-screening-pdq visited on 2017-11-17.
Ministry of Health, 2016. Brazilian Unified Health System. http://portalsaude.
saude.gov.br/index.php/cidadao/principal/english, accessed on Dec
13, 2016.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang,
Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A. C., Fei-Fei, L., 2015.
ImageNet Large Scale Visual Recognition Challenge. International Journal of
Computer Vision (IJCV) 115 (3), 211–252.
Shamoi, P., Inoue, A., Kawanaka, H., 2015. Deep color semantics for e-commerce
content-based image retrieval. In: Conf. Fuzzy Logic in Artificial Intelligence.
pp. 14–20.
Sharan, L., Rosenholtz, R., Adelson, E. H., 2014. Accuracy and speed of material
categorization in real-world images. Journal of Vision 14 (9), 12.
Szegedy, C., Ioffe, S., Vanhoucke, V., 2016. Inception-v4, inception-resnet and
the impact of residual connections on learning. Computing Research Reposi-
tory abs/1602.07261.
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z., 2015. Rethinking
the inception architecture for computer vision. Computer Vision and Pattern
Recognition.
Ushizima, D., Perciano, T., Krishnan, H., Loring, B., Bale, H., Parkinson, D.,
Sethian, J., Oct. 2014. Structure recognition from high resolution images of
ceramic composites. IEEE International Conference on Big Data.
Ushizima, D. M., Bale, H. A., Bethel, E. W., Ercius, P., Helms, B. A., Krish-
nan, H., Grinberg, L. T., Haranczyk, M., Macdowell, A. A., Odziomek, K.,
Parkinson, D. Y., Perciano, T., Ritchie, R. O., Yang, C., Sep 2016. Ideal:
Images across domains, experiments, algorithms and learning. The Journal of
The Minerals, Metals & Materials Society, 1–10.
van den Broek, E. L., van Rikxoort, E. M., Schouten, T. E., 2005. Human-
Centered Object-Based Image Retrieval. Springer Berlin Heidelberg, Berlin,
Heidelberg, pp. 492–501.
van der Walt, S., Schönberger, J. L., Nunez-Iglesias, J., Boulogne, F., Warner,
J. D., Yager, N., Gouillart, E., Yu, T., the scikit-image contributors, 6 2014.
scikit-image: image processing in Python. PeerJ 2, e453.
Wan, J., Wang, D., Hoi, S. C., Wu, P., Zhu, J., Zhang, Y., Li, J., 2014. Deep
learning for content-based image retrieval: A comprehensive study. In: Pro-
ceedings of the ACM International Conference on Multimedia, MM ’14, Or-
lando, FL, USA. pp. 157–166.
Wang, B., Brown, D., Gao, Y., Salle, J. L., 2015. March: Multiscale-arch-height
description for mobile retrieval of leaf images. Information Sciences 302, 132
– 148.
Yu, Q., Liu, F., Song, Y.-Z., Xiang, T., Hospedales, T., Loy, C. C., 2016. Sketch
me that shoe. In: Computer Vision and Pattern Recognition. pp. 799–807.
Zeng, Y., Xu, X., Fang, Y., Zhao, K., 2015. Traffic Sign Recognition Using
Deep Convolutional Networks and Extreme Learning Machine. Springer In-
ternational Publishing, Cham, pp. 272–280.
Zhang, L., Shum, H. P. H., Shao, L., 2016. Discriminative semantic subspace
analysis for relevance feedback. IEEE Trans. Image Processing 25 (3), 1275–
1287.
Zhang, R., Lin, L., Zhang, R., Zuo, W., Zhang, L., Dec 2015a. Bit-scalable deep
hashing with regularized similarity learning for image retrieval and person
re-identification. IEEE Trans. Image Process. 24 (12), 4766–4779.
Zhang, Y., Ozay, M., Liu, X., Okatani, T., 2015b. Integrating deep features for
material recognition. Computing Research Repository abs/1511.06522.
Appendix A. Graphical user interface for reverse image search
pyCBIR also provides a graphical user interface (GUI), shown in Figure 7. The main advantages of the pyCBIR GUI are: (a) the visual output shows both correct and misclassified results when ground truth is available; (b) the user can choose feature extraction methods other than CNNs, which require neither training nor labeled samples; (c) pyCBIR offers ten different similarity metrics for evaluating the search results; (d) the database can be loaded from a comma-separated values (CSV) file or by simply querying the file system. As a result, pyCBIR displays the k-ranked outputs (k = 10 in Figure 7), with the first column showing the query images. Each retrieved image is framed by a bounding box: green for a correctly retrieved image and red for a misclassified one. For each execution, pyCBIR saves the result as a portable network graphics (PNG) file, as Figures 8, 9 and 10 show. These figures illustrate retrievals with LeNet on the fibers and GISAXS databases and with Inception-ResNet-v2 on the fmd database, each for six query images chosen at random with their corresponding top-6 ranked outputs.
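To make the pipeline behind this interface concrete, the sketch below outlines one way such a query could be scripted. It is a minimal illustration under stated assumptions, not the pyCBIR implementation: the CSV catalog format (one path,label row per image), the file names database.csv and queries.csv, and the ImageNet-pretrained ResNet50 backbone with cosine-similarity ranking are all stand-ins for pyCBIR's own networks and its ten similarity metrics.

```python
# Minimal sketch of a CBIR query: featurize a labeled image catalog,
# rank the database by similarity to each query, and save the ranked
# grid as a PNG with green/red borders for correct/misclassified hits.
# Illustrative only -- not the pyCBIR API. Assumes a CSV of "path,label"
# rows and a pretrained ImageNet backbone as a stand-in feature extractor.
import csv

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf


def load_catalog(csv_path):
    """Read (image path, class label) pairs from a CSV file."""
    with open(csv_path) as f:
        rows = [r for r in csv.reader(f) if r]
    return [r[0] for r in rows], [r[1] for r in rows]


def featurize(paths, size=(224, 224)):
    """Embed images with a pretrained CNN (global-average-pooled features)."""
    model = tf.keras.applications.ResNet50(include_top=False, pooling="avg")
    batch = np.stack([
        tf.keras.applications.resnet50.preprocess_input(
            np.asarray(tf.keras.utils.load_img(p, target_size=size),
                       dtype=np.float32))
        for p in paths])
    return model.predict(batch, verbose=0)


def rank_top_k(query_feats, db_feats, k=10):
    """Cosine-similarity ranking; pyCBIR exposes several other metrics."""
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    db = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    return np.argsort(-(q @ db.T), axis=1)[:, :k]


def save_grid(q_paths, q_labels, db_paths, db_labels, ranks, out="result.png"):
    """Query image in column 0, then ranked hits; border color marks class."""
    n, k = ranks.shape
    fig, axes = plt.subplots(n, k + 1, figsize=(2 * (k + 1), 2 * n),
                             squeeze=False)
    for i in range(n):
        axes[i, 0].imshow(plt.imread(q_paths[i]))
        for j, idx in enumerate(ranks[i], start=1):
            axes[i, j].imshow(plt.imread(db_paths[idx]))
            color = "green" if db_labels[idx] == q_labels[i] else "red"
            for spine in axes[i, j].spines.values():
                spine.set_edgecolor(color)
                spine.set_linewidth(4)
    for ax in axes.ravel():
        ax.set_xticks([])
        ax.set_yticks([])
    fig.savefig(out, dpi=150)


if __name__ == "__main__":
    db_paths, db_labels = load_catalog("database.csv")  # hypothetical files
    q_paths, q_labels = load_catalog("queries.csv")
    ranks = rank_top_k(featurize(q_paths), featurize(db_paths), k=10)
    save_grid(q_paths, q_labels, db_paths, db_labels, ranks)
```

Swapping the metric, for instance replacing the cosine ranking in rank_top_k with a Euclidean distance, follows the same pattern; this is essentially the choice the GUI's search-method option exposes.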
Figure 7: pyCBIR interface: retrieval options (left) with feature extraction, search method, retrieval number, and data paths; retrieval results (right) with query images (first column) and top matches. A green border indicates a correct match; a red border, a misclassified image.
Figure 8: Results using LeNet and the fibers database for 6 query images chosen randomly and their corresponding top-6 ranked outputs.
Figure 9: Results using LeNet and the GISAXS database for 6 query images chosen randomly and their corresponding top-6 ranked outputs.
Figure 10: Results using Inception-ResNet-v2 and the fmd database for 6 query images chosen randomly and their corresponding top-6 ranked outputs.