Automated Microfossil Identification and Segmentation Using a Deep Learning Approach

Carvalho, L.E.1,*, Fauth, G.3, Baecker Fauth, S.3, Krahl, G.3, Moreira, A.C.4, Fernandes, C.P.4, von Wangenheim, A.2

1 Graduate Program in Computer Science - Federal University of Santa Catarina - Campus João David Ferreira Lima - Trindade - Department of Informatics and Statistics - Room 320, Florianópolis/SC – CEP 88040-97.
2 Image Processing and Computer Graphics Lab - National Brazilian Institute for Digital Convergence.
3 itt Fossil – Instituto Tecnológico de Micropaleontologia, Universidade do Vale do Rio dos Sinos, Av. UNISINOS, 950, 93022-000, São Leopoldo, RS, Brazil.
4 Graduate Program in Materials Science and Engineering, Federal University of Santa Catarina, Florianópolis, SC, Brazil.

* lcarvalho@incod.ufsc.br
Abstract
The applicability of computational analysis to paleontological images ranges from the study of the evolution of animals, plants and microorganisms to the simulation of the habitat of living beings of a given epoch. It can also be applied in several niches, such as oil exploration, where several factors must be analyzed in order to minimize the expenses related to the oil extraction process. One such factor is the characterization of the environment to be explored. This analysis can occur in several ways: use of probes, extraction of samples for the evaluation of petrophysical components, correlation with logs of other drilling wells, and so on. In the sample extraction context, Computed Tomography (CT) is important because it preserves the sample and makes it available for several analyses. Based on the 3D images generated by CT, several analyses and simulations can be performed, and processes that are currently performed manually and exhaustively can be automated. In this work we propose and validate a method for fully automated microfossil identification and extraction. The proposed pipeline begins with the scanning process and ends with an identification process. For the identification a Deep Learning approach was developed, which resulted in a high rate of correct microfossil identification (98% Intersection over Union). The validation was performed both through an automated quantitative analysis, based upon ground truths generated by specialists in the micropaleontology field, and through visual inspection by these specialists. We also present the first fully annotated, publicly available dataset of MicroCT-acquired microfossils.
Introduction
The applicability of computational image analysis to paleontological data encompasses the possibility of identifying, reconstructing and visualizing microfossils in rock samples not recovered by traditional extraction methodologies. It can also allow the taxonomic identification of microfossils even before their physical extraction from the rock sample. In addition, it is pertinent to verify the microfossil position in a given sedimentary stratum, which can help in taxonomic inference, whereas detailed positional information is lost in the traditional preparation method [8]. Computational analysis of samples can be applied in several niches, e.g. oil exploration, habitat reconstruction, and geology and paleontology research.
In the oil exploration field, there are many factors to be taken into consideration in order to minimize oil prospecting expenses. One factor is the environmental conditions, which can be analyzed in multiple ways: use of probes, extraction of samples for petrophysical components evaluation, and correlation with logs from other drilling wells.
In the area of sample extraction it is possible to perform different analyses on a given sample. Here Computed Tomography (CT) plays a central role. More specifically, samples are analyzed with X-ray micro-tomography (MicroCT), a radiographic imaging technique that produces 3D images of the material's internal structure with a spatial resolution of around 1 micrometer [11]. MicroCT is of significance because it preserves the sample and makes it available for different studies. Based on MicroCT-generated data volumes, various 3D data analyses and simulations can be performed, and several analysis processes can be computationally carried out and automated using state-of-the-art Computer Vision (CV) techniques. These processes are currently performed manually and in a time-consuming manner. One of the processes that can undergo automation through CV is the identification and localisation of microfossils in rock samples, which is the focus of this study.
Objective and Strategy
In this work, we propose a CV workflow composed of computational methods that starts with the MicroCT scanning process of a sample and ends with the fully automated identification and localisation of individual microfossils in this sample. The main research question we try to answer is: Is it possible to fully automatically and reliably identify microfossils in carbonatic rock samples?
The novelty of our approach is the use of Deep Learning Convolutional Neural Network (CNN) techniques for the identification and 3D segmentation of microfossils directly in their deposition place. Our approach works directly on MicroCT data acquired from carbonatic rocks, without the need for any preparation or physical extraction. For this purpose we developed an identification and segmentation strategy that employs a special category of CNN models, namely Semantic Segmentation (SS) neural networks, and extends these models in order to be able to process whole 3D MicroCT sample volumes. In order to identify the best model, we extend, train, test and compare a series of different state-of-the-art SS models. To validate our approach, we compare our results against ground truths that were manually generated by experienced micropaleontologists, employing state-of-the-art automated image segmentation validation algorithms.
State of the Art
Paleontology is a well-established science and its methodological intersection with the computational field started to grow in the 1990s. In the late 1980s, most main paleontology journals still showed an irregular presence of computational methods: some journal issues contained one article describing a computational method application, others presented two or three articles, and very few offered a larger number of them [24]. In the majority of journals and books the insertion of computational methods in the paleontology field still looked uneven.
In the late 1990s, however, with the widespread use of medical CT, a growth in research activities employing tomographic images occurred [24]. This boosted the development of specialized software applications such as Drishti, SPIERS, Seg3D, ImageJ, Mimics, VGStudio Max, Avizo, Amira, Geomagic, Rhinoceros, Imaris, ITK-SNAP and TurtleSEG. These specialized tools helped change how researchers deal with specific problems in several fields, including geology and paleontology, frequently with applications to oil and gas exploration. The applicability of the set of tools and techniques that came to be called Virtual Paleontology (VP) ranges from the analysis of animal, plant and microorganism evolution to the virtual reconstruction of a specific extinct environment [23].
On the other hand, the application of the study of microfossils to oil prospection first appeared in 1890 in Poland [21], but it was in the USA, in 1920, with the use of microfossils to identify the age of probes extracted from drilling rigs, that a bigger advance in the development of the field of Applied Micropaleontology was attained [14].
In the last decade, multiple research works have contributed to improving the micropaleontology field. The latest efforts aim at the use of VP associated with CNNs in order to identify microfossils [5]. With this in mind, our research focuses on techniques that can identify microfossils in their deposition place, i.e., without the need for previous physical isolation. For this purpose we investigated CV fields such as 3D segmentation applied to tomographic images and 3D object recognition, in order to apply them to microfossil identification.
In the next subsections we summarize the results of the systematic literature reviews (SLR) we performed in order to identify the state of the art of the methods and procedures that could potentially be used in microfossil image studies. These reviews followed the approach originally proposed by [10] for SLRs in Computer Science, where we first defined a research question: Is it possible to fully automatically and reliably identify microfossils in carbonatic rock samples? In order to be more manageable, this broad question was split into two topics, each of which was explored in depth in a separate SLR:

- analysis of 3D segmentation methods applied to tomographic images, which could possibly be used to segment microfossils [3];
- analysis of methods used for 3D object recognition in a general context, aiming to evaluate which methods could be applied to the microfossil field [4].

The results of these two SLRs are briefly summarized below. Since a detailed description would exceed the scope of this paper, we refer to the referenced SLRs for more details.
3D segmentation applied to tomographic images and 3D object recognition

An initial analysis of image processing methods employed in the fossil identification area showed the difficulty of finding any works that explore microfossils, so we generalized our search to methods in other, similar areas. We started by performing a systematic literature review on 3D segmentation methods applied to tomographic images [3]. Several works were analyzed, comprising a vast group of segmentation methods. In our review, we noticed a tendency towards the use of 3D segmentation methods based on models and region growing. However, their use for fossil/microfossil segmentation was not found in the literature.
We also analysed the field of 3D object recognition employing the same SLR methodology [4]. In this SLR on 3D object recognition we could identify two general pipelines. Both pipelines start with data acquisition, which can vary between 3D data (MRI, CT) and 2D data (RGB and RGB-D cameras), followed by pre-processing, where methods for artifact removal, image enhancement and image simplification are applied, and data representation, for which several authors proposed a variety of different object representations. Then comes the stage where the two pipelines differ: in the first pipeline, the data representation stage is used to describe and store the chosen object representation, which is later used for similarity calculation and object identification; in the second pipeline, the data representation is employed to train a specific recognition architecture, such as a CNN, which is afterwards used to recognize other objects. Despite having found two general approaches for 3D object recognition, we could not identify, in our review, the application of these approaches to fossil identification.
Deep learning, object recognition and paleontology

The 3D object recognition area has, in the last few years, experienced a growth boosted by the increased availability of new algorithms and models, of 3D data and by the popularization of a varied palette of 3D sensors. Methods developed in this area find application in a wide range of areas, from the field of robotics to the security and surveillance domain. The general tendency in this area has been the use of Deep Learning (DL) techniques.
DL is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts [6]. DL employs very deep CNNs, with neural networks that sometimes consist of more than 100 layers, in contrast to the Artificial Neural Networks (ANNs) employed between the 1980s and 2000s, which typically employed only three layers. One key concept here is the Convolutional Layer (CL), a feature extraction structure, first presented in [12], that allows the hierarchical learning and representation of complex knowledge. Because DL CNNs gather knowledge from examples, there is no need for a human operator to formally specify all the knowledge that the computer needs. The capacity to represent a hierarchy of concepts in a network dozens of CLs deep allows a DL CNN to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep [6].
One work that employs DL for object recognition is the 3D Object Recognition with Deep Belief Nets approach [15], which presents a network of symmetrically connected neuron-like units that perform stochastic decisions about whether to be on or off. In Convolutional-Recursive Deep Learning for 3D Object Classification, Socher [22] presents a model based on the combination of convolutional and recursive neural networks for feature learning and classification in RGB-D images. Another approach is the Vision-based Robotic Grasping System Using Deep Learning for 3D Object Recognition and Pose Estimation, where Yu [30] presents a robotic vision-based system that can not only recognize different objects but also estimate their pose through a deep learning model, namely a Max-pooling Convolutional Neural Network (MPCNN). The 3D Object Recognition and Pose Estimation System Using Deep Learning Method is an approach where Liang [13] presents a 3D object recognition and pose estimation method using a deep learning model. In Recognizing multi-view objects with occlusions using a deep architecture, a method for efficient 3D object recognition under occlusion is presented [26]. In Convolutional neural network for 3D object recognition based on RGB-D dataset, Wang [25] employs a convolutional neural network model to learn features from an RGB-D dataset, which are then given to a linear SVM to classify objects. In Convolutional Neural Network for 3D object recognition using volumetric representation, Xu [29] presents an efficient 3D object volumetric representation, called Volumetric Accelerator (VOLA), which requires much less memory than a regular volumetric representation and can therefore reduce the computational complexity of CNNs. None of these approaches tackles the problem of identifying fossils embedded in rocks, or any remotely similar problem.
Material and Methods

This section describes our datasets and the CV approach we developed for fully automated microfossil identification and segmentation in carbonatic rock samples.
Figure 1. Sergipe Basin map with the exact rock sample extraction location marked with a red cross. The sample was collected at a depth of approximately 2,500 meters. Source: the authors.
Material
We employed two datasets: a scanned carbonatic rock sample obtained from a drilling rig probe and a set of manually isolated microfossil specimens that were afterwards obtained from this sample. The sample was collected from the Sergipe Basin Quaternary sediments (Fig. 1):
- A carbonatic rock sample was the material for which we developed our CV approach. The MicroCT scanner used to digitise the sample is a Versa XRM-500 (ZEISS/XRadia) with the following specifications: best resolution (pixel size) 0.7 µm, voltage 30-160 kV, power 2-10 W, CCD camera 2048x2048 pixels, optical lenses 0.4X, 4X, 10X, 20X and 40X, a set of 12 filters for beam hardening correction, maximum sample mass capacity 15 kg and sample size limit (diameter/height) 80/300 mm. The sample acquisition parameters we employed were: spatial resolution 1.08 µm, image size 956x1004x983, no filtering for beam hardening correction, 10X optical lens, 30 kV / 2 W, angular step 0.255 (1600 projections) and exposure time 11 seconds. Figure 2 shows the rock sample and an excerpt of one slice of its digitised result.

- A set of manually isolated microfossil specimens, obtained from the sample above, was employed in this work for illustration purposes and as a guide, in order to allow us to know how the specimens in the rock sample would look if cleanly segmented. These microfossils were prepared in the laboratory following specific precautions so that there were no chemical and/or mechanical changes: (i) the sediment was first immersed in deionized water for approximately 24 hours, aiming at its chemical disaggregation; (ii) it was then washed with running water through a 63 µm sieve; (iii) next, the material was dried at 40 °C for approximately 48 hours. After drying the samples, the main representative microfossils in the sample were selected under a magnifying glass. In this work, the microfossil specimens were imaged with the help of a multidimensional acquisition with the Zeiss Discovery V20 stereoscope (Z-stack mode in the AxioVision 4.8 software). Figure 3 presents these microfossils.
The dataset containing the MicroCT data and the manually segmented images annotated by specialists is available at: http://www.lapix.ufsc.br/microfossil-segmentation
Figure 2. Analyzed rock sample (A) and one of its microtomography 2D section images (B). Source: the authors.
Figure 3. Analyzed foraminifera specimens. Planktonic foraminifera: 1) Globigerinoides ruber; 2a-b) Candeina nitida; 3) Orbulina universa; 4) Globigerinoides trilobus saculifera; 5) Globigerinoides trilobus; 6a-b) Globorotalia truncatulinoides. Genera of benthic foraminifera: 1) Bulinina; 2a-c) Bolivinita; 3a-c) Cibicidoides; 4) Laticarinina; 5) Uvigerina; 6) Sphaeroidina; 7) Siphonaperta; 8) Quinqueloculina. Source: the authors.
Methods

The CV approach we present here is intended to be embedded into a broader workflow. Figure 4 presents a general overview of this workflow.

Figure 4. General workflow. Source: the authors and [18].
Non-CNN Computer Vision Methods

As part of a prospective search for CV methods for microfossil segmentation, before we started investigating the use of CNNs, we performed a series of experiments using non-CNN, i.e. conventional, CV methods for the segmentation of the MicroCT volume. We analysed an extensive list of conventional CV algorithms, searching for a segmentation algorithm which, with the most appropriate input parameters, would potentially generate satisfactory results. We identified as interesting and selected the following classical segmentation algorithms: active contours [9], simple thresholding and Otsu thresholding [16], all taking into account the complete tomographic volume. In order to find the best possible parameters for each segmentation algorithm, we performed a broad parameter search, running the algorithms with varied parameter sets. For the active contour algorithm, we employed a genetic algorithm to search through the possible input parameters. For this purpose we considered five input parameters, each over a broad range of values: number of steps, sigma, alpha, smoothing and theta. A sketch of this kind of parameter exploration is shown below.
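To illustrate the parameter exploration described above, the following sketch uses a plain random search in place of the genetic algorithm we employed; the segmentation and scoring calls (run_active_contour, evaluate_iou) are hypothetical placeholders, not functions from our implementation.

```python
# Hedged sketch: random search over active-contour parameters.
import random

param_space = {
    "steps":     range(50, 501, 50),
    "sigma":     [0.5, 1.0, 2.0, 4.0],
    "alpha":     [0.01, 0.1, 0.5, 1.0],
    "smoothing": [1, 2, 3, 4],
    "theta":     [0.1, 0.3, 0.5, 0.7],
}

def sample_params():
    """Draw one random parameter combination from the search space."""
    return {k: random.choice(list(v)) for k, v in param_space.items()}

best_iou, best_params = -1.0, None
for _ in range(100):                       # budgeted number of trials
    params = sample_params()
    # mask = run_active_contour(volume, **params)   # hypothetical segmentation call
    # iou = evaluate_iou(mask, ground_truth)        # hypothetical scoring call
    iou = random.random()                  # stand-in score so the sketch runs as-is
    if iou > best_iou:
        best_iou, best_params = iou, params
print(best_params, round(best_iou, 3))
```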
The results of these conventional CV algorithms were initially analysed through visual inspection. For the conventional CV method that presented the best results under visual inspection, we subsequently also analysed its results quantitatively, employing the method described below.
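For reference, the simplest of these conventional baselines, a single global Otsu threshold over the whole reconstructed volume, can be sketched as follows; this is a minimal illustration assuming the volume is loaded as a NumPy array, not our original code.

```python
# Minimal sketch: global Otsu thresholding of a MicroCT volume.
import numpy as np
from skimage.filters import threshold_otsu

def otsu_segment_volume(volume: np.ndarray) -> np.ndarray:
    """Return a binary mask separating dense material from background/pores."""
    thresh = threshold_otsu(volume)   # single global threshold from the histogram
    return volume > thresh            # True where intensity exceeds the threshold

# Example usage with a synthetic volume standing in for the MicroCT data:
rng = np.random.default_rng(0)
fake_volume = rng.normal(loc=0.3, scale=0.05, size=(64, 64, 64))
fake_volume[20:40, 20:40, 20:40] += 0.4   # brighter "fossil-like" region
mask = otsu_segment_volume(fake_volume)
print("segmented voxels:", int(mask.sum()))
```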
CNN-based Segmentation Methods

In the 3D object identification and segmentation field, the most successful and commonly used SS models in recent years have been the UNET and its variations. The UNET architecture was presented in [19], where the authors show its use for medical image segmentation. UNETs provide a general framework that can be parameterized with a specific image classification CNN model. The UNET then employs two slightly modified instances of this CNN, an encoder and a decoder, one for image recognition and another, employed in reverse mode, for the segmentation mask generation [1]: it uses the encoder to map raw inputs to feature representations and the decoder to take this feature representation as input, process it to make its decision and produce an output. As the UNET produces state-of-the-art semantic segmentation, we chose it as our starting point.
In our work, we initially employed the UNET model associated with a ResNet34 [7] as our initial structure and added several state-of-the-art improvement methods: nearest-neighbour interpolation and pixel shuffling [20], Leaky ReLU [28] as activation function and BatchNorm [2] for batch normalization. This complete structure is available in the fastai framework (https://www.fast.ai/), a framework built on top of PyTorch that contains several models, methods and state-of-the-art improvements.
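A minimal sketch of this kind of setup, assuming the fastai v1 API and folders of 2D slice and mask images, is given below; the paths, label function and class codes are illustrative assumptions, not our published code.

```python
# Hedged sketch: U-Net with a pre-trained ResNet34 encoder in fastai v1.
from fastai.vision import *   # fastai v1 idiom: SegmentationItemList, unet_learner, models, ...

path_img = 'microct/slices'                              # hypothetical folder of 2D slice images
get_y_fn = lambda x: f'microct/masks/{x.name}'           # hypothetical slice -> mask lookup
codes = ['background', 'pore', 'rock', 'microfossil']    # illustrative class codes

data = (SegmentationItemList.from_folder(path_img)
        .split_by_rand_pct(0.2)                          # hold out 20% for validation
        .label_from_func(get_y_fn, classes=codes)
        .transform(get_transforms(), size=256, tfm_y=True)   # data augmentation on image and mask
        .databunch(bs=8)
        .normalize(imagenet_stats))

learn = unet_learner(data, models.resnet34)              # ImageNet-pre-trained encoder (transfer learning)
learn.fit_one_cycle(10, max_lr=1e-3)
```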
Evaluation metrics

We evaluated each segmentation method by comparing our results to the ground truths generated by the micropaleontologists, using the Intersection over Union (IOU) [17] score, which quantifies the similarity between finite sample sets and is defined as the size of the intersection divided by the size of the union of the sample sets. The predicted labels were evaluated against a specialist-generated ground truth.
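For binary masks stored as NumPy arrays, the IOU score can be computed as in the following minimal sketch; multi-class results are obtained by computing it per class.

```python
# Minimal sketch: Intersection over Union (Jaccard index) for boolean masks.
import numpy as np

def iou_score(pred: np.ndarray, truth: np.ndarray) -> float:
    """IOU between two boolean masks of the same shape."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return float(intersection) / float(union) if union > 0 else 1.0

# Example: two overlapping 6x6 squares share 16 pixels out of 56 in the union.
a = np.zeros((10, 10), bool); a[2:8, 2:8] = True
b = np.zeros((10, 10), bool); b[4:10, 4:10] = True
print(round(iou_score(a, b), 3))   # 0.286
```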
Results

This section presents the results we obtained with the different algorithms and CNN models we tested.
Conventional CV algorithms

The best results among the conventional CV algorithms were obtained with the active contours method, for which we obtained an IOU score of 20%. The obtained active contour segmentation result is shown in Figure 5. These results indicate that conventional CV methods may not be suitable for the task of microfossil segmentation in rock samples.
CNN-based Semantic Segmentation

For our initial tests with SS CNN models, we started with the following structure: a UNET associated with a ResNet34 and binary cross entropy as its loss function, applied to a carbonatic rock sample with several microfossil specimens, scanned with the previously described MicroCT and resulting in a total of 1000 slices. We employed a computer with an Intel Core i7-7700 CPU @ 3.60 GHz, 32 GB of memory and an NVIDIA GeForce GTX 1080 Ti GPU with 11 GB.
Figure 5. Best microfossil segmentation that we could obtain using 3D active contours (IOU = 20%). Source: the authors.

With this initial structure, our first experiment used only the microfossil annotations, performing a binary classification between microfossil and everything else. To improve the initial results, strategies such as data augmentation and transfer learning were applied, aiming to minimize the effect of having a small database. However, the IOU coefficient, used for the evaluation of the results, stalled at 40-45%. Trying to improve this result, we increased the number of classes to four, dividing the everything else class into porous space, rock and background. With this number of classes, the obtained IOU value went from 40-45% to 75-76% and then stalled. One problem with this approach is the data imbalance [31], i.e., the existence, in the samples, of far more annotations of the rock class than of the microfossil class. Figure 6 shows the result obtained after annotating and training with the 4-class setup for a selected slice.
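One common way to counteract such class imbalance is to weight the loss by class frequency; the sketch below illustrates this with PyTorch's class-weighted cross entropy, using illustrative weights rather than values taken from our experiments.

```python
# Hedged illustration: class-weighted cross entropy to counter the rock/microfossil imbalance.
import torch
import torch.nn as nn

# Fewer microfossil pixels -> larger weight. Order: background, pore, rock, microfossil.
class_weights = torch.tensor([1.0, 2.0, 1.0, 8.0])      # illustrative weights
criterion = nn.CrossEntropyLoss(weight=class_weights)

logits = torch.randn(2, 4, 64, 64)                       # (batch, classes, H, W) network output
target = torch.randint(0, 4, (2, 64, 64))                # per-pixel ground-truth labels
loss = criterion(logits, target)
print(loss.item())
```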
Still using the 4-class approach, we adjusted the hyper-parameters and applied a few performance-enhancing strategies [27], such as the progressive input image resolution enhancement approach (Jeremy Howard, informal communication during a lecture at https://course.fast.ai/videos/?lesson=1), and explored data augmentation and batch size in order to obtain a 98% IOU. The microfossil GT and its resulting segmentation with this best IOU are shown in Figure 7.
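The progressive-resolution idea can be sketched as a continuation of the fastai v1 example from the Methods section (path_img, get_y_fn, codes and the learner learn are assumed to be defined as there); the sizes, epoch counts and batch sizes below are illustrative assumptions, not the values used in our experiments.

```python
# Hedged sketch: progressive input-resolution training with fastai v1.
from fastai.vision import *   # SegmentationItemList, get_transforms, imagenet_stats, ...

for size, bs in [(128, 16), (256, 8), (512, 2)]:          # coarse -> fine resolution
    data = (SegmentationItemList.from_folder(path_img)
            .split_by_rand_pct(0.2)
            .label_from_func(get_y_fn, classes=codes)
            .transform(get_transforms(), size=size, tfm_y=True)
            .databunch(bs=bs)                              # smaller batches at higher resolution
            .normalize(imagenet_stats))
    learn.data = data                                      # swap in the higher-resolution data
    learn.fit_one_cycle(5, max_lr=slice(1e-5, 1e-4))       # fine-tune at this resolution
```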
Our experiments resulted in an experimental environment where we employed the UNET as base model associated with different ResNet backbones (ResNet18, ResNet34, ResNet50, ResNet101), cross entropy as loss function and IOU for quality assessment. Table 1 shows the IOU value for each method employed, and Figure 8 shows the original image, its GT and the prediction results for all architectures we tested.

Table 1. Segmentation performance in terms of IOU value. Each method was evaluated on a set of 1000 images from the annotated microfossil data.

Method                                               IOU score
UNET + ResNet34                                      0.76
UNET + ResNet18 + hyper-parameter optimization       0.97
UNET + ResNet101 + hyper-parameter optimization      0.97
UNET + ResNet34 + hyper-parameter optimization       0.98
UNET + ResNet50 + hyper-parameter optimization       0.98
Figure 6. Obtained microfossil segmentation results with the 4-class approach. (A) Original digitalised image. (B) Ground Truth manually generated by paleontologists. (C) UNET + ResNet34. Source: the authors.

After segmentation we took the mask predicted by the ResNet34-based model and applied it to the original image. The result of this process is the easy identification of several microfossils. Figure 9 shows the mask overlay result and the identification of one microfossil specimen (highlighted with the red rectangle), followed by its magnified version and the correlation of this magnified version with two other versions of the same specimen (physically isolated and digitized with the Versa XRM-500 MicroCT and with the Zeiss Discovery V20 stereoscope).
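A minimal sketch of such a mask overlay on a single slice is given below; the array names and the microfossil class index are illustrative assumptions.

```python
# Minimal sketch: tint predicted microfossil pixels red on a grayscale MicroCT slice.
import numpy as np

def overlay_microfossils(slice_gray: np.ndarray, pred_mask: np.ndarray,
                         fossil_class: int = 3, alpha: float = 0.5) -> np.ndarray:
    """Return an RGB image with microfossil pixels tinted red."""
    rgb = np.stack([slice_gray] * 3, axis=-1).astype(float)   # grayscale -> RGB
    fossil = pred_mask == fossil_class
    rgb[fossil, 0] = (1 - alpha) * rgb[fossil, 0] + alpha * 255.0   # boost red channel
    rgb[fossil, 1] *= (1 - alpha)
    rgb[fossil, 2] *= (1 - alpha)
    return rgb.astype(np.uint8)

# Example with random stand-in data:
slice_gray = np.random.randint(0, 255, (128, 128), dtype=np.uint8)
pred_mask = np.random.randint(0, 4, (128, 128))
overlay = overlay_microfossils(slice_gray, pred_mask)
print(overlay.shape, overlay.dtype)   # (128, 128, 3) uint8
```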
Discussion and Conclusions

In this paper we present a new nondestructive processing pipeline for the identification of microfossils in carbonatic rocks that allows for the fully automated segmentation of these fossils without the need for previous physical separation. Furthermore, we developed and validated the CV methods for this identification and segmentation. The validation was performed quantitatively and automatically against a ground truth manually generated by expert micropaleontologists.
An extremely relevant aspect of the developed pipeline for the field of paleontology, more specifically micropaleontology, resides in the nondestructive character of the method. In the micropaleontological study process, an essential step is the sample preparation, aiming to separate the microfossils from the other rock and/or sediment constituents. In the traditional laboratory process, the samples are physically disaggregated (ground or milled) and subsequently chemically disaggregated, with the addition of reagents (e.g., hydrogen peroxide and acetic acid). Both physical and chemical disaggregation can alter or even destroy microfossil characteristics. Under this premise, the imaging method is crucial for a visualization of the morphological characteristics that is as reliable as possible, allowing the taxonomic recognition of individuals [8].

Figure 7. The ground truth (A) and the obtained microfossil segmentation result (B) with 4 classes, automated hyper-parameter search and additional data augmentation. Source: the authors.

Figure 8. (A) Original digitalised image. (B) Ground Truth manually generated by paleontologists. (C) UNET + ResNet18 + hyper-parameter optimization. (D) UNET + ResNet101 + hyper-parameter optimization. (E) UNET + ResNet34 + hyper-parameter optimization. (F) UNET + ResNet50 + hyper-parameter optimization. Source: the authors.

Another relevant factor that makes this method interesting is that it allows the analysis of the microorganisms' preservation throughout geological time, as well as of aspects of fossilization, preservation and even the position in which the microfossils are deposited (preserved) in the rocks. It should be emphasized that studies with a taphonomic approach are fundamental for the reconstitution of paleoenvironmental conditions and/or diagenetic alteration processes over geological time. Also, the use of this tool is strongly indicated in cases where it is extremely difficult to recover microfossils along specific sections and/or intervals where the material (rock) is very compact or even presents incipient diagenetic alteration. Microfossil identification is strategic for petroleum exploration due to its use in biostratigraphy, which refers to the use of microfossils from different groups to perform the temporal characterization of sedimentary rock strata, fundamental for the petroleum industry.
Figure 9. Result of applying the obtained segmentation mask over the digitized image. (A) Contrast-enhanced 2D section image masked from the digitized MicroCT volume, with one specific microfossil highlighted in red. (B) Highlighted microfossil extracted and magnified for visualisation. (C) Physically isolated microfossil digitized with the Versa XRM-500 MicroCT. (D) Cibicidoides multidimensional acquisition with the Zeiss Discovery V20 stereoscope. Source: the authors.

A few observations can be drawn from the obtained results: (i) the importance of employing appropriate hyper-parameters such as learning rate, weight decay, momentum and batch size: with this hyper-parameter optimization we obtained an improvement of 20%. (ii) A larger network architecture does not imply better results. It is possible to observe that the ResNet34 shows the same results as the ResNet50 and a better result when compared with the ResNet101. However, here we have a hardware limitation: neither the ResNet50 nor the ResNet101 could run with the full image resolution on the 11 GB GeForce, even with a batch size of 1. Even so, the ResNet34 requires less execution time and hardware. (iii) Analyzing the obtained result images and comparing them visually against their Ground Truth (Figure 8), we still notice some small errors; however, we understand that these can be mitigated by adding more training samples, together with GTs from experts, to the training set when applying this pre-trained network to other, new samples. Also, there are always new state-of-the-art improvements that could be tried, aiming to reduce these small errors even further. Figure 9 shows the isolated, digitalized microfossil and its correlated identification inside the sample.
We understand that this process of microfossil identification, without the need to physically isolate the microfossil, has the potential to allow the paleontologist to analyze specific aspects of a sample, such as the microfossil deposition. This is important for some applications in the oil and gas industry. It also has the potential to improve the paleontologist's work because, instead of spending time physically isolating the microfossil, he or she receives the microfossil already identified and can perform other analyses, such as class identification and orientation.
Threats to validity

We employed a dataset that, even though it consisted of a very large quantity of images and presented a wide variety of microfossils, was obtained from a single drill probe. On the other hand, the digitisation and annotation of samples impose a set of requirements, such as: having a MicroCT available and working; the cost of the MicroCT digitisation process; storage to keep the amount of generated data; and a group of paleontologists to analyze and annotate each digitised sample. As the workflow we suggest in this paper is new, it was not in place at any of the partners that participated in this work, and obtaining more scanned and annotated samples was not possible at this point of our research.
This could jeopardize the generalizability of this work, as we do not have enough data to claim that our approach will be successfully applicable to any carbonatic rock sample. On the other hand, our identification and segmentation results were extremely successful and we understand that they are promising. To the authors' knowledge, there is no other publicly available carbonatic rock probe dataset, with or without specialist-annotated microfossils.
In this context, we understand our work as pioneering and pointing to a promising direction of research that can potentialize both micropaleontological research and associated economic activities, such as oil prospection. Our publicly available, fully annotated MicroCT database also has the potential to support research activities performed by other groups.
Conclusions

Summarizing, this work presents:

- the first fully annotated, publicly available dataset of MicroCT-acquired microfossils;
- a baseline for microfossil segmentation and its comparison with deep learning-based semantic segmentation and other segmentation architectures;
- a methodology for microfossil studies through MicroCT-acquired digital models;
- a tool for cases where it is extremely difficult to recover microfossils along specific sections.

With improvements in the available hardware, future work aims to reduce the obtained errors even further by increasing the batch size and image resolution and by employing more state-of-the-art deep learning improvements.
Acknowledgments

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001 and by PETROBRAS through research project number 902. There are no conflicts of interest.
References

1. V. Badrinarayanan, A. Kendall, and R. Cipolla. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12):2481–2495, 2017.
2. J. Bjorck, C. P. Gomes, and B. Selman. Understanding batch normalization. CoRR, abs/1806.02375, 2018.
3. L. E. Carvalho, A. C. Sobieranski, and A. von Wangenheim. 3D segmentation algorithms for computerized tomographic imaging: a systematic literature review. Journal of Digital Imaging, 31(6):799–850, Dec 2018.
4. L. E. Carvalho and A. von Wangenheim. 3D object recognition and classification: a systematic literature review. Pattern Analysis and Applications, Feb 2019.
5. Q. Ge, B. Zhong, B. Kanakiya, R. Mitra, T. Marchitto, and E. Lobaton. Coarse-to-fine foraminifera image segmentation through 3D and deep features. In Computational Intelligence (SSCI), 2017 IEEE Symposium Series on, pages 1–8. IEEE, 2017.
6. I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.
7. K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015.
8. S. Kachovich, J. Sheng, and J. C. Aitchison. Adding a new dimension to investigations of early radiolarian evolution. Scientific Reports, 9, 2019.
9. M. Kass, A. Witkin, and D. Terzopoulos. Snakes: Active contour models. International Journal of Computer Vision, 1(4):321–331, 1988.
10. B. Kitchenham. Procedures for performing systematic reviews. Joint Technical Report TR/SE-0401, 2004.
11. E. N. Landis and D. T. Keane. X-ray microtomography. Materials Characterization, 61(12):1305–1316, 2010.
12. Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. In Proceedings of the IEEE, pages 2278–2324, 1998.
13. D. Liang, K. Weng, C. Wang, G. Liang, H. Chen, and X. Wu. A 3D object recognition and pose estimation system using deep learning method. In 2014 4th IEEE International Conference on Information Science and Technology, pages 401–404, April 2014.
14. E. Molina, editor. Micropaleontología (2ª edición). Prensas Universitarias de Zaragoza, Colección Textos Docentes, 2004.
15. V. Nair and G. E. Hinton. 3D object recognition with deep belief nets. In Proceedings of the 22nd International Conference on Neural Information Processing Systems, NIPS'09, pages 1339–1347, USA, 2009. Curran Associates Inc.
16. N. Otsu. A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics, 9(1):62–66, 1979.
17. M. A. Rahman and Y. Wang. Optimizing intersection-over-union in deep neural networks for image segmentation. In International Symposium on Visual Computing, pages 234–244. Springer, 2016.
18. A. Rakhlin, A. Davydow, and S. Nikolenko. Land cover classification from satellite imagery with U-Net and Lovász-Softmax loss. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 257–2574. IEEE, 2018.
19. O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional networks for biomedical image segmentation. CoRR, abs/1505.04597, 2015.
20. W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1874–1883, 2016.
21. A. Singh. Micropaleontology in petroleum exploration. In 7th International Conference and Exposition of Petroleum Geophysics, pages 14–16, 2008.
22. R. Socher, B. Huval, B. Bhat, C. D. Manning, and A. Y. Ng. Convolutional-recursive deep learning for 3D object classification. In Proceedings of the 25th International Conference on Neural Information Processing Systems, NIPS'12, pages 656–664, USA, 2012. Curran Associates Inc.
23. M. Sutton, I. Rahman, and R. Garwood. Techniques for Virtual Palaeontology. New Analytical Methods in Earth and Environmental Science. Wiley, 1st edition, 2014.
24. J. C. Tipper. Computer applications in paleontology: Balance in the late 1980s? Computers & Geosciences, 17(8):1091–1098, 1991.
25. J. Wang, J. Lu, W. Chen, and X. Wu. Convolutional neural network for 3D object recognition based on RGB-D dataset. In 2015 IEEE 10th Conference on Industrial Electronics and Applications (ICIEA), pages 34–39, June 2015.
26. Y. Xia, L. Zhang, W. Xu, Z. Shan, and Y. Liu. Recognizing multi-view objects with occlusions using a deep architecture. Information Sciences, 320:333–345, 2015.
27. J. Xie, T. He, Z. Zhang, H. Zhang, Z. Zhang, and M. Li. Bag of tricks for image classification with convolutional neural networks. arXiv preprint arXiv:1812.01187, 2018.
28. B. Xu, N. Wang, T. Chen, and M. Li. Empirical evaluation of rectified activations in convolutional network. CoRR, abs/1505.00853, 2015.
29. X. Xu, A. Dehghani, D. Corrigan, S. Caulfield, and D. Moloney. Convolutional neural network for 3D object recognition using volumetric representation. In 2016 First International Workshop on Sensing, Processing and Learning for Intelligent Machines (SPLINE), pages 1–5, July 2016.
30. J. Yu, K. Weng, G. Liang, and G. Xie. A vision-based robotic grasping system using deep learning for 3D object recognition and pose estimation. In IEEE International Conference on Robotics and Biomimetics, ROBIO 2013, Shenzhen, China, December 12–14, 2013, pages 1175–1180, 2013.
31. W. Zhu, Y. Huang, H. Tang, Z. Qian, N. Du, W. Fan, and X. Xie. AnatomyNet: Deep 3D squeeze-and-excitation U-Nets for fast and fully automated whole-volume anatomical segmentation. CoRR, abs/1808.05238, 2018.