All content in this area was uploaded by Gino Caspari on Oct 06, 2019
Convolutional Neural Networks for Archaeological Site Detection: Finding “Princely” Tombs
Gino Caspari*
Department of Archaeology, University of Sydney
and
Institute of Archaeological Sciences, University of Bern
Pablo Crespo†
Ph.D. Program in Economics, The Graduate Center, City University of New York
ABSTRACT
Creating a quantitative overview of the early Iron Age heritage of the Eurasian steppes is
a difficult task due to the vastness of the ecological zone and the often problematic access.
Remote-sensing-based detection on open-source high-resolution satellite data in combination
with convolutional neural networks (CNNs) provides a potential solution to this problem. We
create a CNN trained to detect early Iron Age burial mounds in freely available optical
satellite data. Compared with other detection algorithms trained on the same dataset, the CNN
provides a superior method for archaeological site detection. Across all comparison metrics
(precision, recall, and F1 score) the CNN performs best.
Keywords: CNN; object detection; archaeological remote sensing; convolutional neural
networks; site detection
*Email: gino.caspari@sydney.edu.au.
†Email: pcrespo@gradcenter.cuny.edu.
1 INTRODUCTION
The archaeology of the Early Iron Age in the Eurasian steppe deals with a vast and
archaeologically unexplored space between Eastern Europe and Mongolia. Despite the amount
of research conducted by scholars of the former USSR and the recent wave of new research
coming out of these areas, a quantifiable understanding of the wealth of cultural heritage
the Eurasian steppe harbors has yet to be achieved. One of the problems hindering
researchers in gaining a wider understanding is that the ancient cultural phenomena of the
Early Iron Age did not neatly adhere to modern nation-state borders (Figure 1). The current
administrative, linguistic, and institutional fragmentation of this vast ecological zone,
the steppe, makes research on the ground difficult. Remote sensing in combination with
automatic or semi-automatic approaches for object detection has been established as a tool
which is largely unaffected by these problems and is able to provide the basis for solutions
(Caspari et al., 2014). Rooted in archaeological field research, we combine open-source data
with convolutional neural networks (CNNs) in order to use the newest technological advances
to detect elite tombs of the Early Iron Age in the Eurasian steppe.
When it comes to restrictive access for foreign researchers, the Xinjiang Uyghur Autonomous
Region is perhaps the most extreme example in the region. It is known for its political and
ethnic issues (Clarke, 2008) and recently received international media attention due to its
increasingly oppressive counter-terrorism campaigns (Roberts, 2018). Permits for
archaeological fieldwork are notoriously hard to obtain in the first place, and sporadic
eruptions of ethnic conflict between the Uyghur minority and the Han Chinese majority in
southern Xinjiang can abort long-planned projects at the last minute. Militarized border
zones geographically curtail the areas archaeologists can work in. Even receiving a permit
is not necessarily a guarantee that a field campaign can be conducted as planned, since
the security apparatus is suspicious of any research activity by foreigners. Remote sensing
mitigates these problems of access, and the quality of publicly available high-resolution
satellite data for Xinjiang has increased dramatically over the past years (Caspari, 2018).
Figure 1: The area of interest in the eastern Central Asian steppes
CNNs have become the standard tool in computer vision applications in recent years.
Their use in pattern and shape recognition was popularized by the LeNet-5 architecture for
recognizing handwritten digits (Lecun et al., 1999). Their particular usefulness is
predicated on their ability to take inputs in the shape of multidimensional matrices
(tensors), allowing them to work with patterns in multiple directions: pixels adjacent to
each other influence what is identified. Most other machine learning algorithms used in
image recognition work with inputs that take the shape of single row vectors, which
eliminates the ability to harness the information given by adjacent pixels that are not in
the same row (the pixels directly below, above, or diagonal). Hence, CNNs are much more
sensitive to subtle patterns in images.
CNNs are a versatile solution to a plethora of problems in archaeology and work well when
plenty of data is available. This versatility comes at the cost of not being able to fully
and analytically understand the process of solving the problem. The outcomes, however, can
be qualitatively assessed, and the solution is reproducible. Consistent with their
versatility, CNNs have been used in different archaeological subfields and for a diverse
range of tasks, from sex determination of skeletal remains to solving mapping tasks and
extracting pottery depictions from archaeological publications.
Unsurprisingly, since ceramics are one of the main categories of archaeological material,
research on ceramics has already seen a wide application of CNNs. From recognizing vessels
to classifying ceramic form, to understanding and classifying the structure of ceramics,
CNNs have been useful in solving complex problems. Benhabiles and Tabia (2016) build
a CNN to design local descriptors for content-based retrieval of three-dimensional (3-D)
vessel replicas. Pasquet et al. (2017) use a CNN to detect amphorae in an underwater
setting, correctly mapping around 90% of the vessels. Hein et al. (2018) automatically
extract and classify ceramics based on textures. Chetouani et al. (2018) enlist the help of
a CNN in order to classify shards and understand the movement of potters. The ArchAIDE
project experiments with CNNs to create an as-automated-as-possible tool for the
classification and interpretation of shards (Gualandi et al., 2016). A similar application
is envisioned by Tyukin et al. (2018) with the project Arch-I-Scan, which aims to
automatically classify Roman pottery.
Interpreting other archaeological classes of information with CNNs is still in its infancy,
but a number of examples can give the reader an idea of what might be possible if expertly
human-labeled datasets are combined with CNNs. Byeon et al. (2019) automatically identify
and classify cut marks on bones. The authors demonstrate that CNNs recognize and classify
marks with a much higher accuracy rate than human experts. CNNs also perform exceptionally
well when tasked with determining the sex of skeletal remains based on CT scans, thereby
eliminating human bias (Bewes et al., 2019). In the analysis and interpretation of ancient
scripts, CNNs are also beginning to make an impact. First attempts have been made at
indexing Mayan hieroglyphs (Roman-Rangel and Marchand-Maillet, 2016; Can et al., 2018) and
creating a standardized corpus of graphemes for the Indus Valley script (Palaniappan and
Adhikari, 2017). Further applications of CNNs in classifying, transcribing, and ultimately
translating scripts such as cuneiform are to be expected.
CNNs have so far found the widest application in the area of archaeological remote
sensing. This subfield of archaeology has the advantage of already working within a
data-focused framework where classification and mapping tasks are common. The application
of CNNs thus comes as an obvious extension of existing automated and semi-automated
methods. Especially with LiDAR data collection, the data volume is becoming too large to be
analyzed manually. CNNs help to mitigate this problem while maintaining a consistent
approach. Trier et al. (2019) present a case study mapping a number of archaeological
object classes on an island in Scotland based on airborne laser scanning data. Guyot et al.
(2018) detect Neolithic burial mounds in a LiDAR-derived digital elevation model. Kramer
et al. (2017) combine aerial imagery and LiDAR data to detect archaeological structures
using previously identified archaeological sites as training data. Other non-invasive
methods like geophysical prospection, in particular ground penetrating radar (Travassos
et al., 2018; Ishitsuka et al., 2018; Pham and Lefèvre, 2018), have also seen the
application of CNNs. Our own case study in this paper belongs to the wide field of CNN
applications which arose from image processing and is conceptually close to well-known and
widely applied tasks like the recognition of faces and vehicles in images. CNNs can be
useful in any area where remote sensing data needs to be searched for archaeological
structures. The efficient processing of image data even allows for real-time decision
making: Rutledge et al. (2018) present an autonomous underwater robot system which surveys
underwater sites on its own, including path planning and the acquisition of high-resolution
sonar data.
Even art historical classifications and comparisons are supported by CNNs. With the
appropriate amount of data, it becomes feasible to define stylistic affinity. First
applications can be seen in the classification of wall paintings in Pompeii (Schoelz, 2018)
and in the approach of Li et al. (2018) towards dating the Mogao Grottoes wall paintings
based on drawing styles defined by a CNN. Wang et al. (2017) use CNNs for defining
similarities of Bodhisattva head images at the Dazu Rock Carving site and thus contribute to
the reconstruction of some of the damaged rock carvings. An application of CNNs in the
restoration of damaged archaeological objects can also be seen in a paper by Hermoza and
Sipiran (2017), where the authors try to predict the missing geometry of damaged
archaeological objects, opening a promising avenue of research into computer-supported
reconstruction and restoration of archaeological artifacts.
Wherever the exploration and analysis of large datasets is aided by recognizing complex
patterns, CNNs can be helpfully employed. This leads to creative applications like a study
by Graham (2018), which identifies sales of human remains on social media platforms, using
CNNs to detect patterns that allow for the classification of a combination of images and
text, ultimately aiding the reconstruction of sales networks.
2 THE FIELD ARCHAEOLOGICAL FOUNDATION
The Dzungaria Landscape Project, first established in 2014 (Caspari et al., 2017), relied on
a large-scale automated survey by means of a trained Hough Forest algorithm (Caspari
et al., 2014). Since then, machine learning has made enormous progress and the quality
of the freely available satellite imagery has increased substantially. Through an intensive
on-ground survey, the project was able to obtain a dataset of archaeological structures in
the foothills of the Chinese Altai Mountains. Accumulations of very large Early Iron Age
burial mounds caught the attention of the researchers early on (Figure 2 and Figure 3).
It soon became clear that the southern Altai Mountains, in particular the area around
Heiliutan, were a focus of intense funerary building activity, especially during the first
millennium BCE (Caspari et al., 2017). A number of different Early Iron Age material
cultures in the first millennium BCE can be identified (van Geel et al., 2004). Here, we
focus specifically on the funerary architecture of the Saka culture due to its relative
homogeneity. There is a plethora of architectural remains from the Early Iron Age present
in the survey area, but many of them are too small to be reliably detected in open-source
optical satellite data (Caspari, 2018). By far the most dominant anthropogenic features of
the landscape are large burial mounds with circular ditches around them.
These monuments, of which 59 were mapped during the field surveys (Caspari et al., 2017;
Caspari, Forthcoming), bear a striking resemblance to so-called Saka burials from the
Semirechye (eastern Kazakhstan), the northern Tianshan, and the Ili Valley. The term
“Saka” is a relatively unspecific ethnic term stemming from Persian sources, as P’iankov
(1994) elaborates, and thus should only be used with the appropriate care. Over decades of
archaeological research in what is now eastern Kazakhstan, however, the term has come
to denote a specific Early Iron Age material culture and is seen as a technical term among
many researchers, without implying the potentially problematic ethnic connotations. The
Saka material culture in eastern Kazakhstan is dated to between the 7th/6th century BCE and
the 3rd century BCE (Parzinger, 2011). Saka burials have so far mainly been known from the
Semirechye (Davis-Kimball, 1991; Gass, 2011; Nagler, 2009; Nagler et al., 2010) and have
only recently been compiled in a large study by Gass (2016).
Figure 2: Map generated during the 2015 survey of the Heiliutan Valley in northern Xinjiang. Large Saka
burial mounds tend to cluster. Dark grey = mound. Light grey = ditch.
The connections of Saka-related material culture into northern Xinjiang have been analyzed
(Davis-Kimball, 1991; Chen and Hiebert, 1995), but due to the fragmentary nature of
archaeological data in Xinjiang they have been assumed to be mainly confined to the
westernmost stretches of Xinjiang, namely the Ili Valley and the northern Tianshan. Older
Chinese research has looked at these connections from the eastern side (Wang, 1985), working
on a number of sites which show clear relations to eastern Kazakhstan, like Tiemulike
(Institute of Archaeology of the Xinjiang Academy of Social Sciences, 1988), Dacaotan
(Institute of Archaeology of the Xinjiang Academy of Social Sciences, 1985), and
Zhongyangchang (Institute of Archaeology of the Xinjiang Academy of Social Sciences, 1986).
The architectural features of the mounds in the Heiliutan Valley, however, suggest a strong
cultural connection during the middle of the first millennium BCE all the way into the
foothills of the Chinese Altai Mountains.
Figure 3: Architectural features of Saka burial mounds.
The large burial mounds of the Saka material culture were usually built from a mixture
of pebbles, larger round stones, and earth from the alluvial terraces. Mounds are typically
elevated and surrounded by circular rings of stones or circular ditches (Figure 3). Both
ditch and mound are clearly visible in open-source optical satellite data. The profile of
the Saka burial mounds typically shows steep sides (sometimes three steep sides and one
with a gentler slope) and a flat top. Maximum diameters in the Heiliutan area are typically
between 15.5 m and 34.1 m (89.5% of the mounds) and therefore well within the range of
detectable objects in open-source satellite imagery (Figure 4). A group of outliers has
diameters of over 40 m. The average diameter of Saka mounds in the Heiliutan Valley is
27.93 m (median 26.8 m).
The heights of these burial mounds average 1.97 m (median 1.4 m). The largest mounds have
a height of up to 6.5 m. Both diameters and heights of Saka burial mounds in the Chinese
Altai are comparable to Saka burial mounds from Issyk, Kegen, and other cemeteries with
princely tombs (Gass, 2011; Samashev, 2007). The Saka burials of the Heiliutan area are all
practically identical in their composition of building materials and the profile of the
mound. The largest mound has a diameter of 53.5 m and a height of 6.0 m, and its circular
ditch measures 91.5 m across. This type of burial usually has a 5:3 ratio between circular
ditch diameter and mound diameter, which again matches Saka burials from the Semirechye
(Gass, 2011). The large accumulation of Saka burials (Figure 2), with a length of almost
2 km, is visible from afar and is one of the dominant archaeological places within the
landscape of the Heiliutan Valley. One of these monuments was excavated in 2016 by the
Institute of Archaeology of the Xinjiang Academy of Social Sciences but has yet to be
published. Like many other burial mounds in the area of interest, the grave was
unfortunately looted.
Figure 4: Scatterplot of Saka burial mound diameters. Note the cluster of extraordinarily large mounds
which clearly set themselves apart from the smaller ones. These “princely” tombs are easily recognizable
in open-source remote sensing data.
3 CONVOLUTIONAL NEURAL NETWORKS
CNNs are a specific type of neural network architecture popularized by Lecun et al.
(1999), which can take grid-like inputs. Our particular case is a two-dimensional grid of
pixels, in which each pixel can be considered a source of information in the same way as
a cell in a row of tabular data would be. Note that images can be interpreted as numerical
grids if each pixel on each channel (RGB) is given a numerical value from 0 to 255 based on
the intensity of the color. In order to understand how CNNs work, we define them as the
junction of three different operating components, or types of “layers”:
• convolutional layers
• pooling layers
• fully connected layers
The convolutional and pooling layers are used to identify and summarize patterns
in the data. The fully connected layers use these summaries as inputs of a classification
problem, helping us determine whether our input (in this case an image) belongs to a
specific class based on the model. An example diagram of such an architecture is presented
in Figure 5.
Figure 5: CNN architecture example diagram
3.1 CONVOLUTIONAL LAYERS
The network uses convolutional layers to detect simple features or patterns in the data.
The patterns can be small and simple, but the combination of multiple simple patterns
allows for the search of complex forms.
Each convolutional layer is composed of two stages: convolution and detection. In the
first stage, a set of convolution operations is run on the input grid. A kernel or filter is
moved sequentially over the input, generating an output at each position it takes. These
outputs are defined by:
$$h_{i,j} = \sum_{k=1}^{m} \sum_{l=1}^{m} w_{k,l} \, x_{i+k-1,\,j+l-1}$$
where $h_{i,j}$ is the output of the convolution at position $(i,j)$, $x_{i+k-1,j+l-1}$ is
the portion of the input grid over which the filter is applied, $w_{k,l}$ is the filter at
position $(k,l)$, and $m$ determines the height and width of the filter.
Hence, the filter is a square grid of weights which is applied to the larger input grid to
highlight specific patterns within it. The higher the value of a convolution operation, the
higher the chance that the pattern the filter searches for has been found. Figure 6
illustrates this process. We can then use different filters to find different patterns.
For example, a filter of the form:
$$\begin{bmatrix} 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \end{bmatrix}$$
would be used to identify vertical lines.
Filters can be, and often are, initialized at random to pick up on many and varied subtle
patterns within the input grid. Each convolutional layer runs several filters on the inputs
and outputs one grid for each filter.
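The convolution stage described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in the paper: the function name `convolve2d_valid` and the 5 × 5 toy image are assumptions made for the example, and the indices are 0-based where the formula is 1-based.

```python
import numpy as np


def convolve2d_valid(x, w):
    """Slide an m x m filter w over grid x (no padding, stride 1)."""
    m = w.shape[0]
    rows = x.shape[0] - m + 1
    cols = x.shape[1] - m + 1
    h = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            # h[i, j] = sum over k, l of w[k, l] * x[i + k, j + l]
            h[i, j] = np.sum(w * x[i:i + m, j:j + m])
    return h


# The vertical-line filter from the text.
vertical_filter = np.array([[0, 1, 0],
                            [0, 1, 0],
                            [0, 1, 0]])

# Hypothetical 5 x 5 toy image with a bright vertical line in column 2.
image = np.zeros((5, 5))
image[:, 2] = 1.0

response = convolve2d_valid(image, vertical_filter)
print(response)
```

The response is largest (3) wherever the filter is centred on the line and zero elsewhere, which is the sense in which a high convolution value signals that the pattern has been found.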
Figure 6: Filter applied over a matrix
For the detection or activation stage, the results from the convolution stage are taken
and passed through a function. We used the ReLU (rectified linear unit), which is defined
as:
$$\sigma(x) = \max(0, x)$$
This function sets all negative values to zero and leaves non-negative values unchanged.
Since the filters can have negative values, this activation allows for extra salience of
patterns.
After activation, the outputs of the convolutional layer are used as inputs for the
pooling layers.
3.2 POOLING LAYERS
A pooling layer summarizes the resulting activated grids. This work uses max pooling: a new
grid is constructed from each activated grid by assigning to each of its entries the
maximum value of the corresponding 2 × 2 subgrid. An example is shown in Figure 7.
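A minimal NumPy sketch of the activation and pooling steps follows. The 4 × 4 grid of convolution outputs is a made-up example, and the helper names `relu` and `max_pool_2x2` are assumptions for illustration; non-overlapping 2 × 2 windows are also assumed.

```python
import numpy as np


def relu(x):
    """Set negative values to zero, keep non-negative values unchanged."""
    return np.maximum(0, x)


def max_pool_2x2(grid):
    """Keep the maximum of each non-overlapping 2 x 2 subgrid."""
    rows, cols = grid.shape[0] // 2, grid.shape[1] // 2
    trimmed = grid[:rows * 2, :cols * 2]  # drop an odd last row/column
    return trimmed.reshape(rows, 2, cols, 2).max(axis=(1, 3))


# Hypothetical output of a convolution stage.
conv_output = np.array([[1.0, -2.0, 3.0, 0.0],
                        [-1.0, 5.0, -4.0, 2.0],
                        [0.0, -3.0, -1.0, -6.0],
                        [7.0, 1.0, -2.0, 4.0]])

activated = relu(conv_output)
pooled = max_pool_2x2(activated)
print(pooled)  # [[5. 3.] [7. 4.]]
```

Each pooling step halves the height and width of the grid, so later layers work on progressively more compact summaries of the image.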
Figure 7: 2 × 2 max pooling
At this point, the practitioner has two choices: to summarize the results once more
through a fully connected layer (see Section 3.3) or to repeat the process of passing the
outputs through a convolutional layer and a pooling layer once again. The latter is what is
meant by making a network “deeper.” Passing the data through extra convolutional and pooling
layers allows for further and more subtle evaluation of patterns and thereby increases the
complexity of the model.
3.3 FULLY CONNECTED LAYERS
Fully connected layers have the basic structure of artificial neural networks or multilayer
perceptrons. Their task is to take the outputs from the last pooling layer and classify
them into specific categories. Before the grids resulting from the pooling layers are passed
to the fully connected layers, they are “flattened,” meaning that the values from all the
grids are combined into a single row vector. The elements of the vector produced by the
flattening are then linearly combined. This means they are written as:
$$\beta_0 + \sum_i \beta_i \times \text{element}_i$$
where $\beta_0$ is called the “bias” and the $\beta_i$ are called the “weights.” Each of
these linear combinations is passed through an activation function, again generating a
single number as output. This particular structure of operations constitutes what is called
a “neuron.” The set of these activated linear combinations is called a “hidden layer.” The
practitioner can add extra complexity to the model by using the outputs of each hidden
layer as the inputs for a new fully connected layer. The practitioner has to choose both
the number of neurons and the depth of the model, the latter by choosing the number of
hidden layers. Once it has been decided that the architecture is deep enough, in the case of
binary classification such as ours, a final fully connected layer is created that yields a
single linear combination. The activation of this combination is what we consider the
“output layer” of the network, usually normalized between zero and one by the activation
function (a sigmoid function¹ is a common choice). The number in this output layer
is mapped to a specific class according to a threshold. For example, binary classification
problems code their “target” variable as either zero or one. We can then say that an output
value larger than 0.5 predicts that the input belongs to class one, and to class zero
otherwise.
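The flattening, hidden layer, and thresholded sigmoid output can be sketched as follows. All sizes here (four 3 × 3 pooled grids, eight hidden neurons) and the random weights are assumptions chosen only to make the example run; they are not the values of the trained network.

```python
import numpy as np


def relu(x):
    return np.maximum(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def dense(inputs, weights, bias, activation):
    """One fully connected layer: activation(bias + weights . inputs)."""
    # weights has shape (n_neurons, n_inputs), bias has shape (n_neurons,)
    return activation(weights @ inputs + bias)


rng = np.random.default_rng(0)

# Hypothetical output of the last pooling layer: four 3 x 3 grids.
pooled_grids = rng.random((4, 3, 3))
flat = pooled_grids.reshape(-1)  # "flattening" into one vector of 36 elements

# Randomly initialized weights and zero biases (untrained, for illustration).
hidden_w, hidden_b = rng.normal(size=(8, flat.size)), np.zeros(8)
out_w, out_b = rng.normal(size=(1, 8)), np.zeros(1)

hidden = dense(flat, hidden_w, hidden_b, relu)         # hidden layer, 8 neurons
output = dense(hidden, out_w, out_b, sigmoid)[0]       # output layer, one number
predicted_class = 1 if output > 0.5 else 0             # threshold at 0.5
print(output, predicted_class)
```

Because the weights are random, the printed class carries no meaning; the point is only the flow from grids to a single number between zero and one.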
The question remains how these networks are actually trained so that they generalize to
large samples of images. Consider a sample of already identified and labeled images, whose
labels, which we call our “target,” are compiled in a vector $y$. We would like to make sure
that the values of the distinct weights and biases are chosen such that the resulting output
layer is as close to the target as possible for representative samples. In this case, we
would like to choose values such that the following distance is minimized, via a process
called “backpropagation,” for $n$ observations in a sample:
$$\sum_{i=1}^{n} \frac{1}{2} \left( y_i - \text{output}_i \right)^2 \qquad (1)$$
¹ $\sigma(x) = \frac{1}{1 + e^{-x}}$
It needs to be noted that deeper networks with several hidden layers and many neurons in
each are capable of making the distance in Equation (1) very small for a sample due to the
added complexity. This, however, comes with the risk of making the network attuned only to
the images in the specific sample and incapable of generalizing to images from the same
population of objects that were not present in the sampled data. This phenomenon is called
“overfitting.” Hence, the practitioner needs to design the architecture with a fine
balance. The network must be capable of processing patterns complex enough for
classification, but must not be so attuned to the sample data that it fails to classify
data from the same population outside the sample.
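To make the distance in Equation (1) concrete, the sketch below computes it for a single sigmoid neuron and applies one gradient-descent step obtained with the chain rule, which is the elementary operation that backpropagation repeats through every layer. The six random input vectors, the targets, and the learning rate of 0.1 are assumptions for illustration only.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss(y, output):
    """Equation (1): sum over observations of 1/2 (y_i - output_i)^2."""
    return np.sum(0.5 * (y - output) ** 2)


rng = np.random.default_rng(1)
X = rng.random((6, 3))                       # six hypothetical flattened inputs
y = np.array([0.0, 1.0, 0.0, 1.0, 1.0, 0.0])  # hypothetical binary targets
weights, bias = np.zeros(3), 0.0
learning_rate = 0.1                          # assumed step size

output = sigmoid(X @ weights + bias)
loss_before = loss(y, output)

# Chain rule: dL/dz_i = (output_i - y_i) * output_i * (1 - output_i)
delta = (output - y) * output * (1.0 - output)
weights = weights - learning_rate * (X.T @ delta)
bias = bias - learning_rate * delta.sum()

loss_after = loss(y, sigmoid(X @ weights + bias))
print(loss_before, loss_after)
```

Repeating such steps drives the training loss down; whether the fitted weights also work on unseen images is exactly the overfitting question raised above, which is why held-out data is needed.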
4 APPLICATION OF A CNN ON “PRINCELY” TOMB CLASSIFICATION
4.1 DATA PREPROCESSING
Using open-source optical satellite data from Google Earth, image patches (100 × 100
pixels) of tombs with known locations and of arbitrary patches of land around them were
collected, and a labelled dataset was created with the following labelling scheme:
$$y = \begin{cases} 0 & \text{if tomb present} \\ 1 & \text{if tomb absent} \end{cases}$$
The dataset is composed of 1212 images, of which 169 include tombs. Typical observations
of each case are presented in Figure 8. It is important to note that the distinctive
shape of the tombs makes them easily distinguishable from other patches of land even in
low-resolution data.
In order to verify that the model we fit is a good model, the data is split into two
portions, one for fitting the model (training data) and one for assessing how well it
generalizes (testing and validation). The testing and validation data are simply datasets
that do not undergo the fitting process. Since these data belong to the same population as
the training data, assessing the goodness of fit of the model on them gives us a good idea
of how well the model generalizes and helps identify overfitting.
Figure 8: Top: Examples of images labelled as tomb absent. Bottom: Examples of images labelled as tomb
present.
The data was split with 75% used for training and 25% used for testing and validation.
Since the images containing tombs are heavily underrepresented in the dataset,
augmentation is necessary for training appropriately with multiparameter methods such
as convolutional neural networks. In this case, 655 new images were synthesized from the
training images with tombs present. The new samples are created by modifying the existing
ones through random zooming, shearing, and horizontal flipping. Note that augmented images
are only used during the training stage. Using them for testing or validation is
inappropriate due to their high correlation with the images they originate from.
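The split and augmentation can be sketched with NumPy as follows. Several things here are assumptions rather than the paper's procedure: the pixel data are random placeholders, the split is stratified by class (the text does not say whether it was), the zoom range of 1.0 to 1.2 is invented, and shearing is omitted for brevity. The label coding follows the scheme above (0 = tomb present, 1 = tomb absent).

```python
import numpy as np

rng = np.random.default_rng(42)
N_IMAGES, N_TOMBS, SIZE = 1212, 169, 100

# Placeholder pixel data standing in for the 100 x 100 RGB satellite patches.
images = rng.integers(0, 256, size=(N_IMAGES, SIZE, SIZE, 3), dtype=np.uint8)
labels = np.ones(N_IMAGES, dtype=int)  # 1 = tomb absent
labels[:N_TOMBS] = 0                   # 0 = tomb present


def stratified_split(labels, train_fraction, rng):
    """Split indices per class so both parts keep the class proportions."""
    train_idx, held_out_idx = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        cut = int(round(train_fraction * idx.size))
        train_idx.extend(idx[:cut])
        held_out_idx.extend(idx[cut:])
    return np.array(train_idx), np.array(held_out_idx)


def augment(image, rng):
    """Random horizontal flip followed by a random central zoom-in."""
    if rng.random() < 0.5:
        image = image[:, ::-1, :]
    zoom = rng.uniform(1.0, 1.2)  # assumed zoom range
    crop = int(SIZE / zoom)
    offset = (SIZE - crop) // 2
    cropped = image[offset:offset + crop, offset:offset + crop, :]
    pick = (np.arange(SIZE) * crop / SIZE).astype(int)  # nearest-neighbour resize
    return cropped[pick][:, pick]


train_idx, held_out_idx = stratified_split(labels, 0.75, rng)

# Augment only training images that contain tombs; held-out data stays untouched.
train_tombs = train_idx[labels[train_idx] == 0]
sources = rng.choice(train_tombs, size=655, replace=True)
augmented = np.stack([augment(images[i], rng) for i in sources])
print(train_idx.size, held_out_idx.size, augmented.shape)
```

Drawing the augmentation sources only from `train_idx` is what keeps the synthesized images out of the testing and validation data.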
4.2 CNN ARCHITECTURE
The CNN utilized for our problem was trained on the augmented data described in
Section 4.1. The full summary of the architecture is given in Figure 9. The CNN was trained
in Keras, a Python module which in our case uses Google’s TensorFlow as a backend.
Figure 9: Keras model summary
The architecture shown is relatively simple, consisting of three convolutional and pooling
layers with ReLU activations and two fully connected layers before the final activation
with a sigmoid. The diagram specifies the dimensions of each. For example, the first
convolutional layer uses 32 filters and outputs a 98 × 98 grid for each. A natural question
not explained in the prior sections is what the “dropout” row in the diagram means.
Dropout is a regularization technique which randomly prevents certain linear combinations
from being used during the optimization step. This technique helps “regularize,” that is
penalize, overfitting and hence helps make sure the model is generalizable.
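The grid sizes through the three convolution and pooling blocks can be traced with simple arithmetic. The 3 × 3 filter size is inferred from the reported 100 → 98 reduction in the first layer; applying the same filter size and unpadded 2 × 2 pooling to the later blocks is an assumption, since Figure 9 is not reproduced here.

```python
def conv_output_size(size, kernel=3):
    """Unpadded, stride-1 convolution shrinks each side by kernel - 1."""
    return size - kernel + 1


def pool_output_size(size, pool=2):
    """Non-overlapping pooling divides each side by the pool size (floor)."""
    return size // pool


size = 100  # input patches are 100 x 100 pixels
shapes = []
for block in range(3):  # three convolution + pooling blocks
    size = conv_output_size(size)
    shapes.append(("conv", size))
    size = pool_output_size(size)
    shapes.append(("pool", size))

for layer, side in shapes:
    print(f"{layer}: {side} x {side}")
```

Under these assumptions the sides run 98, 49, 47, 23, 21, 10, so the grids handed to the flattening step would be 10 × 10.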
4.3 BENCHMARKS AND RESULTS
Judging the accuracy of the convolutional neural network specified in Section 4.2 requires
plausible methods for benchmarking. Furthermore, the accuracy metrics we are truly
interested in are those on the validation data, since they tell us how each model performs
on observations not seen during training. As such, three benchmark models were chosen: a
biased random guess, a support vector classifier with a linear kernel, and a support vector
classifier with a radial basis function (RBF) kernel.
Random guessing is useful as a comparative benchmark since it selects its output by simple
random chance. In order to make the benchmark tougher, we set the probability of
classifying an image as containing a tomb equal to the actual proportion of tombs in the
validation set.
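A biased random guess of this kind can be sketched as below. The validation-set composition used in the example (303 images, 42 with tombs) is hypothetical and chosen only to roughly match a 25% share of the dataset; the paper does not report these counts.

```python
import numpy as np


def biased_random_guess(has_tomb, rng):
    """Guess 'tomb' with probability equal to the share of tombs in the set."""
    has_tomb = np.asarray(has_tomb, dtype=bool)
    p_tomb = has_tomb.mean()
    guesses = rng.random(has_tomb.size) < p_tomb
    return guesses, p_tomb


rng = np.random.default_rng(7)

# Hypothetical validation set: 303 images, 42 of them with tombs.
validation_has_tomb = np.zeros(303, dtype=bool)
validation_has_tomb[:42] = True

guesses, p_tomb = biased_random_guess(validation_has_tomb, rng)
print(p_tomb, guesses.sum())
```

The guesses ignore the image content entirely, so any classifier that has learned something from the pixels should beat this baseline.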
Since the shapes of the tombs are simple and easily distinguishable, it stands to reason
that simpler and more tractable classification methods could work as long as they allow
for flexible classification boundaries. Support vector machines (SVMs) with kernels, as
proposed by Boser et al. (1992), are sensible and powerful alternatives to deep learning
models. We use two types of kernels in this study, the linear kernel and the radial basis
function kernel, which allow for different transformations of the data before
classification. Each of these models has its hyperparameters adjusted via 5-fold
cross-validation.
We use three measures to compare the accuracy of the predictions made by the classifiers:
precision, recall, and F1 score. They are defined as follows:
$$\text{Precision} = \frac{\text{\# of True Positives}}{\text{\# of True Positives} + \text{\# of False Positives}}$$
$$\text{Recall} = \frac{\text{\# of True Positives}}{\text{\# of True Positives} + \text{\# of False Negatives}}$$
$$F_1\ \text{score} = \frac{2}{\frac{1}{\text{Recall}} + \frac{1}{\text{Precision}}}$$
Precision gives the rate of correctly classified objects among all objects that the
classifier assigned to a given label. Recall gives the rate of correctly labeled objects
among all objects that actually carry that label. The F1 score, the harmonic mean of the
two, gives a balanced measure of both. All tables and figures comparing models in this
paper use these measures.
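The three measures can be computed per label as in the sketch below. The ten toy labels and predictions are invented for illustration, and the helper names are assumptions; only the final consistency check uses numbers from the paper (the precision and recall reported in Table 2).

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 / (1 / recall + 1 / precision)


def precision_recall_f1(y_true, y_pred, label):
    """Precision, recall and F1 score for one label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = f1_from(precision, recall) if precision and recall else 0.0
    return precision, recall, f1


# Invented toy example: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(precision_recall_f1(y_true, y_pred, label=1))  # (0.75, 0.75, 0.75)

# Consistency check against precision and recall reported in Table 2.
print(round(f1_from(1.0, 0.84), 2))   # CNN: 0.91
print(round(f1_from(0.76, 0.67), 2))  # SVM with RBF kernel: 0.71
```

Because the measures are computed per label, each classifier gets one row for images with tombs and one for images without, which is how Tables 1 and 2 are organized.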
Table 1 and Table 2 summarize the results obtained from the trained models making
predictions on the validation set. For both classes, images which contained tombs and
those which did not, the CNN performs best. Interestingly, although the SVMs were also
trained on the augmented dataset, their performance in identifying pictures with tombs was
not comparable to that of the neural network. This is surprising since the tomb shapes are
mostly simple to the naked eye, so nonlinear classification should work well. The reason
lies in the tendency of the SVM models to produce many false positives (objects that are
not tombs being identified as such). This occurs because other image content that is simply
circular in shape is likely to be picked up by the SVM models as tombs. This has been an
issue with other detection algorithms before, e.g. Caspari et al. (2014). The big advantage
of our architecture lies in the quantity of filters used, which are able to recognize
subtler patterns in the training dataset that might identify a tomb, beyond just the
circular shape. Figure 10 summarizes both tables and includes a bar for Average/Total, a
weighted average over both classes for each measure, showing that overall the CNN is the
better performing model.
Model                     Precision   Recall   F1 score
Random Guessing           0.64        0.65     0.65
SVM with linear kernel    0.9         0.96     0.94
SVM with RBF kernel       0.96        0.97     0.97
CNN                       0.98        1        0.99
Table 1: Classification metrics for validation data set pictures without tombs.
Model                     Precision   Recall   F1 score
Random Guessing           0.59        0.58     0.59
SVM with linear kernel    0.29        0.15     0.20
SVM with RBF kernel       0.76        0.67     0.71
CNN                       1           0.84     0.91
Table 2: Classification metrics for validation data set pictures with tombs.
Figure 10: Result Summaries
5 CONCLUSIONS
The distinctive shape of the early Iron Age Saka burial mounds and their relatively large
size make them an ideal training set for machine learning algorithms which can be run
on open-source satellite imagery. Our CNN outperforms the other methods tested and provides
a valuable approach for the large-scale detection of elite burial mounds in the Eurasian
steppes. In this way, a macro-regional survey of northern Xinjiang and the adjacent areas
could be conducted in order to assess the spatial distribution of this monument type
and possibly revise the geographical extent to which Saka-related material culture spread
through eastern Central Asia during the first millennium BCE. The method has the clear
advantage that all analyses can be conducted without the access problems archaeological
projects in the region usually have to deal with.
Preliminary satellite imagery analysis has come to play a major role in planning and
implementing archaeological field research (Lasaponara and Masini, 2012; Caspari et al.,
2019). But automatic feature detection has yet to become accessible to a wider range of
researchers. A number of attempts have been made to connect archaeological surveys with
automatic detection of features (Caspari et al., 2017; Trier et al., 2009; Trier and
Pilø, 2012); however, such detection is not commonly used by practitioners. Both the
complexity of the method, which often demands cooperation with computer science
specialists, and a lack of awareness of the possibility contribute to the so far rare
application of automatic detection algorithms by archaeological practitioners. The
authors do not expect to see widespread application unless intuitive tools are developed
for feature selection, algorithm training, and visualization of ready-to-use results.
REFERENCES
BENHABILES, H. and TABIA, H. (2016). Convolutional neural network for pottery retrieval. Journal of Electronic Imaging, 26(1).
BEWES, J., LOW, A., MORPHETT, A., PATE, F. D. and HENNEBERG, M. (2019). Artificial intelligence for sex determination of skeletal remains: Application of a deep learning artificial neural network to human skulls. Journal of Forensic and Legal Medicine, 62, 40–43.
BOSER, B. E., GUYON, I. M. and VAPNIK, V. N. (1992). A training algorithm for optimal margin classifiers. In Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, ACM Press, pp. 144–152.
BYEON, W., DOMÍNGUEZ-RODRIGO, M., ARAMPATZIS, G., BAQUEDANO, E., YRAVEDRA, J., MATÉ-GONZÁLEZ, M. A. and KOUMOUTSAKOS, P. (2019). Automated identification and deep classification of cut marks on bones and its paleoanthropological implications. Journal of Computational Science, 32, 36–43.
CAN, G., ODOBEZ, J.-M. and GATICA-PEREZ, D. (2018). How to tell ancient signs apart? Recognizing and visualizing Maya glyphs with CNNs. J. Comput. Cult. Herit., 11(4), 20:1–20:25.
CASPARI, G. (2018). Assessing looting from space: The destruction of early Iron Age burials in northern Xinjiang. Heritage, 1(2), 320–327.
CASPARI, G. (Forthcoming). Quantifying the funerary ritual activity of the late prehistoric southern Kanas region (Xinjiang, China). Asian Perspectives.
CASPARI, G., BALZ, T., GANG, L., WANG, X. and LIAO, M. (2014). Application of Hough forests for the detection of grave mounds in high-resolution satellite imagery.
CASPARI, G., PLETS, G., BALZ, T. and FU, B. (2017). Landscape archaeology in the Chinese Altai Mountains: survey of the Heiliutan Basin. Archaeological Research in Asia, 10, 48–53.
CASPARI, G., SADYKOV, T., BLOCHIN, J., BUESS, M., NIEBERLE, M. and BALZ, T. (2019). Integrating remote sensing and geophysics for exploring early nomadic funerary architecture in the Siberian Valley of the Kings. Sensors, 19(14).
CHEN, K. T. and HIEBERT, F. T. (1995). The late prehistory of Xinjiang in relation to its neighbors. Journal of World Prehistory, 9(2), 243–300.
CHETOUANI, A., DEBROUTELLE, T., TREUILLET, S., EXBRAYAT, M. and JESSET, S. (2018). Classification of ceramic shards based on convolutional neural network. 2018 25th IEEE International Conference on Image Processing (ICIP), pp. 1038–1042.
CLARKE, M. (2008). China’s war on terror in Xinjiang: Human security and the causes of violent Uighur separatism. Terrorism and Political Violence, 20(2), 271–301.
DAVIS-KIMBALL, J. (1991). Kazakh/American research project: 1990 field work report. Middle East Studies Association Bulletin, 25(1), 33–35.
GASS, A. (2011). Early Iron Age burials in southeastern Zhetysu: The geoarchaeological evidence. Archaeology, Ethnology and Anthropology of Eurasia, 39(3), 57–69.
GASS, A. (2016). Das Siebenstromland Zwischen Bronze- Und Früheisenzeit: Eine Regionalstudie. Berlin; Boston: De Gruyter.
GRAHAM, S. (2018). Fleshing out the bones: Studying the human remains trade with TensorFlow and Inception. Journal of Computer Applications in Archaeology, 1(1), 55–63.
GUALANDI, M. L., SCOPIGNO, R., WOLF, L., RICHARDS, J., GARRIGOS, J. B. I., HEINZELMANN, M., HERVAS, M. A., VILA, L. and ZALLOCCO, M. (2016). ArchAIDE - Archaeological Automatic Interpretation and Documentation of cEramics. In C. E. Catalano and L. D. Luca (eds.), Eurographics Workshop on Graphics and Cultural Heritage, The Eurographics Association.
GUYOT, A., HUBERT-MOY, L. and LORHO, T. (2018). Detecting Neolithic burial mounds from LiDAR-derived elevation data using a multi-scale approach and machine learning techniques. Remote Sensing, 10(2).
HEIN, I., ROJAS-DOMÍNGUEZ, A., ORNELAS, M., D’ERCOLE, G. and PELOSCHEK, L. (2018). Automated classification of archaeological ceramic materials by means of texture measures. Journal of Archaeological Science: Reports, 21, 921–928.
HERMOZA, R. and SIPIRAN, I. (2017). 3D reconstruction of incomplete archaeological objects using a generative adversary network. CoRR, abs/1711.06363.
INSTITUTE OF ARCHAEOLOGY OF THE XINJIANG ACADEMY OF SOCIAL SCIENCES (1985). Xinjiang Xinyuan Gongnaisi Zhongyangchang shiguan mu (stone cist tomb of Zhongyangchang site, Xinyuan, Xinjiang). Kaogu yu Wenwu, 2, 21–26.
INSTITUTE OF ARCHAEOLOGY OF THE XINJIANG ACADEMY OF SOCIAL SCIENCES (1986). Xinjiang Miquan Dacaotan faxian shiduimu (the rock-fill tombs discovered at Dacaotan, Miquan, Xinjiang). Kaogu yu Wenwu, 1, 36–38.
INSTITUTE OF ARCHAEOLOGY OF THE XINJIANG ACADEMY OF SOCIAL SCIENCES (1988). Xinjiang Xinyuan Tiemulike gumu qun (Tiemulike cemetery, Xinyuan, Xinjiang). Kaogu yu Wenwu, 8, 59–66.
ISHITSUKA, K., ISO, S., ONISHI, K. and MATSUOKA, T. (2018). Object detection in ground-penetrating radar images using a deep convolutional neural network and image set preparation by migration.
KRAMER, I. C., HARE, J. S., PRÜGEL-BENNETT, A. and SARGENT, I. (2017). Automated detection of archaeology in the New Forest using deep learning with remote sensor data.
LASAPONARA, R. and MASINI, N. (2012). Satellite remote sensing: a new tool for archaeology. Springer.
LECUN, Y., HAFFNER, P., BOTTOU, L. and BENGIO, Y. (1999). Object recognition with gradient-based learning, Springer.
LI, Q., ZOU, Q., MA, D., WANG, Q. and WANG, S. (2018). Dating ancient paintings of Mogao Grottoes using deeply learnt visual codes. CoRR, abs/1810.09168.
NAGLER, A. (2009). Grosskurgane im Siebenstromland (Kazachstan), vol. 1. Jahresbericht 2008 des DAI. Archäologischer Anzeiger 2009.
NAGLER, A., Z, S., H, P. and M, N. (2010). Südkasachstan: Kurgane Asy Zaga, Kegen und Zoan Tobe, Berlin, DAI, chap. Archäologische Forschungen in Kasachstan, Tadschikistan, Turkmenistan und Usbekistan, pp. 49–54.
PALANIAPPAN, S. and ADHIKARI, R. (2017). Deep learning the Indus script. ArXiv, abs/1702.00523.
PARZINGER, H. (2011). Die frühen Völker Eurasiens: vom Neolithikum zum Mittelalter. C.H. Beck.
PASQUET, J., DEMESTICHA, S., SKARLATOS, D., MERAD, D. and DRAP, P. (2017). Amphora detection based on a gradient weighted error in a convolution neuronal network.
PHAM, M. and LEFÈVRE, S. (2018). Buried object detection from B-scan ground penetrating radar data using Faster-RCNN. CoRR, abs/1803.08414.
P’IANKOV, I. V. (1994). The ethnic history of the Sakas. Bulletin of the Asia Institute, 8, 37–46.
ROBERTS, S. R. (2018). The biopolitics of China’s war on terror and the exclusion of the Uyghurs. Critical Asian Studies, 50(2), 232–258.
ROMAN-RANGEL, E. and MARCHAND-MAILLET, S. (2016). Indexing Mayan hieroglyphs with neural codes. In 2016 23rd International Conference on Pattern Recognition (ICPR), pp. 253–258.
RUTLEDGE, J., YUAN, W., WU, J. Y., FREED, S., LEWIS, A., WOOD, Z. J., GAMBIN, T. and CLARK, C. M. (2018). Intelligent shipwreck search using autonomous underwater vehicles. 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1–8.
SAMASHEV, Z. (2007). Im Zeichen des goldenen Greifen - Königsgräber der Skythen, Berlin, Staatliche Museen zu Berlin, chap. Die Fürstengräber des Siebenstromlandes, pp. 162–170.
SCHOELZ, A. (2018). An embarrassment of riches: Data integration in VR Pompeii.
TRAVASSOS, X. L., AVILA, S. L. and IDA, N. (2018). Artificial neural networks and machine learning techniques applied to ground penetrating radar: A review. Applied Computing and Informatics.
TRIER, Ø. D., COWLEY, D. C. and WALDELAND, A. U. (2019). Using deep neural networks on airborne laser scanning data: Results from a case study of semi-automatic mapping of archaeological topography on Arran, Scotland. Archaeological Prospection, 26(2), 165–175.
TRIER, Ø. D., LARSEN, S. Ø. and SOLBERG, R. (2009). Automatic detection of circular structures in high-resolution satellite images of agricultural land. Archaeological Prospection, 16(1), 1–15.
TRIER, Ø. D. and PILØ, L. H. (2012). Automatic detection of pit structures in airborne laser scanning data. Archaeological Prospection, 19(2), 103–121.
TYUKIN, I., SOFEIKOV, K., LEVESLEY, J., GORBAN, A. N., ALLISON, P. and COOPER, N. J. (2018). Exploring automated pottery identification [Arch-I-Scan].
VAN GEEL, B., BOKOVENKO, N., BUROVA, N., CHUGUNOV, K., DERGACHEV, V., DIRKSEN, V., KULKOVA, M., NAGLER, A., PARZINGER, H., VAN DER PLICHT, J., VASILIEV, S. and ZAITSEVA, G. (2004). Climate change and the expansion of the Scythian culture after 850 BC: a hypothesis. Journal of Archaeological Science, 31(12), 1735–1742.
WANG, B. (1985). Gudai Xinjiang Sairen lishi gouchen (a preliminary research of the history of the ancient Saka in Xinjiang). Xinjiang Shehui Kexue, 1, 48–58, 64.
WANG, H., HE, Z., HUANG, Y., CHEN, D. and ZHOU, Z. (2017). Bodhisattva head images modeling style recognition of Dazu rock carvings based on deep convolutional network. Journal of Cultural Heritage, 27, 60–71.