Content available from Scientific Reports
Scientific Reports | (2023) 13:11257 | https://doi.org/10.1038/s41598-023-38190-x
www.nature.com/scientificreports
Curriculum learning‑based strategy
for low‑density archaeological
mound detection from historical
maps in India and Pakistan
Iban Berganzo‑Besga¹, Hector A. Orengo¹,²*, Felipe Lumbreras³, Aftab Alam⁴, Rosie Campbell⁵,
Petrus J. Gerrits⁵, Jonas Gregorio de Souza⁶, Afifa Khan⁵, María Suárez‑Moreno⁵, Jack Tomaney⁵,
Rebecca C. Roberts⁵ & Cameron A. Petrie⁵,⁷
This paper presents two algorithms for the large‑scale automatic detection and instance segmentation
of potential archaeological mounds on historical maps. Historical maps present a unique source of
information for the reconstruction of ancient landscapes. The last 100 years have seen unprecedented
landscape modifications with the introduction and large‑scale implementation of mechanised
agriculture, channel‑based irrigation schemes, and urban expansion to name but a few. Historical
maps offer a window onto disappearing landscapes where many historical and archaeological
elements that no longer exist today are depicted. The algorithms focus on the detection and shape
extraction of mound features with high probability of being archaeological settlements, mounds being
one of the most commonly documented archaeological features to be found in the Survey of India
historical map series, although not necessarily recognised as such at the time of surveying. Mound
features with high archaeological potential are most commonly depicted through hachures or
contour‑equivalent form‑lines; therefore, an algorithm has been designed to detect each of those features.
Our proposed approach addresses two of the most common issues in archaeological automated
survey, the low‑density of archaeological features to be detected, and the small amount of training
data available. It has been applied to all types of maps available of the historic 1″ to 1‑mile series,
thus increasing the complexity of the detection. Moreover, the inclusion of synthetic data, along with
a Curriculum Learning strategy, has allowed the algorithm to better understand what the mound
features look like. Likewise, a series of filters based on topographic setting, form, and size have been
applied to improve the accuracy of the models. The resulting algorithms have a recall value of 52.61%
and a precision of 82.31% for the hachure mounds, and a recall value of 70.80% and a precision of
70.29% for the form‑line mounds, which allowed the detection of nearly 6000 mound features over
an area of 470,500 km², the largest such approach to have ever been applied. If we restrict our focus
to the maps most similar to those used in the algorithm training, we reach recall values greater than
60% and precision values greater than 90%. This approach has shown the potential to implement an
adaptive algorithm that allows, after a small amount of retraining with data detected from a new
map, a better general mound feature detection in the same map.
The past 100 years and, in particular, the second half of the twentieth century, have seen extensive urban growth
and the large-scale implementation of mechanised agriculture and irrigated systems in India and Pakistan, causing
irreversible effects on the landscape. Among other lasting impacts, such as the implementation of large-scale
¹Landscape Archaeology Research Group (GIAP), Catalan Institute of Classical Archaeology (ICAC), Pl. Rovellat
s/n, 43003 Tarragona, Spain. ²Catalan Institution for Research and Advanced Studies (ICREA), Passeig Lluís
Companys 23, 08010 Barcelona, Spain. ³Computer Science Department, Computer Vision Center, Universitat
Autònoma de Barcelona, Edifici O, Campus UAB, 08193 Bellaterra, Spain. ⁴Banaras Hindu University, Ajagara,
Varanasi, Uttar Pradesh 221005, India. ⁵McDonald Institute for Archaeological Research, University of Cambridge,
Downing St., Cambridge CB2 3ER, UK. ⁶Complexity and Socio-Ecological Dynamics (CaSEs) Research Group,
Universitat Pompeu Fabra, Barcelona, Spain. ⁷Department of Archaeology, University of Cambridge, Downing St.,
Cambridge CB2 3DZ, UK. *email: horengo@icac.cat
irrigation systems, river avulsion and flooding, there has been much systematic flattening, for cultivation and
construction, of hundreds, if not thousands, of archaeological settlement mounds1–3. These archaeological
mounds, with their distinct elevation, colour and form, are an indicative feature of past settlements and
anthropogenic modifications of the landscape. Given their partial or total destruction, these are no longer detectable
by other types of sources such as LIDAR or satellite imagery4,5. Historical maps are therefore often the only
source of information about the location and size of those lost sites. Available satellite images of the Indian
subcontinent date back to 1972 thanks to the Landsat satellite programme6, but detailed mapping of this region
through triangulation dates back to 1802 and the start of the Great Trigonometrical Survey. Later, during the
period of British rule in India and Pakistan (1858–1947), the Survey of India (SoI) continued the systematic
mapping of the whole subcontinent.
The SoI maps were originally intended to be geographic maps and depicted different topographic features
including mound features, many of which, as further research has shown3, are in fact archaeological sites (Fig. 1).
It is impossible to calculate the percentage of mounded sites that were not drawn in the SoI maps, given the
disappearance of sites during the last 100 years and the lack of reliable large-scale archaeological survey data.
However, all sites listed as being protected at the time the map surveys took place are indicated on the historic
maps, including sites like Harappa and Taxila7. Also, many major sites that were documented on the map sheets
were not ‘discovered’ by archaeologists for many years if not decades, including the major Indus Civilisation city
sites of Mohenjo-daro, Rakhigarhi and Dholavira. Furthermore, ground truthing has revealed there is a correlation
between these mound features and proto-historical and historical sites dating to various periods from the
period of the Indus Civilization onward3.
Deep Learning (DL) has been widely used in recent years to aid archaeological survey by using different
resources such as lidar data4–6,8 and drone imagery9. This study continues the work carried out by several authors
for the detection of archaeological sites using historical maps1–3. Previous studies made by Garcia-Molsosa
et al. focused on the present district of Multan in the Pakistani province of Punjab. The series of maps used in
that earlier work had similar production standards10. Although this previous approach produced satisfactory results,
it presented some drawbacks:
1. It employed a reduced series of maps of similar chronologies, depiction standards, scanning quality and
preservation. This ideal situation, however, proved not to be the norm when a much larger collection of maps
was assembled. The larger collection presented important variations in coloration, representation standards,
scanning quality and preservation, which enormously complicated the large-scale application of these initial
detectors10 and significantly reduced their detection capabilities.
2. The initial algorithms were designed in a proprietary web-based geospatial machine learning (ML) platform.
The models were not available for download, analysis or free distribution and the processing was expensive,
prohibitively so when considering large areas such as the one under investigation.
The study presented in this paper uses the historical maps produced in the late nineteenth and early twentieth
century by the SoI with the aim of detecting two of the most common ways of drawing mound features (hachure
and form-line, see Section "Deep learning model" for further details), which are similar to those depicted by
the French in Syria and Lebanon10 (Fig. 2). Our research seeks to develop two DL segmentation algorithms for
mound feature detection, one for each mound type, extending the detection to an area of 470,500 km² (most of
which corresponds to the Indus River Basin), the largest area in which such an approach has ever been applied4,
and to all types of maps, thus increasing the complexity of the analysis. We have employed a Region-based
Figure 1. Archaeological remains found where the historical maps indicated mounds. View from an elevated
mound feature in northwest India (L742). Image from Green et al.3, Fig. 2. Reproduced here under the terms of
the CC-BY 4.0 license in which it was originally published.
Convolutional Neural Network (R-CNN) segmentation algorithm as it collects information about not only the
location of the mound feature, but also about its shape and extent.
Automated detection processes require large amounts of data for their training (typically in the order of tens of
thousands of individual examples), but this is not common in archaeology where the number of known archaeo-
logical samples to train a ML algorithm is very low, as in this case study. Other studies with similar elements
such as burial mounds4, have shown that despite having limited training data, features of interest are detectable
due to the characteristic circular shape of the tumuli, which presented few variations. The archaeological elements
of this study, despite being mound features like those of previous studies where we encountered a similar
problem, are much more diverse. Since they are symbols drawn by human hands and not images of their actual
form, whether aerial or satellite, the features are noticeably divergent in style from each other. Consequently, a
relatively small amount of training data was not enough to achieve meaningful results.
In computational archaeology, trained ML models have been shown to perform worse in areas with low-
density of archaeological features than in high-density ones (e.g.11,12). When performing large-scale detection
with few sites, many False Positives (FPs) are introduced (typically many more than the True Positives (TPs)),
which severely reduces the accuracy of the algorithm. However, real archaeological scenarios typically pre-
sent low-densities of archaeological sites that need to be detected, at least compared to other typical objects
in Computer Vision studies (such as cars, trees, buildings, ships, etc.). During a survey, the actual density of
archaeological features is unknown, so to be a useful tool, the developed ML algorithm must also provide good
results for low-density areas.
Therefore, the use of ML approaches in archaeology entails a series of idiosyncratic challenges, including
the customary small amount of archaeological data for training and the usual low-density of archaeological
features. In this article we will implement a series of data augmentation (DA) techniques and learning strategies
to resolve these two issues.
The main goal of this article, besides the successful detection of mound features within acceptable parameters
of precision and recall, is to address these two issues by designing a workflow for the correct detection of
archaeological features (1) in low-density areas and (2) with a small amount of training data.
Materials and methods
In this study, a total of 645 maps, provided by the Cambridge University Library and the British Library, have
been used. These historical maps were produced and distributed by the SoI, and can be classified into different
periods characterized by the then current Surveyor General of the SoI, including C. Straham (1898–1899), G.C.
Gore (1900–1902), F.B. Longe (1904–1907), and S.G. Burrard (1912–1913). Maps produced under A.R. Quraishi
(1954) in his role of Surveyor General of the Survey of Pakistan have also been included.
Map digitisation and georeferencing. Before proceeding with the training of the DL algorithm, all 645
maps used for this study had to be scanned and georeferenced (Fig. 3). The scanning process was done by different
institutions and individuals, in different periods and using different means and resolutions, as a result of
the different histories, means, and procedures of the institutions hosting and scanning them. After
the digitisation of the maps, they were georeferenced using a minimum of 12 Ground Control Points (GCPs)
and an average of 25, geometrically distributed within the map to achieve a good distribution and an accurate
transformation. The GCPs were obtained from georeferenced high resolution RGB satellite imagery available
as Web Map Services layers in QGIS software (several versions were employed)13. The georeferencing process
mainly used second order polynomials, which was the preferred method, and was applied to most maps. On
Figure 2. The two types of mound feature depictions that need to be detected in historical maps: (a) hachure
[8r], and (b) form-line [16r].
few occasions, when the maps had suffered linear distortions due to folds in the map surface, the adjust
transformation was used. These methods produced average Root Mean Square Error (RMSE) values of 0.00035° (ca.
33.7–38.8 m at this latitude) using a second order polynomial and 0.00010236° (ca. 10.3 m with a maximum
value of ca. 26.8 m) using the adjust transformation. Since the mounds under consideration are typically much
larger than these values, the georeferencing process results in mound feature locations that largely overlap the
real locations (for more details on the georeferencing process see1).
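As a rough check on how an angular RMSE translates into ground distance, the following sketch converts degrees to metres. It assumes a spherical Earth with about 111.32 km per degree of latitude and a hypothetical reference latitude of 30° N; neither value is stated in the text, and the authors' own conversion may differ.

```python
import math

# Assumption: mean length of one degree of latitude on a spherical Earth.
METRES_PER_DEGREE = 111_320.0

def degrees_to_metres(error_deg, latitude_deg):
    """Return (east-west, north-south) ground distances in metres for an angular error."""
    north_south = error_deg * METRES_PER_DEGREE
    # A degree of longitude shrinks with the cosine of the latitude.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return east_west, north_south

# Hypothetical reference latitude of 30 degrees N (an assumption, not taken from the paper).
ew, ns = degrees_to_metres(0.00035, 30.0)
print(f"east-west: {ew:.1f} m, north-south: {ns:.1f} m")
```

Under these assumptions the second order polynomial RMSE falls in the same range of a few tens of metres as the values quoted above, well below the size of the mounds of interest.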
Deep learning model. In recent years, R-CNN models have become very common in archaeological
survey, notably segmentation algorithms such as Mask R-CNN9 and DeepLabV3+14. For this study, we
developed two mound symbol detection DL algorithms using Mask R-CNN15, since we are looking for instance
segmentation rather than semantic segmentation. Mask R-CNN detects objects in an image while simultaneously generating
a high-quality segmentation mask for each instance16. It extends Faster R-CNN17,18 by adding a branch for predicting
segmentation masks, a small fully convolutional network (FCN)19, on each region of interest (RoI), in
parallel with the existing branch for classification and bounding box regression. Mask R-CNN is simple to train
and adds only a small overhead to Faster R-CNN16. Likewise, VGG Image Annotator (VIA) from the University
of Oxford has been used to label mound features20.
The digitized and georeferenced historical maps are 3-channel RGB images and we have cropped them into
512 × 512 pixel images to save computing costs. Of the 645 maps used, only 43 contained known mound features,
which have been used for training and validation: 286 hachure and 103 form-line mound features. Of those maps,
22 were used for training, including 168 hachure and 26 form-line mound features, and 21 were used for validation,
including 118 hachure and 77 form-line mound features. In addition, given the small number of known
mound features, another 21 maps, chosen randomly from the 645 original maps, were manually analysed. In
this way, we have been able to create another dataset, the test dataset with 230 hachure and 137 form-line mound
features, to evaluate the model obtained from training and validation for a second time.
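The cropping step can be pictured with a minimal sketch that enumerates the upper-left corners of non-overlapping 512 × 512 tiles. The function name and the handling of partial tiles at the right and bottom edges are assumptions; the text does not say whether such tiles are padded, overlapped or dropped.

```python
def tile_origins(width, height, tile=512):
    """Yield (x, y) upper-left corners of non-overlapping tiles covering the image.

    Tiles on the right and bottom edges may extend past the image; how the
    original workflow treats them is not stated, so this sketch simply lists them.
    """
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            yield x, y

# Hypothetical map scan of 1300 x 1000 pixels.
origins = list(tile_origins(1300, 1000))
print(len(origins), "tiles")
```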
SoI map styles, colours and symbology depended on the date the maps were produced, the team drawing
them, the region and the print quality of the map1. Each map type also corresponds to a drawing style and,
therefore, to a different mound colour, despite corresponding to the same type of mound feature. There are
three typologies by which mound features are represented in the SoI maps, of which the most common ones
are the hachure and the form-line mound feature. The hachure is depicted with many fragmented lines which
show the orientation of the slope, whereas the form-line mound features are drawn to represent one elevation
(Fig. 2). The third type of mound feature representation on SoI historical maps is shaded-relief. Although these
are also present on the maps under study, they are not included in the automated detection given the low
correspondence of this type of mound feature with archaeological sites: 86.36% of the examples visited on
the ground were found not to be archaeological sites3. We have focused the form-line algorithm on detecting
only its most common typology, as opposed to the hachure algorithm, which detects all types of hachure depiction.
This is due to the fact that other form-line mound feature types (mound feature with concentric lines, with
continuous line and black ones) do not have their characteristic shape and they are similar to other typologies
that have no relation to archaeological features, such as road and slope lines (Fig. 4). Likewise, mound
features cropped by the process of clipping maps to 512 × 512 pixels are not detected, because there are form-line
and hachure-shaped features that are not closed in a circle and are not mound features.
ML algorithms like Mask R-CNN typically evaluate their models on images that contain labelled objects
and do not evaluate those without labels. Since our goal is to demonstrate the good performance of the model
in low-density areas, we have created artificial mound labels on all those images without real mounds to force
the analysis in them. This way, the algorithm also evaluates the presence or absence of mounds in areas of the
map where we know there are no mound features, to better assess its precision. The 4 × 4 pixel artificial mound
features are placed in the upper left corner of the images and will never be detected, as our algorithm discards
any detection at the edges of the images (10 pixels from the edge) to avoid FPs derived from cropped symbols.
Although these artificial labels are never matched, the images that carry them are analysed, allowing our model to
be evaluated in both high-density and low-density areas.
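The interaction between the placeholder label and the edge rule can be sketched as follows. The function names and the bounding-box convention (x_min, y_min, x_max, y_max) are illustrative assumptions, not the authors' code.

```python
def touches_edge(box, tile=512, margin=10):
    """True if a box (x_min, y_min, x_max, y_max) lies within `margin` pixels of the tile border."""
    x_min, y_min, x_max, y_max = box
    return (x_min < margin or y_min < margin or
            x_max > tile - margin or y_max > tile - margin)

def placeholder_label():
    """4 x 4 pixel dummy annotation in the upper-left corner of an otherwise unlabelled tile."""
    return (0, 0, 4, 4)

def keep_detections(boxes, tile=512, margin=10):
    """Discard any detection that touches the 10-pixel border band."""
    return [b for b in boxes if not touches_edge(b, tile, margin)]

# Hypothetical detections: one at the placeholder position, one interior, one cut by the right edge.
detections = [placeholder_label(), (100, 100, 150, 150), (500, 200, 511, 230)]
kept = keep_detections(detections)
print(kept)
```

Because the placeholder sits entirely inside the border band, no surviving detection can match it, so an unlabelled tile contributes only potential FPs to the evaluation.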
If our study had focused on areas with a high-density of mound features, our method and research could
have ended here since we obtained good results after the first training for both hachure and form-line mound
features. However, the majority of archaeological surveys are conducted in areas with a low-density of sites, or
in places where the density of archaeological features is undetermined. Therefore, if we look at the reality
of archaeological research and analyse the results of the first training for low-density areas, we observe that it is
necessary to refine the model given the high number of FPs present in the results.
Figure 3. Scheme of the workflow for the detection of mounds in historical maps.
Model refinement. The high number of FPs present in the first training was due to the limited amount of
training data available. Therefore, with the idea of introducing new training data, both positive and negative,
various DA techniques have been applied. The first DA methods developed were mound feature random translation
(DA1), random rotation (DA2), and the so-called Doppelgänger technique (DA3).
For each type of mound feature and algorithm, 1500 new artificial mound features were used, created randomly
from the original ones used for training, and they were placed, by an automated process, randomly on all
the maps used in training, implementing both DA1 and DA2. When pasting these artificial mound features at
random on each of the training maps, they were emptied of any other feature than the actual mound depiction
as they contained various symbols unrelated to the mound feature itself, thus avoiding possible FPs derived from
the presence of these symbols, but also because the training maps had different background colours and the
inclusion of these features would have created artificial colour-related features (Fig. 5).
In order to avoid FPs due to common symbols on the maps such as roads, grass and trees where these new
artificial mound features could have been placed randomly, DA3 was developed to copy the inside of each mound
feature and to paste it to the outside of the mound feature so that it can be taken as negative training and just
the mound feature as positive data (Fig. 6). In this study it has been decided not to implement other possible
DA techniques such as resizing, because mound features of different sizes are drawn differently than the resized
mound feature itself. The hachure and form-line shapes are different for each size, increasing or decreasing the
number of strokes drawn. Therefore, resizing would introduce noise into the algorithm. The entire DA process has
been done using our own script written in Python (see Data availability Section for further details).
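A heavily simplified sketch of the copy-and-paste idea behind DA1 and DA2 is given below. It is not the authors' script: images are plain nested lists, rotation is restricted to quarter turns (the paper's random rotation is presumably by arbitrary angles), and `None` is a hypothetical marker for pixels emptied of non-mound symbols.

```python
import random

def rotate_quarter_turns(patch, turns):
    """Rotate a 2-D list of pixels clockwise by `turns` x 90 degrees."""
    for _ in range(turns % 4):
        patch = [list(row) for row in zip(*patch[::-1])]
    return patch

def paste_random(background, patch, rng):
    """Paste `patch` at a random position fully inside `background` (modified in place).

    Returns the (x, y, w, h) bounding box to be used as the new positive label.
    """
    patch = rotate_quarter_turns(patch, rng.randrange(4))   # DA2, simplified to quarter turns
    h, w = len(patch), len(patch[0])
    y = rng.randrange(len(background) - h + 1)               # DA1: random translation
    x = rng.randrange(len(background[0]) - w + 1)
    for dy, row in enumerate(patch):
        for dx, value in enumerate(row):
            if value is not None:        # None marks emptied pixels, so the map shows through
                background[y + dy][x + dx] = value
    return x, y, w, h

# Toy data: a 20 x 20 blank "map" and a 2 x 3 "mound" with one emptied pixel.
rng = random.Random(0)
background = [[0] * 20 for _ in range(20)]
patch = [[1, 1, 1], [1, None, 1]]
box = paste_random(background, patch, rng)
print(box)
```

Leaving the emptied pixels transparent mirrors the stated motivation: the pasted mound inherits the background of the target map rather than carrying over symbols or colours from its source map.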
After increasing the positive and negative training data, the number of FPs detected was considerably reduced,
but a series of specific FPs still remained. In order to further reduce these, a refinement stage (DA4) was
Figure 4. Different form-line typologies found in historical SoI maps: (a) dashed [23r] and solid line [34r]
mound feature, (b) mound feature with concentric lines [30r], and (c) road-like black line mound feature [16r].
Figure 5. Some examples of hachure mound features containing different symbols inside, as well as two types
of map background colour (a and b).
included (Fig. 7). In both cases the same correct mound features were used as continuous line circles for negative
training data, so the algorithm could decide that continuous lines are not mound features. The total number of
elements used as refinement is 88 for the form-line and 127 for the hachure ones, which have been placed using
the DA1 and DA2 techniques up to a total of 8800 for the form-line algorithm and 12,700 for the hachure model.
Curriculum learning approach. Thanks to these DA methods we managed to reduce the number of FPs
considerably, increasing the precision of the model. However, we stopped detecting some of the mound features
that were initially detected, which also reduced the recall value. For this reason and with the aim of improving
the accuracy metrics, it was decided to implement a Curriculum Learning (CL) strategy with synthetic data
(DA5) (Fig.8).
Firstly, CL is a way to gradually introduce complexity to the model through more training phases21. Secondly,
the lack of data forced us to create synthetic data for each mound feature class (DA5), which we have used to make
the algorithm learn through a CL strategy. In this way, the algorithm first learns the basics from the synthetic data
and then more complex variations from the few known mound features in its second training, as a fine-tuning
stage (Fig. 9). A total of 75 synthetic mound features were created for each of the two types.
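The two-stage schedule can be illustrated with a toy example. This is only a sketch of the training order: a one-dimensional logistic model stands in for Mask R-CNN, and the sample values, epoch counts and learning rates are invented for illustration.

```python
import math

def sgd_epoch(weights, samples, lr):
    """One pass of logistic-regression SGD over (x, label) pairs; returns updated (w, b)."""
    w, b = weights
    for x, label in samples:
        pred = 1.0 / (1.0 + math.exp(-(w * x + b)))
        w += lr * (label - pred) * x
        b += lr * (label - pred)
    return w, b

def train_curriculum(stages, weights=(0.0, 0.0)):
    """Train stage by stage, carrying the weights from one stage into the next."""
    for samples, epochs, lr in stages:
        for _ in range(epochs):
            weights = sgd_epoch(weights, samples, lr)
    return weights

# Toy stand-ins (not the paper's data): clean "synthetic" samples first,
# then a few less clear-cut "original" samples at a lower learning rate (fine-tuning).
synthetic = [(-2.0, 0), (-1.5, 0), (1.5, 1), (2.0, 1)]
original = [(-0.6, 0), (0.4, 1), (0.9, 1)]
w, b = train_curriculum([(synthetic, 50, 0.5), (original, 50, 0.1)])
print(w, b)
```

The point of the design is that the second stage starts from weights that already encode the easy, idealised cases, so the few real examples only have to adjust the model rather than teach it from scratch.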
Model filtering. Previous ground-truthing studies in India3, which included only a small number of
well-preserved archaeological mounds, showed that those mound features smaller than 200 m in diameter were
mostly not archaeological sites, with hachure features adjacent to villages often corresponding to ponds or upcast
from the creation of those ponds. Only 7.96% of the hachure and 25.83% of the form-line mound features of less
than 200 m corresponded to archaeological sites3. Likewise, research on mound features in Pakistan showed that
many of the small mound features less than 100 m in diameter were mostly dunes or modern spoil from pond
Figure 6. First DA techniques used: (a) random translation (DA1), (b) random rotation (DA2), and (c) the
so-called Doppelgänger technique (DA3).
Figure 7. Some FPs used as negative training data for refinement (DA4): (a) hachure FPs and (b) form-line FPs.
excavation10. In contrast, 56.34% of the form-line and 40% of the hachure features greater than 200 m in diameter
did correspond to sites3. For this reason, it has been decided to filter out, throughout the study area, all those
mound features formed by areas of less than 500 pixels, a range of 60–150 m in diameter depending on the pixel
resolution of each map, to avoid including mound features that are not likely to be archaeological sites (Filter1).
A second filter, using blob analysis, was applied to remove those elongated mound features which are not
commonly archaeological sites and are mostly dunes. The ellipsoidal shape of each detected mound feature has
been evaluated and all those that presented an elongation (the ratio between the largest and smallest diameter of
the ellipse) greater than 3.5 were eliminated (Filter2).
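The area and elongation criteria can be sketched as below. The elongation here is estimated from the second moments of the mask pixels, which is one common way to fit an equivalent ellipse; whether the blob analysis used in the study computes it the same way is an assumption.

```python
import math

def elongation(pixels):
    """Ratio of major to minor axis of the ellipse with the same second moments as the mask."""
    n = len(pixels)
    mx = sum(x for x, _ in pixels) / n
    my = sum(y for _, y in pixels) / n
    sxx = sum((x - mx) ** 2 for x, _ in pixels) / n
    syy = sum((y - my) ** 2 for _, y in pixels) / n
    sxy = sum((x - mx) * (y - my) for x, y in pixels) / n
    # Eigenvalues of the 2 x 2 covariance matrix give the squared axis lengths (up to scale).
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    major, minor = half_trace + root, half_trace - root
    return math.inf if minor <= 0 else math.sqrt(major / minor)

def passes_filters(pixels, min_area=500, max_elongation=3.5):
    """Filter1 (area in pixels) followed by Filter2 (elongation)."""
    return len(pixels) >= min_area and elongation(pixels) <= max_elongation

# Hypothetical masks given as lists of (x, y) pixel coordinates.
square = [(x, y) for x in range(30) for y in range(30)]    # 900 px, compact
strip = [(x, y) for x in range(120) for y in range(10)]    # 1200 px, dune-like
small = [(x, y) for x in range(10) for y in range(10)]     # 100 px, below the area threshold
print(passes_filters(square), passes_filters(strip), passes_filters(small))
```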
Finally, in the post-processing stage, given the similarity of the mound features with the characteristic elevation
shape of mountainous areas, a script was applied using Google Earth Engine and QGIS to filter out all
mountainous regions (Filter3), defined as areas with a slope greater than 5 degrees (mean value within a 7 pixel
radius, equivalent in this area to 210 m), and thus eliminate all mound features that, although correctly identified
by their drawn shape, do not correspond to possible archaeological mounds (Fig. 10).
Model evaluation. Once the algorithm was trained, new mound features were detected in the remaining
581 maps for which we possessed no information on the presence of mound features. Given the diversity of the
new maps compared to those used for training and validation (Fig. 11), this evaluation was carried out
differentiating the maps based on their similarity with those used in training and validation following a probability
density function (Fig. 12).
This detection can be replicated in Colab in order to facilitate its application by other users with the aim of
making this algorithm reproducible and replicable. The resulting shapefile contains the masks of all detected
mound features for easy viewing in standard GIS software such as QGIS.
Results
Below we present the results of the workflow followed for the detection of mound features in SoI historical
maps. Both the initial (Tables 1 and 2) and the final results (Tables 7 and 8) of the detection of hachure and
form-line mound features are presented, and only the intermediate results of the detection of hachure as an
example of the evolution of the process (Tables 3, 4, 5, and 6), which was the same for both types of mound
feature representations.
Finally, the trained model was applied to maps covering an area of 470,500 km², where a total of 2802 hachure
and 3145 form-line mound features have been detected (5947 mound features), and perfectly georeferenced by
Figure 8. Hachure and form-line mound feature datasets for CL: (a) examples of synthetic hachure mound
features (DA5), (b) examples of original hachure mound features, (c) examples of synthetic form-line mound
features (DA5), and (d) examples of original form-line mound features. The synthetic data (a and c) is for the
first training of each of the two algorithms and the original data (b and d) for the second training, also for both
algorithms.
Figure 9. CL process scheme where stages with more complex aspects of the mound features are gradually
included: first the synthetic dataset with DA and second the original with DA.
Figure 10. Hachure mound-shaped mountain peaks on (a) historical map and (b) its satellite image.
Figure 11. Similarity based on the RGB values of their backgrounds compared to the training and validation
maps: (a) sample map used for training, (b) sample map used for test for a standard deviation of 0.5, and (c)
sample map used for test for a standard deviation of 3.
Figure 12. Percentage of maps in which new mounds are detected (blue) relative to the probability density
of the maps used both in training and in validation (brown), their similarity based on the RGB values of their
backgrounds.
Table 1. Evaluation of the Mask R-CNN model in high- and low-density validation datasets (density given as
average mound features per image), before the entire DA workflow, for the detection of hachure mound features.
Algorithm     Density (%)  TPs  FNs  FPs  Recall (%)  Precision (%)  F1 (%)
High-density  128.26       87   21   26   80.56       76.99          78.73
Low-density   2.67         87   21   737  80.56       10.56          18.67
Table 2. Evaluation of the Mask R-CNN model in high- and low-density validation datasets (density given as
average mound features per image), before the entire DA workflow, for the detection of form-line mound features.
Algorithm     Density (%)  TPs  FNs  FPs   Recall (%)  Precision (%)  F1 (%)
High-density  95.77        45   22   20    67.16       69.23          68.18
Low-density   1.47         45   22   1366  67.16       3.19           6.09
Table 3. Evaluation of the Mask R-CNN models in low-density validation dataset using different DA
techniques for the detection of hachure mound features: random translation (DA1), random rotation (DA2)
and the so-called Doppelgänger technique (DA3).
Algorithm          TPs  FNs  FPs  Recall (%)  Precision (%)  F1 (%)
None               87   21   737  80.56       10.56          18.67
DA1                71   39   37   64.55       65.74          65.14
DA1 + DA2          68   45   53   60.18       56.20          58.12
DA1 + DA2 + DA3    68   44   31   60.71       68.69          64.45
Table 4. Evaluation of the Mask R-CNN models in low-density validation dataset using a refinement step
(DA4) for the detection of hachure mound features.
Algorithm                TPs  FNs  FPs  Recall (%)  Precision (%)  F1 (%)
DA1 + DA2 + DA3          68   44   31   60.71       68.69          64.45
DA1 + DA2 + DA3 + DA4    70   43   19   61.95       78.65          69.31
Table 5. Evaluation of the Mask R-CNN models in low-density validation dataset using CL-based approach
with synthetic data (DA5) for the detection of hachure mound features.
Algorithm                      TPs  FNs  FPs  Recall (%)  Precision (%)  F1 (%)
DA1 + DA2 + DA3 + DA4          70   43   19   61.95       78.65          69.31
DA1 + DA2 + DA3 + DA4 + DA5    77   38   11   66.96       87.50          75.86
Table 6. Evaluation of area (Filter1), blob (Filter2) and slope (Filter3) filters in low-density validation dataset
for the detection of hachure mound features.
Algorithm                    TPs  FNs  FPs  Recall (%)  Precision (%)  F1 (%)
None                         87   225  15   27.88       85.29          42.03
Filter1                      78   43   13   64.46       85.71          73.58
Filter1 + Filter2            77   38   11   66.96       87.50          75.86
Filter1 + Filter2 + Filter3  77   38   10   66.96       88.51          76.24
our algorithm (Figs. 13 and 14). A manual evaluation of a series of maps of this area, the aforementioned
test dataset, was performed (Tables 9 and 10).
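The recall, precision and F1 values reported in the tables follow directly from the TP, FN and FP counts. As a quick consistency check, the sketch below recomputes the two rows of Table 1 using the standard definitions; the function name is illustrative.

```python
def detection_metrics(tp, fn, fp):
    """Return recall, precision and F1 as percentages rounded to two decimals."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return tuple(round(100 * v, 2) for v in (recall, precision, f1))

# Counts taken from Table 1 (hachure mound features, before the DA workflow).
high = detection_metrics(87, 21, 26)     # high-density validation dataset
low = detection_metrics(87, 21, 737)     # low-density validation dataset
print(high)  # (80.56, 76.99, 78.73)
print(low)   # (80.56, 10.56, 18.67)
```

The recall is identical in both rows because the same mounds are found; only the number of FPs, and with it the precision and F1, changes when the unlabelled low-density images are included.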
Discussion
Low‑density approach. In archaeology, it is common to find unsatisfactory results masked by the difference
in the density of archaeological features. The density of the features must be taken into account11,12 since
good results in high-density areas may actually be hiding much worse results in low-density areas. The first
results showed a number of FPs of up to twenty times more than the mound features present in the area (Tables 1
and 2). This algorithm would be useless in a large-scale survey, as it would generate a large number of FPs and an
overly large dataset, which would not be of use in the planning of field validation or for archaeological analysis.
These results strongly show that archaeological studies should focus their validation on low-density areas in
order to avoid biased results.
During an archaeological survey, the true density of archaeological features is unknown, so algorithms must
be developed that show good metrics in areas of both high and low site density. Contrary to recently published
discussions [12], poor results in low-density areas due to the sparse presence of archaeological features and
class imbalance are not inevitable; they are the product of insufficient model training. The foreground-to-
background imbalance, as an example of class imbalance [22], is not the reason for poor results in the detection
stage. The imbalance problem for each category in the object detection training pipeline [23] occurs when
one class heavily outnumbers the other class in the training data [24], not in the validation and
test datasets. Variation in results due to the different density of archaeological features (Tables 1 and 2) can be
resolved by different DA and CL approaches (Tables 7 and 8).
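As a worked check of how the reported percentages follow from the raw counts, the sketch below recomputes recall, precision and F1 from the TPs, FNs and FPs of the hachure model in Table 7. The formulas are the standard definitions; the function name `detection_metrics` is introduced here only for illustration.

```python
def detection_metrics(tp: int, fn: int, fp: int) -> dict:
    """Return recall, precision and F1 (in %) from detection counts."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    values = {"recall": recall, "precision": precision, "f1": f1}
    return {name: round(100 * v, 2) for name, v in values.items()}

# Counts for the hachure model taken from Table 7.
high_density = detection_metrics(tp=77, fn=38, fp=3)
low_density = detection_metrics(tp=77, fn=38, fp=10)
print(high_density)
print(low_density)
```

Recall is identical in both rows because the TPs and FNs are the same; only the FPs, and therefore precision and F1, change with density.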
Model refinement and curriculum learning approach. The DA, with the introduction of 1500 new
mound features, significantly improves the precision by increasing the training data, both positive and negative.
DA1 and DA2 show similar results: despite a slight reduction in recall, they achieve a substantial
improvement in precision (Table 3). Thanks to its negative training, the introduction of DA3 improves the
precision of the model, and DA4 is then used to improve its accuracy further.
The initial training data was not sufficient and resulted in a large number of FPs, indicating that the model had
not learned well what a mound feature looks like. The increase of the training data removed a large number of
FPs, but to eliminate more specific FPs it was necessary to resort to DA3 and DA4 (Table 8). As shown in Fig. 7,
most of the FPs used in refinement were pointed circular and non-circular shapes for the hachure algorithm,
and both continuous and dashed circular shapes for the form-line model.
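One way to collect the FPs that feed such a refinement step is to match detections against the reference annotations and keep whatever remains unmatched. The sketch below assumes axis-aligned boxes `(x1, y1, x2, y2)` and an IoU threshold of 0.5; both are illustrative choices rather than parameters reported here, and the helper names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes (x1, y1, x2, y2)."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def split_detections(detections, ground_truth, threshold=0.5):
    """Greedy one-to-one matching; returns (TPs, FPs, FNs) as lists of boxes."""
    unmatched = list(ground_truth)
    tps, fps = [], []
    for det in detections:
        best = max(unmatched, key=lambda gt: iou(det, gt), default=None)
        if best is not None and iou(det, best) >= threshold:
            tps.append(det)
            unmatched.remove(best)
        else:
            fps.append(det)  # candidate negative example for refinement
    return tps, fps, unmatched

# Toy example: one correct detection, one spurious one, one missed feature.
gt_boxes = [(10, 10, 50, 50), (300, 300, 340, 340)]
det_boxes = [(12, 12, 52, 52), (200, 200, 240, 240)]
tps, fps, fns = split_detections(det_boxes, gt_boxes)
print(len(tps), len(fps), len(fns))
```

The boxes returned in `fps` are the regions that could be cropped and added to the training set as negative examples.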
Likewise, as can be seen in Fig. 15, the use of DA5 has allowed the detection of hachure shapes not included
in the original training data. The inclusion of synthetic data, along with the CL strategy, has allowed the
algorithm to better understand what the mound features look like. The CL using synthetic data helped to develop
an algorithm from a small training dataset, which is common in archaeology. As seen in Table 5, both the recall
value and the precision value improved noticeably.
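The details of the CL schedule are not given in this section, so the following is only a generic illustration of the principle behind curriculum learning [21]: training examples are ordered from easy to hard and the training set is widened stage by stage. The difficulty scores, the number of stages and the function name are all hypothetical.

```python
import math

def curriculum_stages(samples, difficulty, n_stages=3):
    """Yield cumulative training subsets ordered from easy to hard.

    samples: list of training items.
    difficulty: callable returning a score (lower means easier).
    Both are placeholders for illustration only.
    """
    ordered = sorted(samples, key=difficulty)
    for stage in range(1, n_stages + 1):
        cutoff = math.ceil(len(ordered) * stage / n_stages)
        yield ordered[:cutoff]

# Hypothetical example in which synthetic mounds are scored as easier.
data = [("real_a", 0.9), ("synthetic_a", 0.1),
        ("real_b", 0.7), ("synthetic_b", 0.2)]
stages = list(curriculum_stages(data, difficulty=lambda s: s[1], n_stages=2))
for i, subset in enumerate(stages, start=1):
    print(i, [name for name, _ in subset])
```

Each stage contains the previous one, so the model is never asked to forget the easier examples while the harder ones are introduced.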
Model filtering. Smaller objects, such as mound features less than 500 pixels in area, are the most difficult
for a CNN to detect, because such objects do not have enough pixels for the necessary feature extraction. That is
why the recall value is so low without Filter1 but high enough when we apply it (Table 6). Filter2 and Filter3
remove further FPs (from 13 to 11 and then to 10 in Table 6), which increases the precision of the model, with
fewer, but higher quality results that are more likely to be of archaeological interest.
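A minimal sketch of an area filter of the kind described for Filter1 follows. It assumes each region is available as a binary mask (here a nested list of 0/1 values) and uses the 500-pixel threshold quoted above; the data layout and function names are assumptions. The blob and slope filters are not sketched because their definitions are not given in this section.

```python
MIN_AREA_PX = 500  # area threshold quoted in the text

def mask_area(mask):
    """Count foreground pixels in a binary mask given as rows of 0/1."""
    return sum(sum(row) for row in mask)

def area_filter(masks, min_area=MIN_AREA_PX):
    """Keep only masks whose foreground area reaches min_area pixels."""
    return [m for m in masks if mask_area(m) >= min_area]

small = [[1] * 10 for _ in range(10)]   # 100 px, below the threshold
large = [[1] * 30 for _ in range(30)]   # 900 px, above the threshold
kept = area_filter([small, large])
print(len(kept), mask_area(kept[0]))
```

The drop in FNs between the first two rows of Table 6 suggests that the same threshold is also applied to the reference annotations during scoring, but that reading is an inference rather than something stated here.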
Table 7. Evaluation of the Mask R-CNN model in high and low-density validation datasets (density given as average
mound features per image), after the entire DA workflow for the detection of hachure mound features.
Algorithm Density (%) TPs FNs FPs Recall (%) Precision (%) F1 (%)
High-density 128.26 77 38 3 66.96 96.25 78.97
Low-density 2.67 77 38 10 66.96 88.51 76.24
Table 8. Evaluation of the Mask R-CNN model in high and low-density validation datasets (density given as average
mound features per image), after the entire DA workflow for the detection of form-line mound features.
Algorithm Density (%) TPs FNs FPs Recall (%) Precision (%) F1 (%)
High-density 95.77 48 20 0 70.59 100 82.76
Low-density 1.47 48 20 4 70.59 92.31 80.00
In future work, new filters could be developed to eliminate mound features that are correctly detected but
incorrectly classified by type. Some hachure mound features, in addition to being detected by the hachure
algorithm, have been detected by the form-line mound feature algorithm. In those cases, what has been detected
is not the complete mound feature but only its interior, which on many occasions resembles a form-line mound
feature. These misclassified mound features could easily be removed with a filter that discards the smaller of
two duplicate detections. The same can happen with the hachure and the shaded-relief mound features. Some
shaded-relief examples, such as the last image of Fig. 15, resemble a hachure mound feature. Applying the same
filter would also resolve these double detections, as well as reduce the FPs for shaded-like dunes.
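The duplicate filter proposed above could look like the following sketch, which compares detections from two models and, where two boxes overlap strongly, keeps only the larger one. The box format, the overlap measure (intersection divided by the area of the smaller box, chosen because an interior detection lies inside the full feature) and the 0.5 threshold are all illustrative assumptions.

```python
def box_area(b):
    """Area of a box (x1, y1, x2, y2)."""
    return max(0, b[2] - b[0]) * max(0, b[3] - b[1])

def overlap_of_smaller(a, b):
    """Intersection area divided by the area of the smaller box."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    smaller = min(box_area(a), box_area(b))
    return (iw * ih) / smaller if smaller else 0.0

def drop_smaller_duplicates(hachure, form_line, threshold=0.5):
    """Remove the smaller box of every strongly overlapping cross-model pair."""
    drop_h, drop_f = set(), set()
    for i, h in enumerate(hachure):
        for j, f in enumerate(form_line):
            if overlap_of_smaller(h, f) >= threshold:
                if box_area(h) < box_area(f):
                    drop_h.add(i)
                else:
                    drop_f.add(j)  # ties drop the form-line box
    keep_h = [h for i, h in enumerate(hachure) if i not in drop_h]
    keep_f = [f for j, f in enumerate(form_line) if j not in drop_f]
    return keep_h, keep_f

# A form-line detection inside a hachure mound, plus an unrelated one.
hachure_boxes = [(0, 0, 100, 100)]
form_line_boxes = [(30, 30, 70, 70), (500, 500, 540, 540)]
kept_h, kept_f = drop_smaller_duplicates(hachure_boxes, form_line_boxes)
print(kept_h, kept_f)
```

With segmentation masks instead of boxes the same logic applies, with pixel counts replacing box areas.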
Model evaluation. Only 40.03% of the maps with unknown mound features, the ones used for testing, are
similar to 63.64% of the maps used for training and validation (Fig. 12), so most are substantially different. This
diversity, as well as its resulting metrics (Tables 9 and 10), indicates the need for an adaptive algorithm that allows,
after a small amount of retraining with data detected from a new map, a better general mound feature detection
in the same map. The more similar the maps are to those used in training and validation, the more similar the test
metrics are to the validation ones. An adaptive algorithm would improve both the recall value, by including different
ways of drawing the mound features, only some of which have been detected thanks to the synthetic data, and the
precision value, by including backgrounds not taken into account in the original training.
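How map similarity is measured is not described in this section, so the sketch below is only one plausible reading of an RGB similarity expressed in units of σ: each map is summarised by its mean colour, and a test map is assigned to the narrowest band (0.5σ, 1σ, 2σ or 3σ) within which every channel lies relative to the training and validation maps. The per-channel statistic, the use of the largest channel deviation and all function names are assumptions.

```python
from statistics import mean, pstdev

def channel_stats(train_means):
    """Per-channel mean and standard deviation of training-map mean colours."""
    channels = list(zip(*train_means))
    return [mean(c) for c in channels], [pstdev(c) for c in channels]

def sigma_distance(map_mean, mu, sigma):
    """Largest per-channel deviation from the training mean, in sigma units.

    Assumes every sigma is greater than zero.
    """
    return max(abs(v - m) / s for v, m, s in zip(map_mean, mu, sigma))

def similarity_band(map_mean, mu, sigma, bands=(0.5, 1, 2, 3)):
    """Narrowest band containing the map, or None if it lies beyond the last band."""
    d = sigma_distance(map_mean, mu, sigma)
    return next((b for b in bands if d <= b), None)

# Hypothetical mean (R, G, B) values of three training maps.
train = [(200, 190, 170), (210, 200, 180), (220, 210, 190)]
mu, sigma = channel_stats(train)
print(similarity_band((212, 201, 181), mu, sigma))
print(similarity_band((190, 180, 160), mu, sigma))
print(similarity_band((100, 100, 100), mu, sigma))
```

Because the counts in Tables 9 and 10 grow with the band width, the bands there appear to be cumulative; a map assigned to the 0.5σ band by this function would then also count towards the wider bands.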
Figure 13. Detection of mound features [21r] in an area where urban and agricultural development have
made those mapped mound features disappear: (a) satellite image of the area, (b) historical map of the area, (c)
detection of form-line mound features (blue) on the historical map, and (d) location of the detected potential
site mound features (blue) in the satellite image.
Likewise, new DA methods could be included in the training, such as random brightness jittering and random
blur/sharpen [25]. Some test maps, unlike those used in training and validation, are darker or blurred
(Fig. 16).
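The two augmentations mentioned above can be sketched in a few lines. The example works on a greyscale image stored as a nested list of 0 to 255 values, uses a 3x3 mean filter as a stand-in for the blur half of blur/sharpen, and picks a jitter range of ±20% and a blur probability of 0.5; all of these are illustrative assumptions rather than settings from the study.

```python
import random

def jitter_brightness(img, max_delta=0.2, rng=random):
    """Scale pixels by a random factor in [1 - max_delta, 1 + max_delta], clipped to 0..255."""
    factor = 1 + rng.uniform(-max_delta, max_delta)
    return [[min(255, max(0, round(p * factor))) for p in row] for row in img]

def box_blur(img):
    """3x3 mean filter; border pixels average over the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(round(sum(window) / len(window)))
        out.append(row)
    return out

def augment(img, rng=random):
    """Apply brightness jitter, then blur with probability 0.5."""
    out = jitter_brightness(img, rng=rng)
    return box_blur(out) if rng.random() < 0.5 else out

tile = [[0, 0, 0], [0, 255, 0], [0, 0, 0]]
blurred = box_blur(tile)
augmented = augment(tile, rng=random.Random(0))
```

Passing a seeded `random.Random` instance, as in the last line, makes the augmentation reproducible when debugging.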
Comparison to manual digitisation of mound features. The VIA annotation software was used to
hand-digitise 756 mound features, stored in JSON format, from 64 random historical maps. The
density of mound features is not uniform across the maps. Instead, mound features frequently
cluster together, so that some maps contain a high number of mound features and others a low number. This
type of pattern increases the amount of labour and time necessary for manual mound feature digitising using
GIS software. Based on the manually digitised mound features prepared as training data for the algorithm, we
estimated that manually digitising all mound features from the 645 historical maps used in this
research region would take an experienced professional more than 120 work hours. The detection time, running
each algorithm on a single NVIDIA A40 GPU, has been more than 6 computing hours. While 120 h does not seem too
long for this project, creating an ML-based algorithm paves the way to scale this research to the additional 2200
historical maps covering other parts of Pakistan and India that have been scanned and are ready for analysis.
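To make the scaling argument concrete, the short calculation below extrapolates the two figures quoted above to the additional 2200 maps. It assumes that effort grows linearly with the number of maps and treats the quoted values (more than 120 h and more than 6 h for 645 maps) as lower bounds, so the results are rough lower-bound estimates only.

```python
MAPS_DONE = 645        # maps in the current study region
MAPS_PENDING = 2200    # additional scanned maps
MANUAL_HOURS = 120     # lower bound quoted for manual digitisation
GPU_HOURS = 6          # lower bound quoted for detection on one A40 GPU

def extrapolate(hours, done=MAPS_DONE, pending=MAPS_PENDING):
    """Scale effort linearly with the number of maps (an assumption)."""
    return hours / done * pending

manual_estimate = extrapolate(MANUAL_HOURS)
gpu_estimate = extrapolate(GPU_HOURS)
print(f"manual: {manual_estimate:.0f} h, GPU: {gpu_estimate:.1f} h")
```

Under these assumptions the manual route would need on the order of 400 work hours for the remaining maps, against roughly 20 computing hours for automated detection.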
Figure 14. Distribution of detected mound features in the Indus River Basin: (a) hachure and (b) form-line
mound features. Figure created by the first author using QGIS 3.28.4 [13] and a WMS-connected Google Earth
satellite imagery layer as a background.
Table 9. Evaluation of the Mask R-CNN model in low-density test dataset based on the RGB similarity of its maps
relative to the training and validation ones, for the detection of hachure mound features.
Similarity TPs FNs FPs Recall (%) Precision (%) F1 (%)
|0.5σ| 92 61 9 60.13 91.09 72.44
|1σ| 111 89 14 55.50 88.80 68.31
|2σ| 116 104 19 52.73 85.93 65.35
|3σ| 121 109 26 52.61 82.31 64.19
Table 10. Evaluation of the Mask R-CNN model in low-density test dataset based on the RGB similarity of its maps
relative to the training and validation ones, for the detection of form-line mound features. *Four of the detected
mound features were drawn in a different way from the one used for training, the continuous form-line. For this
reason, they have not been counted either as TPs or as FPs.
Similarity TPs FNs FPs Recall (%) Precision (%) F1 (%)
|0.5σ| 15 1 1 93.75 93.75 93.75
|1σ| 25 6 5* 80.65 83.33 81.97
|2σ| 97 40 27 70.80 78.23 74.33
|3σ| 97 40 41 70.80 70.29 70.55
Conclusions
A workflow has been designed with different techniques and strategies that has not only allowed the detection
of nearly 6000 mound features in India and Pakistan, which will allow for a better understanding of the
settlement distributions related to the Indus Civilization and later cultural periods, but has also provided
solutions to common problems in archaeology, such as the low density of archaeological features in large-scale
surveys and the scarcity of training data for ML models.
Historical maps constitute one of the basic sources available to both historians and archaeologists. The study
area analysed in this paper presents an excellent case. Much of the information provided by the maps cannot be
obtained using other survey methods as the area has been systematically modified during the last century. This
is also the case for many other areas where systematic landscape modifications have been implemented and for
which historical map series exist [26]. These are housed in many archives and some series cover very large national
and colonial territories using very similar symbols and conventions. This study opens the door for the large-scale
automated extraction of relevant information from historical maps and, in doing so, provides a workflow and
open code that has the potential to contribute immensely to the historical sciences.
As with other large-scale site detection methods [4], these DL algorithms will allow researchers to carry out
studies that could not be done before given the new amount of data obtained, facilitating the task of the
archaeologist. Furthermore, this model could be applied in other regions that have historical maps such as Syria and
Lebanon [9], but particularly those areas that were also mapped by or followed the model established by the SoI. The
outputs of this study represent a powerful tool in the large-scale documentation and monitoring of archaeological
heritage, with much work ahead to validate the results through remote sensing, archival work, and ground survey
in collaboration with partners in India and Pakistan.
Figure 15. Different types of hachure mound features detected after applying the trained model. The last image
represents the third type of mound features on the maps, the shaded relief mound features, erroneously detected
as hachure but similar to them due to their characteristic pointed and circular shapes.
Data availability
The historical map datasets generated and/or analysed during the current study are scheduled to be made publicly
available via the British Library and Cambridge University Library digital data repositories. Until that occurs, they
are available from the corresponding author on reasonable request. The historical map mound feature dataset
generated and/or analysed during the current study is scheduled to be made publicly available via the Arches
instance hosted by the Mapping Archaeological Heritage in South Asia (MAHSA) project. Until that occurs,
it is available from the corresponding author on reasonable request. The supplementary code for the Data
Augmentation process can be found online at https://github.com/iberganzo/ArchaeolDA.
Received: 18 February 2023; Accepted: 4 July 2023
References
1. Petrie, C. A. et al. Mapping archaeology while mapping an empire: Using historical maps to reconstruct ancient settlement land-
scapes in modern India and Pakistan. Geosciences 9, 11 (2019).
2. Garcia-Molsosa, A., Orengo, H. A., Conesa, F. C., Green, A. S. & Petrie, C. A. Remote sensing and historical morphodynamics of
alluvial plains. The 1909 Indus flood and the city of Dera Ghazi Khan (Province of Punjab, Pakistan). Geosciences 9, 21 (2019).
3. Green, A. S. et al. Re-discovering ancient landscapes: Archaeological survey of mound features from historical maps in northwest
India and implications for investigating the large-scale distribution of cultural heritage sites in south asia. Remote Sens. 11, 2089
(2019).
4. Berganzo-Besga, I. et al. Hybrid MSRM-based deep learning and multitemporal sentinel 2-based machine learning algorithm
detects near 10k archaeological tumuli in north-western Iberia. Remote Sens. 13, 4181 (2021).
5. Berganzo-Besga, I., Orengo, H. A., Canela, J. & Belarte, M. C. Potential of multitemporal lidar for the detection of subtle archaeo-
logical features under perennial dense Forest. Land 11, 1964 (2022).
6. Landsat Science. Landsat 1. https://landsat.gsfc.nasa.gov/satellites/landsat-1/ (2022).
7. Petrie, C.A., Abdul-Jabbar, J., Abhayan, G.S., Alam, A., Berganzo Besga, I., Campbell, R., Conesa, F., Green, A.S., Green, L.M.,
Garcia-Molsosa, A., Gerrits, P., Gregorio de Souza, J., Hameed, M., Khan, A.S., Madella, M., Orengo, H.A., Prabhakar, V.N., Rajesh,
S.V., Redhouse, D.I., Roberts, R., Samad, A., Singh, R.N., Singh, V.K., Suarez Moreno, M., Tomaney, J., & Vafadari, A. Hidden
in plain sight: The unrecognised contribution of the Survey of India in the documentation of Indus civilisation settlements. Century
Celebration on Mohenjodaro (2022).
8. Davis, D. S., Gaspari, G., Lipo, C. P. & Sanger, M. C. Deep learning reveals extent of archaic Native American shell-ring building
practices. J. Archaeol. Sci. 132, 105433 (2021).
9. Orengo, H. A. et al. New developments in drone-based automated surface survey: Towards a functional and effective survey system.
Archaeol. Prospect. 28, 1–8 (2021).
10. Garcia-Molsosa, A. et al. Potential of deep learning segmentation for the extraction of archaeological features from historical map
series. Archaeol. Prospect. 28, 187–199 (2021).
11. Soroush, M., Mehrtash, A., Khazraee, E. & Ur, J. A. Deep learning in archaeological remote sensing: Automated Qanat detection
in Kurdistan region of Iraq. Remote Sens. 12, 500 (2020).
12. Verschoof van der Vaart, W., Bonhage, A., Schneider, A., Ouimet, W. & Raab, T. Automated large-scale mapping and analysis of
relict charcoal hearths in connecticut (USA) using a Deep Learning YOLOv4 framework. Archaeol. Prospect. 2022, 1–16 (2022).
13. QGIS Development Team. QGIS geographic information system. QGIS Association. http://www.qgis.org (2023).
14. Landauer, J., Hoppenstedt, B., Allgaier, J. Image segmentation to locate ancient maya architectures using deep learning. In Discover
the Mysteries of the Maya: Selected Contributions from the Machine Learning Challenge & The Discovery Challenge Workshop at
ECML PKDD 2021, (eds. Kocev, D., Simidjievski, N., Kostovska, A., Dimitrovski, I., Kokalj, Ž.) 7–12 (arXiv: Ithaca, NY, USA,
2022) arXiv:2208.03163.
Figure 16. Map samples found in the test data with different characteristics than those used in training and
validation: (a) darker image background and (b) blurred image.
15. Waleed, A. Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow. GitHub repository.
https://github.com/matterport/Mask_RCNN (2017).
16. He, K., Gkioxari, G., Dollár, P., Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision,
2961–2969 (2017).
17. Girshick, R. Fast r-cnn. In 2015 Proceedings of the IEEE International Conference on Computer Vision, 1440–1448 (2015).
18. Ren, S., He, K., Girshick, R. & Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE
Trans. Pattern Anal. Mach. Intell. 39, 1137–1149 (2016).
19. Long, J., Shelhamer, E., Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition, 3431–3440 (2015).
20. Dutta, A. & Zisserman, A. The VIA Annotation Software for Images, Audio and Video. In Proceedings of the 27th ACM International
Conference on Multimedia (MM ’19), Nice, France. ACM, New York, NY, USA, 4 (2019).
21. Soviany, P., Ionescu, R.T., Rota, P., Sebe, N. Curriculum learning: A survey. arXiv, arXiv:2101.10382 (2022).
22. Oksuz, K., Cam, B.C., Kalkan, S., Akbas, E. Imbalance problems in object detection: A review. arXiv, arXiv:1909.00169 (2022).
23. Luque, A., Carrasco, A., Martín, A. & de las Heras, A. The impact of class imbalance in classification performance metrics based
on the binary confusion matrix. Pattern Recognit. 91, 216–231 (2019).
24. Batista, G. E. A. P. A., Prati, R. C. & Monard, M. C. A study of the behavior of several methods for balancing machine learning
training data. ACM SIGKDD Explor. Newslett. 6(1), 20–29 (2004).
25. Berganzo-Besga, I., Orengo, H. A., Lumbreras, F., Aliende, P. & Ramsey, M. N. Automated detection and classification of multi-cell
Phytoliths using deep learning-based algorithms. J. Archaeol. Sci. 148, 105654 (2022).
26. Orengo, H. A., Krahtopoulou, A., Garcia-Molsosa, A., Palaiochoritis, K. & Stamati, A. Photogrammetric re-discovery of the hidden
long-term landscapes of western Thessaly, central Greece. J. Archaeol. Sci. 2015(64), 100–109 (2015).
Acknowledgements
The Mapping Archaeological Heritage in South Asia (MAHSA) project is funded by Arcadia, a charitable fund of
Lisbet Rausing and Peter Baldwin. This research was also partially supported by Grant PID2021-128945NB-I00,
awarded by MCIN/AEI/10.13039/501100011033, and by “ERDF A way of making Europe”. The authors acknowledge
the support of the Generalitat de Catalunya CERCA Program to CVC and ICAC. Finally, the authors would
like to thank Junaid Abdul Jabbar, Mou Sarmah, Ushni Dasgupta, Azadeh Vafadari, Kuili Suganya Chittiraibalan,
Arnau Garcia-Molsosa and Adam Green.
Author contributions
I.B.B. developed methods, executed research, wrote the initial draft of the paper, implemented corrections,
produced the figures; H.A.O. and F.L. planned research, developed methods, corrected initial draft; A.A., P.J.G.,
J.G.S., A.K., R.C., M.S.M. and J.T. georeferenced maps; R.R. coordinated research; C.P. coordinated research,
planned research and acquired funding.
Competing interests
The authors declare no competing interests.
Additional information
Correspondence and requests for materials should be addressed to H.A.O.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International
License, which permits use, sharing, adaptation, distribution and reproduction in any medium or
format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the
Creative Commons licence, and indicate if changes were made. The images or other third party material in this
article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the
material. If material is not included in the article’s Creative Commons licence and your intended use is not
permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
© The Author(s) 2023