Testing inter-observer error under a collaborative research framework for studying lithic shape variability

Springer Nature
Archaeological and Anthropological Sciences
Authors:
  • Max Planck Institute of Geoanthropology
  • the National Museum of Ethiopia

Abstract

Evaluating error that arises through the aggregation of data recorded by multiple observers is a key consideration in many metric and geometric morphometric analyses of stone tool shape. One of the most common approaches involves the convergence of observers for repeat trials on the same set of artefacts; however, this is logistically and financially challenging when collaborating internationally and/or at a large scale. We present and evaluate a unique alternative for testing inter-observer error, involving the development of 3D printed copies of a lithic reference collection for distribution among observers. With the aim of reducing error, clear protocols were developed for photographing and measuring the replicas, and inter-observer variability was assessed on the replicas in comparison with a corresponding data set recorded by a single observer. Our results demonstrate that, when the photography procedure is standardized and dimensions are clearly defined, the resulting metric and geometric morphometric data are minimally affected by inter-observer error, supporting this method as an effective solution for assessing error under collaborative research frameworks. Collaboration is becoming increasingly important within archaeological and anthropological sciences in order to increase the accessibility of samples, encourage dual-project development between foreign and local researchers and reduce the carbon footprint of collection-based research. This study offers a promising validation of a collaborative research design whereby researchers remotely work together to produce comparable data capturing lithic shape variability. Supplementary information: The online version contains supplementary material available at 10.1007/s12520-022-01676-2.
https://doi.org/10.1007/s12520-022-01676-2
RESEARCH
LucyTimbrell1· ChristopherScott1· BehailuHabte2· YosefTefera2· HélèneMonod3· MounaQazzih4·
BenjaminMarais5· WendyBlack5· ChristineMaroma6· EmmanuelNdiema6,7· StruanHenderson8·
KatherineElmes8· KimberlyPlomp9,10· MattGrove1
Received: 13 June 2022 / Accepted: 20 September 2022
© Crown 2022
Keywords: Stone tools · Metric measurements · Geometric morphometrics · 3D printing · Inter-observer reliability
* Lucy Timbrell
lucy.timbrell@liverpool.ac.uk
1 Department ofArchaeology, Classics andEgyptology,
University ofLiverpool, Liverpool, UK
2 Authority forResearch andConservation ofCultural
Heritage, National Museum ofEthiopia, AddisAbaba,
Ethiopia
3 Département Homme Et Environnement, Musée de
L’Homme, Paris, France
4 Institut National Des Sciences de LArchéologie Et du
Patrimoine, Rabat, Morocco
5 Archaeology Unit, Iziko Museums ofSouth Africa,
CapeTown, SouthAfrica
6 Department ofArchaeology, National Museums ofKenya,
Nairobi, Kenya
7 Department ofArchaeology, Max Planck Institute
fortheScience ofHuman History, Jena, Germany
8 Mossel Bay Archaeological Project, Western Cape Province,
CapeTown, SouthAfrica
9 Archaeological Studies Program, University
ofthePhilippines, QuezonCity, Philippines
10 Department ofArchaeology, Simon Fraser University,
Burnaby, BritishColombia, Canada
Published online: 1 October 2022
Archaeological and Anthropological Sciences (2022) 14:209
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Introduction
Shape analysis is an increasingly popular methodology for examining lithic variability in the archaeological record. As such, traditional linear metrics and geometric morphometrics (GMM) are often employed to capture morphological information on stone tools (Cardillo 2010; Lycett and von Cramon-Taubadel 2015; Matzig et al. 2021). Combining morphological data from multiple observers is frequently necessary in studies of lithic assemblages, to increase sample size and/or to perform inter-site/inter-assemblage analyses, yet this can be problematic due to the possibility of introducing inter-observer error into the data (Lyman and VanPool 2009). Such error has multiple potential sources, can be introduced at various stages in the workflow, and can skew results by obscuring any “real” signals in the data (Fruciano 2016); examining the magnitude of inter-observer error is therefore imperative to validate whether meta-analyses are robust. International researchers are increasingly being encouraged to work collaboratively in order to remotely produce archaeological and anthropological datasets (Chang and Alfaro 2015; O’Leary and Kaufman 2011; Scerri et al. 2020; Timbrell 2020, 2022), in some cases even crowdsourcing morphometric data (Chang and Alfaro 2015). However, it is frequently impossible for observers to converge on the same material to record repeat trials for an inter-observer repeatability assessment. Such control tests therefore need to be appropriate for the specific research design, and customized solutions for evaluating error under collaborative research frameworks should be developed (Fruciano 2016). Here, we present an innovative analysis of inter-observer error involving the compilation of standardized photographs and measurements of lithics from multiple observers for metric and GMM analysis (Timbrell 2022).
Traditionally, lithic shape variation has been examined through qualitative descriptions (Inizan et al. 1999), typological classification (Bordes 1961) and/or linear measurements (Roe 1964; McNabb 2017). Advancements in biological morphometrics and computing have meant that geometric morphometrics are now also routinely applied in the analysis of lithic morphologies (Bookstein 1991; Buchanan et al. 2018; Cardillo 2010; Lycett 2009; Serwatka and Riede 2016). GMM approaches are split into methods that use landmarks and methods that use outlines, the former representing shape through homologous points (landmarks) superimposed on a two-dimensional (2D) or three-dimensional (3D) object and the latter applying geometric descriptions of homologous outlines or surfaces (Mitteroecker 2021). Landmark-based methods allow specific aspects of morphology to be captured without the inclusion of random noise (i.e. shape dimensions that are not pertinent to the research question); however, their application to certain non-biological structures, such as lithics and other archaeological artefacts, is often more difficult as the identification of homologous landmarks can be subjective (Okumura and Araujo 2018). Outline-based GMM, on the other hand, avoids certain issues of homology by quantifying the gross shape of each specimen (Klingenberg 2008), making it ideal for describing shape variation of lithics in archaeological studies (e.g. Iovita 2009, 2011; Ivaonovaité et al. 2020; Matzig et al. 2021; Mesfin et al. 2020; Wang and Marwick 2020).
Assessment of the levels of inter- and intra-observer error under different methodological approaches to studying lithic shape is vital, and several studies have examined error in metric and GMM analyses at different phases of the workflow (Evin et al. 2020; Fagerton et al. 2014; Lyman and VanPool 2009; Macdonald et al. 2020; Menedez 2017; Osis et al. 2015; Perini et al. 2005; Robinson and Terhune 2017; von Cramon-Taubadel et al. 2007; Yezerinac et al. 1992). Problematic landmarks, i.e. those that are difficult to consistently locate, can be a source of error in landmark-based GMM analysis (Fagerton et al. 2014; Menedez 2017; Robinson and Terhune 2017; von Cramon-Taubadel et al. 2007), even for experienced observers (Chang and Alfaro 2015). von Cramon-Taubadel et al. (2007) found that repeating the digitization procedure was the most suitable method for assessing the precision of landmarks, with adequate landmark definitions imperative for reducing error. Yezerinac et al. (1992) also found that ill-defined measurements were a factor increasing error in metric data, in addition to operator experience, the precision of the measuring device and the conditions under which the measurements are made, such as lighting. Combining metric measurements from more than one observer, therefore, is likely to be suitable only when the dimensions are standardized and easily measured, and the conditions, the precision and quality of the equipment and the technique of recording the data are comparable (Lyman and VanPool 2009).
Comparatively fewer studies have examined the levels of inter-observer error in outline-based GMM methods. Evin et al. (2020), in an investigation of error between morphometric approaches, found that although methods that employ landmarks were the most sensitive to error, outline data saw relatively lower levels of intra-observer error compared to inter-observer error, with photography being an influential source of variance between observers. Digital photography is widely used in 2D GMM as it is inexpensive, easy to perform and does not require highly specialist knowledge or equipment, with the digitization of landmarks and/or outlines on the resulting images providing a 2D representation of the 3D object. The focal length and specifications of the lens used can, however, cause parallax error, the optical distortion that occurs when the specimen is too close or not
directly centered beneath the lens (fisheye). Nonetheless, several studies employing both landmark and outline methods suggest that 2D GMM data are minimally affected by parallax error, especially when the camera set-up is standardized and calibrated, with deviations small and constant enough for accurate analyses (Caple et al. 2018; MacDonald et al. 2020; Mullin and Taylor 2002; Riano et al. 2009). Overall, outline-based methods are likely more suitable for collaborative research designs in studies of lithic shape for three reasons: the objectivity of data capture; the fact that landmark methods have high rates of inter-observer error, though this is more pertinent during landmark digitization than object photography (Evin et al. 2020); and the potential to reduce inter-observer error through the standardization of the photography procedure.
Although inter-observer error is a concern in any collaborative research design, collating data from multiple observers is often necessary in archaeological research, be it to increase sample sizes, facilitate interdisciplinary research and/or enable access to disparate data (Timbrell 2020). The latter is especially important when considering issues of income disparity, childcare and disability that can disproportionately disadvantage researchers who are unable to travel extensively to collect data. Global catastrophes, such as pandemics, climate change and conflict, can also temporarily delay international research through the constraints imposed on travel and safety, requiring researchers to develop scientifically sound remote models of data generation (Scerri et al. 2020). Timbrell (2022) presents such a framework, which involves the documentation of lithic shape by multiple collaborators. These types of approaches have additional benefits: they decrease the carbon footprint associated with accessing multiple international samples and foster knowledge-sharing through dual project development and the division of responsibilities so that both foreign and local researchers take on principal roles within a given project, which is particularly crucial across the Global North–South divide (Chirikure 2015; Douglass et al. 2020; Else 2022). Indeed, collaborative approaches accord with the open science initiative in archaeology, which advocates that data stewardship should be centered around researchers collecting and sharing data on behalf of the scientific community, as opposed to for the betterment of a single individual’s career (Marwick et al. 2017).
While collaborative data collection offers a promising new framework for generating and sharing data internationally, the analysis of inter-observer error is imperative to validate such an approach. Here, we present a unique control test that involves the production of 3D printed replicas of a lithic reference collection, which can be distributed among observers and measured following the same protocols used to collect the actual data. We then examine the differences between the datasets, knowing that each collaborator has recorded the same data from identical copies of the artefacts. Using this approach, we evaluate whether the compilation of data from multiple observers is conducive to error, and thus could negatively bias the results of a collaborative study.
Materials
Six lithic points were knapped using fine-grained flint from Caistor Quarry, Caistor St Edmund, UK and scanned for 3D printing at the University of Liverpool (Fig. 1). The reference tools varied in both size and shape, encapsulating a range of morphologies characteristic of the empirical sample to be studied in the main project (African Middle Stone Age assemblages). While flint is not a feature of African lithic assemblages, it could be considered representative of the finer-grained materials, such as obsidian, chert and heat-treated silcrete, exploited during the Middle Stone Age (Key et al. 2021; Sahle et al. 2013). The tools were produced on flakes and retouched using: (1) direct freehand hard hammer percussion (quartzite hammerstones), (2) direct freehand soft hammer percussion using an antler hammer and (3) handheld pressure flaking using an antler tine supported in a tanned leather pad. Each tool was colored blue using craft enamel spray paint to aid scanning.
Next, each lithic was scanned with a freshly calibrated
Einscan Pro 2X structured light scanner with a colour cam-
era, using the combined feature and texture mapping in the
Fig. 1 The six 3D printed replica tools. Original lithics were knapped and scanned by CS in preparation for 3D printing. Example photos were taken by SH. Scale = 3 cm
high-resolution setting. Initial scans were performed with the lithics placed vertically in a foam holder using fixed scan mode aligned with an automated turntable and coded targets (scans taken every 11.25 degrees, i.e. 32 scans). The models were then completed by switching to the “align by” feature using the turntable (32 scans), and the lithic was rescanned (2–3 times) until a complete model was achieved. All alignment was automatic to produce a watertight mesh; no holes were filled. Each model was sharpened using the Einscan high setting and saved as an .obj file without decimation (see Supplementary Online Table 1 for further data on each model).
The 3D models were processed for printing using Chitubox v1.8.1. Medium-sized automated supports were applied using this software at 90% total coverage to provide a strong foundation for the 3D prints. We used an Elegoo Mars 2 resin printer, with a new printer film, using standard grey Elegoo LCD UV-curing 405 nm photopolymer resin with recommended Elegoo settings (Fig. 2). The prints were extracted from the print bed, and the supports were removed by hand prior to being rinsed in ethanol and cured in direct sunlight. Each tool was printed six times to create six copies of the assemblage, resulting in 36 prints in total.
Methods
Each tool was assigned a number (Tool 1–6; Fig. 1), and a replica copy of the assemblage was sent to researchers at six independent institutions (Table 1). Data collection protocols, outlined in detail by Timbrell (2022) and described in Supplementary Online Resource S1, were developed
Fig. 2 Photographs from the
3D printing process. A The
3D model of the tool is sent
to the machine for printing. B
The resulting 3D prints once
removed from the supports
are cleaned using ethanol. 3D
printing was carried out by LT
and CS
Table 1 Summary of the observers and the photography equipment used. This equipment was sourced locally; in most cases, the institutions already had access to the necessary apparatus; however, in some cases, it was rented and/or purchased and donated to the institution after the project, following guidelines provided by The Wenner Gren Foundation

| Assemblage number | Institution | Abbreviation | Country | Camera body | Camera lens |
|---|---|---|---|---|---|
| 1 | Institut National des Sciences de l’Archéologie et du Patrimoine | INSAP | Morocco | Nikon D7100 | Nikon AF-S Micro Nikkor 105 mm |
| 2 | Iziko Museums of South Africa | IM | South Africa | Canon 6D II | Canon 100 mm 2.8 Macro |
| 3 | Mossel Bay Archaeological Project | MBAP | South Africa | Nikon D300s | Nikon AF Micro Nikkor 60 mm 1:2.8D |
| 4 | National Museum of Ethiopia | NME | Ethiopia | Canon EOS DSLR 200D | Canon Tamron 60 mm Macro Di II |
| 5 | National Museums of Kenya | NMK | Kenya | Nikon D5300 | Nikon AF-S Micro Nikkor 40 mm |
| 6 | Musée de l’Homme | MH | France | Nikon D5200 | Nikon AF-S Nikkor 24–70 mm |
to standardize the documentation of lithic shapes through photography and measurements. These procedures were followed by all observers across the study to produce equivalent data. Instructions for object position, camera position and settings, and lighting were specified and tightly controlled (Supplementary Online Resource S1). In addition, a scale (sourced in situ by the observers) was placed in each photograph to ensure a measure of size was recorded. Table 1 reports the camera and lenses used to capture each replica assemblage in the study; high-quality equipment was accessed by all observers either through their institution directly or through funding provided by the project. Three basic measurements on each tool were also taken to record morphological length, width and thickness (see Supplementary Online Fig. 1 for a schematic) at a resolution of 0.1 mm. We defined length as the maximum dimension of the point, width as the maximum measurement in the dimension perpendicular to length, and thickness as the maximum measurement in the third dimension, following Shea (2020).
Prior to distribution among institutions, all 36 replicas were also recorded by a single observer (LT) to produce a comparative dataset. Photography was performed using a Canon M50 camera with an EF-S 60 mm f/2.8 Macro USM lens, and the three measurements were taken using digital calipers. This enabled us to determine the magnitude of intra-observer measurement error, for comparison with the magnitude of inter-observer error, had the project been carried out by a single individual under a traditional research framework.
Data were uploaded onto a communal data sharing platform (Google Drive) by each observer for processing and analysis by a single observer (LT). Analyses were performed in the R software environment (R Core Team 2020). Data and code can be found on the GitHub repository for the project: https://github.com/lucytimbrell/error_analysis_lithics/.
Metric analyses
We first computed the intra-class correlation coefficient (ICC) using the “psych” R package (Revelle 2022) to assess the agreement between the six observers in measuring the six tools for length, width and thickness. The ICC compares the variability within repeat measurements whilst contrasting variability between groups of measurements (Barlett and Frost 2008; Fruciano 2016; Koo and Li 2016; Shrout and Fleiss 1979). Specifically, we used a two-way mixed effects model to compute the ICC, with the set of observers considered a fixed effect. To assess the reliability of data collection, we next calculated and compared the mean, variance, technical error of measurement (TEM) and percentage technical error of measurement (%TEM). The mean and variance (expressed as the standard deviation) were calculated for each measurement on each tool, with the TEM and %TEM calculated to compare pairs of observers across all measurements on all tools. The TEM reflects measurement precision between observers, and is calculated as:

$$\mathrm{TEM} = \sqrt{\frac{\sum_{1}^{N}\left(\sum_{1}^{K} M^{2} - \frac{\left(\sum_{1}^{K} M\right)^{2}}{K}\right)}{N(K-1)}}$$

where $N$ is the number of subjects, $K$ is the number of observers, and $M$ is the measurement (modified from Ulijaszek and Kerr [1999]). The %TEM represents the magnitude of the error as a percentage of the mean of the measurement/variable studied. It is calculated as:

$$\%\mathrm{TEM} = 100\left(\frac{\mathrm{TEM}}{\bar{v}}\right)$$

where $\bar{v}$ is the average value of the raw measurements, taken across all measurements on all tools by multiple observers. The values obtained for these metrics must be subjectively assessed according to the research question, as there is no standard applied threshold of error deemed to be “acceptable”. Following Lyman and VanPool (2009)’s analyses of projectile points, we propose that a %TEM of < 4 could be an acceptable level of error without negative consequences on the results. Lastly, we calculated the coefficient of reliability (R), which ranges from 0 to 1, with 1 indicating very high congruence between measures. We used the following formula outlined in Lyman and VanPool (2009):

$$R = \frac{\sigma^{2}_{v}}{\sigma^{2}_{v} + \sigma^{2}_{d}}$$

where $\sigma^{2}_{v}$ is the variance of all raw measurements on all tools taken by two observers and $\sigma^{2}_{d}$ is the variance of the difference between those two sets of measurements. Similarly to the ICC, the coefficient of reliability distinguishes between the variability between the specimens and that which results from random measurement error. However, whilst R can only be calculated between pairs of observers, the ICC represents an overall metric for measurement error across all observers.

Random error can inflate the amount of variance within a sample, resulting in a loss of statistical power as noise obscures true differences in means (Fruciano 2006; Yezerinac et al. 1992). To evaluate the levels of error in the multiple observer data in relation to the single observer data, we used two-sample t-tests to compare differences in mean and F-tests to compare differences in variance. If there is high inter- and/or intra-observer error, variation within replicas of the same tool will be increased and differences in the mean values for each tool will be significantly different.
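The TEM, %TEM and R statistics defined in this section can be illustrated with a minimal sketch. It is written in plain Python rather than the R used for the published analysis, so it should not be read as the authors' code (the project's GitHub repository is the reference for that). The function names and the example measurements are hypothetical, and the sketch assumes the sample (n - 1) variance when computing R.

```python
import math
from statistics import mean, variance


def tem(measurements):
    """Technical error of measurement for N subjects, each measured by K observers.

    `measurements` is a list of N rows; each row holds the K observer values
    recorded for one subject (here, one measurement on one replica tool).
    """
    n = len(measurements)
    k = len(measurements[0])
    total = 0.0
    for row in measurements:
        # Sum of squares minus (squared sum / K), as in the TEM formula
        total += sum(m * m for m in row) - (sum(row) ** 2) / k
    # Guard against a tiny negative total caused by floating-point rounding
    return math.sqrt(max(total, 0.0) / (n * (k - 1)))


def percent_tem(measurements):
    """TEM expressed as a percentage of the grand mean of all raw measurements."""
    grand_mean = mean(m for row in measurements for m in row)
    return 100.0 * tem(measurements) / grand_mean


def reliability(obs_a, obs_b):
    """Coefficient of reliability R for one pair of observers."""
    # Variance of all raw measurements taken by the two observers (assumed n - 1)
    var_v = variance(list(obs_a) + list(obs_b))
    # Variance of the paired differences between the two observers
    var_d = variance([a - b for a, b in zip(obs_a, obs_b)])
    return var_v / (var_v + var_d)


# Hypothetical lengths (mm): three tools, each measured by three observers
lengths = [[86.2, 86.0, 86.5], [67.6, 67.4, 67.7], [66.0, 66.3, 65.8]]
print("TEM:", round(tem(lengths), 3), "%TEM:", round(percent_tem(lengths), 3))
print("R:", round(reliability([86.2, 67.6, 66.0], [86.0, 67.4, 66.3]), 4))
```

Because R compares only two sets of measurements, it would be called once per observer pair, whereas `tem` and `percent_tem` accept any number of observers per row.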
Two‑dimensional geometric morphometric analysis
In preparation for GMM analysis, each image was processed using the “object select” tool in Adobe Photoshop, which automatically determines the contour of the object. Once the contour was highlighted, the object was filled with solid black to facilitate the extraction of outline data. All processed images were then compiled into a single thin-plate spline (.tps) file using tpsUtil, and the outline data were extracted using tpsDig2. The outline of each artefact was represented by an average of 2856 equidistant points, which were scaled through the specification of the pixel-to-centimeter ratio for each image (see Supplementary Online Fig. 2 for a visualization of the data). The outline data were saved as (x, y) coordinates within the .tps file and imported into R.
Using the “Momocs” R package (Bonhomme et al. 2014), the outlines were standardized following Bonhomme et al. (2017) by normalizing to a common centroid, scaling to centroid size and aligning along the long axis of the object. We then performed elliptic Fourier analysis (EFA) to convert the geometric data to frequency data, with the outline decomposed into a series of repeating trigonometric functions, referred to as harmonics (Caple et al. 2017; Fig. 3). The appropriate number of harmonics was identified to capture sufficient information on shape; this was deemed to be 8 harmonics, achieving 99% harmonic power (Caple et al. 2017).
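The decomposition of an outline into harmonics can be sketched as follows. This is a plain Python illustration of the commonly used elliptic Fourier formulation for a closed polygon (Kuhl and Giardina), not the Momocs implementation used in the study; the function names and the test outline are hypothetical, the normalization steps described above are omitted, and cumulative power is computed relative to the harmonics retained, which is an assumption that may differ from how Momocs calibrates harmonic power.

```python
import math


def elliptic_fourier(points, n_harmonics):
    """Return [(a_n, b_n, c_n, d_n), ...] for a closed outline of (x, y) pairs."""
    k = len(points)
    # Segment vectors, closing the polygon from the last point back to the first
    dx = [points[(i + 1) % k][0] - points[i][0] for i in range(k)]
    dy = [points[(i + 1) % k][1] - points[i][1] for i in range(k)]
    dt = [math.hypot(u, v) for u, v in zip(dx, dy)]
    # Cumulative arc length at each vertex; t[-1] is the perimeter
    t = [0.0]
    for step in dt:
        t.append(t[-1] + step)
    perimeter = t[-1]

    coeffs = []
    for n in range(1, n_harmonics + 1):
        scale = perimeter / (2.0 * n * n * math.pi ** 2)
        w = 2.0 * n * math.pi / perimeter
        a = b = c = d = 0.0
        for i in range(k):
            if dt[i] == 0.0:
                continue  # skip duplicated points
            dcos = math.cos(w * t[i + 1]) - math.cos(w * t[i])
            dsin = math.sin(w * t[i + 1]) - math.sin(w * t[i])
            a += dx[i] / dt[i] * dcos
            b += dx[i] / dt[i] * dsin
            c += dy[i] / dt[i] * dcos
            d += dy[i] / dt[i] * dsin
        coeffs.append((scale * a, scale * b, scale * c, scale * d))
    return coeffs


def cumulative_power(coeffs):
    """Cumulative share of harmonic power, relative to the harmonics supplied."""
    power = [(a * a + b * b + c * c + d * d) / 2.0 for a, b, c, d in coeffs]
    total = sum(power)
    running, shares = 0.0, []
    for p in power:
        running += p
        shares.append(running / total)
    return shares


# Hypothetical outline: an ellipse with semi-axes 3 and 1, sampled at 200 points
ellipse = [(3.0 * math.cos(2 * math.pi * i / 200),
            math.sin(2 * math.pi * i / 200)) for i in range(200)]
print([round(s, 4) for s in cumulative_power(elliptic_fourier(ellipse, 8))])
```

In a workflow of this kind, the number of harmonics would be chosen as the smallest count whose cumulative share passes the desired threshold, such as the 99% reported here.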
Next, we performed a principal components analysis (PCA) on the elliptic Fourier coefficients to reduce the dimensionality of the data. Principal components (PCs) are constructed to highlight the main axes of morphological variance (Zelditch et al. 2004). As with the metric data, we calculated the ICC and R values to partition the variance from the inter-observer error for the PC scores of repeat captures (Daboul et al. 2018; Fruciano 2006). Due to the nature of PC scores, we were unable to obtain an informative relative measure of dispersion (%TEM) and instead refer to the standard deviation (calculated as the square root of the variance) as an absolute measure of dispersion. This is because, when the mean of a set of repeat captures falls close to the mean of a PC (~ 0) and has a low standard deviation (~ 0), the %TEM would be very high despite the tight clustering of the repeated measures along that PC. In addition, we applied linear discriminant analysis (LDA) to the PC scores, with the equal sample sizes used as the prior group probabilities (1/6) of a repeat belonging to a certain group based on their outline shape alone (Mitteroecker and Bookstein 2011). In this analysis, we tested firstly whether the tools could be distinguished based on their shape alone, and then whether the observers could be identified. One would expect high classification results when discriminating between tools and low classification results when discriminating between observers if inter-observer error is low.
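The reason a relative measure of dispersion is uninformative for PC scores can be shown with a small numerical sketch in Python. The scores below are hypothetical, and the standard deviation is used as a simple stand-in for the TEM: two sets of repeat captures with identical spread give very different percentages depending only on how close their mean lies to zero.

```python
from statistics import mean, stdev


def relative_dispersion(scores):
    """Standard deviation expressed as a percentage of the absolute mean."""
    return 100.0 * stdev(scores) / abs(mean(scores))


# Hypothetical PC scores for six repeat captures of one tool, centred near 0
near_zero = [0.004, -0.003, 0.002, -0.001, 0.003, -0.002]
# The same scores shifted along the PC: identical spread, mean far from 0
far_from_zero = [s + 0.2 for s in near_zero]

print("absolute spread:", stdev(near_zero), stdev(far_from_zero))
print("relative spread (%):",
      relative_dispersion(near_zero), relative_dispersion(far_from_zero))
```

The absolute spread is the same in both cases, which is why the standard deviation is the more meaningful summary for scores on a mean-centred axis.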
Results
Linear metric analysis
We first explored whether the measurements were recorded consistently on the replicas between observers. Figure 4 shows the distribution of the multiple observer data through boxplots; most of the measurements have very limited variance around the mean, and all tools were significantly different from each other across all measurements when tested using Tukey’s honestly significant difference (HSD; p < 0.001).
Fig. 3 A schematic of the Ellip-
tic Fourier fitting process that
generates the raw shape data
for geometric morphometrics.
Coefficients of sine and cosine
terms (harmonics) are computed
to reconstruct the x (blue) and
y (red) coordinates from an
arbitrary starting point moving
along the outline
Thickness is the most variable dimension recorded, probably because it is more difficult to orient the tool for this measurement than it is for length or width. Calculation of the coefficient of reliability between each pair of observers found that all values of R were > 0.999, suggesting that over 99% of the variance in each measurement is due to variability between the specimens as opposed to error. We calculated the TEM as 0.368 and the %TEM as 0.908, indicating that less than 1% of the variance in the dataset is related to measurement error. Finally, the ICC score confirmed that there is a very high absolute agreement between the observers (ICC = 1, p < 0.001).
We then compared the measurements taken by multiple observers with those taken by a single observer as a means of comparing intra- and inter-observer errors. We first calculated the coefficient of reliability for the single observer for each pair of replica assemblages; all values were > 0.999, indicating very high congruence between repeat captures by the single observer. Table 2 reports the mean and standard deviation of length, width and thickness for the single observer compared to multiple observers; two-sample t-tests found that there were almost no statistically significant differences in means between the data sets (1/36 = p < 0.05; Table 3). However, F-tests found that half of the measurements show statistically significant differences in variance, particularly along length and width (Table 3). This demonstrates that the single observer is generally less prone to error, which is likely due to a combination of the familiarity of this observer with both the metric definitions and the assemblage and the fact that the same equipment was used to measure all of the replicas. Nonetheless, the fact that these differences in variance only resulted in a single instance of significant difference in mean, and that the standard deviation does not exceed 0.7 mm, suggests that the effects of inter-observer error on the results are minimal.
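The contrast in variance between the two data sets can be made concrete with the variance ratio that underlies an F-test. The following is a minimal Python sketch (the published analysis used R) that takes standard deviations as reported in Table 2. The function name is hypothetical, placing the larger variance in the numerator is an assumed convention, and the sketch stops at the F ratio without computing a p-value; because the tabulated values are rounded, it is not expected to reproduce Table 3 exactly.

```python
def variance_ratio(sd_a, sd_b):
    """F ratio of two variances, with the larger variance in the numerator."""
    var_a, var_b = sd_a ** 2, sd_b ** 2
    return max(var_a, var_b) / min(var_a, var_b)


# Length standard deviations (mm) from Table 2: multiple vs single observer
f_tool6_length = variance_ratio(0.405, 0.063)  # Tool 6
f_tool4_length = variance_ratio(0.279, 0.299)  # Tool 4

print("Tool 6 length F ratio:", round(f_tool6_length, 2))
print("Tool 4 length F ratio:", round(f_tool4_length, 2))
```

A ratio close to 1 indicates similar spread in the two data sets, while a large ratio indicates that one data set is markedly more variable than the other.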
Fig. 4 Boxplots demonstrating
the distribution of length, width
and thickness (mm) collected
by multiple observers for each
tool (1–6)
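Table 2 Summary statistics reporting the mean (m) and standard deviation (sd) obtained for length, width and thickness, recorded by multiple observers versus a single observer for each tool (1–6). Standard deviation values have been rounded to 3 decimal places

| Tool | Length (mm): multiple m | Length (mm): multiple sd | Length (mm): single m | Length (mm): single sd | Width (mm): multiple m | Width (mm): multiple sd | Width (mm): single m | Width (mm): single sd | Thickness (mm): multiple m | Thickness (mm): multiple sd | Thickness (mm): single m | Thickness (mm): single sd |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 86.2 | 0.471 | 86.3 | 0.175 | 40.9 | 0.308 | 40.9 | 0.103 | 13.4 | 0.281 | 13.3 | 0.248 |
| 2 | 67.6 | 0.266 | 67.6 | 0.089 | 37.3 | 0.258 | 37.5 | 0.228 | 10.3 | 0.141 | 10.4 | 0.075 |
| 3 | 66.0 | 0.613 | 66.3 | 0.137 | 23.4 | 0.266 | 23.3 | 0.225 | 6.87 | 0.472 | 6.72 | 0.075 |
| 4 | 74.6 | 0.279 | 74.4 | 0.299 | 48.4 | 0.374 | 48.5 | 0.103 | 11.9 | 0.151 | 11.8 | 0.105 |
| 5 | 59.7 | 0.133 | 59.7 | 0.075 | 27.4 | 0.335 | 27.6 | 0.082 | 9.45 | 0.281 | 9.48 | 0.147 |
| 6 | 87.3 | 0.405 | 87.4 | 0.063 | 44.7 | 0.659 | 44.6 | 0.126 | 14.3 | 0.415 | 14.2 | 0.117 |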
Table 3 P-values from t-tests (difference in mean) and F-tests (difference in variance) comparing the metrics (length, width and thickness) for each tool (1–6) measured by multiple observers versus a single observer. Statistical significance (p < 0.05) is marked by an asterisk (*). All values have been rounded to 3 decimal places
      Length (mm)     Width (mm)      Thickness (mm)
Tool  T       F       T       F       T       F
1     0.815   0.049*  0.632   0.032*  0.673   0.792
2     0.678   0.032*  0.264   0.792   0.240   0.193
3     0.342   0.005*  0.498   0.724   0.475   0.001*
4     0.342   0.879   0.689   0.013*  0.037*  0.446
5     0.608   0.238   0.152   0.008*  0.804   0.182
6     0.575   0.001*  0.646   0.002*  0.653   0.015*
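The variance comparison behind the F-tests can be illustrated from the summary statistics in Table 2. The sketch below is a minimal illustration and not the authors' R code: it computes the ratio of the multiple-observer variance to the single-observer variance for each length measurement. The names `length_sd` and `variance_ratio` are hypothetical. Converting a ratio into a p-value also requires the degrees of freedom of each group, which are not restated in this section, so the sketch stops at the ratio.

```python
# Standard deviations for length (mm) copied from Table 2:
# tool -> (multiple observers, single observer)
length_sd = {
    1: (0.471, 0.175),
    2: (0.266, 0.089),
    3: (0.613, 0.137),
    4: (0.279, 0.299),
    5: (0.133, 0.075),
    6: (0.405, 0.063),
}


def variance_ratio(sd_multiple, sd_single):
    """Ratio of two variances, the statistic an F-test compares to the F distribution."""
    return (sd_multiple ** 2) / (sd_single ** 2)


ratios = {tool: variance_ratio(*sds) for tool, sds in length_sd.items()}

for tool, ratio in ratios.items():
    direction = "higher" if ratio > 1 else "lower"
    print(f"Tool {tool}: variance ratio {ratio:.2f} ({direction} for multiple observers)")
```

A ratio well above 1 indicates more spread among the multiple observers, while a ratio close to 1 (as for Tool 4) indicates comparable spread under both strategies.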
Fig. 5 Principal component (PC) contributions along the first 3 axes of variance within the multiple observer outline data
Geometric morphometric analysis
PCA was used to highlight variance in the multiple observer
data. The first 3 PCs represented > 90% of the variation
between the replicas, and thus were explored in this study.
Figure 5 demonstrates the shape differences highlighted by
PC1–3. PC1 represents 59.7% of the total variance, whilst
PC2 and PC3 account for 33.4% and 3%, respectively (see
Supplementary Online Fig. 3 for scree plot of PC loadings
and cumulative variance).
When the first 3 PCs are plotted against each other, clear
clustering occurs, demonstrating that replicas of the same
tool tend to share more similarities than those of different
tools (Fig. 6). However, there is notable variation within
tools along PC3, suggesting that inter-observer error deriving
from photography equipment and set-up is prevalent in
this dimension. PC3 is an axis of variation represented by
slight asymmetries in convexity at the proximal end (Fig. 5),
thus likely reflecting parallax error between observers.
Additionally, we note some overlap between certain tool groups,
although this is primarily because these tools share similar
shapes once size is removed (Supplementary Online Fig. 2).
For example, Tool 5 sometimes plots within the range of
variation for Tool 1 and only shows statistically significant
differences in mean from this tool along PC2 (p < 0.008;
see Supplementary Online Table 2 for Tukey's HSD results
comparing differences in mean between tools). To tease
apart the variation between the tools and that associated
with the error, we calculated the coefficient of reliability
between each pair of observers, which ranged between 0.960
and 0.999 (Table 4), suggesting that < 4% of the variance is
due to inter-observer error, which lies within our acceptable
threshold. The ICC was computed using the first 3 PC scores
to determine levels of similarity between the six observers,
whilst taking into account the variability between the
tools, and found an almost perfect agreement (ICC = 0.99,
Table 4 Coefficient of reliability (R) values for pair-wise combinations of observers using the first 3 PC scores. For observer abbreviations and associated assemblage numbers, see Table 1. All values have been rounded to 3 decimal places
      INSAP  IM     MBAP   NME    NMK
IM    0.988
MBAP  0.978  0.960
NME   0.984  0.975  0.995
NMK   0.969  0.969  0.985  0.992
MH    0.989  0.978  0.993  0.999  0.990
Fig. 6 Scatterplots (top row) and boxplots (bottom row) of repeat capture scores along principal components (PC) 1–3, demonstrating the clustering within tools (1–6). PC1 represents 59.7% of the total variance, whilst PC2 and PC3 account for 33.4% and 3%, respectively
p < 0.001). Finally, we found that an LDA could discriminate
accurately between the replica groups (94% classification
accuracy) and could not differentiate between observers (0%
classification accuracy).
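The link between the coefficient of reliability and the share of variance attributed to error can be made explicit with the values in Table 4. The following is a minimal sketch, assuming the error share is read as (1 − R) × 100; the function name `error_share_percent` and the dictionary layout are hypothetical and are not taken from the project's repository.

```python
# Pairwise coefficients of reliability (R) copied from Table 4.
reliability = {
    ("IM", "INSAP"): 0.988,
    ("MBAP", "INSAP"): 0.978, ("MBAP", "IM"): 0.960,
    ("NME", "INSAP"): 0.984, ("NME", "IM"): 0.975, ("NME", "MBAP"): 0.995,
    ("NMK", "INSAP"): 0.969, ("NMK", "IM"): 0.969, ("NMK", "MBAP"): 0.985,
    ("NMK", "NME"): 0.992,
    ("MH", "INSAP"): 0.989, ("MH", "IM"): 0.978, ("MH", "MBAP"): 0.993,
    ("MH", "NME"): 0.999, ("MH", "NMK"): 0.990,
}


def error_share_percent(r):
    """Percentage of variance not attributed to genuine differences: (1 - R) * 100."""
    return round((1.0 - r) * 100.0, 1)


lowest_pair = min(reliability, key=reliability.get)
highest_pair = max(reliability, key=reliability.get)
worst_case_error = error_share_percent(reliability[lowest_pair])
best_case_error = error_share_percent(reliability[highest_pair])

print(f"Lowest R: {lowest_pair} -> {worst_case_error}% error variance")
print(f"Highest R: {highest_pair} -> {best_case_error}% error variance")
```

Reading the table this way shows which observer pair sets the upper bound on the reported error share.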
Next, we compared the levels of error obtained when collating
photographs from multiple observers with that which
arises when all replicas are photographed by the same
observer. We performed another PCA with data acquired
from both sets of images (see Supplementary Online
Figs. 4–5 for PC contributions and loadings) and produced
scatterplots of PC1–3. Figure 7 demonstrates clear clustering
between tools recorded in both sets of data along PC1
and PC2. However, along PC3 there is clear variability
within repeats when grouped by the observer (multiple vs
single). F-tests found that the variance among certain tools
was significantly higher for the multiple observers in only
three cases, i.e. tools 4 and 1 along PC3 and tool 4 along PC1
(Table 5). Two-sample t-tests found statistically significant
Fig. 7 Scatterplots (top row) and boxplots (bottom row) of repeat capture scores along principal components (PC) 1–3, demonstrating the clustering within tools (symbols) and between data sets (colors). PC1 represents 60.4% of the total variance, whilst PC2 and PC3 account for 33.5% and 3.3%, respectively
Table 5 P-values from t-tests (difference in mean) and F-tests (difference in variance) comparing the principal component (PC) scores of the repeats of each tool (1–6) captured by multiple observers versus a single observer. Statistical significance (p < 0.05) is marked by an asterisk (*). All values have been rounded to 3 decimal places
      PC1             PC2             PC3
Tool  T       F       T       F       T       F
1     0.282   0.068   0.556   0.141   0.110   0.001*
2     0.091   0.463   0.114   0.162   0.188   0.671
3     0.006*  0.119   0.067   0.873   0.335   0.115
4     0.082   0.029*  0.009*  0.384   0.099   0.006*
5     0.004*  0.663   0.003*  0.257   0.000*  0.411
6     0.954   0.056   0.095   0.157   0.441   0.939
differences in means between the data sets, but these are
limited (5/36 = p < 0.05; Table 5). Table 6 and Fig. 7
demonstrate that the data collected by a single observer return
lower variance, though this pattern is not strong, and, in a
few cases, variance is slightly higher under this strategy,
though not significantly so. Finally, we calculated the
coefficient of reliability for the single observer between each
capture of the replica assemblages; Supplementary Online Table 3
shows that the R values ranged from 0.994 to 0.999, suggesting
that < 1% of the variance in the single observer data is due
to intra-observer error.
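The significant results in Table 5 can be tallied directly from the tabulated p-values. The snippet below is a small sketch using only those values; the variable names (`table5`, `significant_t`, `significant_f`) are hypothetical, and the 0.05 threshold is the one stated in the table caption.

```python
# P-values copied from Table 5: tool -> {PC: (t-test p, F-test p)}
table5 = {
    1: {"PC1": (0.282, 0.068), "PC2": (0.556, 0.141), "PC3": (0.110, 0.001)},
    2: {"PC1": (0.091, 0.463), "PC2": (0.114, 0.162), "PC3": (0.188, 0.671)},
    3: {"PC1": (0.006, 0.119), "PC2": (0.067, 0.873), "PC3": (0.335, 0.115)},
    4: {"PC1": (0.082, 0.029), "PC2": (0.009, 0.384), "PC3": (0.099, 0.006)},
    5: {"PC1": (0.004, 0.663), "PC2": (0.003, 0.257), "PC3": (0.000, 0.411)},
    6: {"PC1": (0.954, 0.056), "PC2": (0.095, 0.157), "PC3": (0.441, 0.939)},
}

ALPHA = 0.05  # significance threshold from the table caption

# Collect the (tool, PC) combinations that fall below the threshold.
significant_t = [
    (tool, pc) for tool, pcs in table5.items() for pc, (p_t, _) in pcs.items() if p_t < ALPHA
]
significant_f = [
    (tool, pc) for tool, pcs in table5.items() for pc, (_, p_f) in pcs.items() if p_f < ALPHA
]

print("Significant differences in mean:", significant_t)
print("Significant differences in variance:", significant_f)
```

Listing the combinations, rather than only counting them, makes it easy to see which tools and axes account for the differences.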
Discussion
Here we present a control study that validates the use of the
collaborative data collection protocol presented in Timbrell
(2022), which can now be used more extensively by other
researchers to reduce travel and carbon emissions, as well
as to bring researchers from other geographical areas into
the collaborative process more directly. Our results demon-
strate that the levels of inter-observer error permeating shape
data collated under a collaborative research framework fall
within the acceptable threshold, thanks to the establishment
of clear research protocols followed by each collaborator.
We found that, inevitably, increases in error occur as a con-
sequence of relying on multiple observers, who each have
access to different equipment, yet we do not deem this to
be significant enough to highly distort the results towards a
different conclusion about the data. Therefore, our innova-
tive 3D printing approach and the results reported here have
important implications for error assessments of linear metric
and GMM data when recording lithic shape as well as the
aggregation of data collected by multiple observers.
Outline-based GMM was found to be slightly more sensitive
to inter-observer error than metric methods. As Caple
et al. (2018) point out, EFA involves global descriptors
capturing around 99% of the variance in the outline shape, and
therefore, discrepancies between images lead to error in the
coefficients dispersed throughout the full outline. Therefore,
even if the error is not equally distributed, it is measured as
such, and consequently, outline methods are often more
sensitive to error than linear methods that capture only certain
dimensions of an object. 2D outline-based GMM provides
comprehensive morphological information on the gross outline
shape of an object, whereas linear metrics are able to
capture aspects of the 3D shape but in much less detail; the
increase in the morphological information captured, plus the
added potential for automated data capture (e.g. Bonhomme
et al. 2014; Matzig 2021) and impressive shape visualization
(e.g. Fig. 5), will be worth the potential increase in error
with 2D GMM in many scenarios.
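The point that a local discrepancy is spread across global descriptors can be illustrated with a toy example. The sketch below does not implement EFA as used in this study; it swaps in plain complex Fourier descriptors (a discrete Fourier transform of the outline coordinates) as a simpler stand-in, and the ellipse, the number of points and the size of the displacement are all hypothetical choices made for illustration.

```python
import cmath
import math


def fourier_descriptors(points):
    """Discrete Fourier transform of an outline given as complex numbers x + iy."""
    n = len(points)
    return [
        sum(p * cmath.exp(-2j * math.pi * k * t / n) for t, p in enumerate(points)) / n
        for k in range(n)
    ]


N_POINTS = 64
# Hypothetical outline: an ellipse with semi-axes 40 and 20 (arbitrary units).
outline = [
    complex(40 * math.cos(2 * math.pi * t / N_POINTS), 20 * math.sin(2 * math.pi * t / N_POINTS))
    for t in range(N_POINTS)
]

# Displace a single outline point to mimic a local digitization discrepancy.
SHIFT = 0.5
perturbed = list(outline)
perturbed[10] += SHIFT

clean = fourier_descriptors(outline)
noisy = fourier_descriptors(perturbed)

# Absolute change in each coefficient caused by the single displaced point.
changes = [abs(a - b) for a, b in zip(clean, noisy)]
print(f"smallest change: {min(changes):.6f}, largest change: {max(changes):.6f}")
```

In this simplified setting every coefficient changes by the same amount (the displacement divided by the number of points), so an error located at one point of the outline is recorded as if it were distributed over all descriptors.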
Our use of PCA to highlight axes of variance within lithic
shape assemblages also demonstrates that inter-observer
error does not affect all PCs equally. As outlined by Page
(1976), subtle errors in each variable are combined in mul-
tivariate analyses and can be extracted by a single or small
set of PCs, although they may also describe real aspects of
covariance and so require careful consideration as to their
source. When undertaking metric analyses, it is possible
to assess error in each individual measurement; if the met-
rics are combined via dimension reduction methods such
as PCA, the contributions of each individual measurement
to each PC are readily identifiable through the PCA coef-
ficients. This is less feasible with GMM data, particularly
when using outlines and semi-landmarks, and in such cases,
it is preferable to assess error on each of the leading PCs, as
demonstrated above, rather than on each set of coordinates,
which can be very numerous. Overall, error is impossible
to avoid completely, and indeed, the imperfect fidelity of
cultural transmission means that copying errors can naturally
occur during the knapping process and inflate variance
between and within assemblages (Eerkens and Lipo 2005;
Schillinger et al. 2014). In this sense, error is certain to
arise within a data set capturing lithic variability; however,
steps can be taken to ensure it is minimized, such as
standardization of data acquisition, processing, and analytical
procedures, calibration, high-quality equipment and assessment
Table 6 Summary statistics reporting mean (m) and standard deviation (sd) of principal component (PC) scores of the repeats of each tool (1–6), captured by multiple observers versus a single observer. All values have been rounded to 3 decimal places
      PC1                             PC2                             PC3
      Multiple        Single          Multiple        Single          Multiple        Single
Tool  m       sd      m       sd      m       sd      m       sd      m       sd      m       sd
1     −0.034  0.006   −0.031  0.003   0.039   0.007   0.037   0.003   −0.011  0.016   −0.023  0.003
2     0.034   0.003   0.031   0.002   0.046   0.002   0.042   0.004   0.042   0.004   0.039   0.004
3     −0.131  0.001   −0.135  0.002   −0.111  0.003   −0.107  0.004   0.009   0.004   0.007   0.002
4     0.174   0.005   0.169   0.002   −0.074  0.004   −0.081  0.003   −0.004  0.010   −0.012  0.002
5     −0.030  0.001   −0.033  0.002   0.030   0.003   0.025   0.001   −0.014  0.001   −0.023  0.002
6     −0.008  0.007   −0.008  0.003   0.078   0.002   0.074   0.004   −0.006  0.006   −0.004  0.006
of error through repeat measures (Evin et al. 2020; Lyman
and VanPool 2009; Robinson and Terhune 2017; Yezerinac
et al. 1992). In the case of the current study, we determine
that inter-observer error is low enough for accurate analyses
under both methods, especially as the high ICC and R values
demonstrate acceptable levels of congruence between the
six observers.
Through the development of clear research protocols, our
results demonstrate that multiple observers can successfully
work together to produce sets of comparable data for
aggregation. We believe that collaborative research designs, such
as the one reported in Timbrell (2022), play an integral role
in addressing the vulnerabilities of international research to
disruption, revealed most recently in 2020 by the outbreak
of coronavirus (COVID-19), which halted both domestic and
international travel as well as social interaction. Our results
suggest that, as well as single researchers visiting multiple
collections to independently access lithic samples,
international colleagues are also able to work together in situ to
generate data, thereby building resilience in archaeological
practice (Douglass et al. 2020; Scerri et al. 2020). We stress
though that collaborative research designs should involve
an equitable partnership in relation to the data, following
the imminent Cape Town statement (see Else 2022), with
all researchers being involved in all stages of the research,
from planning and protocol development to publication and
dissemination (Chirikure 2015; Douglass et al. 2020). In this
way, dual project development can enable local researchers
to benefit from international archaeological research, thereby
avoiding some (but not all) of the neo-colonial "helicopter"
practices that have been widely criticized in archaeological
and anthropological sciences, particularly in Africa
(Ackermann 2019; Athreya and Ackermann 2019; Sahle 2021).
We have provided here an initial pilot test of collaborative
data collection using a 3D printing approach. This approach
is unique and, to our knowledge, has not yet been applied
in the context of lithic variability nor inter-observer error
assessments. We propose that future studies should aim to
reproduce our approach with more expanded samples of rep-
lica artefacts and discuss three important aspects of potential
future study design below.
The first aspect relates to the use of statistics and simple
metrics for reporting the inter-observer error. Statistics such
as the ICC and %TEM express the error variance relative to
the overall variance of the sample; the variance is decomposed
into that due to genuine variation among the artefacts
and that due to variation among the observers (including
that due to different individuals, their different cameras,
lenses, etc.). Whilst this approach has many advantages,
one immediate drawback is that these statistics are directly
affected by the magnitude of genuine variation in both the
sample of artefacts and in the dimensions measured. A
given constant level of measurement error will appear large
when the artefacts measured are highly standardized, but
small when the artefacts measured are highly variable. Even
if one were to measure the widths and lengths of a set of
highly standardized artefacts, a given level of measurement
error would appear smaller the further the ratio of width to
length is from unity, as this would increase the magnitude
of genuine variation in the measurements taken. For this
reason, it is always valuable to present simple indices of
absolute error (such as standard deviation or variance) for
single measurements alongside the indices of relative error
variance across all measurements provided by the ICC and
%TEM. Such simple indices are valuable in assessing
inter-observer error even when the ultimate study involves more
sophisticated morphological analyses, such as those based
on GMM. In the current study, Table 2 presents such indices
and demonstrates that levels of error are minimal (the largest
standard deviation among multiple observers for a single
measurement = 0.659 mm).
The second aspect relates to the exploration of the
effects of the raw material used for the production of the
reference collection on the results of comparative studies.
In this study, we used flint because it was available
and accessible at the University of Liverpool, where the
materials were prepared. This fine-grained raw material
tends to produce well-defined features and edges, and so it
would be interesting to replicate the approach with a more
coarse-grained material, such as quartzite, chert, calcrete
or sandstone. This is especially pertinent in our case as
the shapes obtained from these materials are likely to be
more representative of the actual African stone tools that
have been recorded in the main project. However, we note
that heat-treated silcrete may achieve a grain as fine as
flint (Key et al. 2021), and that obsidian can be even
finer-grained than flint. Since both silcrete and obsidian are raw
materials commonly found in African Middle Stone Age
assemblages, we suggest that the flint used here acts as
a suitable middle ground in terms of granularity and can
therefore be considered as broadly comparable to those
raw materials studied in the main project.
Finally, an aspect of variation between individual replicas
that we did not explicitly measure is that which can arise
through 3D printing. Zeng and Zou (2019) outline some
of the factors that can affect the precision of 3D printing,
which include slicing and support errors. However, we pro-
pose that, even if there are printing errors present in our
replicas, these are likely minimal due to the highly compa-
rable data obtained across the project. Additionally, printing
errors should not contribute to differences between the two
data collection strategies as both the multiple observers and
the single observer recorded measurements from the same
set of replicas. Depending on the local accessibility of 3D
printers, our approach to inter-observer testing could be fur-
ther streamlined through the direct sharing of the virtual 3D
models, with each collaborator printing their own copies to
measure. This would alleviate potential logistical problems
with global distribution, whether via mail or directly, though
further research is required to ascertain the variation in
objects printed using different models of 3D printers.
Conclusion
Aggregating lithic shape data requires careful consideration
in order to reduce potential sources of inter-observer
error that can have detrimental consequences for the
results and their interpretation. Our analysis of metric
and outline-based 2D GMM data from multiple observers
found that the former performed slightly better than the
latter in our tests of inter- and intra-observer error, primar-
ily due to differences in the nature and detail of the mor-
phological information obtained, though both approaches
returned levels of error deemed acceptable for accurate
analyses. Standardization of the data collection procedure
is vital for ensuring that congruence between observers is
maintained, though we note that this alone cannot com-
pletely eradicate error as we find that variability between
observers can still be detected within our data to a (some-
times) significant extent. Nonetheless, we believe that
producing replica samples through 3D printing could
have many useful applications within archaeological and
anthropological sciences beyond the study of error in the
analysis of lithic assemblages and should be adopted more
widely in assessments of inter-observer error as an inte-
gral component of international collaborations between
institutions.
Supplementary Information The online version contains supplementary
material available at https://doi.org/10.1007/s12520-022-01676-2.
Acknowledgements We would like to thank Stéphanie Bonilauri,
Laurence Glemarec, and Roland Nespoulet (Musée de l'Homme),
Abdelouahed Ben-Ncer and Hassan Dermouk (Institut National des
Sciences de l'Archéologie et du Patrimoine), Simon Mboya and Sharon
Manura (National Museums of Kenya) and Curtis Marean (Mossel Bay
Archaeological Project) for their roles in facilitating the project. LT
would like to thank Paloma de la Peña, Amy Way and Christian
Hoggard for their continued collaboration. We would also like to
thank Todd VanPool and an anonymous reviewer for their constructive
feedback during the review of this article.
Author contribution All authors contributed to the study conception
and design. Funding was acquired by LT. CS and LT performed the
material preparation. Data collection was carried out by HM, MQ, BM,
CM, BH, YT, SH, KE and LT, under the supervision of EM, WB, KP
and MG. LT and MG performed data processing and analysis. The first
draft of the manuscript was written by LT and all authors commented
on previous versions of the manuscript. All authors read and approved
the final manuscript.
Funding This project was supported by funding awarded to LT by the
Leakey Foundation (Movement, interaction, and structure: modelling
population networks and cultural diversity in the African Middle Stone
Age), the Wenner Gren Foundation (Gr. 10157) and the Lithic Studies
Society (Jacobi Bursary, 2020).
Data availability All data and R code can be found on the project's
repository and were made available for the peer-review of this article:
https://github.com/lucytimbrell/error_analysis_lithics/.
Declarations
Ethics approval Not applicable.
Consent to participate Not applicable.
Consent for publication Not applicable.
Conflicts of interest The authors declare no competing interest.
Open Access This article is licensed under a Creative Commons
Attribution 4.0 International License, which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long
as you give appropriate credit to the original author(s) and the source,
provide a link to the Creative Commons licence, and indicate if changes
were made. The images or other third party material in this article are
included in the article's Creative Commons licence, unless indicated
otherwise in a credit line to the material. If material is not included in
the article's Creative Commons licence and your intended use is not
permitted by statutory regulation or exceeds the permitted use, you will
need to obtain permission directly from the copyright holder. To view a
copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
References
Ackermann RR (2019) Reflections on the history and legacy of scientific racism in South African paleoanthropology and beyond. J Hum Evol 126:106–111. https://doi.org/10.1016/j.jhevol.2018.11.007
Athreya S, Ackermann RR (2019) Colonialism and narratives of human origins in Asia and Africa. In: Porr M, Matthews JM (eds) Interrogating human origins: decolonisation and the deep past. Routledge, London. https://doi.org/10.4324/9780203731659-4
Bartlett JW, Frost C (2008) Reliability, repeatability and reproducibility: analysis of measurement errors in continuous variables. Ultrasound Obstet Gynecol 31:466–475. https://doi.org/10.1002/uog.5256
Bonhomme V, Picq S, Gaucherel C, Claude J (2014) Momocs: outline analysis using R. J Stat Softw 56:1–24. https://doi.org/10.18637/jss.v056.i13
Bonhomme V, Forster E, Wallace M, Stillman E, Charles M, Jones G (2017) Identification of inter- and intra-species variation in cereal grains through geometric morphometric analysis, and its resilience under experimental charring. J Archaeol Sci 86:60–67. https://doi.org/10.1016/j.jas.2017.09.010
Bookstein FL (1991) Morphometric tools for landmark data: geometry and biology. Cambridge University Press, Cambridge
Bordes F (1961) Typologie du Paléolithique Ancien et Moyen. Bordeaux
Buchanan B, Andrews B, O'Brien M, Eren MI (2018) An assessment of stone weapon tip standardization during the Clovis-Folsom transition in the Western United States. Am Antiq 83:721–734. https://doi.org/10.1017/aaq.2018.53
Caple J, Byrd J, Stephan CN (2017) Elliptical Fourier analysis: fundamentals, applications, and value for forensic anthropology. Int J Leg Med 131:1675–1690. https://doi.org/10.1007/s00414-017-1555-0
Caple J, Byrd J, Stephan CN (2018) The utility of elliptical Fourier analysis for estimating ancestry and sex from lateral skull photographs. Forensic Sci Int 289:352–362. https://doi.org/10.1016/j.forsciint.2018.06.009
Cardillo M (2010) Some applications of geometric morphometrics to archaeology. In: Elewa AMT (ed) Morphometrics for nonmorphometricians. Springer, Berlin, Heidelberg, pp 325–341. https://doi.org/10.1007/978-3-540-95853-6_15
Chang J, Alfaro ME (2015) Crowdsourced geometric morphometrics enable rapid large-scale collection and analysis of phenotypic data. Methods Ecol Evol 7(4):472–482. https://doi.org/10.1111/2041-210X.12508
Chirikure S (2015) "Do as I Say and Not as I Do". On the gap between good ethics and reality in African archaeology. In: Haber A, Shepherd N (eds) After ethics. Ethical Archaeologies: The Politics of Social Justice, vol 3. Springer, New York. https://doi.org/10.1007/978-1-4939-1689-4_3
Daboul A, Ivanovska T, Bülow R, Biffar R, Cardini A (2018) Procrustes-based geometric morphometrics on MRI images: an example of inter-operator bias in 3D landmarks and its impact on big datasets. PLoS ONE 13(5):e0197675. https://doi.org/10.1371/journal.pone.0197675
Douglass K (2020) Amy ty lilin-draza'ay: building archaeological practice on principles of community. Afr Archaeol Rev 37:481–485. https://doi.org/10.1007/s10437-020-09404-8
Else H (2022) African researchers lead campaign for equity in global collaborations. Nature News. https://www.nature.com/articles/d41586-022-01604-3
Evin A, Bonhomme V, Claude J (2020) Optimizing digitalization effort in morphometrics. Biol Methods Protoc 5(1):bpaa023. https://doi.org/10.1093/biomethods/bpaa023
Fagertun J, Harder S, Rosengren A, Moeller C, Werge T, Paulsen RR, Hansen TF (2014) 3D facial landmarks: inter-operator variability of manual annotation. BMC Med Imaging 14:35. https://doi.org/10.1186/1471-2342-14-35
Fruciano C (2016) Measurement error in geometric morphometrics. Dev Genes Evol 226:139–158. https://doi.org/10.1007/s00427-016-0537-4
Inizan ML, Reduron-Ballinger M, Roche H, Tixier J (1999) Technology and terminology of knapped stone. Cercle de recherches et d'études préhistoriques
Iovita R (2009) Ontogenetic scaling and lithic systematics: method and application. J Archaeol Sci 36(7):1447–1457. https://doi.org/10.1016/j.jas.2009.02.008
Iovita R (2011) Shape variation in Aterian tanged tools and the origins of projectile technology: a morphometric perspective on stone tool function. PLoS ONE 6(12):e29029. https://doi.org/10.1371/journal.pone.0029029
Ivanovaitė L, Serwatka K, Hoggard CS, Sauer F, Riede F (2020) All these fantastic cultures? Research history and regionalization in the Late Palaeolithic tanged point cultures of Eastern Europe. Eur J Archaeol 23(2):162–185. https://doi.org/10.1017/eaa.2019.59
Key A, Pargeter J, Schmidt P (2021) Heat treatment significantly increases the sharpness of silcrete stone tools. Archaeometry 63(3):447–466. https://doi.org/10.1111/arcm.12619
Klingenberg C (2008) Novelty and 'homology-free' morphometrics: what's in a name? Evol Biol 35:186–190. https://doi.org/10.1007/s11692-008-9029-4
Koo T, Li M (2016) A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med 15:155–163. https://doi.org/10.1016/j.jcm.2016.02.012
Lycett SJ (2009) Quantifying transitions: morphometric approaches to Palaeolithic variability and technological change. In: Camps M, Chauhan P (eds) Sourcebook of paleolithic transitions: methods, theories, and interpretations. Springer, New York, pp 9–92. https://doi.org/10.1007/978-0-387-76487-0_5
Lycett SJ, von Cramon-Taubadel N (2015) Toward a "quantitative genetic" approach to lithic variation. J Archaeol Method Theory 22:646–675. https://doi.org/10.1007/s10816-013-9200-9
Lyman LL, VanPool TL (2009) Metric data in archaeology: a study of intra-analyst and inter-analyst variation. Am Antiq 74(3):485–504. https://doi.org/10.1017/S0002731600048721
MacDonald DA, Royal K, Buchanan B (2020) Evaluating the effects of parallax in archaeological geometric morphometric analyses. Archaeol Anthropol Sci 12:149. https://doi.org/10.1007/s12520-020-01111-4
Marwick B, Guedes JA, Barton CM et al (2017) Open science in archaeology. SAA Archaeol Rec. https://doi.org/10.17605/OSF.IO/3D6XX
Matzig DN (2021) OutlineR: an R package to derive outline shapes from (multiple) artefacts on JPEG images. Zenodo. https://doi.org/10.5281/ZENODO.4527469
Matzig DN, Hussain ST, Riede F (2021) Design space constraints and the cultural taxonomy of European Final Palaeolithic large tanged points: a comparison of typological, landmark-based and whole-outline geometric morphometric approaches. J Palaeolithic Archaeol 4(27). https://doi.org/10.1007/s41982-021-00097-2
McNabb J (2017) Journeys in space and time. Assessing the link between Acheulean handaxes and genetic explanations. J Archaeol Sci Rep 13:403
Menéndez LP (2017) Comparing methods to assess intraobserver measurement error of 3D craniofacial landmarks using geometric morphometrics through a digitizer arm. J Forensic Sci 62(3):741–746. https://doi.org/10.1111/1556-4029.13301
Mesfin I, Leplongeon A, Pleurdeau D, Borel A (2020) Using morphometrics to reappraise old collections: the study case of the Congo Basin Middle Stone Age bifacial industry. J Lithic Stud 7(1). https://doi.org/10.2218/jls.4329
Mitteroecker P (2021) Morphometrics in evolutionary developmental biology. In: de la Rosa LN, Muller GB (eds) Evolutionary development biology. Springer, Cham, pp 941–951. https://doi.org/10.1007/978-3-319-32979-6_119
Mitteroecker P, Bookstein F (2011) Linear discrimination, ordination, and the visualization of selection gradients in modern morphometrics. Evol Biol 38(1):100–144. https://doi.org/10.1007/s11692-011-9109-8
Mullin SK, Taylor PJ (2002) The effects of parallax on geometric morphometric data. Comput Biol Med 32(6):455–464. https://doi.org/10.1016/S0010-4825(02)00037-9
O'Leary MA, Kaufman S (2011) MorphoBank: phylophenomics in the 'cloud.' Cladistics 27(5):529–537. https://doi.org/10.1111/j.1096-0031.2011.00355.x
Okumura M, Araujo AGM (2018) Archaeology, biology, and borrowing: a critical examination of geometric morphometrics in archaeology. J Archaeol Sci 101:149–158. https://doi.org/10.1016/j.jas.2017.09.015
Osis S, Hettinga B, Macdonald S, Ferber R (2015) A novel method to evaluate error in anatomical marker placement using a modified generalized Procrustes analysis. Comput Methods Biomech Biomed Eng 18:1108–1116. https://doi.org/10.1080/10255842.2013.873034
Page JW (1976) A note on interobserver error in multivariate analyses of populations. Am J Phys Anthropol 44(3):521–525. https://doi.org/10.1002/ajpa.1330440315
Perini TA, de Oliveira GL, Ornellas JDS, de Oliveira FP (2005) Technical error of measurement in anthropometry. Rev Bras Med Esporte 11(1):86–90. https://doi.org/10.1590/S1517-86922005000100009
R Core Team (2020) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/
Riaño HC, Jaramillo N, Dujardin J-P (2009) Growth changes in Rhodnius pallescens under simulated domestic and sylvatic conditions. Infect Genet Evol 9:162–168. https://doi.org/10.1016/j.meegid.2008.10.009
Robinson C, Terhune CE (2017) Error in geometric morphometric data collection: combining data from multiple sources. Am J Phys Anthropol 164(1):62–75. https://doi.org/10.1002/ajpa.23257
Roe DA (1964) The British Lower and Middle Paleolithic: some problems, methods of study and preliminary results. Proceedings of the Prehistoric Society
Sahle Y (2021) Fossil men: the quest for the oldest skeleton and the origins of humankind. Am J Phys Anthropol 176(2):340–341. https://doi.org/10.1002/ajpa.24359
Sahle Y, Hutchings WK, Braun et al (2013) Earliest stone-tipped projectiles from the Ethiopian Rift date to >279,000 years ago. PLoS ONE 8(11):e78092. https://doi.org/10.1371/journal.pone.0078092
Scerri EML, Kühnert D, Blinkhorn J etal (2020) Field-based sciences
must transform in response to COVID-19. Nat Ecol Evol 4:1571–
1574. https:// doi. org/ 10. 1038/ s41559- 020- 01317-8
Schillinger K, Mesoudi A, Lycett S (2014) Copying error and the cultural evolution of “Additive” vs. “Reductive” material traditions: an experimental assessment. Am Antiq 79(1):128–143. https://doi.org/10.7183/0002-7316.79.1.128
Serwatka K, Riede F (2016) 2D geometric morphometric analysis casts doubt on the validity of large tanged points as cultural markers in the European Final Palaeolithic. J Archaeol Sci Rep 9:150–159. https://doi.org/10.1016/j.jasrep.2016.07.018
Shea JJ (2020) Prehistoric stone tools of Eastern Africa: a guide. Cambridge University Press
Shott MJ, Trail BW (2010) Exploring new approaches to lithic analysis: laser scanning and geometric morphometrics. Lithic Technol 35(2):195–220. https://doi.org/10.1080/01977261.2010.11721090
Shrout PE, Fleiss JL (1979) Intraclass correlation: uses in assessing rater reliability. Psychol Bull 86:420–428. https://doi.org/10.1037//0033-2909.86.2.420
Timbrell L (2020) Strength in numbers: combining old datasets to answer new questions. In Kaercher K, Arntz M, Bomentre N, Hermoso Buxán XL, Day K, Ki S, Macleod R, Muñoz Mojado H, Timbrell L, Wisher I (eds) New Frontiers in Archaeology: Proceedings of the Cambridge Annual Student Archaeology Conference 2019, Archaeopress: Access Archaeology. ISBN 978-1-78969-794-0
Timbrell L (2022) A collaborative model for lithic shape digitization in museum settings. Lithic Technol. https://doi.org/10.1080/01977261.2022.2092299
Ulijaszek SJ, Kerr DA (1999) Anthropometric measurement error and the assessment of nutritional status. Br J Nutr 82:165–177. https://doi.org/10.1017/s0007114599001348
von Cramon-Taubadel N, Frazier BC, Lahr MM (2007) The problem of assessing landmark error in geometric morphometrics: theory, methods, and modifications. Am J Phys Anthropol 134:24–35. https://doi.org/10.1002/ajpa.20616
Wang L-Y, Marwick B (2020) Standardization of ceramic shape: a case study of Iron Age pottery from northeastern Taiwan. J Archaeol Sci Rep 33:102554. https://doi.org/10.1016/j.jasrep.2020.102554
Yezerinac SM, Lougheed SC, Handford P (1992) Measurement error and morphometric studies: statistical power and observer experience. Systematic Biol 41(4):471–482. https://doi.org/10.1093/sysbio/41.4.471
Zelditch ML, Swiderski DL, Sheets DH, Fink WL (2004) Geometric
morphometrics for biologists: a primer. Academic Press
Zeng L, Zou X (2019) Error analysis and experimental research on 3D
printing. IOP Conf Ser Mater Sci Eng 592:012150
Publisher's note Springer Nature remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.
... Comparative research and replicability are key aspects in science and are becoming increasingly relevant for lithic analyses in the 21st century (e.g., 29,43). In conjunction with recent studies on the replicability of individual attributes on flakes 30,44,45 and considering that flakes are the most frequent and common lithic products, FLEXDIST allows for the systematic comparison of flake assemblages across and within sites, regions and even across continents if comparable attribute definitions and recording procedures are followed. These attribute data already exist for many Palaeolithic sites. ...
... allowing for easy access, method replicability and potential improvements. These are essential features for strengthening open science and collaborative approaches among researchers in archaeology and lithic analysis 29,45,57–61. To evaluate the wider applicability, impact and limits of FLEXDIST, additional studies will need to test the method on flake assemblages from other site sequences with more complex patterns of cultural change, different periods, site function, technological backgrounds and raw materials. ...
... Unretouched blades and bladelets could also be included in future studies, which may result in even clearer patterns of variation than identified here due to their marked shape differences. Limits to such studies lie in systematically recording multi-scale attribute data on flakes and the replicability of such observations when multiple analysts are involved 30,44,45. 21st century lithic analysis attests to a shift towards recording more replicable quantitative data, pursuing collaborative research and applying multivariate statistics to large (open) datasets, an approach which we follow here as well. ...
Lithic artefacts provide the principal means to study cultural change in the deep human past. Tools and cores have been the focus of much prior research based on their perceived information content and cultural relevance. Unretouched flakes rarely attract comparable attention in archaeological studies, despite being the most abundant assemblage elements and featuring prominently in ethnographic and experimental work. Here, we examine the potential of flake morphology for tracing cultural change utilising 4,512 flakes, each characterised by 16 standard mixed-scale attributes, from a well-documented cultural sequence at the Middle Stone Age site of Sibhudu, South Africa. We quantified multivariate similarities among flakes using FLEXDIST, a highly versatile method capable of handling mixed, correlated, incomplete, and high-dimensional data. Our findings reveal a significant gradual change in flake morphology that aligns with the documented cultural succession at Sibhudu. Furthermore, our analysis provides new insights into the patterning of variability throughout the studied sequence. The demonstrated potential of flakes to track cultural change opens up additional avenues for comparative research due to their ubiquity, the availability of commonly recorded attributes, and especially in the absence of cores or tools. FLEXDIST, with its versatile applicability to complex lithic datasets, holds particular promise in this regard.
... To study the structure of eastern African MSA point variability, artefact samples were accessed via a collaborative data collection framework (Timbrell 2022;Timbrell et al. 2022c) with the National Museums of Kenya and the National Museum of Ethiopia. Table 1, Figure 1 and Supplementary Figure S1 describes the sample, which includes artefacts from both dated and undated layers, with date ranges rounded to the nearest 1000 years. ...
... We also performed two-dimensional geometric morphometrics on the photographs of the points taken via collaborative data collection as outlined and validated in Timbrell (2022) and Timbrell et al. (2022c). The protocols optimised the photographs for outline-based GMM, including the use of a scale and minimising of shadows around the object. ...
Stone points are one of the key features used to define the African Middle Stone Age (MSA). Regional patterns in their shape and size through time have been thought to reflect inter-group interactions and networks of populations and are used to define cultural phases within the MSA. However, eastern Africa does not have distinctive and widely applied chrono-stratigraphic point variants that divide its MSA record, which is often described as being highly variable. This paper presents a metric and geometric morphometric analysis of eastern African MSA points and evaluates potential drivers of variation in them in relation to null models of isolation by distance, time and environment. Approximately half of the shape variance in our sample can be explained by spatial, temporal and environmental differences, as well as by size, indicating a degree of demographic continuity through sustained cultural transmission. A portion of the remaining variance likely represents stylistic differences between assemblages, which are often the subject of interest in archaeological studies. The highly variable nature of the eastern African MSA may reflect the region’s refugial positioning within the continent, with point technology a flexible adaptive system that was dynamically employed across Africa during the MSA depending on varying social and ecological contexts, resulting in the appearance of both ‘generic’ and ‘specific’ tool forms at particular times and places.
... The intraclass correlation coefficient (ICC) was computed (with PCA scores in the outlines approach and with Procrustes coordinates in the landmarks approach) using the "psych" R package (Revelle 2021) to assess the agreement between the two datasets. The ICC compares the variability within repeat measurements while contrasting variability between groups of measurements (Timbrell et al. 2022). ...
In this study, we compared the efficacy of geometric morphometric techniques, including outlines and landmark-based approaches, to support the differentiation of Trichodina bellottii from three co-occurring killifish species. Both methods were able to differentiate trichodinids from different host species. However, discriminant analyses and MANOVA results based on landmarks had greater accuracy, possibly because these analyses only provide information on certain points defined by the researcher, while the analyses based on outlines take into account points with less taxonomic information.
... Von Cramon-Taubadel et al., 2007; Kaufman and Rosenthal, 2009; Fruciano, 2016; Robinson and Terhune, 2017; Verheyen et al., 2018), 14C dating (e.g. Scott et al., 2022), and is also recognized, on a smaller scale, in archaeology (Fisch, 1978; Newcomer et al., 1986; Gobalet, 2001; Lyman and VanPool, 2009; Shahack-Gross, 2016; Skals et al., 2018; Timbrell et al., 2022). Although the issue of possible inter- and intra-observer variation affecting the accuracy and repeatability of morphometric studies is recognized in archaeobotany (Jacomet, 2013; Steiner et al., 2015; Evin et al., 2020; Antolin, 2022; Roushannafas et al., 2023) and phytolith analysis (Ball et al., 2016b, 2017; Evett and Cuthrell, 2016; Díez-Pastor et al., 2020; Out, 2020), this variation has not yet been studied systematically in phytolith morphometry. ...
Background: Archaeobotanists and palaeoecologists extensively use geometric morphometrics to identify plant opal phytoliths. Particularly when applied to assemblages of phytoliths from concentrations retrieved from closed contexts, morphometric data from archaeological phytoliths compared with similar data from reference material may allow taxonomic attribution. Observer variation is one aspect of phytolith morphometry that has received little attention but may be an important source of error, and hence cause of potential misidentification of plant remains. Scope: To investigate inter- and intra-observer variation in phytolith morphometry, eight researchers (observers) from different laboratories measured 50 samples each from three phytolith morphotypes, Bilobate, Bulliform flabellate and Elongate dendritic, three times, under the auspices of the International Committee for Phytolith Morphometrics (ICPM). Methods: Data for 17 size and shape variables were collected for each phytolith by manually digitising a phytolith outline (mask) from a photograph, followed by measurement of the mask with open-source morphometric software. Key results: Inter-observer variation ranged from 0 to 23% difference from the mean of all observers. Intra-observer variation ranged from 0 to 9% difference from the mean of individual observers per week. Inter- and intra-observer variation was generally higher among inexperienced researchers. Conclusions: Scaling errors were a major cause of variation and occurred more with less experienced researchers, which is likely related to familiarity with data collection. The results indicate that inter- and intra-observer variation can be substantially reduced by providing clear instructions for and training with the equipment, photo capturing, software, data collection and data cleaning. In this paper, the ICPM provides recommendations to minimise variation. 
Advances in automatic data collection may eventually reduce inter- and intra-observer variation, but until this is common practice, the ICPM recommends that phytolith morphometric analyses adhere to standardised guidelines to assure that measured phytolith variables are accurate, consistent and comparable between different researchers and laboratories.
... Alternatively, the use of quantitative measurements allows continuous morphological variation among artefacts to be described and compared on an objective numerical scale. While a quantitative approach itself is not free of inter-observer variation (Pargeter et al., 2023a), measurement error can be more easily gauged and managed through explicit definitions and protocols (Gnaden and Holdaway, 2000;Timbrell et al., 2022). Traditionally, quantification in stone artefact analysis tends to be limited to linear metric (e.g., length, width, thickness) and simple meristic (i.e., scar count). ...
In stone artefact studies, researchers often rely on qualitative classifications to describe flake scar arrangements on cores. While this approach provides a broad overview of core reduction patterns, its application can be ambiguous due to the three-dimensional complexities of core geometry and the subjective nature of qualitative classifications, making it challenging to objectively compare flake scar patterning across different analytical settings. In this study, we present a new approach to quantify one aspect of flake scar arrangement on cores: the three-dimensional orientation of core scar negatives. Using standardised digital and experimentally flintknapped cores, we demonstrate that statistical techniques from fabric analysis can quantitatively characterise the scar orientation profile of cores. Importantly, this method is able to reveal variations in the flake scar arrangements of informal cores, such as multiplatform cores. When applied to a sample of multiplatform cores from the Homo floresiensis type-site of Liang Bua in Indonesia, we identify differences in flake scar orientation between cores made by Homo floresiensis and those manufactured by modern humans who utilised the site after the disappearance of the extinct hominin. This finding suggests a possible divergence in stone knapping practices between the two hominin taxa at Liang Bua. Overall, our research provides a new quantitative approach to gain new insights into hominin technological behaviour through stone artefact analysis. It also highlights the potential of 3D analysis for advancing the field of archaeological lithic research.
... Future research should address this compositional complexity. In sum, our work aligns with approaches to lithic tool shape and technological analyses that-especially if integrated with quantitative and replicable protocols-promise an understanding of material culture change beyond typological categories and traditionally named units [195][196][197][198][199]. ...
Archaeological systematics, together with spatial and chronological information, are commonly used to infer cultural evolutionary dynamics in the past. For the study of the Palaeolithic, and particularly the European Final Palaeolithic and earliest Mesolithic, proposed changes in material culture are often interpreted as reflecting historical processes, migration, or cultural adaptation to climate change and resource availability. Yet, cultural taxonomic practice is known to be variable across research history and academic traditions, and few large-scale replicable analyses across such traditions have been undertaken. Drawing on recent developments in computational archaeology, we here present a data-driven assessment of the existing Final Palaeolithic/earliest Mesolithic cultural taxonomy in Europe. Our dataset consists of a large expert-sourced compendium of key sites, lithic toolkit composition, blade and bladelet production technology, as well as lithic armatures. The dataset comprises 16 regions and 86 individually named archaeological taxa (‘cultures’), covering the period between ca. 15,000 and 11,000 years ago (cal BP). Using these data, we use geometric morphometric and multivariate statistical techniques to explore to what extent the dynamics observed in different lithic data domains (toolkits, technologies, armature shapes) correspond to each other and to the culture-historical relations of taxonomic units implied by traditional naming practice. Our analyses support the widespread conception that some dimensions of material culture became more diverse towards the end of the Pleistocene and the very beginning of the Holocene. At the same time, cultural taxonomic unit coherence and efficacy appear variable, leading us to explore potential biases introduced by regional research traditions, inter-analyst variation, and the role of disjunct macroevolutionary processes. 
In discussing the implications of these findings for narratives of cultural change and diversification across the Pleistocene-Holocene transition, we emphasize the increasing need for cooperative research and systematic archaeological analyses that reach across research traditions.
... Furthermore, the 2D outline data derives from legacy sources and different production styles may impact accuracy 62. Drawings and photos have, however, been shown to provide reliable information for geometric morphometric analysis 40,63, lending confidence that these data can be used for comparative inter-regional analyses. Notably, the acquisition of additional outline data for lithic artefacts is straightforward, and we hope that users will complement the dataset 19 at hand with further specimens. ...
Comparative macro-archaeological investigations of the human deep past rely on the availability of unified, quality-checked datasets integrating different layers of observation. Information on the durable and ubiquitous record of Paleolithic stone artefacts and technological choices is especially pertinent to this endeavour. We here present a large expert-sourced collaborative dataset for the study of stone tool technology and artefact shape evolution across Europe between ~15,000 and 11,000 years before present. The dataset contains a compendium of key sites from the study period, and data on lithic technology and toolkit composition at the level of the cultural taxa represented by those sites. The dataset further encompasses 2D shapes of selected lithic artefact groups (armatures, endscrapers, and borers/perforators) shared between cultural taxa. These data offer novel possibilities to explore between-regional patterns of material culture change to reveal scale-dependent processes of long-term technological evolution in mobile hunter-gatherer societies at the end of the Pleistocene. Our dataset facilitates state-of-the-art quantitative analyses and showcases the benefits of collaborative data collation and synthesis.
... Based on its flexibility and relative affordability, multi-image photogrammetry has become an increasingly useful analytical tool within archaeology (Magnani et al. 2020) with Close-Range Photogrammetry being used for lithic analysis (e.g. Caricola et al. 2018;Porter et al. 2016;Porter 2019;Collins et al. 2019;Bennett 2021;Timbrell et al. 2022). ...
This paper will present initial results from excavations at Maritime Academy, Frindsbury which produced several handaxes, two of which can be classed as ‘giant handaxes’. Artefacts were recovered from fluvial deposits in the Medway Valley and are thought to date from the Marine Isotope Stage 9 interglacial. This paper will focus on the largest of these handaxes and will present metrical data for the artefact and initial comparison with similar artefacts from the British Palaeolithic.
... On their part, Way et al. (2022) analysed the increasing complexity of the industries of the MSA combining the use of GM and paleoenvironmental data. Several other studies during this last year alone, including the integration of GM with other analytical techniques (Moreno et al., 2022), different approaches to research protocols (Timbrell et al., 2022) or even analyses on bone arrowheads (Tsirintoulaki et al., 2023), are proof of the increasing use of the method. Finally, and to conclude this short review, there are essentially two ways to approach the Fourier analysis, either by using landmarks or by using outlines. ...
The COVID-19 pandemic halted scientific research across the world, revealing the vulnerabilities of field-based disciplines to disruption. To ensure resilience in the face of future emergencies, archaeology needs to be more sustainable with international collaboration at the forefront. This article presents a collaborative data collection model for documenting lithics using digital photography and physical measurements taken in-situ by local collaborators. Data capture protocols to optimise standardisation are outlined, and guidelines are provided for data curation, storage and sharing. Adopting collaborative research strategies can have long-term advantages beyond the COVID-19 pandemic, by encouraging knowledge-sharing between international collaborators, decreasing emissions associated with archaeological research, and improving accessibility for those who are not able to travel for access to international samples. This article proposes that archaeology should use the COVID-19 pandemic as a catalyst for change through encouraging deeper collaborations and the development of remote models of science as a complement to in-person research.
The identification of material culture variability remains an important goal in archaeology, as such variability is commonly coupled with interpretations of cultural transmission and adaptation. While most archaeological cultures are defined on the basis of typology and research tradition, cultural evolutionary reasoning combined with computer-aided methods such as geometric morphometrics (GMM) can shed new light on the validity of many such entrenched groupings, especially in regard to European Upper Palaeolithic projectile points and their classification. Little methodological consistency, however, makes it difficult to compare the conclusions of such studies. Here, we present an effort towards a benchmarked, case-transferrable toolkit that comparatively explores relevant techniques centred on outline-based GMM. First, we re-analyse two previously conducted landmark-based analyses of stone artefacts using our whole-outline approach, demonstrating that outlines can offer an efficient and reliable alternative. We then show how a careful application of clustering algorithms to GMM outline data is able to successfully discriminate between distinctive tool shapes and suggest that such data can also be used to infer cultural evolutionary histories matching already observed typo-chronological patterns. Building on this baseline work, we apply the same methods to a dataset of large tanged points from the European Final Palaeolithic (ca. 15,000–11,000 cal BP). Exploratively comparing the structure of design space within and between the datasets analysed here, our results indicate that Final Palaeolithic tanged point shapes do not fall into meaningful regional or cultural evolutionary groupings but exhibit an internal outline variance comparable to spatiotemporally much closer confined artefact groups of post-Palaeolithic age. 
We discuss these contrasting results in relation to the architecture of lithic tool design spaces and technological differences in blank production and tool manufacture.
Quantifying phenotypes is a common practice for addressing questions regarding morphological variation. The time dedicated to data acquisition can vary greatly depending on methods and on the required quantity of information. Optimizing digitization effort can be done either by pooling datasets among users, by automatizing data collection, or by reducing the number of measurements. Pooling datasets among users is not without risk since potential errors arising from multiple operators in data acquisition prevent combining morphometric datasets. We present an analytical workflow to estimate within and among operator biases and to assess whether morphometric datasets can be pooled. We show that pooling and sharing data requires careful examination of the errors occurring during data acquisition, that the choice of morphometric approach influences the amount of error, and that in some cases pooling data should be avoided. The demonstration is based on a worked example (Sus scrofa teeth) using a combination of 18 morphometric approaches and datasets for which we identified and quantified several potential sources of errors in the workflow. We show that it is possible to estimate the analytical power of a study using a small subset of data to select the best morphometric protocol and to optimize the number of variables necessary for analysis. In particular, we focus on semi-landmarks, which often produce an inflation of variables in contrast to the number of available observations used in statistical testing. We show how the workflow can be used for optimizing digitization efforts and provide recommendations for best practices in error management.
Chapter
Student-led archaeological research is exceedingly valuable to both the student and their affiliated universities or museums. For the students themselves, it provides an education beyond that received in the classroom by allowing them to gain experience with methodologies, protocols and the publishing process. In addition, student-led research projects frequently make significant contributions to science and can further our knowledge about specific topics within the field, increasing the impact of the research group and facilitating a deeper understanding of artefacts, samples or sites. That being said, the scope of research accessible to students is somewhat limited, mainly by time constraints and a lack of resources and/or funding available. An exciting way that students can now overcome such issues is through accessing and analysing open-access data: data that have been made publicly available online at no or very little cost. In particular, aggregating open-access data to form large datasets can be an extremely effective, yet relatively easy and affordable, way for students to answer new, exciting and often interdisciplinary questions with real data rather than 'simulated' examples. In this section, we will explore how student archaeologists can benefit from open science, how the aggregation and re-evaluation of existing data can be a fruitful avenue for research suitable for students and the limitations of archaeological data science within student-led research.
Article
Humans were regularly heat‐treating stone tool raw materials as early as 130 thousand years ago. The late Middle Stone Age (MSA) and Late Stone Age (LSA) of South Africa’s Western Cape region provide some of the earliest and most pervasive archaeological evidence for this behaviour. While archaeologists are beginning to understand the flaking implications of raw material heat treatment, its potential functional benefits remain unexplored. Using silcrete from the Western Cape region, we investigate the impact of heat treatment on stone tool cutting performance. We quantify the sharpness of silcrete in its natural, unheated form, before comparing it with silcrete heated in three different conditions. Results show that heat‐treated silcrete can be significantly sharper than unheated alternatives, with cutting forces halving and energy requirements reducing by approximately two thirds. Our data suggest that silcrete may have been heat treated during the South African MSA and LSA to increase the sharpness and performance of stone cutting edges. This early example of material engineering has implications for understanding Stone Age populations’ technological capabilities, inventiveness, and raw material choices. We predict that heat treatment behaviours in other prehistoric and ethnographic contexts may also be linked to edge sharpness increases and functional performance concerns.
Article
Cape Town statement on research partnerships between the global north and south will highlight unethical practices and offer advice to scientists.
Article
Two decades after Jon Kalb's memoir Adventures in the Bone Trade, this book offers yet another glimpse into the many faces of paleoanthropological research in Ethiopia's Afar region and beyond. One might wonder from the beginning why the main protagonist would participate in a genre he previously characterized as a concoction of "science, adventure, intrigue, hearsay and travelogue with the discovery and analysis of human ancestors" (White, 2001, p. 517). Is it perhaps because the science here is meticulously presented, and adequate rival views entertained, by an independent journalist-turned-author? Investigative, critical, and sweeping, Fossil Men is a remarkably engaging disclosure of the inner workings of paleoanthropology and its major stakeholders.
Article
The emergence of ceramic specialization in past societies is often linked to shifts in the complexity of social structures, because standardized ceramic production can reflect craft specialization and the presence of elite control. Previous work on identifying specialization relies on typological or linear metric analysis. Here we demonstrate how to investigate ceramic standardization by analyzing outlines of ceramic vessels. Outline analysis is useful because, unlike more commonly-used landmark analysis methods, it can effectively quantify shape differences for objects that lack distinctive measurement points needed for landmark analysis. We demonstrate this method using pottery from Kiwulan, a large multi-component Iron Age site (CE 1350–1850) in northeastern Taiwan. To measure ceramic specialization, we quantified pottery standardization by analyzing shape variables with reproducible geometric morphometric methods. We computed coefficients of variation (CVs) for shape coefficients obtained by elliptical Fourier analysis to test for shape standardization. We found significant differences in pottery shape and shape standardization that indicate changes in pottery production resulting from contact with mainland Han Chinese groups in northeastern Taiwan. Our case study, which includes an openly available research compendium of R code, represents an innovative application of outline-based methods in geometric morphometry to answer the anthropological questions of craft specialization.
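The coefficient-of-variation step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes that each vessel is already described by a row of shape coefficients (for example, from an elliptical Fourier analysis) and that the CV is taken as the sample standard deviation divided by the absolute mean, expressed as a percentage. The function names and the numbers are hypothetical and are not taken from the study or its research compendium, which is written in R.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the CV in percent: sample SD / |mean| * 100."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / abs(m) * 100.0


def cv_per_coefficient(rows):
    """Compute one CV per shape coefficient (column) across vessels (rows)."""
    columns = zip(*rows)  # transpose: one tuple per coefficient
    return [coefficient_of_variation(col) for col in columns]


# Hypothetical coefficients for three vessels, two coefficients each.
group = [
    [1.0, 0.50],
    [1.2, 0.55],
    [0.8, 0.45],
]

for index, cv in enumerate(cv_per_coefficient(group), start=1):
    print(f"coefficient {index}: CV = {cv:.1f}%")
```

Under these assumptions, a lower CV for a group of vessels would be read as more standardized shape. Coefficients whose mean is close to zero give unstable CVs, so a real analysis would need to handle them separately.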
Article
The pandemic will allow us to fundamentally remodel the way field-based sciences are taught, conducted and funded — but only if we stop waiting for a ‘return to normal’.