All content in this area was uploaded by Aiora Zabala on Jun 09, 2019
Reference Module in Earth Systems and Environmental Sciences.
Comprehensive Geographic Information Systems. 2018, Pages 371-388
Comparing global spatial data on deforestation
for institutional analysis in Africa
(Message the author for the final version of the published PDF,
from here https://doi.org/10.1016/B978-0-12-409548-9.09681-0)
ABSTRACT. Accurate spatial data on deforestation is critical for social science research, for the
assessment of ecosystem services and for environmental policy such as REDD+ programs. In the last
few years a number of sources of big spatial data about tree and forest cover and cover change
estimates for the last decade have been made publicly available. These data provide an excellent basis
for continental and global scale analyses of drivers and solutions for deforestation, which were not
possible in the past due to the incomparability of data elaborated by different agencies and for
different regions. In this chapter, I compare sub-country tree and forest cover and deforestation rates
computed from these datasets at a continental extent for Africa. I illustrate the application of these data
by exploring the relation of deforestation with precolonial institutions, protected areas
and road density, timely examples that highlight the relevance of encountering discrepant data sources.
I compare statistically the tabular results of deforestation obtained from each source. The estimates
from the three sources are consistent at the cover level, but remarkably divergent at the cover change
level. I discuss the implications of these differences and suggest potential causes for divergence.
2 Definitions
2.1 Tree cover and forests: the reasons for a myriad definitions of forest
2.2 Forest cover dynamics
3 Global spatial data sources on deforestation
4 Methodology for comparison
4.1 Processing of the spatial tree and forest data
4.2 Institutions and additional variables
4.3 Zonal statistics
4.4 Analysis of the zonal statistics from each dataset
5 Results and discussion
Further Reading (optional)
Relevant Websites (optional)
3. big data
6. environmental policy
7. forest conservation
8. forest cover
9. global deforestation
12. precolonial societies
13. R statistical language
14. tree cover
15. tropical forests
FAO: Food and Agriculture Organization of the United Nations
GLCF: Global Land Cover Facility
ISCGM: International Steering Committee for Global Mapping
JAXA: Japan Aerospace Exploration Agency
Forest loss and degradation are some of the major environmental problems globally. They can cause
biodiversity loss, particularly in tropical latitudes, as well as soil degradation and increased CO2 levels
in the atmosphere. There are many drivers of deforestation and degradation and their relative
importance tends to be context dependent (Geist & Lambin 2002). While some deforestation occurs
due to urban expansion or to the construction of communication lines, the major drivers of
deforestation globally relate to agriculture in its broadest definition (Hosonuma et al. 2012; Geist &
Lambin 2002; DeFries et al. 2010). Drivers related to agriculture include large-scale industrial agro-
expansion for annual food or biofuel crops, tree plantations and small-scale diffuse deforestation
derived from the complexities of rural livelihoods. These drivers have been extensively modeled at
regional and national scales, but spatially-explicit understanding of forest changes at continental and
global scales has been limited by the lack of data beyond tabulated and country-wise statistics (e.g.
statistics from the UN Food and Agriculture Organization).
The need for accurate deforestation mapping for social and institutional research is evident in the
framework of global tree and forest conservation policies, such as Reducing emissions from
deforestation and forest degradation (REDD) and carbon capture projects under the UN Framework
Convention on Climate Change. The ultimate goal of these policies is the permanence of trees and
forests (Herzog et al. 2003), yet mapped evidence for this outcome is a gold standard which is rarely
provided (Macura et al. 2015). In addition, land cover maps have an important influence on large-scale models
of the economy or of climate change (Mora et al. 2014), which are a key input for
environmental policy more broadly.
Estimating deforestation accurately within and beyond national boundaries is also important for
planning forest conservation and evaluating policy impact. Estimates at scales other than countries
enable the identification of socio-economic and political drivers of forest change that may be unrelated
to national boundaries, such as sub-national institutions, and thus inform forest governance at all
scales. Ideally, these rates should be comparable across countries, across administrative units within
countries, and across spatial units that may embody a potential driver of deforestation. For example,
the cultural and institutional context of different provinces within the same country may mediate
distinct levels of effectiveness of the same national policy. Or potential drivers of conservation that
span beyond national boundaries (such as protected areas) may be assessed only if the dependent
variable—such as forest change—is comparable across countries and continents. Conversely, the
impact of drivers of forest change that are governed at global level may be appropriately evaluated
only if the data is comparable across countries or basins.
While the importance of forest monitoring has been long recognized, the data available to quantify
deforestation across scales is highly variable (Goetz et al. 2015). Data is often commensurate with the
level of economic resources that a country invests in environmental monitoring. This entails that many
countries rich in natural resources do not have detailed data and thus monitoring via satellites of global
coverage is the best option for such contexts (Herold & Johns 2007).
Since the early 2010s, the open publication of large-coverage and high-resolution data on tree and
forest cover and cover change has revolutionized the potential to monitor the world’s forests (this is
epitomized in the Global Forest Watch, an on-line visualizing tool that combines data on forests with a
number of other spatial datasets). These data are the product of gargantuan efforts (the best-known
data being those of Hansen et al. 2013, available from http://earthengine.google.org; see details in
section 3 and in ‘Relevant Websites’). This spatial information enables modeling deforestation over
large extensions and at boundaries different to those of countries. It has a wide range of social
applications, such as disaster management, or addressing issues in the fields of health, energy, climate,
water or desertification (Mora et al. 2014).
Thus far, these datasets have been used to understand forest change as a consequence of a variety of
factors, such as protected areas (Spracklen et al. 2015), road networks (Hu et al. 2016) and precolonial
institutions (Larcom et al. 2016). These tree and forest data allow researchers to evaluate the impact of
socio-economic and institutional factors on forest dynamics, at scales and with detail that were not
possible before. Nevertheless, this opening-up of possibilities does not come without caveats regarding
accuracy and validity, and it is important to make an informed choice when using these data for
research and policy development. In order to help inform this choice, this study compares these big
This chapter presents a practical application of these recent ground-breaking spatial and high-
resolution data, to the study of institutions. Institutions are referred to as the social structures
established to regulate social interactions or interactions between society and the environment
(Hodgson 2006; North 1991). Institutions can be formal (such as laws, government structures or
property rights) or informal (such as traditions or non-written social rules; North 1991). The chapter is
based on the geographic analysis elaborated in Larcom et al. (2016). In the paper, the authors explore
the influence of sub-national institutions inherited from precolonial times in Africa on recent rates of
deforestation. In this study, institutions are operationalized as leadership succession rules in each
precolonial society, such as democratic rules or power transfer by inheritance.
The extensive research on human-related aspects of deforestation using Geographic Information
Systems (GIS) has had little application to understand the specific influence of institutions. With the
exception of Larcom et al. (2016) the few GIS applications to understand the role of institutions on
forest dynamics focus on protected areas and/or on single countries (such as in Gaveau et al. 2009;
Andam et al. 2008; Andersson & Gibson 2007). While only case studies or few case-based
comparative studies have been published thus far, institutional factors of interest can vary within and
In particular, this chapter compares a number of sources of global data on tree and forest cover and
cover change that can be used to investigate the role of institutions and for social research more
generally, discusses the implications of diverging estimates of deforestation and provides
recommendations. In doing so, it also exemplifies an analytical procedure to process such big data.
This study is driven by two goals: a) to explain the practical application of big spatial datasets to the
estimation of rates of tree and forest cover and cover change, and b) to compare different sources of
global data available. The underlying hypothesis of the comparison is that, given that the data measure
natural processes that are closely related (tree cover and/or forest cover, over time ranges that are
relatively close to each other), the deforestation rates they provide for aggregated spatial units (such as the
boundaries of precolonial societies) should be highly correlated. In other words, they should suggest
similar forest trends.
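The hypothesis can be made concrete with a small sketch. The Python snippet below is illustrative only: the rates are invented placeholder values for five hypothetical spatial units, not results from this chapter, and `pearson` is a helper name introduced here. It shows what "highly correlated" would mean for the rates that two sources assign to the same units.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical deforestation rates (fraction of area lost) for five units,
# as two imaginary sources might report them. Placeholder values only.
rates_source_a = [0.010, 0.025, 0.040, 0.005, 0.060]
rates_source_b = [0.012, 0.020, 0.045, 0.004, 0.055]

r = pearson(rates_source_a, rates_source_b)
print(f"Pearson r = {r:.3f}")
```

A coefficient close to 1 would indicate that both sources rank and scale the units similarly; values well below that would signal the kind of divergence this comparison sets out to detect.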
To date, no comparison of global datasets of tree/forest cover and/or cover change has been published.
A few studies compare global datasets on land cover more broadly in a selection of sites (Giri et al.
2005; Cabral et al. 2010; Bai et al. 2014). They all find noticeable discrepancies between the sources,
which are more severe in certain land uses. In particular, Cabral et al. (2010) report large discrepancies
in deforestation estimates when using satellite products at different resolutions. The strengths and, to a
lesser extent, the weaknesses of each dataset are discussed within the context of each source (e.g. Hansen
et al. 2013), but not comparatively. No empirical study has used more than one source either, with the
exception of Larcom et al. (2016) who used a second dataset for a robustness test.
The chapter continues by highlighting key differences between tree and forest cover definitions of
relevance for spatial mapping. It then explains the sources of data and the methodology used to process the
data, to produce additional variables, and to compare the sources. Finally, the results of the comparison of
sources and of the institutional exploration are presented and discussed, followed by conclusions.
The data available to understand forest cover dynamics are provided in a variety of forms, such as tree
cover percentage, area predominantly covered in trees or areas classified as forests in binary maps.
Different operationalizations of forest have distinct ecological and policy implications (Chazdon et al.
2016) and are subject to controversy (Sexton, Noojipady, Song, et al. 2015; Gilbert 2009). Thus it is
important to understand the nuances of alternative definitions.
The different datasets broadly classify into measurements of tree/forest cover and measurements of
change. The second level of classification is whether the data refer to tree cover or to forests. Tree
cover percentage is a natural phenomenon closely related to forests and acts as a proxy for them, but it
is a distinct measure. This entails that variations in tree cover percentages and in the derived forest cover
estimations do not necessarily reflect deforestation or reforestation. On the one hand, regrowth of tree
cover may be due to tree plantations; on the other, elimination of trees may not necessarily involve
permanent deforestation, for example, where elimination is due to natural events and where, in the
absence of human intervention, it is followed by forest regeneration.
2.1 Tree cover and forests: the reasons for a myriad definitions of forest
There are two broad types of measurement of forest cover: those that are provided in land use cover
classifications together with other classes, such as agricultural, urban, etc. and those that are specific
about forests and trees. This chapter focuses on the latter and leaves aside land-use classification data.1
The data specific on forests and trees typically come in two forms: tree cover percentage and forest
cover. Data on tree cover percentage are assumed to be relatively objective; they are produced through
algorithms that are purely geophysical, and they do not require decisions about what is defined as
forest. The definition of forest, however, is less straightforward. Apart from clear cases of dense
primary canopy (forest) and, e.g., grassland (non-forest), the thresholds of tree cover percentage and
other variables that can be used to classify areas into forest and non-forest are not clear-cut; there are
myriad definitions (Fuller 2006). Lund (2014) identifies hundreds of different definitions of forest
found in government reports and scientific publications across the world. The variables typically
included in such definitions are the percentage of tree canopy, the minimum extension of an area
covered by tree cover percentage beyond a threshold, or whether the tree cover is a plantation or a
natural forest (either primary or secondary).
1 Global land cover data is available, for example, from the European Space
Agency, and its state of the art is discussed in Mora et al. (2014). It is argued that deforestation measures
based on tree cover may be more sensitive to change than land use classifications, which
can obscure significant changes in tree cover within the same land use (Hansen et al. 2013).
Defining forests is inconclusive for two reasons. One is the threshold of tree cover percentage that
may be considered a forest. This threshold varies depending on the source but, most importantly, it can
vary depending on the type of ecosystem. For example, forest in the tropics may be characterized by
very high tree densities, whereas in temperate zones, areas with much lower densities may be
considered open forests. Given the sheer diversity of tree-based ecosystems, it is by no means the
intention of this chapter to suggest that a consensus on such a threshold should be achieved. The point
is made for the reader to understand that, at a global scale, there is a trade-off between comparability
across regions and precision in the definition of forest. Comparability between large regions may be
desired in order to understand the role of drivers that vary at large scales. Precision about what is
forest and what is not may be desired when assessing the ecological value of a given land-use cover.
The second reason why defining forests using spatial data is not straightforward relates to the classic
statement “tree plantations are not forests” (in Rodríguez-Labajos & Martínez-Alier 2013). While
some satellite images may show that trees have regrown in an area where natural forest has been
recently cut down, this does not necessarily imply that forest has regrown. Primary forests may be cut
down to make space for monoculture tree plantations for agricultural produce, and these may not keep
critical ecological functions and services that characterize a natural forest. For example, large gains of
tree cover can be observed in certain regions in Malaysia (Hansen et al. 2013), but this may be due to
plantations where primary forest loss has occurred nevertheless, with subsequent loss of biodiversity.
In other cases, a void of primary forest may be soon replaced by opportunistic species and later by
secondary forest, in an ecological transition leading to the natural potential vegetation. In the latter
case, deforestation would not be so much of an environmental policy concern. Most large-scale maps
do not reveal such forest dynamics, which are of high importance for governance, and the distinction between
types of tree cover using satellite images is still a challenging knowledge frontier.
Consequently, the state of the art in global spatial datasets to study forest dynamics is primarily as
follows: maps of tree cover percentage (e.g. Hansen et al. 2013), binary or ordinal classifications of
forests defined as areas with a minimum tree cover percentage and, sometimes, areas with minimum
tree cover percentage and of a minimum size (e.g. JAXA), and forest areas identified in contrast to
other land uses (e.g. Land Cover Maps from the European Space Agency).2 In the datasets that provide
binary or ordinal maps of forest cover, a definition of forest is already implemented. Short of
processing the original satellite imagery, only data on tree cover percentage allow researchers to
implement alternative definitions of forest.
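As a minimal sketch of this point, the following Python snippet classifies the same tree cover percentages into forest and non-forest under two alternative thresholds. The tile values are hypothetical, the function names are introduced here for illustration, the thresholds of 30% and 50% are used purely as examples, and treating a pixel exactly at the threshold as forest is an assumption.

```python
def classify_forest(tree_cover_pct, threshold):
    """Return 1 (forest) where tree cover >= threshold, 0 otherwise; None stays None."""
    return [
        [None if v is None else int(v >= threshold) for v in row]
        for row in tree_cover_pct
    ]

def forest_fraction(binary):
    """Share of valid (non-missing) pixels classified as forest."""
    valid = [v for row in binary for v in row if v is not None]
    return sum(valid) / len(valid)

# Hypothetical 3x4 tile of tree cover percentages (None = no data).
tile = [
    [80, 55, 45, 10],
    [60, 35, 28, None],
    [95, 50, 31, 5],
]

for threshold in (30, 50):
    frac = forest_fraction(classify_forest(tile, threshold))
    print(f"threshold {threshold}%: forest fraction = {frac:.2f}")
```

The same underlying percentages yield different forest extents depending on the definition applied, a choice that is no longer available once a dataset is distributed as a binary forest map.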
2.2 Forest cover dynamics
Likewise, only data on tree cover percentage provided at two time points allow researchers to
implement alternative definitions of forest change. Similar to the lack of consensus on what
percentage of tree cover makes a forest, there is no unequivocal definition of what threshold signifies a
categorical change of status of a forest.
To understand forest dynamics, one may look at the four main processes that affect them.
Deforestation is the conversion of forest to other uses such as farming land, formally defined as a
reduction of 50% of the tree-cover (Hansen et al. 2013). Degradation refers to the partial elimination
of forest biomass, or a reduction of 30% of the tree-cover percentage (Couturier et al. 2012). In
contrast, reforestation and afforestation are the establishment of trees, the latter in an area not covered
in forest for a long time.
2 More fine-grained classifications of types of forests have been done globally
(e.g. Ecoregions of the World, published by the World Wildlife Fund), however these do not track the
evolution of land covers across time.
If the data available are not suitable for defining change as a variation in the percentage of tree
cover (e.g. reductions of 50%, as operationalised in Hansen et al. 2013), an alternative definition of
deforestation/reforestation is where an area crosses a threshold of tree cover percentage or changes
its classification from forest to non-forest (as operationalised in Saatchi et al. 2011 and Archibald et al.
2011). This definition is more widely applicable, for example, with data provided in the form of binary
classifications of forest. It has the caveat that, for ecosystems where tree densities tend not to be much
higher than the threshold, small yearly variations imply that a lot of the land can switch to the other
side of the threshold without much change in the percentage of tree cover. For example, if using a
threshold of 30% of tree cover, a pixel that declines from 32% to 28% would be detected as
deforestation, while in reality it lost only 4 percentage points of tree cover. This could result in large rates of
deforestation without much actual change on the ground and, conversely, in large changes going
undetected (e.g. from 90% to 40%).
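The contrast between the two definitions can be worked through with the two pixels of the example. The Python sketch below is illustrative: the function names are introduced here, and reading "a reduction of 50%" as a loss relative to the initial tree cover is an assumption.

```python
def relative_reduction(before, after):
    """Relative loss of tree cover, as a fraction of the initial percentage."""
    return (before - after) / before

def deforested_by_reduction(before, after, min_reduction=0.5):
    """Change-based definition: flagged when the relative loss reaches min_reduction."""
    return before > 0 and relative_reduction(before, after) >= min_reduction

def deforested_by_threshold(before, after, threshold=30):
    """Threshold-based definition: flagged when a pixel moves from forest to non-forest."""
    return before >= threshold and after < threshold

# The two pixels from the example in the text: (tree cover % before, after).
pixels = {"32% -> 28%": (32, 28), "90% -> 40%": (90, 40)}

for label, (before, after) in pixels.items():
    print(
        label,
        "| relative loss:", round(relative_reduction(before, after), 3),
        "| reduction rule:", deforested_by_reduction(before, after),
        "| threshold rule:", deforested_by_threshold(before, after),
    )
```

Under these assumptions the two rules disagree on both pixels: the threshold rule flags only the small change that happens to straddle 30%, whereas the reduction rule flags only the large loss.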
3 Global spatial data sources on deforestation
Efforts to map forests globally have been led for decades by FAO (e.g. Global Forest Resources
Assessment, 2010). The oldest of such available data goes back to years 1992-93, with a resolution of
1km (FAO 2001; DeFries et al. 2000). In terms of cover change, the oldest available data shows
differences between 1990-2000 (GSFC & GLCF 2014) with a resolution of 30m and, at the time of
writing this chapter, forest cover change for 1975-1990 at a resolution of 60m is being processed by
the same source.3 Other large-scale separate efforts to map forest cover change have focused on key
forests of the world, such as the Congo basin (Ernst et al. 2012) and the Brazilian Amazon (Skole &
Tucker 1993), or on specific types of forests (Bodart et al. 2013, on African dry ecosystems). When
trying to aggregate or compare these data between countries and regions, spatial data from different
sources may raise incomparability concerns due to differences in the acquisition of remote sensing
data, in the algorithms and definitions of forest cover applied and in the validation and calibration
In the last few years four major sources of spatial raster data have been made publicly available to
study global deforestation: from the University of Maryland (2000-2014; Hansen et al. 2013, hereafter
HG data),4 the Global Land Cover Facility Landsat Tree Cover and Forest Cover Change (1990-2010;
GSFC & GLCF 2014, hereafter GLCF data), the International Steering Committee for Global Mapping
(2003-2008; ISCGM et al. 2013, hereafter ISCGM data), and the Japan Aerospace Exploration
Agency (2007-2010; Shimada et al. 2014, hereafter JAXA data). They provide the basis for
continental and global-scale analysis of deforestation, which was not possible in the past due to the
heterogeneity of spatial data on forest and tree cover elaborated by distinct agencies. Thus far, these
and other sources have made available one-off world maps of tree cover percentage (e.g. HG 2000,
2003 and 2008) and of forest cover (FAO 2010). Further projects have produced global data on tree or
forests at high resolutions, but these have not been made available (e.g. Crowther et al. 2015).
3 According to http://landcover.org/research/portal/gfcc/products.shtml
(accessed in September 2016)
4 The HG data downloadable at Earthengine on forest dynamics is binary for
loss and gain only and considers a threshold of 50%. However, this data is more comprehensive and
can be explored at www.forestwatch.org, where the user can set the threshold of tree cover to define
forest and forest loss.
Table 1 Summary of open data on global tree and forest cover and on deforestation. See
‘Relevant websites’ for the links to these sources.
Title | Source | Year of map (not
The World's Forests 2000 | FAO | 1992-1996 | Forest cover | AVHRR | 1km
Occurrence of Forest | FAO | Possibly 1995 | Tree cover | Unknown | 5 arc-
The World's Forests 2010 / Forests of the World | FAO | 2010 | Tree cover
Tree Cover Continuous | Global Land Cover | 1992-1993 | Tree cover
Landsat Tree Cover | Global Land Cover | 2000 & 2005 | Tree cover
Landsat Forest Cover | Global Land Cover
Landsat Forest Cover | Global Land Cover
Tree canopy cover for year | | 2000 | Tree cover
Global forest cover loss | | 2000-2012 (updated to 2013 in v.1.1, and 2014
Global forest cover gain | | 2000-2012 (updated to 2013 in v.1.1, and 2014
Year of gross forest cover loss event (lossyear) | | 2000-2012 (updated to 2013 in v.1.1, and 2014
Intact Forest Landscapes | | 2013 | Forest/ non-
Intact Forest Landscapes | | 2000 | Forest/ non-
Vegetation (Percent Tree | Committee for Global | 2003 | Tree cover | MODIS | 30 arc-
Vegetation (Percent Tree | Committee for Global | 2008 | Tree cover | MODIS | 15 arc-
Global 25m Resolution Mosaic and Forest/Non- | | 2007, 2008, 2009,
The publicly available data to estimate global forest cover are summarized in Table 1. The
spatial data on trees or forests that are freely available to researchers cover mostly the 2000s,
and the few datasets available for the 1990s have a much coarser resolution.
The main sources of these data are FAO, University of Maryland (involved in the HG data and in the
Global Land Cover Facility) and two Japanese agencies (involved in the data from ISCGM and
JAXA). The latest products have reached a resolution of circa 25-30m per pixel.
Tree cover data from ISCGM were derived from MODIS images for the years 2003 and 2008
(ISCGM, 2013). The tree cover rasters have a resolution of 30 and 15 arc-seconds respectively and
each pixel represents the percentage of canopy cover in a range from 0 to 100%. Deforestation data
from Hansen et al. (2013) were derived from Landsat images for the period between 2000 and
2012. The rasters have a resolution of 30m. Each pixel in the forest loss and forest gain rasters
represents either loss or no loss, and gain or no gain, respectively. These values are based on changes in tree
cover higher than 50% for the period. This source also provides the baseline tree cover map for 2000,
where each pixel represents tree cover percentage from 0 to 100%. Forest cover data from JAXA are
based on ALOS-PALSAR mosaics, with a resolution of 25m (approximately 0.8 arcsec).
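Because these resolutions are quoted partly in arc-seconds and partly in metres, a rough conversion helps to compare them. The Python sketch below is an approximation introduced here, not part of the original processing: it assumes a spherical Earth with the equatorial circumference given in the constant, and it converts east-west distances only.

```python
import math

EARTH_EQUATORIAL_CIRCUMFERENCE_M = 40_075_017  # approximate value, in metres

def arcsec_to_m(arcsec, latitude_deg=0.0):
    """Approximate east-west ground length of an angle given in arc-seconds."""
    m_per_arcsec = EARTH_EQUATORIAL_CIRCUMFERENCE_M / (360 * 3600)
    return arcsec * m_per_arcsec * math.cos(math.radians(latitude_deg))

def m_to_arcsec(metres, latitude_deg=0.0):
    """Inverse conversion: ground length in metres to arc-seconds."""
    return metres / arcsec_to_m(1.0, latitude_deg)

for arcsec in (30, 15):
    print(f"{arcsec} arc-seconds ~ {arcsec_to_m(arcsec):.0f} m at the equator")
for metres in (30, 25):
    print(f"{metres} m ~ {m_to_arcsec(metres):.2f} arc-seconds at the equator")
```

By this approximation the 30 and 15 arc-second rasters have pixels of roughly 900 m and 450 m at the equator, more than an order of magnitude coarser than the 25-30 m products, whose pixels span a little under one arc-second.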
The reason why these global spatial products are scarce and refer to only a few years or periods is that
producing them requires vast effort from both computational and human perspectives. For
that reason, a researcher who wants to study cover and cover change may not have a choice of
reference years, nor the data to study deforestation spatially before the 1990s. Further choice may
be possible only by pre-processing the original satellite data, which for Landsat go back to 1972;
at continental or global scales, this requires capacity that standard social science research teams do not
typically have. The production of the HG data, for example, was possible thanks to the collaboration
of Hansen's team with Google Inc., which provided the computing capability necessary to process the
data in a reasonable amount of time (Hansen et al. 2013).
An assessment of the time coverage of these datasets (Figure 1) shows that none of the change data
cover exactly the same period, and none of the baseline map sources cover similar pairs of
years. This presents the first caveat for the comparison: because the periods differ, variability
between years can increase divergences between sources.
Figure 1 Year range covered by each of the datasets. Thick continuous lines correspond to the
estimates included in this comparison.
The lower part of Figure 1 summarizes comparisons that have been already provided or that are
possible to perform. Here it is assumed that products from the same source and similar resolution are
sufficiently consistent to enable estimations of cover change from two cover maps; they would use the
same algorithms and definitions for cover and for cover change. The following sources have produced
more than one time stamp of their cover maps at similar resolutions: GLCF-Landsat, ISCGM and
JAXA (the FAO data is excluded due to its coarse resolution). Global rasters of tree cover percentage
are also available from other sources, however their comparability is uncertain due to being from
Three data sources are selected for the comparison in this study: ISCGM, HG and JAXA. These
provide spatial data suitable to assess deforestation, which requires either two time points (e.g.
ISCGM and JAXA) or the actual cover change (HG). For those sources where three or more time
points could be compared, the criterion for selecting the range is to have the widest possible range while
maximizing overlap between sources, in order to avoid excessive influence of yearly variation in
deforestation. The following are the tree-cover/forest-cover change estimations to compare:
Some potential sources have been excluded from this comparison. The JAXA data for 2015 had
missing tiles in the overview of the area of interest at the time of inspection of the download
application (2016), and therefore it was considered incomplete. The GLCF Landsat Forest Cover
Change is also provided for the period 1990-2000, however the decade of the 1990s is not covered by
any of the other sources. The GLCF Landsat Forest Cover Change for 2000-2005 was finally excluded
from the comparison due to computational issues in the extraction of aggregated statistics, which may
be resolved in the future. The GLCF Landsat Tree Cover Continuous Fields is provided for the years
2000, 2005 and 2010, and this is left out of this article due to computational limits, but may well be
included in future comparisons, particularly to understand the influence of definition of forests upon
the divergences between sources. The HG data on loss has been recently updated to 2014. This update is
reportedly more accurate but has been excluded from this analysis in order to optimize overlap
between the ranges covered by the different data sources.
4 Methodology for comparison
The three sources of spatial data are compared and contrasted with variables representing institutions
(precolonial societies and contemporary protected areas) and with road density. One of the forest datasets
represents forest loss (HG), and the other two provide tree or forest cover at two time points (ISCGM and
JAXA). This section explains the processing of these data in order to obtain deforestation rates for
each spatial unit (the precolonial societies), the sources of additional data as well as the procedure to
compare the rates estimated from each source.
4.1 Processing of the spatial tree and forest data
Processing large extents covered by these big data products can take significant computing power;
therefore, command-line software was used instead of software with a graphical user interface (such as
ArcGIS or QGIS). The data was processed using R statistical language (R Core Team 2016), with
packages ‘raster’ and ‘rgdal’ (Bivand et al. 2016; Hijmans 2015), which provide an interface for the
Geospatial Data Abstraction Library (GDAL). QGIS (QGIS Development Team 2016) was used to
verify visually some actions performed in R. The steps for processing the data from each source are
summarized in Table 2.
Table 2 Processing of the global spatial big data on trees and forests. Two ticks indicate that the
process was performed twice, once for each time point in the same source.
Procedure / Data source Hansen-Google ISCGM JAXA
Recode values where necessary (water bodies etc.)
Aggregate to reduce resolution
Merge tiles in single raster
Zonal statistics by area
Subtraction of the zonal statistics to obtain change
For each dataset, each tile within the extent of mainland Africa and Madagascar was downloaded,
pre-processed and resized, and then saved at a lower resolution. Where necessary, values for ‘no data’
were recoded to avoid inflated variability in the computation of cover and cover change rates. For
example, for the JAXA data, values for water bodies (coded as 3) and for no data (0) were recoded as
NA (missing value), and values for non-forest (2) were recoded as 0. As a result, the processed tiles
have either 0 for non-forest, 1 for forest, or NA. Therefore, averaging the values included within a
given terrestrial area indicates the fraction of the area that is forested, from 0 to 1.
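The recoding step can be illustrated with a minimal sketch. The chapter's processing was done in R; the Python below is only an illustration on a toy tile, and the names `RECODE`, `recode_tile` and `forest_fraction` are hypothetical. The forest code (1) is an assumption, since the text states it only implicitly.

```python
# Hypothetical recoding table following the JAXA codes described in the text.
# Assumption: forest is coded as 1 in the original tiles.
RECODE = {0: None, 1: 1, 2: 0, 3: None}  # None stands for NA

def recode_tile(tile):
    """Recode a tile (list of rows of integer codes) to 0, 1 or None."""
    return [[RECODE[v] for v in row] for row in tile]

def forest_fraction(tile):
    """Mean of the non-missing cells, i.e. the fraction of land forested."""
    values = [v for row in tile for v in row if v is not None]
    return sum(values) / len(values) if values else None

tile = [
    [1, 1, 2, 3],
    [1, 2, 2, 3],
    [0, 1, 2, 2],
]
recoded = recode_tile(tile)
print(forest_fraction(recoded))  # 4 forest cells out of 9 land cells
```

Because water and no-data cells become NA, they drop out of the denominator, so the mean refers to land area only.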
Each tile from HG and from JAXA was aggregated by an approximate factor of 15, which reduced
their resolution by using averages to assign new values. Resizing was necessary in order to make the
files from these sources manageable by a single computer for extracting the cover and cover change
rates, due to their high resolution. This change in resolution does not affect significantly the rates of
deforestation estimated by area, as was confirmed in prior tests with a small sample of areas with the
HG data. Downloading and resizing were automated in R. During this process, the temporary files
on the computer need to be managed so that they do not accumulate and fill the hard disk.
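Aggregation by averaging can be sketched as follows. This is a hedged illustration in Python rather than the R code used in the chapter; the function name `aggregate_mean` is hypothetical, and a factor of 2 is used on a toy grid instead of the approximate factor of 15 reported above.

```python
def aggregate_mean(raster, factor):
    """Average the non-missing cells in factor x factor blocks (None = NA)."""
    n_rows, n_cols = len(raster), len(raster[0])
    out = []
    for r0 in range(0, n_rows, factor):
        out_row = []
        for c0 in range(0, n_cols, factor):
            block = [
                raster[r][c]
                for r in range(r0, min(r0 + factor, n_rows))
                for c in range(c0, min(c0 + factor, n_cols))
                if raster[r][c] is not None
            ]
            # A block made only of NA cells stays NA in the coarse raster.
            out_row.append(sum(block) / len(block) if block else None)
        out.append(out_row)
    return out

fine = [
    [1, 1, 0, 0],
    [1, 0, 0, None],
    [1, 1, None, None],
    [1, 1, None, None],
]
coarse = aggregate_mean(fine, 2)
print(coarse)
```

Each coarse cell then holds the forested fraction of the fine cells it replaces, which is why a binary raster becomes a raster of values between 0 and 1 after resizing.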
After processing, all the tiles within the extent of interest were merged into a single raster, in order to
compute the cover and cover change estimates.5
The ISCGM data is suitable for applying alternative definitions of forest and of deforestation because
it is provided as tree cover percentage for two time points.6 In order to calculate deforestation, tree
cover data was classified into forest and non-forest, using two thresholds: one of 50% of tree cover for
a pixel to be considered forest (Hosonuma et al. 2012), another of 30% (Mayaux et al. 2013). In
Africa, tree coverage lower than 30% is considered non-forested rural complexes (Mayaux et al. 2013)
and tree coverage from 30% to 100% comprises the range of forested covers across biomes in the
continent. Recent literature on forest mapping in Africa defines dense forest as those areas with tree
coverage higher than 70% (Bodart et al. 2013; Mayaux et al. 2013; Ernst et al. 2012), typically
rainforests. Areas with tree cover between 30-70% are open woodlands (Bodart et al. 2013), which
include a range of forested land covers such as mosaics of dense tree cover, areas with uniform
coverage of open forest, or patches of forests fragmented with other uses.
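A minimal sketch of the threshold classification follows, written in Python for illustration only. The function name `classify_forest` is hypothetical, and treating a pixel exactly at the threshold as forest is an assumption (the text states only that cover lower than 30% is non-forest).

```python
def classify_forest(tree_cover, threshold):
    """Return 1 (forest) where tree cover % >= threshold, else 0; None stays NA."""
    return [
        [None if v is None else int(v >= threshold) for v in row]
        for row in tree_cover
    ]

# Toy tree cover percentages for six pixels (None = no data).
tree_cover = [
    [80, 55, 30],
    [45, 10, None],
]
forest_30 = classify_forest(tree_cover, 30)
forest_50 = classify_forest(tree_cover, 50)
print(forest_30)
print(forest_50)
```

Pixels with cover between the two thresholds (here 30 and 45) are forest under the 30% definition and non-forest under the 50% definition, which is where the two ISCGM forest variables can differ.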
4.2 Institutions and additional variables
The spatial distribution of precolonial societies and their institutional categories is provided by Nunn
and Wantchekon (2011), who based their vector map on the work of Murdock (1967). Murdock
mapped the geographical distribution of African ethnic groups, and compiled quantitative measures of
various social, economic and institutional characteristics found around the colonization period. The
institutional categories in this case correspond to the rules for chief succession that were customary in
the given society: hereditary, democratic, by social standing or from above (Larcom et al. 2016). As
explained above, institutional forces may be formal, in the shape of laws, policies, programs, and well-
established governance institutions (North 1991). They may also be informal institutions, in the form
of social unwritten norms, cultural traditions, or communal habits (North 1991). The form of chief
succession would lie somewhere in between, and its impact on the governance of natural resources
derives from its relation to the capacity of the community to control corruption and to implement
regulation coming from higher scales, such as state-level law (Larcom et al. 2016).
The data on precolonial societies was intersected with country boundaries (e.g. the Ababda society
area was split into Ababda-Egypt and Ababda-Sudan) because in modeling the impact of institutions
on forest dynamics it is appropriate to control for country effects (see details in Larcom et al. 2016).
The polygons resulting from this intersection are the unit of analysis hereafter.
Two additional variables typically considered important to explain spatial patterns of forests and/or
deforestation are included: road density (based on CIESIN & ITOS 2013) and percentage of area
protected (based on IUCN & UNEP-WCMC 2014), which is a contemporary form of institution.
5 An alternative would be to create a raster mosaic or raster catalog. However,
the resizing of tiles produces a single raster that is manageable for subsequent analysis.
6 Applying alternative definitions of forest would be feasible with the GLCF
Landsat data as well.
The spatial distribution of protected areas was obtained from Protected Planet (IUCN & UNEP-WCMC
2014). The data were inspected for quality (both visually and by obtaining descriptive
statistics of the attribute table) and the following modifications were made. Biosphere Reserves were
excluded because, at the time of download (2014), some areas had a geometry that was recognizably
inconsistent with the actual area protected (e.g. a rectangle roughly covering the corresponding area,
where the actual area has an irregular shape according to official sources). Ramsar sites were replaced
with the spatial data provided by Ramsar (2013). Marine areas were excluded, as well as areas
protected after 2003 (which would not have affected changes shown by data from ISCGM). The final
selection includes all the protection categories from the International Union for Conservation of
Nature and Natural Resources (IUCN). After processing the database, the extents of many protected
areas still overlapped geographically, and thus all remaining areas were dissolved into a single layer.
This resulted in a binary map of the continent with values of ‘protected’ and ‘non-protected’.
Subsequently, the percentage of each precolonial area that was covered by protection figures was
calculated (variable ‘protected’).
Road data were obtained from Columbia University (CIESIN & ITOS 2013) in vector form. For each
precolonial area, the density of roads in km of road per 100 km² was calculated (variable
‘road_density’; as measured in World Bank Indicators). Prior to calculating the length of roads, the
geometry in the original vector file was simplified in QGIS (QGIS Development Team 2016), using
a tolerance of 0.0001, in order to obtain a map manageable by the computer to calculate the statistics.
The road density estimates were validated against the road density data available for over half of African
countries (The World Bank 2013)7 and both were consistent, with a correlation coefficient of 0.7. The
sources and processing of these variables are further explained in Larcom et al. (2016).
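The road density measure is a simple ratio, sketched below in Python as an illustration. The function name and the example figures are hypothetical; in practice the road length and polygon area would come from the vector layers.

```python
def road_density(road_length_km, area_km2):
    """Road density in km of road per 100 km2 of land."""
    return 100.0 * road_length_km / area_km2

# Hypothetical polygon: 350 km of roads within an area of 12,500 km2.
density = road_density(350.0, 12500.0)
print(density)
```

The same ratio form applies to the variable ‘protected’, with protected area in the numerator instead of road length.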
4.3 Zonal statistics
In order to estimate the tree and forest cover fraction and the cover change rate for each area, the zonal
statistics operation is used. Zonal statistics is a standard procedure that summarizes the values of a
raster dataset for a given zone. This zone is defined by a polygon over the map (from, e.g., a
shapefile), such as the shape of a country or of a protected area. The zonal statistic can be computed
by using the mean, the maximum, the count or any other statistic derived from the raster values that
fall within the boundaries of the given polygon. For example, where applied to a raster binary map of
forest cover (the value layer; 0 for no forest, 1 for forest) and a dataset of country boundaries (the zone
layer), the zonal statistic of means returns the mean value of forested area for each country in the
dataset, which is equivalent to the fraction of forest in each country.
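The zonal mean described above can be sketched with two aligned grids, one holding the values and one holding a zone identifier per cell. This is a simplified Python illustration (the chapter used polygons and R); the function name `zonal_mean` and the toy data are hypothetical.

```python
from collections import defaultdict

def zonal_mean(values, zones):
    """Mean of the non-missing raster values per zone id (None = NA)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if value is None or zone is None:
                continue  # skip missing values and cells outside any zone
            totals[zone] += value
            counts[zone] += 1
    return {zone: totals[zone] / counts[zone] for zone in counts}

forest = [
    [1, 1, 0, 0],
    [1, 0, 0, None],
]
zones = [
    ["A", "A", "B", "B"],
    ["A", "A", "B", "B"],
]
result = zonal_mean(forest, zones)
print(result)
```

For a binary forest raster, each zonal mean is the forested fraction of the zone, computed over its non-missing cells.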
Using the polygonal areas of precolonial societies, zonal statistics were extracted for relevant tree and
forest cover and cover change maps. Zonal statistics were calculated using R (function
‘raster::extract’, from ‘raster’ package). For the dataset of forest cover change, the zonal statistics were
the final step of the process. For the datasets of forest cover at two time points, zonal statistics were
calculated for each time point and then the tabulated zonal statistics for each of the two years were
subtracted to obtain the difference (see Table 2). In total, zonal statistics were calculated for the
variables listed in Table 3. In order to enable comparison, each value of deforestation change was
divided by the number of years in the period covered by the given source, resulting in rates per year.
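The conversion to yearly rates can be sketched as follows, in Python and with hypothetical cover values. The sign convention (start minus end, so that positive values mean loss) follows the convention stated elsewhere in the chapter; the function name is an assumption.

```python
def annual_deforestation_rate(cover_start, cover_end, year_start, year_end):
    """Cover lost per year; positive values mean deforestation."""
    return (cover_start - cover_end) / (year_end - year_start)

# Hypothetical zonal means for one polygon (fractions of area forested).
rate_iscgm = annual_deforestation_rate(0.40, 0.35, 2003, 2008)  # 5-year period
rate_jaxa = annual_deforestation_rate(0.42, 0.45, 2007, 2010)   # 3-year period
print(rate_iscgm, rate_jaxa)
```

In this toy case the first source indicates a loss of about one percentage point of area per year and the second a gain of the same size, which is the kind of divergence the comparison is designed to detect.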
7 This indicator of road density for African countries (IS.ROD.DNST.K2) was
freely available on-line when accessed in 2014. It is now published in the World Road Statistics by the
International Road Federation and its access has been restricted.
Table 3 Zonal statistics calculated. In bold, those variables that are calculated directly from the
source raster map. ‘*’ variables calculated directly from tabulated statistics of tree/ forest cover. ‘Δ’
difference in tree/ forest cover. In square brackets, the range of values of the variable.
Cover | Δ | Year | Variable name8
Hansen-Google data, loss [0:1] | | 2000 to 2012 |
Hansen-Google data, tree cover [0:100] | | 2000 | ct.go.00
ISCGM, tree cover [0:100] | | 2003 and 2008 | ct.is.03 & 08
ISCGM, forest cover, threshold at 30% [0:1] | | 2003 and 2008 | cf.is.30p.03 & 08
ISCGM, forest cover, threshold at 50% [0:1] | | 2003 and 2008 | cf.is.50p.03 & 08
ISCGM, difference in tree cover [-100:100] | *9 | 2003 to 2008 |
ISCGM, difference in forest cover, threshold at 30% [-1:1] | | 2003 to 2008 |
ISCGM, difference in forest cover, threshold at 50% [-1:1] | | 2003 to 2008 |
JAXA, forest cover [0:1] | | 2007 and 2010 | cf.jx.07 & 10
JAXA, difference in forest cover [-1:1] | * | 2007 to 2010 |
The zonal statistic calculated for all cases was the mean and it excluded missing values. For the binary
rasters (forest cover and forest loss), this resulted in a variable indicating the fraction of the total area
(from 0 to 1) that was covered by forest or of the area where forest was lost. For the forest cover
change estimated with ISCGM and JAXA, values ranged from -1 to 1, because these also included
reforestation (negative values). For the tree-percentage cover rasters, zonal statistics produced a
variable that indicates the mean tree cover in the given area, from 0 to 100%. An example of the
estimates of tree cover percentage for each precolonial society is shown in Figure 2 for the year 2000.
8 Variable naming convention: ‘ct’ tree cover, ‘cf’ forest cover, ‘dt’ differences
in tree cover, ‘df’ differences in forest cover; ‘go’ data from Google-Hansen, ‘is’ data from ISCGM,
‘jx’ data from JAXA; ‘30p’ and ‘50p’ tree cover percentage threshold for forest, where applicable.
9 This variable indicates changes in the average tree cover between both years.
This measure of tree cover change reflects reductions in canopy regardless of the land use or
ecosystem, and it is also sometimes used as a proxy for forest (e.g. Ickowitz et al. 2014).
Figure 2 Tree cover in 2000 in Africa, as a percentage of the total area of each precolonial
society. Based on data from Hansen et al. (2013) and over a background of the contemporary political
map, from Stamen-OpenStreetMap.
This figure is visually consistent with the visualization of the same data provided by the Global Forest
Watch (see link in section of ‘Relevant Websites’). The figure broadly shows the natural biogeographical
patterns of the continent: the densest forests lie in the tropical region around the Congo
basin and West African coast, the Eastern coast of Madagascar, and parts of Mozambique, Tanzania and
Ethiopia. The transition from areas with little or no trees to dense forest is more abrupt in the North,
bordering the Sahara, and more gradual in the South, where extensive, yet less tree-dense miombo
woodlands are found. Some areas with some tree density stand out within otherwise treeless regions: the North
of Maghreb, the Nile basin and the Eastern and Western Cape provinces of South Africa.
For those sources that did not provide differences directly (ISCGM and JAXA), the difference
(deforestation) between two time points was calculated by subtracting the zonal statistics of each year.
Using the tabulated statistics to calculate the differences between two time points means that it is not
possible to quantify dynamics defined as changes in the percentage of tree cover, because such changes
are appropriately compared only at small-scale units, for example,
individual forests or pixels. An alternative option would be to subtract the rasters
of both years spatially, pixel by pixel. However, this can add substantial spurious variability, particularly in the
case of ISCGM, where the resolution differs between years and re-sampling is needed for such a
direct comparison. Subtracting the aggregated statistics loses precision but reduces potentially
spurious variability. Also, Global Forest Watch recommends against this direct subtraction due to the
methodology used (WRI, 2016). As a robustness test, this approach of tabular subtraction (as opposed
to spatial subtraction) was validated internally by calculating the spatial difference between the
ISCGM rasters of 2003 and 2008, with prior re-sampling of the 2008 data to the resolution of 2003
(with bilinear interpolation), and then obtaining the average change for each precolonial area. The
results of both methods are very similar.
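The relation between tabular and spatial subtraction can be illustrated with a toy zone. The sketch below is in Python and uses hypothetical pixel values and function names; it is not a reproduction of the robustness test. It shows that the two approaches coincide when both years share the same grid without missing data, and that they can diverge when missing values differ between years.

```python
def mean_ignoring_na(values):
    """Mean of the non-missing entries (None = NA)."""
    kept = [v for v in values if v is not None]
    return sum(kept) / len(kept)

def tabular_change(cover_t1, cover_t2):
    """Difference of the two zonal means (positive = loss)."""
    return mean_ignoring_na(cover_t1) - mean_ignoring_na(cover_t2)

def spatial_change(cover_t1, cover_t2):
    """Zonal mean of pixel-by-pixel differences; NA if either pixel is NA."""
    diffs = [
        None if a is None or b is None else a - b
        for a, b in zip(cover_t1, cover_t2)
    ]
    return mean_ignoring_na(diffs)

# Hypothetical pixels of one zone at two time points (1 = forest).
t1 = [1, 1, 1, 0]
t2 = [1, 0, 1, 0]
t2_gap = [1, 0, None, 0]  # same year, with one missing pixel

print(tabular_change(t1, t2), spatial_change(t1, t2))
print(tabular_change(t1, t2_gap), spatial_change(t1, t2_gap))
```

With complete data the mean of the differences equals the difference of the means, so any gap between the two methods in practice comes from missing values, re-sampling, or misaligned grids.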
The results of forest cover changes for HG, ISCGM and JAXA are shown in Figure 3. In this figure,
positive numbers (in red) correspond to forest loss, negative numbers (in blue) to forest gain, in
percentage of the total area of the precolonial society. For the HG data, the definition of forest change
is given, and so is the definition of forest for JAXA. For ISCGM, two definitions of forest are applied
(as explained above). The figures are cropped to tropical regions because these contain the highest
magnitudes of changes.
Figure 3 Deforestation per year in the period indicated, as a percentage of the area of each
precolonial society in tropical Africa. Sources of data are given in parenthesis.
The differences in the estimation of forest change across sources are noticeable; they are examined
and discussed below.
4.4 Analysis of the zonal statistics from each dataset
The tabular database analyzed comprised the following for each precolonial society: zonal statistics
for all the tree/forest cover and cover change listed in Table 3, country, latitude and longitude of the
centroid of the area, road density and percentage of area protected. For the variables indicating forest
cover or tree cover change, positive values correspond to deforestation, negative values indicate
reforestation. The statistical analysis was done using R.
Polygons that were very small were eliminated from the database because small errors or inaccuracies
in the spatial computations could introduce excessive fluctuations of values of the ratios calculated in
the zonal statistics. The threshold for sufficiently large areas was heuristically established as polygons
that would contain at least 100 pixels from the rasters in the aggregated resolution. This eliminated 32
polygons. The final sample contains 1,286 polygons (precolonial societies intersected with countries).
The first block of the analysis is the comparison among sources of data. The relations between
variables were assessed by means of Spearman correlation coefficients, significance tests of the
correlations, and scatter plots among all pairs of variables (shown in the Figures below in section 5).
An exploration of the distribution of the tree and forest variables indicates that none of them fulfills
parametric assumptions.10 Therefore Spearman correlation coefficients were used throughout. Whether
the differences across sources covary with latitude was explored in scatter plots.
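The Spearman coefficient is the Pearson correlation computed on ranks, which is why it does not rely on normality. The following self-contained Python sketch illustrates the computation with hypothetical cover values for five polygons; the chapter's analysis was done in R, and the function names here are assumptions.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical forest cover fractions from two sources for five polygons.
cover_a = [0.10, 0.40, 0.35, 0.90, 0.00]
cover_b = [0.05, 0.50, 0.30, 0.70, 0.00]
print(spearman(cover_a, cover_b))
```

In this toy example the two sources disagree on magnitudes but order the polygons identically, so the rank correlation is perfect even though the values differ.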
The second block analyses the relations between the cover and cover change estimates and other
variables related to forest dynamics. After computing tree/forest cover and cover change and
comparing them, the relation of these estimates with potential drivers of these changes was explored.
This is the actual exercise that may be done by social scientists using these data products. The
exploration here is made using bivariate statistics between the tree/forest cover change estimates and
the variables selected as potential institutional causes of forest dynamics (area protected and
precolonial area). To this exploration, the road density is added as an illustration of typical
demographic and economic drivers of forest dynamics related to increased population pressure.
The variable for precolonial institutions is visualized for each tree/forest cover and cover change
estimate by means of box plots. The variables of area protected (as a fraction of the total area) and
road density are continuous, and therefore their relation with tree/forest cover and cover change is
explored by means of correlation coefficients (Spearman coefficient) and scatter plots. The scatter
plots were also colored by latitude.
5 Results and discussion
The results are presented in four parts. First, the different sources of data are compared. Second, the
estimates of tree and forest cover are compared to the estimates of cover change. Next, the relation
between the estimates and latitude is explored. Finally, the estimates are statistically summarized by
type of institution, to explore whether each source of data provides different results of cover and cover
change for each type of institution.
Comparison of cover and cover change between sources
A matrix of correlations and scatter plots (Peterson & Carl 2014) was built for each group of variables:
the tree and forest cover variables (Figure 4), and the tree and forest cover change variables (Figure 5).
In these figures, the numbers in the upper part indicate Spearman correlation coefficients (p-values <
0.001 (***), < 0.01 (**), < 0.05 (*)). Each dot in the scatter plots represents a precolonial area and
is colored along a gradient of latitude: red for northern latitudes, green nearer the equator and blue for
southern latitudes. Variables are described in Table 3 and their histograms are shown in the diagonals.
10 Normality of the variables was assessed by means of histograms, q-q plots
and Shapiro-Wilk tests for each of the forest/ tree cover and cover change variables. All the Shapiro-Wilk
tests suggest that normality in the distributions can be rejected, with p-values < 0.0001 in all
cases. The exceptions are the cover change variables that also include reforestation (all but the HG),
which have a near-symmetric but still highly leptokurtic distribution. Most variables have a high
number of values near 0. The forest cover variables (but not the tree cover variables) have a
quasi-uniform distribution when eliminating the high number of zero values. The tree cover variables have
distributions likely to be geometric or exponential.
Figure 4 Correlogram of tree and forest cover estimates, and likely explanatory variables.
The matrix of correlations and of scatter plots between all tree and forest cover variables (Figure 4)
reveals that there is a strong correlation in all years (2000, 2003, 2007, 2008 and 2010). Coefficients
are higher than 0.85 in all cases, and frequently above 0.9.
In terms of the comparison with additional explanatory variables, tree and forest cover estimates are
inversely correlated with latitude (absolute correlation coefficients between 0.3-0.6); naturally, most
dense forests concentrate around the equator. The cover estimates are mildly positively correlated with
the percentage of area protected (correlation coefficients around 0.2). The relation with road density is
very low for all estimates (correlation coefficients between 0.05-0.08).
The correlations of estimates of tree and forest cover change are shown in Figure 5. These are rather
inconsistent across sources. The potential reasons for these inconsistencies are discussed below.
Figure 5 Correlogram of tree and forest cover change estimates, and likely explanatory variables.
The two estimates derived from applying two different definitions to the ISCGM data are highly
correlated (coefficient of 0.87). However, when comparing the three sources for estimates of change
among themselves, the correlations are strikingly low. HG correlates with the forest change
estimate from ISCGM under the tree-cover threshold of 50%, but only weakly
(coefficient of 0.16). All other coefficients, of HG with ISCGM-30%, of HG with JAXA, and of
JAXA with the two ISCGM estimates, are in the range of -0.03 to 0.01. These divergences are noticeable in the
maps of Figure 3.
When comparing tree and forest cover change estimates with the additional variables, there are also
remarkable differences between the sources. In terms of protected areas, the HG estimates show a
positive relation between deforestation and percentage of area protected (coefficient of 0.24). Both the
ISCGM and the JAXA estimates have lower absolute correlation coefficients with protected area, and
these coefficients are negative; according to these sources and in the periods covered, the fraction of
area protected is weakly and negatively related to deforestation (coefficients around -0.1).
In the case of road density, the correlations suggest that higher road density relates
to deforestation. This is more consistent across sources, although the magnitude of this relation is
highest for HG estimates (coefficient of 0.21), whereas the other two sources range between 0 and 0.1.
Comparison of tree and forest cover with cover change estimates
The correlations of tree/ forest cover with cover change estimates (Spearman coefficients) are shown
in the matrix in Table 4 below.
Table 4 Correlation matrix between tree and forest cover and cover change estimates.
Cover variables/ Cover change variables df.go.loss0012 df.is.30p.0308 df.is.50p.0308 df.jx.0710
ct.go.00 0.81 0.02 0.17 -0.02
ct.is.03 0.71 0.28 0.45 -0.03
ct.is.08 0.78 -0.08 0.10 -0.05
cf.jx.07 0.72 0.04 0.21 0.05
cf.jx.10 0.73 0.04 0.21 -0.09
These relations are generally low and also divergent across sources. The estimate of change derived
from HG data is the exception, since it is highly correlated with all the tree and forest cover estimates
(coefficients between 0.7-0.8). The estimates of the other two sources have very little relation with
tree/ forest cover estimates. The correlation coefficients of change estimates for the ISCGM data at
30% threshold are almost nil with the HG and the JAXA estimates of tree cover. Both ISCGM
estimates of change are positively correlated with the forest covers estimated from the same source for
2003 (with both definitions), but are very low with the forest cover estimates for 2008. The change
estimates from ISCGM at 50% threshold correlate with the JAXA estimates, but little with those of
HG. The JAXA data on cover change show very little relation with all cover estimates (see Table 4).
Variation of estimates by latitude
An inspection of the scatter plots of tree and forest cover estimates colored by latitude reveals these
patterns in more detail (Figure 4). The variation in estimates is rather low at higher latitudes, with the
caveat that the predominance of sparse vegetation and desert at these latitudes means that
estimates are low for all sources in comparison to estimates in more equatorial areas.
There are no clear patterns between the data on cover estimates from JAXA, HG and ISCGM 2008,
which suggests that the sensitivity of these sources to detect tree cover does not vary with latitude. However,
the ISCGM 2003 map seems to detect less tree cover in latitudes south of the equator. JAXA 2008 in
contrast, detects slightly more in southern latitudes, although this latter difference is not so
For the tree/ forest cover change estimates, ISCGM detects lower changes for southern latitudes in
comparison with HG and JAXA. The HG estimate suggests that deforestation occurs nearer the
equator (coefficient with latitude of -0.52), while the ISCGM estimates suggest the opposite, although
less remarkably (coefficients between 0.20-0.25). An inspection of Figure 3 suggests that ISCGM
found most deforestation in latitudes in the margins of equatorial areas, but around the equator this
source shows forest increase.
The relations of cover and cover change with precolonial institutions
The estimates of cover and cover change by each category of precolonial institution are shown in
Figure 6. For all sources of cover estimates, hereditary modes of power transfer appear to have larger
percentage of forest in their lands. This distinction is clearer with the HG and the JAXA data, although
in all cases there is wide spread.
Figure 6 Box plots of tree and forest cover and cover change estimates by precolonial
institutions. Figures A-E, tree/ forest cover; Figures F-I, forest change. Key: ‘De’ democratic, ‘Fa’
from above, ‘He’ hereditary, ‘Ss’ social standing.
For cover change estimates, societies with a hereditary form of ruler succession have a wide range of
values, and slightly higher rates of deforestation according to HG. According to ISCGM, societies
with hereditary rules also have high rates, as do those with social standing rules, while the JAXA data
do not show clear differences. The category 'from above' has the lowest rates. While the different
sources do not provide conclusive results about which form of power transmission is related to higher
deforestation, the potential mechanisms for this variation are explained in Larcom et al. (2016).
All the sources provide similar estimates of tree cover and forest cover percentages when aggregating
by the units of interest, in this case, areas historically covered by different ethnic groups in Africa. The
correlation coefficients between the sources are around 0.8-0.99. Any differences between them
could be associated with changes in the real phenomenon across years or with divergences in the satellite
data sources and in the processing algorithms.
Likewise, the aggregated statistics of forest change (deforestation and reforestation) for each unit of
analysis would be expected to show similar patterns across sources, or at least to have a high
correlation. Ultimately, deforestation processes are, to some extent, path dependent and so differences
due to the different periods covered would not produce radically different results when studying large
extensions. Nonetheless and in stark contrast, the differences in cover change estimates found across
sources are remarkable. Their correlation coefficients are almost nil and this can have important
implications for the use of these estimates in further research.
There are a number of potential sources of these inconsistencies. It is important to remark that each
source had a slightly different definition of forest and of deforestation. The HG data define forest loss
as a change of 50% in the tree cover percentage. For example, areas with 90% and 40% of tree cover at
the beginning of the period that have 35% and 5% of tree cover at the end of the period will be
estimated as forest loss. For the ISCGM and the JAXA data, however, forest loss is defined as the
change in the percentage of land surface classified as forest. This classification differs for each source.
These definitions of forest loss may be one reason for the important divergences between sources.
Further analysis can help resolve this question by applying the definition of forest change based on
percentage changes, which is applicable to the JAXA and the GLCF-Landsat data, but arguably not to
the ISCGM data due to the different resolutions of its two time points (as explained above).
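The effect of the definition can be illustrated with the two example areas mentioned above. The Python sketch below is hedged: it reads the 50% change as a relative reduction of the initial tree cover, which is an assumption inferred from the example rather than a definition confirmed by the data provider, and it contrasts this with a pixel-level loss of forest class under a threshold. Both function names are hypothetical.

```python
def relative_loss(cover_start, cover_end, min_relative_drop=0.5):
    """True if tree cover fell by at least min_relative_drop of its start value.
    Assumption: the 50% in the text is read as a relative reduction."""
    if cover_start <= 0:
        return False
    return (cover_start - cover_end) / cover_start >= min_relative_drop

def class_loss(cover_start, cover_end, forest_threshold):
    """True if a pixel is forest at the start and non-forest at the end."""
    return cover_start >= forest_threshold and cover_end < forest_threshold

# Tree cover (%) at the start and end of the period, from the example in the text.
pixels = [(90, 35), (40, 5)]
for start, end in pixels:
    print(start, end,
          relative_loss(start, end),
          class_loss(start, end, 50),
          class_loss(start, end, 30))
```

Under these assumptions both areas count as loss with the relative definition, whereas the first counts as loss only with a 50% forest threshold and the second only with a 30% threshold, which shows how definitions alone can produce diverging change estimates.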
Although the periods overlap, each source of data covers a different period.
Year-by-year variability in vegetation cover due to differences in climatic conditions can also explain the
large divergences between sources. While longer periods may be able to smooth yearly variability
(e.g. in the HG data), shorter periods may be highly affected by volatile dynamics of ecosystems. This
might have substantially affected the measurements in ecosystems with strong seasonal fluctuations
that can reflect promptly any climatic anomalies, while the rainforests closer to the equator have a
much more perennial cover and thus might not vary so strongly from year to year due to natural
In sum, these are possible causes of differences in the estimates provided by each source:
- The period covered. In the period compared, certain natural events may have had a significant
impact on tree dynamics, such as years that were exceptionally dry, wet, warm or cold.
- Remote sensing data sources. Both the HG and the ISCGM used MODIS images, whereas
JAXA used ALOS-PALSAR. The implications of each source for forest monitoring have been
discussed elsewhere (Fuller 2006; Joseph et al. 2011) and are beyond the scope of this chapter.11
- Estimation algorithms of tree cover. In the description provided by each source there is not
much detail available about the specifics of the estimation algorithms (with some exceptions
such as Crowther et al. 2015). This is partially due to their technical intricacy. However, more
detailed specifications would allow researchers to more accurately assess the suitability of a
product for specific purposes (Rosa et al. 2014).
- Definition of forest cover. While this may intuitively have a strong influence on the
outcomes, the findings of this study do not suggest that forest definitions (in the form of tree
cover threshold) have much influence on the divergences between estimates. Even when
applying definitions of 30% and 50%, the ISCGM estimates were still much more highly
correlated among themselves than with any of the other two sources.
- Validation and calibration. The HG data is accompanied by a useful description of its
validation (Hansen et al. 2013). Their description suggests that estimates of forest cover may
be more accurate for more densely covered areas within the tropics, and less so in temperate
areas. This is consistent with the findings that show that, while the strongest differences are in
forested areas because the magnitudes are larger, there are still very high differences in less
densely covered areas, such as savannas.
11 Maps based on LANDSAT include the GLCF products and the Intact Forests
Landscapes map, whose analysis could provide further clarification about variations due to the satellite
The study of deforestation as a consequence of human action has prompted extensive monitoring, particularly
at regional and national levels, and is becoming even more relevant since it is the basis of high-stakes
policies within the frame of global climate change mitigation (e.g. REDD) and adaptation, and of
global action to conserve the world’s biodiversity. While the causes of deforestation are qualitatively
well-documented (Geist & Lambin 2002), spatially-founded evidence about which institutional strategies
work to avoid forest loss is much scarcer. For example, spatial studies that assess the impact of
protected areas over forests at large scales have not reached clear conclusions about their effectiveness
(Nagendra 2008; Spracklen et al. 2015).
This chapter compares, for the first time, three major datasets of tree/ forest cover and cover change of
global extent, by extracting forest cover and deforestation rates for areas defined by the boundaries of
precolonial societies in Africa. It finds that, while estimates of cover converge relatively well,
estimates of change are remarkably different. Such divergences between sources were found in
previous comparisons of other spatial land use datasets (Giri et al. 2005).
From this study, some recommendations follow for further social applications of these big data on tree/
forest cover and cover change. First and foremost, when using large-scale spatial data, it is appropriate
to perform robustness tests by modeling the same social variables with estimates of forest dynamics
from different sources (such as in Larcom et al. 2016), where available and where the time coverage of
the data is appropriate. This robustness check may be conducted more solidly if the estimated rates are
contrasted with alternative tabulated rates provided by national or international agencies. Second, it is
important to acknowledge the subtleties involved in the definitions of forest and of forest change, and
conceptually assess whether the measure utilized in the spatial product is appropriate for the
theoretical framework of the study, and otherwise evaluate how it affects the conclusions drawn from
the empirical results.
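The robustness test recommended above can be illustrated with a minimal Python sketch, which fits the same bivariate model once per data source and checks whether the sign of the estimated slope agrees across sources. All names and values (road_density, source_a, source_b, source_c) are hypothetical and are not data or code from this chapter. A real application would use a full regression model with appropriate controls.

```python
from statistics import mean


def ols_slope(x, y):
    """Slope b of the least-squares line y = a + b * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def robustness_check(predictor, rates_by_source):
    """Fit the same model once per source and return the slope for each."""
    return {name: ols_slope(predictor, rates)
            for name, rates in rates_by_source.items()}


def signs_agree(slopes):
    """True when all non-zero slopes share the same sign."""
    signs = {s > 0 for s in slopes.values() if s != 0}
    return len(signs) <= 1


# Hypothetical values for five areas (illustration only, not real data).
road_density = [0.1, 0.4, 0.5, 0.9, 1.2]
rates_by_source = {
    "source_a": [0.2, 0.5, 0.6, 1.0, 1.3],
    "source_b": [0.1, 0.3, 0.5, 0.8, 1.0],
    "source_c": [1.1, 0.9, 0.6, 0.4, 0.2],
}

slopes = robustness_check(road_density, rates_by_source)
consistent = signs_agree(slopes)
print(slopes, consistent)
```

If the slopes disagree in sign, as they do for the hypothetical third source here, conclusions drawn from any single source should be treated with caution.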
Researchers wishing to use these data need to be aware of two key caveats. On the one hand, tree cover
does not necessarily mean forest, irrespective of the tree cover threshold applied, because some tree
cover may consist of plantations, and this has important ecological implications in terms of biodiversity
and other ecosystem functions. The trade-off between scope and precision is most apparent here:
more extensive coverage comes at the expense of a less specific definition of
the natural phenomenon. Therefore any conclusions about forest dynamics need to be qualified
in light of these fundamental differences.
On the other hand, the data layers may not always be comparable. According to Global Forest
Watch, loss and gain of forest from the HG data cannot be directly compared, due to differences in how
the data were generated. Intuitively, one might design a study in which gain is subtracted from loss to
obtain a measure of net loss, either at the pixel level or at the aggregate level, yet this caveat suggests
that such a measure should be interpreted with care. It is in these usage and comparative considerations
that transparency is important in the description of the algorithms used to process satellite images and
to produce the final data (Rosa et al. 2014). In addition, the ISCGM provided maps at two different
resolutions for each year. If the same processing algorithms were used, one might assume that the
estimates of tree cover would be consistent. However, the findings suggest that the 2003 map may have
been strongly affected either by climatic anomalies or by a feature of the processing, producing a map
that differs substantially not just from other sources but also from the 2008 map from the same source.
As explained above, the differences between the 2003 and the 2008 ISCGM maps are commensurate with the
differences between the 2008 ISCGM map and other sources, whereas the two JAXA maps for 2007 and 2010
do not show such strong differences.
The present analysis has a number of limitations, which may be avenues for further research. The main
limitations concern the scale of the comparative study. First, this study is restricted to a subset of the
sources listed in Table 1, and to a subset of the possible comparisons
suggested in Figure 1. A more comprehensive comparison of the sources could clarify what type of
ecoregions and latitudes produce less homogeneous estimates, and whether any of the sources stand
out in their divergences, which may be subject to further scrutiny. Second, analyzing the data in their
full global extent could reveal further patterns in terms of latitudes and ecoregions. Third, the current
study is focused on areas that represent precolonial ethnic boundaries. An analysis covering
contemporary administrative boundaries could prove more useful to address deforestation, since it
would tackle governance at scales that are currently managed by the same official institutions. Fourth,
this study explores differences by means of simple correlations, and therefore it does not control for
other sources of divergence. A possible approach to better compare deforestation estimates could be
to clip them to areas that are classified as forests in land-use cover maps or to model differences as
dependent on a series of predictors. Finally, the study compares yearly rates for different (although
overlapping) periods, and yearly climatic anomalies may have introduced additional variability
between the sources that is indistinguishable in the present study. A more comprehensive
chronological comparison could overcome this drawback.
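To make the last limitation concrete, the following minimal Python sketch annualizes cover change over two different but overlapping periods and then correlates the resulting rates between two sources. The formula for the yearly rate is one simple convention and not necessarily the one used in this chapter. The period lengths follow the map years mentioned above (2003 to 2008, and 2007 to 2010), but all cover values are hypothetical.

```python
from math import sqrt


def annual_rate(cover_start, cover_end, year_start, year_end):
    """Mean yearly relative change in cover; negative values mean loss."""
    return (cover_end - cover_start) / cover_start / (year_end - year_start)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical cover (km2) for four areas: (start, end) of each period.
cover_a = [(100, 90), (200, 190), (50, 40), (80, 80)]      # 2003 and 2008
cover_b = [(100, 97), (200, 194), (50, 47), (80, 79.2)]    # 2007 and 2010

rates_a = [annual_rate(s, e, 2003, 2008) for s, e in cover_a]
rates_b = [annual_rate(s, e, 2007, 2010) for s, e in cover_b]

r = pearson(rates_a, rates_b)
print(rates_a, rates_b, r)
```

Because the two periods only partly overlap, a low correlation in such a comparison could reflect yearly anomalies as much as genuine disagreement between sources.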
In essence, further research is needed to include more sources in the comparison. A comparison
specifically designed to assess the impact of the definitions of forest may be useful to test
whether definition matters more than source in determining the cover change
estimates. Understanding this point can help direct global monitoring efforts towards improving
one or the other aspect (either the conceptual definitions or the technology and processing).
Additionally, producing estimates by country would enable their validation against national and
international tabulated statistics, which are much more widely available, and thus produce a robustness
test for the estimates.
With regard to the actual data products, calls to improve the consistency across sources of land cover
data are also applicable here (Giri et al. 2005; Mora et al. 2014). In addition, it would be advisable to
include estimates of the level of confidence of the spatial data, which would facilitate more robust
conclusions for research and policy based on these products. This has already been initiated by the
Global Land Cover Facility Landsat, which provides additional rasters indicating the probability of
change associated with its Forest Cover Change maps (Sexton, Noojipady, Anand, et al. 2015). Finally,
further validation on the ground would provide reassurance about the reliability of these products.
Ground validation is known to be potentially the most resource-intensive part of producing such spatial
data. Yet it is important to keep in mind that reliable data are of utmost importance to understand
drivers of, and develop solutions for, an environmental change as critical as deforestation. New
developments in citizen science could be useful for validating on the ground this
invaluable large-scale spatial information about the global forests.
The author is grateful to Gabriel Amable, David Gaveau, Shaun Larcom and Terry van Gevelt for
early discussions about this work. She is also grateful to the following for kindly making the data
available: the Earth Observation Research Center (EORC) at the Japan Aerospace Exploration Agency (JAXA);
the Geospatial Information Authority of Japan, Chiba University and collaborating organizations; and
Hansen and his team at the University of Maryland (UMD), Google, USGS and NASA. This work
would not have been possible without the contribution of open-source developers of GDAL, R (and
associated packages) and QGIS.
Relevant Websites
1. FAO GeoNetwork: http://www.fao.org/geonetwork
2. Global Forest Watch: http://www.globalforestwatch.org/
3. Global Land Cover Facility (GLCF) Landsat Tree Cover and Forest Cover Change Maps:
4. Google Earth Engine (HG data): https://earthenginepartners.appspot.com
5. International Steering Committee for Global Mapping (ISCGM) Tree Cover Maps:
6. Japan Aerospace Exploration Agency (JAXA) Forest Maps: http://www.eorc.jaxa.jp
7. Intact Forest Landscapes: http://www.intactforests.org