mangal – making ecological network analysis simple

Article in Ecography 39(4):384-390 · April 2016
DOI: 10.1111/ecog.00976
Abstract
The study of ecological networks is severely limited by 1) the difficulty of accessing data, 2) the lack of a standardized way to link meta-data with interactions, and 3) the disparity of formats in which ecological networks themselves are stored and represented. To overcome these limitations, we have designed a data specification for ecological networks. We implemented a database respecting this standard, and released an R package (rmangal) allowing users to programmatically access, curate, and deposit data on ecological interactions. In this article, we show how these tools, in conjunction with other frameworks for the programmatic manipulation of open ecological data, streamline the analysis process and improve replicability and reproducibility of ecological network studies.
Timothée Poisot, Benjamin Baiser, Jennifer A. Dunne, Sonia Kéfi, François Massol, Nicolas Mouquet, Tamara N. Romanuk, Daniel B. Stouffer, Spencer A. Wood and Dominique Gravel
T. Poisot (tim@poisotlab.io) and D. B. Stouffer, Univ. of Canterbury, School of Biological Sciences, Christchurch, New Zealand. TP also at: Dépt de sciences biologiques, Univ. de Montréal, Pavillon Marie-Victorin, C.P. 6128, succ. Centre-ville, Montréal, QC H3C 3J7, Canada. B. Baiser, Dept of Wildlife Ecology and Conservation, Univ. of Florida, Gainesville, USA. J. A. Dunne, Santa Fe Inst., 1399 Hyde Park Road, Santa Fe, NM 87501, USA. S. Kéfi and N. Mouquet, Inst. des Sciences de l'Évolution, Univ. de Montpellier, CNRS, IRD, EPHE, CC065, Place Eugène Bataillon, FR-34095 Montpellier Cedex 05, France. F. Massol, Laboratoire Génétique et Evolution des Populations Végétales, CNRS UMR 8198, Univ. Lille 1, Bâtiment SN2, FR-59655 Villeneuve d'Ascq cedex, France, and UMR 5175 CEFE Centre d'Ecologie Fonctionnelle et Evolutive (CNRS), 1919 Route de Mende, FR-34293 Montpellier Cedex 05, France. T. N. Romanuk, Dept of Biology, Dalhousie Univ., Canada. S. A. Wood, Natural Capital Project, School of Environmental and Forest Sciences, Univ. of Washington, Seattle, WA 98195, USA, and Dept of Biological Sciences, Idaho State Univ., Pocatello, ID 83209, USA. D. Gravel, Univ. du Québec à Rimouski, Dépt de Biologie, 300 Allées des Ursulines, Rimouski, QC G5L 3A1, Canada. DG and TP also at: Québec Centre for Biodiversity Sciences, Montréal, QC, Canada.
Ecography 38: 001–007, 2015
doi: 10.1111/ecog.00976
© 2015 The Authors. Ecography © 2015 Nordic Society Oikos
Subject Editor: Michael Borregaard. Editor-in-Chief: Miguel Araújo. Accepted 7 April 2015
Ecological networks are efficient representations of the complexity of natural communities, and help discover mechanisms contributing to their persistence, stability, resilience, and functioning. Most of the early studies of ecological networks were focused on understanding how the structure of interactions within one location affected the ecological properties of this local community. They revealed the contribution of average network properties, such as the buffering impact of modularity on species loss (Yodzis 1981, Pimm et al. 1991), the increase in robustness to extinctions along with increases in connectance (Dunne et al. 2002), and the fact that organization of interactions maximizes biodiversity (Bastolla et al. 2009). New studies introduced the idea that networks can vary from one locality to another. They can be meaningfully compared, either to understand the importance of environmental gradients on the presence of ecological interactions (Tylianakis et al. 2007), or to understand the mechanisms behind variation itself (Poisot et al. 2012, 2014). Yet, meta-analyses of numerous ecological networks are still extremely rare, and most of the studies comparing several networks do so within the limit of particular systems (Schleuning et al. 2011, Dalsgaard et al. 2013, Poisot et al. 2013b, Chamberlain et al. 2014, Olito and Fox 2015). The severe shortage of publicly shared data in the field also restricts the scope of large-scale analyses.
It is possible to predict the structure of ecological networks, either using latent variables (Rohr et al. 2010, Eklöf et al. 2013) or actual trait values (Gravel et al. 2013). The calibration of these approaches requires accessible data, not only about the interactions, but about the traits of the species involved. Comparing the efficiency of different methods is also facilitated if there is a homogeneous way of representing ecological interactions, and the associated metadata. In this paper, we 1) establish the need for a data specification serving as a common language among network ecologists, 2) describe this data specification, and 3) describe rmangal, an R package and companion database relying on this data specification. The rmangal package allows one to easily deposit and retrieve data about ecological interactions and networks in a publicly accessible database. We provide use-cases showing how this new approach makes complex analyses simpler, and allows for the integration of new tools to manipulate biodiversity resources.
Networks need a data specification
Ecological networks are (often) stored as an adjacency matrix (or as the quantitative link matrix), that is, a series of 0s and 1s indicating, respectively, the absence or presence of an interaction. This format is extremely convenient (as most network analysis packages, e.g. bipartite, betalink, foodweb, require data to be presented this way), but is extremely inefficient at storing meta-data. In most cases, an adjacency matrix provides information about the identity of species (in the cases where row and column headers are present) and the presence or absence of interactions. If other data about the environment (e.g. where the network was sampled) or the species (e.g. the population size, trait distribution, or other observations) are available, they are often either given in other files or as accompanying text. In both cases, making a programmatic link between interaction data and relevant meta-data is difficult and, more importantly, error-prone.
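The limitation described above can be illustrated with a small sketch in base R. The species names, coordinates and column names below are made up for illustration and are not taken from the mangal database; the point is only that a matrix has no natural place for per-interaction meta-data, whereas a table with one row per interaction does.

```r
# Toy bipartite network: rows are consumers, columns are resources.
# All names and values are hypothetical.
adjacency <- matrix(
  c(1, 0, 1,
    0, 1, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("predator_A", "predator_B"),
                  c("prey_X", "prey_Y", "prey_Z"))
)

# The matrix only records who interacts with whom.
# Converting it to a long table gives one row per species pair...
edges <- as.data.frame(as.table(adjacency), stringsAsFactors = FALSE)
colnames(edges) <- c("from", "to", "present")
edges <- edges[edges$present == 1, ]

# ...to which meta-data can be attached as extra columns
# (hypothetical sampling location and observation type).
edges$latitude  <- -43.52
edges$longitude <- 172.58
edges$obs_type  <- "observed"

print(edges)
```

The long table keeps interactions and their meta-data in the same object, which is the property a data specification formalizes.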
By contrast, a data specification (i.e. a set of precise instructions detailing how each object should be represented) provides a common language for network ecologists to interact, and ensures that, regardless of their source, data can be used in a shared workflow. Most importantly, a data specification describes how data are exchanged. Each group retains the ability to store the data in the format that is most convenient for in-house use, and only needs to provide export options (e.g. through an API, i.e. a programmatic interface running on a web server, returning data in response to queries in a pre-determined language) respecting the data specification. This approach ensures that all data can be used in meta-analyses, and increases the impact of data (Piwowar and Vision 2013). Data archival also offers additional advantages for ecology. The aggregation of local observations can reveal large-scale phenomena (Reichman et al. 2011), which would be unattainable in the absence of a collaborative effort. Data archival in databases also prevents data rot and data loss (Vines et al. 2014), thus ensuring that data on interaction networks, which are typically hard and costly to produce, continue to be available and usable.
Elements of the data specification
The data specification introduced here (Fig. 1) is built around the idea that (ecological) networks are collections of relationships between ecological objects, and each element has particular meta-data associated with it. In this section, we detail the way networks are represented in the mangal specification. An interactive webpage with the elements of the data specification can be found online at <http://mangal.io/doc/spec/>. The data specification is available at the API root (e.g. <http://mangal.io/api/v1/?format=json>), and can also be viewed using the whatIs function from the rmangal package. Rather than giving an exhaustive list of the data specification (which is available online at the aforementioned URL), this section serves as an overview of each element, and how they interact.
We propose JSON, a user-friendly format equivalent to XML, as an efficient way to standardise data representation for three main reasons. First, it has emerged as a de facto standard for web platforms serving data and accepting data from users. Second, it allows strict validation of the data: a JSON file can be matched against a schema, and one can verify that it is correctly formatted (this includes the possibility that not all fields are filled, as this will depend on available data). Finally, JSON objects are easily and cheaply (memory-wise) parsed in the most commonly used programming languages, notably R (equivalent to list) and Python (equivalent to dict). For most users, the format in which data are transmitted is unimportant, as the interaction happens within R; knowing how JSON objects are organized is only useful for those who want to interact with the API directly. The rmangal package takes care of converting the data into the correct JSON format to upload them to the database.
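The idea of matching an object against a schema can be sketched in base R, without any JSON library. This is an illustration only: the schema below (a required name, and optional vernacular, eol and ncbi fields) is an assumption made for the example and is not the actual mangal schema, and validate_object is a hypothetical helper, not an rmangal function.

```r
# Hypothetical schema for a taxon: each field is paired with a function
# that checks its type. Which fields are required is an assumption.
taxa_schema <- list(
  required = list(name = is.character),
  optional = list(vernacular = is.character, eol = is.numeric, ncbi = is.numeric)
)

# Return a character vector of problems; an empty vector means the object
# conforms to the schema. Missing optional fields are not reported.
validate_object <- function(obj, schema) {
  problems <- character(0)
  known <- c(schema$required, schema$optional)
  for (field in names(schema$required)) {
    if (is.null(obj[[field]])) {
      problems <- c(problems, paste("missing required field:", field))
    }
  }
  for (field in names(obj)) {
    if (!field %in% names(known)) {
      problems <- c(problems, paste("unknown field:", field))
    } else if (!known[[field]](obj[[field]])) {
      problems <- c(problems, paste("wrong type for field:", field))
    }
  }
  problems
}

# A well-formed object with one optional field left out (ncbi).
good <- list(name = "Gadus morhua", vernacular = "Atlantic cod", eol = 1234)
# A malformed object: no name, and a non-numeric identifier.
bad <- list(vernacular = "Atlantic cod", eol = "unknown")

print(validate_object(good, taxa_schema))
print(validate_object(bad, taxa_schema))
```

In practice this check would be done against a JSON schema on the server side; the sketch only shows why optional fields can stay empty without the object being rejected.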
Functions in the rmangal package are named after elements of the data specification, in the following way: verbElement. The verb can be one of list, get, or patch; for example, the function to get a particular network is getNetwork. The function to modify (patch) a taxon is patchTaxa. All of these functions return a list object, which means that chaining them together using, e.g. the plyr package, is time-efficient. There are examples of this in the use-cases.
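Why list-returning functions chain well can be shown with a stand-in that does not contact any server. The function get_network_stub and the fields it returns are hypothetical; it only mimics the shape of a getter that returns a list.

```r
# Stand-in for a getter that returns a plain list.
# This stub does NOT query the mangal API; its fields are made up.
get_network_stub <- function(id) {
  list(id = id, name = paste("network", id), interactions = seq_len(id))
}

network_ids <- c(1, 2, 3)

# Because every call returns a list, results can be collected with lapply()
# and summarised with vapply() without any conversion step.
networks <- lapply(network_ids, get_network_stub)
network_names <- vapply(networks, function(n) n$name, character(1))
n_interactions <- vapply(networks, function(n) length(n$interactions), integer(1))

print(data.frame(name = network_names, interactions = n_interactions))
```

The plyr functions used in the use-cases (alply, ldply, aaply) follow the same apply-then-combine pattern.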
Figure 1. An overview of the data specification, and the hierarchy between objects. Every box corresponds to a level of the data specification. Grey boxes are nodes, blue boxes are interactions and networks, and green boxes are metadata. The bold boxes (dataset, network, interaction, taxa) are the minimal elements needed to represent a network. [Boxes shown: taxa, item, trait, interaction, reference, environment, dataset, network; panel labels: Core elements, Network information, Metadata.]
Node information
Taxa
Taxa are taxonomic entities of any level, identified by their name, vernacular name, and their identifiers in a variety of taxonomic services. Associating the identifiers of each taxon allows using the new generation of open data tools, such as taxize (Chamberlain and Szöcs 2013), in addition to protecting the database against taxonomic revisions. The data specification currently has fields for NCBI (National Center for Biotechnology Information), GBIF (Global Biodiversity Information Facility), TSN (Taxonomic Serial Number, used by the Integrated Taxonomic Information System), EOL (Encyclopedia of Life) and BOLD (Barcode of Life) identifiers. We also provide the taxonomic status, i.e. whether the taxon is a true taxonomic entity, a trophic species, or a morphospecies. Taxonomic identifiers can either be added by the contributors, or will be automatically retrieved during the automated curation routine.
Item
An item is any measured instance of a taxon. Items have a level argument, which can be either individual or population; this allows representing both individual-level networks (i.e. there are as many items of a given taxon as there were individuals of this taxon sampled), and population-level networks. When an item represents a population, it is possible to give a measure of the size of this population. The notion of item is particularly useful for time-replicated designs: each observation of a population at a time-point is an item with associated trait values, and possibly population size.
Network information
All objects described in this sub-section can have a spatial position, information on the date of sampling, and references to both papers and datasets.
Interaction
An interaction links two taxa objects (but can also link pairs of items). The most important attributes of interactions are the type of interaction (of which we provide a list of possible values), and its obs_type, i.e. how it was observed. This field helps differentiate direct observations, text mining, and inference. Note that the obs_type field can also take confirmed absence as a value; this is useful for, e.g. cafeteria experiments in which there is high confidence that the interaction did not happen.
Network
A network is a series of interaction objects, along with 1) information on its spatial position (provided as latitude and longitude), 2) the date of sampling, and 3) references to measures of environmental conditions.
Dataset
A dataset is a collection of one or several network(s). Datasets also have a field for data and papers, both of which are references to bibliographic or web resources that describe, respectively, the source of the data and the papers in which these data have been studied. Datasets or networks are the preferred entry point into the resources, although in some cases it can be meaningful to get a list of interactions only.
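The hierarchy described in this sub-section (a dataset contains networks, a network contains interactions, an interaction links taxa) maps directly onto nested R lists. The sketch below is a local illustration with made-up names and coordinates; apart from taxa_from, taxa_to, int_type and obs_type, the field names are assumptions and may differ from the specification.

```r
# Two taxa (nodes); names are placeholders.
predator <- list(name = "Predator species", vernacular = "a predator")
prey     <- list(name = "Prey species", vernacular = "a prey")

# An interaction links two taxa and records how it was observed.
predation <- list(taxa_from = predator, taxa_to = prey,
                  int_type = "predation", obs_type = "observed")

# A network is a series of interactions, plus where and when it was sampled
# (hypothetical field names and values).
network <- list(name = "Example site", latitude = -60.0, longitude = -45.0,
                date = "2015-01-01", interactions = list(predation))

# A dataset is a collection of one or several networks.
dataset_example <- list(name = "Example dataset", networks = list(network))

# Walk down the hierarchy: dataset -> network -> interaction -> taxa.
for (net in dataset_example$networks) {
  for (int in net$interactions) {
    cat(net$name, ":", int$taxa_from$name, int$int_type, int$taxa_to$name, "\n")
  }
}
```

Entering through the dataset or the network, as recommended above, gives access to every interaction and taxon below it in a single traversal.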
Meta-data
Trait value
Objects of type item can have associated trait values. These consist of the description of the trait being measured, the value, and the units in which the measure was taken. As the measurement may have been taken at a different time and/or location than the interaction was, trait values have fields for time, latitude and longitude, and references to original publication and original datasets.
Environmental condition
Environmental conditions are associated with dataset, network, and interaction objects, to allow for both macro and micro environmental conditions. These are defined by the environmental property measured, its value, and the units. Like traits, they have fields for time, latitude and longitude, and references to original publication and original datasets.
References
References are associated with datasets. They accommodate the DOI, JSTOR or PubMed identifiers, or a URL. When possible, the DOI is preferred as it offers more potential to interact with other online tools, such as the CrossRef API.
Use cases
In this section, we present use-cases using the rmangal package for R, to interact with a database implementing this data specification and serving data through an API (<http://mangal.io/api/v1/>). It is possible for users to deposit data into this database through the R package. Note that data are made available under a CC-0 Waiver (Poisot et al. 2013a). Detailed information about how to upload data through R is given in the vignettes and manual of the rmangal package.
The data we use for this example come from Ricciardi et al. (2010). These data were previously available on the InteractionWeb DataBase as a single xls file. We uploaded them in the mangal database at <http://mangal.io/api/v1/dataset/2>. The rmangal package can be installed this way:
# Prepare the environment
library(devtools)
# This line is needed on some Linux distributions, where the 'unzip' option is empty or unset
if (!nzchar(getOption('unzip', default = ''))) options(unzip = 'unzip')
# This installs the rmangal package
install_github('mangal-wg/rmangal')
library(rmangal)
Once rmangal is installed and loaded, users can establish a connection to the database this way:
mangal_url <- 'http://mangal.io/'
api <- mangalapi(mangal_url)
Create taxa and add an interaction
In the first use-case, we will create an interaction between two taxa. We ask readers not to execute this code as it is, but rather to use it as a template for their own analyses. A complete, step-by-step guide to upload is given in the vignettes of the rmangal package. Uploading anything requires a username and API key, which can be obtained at the following URL: <http://mangal.io/dashboard/login>. Your API key will be generated automatically after registration. You can use it to connect to the database securely:
api_secure <- mangalapi("http://mangal.io", usr = "MyUserName",
  key = "AbcDefIjkL1234")
The first step is to create two taxa objects, with the species that we observed interacting:
seal <- list(
  name = "Hydrurga leptonyx",
  vernacular = "Leopard seal",
  eol = 328637
)
cod <- list(
  name = "Gadus morhua",
  vernacular = "Atlantic cod"
)
Now, we will send these two objects to the remote database:
seal <- addTaxa(api_secure, seal)
cod <- addTaxa(api_secure, cod)
Note that it is suggested to overwrite the local copy of the object, because the database will always send back the remote copy. This makes the syntax of further additions considerably easier, as we show below. Once the two objects are created, we can create an interaction between them:
seal_eats_cod <- list(
  taxa_from = seal,
  taxa_to = cod,
  int_type = "predation",
  obs_type = "observed"
)
Then, using the same approach, we can send this information to the remote database:
seal_eats_cod <- addInteraction(api_secure, seal_eats_cod)
To create networks, datasets, etc., one needs to follow the same procedure, as is explained in the online guide for data contributors, available at <http://mangal.io/doc/upload/>.
Link-species relationships
In the first example, we visualize the relationship between the number of species and the number of interactions, which Martinez (1992) proposed to be linear (in food webs).
library(plyr)
library(igraph)
# Retrieve the dataset of interest
dataset <- getDataset(api, 2)
# Get each network in the dataset as a graph object
graphs <- alply(dataset$networks, 1, function(x) toIgraph(api, x))
# Make a data.frame with the number of links and species
ls <- ldply(graphs, function(x) c(S = length(V(x)), L = length(E(x))))
ls$X1 <- aaply(as.numeric(as.vector(dataset$networks)), 1,
  function(x) getNetwork(api, x)$name)
colnames(ls)[1] <- 'Network'
# Now plot this dataset
source("suppmat/usecase_ls.r")
Getting the data to produce Fig. 2 requires fewer than 10 lines of code. The only information needed is the identifier of the network or dataset, which we suggest should be reported in publications as: "these data were deposited in the mangal format at <URL>/api/v1/dataset/<ID>" (where <URL> and <ID> are replaced by the corresponding values), preferably in the methods, possibly in the acknowledgements. To encourage data sharing and its recognition, we encourage users of the database to always cite the original datasets or publications.
Figure 2. Relationship between the number of species and number of interactions in the anemonefish-fish dataset. Constant connectance refers to the hypothesis that there is a quadratic relationship between these two quantities. [x-axis: species richness; y-axis: number of interactions; legend: data, constant connectance, best fit (linear model).]
Network beta-diversity
In the second example, we use the framework of network β-diversity (Poisot et al. 2012) to measure the extent to which networks that are far apart in space have different interactions. Each network in the dataset has a latitude and longitude, meaning that it is possible to measure the geographic distance between two networks. For each pair of networks, we measure the geographic distance (in km), the species dissimilarity (β_S), the network dissimilarity when all species are present (β_WN), and finally, the network dissimilarity when only shared species are considered (β_OS).
# We need the betalink package to measure network beta-diversity
install_github('PoisotLab/betalink')
library(betalink)
library(plyr)
library(igraph)
library(sp)
# We first retrieve all information about the networks
Networks <- alply(dataset$networks, 1, function(x) getNetwork(api,x))
# Extract the lat/lon data
LatLon <- ldply(Networks, function(x) c(name = x$name, lat =
  x$latitude, lon = x$longitude))
rownames(LatLon) <- LatLon$name
LatLon$lat <- as.numeric(LatLon$lat)
LatLon$lon <- as.numeric(LatLon$lon)
# spDists expects longitude in the first column and latitude in the second
LatLon <- LatLon[, c('lon', 'lat')]
# Then we measure the distances (in km) between all pairs of sites
GeoDist <- spDists(as.matrix(LatLon), longlat = TRUE)
colnames(GeoDist) <- rownames(GeoDist) <- rownames(LatLon)
GeoDist <- as.dist(GeoDist)
# Name each graph after its network
names(graphs) <- aaply(names(graphs), 1, function(x)
  Networks[[x]]$name)
# Finally, we measure the beta-diversity
BetaDiv <- network_betadiversity(graphs)
# We add the geographic distance
BetaDiv$GEO <- GeoDist
# Plotting
source("suppmat/usecase_beta.r")
Figure 3. Relationships between the geographic distance between two sites, and the species dissimilarity, network dissimilarity with all species, and network dissimilarity with only shared species. [Three panels; x-axis: geographic distance; y-axes: species composition dissimilarity, network dissimilarity (all species), network dissimilarity (shared species).]
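To make the three dissimilarities concrete, the sketch below computes them by hand for two toy networks in base R. It is an illustration under stated assumptions, not the betalink implementation: Whittaker's measure is used as the dissimilarity function, which is one of several possible choices, and the species and interactions are made up.

```r
# Whittaker's dissimilarity between two sets of items (an assumed choice).
whittaker <- function(x, y) {
  n_shared <- length(intersect(x, y))
  n_only_x <- length(setdiff(x, y))
  n_only_y <- length(setdiff(y, x))
  (n_shared + n_only_x + n_only_y) / ((2 * n_shared + n_only_x + n_only_y) / 2) - 1
}

# Two toy networks stored as edge lists (hypothetical species).
net1 <- data.frame(from = c("A", "A", "B"), to = c("X", "Y", "Y"),
                   stringsAsFactors = FALSE)
net2 <- data.frame(from = c("A", "B", "C"), to = c("X", "X", "Y"),
                   stringsAsFactors = FALSE)

species <- function(net) unique(c(net$from, net$to))
links   <- function(net) paste(net$from, net$to, sep = "->")

# Species dissimilarity: compares the two species lists.
beta_s <- whittaker(species(net1), species(net2))

# Network dissimilarity, all species: compares the two lists of interactions.
beta_wn <- whittaker(links(net1), links(net2))

# Network dissimilarity, shared species only: drop interactions involving
# a species absent from either network, then compare what remains.
shared <- intersect(species(net1), species(net2))
keep   <- function(net) net[net$from %in% shared & net$to %in% shared, ]
beta_os <- whittaker(links(keep(net1)), links(keep(net2)))

print(round(c(beta_s = beta_s, beta_wn = beta_wn, beta_os = beta_os), 3))
```

In this toy case the interaction lost when species C is removed lowers the dissimilarity, so the value for shared species is smaller than the value for all species.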
  • Chapter
    Complex network analysis allows ecologists to implement interesting and diverse approaches to study interactions among the most diverse life forms. In the last decades, several tools and advances have been developed in software, randomizations, and computer graphics; increasing the quantity of ecologists that lead authorship when these analyses are used in their research. Thereby, some metrics and indices have been improved and others appeared as novel approaches, establishing a vast quantity of information in literature. In this chapter, you will be able to find a compendium of the main descriptors currently used in the literature, as well as the primary information to develop the statistical analysis and graph visualization. It is important to have enough criteria when using these metrics and indices, which must be complemented with both: knowledge concerning natural history and the logic and limitations of the indices and analysis, in order to avoid misleading conclusions.
  • Article
    Full-text available
    Knowledge of species composition and their interactions, in the form of interaction networks, is required to understand processes shaping their distribution over time and space. As such, comparing ecological networks along environmental gradients represents a promising new research avenue to understand the organization of life. Variation in the position and intensity of links within networks along environmental gradients may be driven by turnover in species composition, by variation in species abundances and by abiotic influences on species interactions. While investigating changes in species composition has a long tradition, so far only a limited number of studies have examined changes in species interactions between networks, often with differing approaches. Here, we review studies investigating variation in network structures along environmental gradients, highlighting how methodological decisions about standardization can influence their conclusions. Due to their complexity, variation among ecological networks is frequently studied using properties that summarize the distribution or topology of interactions such as number of links, connectance, or modularity. These properties can either be compared directly or using a procedure of standardization. While measures of network structure can be directly related to changes along environmental gradients, standardization is frequently used to facilitate interpretation of variation in network properties by controlling for some co-variables, or via null models. Null models allow comparing the deviation of empirical networks from random expectations and are expected to provide a more mechanistic understanding of the factors shaping ecological networks when they are coupled with functional traits. As an illustration, we compare approaches to quantify the role of trait matching in driving the structure of plant–hummingbird mutualistic networks, i.e. a direct comparison, standardized by null models and hypothesis-based metaweb. Overall, our analysis warns against a comparison of studies that rely on distinct forms of standardization, as they are likely to highlight different signals. Fostering a better understanding of the analytical tools available and the signal they detect will help produce deeper insights into how and why ecological networks vary along environmental gradients.
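The two ways of comparing network properties mentioned in this abstract (directly, or after standardization against a null model) can be illustrated with a minimal Python sketch. It is not taken from the cited study: the toy matrix `web`, the function names, the choice of degree variance as the metric, and the equiprobable null model (cells shuffled while keeping the number of links) are all assumptions made for illustration.

```python
import random
import statistics


def connectance(matrix):
    """Realized links divided by possible links in a bipartite matrix."""
    rows, cols = len(matrix), len(matrix[0])
    links = sum(1 for row in matrix for cell in row if cell)
    return links / (rows * cols)


def degree_variance(matrix):
    """Population variance of the number of partners per row species."""
    return statistics.pvariance([sum(1 for c in row if c) for row in matrix])


def shuffled(matrix, rng):
    """Equiprobable null: same dimensions and link count, cells permuted."""
    cols = len(matrix[0])
    cells = [1 if c else 0 for row in matrix for c in row]
    rng.shuffle(cells)
    return [cells[i * cols:(i + 1) * cols] for i in range(len(matrix))]


def z_score(matrix, metric, n_null=500, seed=1):
    """Deviation of the observed metric from the null expectation, in SD units."""
    rng = random.Random(seed)
    null_values = [metric(shuffled(matrix, rng)) for _ in range(n_null)]
    sd = statistics.pstdev(null_values)
    if sd == 0:
        return float("nan")  # metric does not vary under this null model
    return (metric(matrix) - statistics.mean(null_values)) / sd


# Hypothetical plant (rows) by pollinator (columns) presence/absence matrix.
web = [
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]

print("connectance:", connectance(web))
print("degree variance (raw):", degree_variance(web))
print("degree variance (z-score):", z_score(web, degree_variance))
```

The raw value and the z-score answer different questions: the first describes the network as observed, the second describes how unusual it is given its size and number of links. This is one reason studies that standardize differently may not be directly comparable.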
  • Article
    Full-text available
    Network ecology provides a systems basis for approaching ecological questions, such as factors that influence biological diversity, the role of particular species or particular traits in structuring ecosystems, and long-term ecological dynamics (e.g., stability). Whereas the introduction of network theory has enabled ecologists to quantify not only the degree, but also the architecture of ecological complexity, these advances have come at the cost of introducing new challenges, including new theoretical concepts and metrics, and increased data complexity and computational intensity. Synthesizing recent developments in the network ecology literature, we point to several potential solutions to these issues: integrating network metrics and their terminology across sub-disciplines; benchmarking new network algorithms and models to increase mechanistic understanding; and improving tools for sharing ecological network research, in particular “model” data provenance, to increase the reproducibility of network models and analyses. We propose that applying these solutions will aid in synthesizing ecological sub-disciplines and allied fields by improving the accessibility of network methods and models.
  • Article
    Community ecology is tasked with the considerable challenge of predicting the structure, and properties, of emerging ecosystems. It requires the ability to understand how and why species interact, as this will allow the development of mechanism-based predictive models, and as such to better characterize how ecological mechanisms act locally on the existence of inter-specific interactions. Here we argue that the current conceptualization of species interaction networks is ill-suited for this task. Instead, we propose that future research must start to account for the intrinsic variability of species interactions, then scale up from here onto complex networks. This can be accomplished simply by recognizing that there exists intra-specific variability, in traits or properties related to the establishment of species interactions. By shifting the scale towards population-based processes, we show that this new approach will improve our predictive ability and mechanistic understanding of how species interact over large spatial or temporal scales.
  • Article
    Plant–pollinator mutualistic networks represent the ecological context of foraging (for pollinators) and reproduction (for plants and some pollinators). Plant–pollinator visitation networks exhibit highly conserved structural properties across diverse habitats and species assemblages. The most successful hypotheses to explain these network properties are the neutrality and biological constraints hypotheses, which posit that species interaction frequencies can be explained by species relative abundances, and trait mismatches between potential mutualists respectively. However, previous network analyses emphasize the prediction of metrics of qualitative network structure, which may not represent stringent tests of these hypotheses. Using a newly documented temporally explicit alpine plant–pollinator visitation network, we show that metrics of both qualitative and quantitative network structure are easy to predict, even by models that predict the identity or frequency of species interactions poorly. A variety of phenological and morphological constraints as well as neutral interactions successfully predicted all network metrics tested, without accurately predicting species observed interactions. Species phenology alone was the best predictor of observed interaction frequencies. However, all models were poor predictors of species pairwise interaction frequencies, suggesting that other aspects of species biology not generally considered in network studies, such as reproduction for dipterans, play an important role in shaping plant–pollinator visitation network structure at this site. Future progress in explaining the structure and dynamics of mutualistic networks will require new approaches that emphasize accurate prediction of species pairwise interactions rather than network metrics, and better reflect the biology underlying species interactions.
  • Article
    Full-text available
    An intricate network of interactions between organisms and their environment form the ecosystems that sustain life on earth. With a detailed understanding of these interactions, ecologists and biologists can make better informed predictions about the ways different environmental factors will impact ecosystems. Despite the abundance of research data on biotic and abiotic interactions, no comprehensive and easily accessible data collection is available that spans taxonomic, geospatial, and temporal domains. Biotic-interaction datasets are effectively siloed, inhibiting cross-dataset comparisons. In order to pool resources and bring to light individual datasets, specialized research tools are needed to aggregate, normalize, and integrate existing datasets with standard taxonomies, ontologies, vocabularies, and structured data repositories. Global Biotic Interactions (GloBI) provides such tools by way of an open, community-driven infrastructure designed to lower the barrier for researchers to perform ecological systems analysis and modeling. GloBI provides a tool that (a) ingests, normalizes, and aggregates datasets, (b) integrates interoperable data with accepted ontologies (e.g., OBO Relations Ontology, Uberon, and Environment Ontology), vocabularies (e.g., Coastal and Marine Ecological Classification Standard), and taxonomies (e.g., Integrated Taxonomic Information System and National Center for Biotechnology Information Taxonomy Database), (c) makes data accessible through an application programming interface (API) and various data archives (Darwin Core, Turtle, and Neo4j), and (d) houses a data collection of about 700,000 species interactions across about 50,000 taxa, covering over 1,100 references from 19 data sources. GloBI has taken an open-source and open-data approach in order to make integrated species-interaction data maximally accessible and to encourage users to provide feedback, contribute data, and improve data access methods. 
The GloBI collection of datasets is currently used in the Encyclopedia of Life (EOL) and Gulf of Mexico Species Interactions (GoMexSI).
  • Article
    Full-text available
    Interaction webs, or networks, define how the members of two or more trophic levels interact. However, the traits that mediate network structure have not been widely investigated. Generally, the mechanism that determines plant-pollinator partnerships is thought to involve the matching of a suite of species traits (such as abundance, phenology, morphology) between trophic levels. These traits are often unknown or hard to measure, but may reflect phylogenetic history. We asked whether morphological traits or phylogenetic history were more important in mediating network structure in mutualistic plant-pollinator interaction networks from Western Canada. At the plant species level, sexual system, growth form, and flower symmetry were the most important traits. For example species with radially symmetrical flowers had more connections within their modules (a subset of species that interact more among one another than outside of the module) than species with bilaterally symmetrical flowers. At the pollinator species level, social species had more connections within and among modules. In addition, larger pollinators tended to be more specialized. As traits mediate interactions and have a phylogenetic signal, we found that phylogenetically close species tend to interact with a similar set of species. At the network level, patterns were weak, but we found increasing functional trait and phylogenetic diversity of plants associated with increased weighted nestedness. These results provide evidence that both specific traits and phylogenetic history can contribute to the nature of mutualistic interactions within networks, but they explain less variation between networks.
  • Article
    Current global changes make it important to be able to predict which interactions will occur in the emerging ecosystems. Most of the current methods to infer the existence of interactions between two species require a good knowledge of their behaviour or a direct observation of interactions. In this paper, we overcome these limitations by developing a method, inspired from the niche model of food web structure, using the statistical relationship between predator and prey body size to infer the matrix of potential interactions among a pool of species. The novelty of our approach is to infer, for any species of a given species pool, the three species‐specific parameters of the niche model. The method applies to both local and metaweb scales. It allows one to evaluate the feeding interactions of a new species entering the community. We find that this method gives robust predictions of the structure of food webs and that its efficiency is increased when the strength of the body–size relationship between predators and preys increases. We finally illustrate the potential of the method to infer the metaweb structure of pelagic fishes of the Mediterranean sea under different global change scenarios.
  • Article
    Full-text available
    All species are hierarchically related to one another, and we use taxonomic names to label the nodes in this hierarchy. Taxonomic data is becoming increasingly available on the web, but scientists need a way to access it in a programmatic fashion that’s easy and reproducible. We have developed taxize, an open-source software package (freely available from http://cran.r-project.org/web/packages/taxize/index.html) for the R language. taxize provides simple, programmatic access to taxonomic data for 13 data sources around the web. We discuss the need for a taxonomic toolbelt in R, and outline a suite of use cases for which taxize is ideally suited (including a full workflow as an appendix). The taxize package facilitates open and reproducible science by allowing taxonomic data collection to be done in the open-source R platform.
  • Article
    Policies ensuring that research data are available on public archives are increasingly being implemented at the government [1], funding agency [2-4], and journal [5, 6] level. These policies are predicated on the idea that authors are poor stewards of their data, particularly over the long term [7], and indeed many studies have found that authors are often unable or unwilling to share their data [8-11]. However, there are no systematic estimates of how the availability of research data changes with time since publication. We therefore requested data sets from a relatively homogenous set of 516 articles published between 2 and 22 years ago, and found that availability of the data was strongly affected by article age. For papers where the authors gave the status of their data, the odds of a data set being extant fell by 17% per year. In addition, the odds that we could find a working e-mail address for the first, last, or corresponding author fell by 7% per year. Our results reinforce the notion that, in the long term, research data cannot be reliably preserved by individual researchers, and further demonstrate the urgent need for policies mandating data sharing via public archives.
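To give a sense of what a 17% yearly fall in odds implies, the arithmetic can be sketched as follows. This is an illustration only: it assumes the decline is constant and multiplicative on the odds scale, and the starting odds (`initial_odds`) are a hypothetical value, not a figure reported in the abstract.

```python
import math


def odds_after(initial_odds, years, yearly_decline=0.17):
    """Odds after `years`, assuming a constant multiplicative yearly decline."""
    return initial_odds * (1 - yearly_decline) ** years


def odds_to_probability(odds):
    """Convert odds to a probability."""
    return odds / (1 + odds)


def halving_time(yearly_decline=0.17):
    """Number of years for the odds to fall by half."""
    return math.log(0.5) / math.log(1 - yearly_decline)


# Hypothetical starting point: odds of 4 (an 80% chance the data are extant).
initial_odds = 4.0
for years in (0, 5, 10, 20):
    probability = odds_to_probability(odds_after(initial_odds, years))
    print(f"{years:>2} years: probability extant = {probability:.3f}")

print(f"odds halve every {halving_time():.1f} years")
```

Under this assumption the odds halve roughly every 3.7 years, whatever the starting value. The corresponding probability depends on the assumed starting odds.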
  • Article
    Full-text available
    Background. Attribution to the original contributor upon reuse of published data is important both as a reward for data creators and to document the provenance of research findings. Previous studies have found that papers with publicly available datasets receive a higher number of citations than similar studies without available data. However, few previous analyses have had the statistical power to control for the many variables known to predict citation rate, which has led to uncertain estimates of the “citation benefit”. Furthermore, little is known about patterns in data reuse over time and across datasets. Method and Results. Here, we look at citation rates while controlling for many known citation predictors and investigate the variability of data reuse. In a multivariate regression on 10,555 studies that created gene expression microarray data, we found that studies that made data available in a public repository received 9% (95% confidence interval: 5% to 13%) more citations than similar studies for which the data was not made available. Date of publication, journal impact factor, open access status, number of authors, first and last author publication history, corresponding author country, institution citation history, and study topic were included as covariates. The citation benefit varied with date of dataset deposition: a citation benefit was most clear for papers published in 2004 and 2005, at about 30%. Authors published most papers using their own datasets within two years of their first publication on the dataset, whereas data reuse papers published by third-party investigators continued to accumulate for at least six years. To study patterns of data reuse directly, we compiled 9,724 instances of third party data reuse via mention of GEO or ArrayExpress accession numbers in the full text of papers. 
The level of third-party data use was high: for 100 datasets deposited in year 0, we estimated that 40 papers in PubMed reused a dataset by year 2, 100 by year 4, and more than 150 data reuse papers had been published by year 5. Data reuse was distributed across a broad base of datasets: a very conservative estimate found that 20% of the datasets deposited between 2003 and 2007 had been reused at least once by third parties. Conclusion. After accounting for other factors affecting citation rate, we find a robust citation benefit from open data, although a smaller one than previously reported. We conclude there is a direct effect of third-party data reuse that persists for years beyond the time when researchers have published most of the papers reusing their own data. Other factors that may also contribute to the citation benefit are considered. We further conclude that, at least for gene expression microarray data, a substantial fraction of archived datasets are reused, and that the intensity of dataset reuse has been steadily increasing since 2003.
  • Article
    Full-text available
    SUMMARY Network theory is gaining momentum as a descriptive tool in community ecology. Because organisms with the same lifestyle can still exhibit ecological differences, it is crucial to determine the scale at which networks should be described. Here we show that networks of hosts (mammals) and parasites (ectoparasitic gamasid mites) differ when either facultative or obligatory parasites only are considered. More importantly, the structure of these networks is opposed, with obligatory parasites networks being more modular, and facultative parasites networks being more nested. Our results have consequences for the way we define which species to include in ecological networks, which we discuss in the light of community ecology and epidemiology.