F1000Research
Open Peer Review
Referees: Laura Privalle, Bayer CropScience LP R&D, USA; Robert Davey, The Genome Analysis Centre, UK
SOFTWARE TOOL ARTICLE
AGRIS: providing access to agricultural research data exploiting
open data on the web [v1; ref status: indexed,
http://f1000r.es/599]
Fabrizio Celli (1), Thembani Malapela (1), Karna Wegner (1), Imma Subirats (1),
Elena Kokoliou (2), Johannes Keizer (1)
(1) Food and Agriculture Organization of the United Nations, Viale delle Terme di Caracalla, Rome, 00153, Italy
(2) Agro-Know, Vrilissia, 152 36, Greece
Abstract
AGRIS is the International System for Agricultural Science and Technology. It is
supported by a large community of data providers, partners and users. AGRIS
is a database that aggregates bibliographic data, and through this core data,
related content across online information systems is retrieved by taking
advantage of Semantic Web capabilities. AGRIS is a global public good and its
vision is to be a responsive service to its user needs by facilitating contributions
and feedback regarding the AGRIS core knowledgebase, AGRIS’s future and
its continuous development. Periodic AGRIS e-consultations, partner meetings
and user feedback are assimilated to the development of the AGRIS
application and content coverage. This paper outlines the current AGRIS
technical set-up, its network of partners, data providers and users as well as
how AGRIS’s responsiveness to clients’ needs inspires the continuous
technical development of the application. The paper concludes by providing a
use case of how the AGRIS stakeholder input and the subsequent AGRIS
e-consultation results influence the development of the AGRIS application,
knowledgebase and service delivery.
Corresponding author: Thembani Malapela (thembani.malapela@fao.org)
How to cite this article: Celli F, Malapela T, Wegner K et al. AGRIS: providing access to agricultural research data
exploiting open data on the web [v1; ref status: indexed, http://f1000r.es/599] F1000Research 2015, 4:110
(doi: 10.12688/f1000research.6354.1)
Copyright: © 2015 Celli F et al. This is an open access article distributed under the terms of the Creative Commons
Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the
original work is properly cited. The author(s) is/are employees of the US Government and therefore domestic
copyright protection in USA does not apply to this work. The work may be protected under the copyright laws of
other jurisdictions when used in those jurisdictions. Data associated with the article are available under the terms
of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).
Grant information: The work described in this article was partly funded by the EC project “agInfra: A data
infrastructure to support agricultural scientific Communities”, Grant agreement no: 283770.
Competing interests: No competing interests were declared.
First published: 08 May 2015, 4:110 (doi: 10.12688/f1000research.6354.1)
First indexed: 18 Aug 2015, 4:110 (doi: 10.12688/f1000research.6354.1)
Latest published: 08 May 2015, 4:110 (doi: 10.12688/f1000research.6354.1)
F1000Research 2015, 4:110 Last updated: 18 AUG 2015
1.0 Introduction
In the last decade, Semantic Web technologies have changed the way
structured data is published, shared and consumed on the Web. The Web
has become a powerful bedrock that emerging online applications use as
an infrastructure to exchange, query and link semantically related data
and information [1]. To take advantage of the emerging Web, many
repositories have adopted linked data principles, making the vision of a
semantic Web of data a reality [2]. Over time, two important roles of
linked open data (LOD) have emerged: consuming and publishing data,
thereby facilitating innovation and wider knowledge creation and
sharing [3]. The principles of linked data have been extensively
described in publications and books [4,5]. The research community, the
Semantic Web community and policy makers still face challenges in
browsing, analyzing, reusing and consuming linked data. The major
fallacy [1] of these emerging technologies is the assumption that data
repositories and entity resolution services are always online and
available.
In the agricultural domain, the Agricultural Information Management
Standards (AIMS) Team of the Food and Agriculture Organization of the
United Nations (FAO) has taken advantage of the possibilities of LOD to
make agricultural data, information and knowledge accessible.
Often-cited examples include the publication of AGROVOC
(http://aims.fao.org/vest-registry/vocabularies/agrovoc-multilingual-agricultural-thesaurus)
as a linked data set [6], and the AGRIS database and application [7].
AGRIS (the International System for Agricultural Science and
Technology) is an initiative that was set up in 1974 by the FAO to make
agricultural research information discoverable and globally available.
Since then, AGRIS has been collecting data from more than 150 data
providers located in more than 65 countries. AGRIS collects and
disseminates bibliographic information on scholarly and scientific
publications in agriculture and related subjects.
AGRIS today is a ‘global public good’ [8], built and maintained by a
large community of data providers, partners and users. This rests on two
overarching principles. Firstly, AGRIS grants complete access to its
core data: users are allowed to download and use the content subject to
an acceptable use policy (http://agris.fao.org/content/acceptable-use-policy).
Secondly, users are invited to contribute ideas on the development of
AGRIS and its vision through e-consultations, stakeholder meetings, user
surveys and feedback.
This paper briefly reviews the recent developments in AGRIS and outlines
the latest technical implementations. Its objective is to show the
responsiveness of AGRIS to the needs of its community (its clients) and
to review the steps leading to the technical development and future
direction of AGRIS.
2.0 The AGRIS mashup
Since December 2013, AGRIS has exposed its database as LOD, defining
uniform resource identifiers (URIs) for bibliographic publications and
allowing anyone to reuse the database, including through a SPARQL
endpoint. After an initial period in which LOD opportunities were tested
in the OpenAGRIS system [9], the AGRIS team decided to adopt LOD
standards in the deployed system. The goal was to take advantage of the
latent knowledge available in the AGRIS data in order to automatically
discover and display related and relevant information from the Internet.
AGRIS seeks to become the prime information service for agricultural
research, where domain experts, agricultural extensionists, students,
researchers, librarians/information managers and decision makers can
discover the information they need with good precision and recall and
within a short response time.
When a user searches for a publication, the AGRIS system enriches the
user’s query by displaying a mashup page with related information
available on the same topic. To achieve this, AGRIS adopted a dual
approach that allows users to access agricultural information through:
- Bibliographic metadata in the domain of agricultural science
and technology, stored in a central database that currently
holds nearly 8 million bibliographic references of scholarly
and scientific publications.
- Other types of information (distribution maps, passport data,
pictures, other bibliography, etc.) that are interlinked to the
AGRIS central database.
Two things are crucial to building a useful mashup page: the selection
of the data sources and the precision of the automatically extracted
resources. Precision in this context means that the displayed
resources [10] must have the same subject coverage as the article
selected by the user. In AGRIS this is possible through AGROVOC [6],
a Simple Knowledge Organization System (SKOS) concept scheme used to
index the AGRIS database. AGROVOC brings additional value as a thesaurus
consisting of more than 32,000 concepts, available in 21 languages and
covering all areas of interest to the AGRIS database. AGROVOC is
therefore the backbone of the resource discovery process: AGRIS records
(which are indexed with AGROVOC concepts) are used to query external Web
services (e.g. by scientific names) and SPARQL endpoints by using
AGROVOC URIs or alignments with other thesauri related to agriculture.
External data sources are selected based on their content, their
relevance to the AGRIS domain, and an evaluation of the information
provider [11].
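As an illustration of how an AGROVOC URI could drive a query against a SPARQL endpoint, the following minimal sketch builds a query that lists concepts aligned with a given concept. It is not the AGRIS implementation: the function name, the example URI and the choice of `skos:exactMatch` and `skos:closeMatch` as the alignment properties are assumptions made for illustration, and no endpoint is contacted.

```python
from string import Template

# Standard SKOS namespace. skos:exactMatch and skos:closeMatch are the
# usual SKOS properties for alignments between thesauri; whether AGRIS
# queries exactly these properties is an assumption of this sketch.
SKOS = "http://www.w3.org/2004/02/skos/core#"

ALIGNMENT_QUERY = Template("""\
PREFIX skos: <$skos>
SELECT DISTINCT ?aligned WHERE {
  { <$concept> skos:exactMatch ?aligned }
  UNION
  { <$concept> skos:closeMatch ?aligned }
}
LIMIT $limit
""")


def build_alignment_query(concept_uri: str, limit: int = 50) -> str:
    """Return a SPARQL query listing concepts aligned with concept_uri."""
    # Reject characters that could break out of the <...> IRI delimiters.
    if any(ch in concept_uri for ch in '<>" \n'):
        raise ValueError("not a safe URI: %r" % concept_uri)
    return ALIGNMENT_QUERY.substitute(skos=SKOS, concept=concept_uri, limit=limit)


# Hypothetical concept URI, used only for illustration.
EXAMPLE_URI = "http://example.org/agrovoc/c_0000"
query = build_alignment_query(EXAMPLE_URI)
```

The resulting string could then be sent to any SPARQL endpoint that holds the alignments; validating the URI before substitution keeps the query well-formed.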
Figure 1 shows a mashup page that displays an AGRIS record selected by
the user, with some AGROVOC descriptors and URIs (left). Once AGRIS
loads the mashup page, it reads the list of AGROVOC URIs available
within the AGRIS record and runs asynchronous queries against external
Web services and SPARQL endpoints to get information related to the
content of the selected AGRIS record. In the screenshot, for the AGROVOC
concept “Oryza sativa”, AGRIS displays a distribution map from GBIF
(http://www.gbif.org; the Global Biodiversity Information Facility), as
well as some germplasm collecting missions from Bioversity International
(http://www.bioversityinternational.org/). AGRIS also pulls and
visualizes data from the World Bank, the CGRIS germplasm database and
the International Food Policy Research Institute (IFPRI). A full listing
of the external data sources AGRIS pulls from is available on the AGRIS
website (http://agris.fao.org/content/how-it-works).
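The asynchronous pattern described above can be sketched as follows. This is a simplified illustration, not the AGRIS code: the two sources are stand-in coroutines rather than real Web services, and all names and URIs are hypothetical. The point is that every source is queried concurrently and that a source which is offline leaves its widget empty instead of breaking the page.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple

# A source takes a concept URI and returns a list of related items.
Source = Callable[[str], Awaitable[List[str]]]


async def map_source(uri: str) -> List[str]:
    # Stand-in for a call to an external Web service (hypothetical).
    await asyncio.sleep(0)
    return ["map for " + uri]


async def offline_source(uri: str) -> List[str]:
    # Simulates an external service that is not reachable.
    raise ConnectionError("service unavailable")


async def query_source(name: str, source: Source, uris: List[str],
                       timeout: float = 5.0) -> Tuple[str, List[str]]:
    """Query one source for every URI, skipping failures and timeouts."""
    items: List[str] = []
    for uri in uris:
        try:
            items.extend(await asyncio.wait_for(source(uri), timeout))
        except (asyncio.TimeoutError, ConnectionError):
            continue  # an unavailable source must not break the page
    return name, items


async def build_mashup(uris: List[str],
                       sources: Dict[str, Source]) -> Dict[str, List[str]]:
    """Run all sources concurrently and collect results per source name."""
    pairs = await asyncio.gather(
        *(query_source(name, src, uris) for name, src in sources.items())
    )
    return dict(pairs)


def demo() -> Dict[str, List[str]]:
    uris = ["http://example.org/agrovoc/c_1", "http://example.org/agrovoc/c_2"]
    sources = {"maps": map_source, "offline": offline_source}
    return asyncio.run(build_mashup(uris, sources))
```

Calling `demo()` returns one entry per source; the entry for the unreachable source is an empty list.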
3.0 AGRIS recent developments
AGRIS is constantly evolving to provide its users with valuable new
services and many new sources of information to explore. The development
of AGRIS, on both the service and data sides, is mainly driven by AGRIS
users, who can provide feedback, ideas and needs in different ways:
using the “feedback” form available on the AGRIS Web site; responding to
periodic surveys designed by the AGRIS team; sending emails to the AGRIS
team; or joining online events such as AIMS Webinars
(http://aims.fao.org/capacity-development/webinars) or AGRIS
e-consultations. All this feedback is collected and analyzed to define
priorities and provide new services to the community. For instance,
after the adoption of a linked data infrastructure (when AGRIS and
OpenAGRIS were merged at the beginning of 2014), the AGRIS team prepared
an online survey and Webinars to collect feedback about the new AGRIS
Web application. Two activities were considered top priorities to
improve the service:
1. Inclusion of all the available bibliographic metadata in the
AGRIS mashup page;
2. Multilingual search with the possibility to get results in
several languages when searching with keywords in a specific
language.
The first activity was carried out because, even though the AGRIS Web
site mashes up many sources of information to provide its users with a
good browsing experience, many AGRIS users value access to the complete
bibliographic metadata set. Thus, the mashup view was extended to
include all available bibliographic metadata, and the advanced search
functionality (the “classical view”) was re-introduced to allow results
to be filtered by specific metadata elements.
In the second activity, the multilingual search, the objective was to
allow users to query the AGRIS database in their own native language and
to retrieve results in different languages. This is exemplified by the
following use case:
Xian is a Chinese researcher who wants to discover some
knowledge from the AGRIS database. He wants to know
more about “rice” and the recent research activities
surrounding it, but he prefers to query the database in
his own native language. So he starts by querying AGRIS with
the corresponding Chinese keyword. The AGRIS system finds only 14
documents: too few to warrant additional filters, and
all of them indexed with a Chinese keyword.
Xian wants to access the international literature, so he also
wants English articles. On the right side of the AGRIS interface,
Xian enables the multilingual search and clicks on “GO”:
150,000 results! Maybe now Xian has too many articles to
examine, but he can use other keywords to restrict the number
of results…
Figure 1. AGRIS mashup screenshot showing AGRIS bibliographic records pulling and visualising linked resources from external
data sources.
Figure 2. Additional bibliographic metadata available in the mashup page.
Figure 3. The CGRIS widget showing a record integrated from the Chinese Agricultural Scitech Documents Database. This allows
users to tell the data source of the main bibliographic data of the AGRIS article.
The multilingual search is very important to facilitate access to
literature in different languages; a future improvement of this feature
will offer the possibility of selecting subsets of languages to be
included in the output of a query. The implementation of this feature
relies on AGROVOC and on the AGRIS linked open data infrastructure.
While AGRIS records are indexed with AGROVOC keywords in a specific
language, their conversion to the Resource Description Framework (RDF)
makes AGROVOC URIs usable. From an AGROVOC URI it is possible to extract
the labels of a concept in all the languages available in AGROVOC; those
labels can be considered “translations” of a query term, so they can be
used to expand the user’s query to include the term in different
languages. More precisely, the implementation of the multilingual search
feature required two activities:
- Indexing AGROVOC URIs in Apache Solr (http://lucene.apache.org/solr)
- Implementation of a software component that expands the
user’s query to match results in all languages available in
AGROVOC. The query expansion is transparent to the end user,
who does not need to know the technical details of this feature.
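The query expansion step can be sketched as follows. This is a minimal illustration under stated assumptions, not the AGRIS component: the label table is a toy in-memory stand-in for the labels attached to AGROVOC URIs, the concept URI is made up, and the Solr field name `subject` is hypothetical.

```python
from typing import Dict, List

# Toy label store: concept URI -> {language code: label}.
# The URI is illustrative and is not a real AGROVOC identifier.
LABELS: Dict[str, Dict[str, str]] = {
    "http://example.org/agrovoc/c_rice": {
        "en": "rice",
        "fr": "riz",
        "es": "arroz",
        "it": "riso",
    },
}


def find_concepts(term: str, lang: str) -> List[str]:
    """Return URIs of concepts whose label in `lang` equals `term`."""
    wanted = term.strip().lower()
    return [
        uri for uri, labels in LABELS.items()
        if labels.get(lang, "").lower() == wanted
    ]


def expand_query(term: str, lang: str, field: str = "subject") -> str:
    """Build an OR query over the labels of every matching concept."""
    labels = {term.strip()}  # always keep the user's own term
    for uri in find_concepts(term, lang):
        labels.update(LABELS[uri].values())
    clauses = [
        '%s:"%s"' % (field, label.replace('"', '\\"'))
        for label in sorted(labels)
    ]
    return " OR ".join(clauses)
```

For example, `expand_query("riz", "fr")` yields `subject:"arroz" OR subject:"rice" OR subject:"riso" OR subject:"riz"`, while a term with no matching concept is passed through unchanged as a single clause.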
Another improvement was the inclusion of Chinese research content in the
AGRIS database: a large amount of Chinese metadata was directly
interlinked to the AGRIS database and displayed in the mashup pages. In
the context of AgINFRA (http://aginfra.eu/) and the collaboration
between AGRIS and the Chinese Academy of Agricultural Sciences
(http://www.caas.cn/en/administration/research_institutes/research_institutes_beijing/77772.shtml),
500,000 resources from the Chinese Agricultural Sci-tech Documents
Database (CASDD) and 410,000 resources from the CGRIS germplasm database
were exposed as Web services and exploited as AGRIS external data
sources, relying on the formal alignment of AGROVOC with the Chinese
Agricultural Thesaurus (CAT). The outcome of this activity was the
inclusion of a large batch of Chinese agricultural research in the AGRIS
system, together with a unique collection of all types of plant genetic
resources information from China, enriching the AGRIS knowledge base.
4.0 AGRIS data ingestion
AGRIS is supported by a community of data providers, partners and users.
AGRIS ingests bibliographic metadata provided by the community and
publishes it as open data; the metadata is captured either (i) by
pulling data from clients through harvesting or (ii) by clients pushing
data to AGRIS [9]. AGRIS uses various tools and technologies to consume
metadata from content providers and accepts any metadata records that
meet the Meaningful Bibliographic Metadata (M2B) standards. AGRIS’s data
providers are international and are often at varying stages of
technological development. Figure 4 summarizes the AGRIS data workflow,
ingestion and processing.
The resultant AGRIS content is exposed via the AGRIS Web
application which is a mashup application that allows users
to query the AGRIS content, interlinking all records to external
sources of information. (See Figure 1 above and section 2.0 for
more details).
Figure 4. AGRIS dataflow and processing summarising different AGRIS data ingestion workflows and processing [9].
5.0 AGRIS community needs
AGRIS’s vision is to be a service that responds to the needs of its
global users by facilitating their contribution to the AGRIS core
knowledgebase, AGRIS’s future and its continuous development. Since
2014, FAO, Agro-Know (http://www.agroknow.gr/agroknow/) and the
Agricultural Information Institute of the Chinese Academy of
Agricultural Sciences (CAAS)
(http://www.caas.cn/en/administration/research_institutes/research_institutes_beijing/77772.shtml)
have collaborated on the maintenance and centralization of AGRIS data
processing. The collaboration is keen to keep AGRIS a community-driven
product that responds to the needs of its clients. The AGRIS team is
committed to treating AGRIS visitors and data providers as clients who
contribute to the continuous development of AGRIS. In pursuit of this
goal, periodic AGRIS stakeholder meetings, AGRIS e-consultations in the
form of online surveys, and user feedback inform the development of the
AGRIS application and the coverage of the knowledgebase. Four thematic
areas of focus have emerged since the initial discussions: (1) AGRIS
subject coverage; (2) geographical accessibility of the system;
(3) improvement in user interactions and multilingualism; and
(4) strengthening the infrastructural backbone of AGRIS.
In considering these thematic areas, AGRIS partners agreed to map out a
strategy for each area, the output of which will drive technical
developments, new functionalities and usability features. The
involvement of the AGRIS community of users and further collaboration on
technical developments will be invaluable in strengthening and
developing new functionalities for the AGRIS portal. Feedback received
from the community of data providers, partners and users is important
for possible improvements to the AGRIS portal and the knowledgebase.
Furthermore, AGRIS has been involved in a number of projects with the
European Commission. For example, within the SemaGrow
(http://www.semagrow.eu/) project, AGRIS served as a demonstrator of a
technical infrastructure based on the federation of many triple stores;
relying on the two backend components AgroTagger
(http://aims.fao.org/vest-registry/tools/agrotagger) and Web Crawler,
AGRIS will be able to crawl the Web and to index discovered resources
with AGROVOC URIs.
The AGRIS maintenance partners sought to engage the broader community in
the further technical development of AGRIS in the key thematic areas
outlined above. The following issues emerged in these four areas:
AGRIS content coverage
In terms of subject coverage, the AGRIS database collects bibliographic
references in agriculture as defined by the FAO, which includes
nutrition, forestry and fisheries. Since its name defines AGRIS as an
international system for Agricultural Science and Technology, technology
could also be included. The full list of subject categories can be
downloaded online (http://www.fao.org/scripts/agris/c-categ.htm); the
categories will be revised in the coming months so that the list can
cope with the increased subject coverage requirement.
In terms of content, AGRIS core data initially focused on grey
literature and later came to include papers, reports and other content
types. The partners felt that the AGRIS backbone data should continue to
be bibliographic metadata, but that linked data technologies should be
fully exploited to allow the inclusion of other relevant content types.
To further develop the coverage of AGRIS content and to prevent
stagnation, the AGRIS team aims to work out a new, adequate subject
scope for the AGRIS knowledgebase and to discover new sources of
information and data in collaboration with community partners. There are
possibilities of linking AGRIS with science blogs and automatically
updated feeds (for example, http://esciencenews.com and other feeds from
scientific presses and universities), and of further strengthening the
relationship between AGRIS and AgriFeeds (http://www.agrifeeds.org/).
Using data mining, the AGRIS database could become a way to access
already existing information. In the AGRIS e-consultation, users
expressed their demand for additional data such as statistics,
multimedia, price data and daily crop prices. The user survey also
underlined the high demand for access to full-text resources. The AGRIS
team has already responded by implementing the mashup page, which allows
linking to full-text resources on the Internet. The AGRIS team is aware
of the potential in identifying relevant content to interlink with AGRIS
core data
(http://aims.fao.org/activity/blog/aginfra-promotes-integration-biodiversity-information-agris).
The work on providing even more full-text links and resources will
continue, with the possibility of enriching AGRIS metadata with newly
discovered full-text links and of setting up a link-checking mechanism
to remove broken links. AGRIS will also need author disambiguation; an
initial option could be to use unique author identifiers, for example
the scientific profiles of AgriVIVO
(http://aims.fao.org/vest-registry/tools/agrivivo). Another interesting
activity will be the analysis of AGRIS full-text links to extract
relevant information, such as a database of pictures indexed with
AGROVOC, which would help to enrich the content of a specific paper and
allow the re-use of pictures for personal reports or research
activities, subject to copyright.
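The link-checking mechanism mentioned above could take roughly the following shape. This is a hedged sketch using only the Python standard library, not a description of what AGRIS deployed: the function names are hypothetical, and a production checker would also need retries, rate limiting and a fallback for servers that reject HEAD requests.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List, Tuple


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a HEAD request, or 0 if unreachable."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, e.g. with 404 or 500
    except (urllib.error.URLError, OSError, ValueError):
        return 0  # DNS failure, refused connection, malformed URL, ...


def split_links(links: Iterable[str],
                check: Callable[[str], int] = http_status
                ) -> Tuple[List[str], List[str]]:
    """Partition links into (working, broken) according to `check`."""
    working: List[str] = []
    broken: List[str] = []
    for url in links:
        status = check(url)
        (working if 200 <= status < 400 else broken).append(url)
    return working, broken
```

Passing the status function as a parameter keeps the partitioning logic testable without network access; the broken list is what a clean-up job would remove from the metadata.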
Geographical accessibility of the system and multilingualism
Although AGRIS can be accessed from anywhere in the world, performance
has been noted to be poor in some regions, especially in China and East
Asia. This might be because the front-end Web application is hosted only
in Rome. The AGRIS team is aware of this challenge and is looking for a
solution to geographical accessibility in collaboration with community
partners. In the meantime, the possibility of hosting two replicas of
the database on two continents is being considered. This solution will
require resources (such as a system administrator for each replica), a
synchronization mechanism and a networking mechanism to serve users
seamlessly from different places in the world.
The user survey, which registered 279 respondents, confirmed the lack of
performance in some geographic regions. The overall feedback on
performance was positive: around 80% of the users rated the performance
of the web portal as extremely good or moderately good. Half of the
users who are not satisfied with the AGRIS performance come from Asian
countries. Several options to improve performance, especially for
countries in East Asia, are currently being discussed and need testing.
The AGRIS team must ensure that a better connection to Asia will not put
other regions at a disadvantage. The multilingual search is a recent
example of a client-demanded service that has already been implemented
(see recent developments in section 3.0 above).
Improvement in user interactions
The creation of a user registration facility is one of the main goals of
the AGRIS team and was demanded by the community. Both AGRIS partners
and users expressed their interest in a private area that allows the
creation of personal profiles and the customization of the AGRIS
interface (a step towards “social AGRIS”). Functions to define the
portal design and select preferred datasets in the mashup page are
possible, as is the addition of social functions such as comments,
ratings and quoting. The AGRIS team sees potential in AGRIS sparking
debates and collaborations based on AGRIS content. Support for mobile
devices (e.g. smartphones and tablets) is another possible improvement
that has been demanded in the surveys and discussed with individual
community partners. In the survey, users regarded mobile device
accessibility as very important, with 40% of respondents wanting to read
AGRIS on a tablet and 24% wanting access to AGRIS on their smartphone.
Strengthening the AGRIS backbone
Strengthening the AGRIS backbone is important to provide a sustainable
system. Currently, the AGRIS database is replicated in two types of
models: (i) the AGRIS AP file system XML database and (ii) the AGRIS RDF
triplestore. The AGRIS team will cease to maintain the AGRIS AP XML
database and will design a new streamlined data model (most probably
based on AGRIS RDF and linked open data-enabled bibliographic data
(LODE-BD) (http://aims.fao.org/lode/bd)) that allows ingestion of data
directly into the triple store. One of the goals is to design a more
scalable and stable backend solution, such as a system that balances the
load across different instances of the AGRIS triplestore. In order to
involve the community, the AGRIS team will run experiments as part of
‘Hackathons’ where participants can try different triplestore solutions
by simulating AGRIS queries to the database.
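One simple way to balance load across several triplestore instances is round-robin selection with a health check, sketched below. The paper does not say which strategy AGRIS will adopt, so this is only one possible design; the class name, the endpoint names and the health-check callback are all hypothetical.

```python
import itertools
from typing import Callable, List


class RoundRobinBalancer:
    """Hand out endpoints in turn, skipping those reported unhealthy."""

    def __init__(self, endpoints: List[str],
                 is_healthy: Callable[[str], bool] = lambda endpoint: True):
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        self._endpoints = list(endpoints)
        self._cycle = itertools.cycle(self._endpoints)
        self._is_healthy = is_healthy

    def next_endpoint(self) -> str:
        # Try each endpoint at most once per call before giving up.
        for _ in range(len(self._endpoints)):
            candidate = next(self._cycle)
            if self._is_healthy(candidate):
                return candidate
        raise RuntimeError("no healthy triplestore instance available")


# Usage with made-up instance names; "store-b" is treated as down.
balancer = RoundRobinBalancer(
    ["store-a", "store-b", "store-c"],
    is_healthy=lambda endpoint: endpoint != "store-b",
)
picked = [balancer.next_endpoint() for _ in range(4)]
# picked == ["store-a", "store-c", "store-a", "store-c"]
```

Round-robin is the simplest policy to reason about in a hackathon setting; policies weighted by response time or query cost would follow the same interface.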
The feedback summarized above reflects the many needs and expectations
that clients have of AGRIS. A user-driven and responsive service is a
core part of the AGRIS vision and its continuous development. The move
to a ‘social AGRIS’ will ensure that the AGRIS community of partners,
data providers and users shapes the AGRIS service into the future. The
feedback collected by AGRIS points to a number of potential
enhancements, now and in the future, with experiments made possible by
the community (in the form of Hackathons). Redesign of the workflows and
the AGRIS architecture, alongside strategic collaboration with global
partners, are some of the evident processes and activities within the
AGRIS future vision.
6.0 Conclusion
AGRIS seeks to be a technological service that embraces linked open data
technologies while remaining relevant to its clients. This paper
establishes that AGRIS is a global public good: global in terms of data
contribution and access, and a public good in that it is built and
maintained by, and responds to, its community of partners, data
providers and users. AGRIS has been a bedrock for a number of semantic
tools (as exhibited by the SemaGrow project), yet it also provides a
gateway to scientific research in Agriculture, Science and Technology.
Semantic Web features have allowed AGRIS to continue to be ‘the’ portal
in Agriculture, Science and Technology for students, researchers and
policy makers, while constantly providing new and valuable services to
the community in a dynamic and changing world.
Author contributions
FC acted as the AGRIS technical lead and decided on the conceptual
content, was responsible for technical parts of this paper, contributed
section 2.0 and section 3.0 and was responsible for production of
all the images.
TM acted as corresponding author and coordinated inputs from all
sections as well as contributing section 1.0 and part of section 5.0
and section 6.0, including the literature search. TM also fine-tuned
the references and organization of the paper.
KW contributed to section 5.0, interpreted the survey results and
related materials and also contributed the whole AGRIS community
section.
IS provided editorial guidance and edited the paper for appropriateness
regarding F1000 guidelines and correctness in discussing
metadata-related aspects.
EK is responsible for AGRIS data processing and contributed to
section 5.0 by providing the analysis of the survey.
JK provided guidance on the title of the paper and the direction it
took, and took responsibility for this content from FAO’s perspective.
JK also reviewed the article and cleared it for publication.
Competing interests
No competing interests were declared.
Grant information
The work described in this article was partly funded by the EC
project “agInfra: A data infrastructure to support agricultural
scientific Communities”, Grant agreement no: 283770.
References
1. Gueret C, Boyera S, Powell M, et al.: The Semantic Web for all. 2014.
2. Gayo JEL, Kontokostas D, Auer S: Multilingual Linked Data Patterns. 2012.
3. Bauer F, Kaltenböck M: Linked open data: The essentials. A quick start
guide for decision makers. 2012.
4. Zuiderwijk A, Jeffery K, Janssen M: The potential of metadata for linked
open data and its value for users and publishers. JeDEM. 2012; 4(2):
222–244.
5. Zaveri A, Rula A, Maurino A: Quality assessment methodologies for
linked open data: A systematic literature review and conceptual
framework. 2012.
6. Caracciolo C, Stellato A, Morshed A, et al.: The AGROVOC linked dataset.
Semantic Web. 2013; 4(3): 341–348.
7. Anibaldi S, Jaques Y, Celli F, et al.: Migrating bibliographic datasets to
the Semantic Web: The AGRIS case. Semantic Web. 2015; 6(2): 113–120.
8. Rodríguez JM, Clement AJ, Farhan H, et al.: Publishing statistical data
following the linked open data principles: The web index project. In
Ordonez de Pablos, P. (ed). Cases on Open-Linked Data and Semantic Web
Applications. Spain: IGI. 2013; 28.
9. Celli F, Jaques Y, Anibaldi S, et al.: Pushing, Pulling, Harvesting, Linking:
Rethinking bibliographic workflows for the Semantic Web. EFITA-
WCCA-CIGR Conference, Turin, Italy, 24–27 June 2013. 2013.
10. Turpin A, Scholer F: User performance versus precision measures for
simple search tasks. In Proceedings of the 29th Annual international ACM
SIGIR Conference on Research and Development in information Retrieval.
2006; 11–18.
11. Jaques Y, Anibaldi S, Celli F, et al.: Proof and Trust in the OpenAGRIS
Implementation. Proc. Int’l Conf. on Dublin Core and Metadata Applications.
2012.
Open Peer Review
Version 1
Referee Report, 18 August 2015
doi:10.5256/f1000research.6813.r9982
Robert Davey
The Genome Analysis Centre, Norwich, UK
I, Robert Davey (TGAC, UK):
will sign my name to my review;
will review with integrity;
will treat the review as a discourse with you (in particular, I will provide constructive criticism);
will be an ambassador for the practice of open science.
The authors describe AGRIS, a metadata repository for an impressive cohort of bibliographic linked open
data in the agricultural sciences.
Managing large-scale data generative approaches ("big"/heterogeneous data) is only one side of the coin
in the modern research era. With so much data available now (and this is only going to get worse), we
need systems like AGRIS to make sense of the descriptions of data to ensure that researchers can find
and reuse findings more easily, and more importantly integrate them into their own work.
Whilst I too found the AGROVOC system to be a little slow and unresponsive, the AGRIS portal is
relatively fast, which shows the underlying power of the linked datasets, which the mashup interface
represents nicely. I did attempt to use the LOD Live portion of the site, but all of the resources I tried to
visualise came up with the "no resource endpoint configured" message. I'm not sure if this is a factor of the actual
resource itself, or an issue with the query mechanism. Could the authors give some working examples?
The paper mentions the use of Lucene, but are detailed technical documents about the software
implementations that power the website available? Likewise, is there any relevant source code that would
be suitable for release to the community? If so, links to this information might be useful.
The paper reads well, and gives some insight into how such a platform is built, assessed and used. The
ability to search for terms in multiple languages is something that is incredibly important, and should be
commended.
My only minor comments for revision would be that:
1. The term "mashup" might not be well understood by many. A short description or refactoring of the term might add some clarity.
2. I don't see the benefit of the 1.0, 2.0, etc section names. There are no subsections, so the article doesn't really need the ordered list style.
I have read this submission. I believe that I have an appropriate level of expertise to confirm that
it is of an acceptable scientific standard.
Competing Interests: No competing interests were disclosed.
Referee Report, 18 May 2015
doi:10.5256/f1000research.6813.r8691
Laura Privalle
Regulatory Science Seeds and Traits Innovation Center, Bayer CropScience LP R&D, Morrisville, NC,
USA
This is an informative article on the availability of a potentially very useful software tool of which I was not
previously aware. The article presents clear examples of how to use the tool and the value it brings to the
user. Others will be equally interested to learn of this website.
I have read this submission. I believe that I have an appropriate level of expertise to confirm that
it is of an acceptable scientific standard.
Competing Interests: No competing interests were disclosed.