Coding-Data Portability in Systematic Literature Reviews: a
W3C’s Open Annotation Approach
Oscar Díaz
University of the Basque Country
(UPV/EHU)
San Sebastián, Spain
oscar.diaz@ehu.eus
Haritz Medina
University of the Basque Country
(UPV/EHU)
San Sebastián, Spain
haritz.medina@ehu.eus
Felipe I. Anfurrutia
University of the Basque Country
(UPV/EHU)
San Sebastián, Spain
felipe.anfurrutia@ehu.eus
ABSTRACT
Systematic Literature Reviews (SLRs) are increasingly popular to
categorize and identify research gaps. Their reliability largely depends
on the rigour of the attempt to identify, appraise and aggregate
evidence through coding, i.e. the process of examining and organizing
the data contained in primary studies in order to answer the research
questions. Current Qualitative Data Analysis Software (QDAS) packages
lack a common format. This jeopardizes reuse (i.e. it is difficult to
share coding data among different tools), evolution (i.e. it is
difficult to turn coding data into living documents that evolve as new
research is published), and replicability (i.e. it is difficult for
third parties to access and query coding data). Yet, the result of a
recent survey indicates that 71.4% of participants (expert SLR
reviewers) are ready to share SLR artifacts in a common repository. On
the road towards open coding-data repositories, this work looks into
W3C's Open Annotation as the way to RDFize those coding data. Benefits
include: portability (i.e. W3C's prestige endorses the adoption of this
standard among tool vendors); webization (i.e. coding data becomes URL
addressable, hence openly reachable); and data linkage (i.e. RDFized
coding data benefits from Web technologies to query, draw inferences
and easily link this data with external vocabularies). This paper
rephrases coding practices as annotation practices where data is
captured as W3C's Open Annotations. Using an open annotation repository
(i.e. Hypothes.is), the paper illustrates how this repository can be
populated with coding data. Deployability is proven by describing two
clients on top of this repository: (1) a write client that populates
the repository through a color-coding highlighter, and (2) a read
client that obtains traditional SLR spreadsheets by querying
so-populated repositories.
CCS CONCEPTS
• Applied computing → Annotation; • Information systems → Data exchange; Ontologies; Mediators and data integration;
KEYWORDS
Secondary Studies, Web Annotation, Data portability
Publication rights licensed to ACM. ACM acknowledges that this contribution was
authored or co-authored by an employee, contractor or affiliate of a national
government. As such, the Government retains a nonexclusive, royalty-free right to
publish or reproduce this article, or to allow others to do so, for Government
purposes only.
EASE '19, April 15–17, 2019, Copenhagen, Denmark
© 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-7145-2/19/04. . . $15.00
https://doi.org/10.1145/3319008.3319025
ACM Reference Format:
Oscar Díaz, Haritz Medina, and Felipe I. Anfurrutia. 2019. Coding-Data
Portability in Systematic Literature Reviews: a W3C’s Open Annotation
Approach. In Evaluation and Assessment in Software Engineering (EASE
’19), April 15–17, 2019, Copenhagen, Denmark. ACM, New York, NY, USA,
Article 4, 10 pages. https://doi.org/10.1145/3319008.3319025
1 INTRODUCTION
Systematic literature reviews (SLRs) have been used extensively
in software engineering to identify clusters of related studies and
research gaps [15]. According to a recent review [1], one of the
most challenging steps during SLR elaboration is data extraction,
i.e. extracting the required data from the primary studies. Failing
to properly conduct data extraction might jeopardize the quality
of the mapping as a whole, as the reliability of its conclusions is
dependent on the quality of the mapping process used [6, 13]. This
data includes publication details (e.g., authors, year, title) whose
extraction can be automated [32]. But most importantly, this data
includes context descriptions (e.g., subjects, technologies, settings)
and findings (e.g., results, behaviors, actions) researchers need to
answer the SLR's research questions [9, 13]. This commonly requires
laborious coding.
Coding is "the process of examining and organizing the data
contained in each study of the systematic review" [9]. "It involves
identifying one or more passages in the text that exemplify the
same theoretical or descriptive idea" [9]. In Software Engineering,
SLRs tend to traditionally favor spreadsheets as the repository of
the coding effort. Spreadsheets tend to collect the code but not the
"passages in the text" that sustain the coding decision. This hinders
traceability, i.e. seeking to keep the link to the rationales that sustain
the mapping [23]. A step forward is the use of Qualitative Data
Analysis Software (QDAS) packages for SLR [21]. This software
collects most of the coding data (including the "passages in the text")
but usually in proprietary formats, hence hindering portability.
Attempts have been made to reach some common vocabulary (e.g.
the QuDEx Schema [8]), yet the situation, as recently as 2018, is
that "most QDAS packages still work with proprietary formats,
which makes it difficult, if not impossible, to import a whole project
created in one software package into another software package.
There are different export possibilities in current QDAS packages,
but as there is no common standard defined yet, the effectiveness
is diverse" [11]. This limitation locks researchers into a particular
tool and the functionality it provides, or makes them avoid using a
tool altogether because of the lack of data portability [1]. This might
partially explain why spreadsheets are still the predominant tool for
SLRs [27]. The bottom line is that the lack of a common format, and hence
of open, central coding repositories, hinders SLR development in
different aspects:
• Reuse. Researchers other than the authors cannot explore
the underlying extracted information to answer questions
related to their specific research goals [1]. Yet, the result of
a recent survey indicates that 71.4% of participants (expert
SLR reviewers) are ready to share SLR artifacts in common
repositories [19].
• Evolution. Storage of data in a central repository will facilitate
the goal of making SLRs into living documents that can
evolve as new research is published [1].
• Replicability. Readers are deprived of accessing and querying
the coding data. Yet, practitioners believe open repositories
will "enable people to double check how one achieved
the reported results" [19].
Making coding data portable rests on the existence of standards
and standardization bodies. This work studies the use of the W3C's
Web Annotation recommendations (hereafter referred to as "Open
Annotation" or "Web Annotation") for this purpose. This W3C
initiative seeks to standardize the wide range of proprietary formats
available in the market of annotation tools [14]. A Web Annotation
is an online annotation associated with a web resource, such as a
web page or document.

W3C's Web Annotation Data Model and Vocabulary provide a
common description model and format to enable annotations to be
shared among systems [30]. The aim: unleashing annotations from
the tools' silos, promoting portability and open access. For our
purposes, however, there is another aspect even more appealing:
W3C's annotations are Web resources, i.e. URL addressable. In
pragmatic terms, highlighting a paragraph (a kind of annotation)
turns that paragraph into a URL-addressable resource so that users
other than the author can access it. If we manage to capture coding
data in terms of the W3C's Vocabulary, we will not only facilitate
portability but also turn coding data into Web resources amenable
to being located, reused and extended by third parties. Our effort is
then aligned with Al-Zubidy et al.'s plea: "methods that could both
reduce the involved effort and guarantee the quality of SLRs are
in great need" [1]. Our research question is then: can W3C's Web
Annotation accommodate SLR coding practices?
Hence, this paper's main contributions include:
• supporting coding practices as annotation practices along
the W3C's Web Annotation recommendations (Section 4);
• validating the deployability of the approach, i.e. the extent to
which coding-driven Web annotations can accommodate existing
coding practices, i.e. coding through color-coding highlighting
& reporting through spreadsheets. To this end, Google
Spreadsheet (GSheet) is turned into a client of Web Annotation
repositories. The GSheet extension periodically queries
the repository and updates the cells accordingly. No user
interaction is required (Section 5).
We start by motivating the interest of turning coding data into Web
resources.
2 FROM TAGS TO WEB RESOURCES:
MOTIVATING THE SHIFT
This Section motivates the interest of moving from tags to Web
resources for capturing the coding effort. Figure 1 (a) displays the
result of the coding effort as a spreadsheet. Cells hold the result
of the classification process, i.e. the codes, but not the classification
rationales, i.e. quotes from primary studies sustaining this
classification. This normally requires going back to the PDFs. On
the QDAS front, Figure 1 (b) shows the case for NVivo. Targeting
grounded theory studies, NVivo collects not just the coding result
but also traces back to the case studies (the counterpart of primary
studies in grounded theory). Notice however that codes are just tags,
i.e. labels for assigning units of meaning that might be the result of
some agreement, but where the rest of the information (i.e. paper,
reviewer, quote, ...) is stored in dedicated databases. By contrast,
our proposal is to capture coding data as Web resources along the
W3C's annotation vocabulary. Rationales are three-fold: portability,
webization and data linkage.
Portability. W3C's prestige endorses the adoption and continuity
of this normalization effort. Open standards reduce total
costs and increase returns on investment through the following
benefits: interoperability, vendor neutrality, lower and manageable
risk, robustness and durability, or increased available skills, to name
a few [7].
Webization. Nowadays, the majority of academic publications
as well as the grey literature (e.g. blogs, company reports, etc.) are
embodied as univocally-identifiable Web resources (a Web resource
being any identifiable thing using a Uniform Resource Identifier (URI)
or an Internationalized Resource Identifier (IRI)). If primary
studies have a unique ID, why not the codings made on top of
these studies? After all, codes are metadata in the same sense as
the author or publication venue, and hence it can be argued that
they should also be permanently associated with the document.
W3C Open Annotation offers the means to achieve this [29]. In
this way, coding data is not just values but Web objects. That is,
they are URL addressable. This is most important for this information
to be easily reachable, and hence, inspectable. To better
grasp the importance of this fact, let's use an example. Click on
https://rebrand.ly/ease19quoteExample to see Open Annotation
at work. The highlighted quote might serve to sustain a coding
decision. Rather than keeping it as a highlight embedded into the
PDF document, Web Annotation permits researchers to create an
address for that quote and a link that points to it. Notice that primary
studies and annotations are two different though linked Web
objects. Software clients can then display annotations as a Web
layer on top of document visors.
Figure 1: Coding results in spreadsheets (a) and NVivo (b)

Data linkage. Web Annotations are described using semantic
technologies (i.e. RDF) [24]. If coding data is captured in RDF, software
clients can resort to Web technologies (OWL, SKOS, SPARQL,
etc.) to query, draw inferences and easily extend annotation repositories
[16]. DBPedia provides an inspiring example. DBPedia turns
Wikipedia data into RDF tuples. However, the importance of this
repository is not only that it includes Wikipedia data, but also that it
incorporates links to other datasets on the Web. Those extra links (in
terms of RDF triples) permit reaching related descriptions found in
other datasets. Likewise, coding repositories become linked datasets
[5, 16] where primary studies (as entities) are gradually enriched
with classification data coming from diverse SLRs, also embodied
through annotation repositories. But this enrichment is not limited
to SLRs. Other datasets can be used. For instance, journals, authors
or categories can be extended with links to DBPedia that provide a
common understanding about these entities. For instance, the URL
http://dbpedia.org/page/Systematic_review can be used to denote
the notion of "systematic review".
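To make the data-linkage claim concrete, the following is a minimal sketch, in Python with the rdflib library, of how RDFized coding data could be queried with SPARQL. The ex: namespace and the one-annotation dataset are illustrative stand-ins, not the paper's actual repository:

```python
# Minimal sketch: querying RDFized coding data with SPARQL via rdflib.
# The ex: namespace and the tiny in-memory dataset are illustrative only.
from rdflib import Graph

TURTLE = """
@prefix oa:      <http://www.w3.org/ns/oa#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex:      <http://example.org/> .

ex:anno1 a oa:Annotation ;
    oa:motivatedBy  oa:classifying ;
    oa:hasTarget    ex:primaryStudy1 ;
    oa:hasBody      ex:annoCodeEvaluationResearch ;
    dcterms:creator ex:reviewer1 .
"""

QUERY = """
PREFIX oa: <http://www.w3.org/ns/oa#>
SELECT ?paper ?code WHERE {
  ?anno oa:motivatedBy oa:classifying ;
        oa:hasTarget ?paper ;
        oa:hasBody   ?code .
}
"""

g = Graph()
g.parse(data=TURTLE, format="turtle")
for paper, code in g.query(QUERY):
    print(f"{paper} classified as {code}")
```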
At this point, a reasonable doubt might arise about whether researchers
would be obliged to annotate Web pages rather than their
traditional PDF counterparts. The answer is no. A primary study
(e.g. in PDF format) can be located in multiple locations: digital
libraries on the Web (e.g. ScienceDirect), on a server shared among
the reviewer team, or locally kept on the reviewer's computer. No
matter the PDF copy being annotated, to support reuse, all copies
must be identified unequivocally as the same document. To ensure
univocal reference, W3C resorts to Internationalized Resource
Identifiers (IRIs). IRI examples include the Digital Object Identifier
(DOI), the PDF fingerprint (the PDF standard defines a unique
identifier, or "fingerprint", that authoring tools encode into the PDFs
they create), or even URLs. The W3C annotation data model permits
combining more than one of those identifiers. As annotations are
identified using IRIs, it is possible to map as the same object a
locally-hosted PDF document and the paper published in a digital
library. Some Web Annotation repositories (e.g. Hypothes.is) have
already implemented mapping mechanisms between DOIs, URNs and
URIs to make annotations identifiable and accessible independently
of how and where they were created [28]. This boils down to:
researchers annotating different PDF copies of the same primary
study can be sure that annotations will point to the very same
resource. If those annotations reside in the same repository, then
dedicated PDF visors should show all annotations regardless of the
copy in which the annotations were initially made!
3 W3C’S WEB ANNOTATION
RECOMMENDATIONS
Web annotations are "a recreation of traditional annotations (such
as marginalia and highlights) as a new layer of interactivity and
linking on top of the Web" [29]. Web annotations are online annotations
associated with a Web resource (e.g. a primary study) which
are typically used to convey information about that Web resource
(e.g. a highlight, a comment, a tag, a revision). In 2017, the W3C
published a set of recommendations to enable annotations to be
shared between systems [29]. These recommendations include the
data model (describing the model and serialization), the vocabulary
(underpinning classes, predicates and named entities) and the protocol
(describing mechanisms for annotation creation and management).
Specifically, the Open Annotation model (OA) defines annotations
as Web resources that hold two main predicates: "oa:hasTarget" as
the annotated Web resource (e.g. a Web page); and "oa:hasBody" as
the content of the annotation (e.g. a comment that qualifies that
Web page).

But annotations are not always about whole documents but about
parts of them. For instance, highlighting is an annotation made on
specific paragraphs of the document. To single out those paragraphs
as the target of the annotation, W3C provides a mechanism called
Selector. Figure 2 provides an example: "oa:hasBody" stands for the
comment "myComment"; "oa:hasTarget" points to an oa:SpecificResource
which is pinpointed through the quote "annotation" that
appears in source webpage1. In the example, the way to single
this quote out is by indicating the text that precedes ("this is an")
and follows ("that has") the text paragraph that is the focus of the
annotation.

Figure 2: A Web annotation sample using W3C's data model.

In addition, W3C provides properties to indicate the annotation's
provenance ("dcterms:creator", where dcterms identifies the namespace
of the Dublin Core Schema, a set of vocabulary terms to describe
digital or physical resources), when the annotation was created
("dcterms:created"), and the reasons why the annotation was created
("oa:motivatedBy"). W3C includes a predefined list of motivations,
which it is possible to extend with new, more precise motivation
definitions. The next section resorts to this capability to account
for SLR coding.
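For illustration, the Figure 2 annotation could be serialized along the W3C JSON-LD format roughly as follows. This is a sketch: the IRIs, date and "commenting" motivation are assumed for the example; only the selector texts and body come from the figure.

```python
# Sketch of the Figure 2 annotation in the W3C Web Annotation JSON-LD format.
# IRIs and date are invented; the TextQuoteSelector singles out the quote
# "annotation" by the text that precedes and follows it.
import json

annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    "motivation": "commenting",                       # assumed: the body is a comment
    "creator": "http://example.org/users/reviewer1",  # dcterms:creator
    "created": "2019-04-15T10:00:00Z",                # dcterms:created
    "body": {"type": "TextualBody", "value": "myComment"},
    "target": {                                       # an oa:SpecificResource
        "source": "http://example.org/webpage1",
        "selector": {
            "type": "TextQuoteSelector",
            "exact": "annotation",
            "prefix": "this is an ",
            "suffix": " that has"
        }
    }
}
print(json.dumps(annotation, indent=2))
```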
4 CAPTURING CODING DATA AS OPEN
ANNOTATIONS
Coding is “the process of examining and organizing the data con-
tained in each study of the systematic review. It involves identifying
one or more passages in the text that exemplify the same theoretical
or descriptive idea” [
9
]. Broadly, the output of coding is a set of
coding tuples: <paper, category, reviewer, code, quote, validation> that
account for the act of a reviewer classifying a given paper along
a certain code for the category at hand on the grounds of some
paragraphs or quotes found on this paper. In addition, data check-
ers might conduct the validation of the mapping decisions of the
reviewers [6, 26].
Coding tuples are not obtained in a single step but they are grad-
ually elaborated. Cruzes et al. identify the Integrated Approach as
the most relevant for SLR coding practices [9]. Here, codes can be
obtained bottom up from quotes in the primary studies (inductive
approach), but also codes can be readily available from previous
studies where codes are grouped into categories (deductive ap-
proach). Ideally, it is recommended to reuse existing categories as
this permits the comparability between studies [
23
]. Categories
for the “research approach” introduced in [
31
] are a case in point.
However, categories are not always available, and SLRs need to
frequently introduce their own classications [
23
]. In this case, a
3
dcterms: This alias identies the namespace of Dublin Core Schema. This schema
denes a set of vocabulary terms that can be used to describe digital or physical
resources.
“open coding” is being suggested [
22
]. At the onset, a set of para-
graphs are identied that are coded after the research question.In
this way, a number of quotes are obtained from a set of pilot studies.
These initial quotes need to be further elaborated till a set of codes
emerge that properly account for the distinct evidences found in
the pilot studies.
Previous paragraphs identify distinct activities that intertwine
during the gradual obtention of the <paper, category, reviewer, code,
quote, validation> tuples. First,
“codeBookDevelopment”
where
the codes are introduced (terminology along reference [
17
]). Sec-
ond, “categorization” where category codes are created by den-
ing links between codes. Third,
“classifying”
where the paper
is characterized along a code based on some quotes. And nally,
“assessing”
where the validation part is considered. This section
reframes those activities as annotation endeavors. To stick with OA,
dierences among those annotation eorts are captured as distinct
oa:motivatedBy values.
4.1 Classifying
Figure 3 illustrates the case of mapping primaryStudy1
with code annoCodeEvaluationResearch on the grounds of the "we
focus on empirical evaluation" quote. The motivation of the annotation
is set to oa:classifying, and this text segment is coded
by dcterms:creator reviewer1. Worth noticing, the code is not a
value but an annotation object itself. This moves us to describe the
codeBookDevelopment activity.
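A sketch of how this classifying annotation might be serialized follows; the IRIs are invented for the example, while the quote, code, creator and motivation are those of Figure 3:

```python
# Sketch of the Figure 3 classifying annotation. Note that the body is not a
# literal but the IRI of the code annotation, itself a Web resource.
classifying_anno = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    "motivation": "classifying",
    "creator": "http://example.org/users/reviewer1",
    "body": "http://example.org/annos/annoCodeEvaluationResearch",
    "target": {
        "source": "http://example.org/papers/primaryStudy1",
        "selector": {
            "type": "TextQuoteSelector",
            "exact": "we focus on empirical evaluation"
        }
    }
}
```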
4.2 codeBookDevelopment
Codes are also introduced through annotation. But now the target
of the annotation is not a primary study but a reputed document,
i.e. one that provides a definition for the code being introduced.
Figure 4 shows the case of the "Research Type" code
which annotates the Wieringa06Paper, specifically the text paragraph
where the word/definition appears. The motivation is set to
slr:codeBookDevelopment. This is an extension of the OA motivation
list. This extension is described at https://rebrand.ly/ease19ontology.

Figure 3: Classifying reframed as the process of annotating with an oa:classifying motivation.

Figure 4: codeBookDevelopment reframed as the process of annotating with an slr:codeBookDevelopment motivation.
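A sketch of the corresponding code-introducing annotation follows. The slr: namespace IRI and the elided definition quote are assumptions; only the code name, the target paper and the motivation come from Figure 4:

```python
# Sketch of the Figure 4 codeBookDevelopment annotation. The slr: prefix must
# be declared in the context; its IRI below is a placeholder for the extension
# published at https://rebrand.ly/ease19ontology.
codebook_anno = {
    "@context": ["http://www.w3.org/ns/anno.jsonld",
                 {"slr": "http://example.org/slr#"}],  # placeholder IRI
    "type": "Annotation",
    "motivation": "slr:codeBookDevelopment",
    "body": {"type": "TextualBody", "value": "Research Type"},
    "target": {
        "source": "http://example.org/papers/Wieringa06Paper",
        "selector": {
            "type": "TextQuoteSelector",
            "exact": "..."  # the paragraph defining the code (elided)
        }
    }
}
```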
4.3 Assessing
Good practices advise initial coding data to be revised [15]. Annotation-wise,
revision can be considered a meta-annotation process
where reviewers annotate (inform) upon other annotations (classification
annotations). Figure 5 illustrates this situation: anno1 stands
for the Web object described in Figure 3; anno2 is a new annotation
that comments "comment1" (oa:hasBody) on top of anno1
(oa:hasTarget). This new annotation is conducted by reviewer2
(dcterms:creator) with the purpose of validating (oa:assessing)
reviewer1's anno1 annotation.
Figure 5: Assessing reframed as the process of annotating
with an oa:assessing motivation.
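The Figure 5 meta-annotation might be serialized as below (IRIs invented); the key point is that the target is another annotation rather than a document:

```python
# Sketch of the Figure 5 assessing (meta-)annotation: anno2's target is anno1,
# the classifying annotation of Figure 3, not a primary study.
assessing_anno = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "id": "http://example.org/annos/anno2",
    "type": "Annotation",
    "motivation": "assessing",
    "creator": "http://example.org/users/reviewer2",
    "body": {"type": "TextualBody", "value": "comment1"},
    "target": "http://example.org/annos/anno1"
}
```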
Figure 6: Categorization reframed as the process of annotating with an oa:linking motivation.
4.4 Categorization
For our purposes, categorization is the process of upgrading a code
into a category. This implies setting some structure among the code
set, which is captured as code relationships: codeA is a category for
codeB. Hence, codeA and codeB already exist, and categorization
is operationalized as setting a link from codeA (i.e. the category)
to codeB (i.e. the enclosed code). Figure 6 shows an example: the
annoCodeResearchType code is turned into a category by setting an
annotation where annoCodeResearchType is the oa:hasBody, and
annoCodeEvaluationResearch is the oa:hasTarget. Both codes should
already be defined during codeBookDevelopment. Here, we reuse
an existing W3C motivation: oa:linking.

Figure 7: Highlight&Go annotation production: classifying annotations. The PDF visor displays a primary study. The highlighter holds color-coding buttons.
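A sketch of such a categorization annotation follows (IRIs invented); both ends of the Figure 6 oa:linking annotation are existing code annotations:

```python
# Sketch of the Figure 6 linking annotation: annoCodeResearchType (body)
# becomes a category for annoCodeEvaluationResearch (target).
linking_anno = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    "motivation": "linking",
    "body": "http://example.org/annos/annoCodeResearchType",
    "target": "http://example.org/annos/annoCodeEvaluationResearch"
}
```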
5 EVALUATION
This Section aims at providing some evidence of the appropriateness
of the W3C's Web Annotation vocabulary for the practice of SLR
coding. SLR practice might divert from the use cases for which W3C's
Web Annotation was initially conceived, hence potentially jeopardizing
two key quality attributes of this ontology: fitness and
deployability [20]. (Other questions to assess ontology quality include:
Can humans understand the ontology correctly? (Intelligibility);
Does the ontology accurately represent its domain? (Fidelity);
Is the ontology well-built and are design decisions followed consistently?
(Craftsmanship). Here, our approach sticks to W3C's vocabulary,
resorting to an extension only in a single case: slr:codeBookDevelopment.
This is a minimal extension that does not preclude the qualities
already exhibited by the W3C's vocabulary.) Hence, this evaluation
introduces two research questions:
• Does the deployed ontology meet the requirements of the
information system of which it is part? (Deployability)
• Does the representation of the domain fit the requirements
for its intended use? (Fitness)
5.1 Deployability
Deployability looks at the ontology as deployed software that is
part of a larger system. As a system example, we take highlighters,
a popular means to support coding in QDAS tools. Using color-coding,
codes are assigned to the quote being highlighted in the
study at hand. The question is the extent to which the W3C's ontology
can be integrated within highlighting software. To answer this question,
we develop Highlight&Go, a Chrome Web Extension that (1) collects
coding data obtained via a dedicated highlighter, and (2) transparently
displays coding data through a Google spreadsheet (GSheet).

Figure 8: Highlight&Go annotation production: checking annotations. The PDF visor displays an already-highlighted primary study. The highlighter holds buttons but only for the codes being used in the study at hand. The highlighter behaves as an index to locate highlights within the study.

All the data Highlight&Go stores are W3C annotation triplets. From
a Web-Semantic perspective, Highlight&Go could be considered
an RDF dataset endpoint. Highlight&Go is openly available at the
Chrome Web Store at https://rebrand.ly/highlightAndGo.
Section 4 introduces four different types of annotations to support
four coding activities: "oa:classifying" for annotations upon
primary studies; "slr:codeBookDevelopment" for annotations that
introduce codes; "oa:assessing" for annotations that revise existing
code mappings; and finally, "oa:linking" for annotations that turn
codes into categories. The next paragraphs revise how these activities
are supported in Highlight&Go.
Classifying mode (see Figure 7). This mode helps to map the
primary study on display: select the text paragraph that supports
the code decision, then the highlighter shows up for reviewers
to click on the corresponding code button. The paragraph will
be accordingly colored, and the corresponding code button will be
profiled to denote this code decision. Figure 7 shows the case for the
SPLEMMA paper. Mapping has been conducted for the Asset_type
and Evolution_activity categories, to which the codes products and
Implementation_change have been assigned, respectively.
Checking mode (see Figure 8). When in this mode, data checkers
can validate the mappings so far. Now, the highlighter behaves
as a kind of index on top of the set of highlights conducted for
the paper at hand, no matter the reviewer. First, checkers can filter
out highlights based on the reviewer. In this way, they can know
who did what. Second, highlights are indexed by category: click on
a code button for the browser to scroll to the first paragraph that
endorses this code. At any moment, the data checker can either
validate the code (right click) or introduce his own code (by moving
to the classifying mode). In the example, the Asset_type mapping
is being validated.
CodeBookDevelopment mode (see Figure 9). Operationally,
this mode is similar to the "classifying" one. The difference
stems from the document being highlighted: it is not a primary study
but a reputed document, i.e. one that provides a definition for the
code being introduced (e.g. theories, protocols) [9]. Figure 9 describes
the process of obtaining codeBookDevelopment annotations out
of Wieringa's paper: (1) highlight quotes for the new code; (2)
push the "New code +" button to link the label "Research Type" with the
quote; (3) repeat the previous steps to create other codes, e.g. "Asset
type"; (4) create a link from "Research Type" to "Validation Research"
(i.e. turning "Research Type" into a category) by highlighting a
"Validation Research" quote, but this time pushing the "Research Type"
button instead of "New code +". If SLR authors are introducing the
code themselves, the annotation can be made upon the authors' draft SLR
document (insofar as it is available online, e.g. kept on Google Drive
or Dropbox).

The bottom line is that the trace of coding activities is kept as
RDF annotation triplets, i.e. an annotation dataset. Being described
along the W3C annotation standard, these triplets can now be
merged, ported or displayed using other standard-compliant client
tools.
5.2 Fitness
Fitness is commonly assessed in terms of competency questions, i.e.
data needs that evaluate whether the ontology is able to serve its
intended purpose [20]. For our purpose, the acid test is whether
SLR spreadsheets can be filled up from an annotation repository.
Can all rows, columns and cells be obtained from an annotation
dataset? To prove the point, Highlight&Go extends GSheet with
query functions on top of Hypothes.is repositories. On loading, this
extension pulls the associated dataset repository to check whether
new annotations have been introduced since the last sync, and
updates the spreadsheet accordingly.
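As a sketch of what such a read client amounts to, the snippet below polls the public Hypothes.is search API for a group's annotations. The group id and API token are placeholders, and the exact parameters Highlight&Go uses are not claimed here:

```python
# Sketch of a read client polling Hypothes.is for a group's annotations.
# <group-id> and <api-token> are placeholders for the reviewer team's values.
import requests

resp = requests.get(
    "https://api.hypothes.is/api/search",
    params={"group": "<group-id>", "limit": 200,
            "sort": "updated", "order": "desc"},
    headers={"Authorization": "Bearer <api-token>"},
)
for anno in resp.json()["rows"]:
    print(anno["updated"], anno["uri"], anno.get("tags", []))
```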
Figure 10 displays the rendering of the repository data at one
point during the coding process: (1) columns are obtained from
codeBookDevelopment annotations and linking annotations; (2) primary
studies are elicited from classification annotations; and (3)
cells are extracted from classification annotations. Cells can hold
either a code, if the column stands for a category, or a quote, when
the column is just a code not yet refined into a category. The former
is illustrated by Asset_type where codes are already available (e.g.
Products, Code_asset, SPL_architecture, etc.). On the other hand,
category Product_derivation_approach is still at an early stage of the
thematic analysis process, and hence it keeps paragraphs directly
extracted from the primary studies, on hold pending thematic
analysis.
Figure 9: Highlight&Go annotation production: code&category annotations. The PDF visor displays a reputed document. The highlighter holds just a single button "newCode".

Figure 10: Highlight&Go annotation production: extending GSheets with query functions upon annotation repositories. Cells point to Web resources, ready to be navigated upon. To see this example in action, join the Hypothes.is group (https://rebrand.ly/ease19hypothesis) and next, move to the GSheet (https://rebrand.ly/ease19sheet).

But the important point is that cells do not hold just strings
(i.e. tags) but Web resources. Therefore, the W3C recommendation
does not only provide a data model for annotation description
but conceptualizes annotations as Web resources, and hence,
as URL addressable. (These URLs should be static, i.e. they cannot
be dynamically generated on access. For instance, dynamic URLs are
generated by some content providers (e.g. ACM, IEEE) for their PDFs.
In this case, we have to resort to the DOI or HTML page. Alternatively,
researchers can obtain static URLs by storing PDFs in cloud storage
(e.g. Dropbox). Title cells can then link to the PDFs stored in
Dropbox.) Highlight&Go taps into this by enriching cell
content with the Web annotations' URLs. Readers are encouraged
to access the spreadsheet online at https://rebrand.ly/ease19sheet.
Click on a link for the annotation object to be displayed (Highlight&Go
taps into the Hypothes.is client to display annotation objects; in the
Hypothes.is displayer, annotations are shown within the container
document). Therefore, clicking on the hyperlinks will open the PDF
visor at the very position where the annotation was taken.
Finally, the cell layout is used to reflect the reviewing progress
status. To this end, Highlight&Go queries the repository for oa:assessing
annotations, and applies the rules in Table 1 (for mono-valued categories,
reviewers can only provide one code, so if two highlights exist they
must come from different reviewers; for multi-valued categories, the
predicates become a bit more complicated). Each result is mapped
to a color.

Table 1: Cell background colors in hypersheets. The right-hand column includes the predicate upon "H", i.e. the set of tuples <paper, category, reviewer, code, quote, validation>. Variables "p" and "f" hold the paper id and the category, respectively, that jointly pinpoint the cell at hand. Moreover, variables "c" and "v" hold the code contained in the cell and whether the coding is validated or not, respectively.

in-progress (white):
∃h ∈ H, ¬∃h' ∈ H : h.paper = p ∧ h.category = f ∧
((h'.paper = p ∧ h'.category ≠ f) ∨ (h'.paper ≠ p ∧ h'.category = f))

conflicting (red):
∃h, h' ∈ H : h.paper = p ∧ h.category = f ∧
h'.paper = p ∧ h'.category = f ∧ h.code ≠ h'.code

coinciding (yellow):
∃h, h' ∈ H : h.paper = p ∧ h.category = f ∧
h'.paper = p ∧ h'.category = f ∧
h.code = h'.code ∧ h.reviewer ≠ h'.reviewer

validated (green):
∃h ∈ H : h.paper = p ∧ h.category = f ∧ h.code = c ∧ h.validation = v

As an example, consider the SPLEMMA paper. From Figure 10,
checkers can grasp the mapping stage for this primary study: for
the Evolution Activity (white background), the study is still in the
process of being mapped; for the Research Type (red background),
disagreement arises; for the Product Derivation Approach (yellow
background), agreement is being reached; finally, for the Asset Type
(green background), the agreement has already been validated. In
this way, checkers can easily spot where their intervention is needed,
namely, to encourage participation (locate the white), to resolve
conflicts (locate the red) or to validate coinciding mapping outcomes
(locate the yellow). Click on the hyperlinks to be moved to the
reading realm and see the supporting quotes in context.
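The Table 1 rules can be read as the following sketch, assuming mono-valued categories and H represented as a list of tuple dictionaries (a simplification; the rule ordering below is our reading, not claimed to be what Highlight&Go literally evaluates):

```python
# Sketch of the Table 1 coloring rules for a cell (paper p, category f),
# assuming mono-valued categories. H is the set of coding tuples
# <paper, category, reviewer, code, quote, validation>.
def cell_color(H, p, f):
    cell = [h for h in H if h["paper"] == p and h["category"] == f]
    if any(h["validation"] for h in cell):
        return "green"    # validated: an agreed code has been validated
    codes = {h["code"] for h in cell}
    if len(codes) > 1:
        return "red"      # conflicting: reviewers assigned different codes
    reviewers = {h["reviewer"] for h in cell}
    if len(cell) > 1 and len(reviewers) > 1:
        return "yellow"   # coinciding: same code from different reviewers
    return "white"        # in-progress: at most one highlight so far
```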
To conclude, through a proof-of-concept example, this Section
sustains the case for the W3C annotation recommendation as an
expressive data model for SLR coding activities. Being backed by the
W3C, this standard is a firm candidate to alleviate the portability
weaknesses of current SLR tooling.
6 RELATED WORK
Dierent attempts have been made to increase portability of SLR
artifacts. Dierences can be drawn based on the vocabulary domain
(“what”) and the modeling approach (“how”) (see Table 2). As for
the former, main domains include:
coding data. Coming from the QDAS realm, QuDEx and
Atlas-XML provides vocabulary to collect coding data (sources,
codes, code categories, segments, memos and relationships
between them).
primary-study data. Ontologies have been proposed to cap-
ture dierent aspect of the content of research publications
(a.k.a. RDFizing papers): meta-data such as author, venue and
Table 2: Overview of dierent approaches to increase data
portability
Reference Domain Representational
Approach
QuDEx [8] coding data XML Schema
Atlas-XML [2] coding data XML Schema
EMSE [10] primary-study data RDF
SRMModel [3] procedural data Model-driven
SLRONT [32] procedural data RDF
LRMModel [3] primary-study data Model-driven
W3C’s Web
Annotation [this
paper]
coding data RDF
the like (e.g. LRMModel); experimental data (the EMSE ontol-
ogy); or structural and rhetorical concepts (e.g. the Annotea
Project [12] or DoCo [25]),
procedural data. SRMModel represents SLR constructs for
planning, execution and synthesis. In the same vein, SLRONT
captures concepts for the SLR protocol and structured ab-
stract of primary studies.
Proposals use dierent representational mechanism, namely:
Model-driven. SRMModel and LRMModel express SLR con-
structs through a meta-model to tap into model-driven tech-
nologies to account for platform heterogeneity.
XML Schema. This is the case of QuDEx and Atlas-XML.
Though XML facilitates machine processability, it falls short
to reect the semantics to interoperate with other ontologies
[4].
RDF semantic technologies. Here, data is captured as RDF
triplets. This permits to tap into semantic technologies to
query, draw inferences or link with third-party vocabularies.
Our approach diers from previous proposals in investigating a
dierent pair (coding data, RDF). Coding data is probably one of
the SLR artifacts more prone to be reused for inspectability and en-
richment purposes. W3C’s Web Annotation recommendations oer
the potential to facilitate reuse not only by promoting a common
vocabulary but also by turning coding data into a Web resource. Un-
like previous works, our focus was not so much on expressiveness
but about the pragmatics of coding, i.e. study to what extent cod-
ing practices can be operationalize in terms of an easy, well-know
practice: highlighting quotes.
7 CONCLUSIONS
SLR producers are willing to share their laboriously-obtained data.
SLR consumers demand access to this data to capitalize on and
revise it [19]. Yet, the lack of portability is making these wishes
elusive [1]. To face these demands, we advocate operationalizing
coding practice in terms of annotating Web resources. We resort
to W3C's Open Annotation recommendations. Specifically, the
property oa:motivatedBy serves to distinguish the different activities
involved during SLR coding: "slr:codeBookDevelopment" for introducing
new codes; "oa:linking" for setting category relationships
between codes; "oa:classifying" for classifying primary studies; and
finally, "oa:assessing" for revising existing mappings. This approach
is evaluated for deployability purposes, i.e. facility in obtaining
coding annotations & effectiveness in supporting reporting needs.
The former is tackled through a color-coding highlighter that obtains
Open Annotations out of highlighting-driven coding activities.
Second, GSheet is turned into a client of Open Annotation repositories.
The GSheet extension periodically queries the repository and
updates the cells accordingly. No user interaction is required.
As next steps, RDFizing brings about the opportunity of checking
the reliability of the coding. As the data scheme has a dcterms:creator,
it is possible to quickly check and calculate statistics (e.g.
Cohen's Kappa [18]) on the classification, showing the coding reliability
for multiple coders. In addition, since RDFizing is not limited
to the coding itself but extends to the meta-coding (i.e., codebook
elaboration), (meta)annotations stand for the "breadcrumbs" of the
decisions taken during coding (a kind of "journal" or "audit trail").
In this way, Web annotations can alleviate the lack of detail on
agreement about coding decisions (i.e. inter-rater reliability). This
turns annotation datasets into valuable online supplemental materials
to improve the soundness and replicability of SLRs.
ACKNOWLEDGEMENTS
This work is supported by the University of the Basque Country
under contract US17/13. Haritz Medina enjoys a grant from the
same University.
REFERENCES
[1] Ahmed Al-Zubidy, Jeffrey C. Carver, David P. Hale, and Edgar E. Hassler. 2017. Vision for SLR tooling infrastructure: Prioritizing value-added requirements. Information and Software Technology 91 (11 2017), 72–81. https://doi.org/10.1016/j.infsof.2017.06.007
[2] Atlas.ti. 2000. Atlas.ti XML Universal Data Export. (2000). https://atlasti.com/product/xml/
[3] Souvik Barat, Tony Clark, Balbir Barn, and Vinay Kulkarni. 2017. A Model-Based Approach to Systematic Review of Research Literature. In Proceedings of the 10th Innovations in Software Engineering Conference (ISEC '17). ACM Press, New York, New York, USA, 15–25. https://doi.org/10.1145/3021460.3021462
[4] Tim Berners-Lee. 1998. Semantic Web: Why RDF is more than XML. (1998). https://www.w3.org/DesignIssues/RDF-XML.html
[5] Christian Bizer, Tom Heath, and Tim Berners-Lee. 2011. Linked data: The story so far. In Semantic services, interoperability and web applications: emerging concepts. IGI Global, 205–227.
[6] Pearl Brereton, Barbara A. Kitchenham, David Budgen, Mark Turner, and Mohamed Khalil. 2007. Lessons from applying the systematic literature review process within the software engineering domain. Journal of Systems and Software 80, 4 (2007), 571–583. https://doi.org/10.1016/j.jss.2006.07.009
[7] Gian Calvesbert. 2014. The Benefits of Open Standards. (2014). https://www.air-worldwide.com/Blog/The-Benefits-of-Open-Standards/
[8] Louise Corti and Arofan Gregory. 2011. CAQDAS Comparability. What about CAQDAS Data Exchange? FORUM: Qualitative Social Research 12, 3 (2011), 1–18.
[9] D. S. Cruzes and T. Dyba. 2011. Recommended Steps for Thematic Synthesis in Software Engineering. In 2011 International Symposium on Empirical Software Engineering and Measurement. 275–284. https://doi.org/10.1109/ESEM.2011.36
[10] Fajar J. Ekaputra, Estefanía Serral, and Stefan Biffl. 2014. Building an empirical software engineering research knowledge base from heterogeneous data sources. In Proceedings of the 14th International Conference on Knowledge Technologies and Data-driven Business (i-KNOW '14). ACM Press, New York, New York, USA, 1–8. https://doi.org/10.1145/2637748.2638408
[11] Jeanine C. Evers. 2018. Current Issues in Qualitative Data Analysis Software (QDAS): A User and Developer Perspective. The Qualitative Report 23, 13 (2018), 61–73.
[12] Leyla Jael García-Castro, Olga Giraldo, and Alexander García. 2012. Using annotations to model discourse: An extension to the Annotation Ontology. In CEUR Workshop Proceedings, Vol. 903. 13–22.
[13] Vahid Garousi and Michael Felderer. 2017. Experience-based guidelines for effective and efficient data extraction in systematic reviews in software engineering. In Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering. ACM, 170–179. https://doi.org/10.1145/3084226.3084238
[14] Joaquín Gayoso-Cabada, Antonio Sarasa-Cabezuelo, and José-Luis Sierra. 2018. Document Annotation Tools. In Proceedings of the Sixth International Conference on Technological Ecosystems for Enhancing Multiculturality (TEEM'18). ACM Press, New York, New York, USA, 889–895. https://doi.org/10.1145/3284179.3284331
[15] B. Kitchenham, D. Budgen, and O. P. Brereton. 2015. Evidence-Based Software Engineering and Systematic Reviews.
[16] Nikolaos Konstantinou and Dimitrios-Emmanuel Spanos. 2015. Introduction: Linked Data and the Semantic Web. In Materializing the Web of Linked Data. Springer, 1–16.
[17] Kathleen M. MacQueen and Eleanor McLellan-Lemal. 1998. Team-based codebook development: Structure, process, and agreement. Cultural Anthropology Methods 10, 2 (1998), 31–36.
[18] Mary L. McHugh. 2012. Interrater reliability: the kappa statistic. Biochemia Medica 22, 3 (Oct 2012), 276–282. https://doi.org/10.11613/BM.2012.031
[19] Vilmar Nepomuceno and Sergio Soares. 2018. Maintaining systematic literature reviews. In Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM '18). ACM Press, New York, New York, USA, 1–4. https://doi.org/10.1145/3239235.3267432
[20] Fabian Neuhaus. 2013. OntologySummit2013 Communique. (2013). http://ontolog.cim3.net/wiki/OntologySummit2013_Communique.html
[21] Michelle Ortlipp. 2008. Keeping and using reflective journals in the qualitative research process. The Qualitative Report 13, 4 (2008), 695–705.
[22] Kai Petersen, Robert Feldt, Shahid Mujtaba, and Michael Mattsson. 2008. Systematic Mapping Studies in Software Engineering. In 12th International Conference on Evaluation and Assessment in Software Engineering. 1–10.
[23] Kai Petersen, Sairam Vakkalanka, and Ludwik Kuzniarz. 2015. Guidelines for conducting systematic mapping studies in software engineering: An update. Information and Software Technology 64 (2015), 1–18. https://doi.org/10.1016/j.infsof.2015.03.007
[24] Nigel Shadbolt, Tim Berners-Lee, and Wendy Hall. 2006. The semantic web revisited. IEEE Intelligent Systems 21, 3 (2006), 96–101.
[25] David Shotton and Silvio Peroni. 2015. DoCO, the Document Components Ontology. (2015). https://sparontologies.github.io/doco/current/doco.html
[26] Mark Staples and Mahmood Niazi. 2007. Experiences using systematic review guidelines. Journal of Systems and Software 80, 9 (2007), 1425–1437.
[27] Paolo Tell, Jacob B. Cholewa, Peter Nellemann, and Marco Kuhrmann. 2016. Beyond the Spreadsheet: Reflections on Tool Support for Literature Studies. In Proceedings of the 20th International Conference on Evaluation and Assessment in Software Engineering. 22:1–22:5. https://doi.org/10.1145/2915970.2916011
[28] Jon Udell. 2017. Federating Annotations Using Digital Object Identifiers (DOIs). (2017). https://web.hypothes.is/blog/dois/
[29] W3C Web Annotation Working Group. 2017. Web Annotation. (2017). https://www.w3.org/annotation/
[30] Web Annotation Working Group. 2017. Web Annotation Ontology (OA). (2017). https://www.w3.org/ns/oa
[31] Roel Wieringa, Neil Maiden, Nancy Mead, and Colette Rolland. 2006. Requirements engineering paper classification and evaluation criteria: A proposal and a discussion. Requirements Engineering 11, 1 (March 2006), 102–107. https://doi.org/10.1007/s00766-005-0021-6
[32] Yueming Sun, Ye Yang, He Zhang, Wen Zhang, and Qing Wang. 2012. Towards evidence-based ontology for supporting systematic literature review. In 16th International Conference on Evaluation & Assessment in Software Engineering (EASE 2012). 171–175. https://doi.org/10.1049/ic.2012.0022