EvidenceSET: A Tool for Supporting Analysis of Evidence
and Synthesis of Primary and Secondary Studies
Olavo Barbosa1, Rodrigo Santos2, Davi Viana3
1State Agency for Information Technology of Pernambuco (ATI)
Av. Rio Capibaribe, 147, São José – CEP 50020-080 – Recife, Brazil
2PPGI – Federal University of the State of Rio de Janeiro (UNIRIO)
Av. Pasteur, 458, Urca – CEP 22290-240 – Rio de Janeiro, RJ
3PPGCC – Federal University of Maranhão (UFMA)
Av. dos Portugueses, 1966, Bacanga – CEP 65080-805 – São Luís, Brazil
olavo.barbosa@ati.pe.gov.br, rps@uniriotec.br, davi.viana@ufma.br
Abstract. Secondary and tertiary studies are broadly applied in the Software Engineering area. Researchers usually spend several weeks analyzing each relevant publication that will become part of the body of knowledge on a specific research topic. However, performing a secondary or tertiary study can be an obstacle for some researchers due to the long time frame or the high number of selected publications. To reduce such effort, we developed the Evidence-based Study Extractor Tool (EvidenceSET), a web-based tool that supports the creation of research themes from a set of primary or secondary studies. In this paper, we also present an example of its use in the field of software ecosystems.
Link (for tool demo): https://www.youtube.com/watch?v=Bg-OE_kyMls
1. Introduction
Evidence is knowledge obtained from findings derived from analysis of data obtained
from observational or experimental procedures that are potentially repeatable and that
meet accepted standards of design, execution, and analysis (Kitchenham et al., 2002).
Each empirical study gathering such evidence is known as a primary study.
Correspondingly, evidence-based software engineering (EBSE) aims to apply an evidence-based approach to research and practice in the Software Engineering (SE) area (Kitchenham & Charters, 2007). A key element of EBSE is the Systematic Review (SR), a concise summary of the best available evidence that uses explicit and rigorous methods to identify, critically appraise, and synthesize relevant studies on a particular topic (Cruzes & Dybå, 2011). Two SR methods have been widely adopted in the SE area: Systematic Literature Reviews (SLR) and Systematic Mapping Studies (SMS).
In contrast to an expert review based on ad hoc literature selection, an SLR is a methodologically rigorous review of research results. Its goal is not only to aggregate all the existing evidence on specific research questions but also to support the development of evidence-based guidelines for practitioners (Kitchenham et al., 2009). An SMS, on the other hand, is a method for obtaining an overview of a particular research topic (Kitchenham, 2004). Its goal is to create an inventory of classified studies in an emerging topic (Wieringa et al., 2006). Hence, an SMS provides an overview of the scope of a specific topic and allows researchers to discover research gaps and trends (Petersen et al., 2015).
According to Budgen et al. (2008), the research questions of an SMS are likely to be broader than those of an SLR. Kitchenham (2004) also indicates that an SMS generally has broader research questions and often involves multiple research questions. Regardless of whether an SMS or an SLR is conducted, it is important to perform a realistic and reliable analysis of the findings. Such analysis requires several weeks of work to verify each primary study. This was our motivation for developing the Evidence-based Study Extractor Tool (EvidenceSET), a tool that supports the method proposed by Cruzes & Dybå (2011) regarding the levels of interpretation in thematic synthesis. To exemplify its use, we applied the tool to the Software Ecosystem (SECO) field, since several secondary studies have been conducted towards a body of knowledge for SECO (Barbosa et al., 2016).
This paper is organized as follows: in Section 2, we present the Evidence-based
Study Extractor Tool; in Section 3, we present an example of use; in Section 4, we
provide a comparison with similar tools; and in Section 5, we conclude the paper.
2. The Evidence-based Study Extractor Tool
EvidenceSET is part of a research study inspired by the work of Cruzes & Dybå (2011) on the levels of interpretation in thematic synthesis. Cruzes & Dybå (2011) state that a key part of an SLR is data extraction, in which initial ideas and possible patterns are identified during the first reading of the individual empirical studies. Using techniques for extracting information from SE primary studies, a reviewer follows a procedure for collecting context information and identifying each paper's findings.
The authors proposed five steps for thematic synthesis: 1) Extracting data from the primary studies, such as bibliographical data, aims, context, and results; 2) Coding data by identifying and coding interesting concepts, categories, findings, and results; 3) Translating codes into themes, sub-themes, and higher-order themes; 4) Creating a model of higher-order themes by exploring the relationships between themes; and 5) Assessing the trustworthiness of the synthesis and of the interpretations leading up to it.
In this regard, codes can be defined as interesting concepts, categories, findings, and results of studies. A theme describes and organizes possible observations and/or interprets some aspects of a given phenomenon. Once themes are identified, they can be explored and interpreted to create a model consisting of higher-order themes and the relationships among them. According to the method, such syntheses identify crucial areas and questions for further studies that past empirical research has not addressed adequately. Therefore, in this paper we focus on supporting steps 1, 2, and 3 as proposed by Cruzes & Dybå (2011). As such, EvidenceSET aims to support SE researchers in conducting secondary and/or tertiary studies, especially in activities such as evidence analysis and synthesis.
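Steps 1 to 3, which EvidenceSET targets, can be pictured as successive groupings over extracted text segments. The Python sketch below is a hypothetical data model, not EvidenceSET's implementation: the class and function names, the example segments, and the code-to-theme mapping are illustrative assumptions. In particular, the mapping used in step 3 stands for a decision made by the researcher, which a tool can only support.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Step 1: a text segment extracted from one primary or secondary study."""
    study: str
    text: str
    codes: set[str] = field(default_factory=set)  # filled in step 2


def code_segment(segment: Segment, codes: set[str]) -> None:
    """Step 2: attach codes (interesting concepts, findings, results) to a segment."""
    segment.codes.update(codes)


def translate_to_themes(segments: list[Segment],
                        code_to_theme: dict[str, str]) -> dict[str, set[str]]:
    """Step 3: group codes under themes; codes without a theme are kept apart."""
    themes: dict[str, set[str]] = defaultdict(set)
    for seg in segments:
        for code in seg.codes:
            themes[code_to_theme.get(code, "(unassigned)")].add(code)
    return dict(themes)
```

Keeping unassigned codes in a separate bucket makes it visible which codes still need a theme before moving on to step 4.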
2.1. Tool Architecture
EvidenceSET is a web-based parsing tool written in PHP (back end), JavaScript (front end), and HTML. A PHP script run on Linux generates the data source and provides the output to a web browser. The tool was developed to support the analysis of evidence and findings and the synthesis of primary and secondary studies in SE. The main libraries used in the tool are Google Chart Tools, the PHP PDF Parser library, vis.js (a browser-based JavaScript visualization library), and Graphviz (Graph Visualization Software).
3. An Example of Use
In this paper, we selected a set of secondary studies focused on the SECO field. Each study (a PDF file) was placed in a directory to be used as input. At the Linux command line, a researcher starts a PHP script that extracts the content of the whole set of studies. The output is a group of the most frequent words found in each study. To avoid irrelevant words in the output file, a configuration file listing words such as adjectives, conjunctions, pronouns, and English connectives is applied, so that the output is directly related to the field of study. The number of occurrences, word correlation, and binding for each study are shown as results.
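The per-study extraction step can be approximated as follows. This Python sketch assumes that the text of each PDF has already been extracted into a plain-text file (EvidenceSET itself uses the PHP PDF Parser library for that), that the stop-word configuration file lists one word per line, and that a "word" is a run of letters, optionally hyphenated. The function names and file layout are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# Assumption: a "word" is a run of ASCII letters, optionally joined by hyphens.
WORD = re.compile(r"[a-z]+(?:-[a-z]+)*")


def load_stop_words(config_path: Path) -> set[str]:
    """Read a stop-word file: one word per line; text after '#' is ignored."""
    words = set()
    for line in config_path.read_text(encoding="utf-8").splitlines():
        word = line.split("#", 1)[0].strip().lower()
        if word:
            words.add(word)
    return words


def study_word_counts(text: str, stop_words: set[str]) -> Counter:
    """Count the words of one study, skipping stop words."""
    tokens = WORD.findall(text.lower())
    return Counter(t for t in tokens if t not in stop_words)


def count_directory(directory: Path, stop_words: set[str]) -> dict[str, Counter]:
    """Map each extracted study (*.txt) in a directory to its word counts."""
    return {
        path.stem: study_word_counts(path.read_text(encoding="utf-8"), stop_words)
        for path in sorted(directory.glob("*.txt"))
    }
```

Calling `most_common(10)` on each returned `Counter` then gives the ten most frequent words of that study. The stop-word file uses a different extension so that it is not picked up as a study.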
For example, the tool reports the most frequent words for each study, the most frequent words across all studies, the studies in which these words are found, and how many times pairs of such words appear in the same text line of each study. We consider these words as keywords. Cruzes & Dybå (2011) state that this kind of output is important for summarizing the findings (a group of codes). Instead of the manual code extraction currently performed in SLR studies, EvidenceSET automatically presents codes in different ways: a keyword network graph, a word tree of keyword correlations, and a Sankey chart (see Section 3.1). For the current phase of our research, we selected six secondary studies from the SECO field. The inclusion and exclusion criteria were: secondary studies (SLR or SMS) written in English, studies that describe their research method, and studies that can be reproduced. The following studies were selected: Barbosa et al. (2013), Manikas & Hansen (2013), Fotrousi et al. (2014), Axelsson & Skoglund (2016), Manikas (2016), and Alves et al. (2017).
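The outputs listed above (per-study frequencies, overall frequencies, the studies containing each word, and same-line co-occurrence) can all be derived from line-level tokens. In this Python sketch, the input format (study name mapped to a list of already-filtered token lists, one per text line) and the function name are assumptions, and a pair of words is counted once per line in which both occur; the paper does not state how EvidenceSET treats repeated occurrences within a single line.

```python
from collections import Counter
from itertools import combinations


def analyze(studies: dict[str, list[list[str]]]):
    """Compute keyword statistics from filtered tokens, one token list per text line."""
    per_study: dict[str, Counter] = {}
    overall = Counter()                 # occurrences across all studies
    found_in: dict[str, set[str]] = {}  # word -> studies that contain it
    same_line = Counter()               # (word_a, word_b), word_a < word_b -> lines with both
    for study, lines in studies.items():
        counts = Counter()
        for line in lines:
            counts.update(line)
            # Each unordered pair is counted at most once per line.
            for a, b in combinations(sorted(set(line)), 2):
                same_line[(a, b)] += 1
        per_study[study] = counts
        overall.update(counts)
        for word in counts:
            found_in.setdefault(word, set()).add(study)
    return per_study, overall, found_in, same_line
```

Because pairs are stored in alphabetical order, a lookup must use `("business", "management")` rather than the reverse.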
3.1. An Overview of EvidenceSET Outcomes
Social structure has been an important concept in sociology and has a strong mathematical basis in graph theory (Degenne & Forse, 1999). Essentially, a graph consists of a finite set of vertices x1, x2, ..., xn and a set of edges or arcs that connect them. Network analysis has moved from being a suggestive metaphor to an analytic approach with its own theoretical statements, methods, network analysis software, and researchers (Wasserman & Faust, 1994). Bringing these concepts from network analysis and graph theory to the SE area, we can model an approach for extracting codes from a set of studies and represent them as shown in Figure 1 and Figure 2.
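One way to make this graph model explicit is the weighted graph below. The notation is ours, not the paper's, and it assumes that co-occurrence is counted once per text line in which both keywords occur. Here $S$ is the set of selected studies, $L_s$ the text lines of study $s$ (after stop-word removal), $c_s(x)$ the number of occurrences of keyword $x$ in study $s$, and $\tau$ a display threshold ($\tau = 13$ for Figure 2):

$$
G = (V, E, w), \qquad V = \{x_1, x_2, \ldots, x_n\}
$$

$$
w(u, v) = \sum_{s \in S} \sum_{\ell \in L_s} \mathbf{1}\left[\, u \in \ell \wedge v \in \ell \,\right],
\qquad
E = \bigl\{ \{u, v\} : u \neq v,\ w(u, v) \ge \tau \bigr\}
$$

$$
f(x) = \sum_{s \in S} c_s(x), \qquad d(x) = \bigl|\{\, s \in S : c_s(x) > 0 \,\}\bigr|
$$

Under this reading, $f(x)$ would size a keyword's node or circle, $w(u, v)$ would label or weight the edge between two keywords, and $d(x)$ would select the circle color in Figure 2.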
Figure 1 shows a network graph containing the most frequent keywords in the selected papers on SECO. The most recurrent keyword is quality, with 207 occurrences in the 6 studies; platform is another (202 occurrences in the selected studies). Platform is also the keyword with the most connections. Each arrow is labeled with the number of co-occurrences of the keywords it connects. For instance, platform appears 20 times in the same line as players, and business and management appear together 28 times.
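A network graph like Figure 1 can be fed from keyword frequencies and same-line co-occurrence counts. This Python sketch serializes nodes and edges in the plain object shape that the vis.js Network component accepts (nodes with `id`, `label`, `value`; edges with `from`, `to`, `value`, `label`) and ranks keywords by their number of connections. The function names and the `top_n` cut-off are hypothetical, and the sketch assumes that each unordered keyword pair appears once as a key.

```python
import json
from collections import Counter


def network_json(overall: Counter, same_line: Counter, top_n: int = 30) -> str:
    """Serialize the top_n keywords and their co-occurrence edges as vis.js-style JSON."""
    keep = [word for word, _ in overall.most_common(top_n)]
    ids = {word: i for i, word in enumerate(keep)}
    # vis.js can scale a node by "value" (e.g., with shape "dot"); here: total frequency.
    nodes = [{"id": ids[w], "label": w, "value": overall[w]} for w in keep]
    # Edge "value" scales the edge width; "label" shows the co-occurrence count.
    edges = [
        {"from": ids[a], "to": ids[b], "value": n, "label": str(n)}
        for (a, b), n in same_line.items()
        if a in ids and b in ids
    ]
    return json.dumps({"nodes": nodes, "edges": edges})


def most_connected(same_line: Counter) -> list[tuple[str, int]]:
    """Rank keywords by degree: the number of distinct keywords they share a line with."""
    degree = Counter()
    for a, b in same_line:
        degree[a] += 1
        degree[b] += 1
    return degree.most_common()
```

Edges whose endpoints fall outside the `top_n` keywords are dropped, so the graph stays readable even when the vocabulary is large.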
Figure 2 shows another kind of network graph, drawn as circles. Larger circles represent the most frequent keywords in the set of studies. Connected circles are those that have at least 13 occurrences in the same line across the papers' content (similar to Figure 1). From Figure 2, we can infer the most significant keywords in the SECO field. The circles' colors group keywords by the number of studies in which they are found: blue circles appear in 6 studies, yellow circles in 5, red circles in 4, green circles in 3, and pink circles in only 2. Thus, this kind of visualization complements the network graph shown in Figure 1. As proposed by Cruzes & Dybå (2011), tools that provide tree maps can be used to start organizing codes and translating them into themes. Accordingly, Figure 3 presents a list of the most frequent keywords (on the left side) and, for the selected keyword business, the other words that appear in the same line (on the right side), considering the papers provided as input to EvidenceSET. The number in parentheses is the frequency; e.g., business and management appeared 28 times in the same line.
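The right-hand panel of Figure 3 can be mimicked in text form. This Python sketch (the function name and `limit` parameter are hypothetical) lists the words that share a line with a selected keyword, formatted with the frequency in parentheses as in the figure. It accepts pair counts keyed by `(word_a, word_b)` tuples in either order.

```python
from collections import Counter


def correlated(keyword: str, same_line: Counter, limit: int = 10) -> list[str]:
    """Return 'word (count)' entries for words that share a line with keyword."""
    partners = Counter()
    for (a, b), n in same_line.items():
        if a == keyword:
            partners[b] += n
        elif b == keyword:
            partners[a] += n
    return [f"{word} ({n})" for word, n in partners.most_common(limit)]
```

With pair counts such as `("business", "management"): 28` (the value reported above) plus other toy pairs, `correlated("business", ...)` would list `management (28)` first.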
Figure 1. Network graph #1: the most frequent keywords in the SECO field.
Figure 2. Network graph #2: the most frequent keywords in the SECO field.
Sankey diagrams are traditionally used to visualize the flow of energy or materials in networks and processes. They illustrate quantitative information about flows, relationships, and transformations. Sankey diagrams represent directed, weighted graphs whose weight functions satisfy flow conservation: for each node, the sum of the incoming weights equals the sum of the outgoing weights (Riehmann et al., 2005). Figure 4 shows, for the governance keyword (on the right side), the incoming weight from each study. We can see that this word appears in 4 of the 6 selected studies.
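The Sankey diagram in Google Chart Tools is drawn from a data table with two string columns (From, To) and one number column (Weight). This Python sketch assembles such rows for one keyword from per-study counts, so that they could be serialized to JSON for the front end. The function names are hypothetical. Because the keyword node only receives flow, its total incoming weight equals the keyword's frequency across the studies that mention it.

```python
from collections import Counter


def sankey_rows(keyword: str, per_study: dict[str, Counter]) -> list[list]:
    """Build [From, To, Weight] rows linking each study to keyword."""
    return [
        [study, keyword, counts[keyword]]
        for study, counts in per_study.items()
        if counts[keyword] > 0  # skip studies that never mention the keyword
    ]


def node_total(rows: list[list], node: str) -> int:
    """Total weight flowing into a node (the height of its bar in the diagram)."""
    return sum(weight for _, target, weight in rows if target == node)
```

The number of rows equals the number of studies that mention the keyword, which corresponds to the "4 of the 6 selected studies" reading of Figure 4.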
3.2. Discussion
From Figure 1, we can identify the most frequent codes in the selected studies in the SECO field. There are some particularities, such as the codes open and source. Although they are different words, they appear to form a single code, open-source, because it is a very common concept in the SECO field. We found that the automatic process helped us confirm some insights from the mind map created manually in a previous work (Barbosa et al., 2016). Even though governance is not directly connected to social, business, and technical, it is connected to “open” and “software”; open-source is connected to architecture, architecture to business, business to social, social to technical, and so on. We can also perceive the most important keywords in the whole set of papers: quality, platform, business, governance, open-source, health, and architecture. The results in Figure 1 suggest that the main higher-order themes for the SECO field are included in these groups of keywords.
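Since the counting described earlier works on single words, pairs such as open and source are reconciled by the researcher. If one wanted to merge such pairs before counting, a sketch like the following would do it. This is our assumption, not a feature reported for EvidenceSET; the function name and the compound list are hypothetical. Listed adjacent word pairs on a line are joined into one hyphenated token.

```python
def merge_compounds(tokens: list[str], compounds: set[tuple[str, str]]) -> list[str]:
    """Join adjacent token pairs listed in compounds, e.g. ('open', 'source') -> 'open-source'."""
    merged: list[str] = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in compounds:
            merged.append(f"{tokens[i]}-{tokens[i + 1]}")
            i += 2  # both tokens are consumed by the compound
        else:
            merged.append(tokens[i])
            i += 1
    return merged
```

Only adjacent pairs are merged, so a line mentioning "open platform" keeps "open" as a separate word.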
Figure 2 offers another way to show the same results. Connected circles are those with more than 12 occurrences in the same line across the set of selected papers. We used this value for our example; nevertheless, the threshold can be set to any number. In this view, we can observe how close the concepts are (e.g., business, social, architecture, platform, quality, and governance).
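The Figure 2 view combines a configurable co-occurrence threshold with one color per number of studies. The Python sketch below uses the values given in the text (a default threshold of 13, i.e., more than 12 occurrences in the same line; blue for 6 studies down to pink for 2). The gray fallback for other study counts and the function names are assumptions.

```python
from collections import Counter

# Colors reported for Figure 2, keyed by the number of studies containing a keyword.
STUDY_COLORS = {6: "blue", 5: "yellow", 4: "red", 3: "green", 2: "pink"}


def connected_pairs(same_line: Counter, threshold: int = 13) -> list[tuple[str, str]]:
    """Keep keyword pairs that share at least `threshold` lines."""
    return sorted(pair for pair, n in same_line.items() if n >= threshold)


def circle_color(studies_containing: int) -> str:
    """Color a circle by how many selected studies contain the keyword."""
    return STUDY_COLORS.get(studies_containing, "gray")  # gray: assumed fallback
```

Lowering `threshold` adds more connections. For example, a toy pair with 12 shared lines appears only once the threshold drops to 12.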
Figure 3 presents another view to help researchers analyze studies. Although the data source here is a set of secondary studies, individual empirical studies can be used as well. In our example, we show the business keyword and its correlated keywords. The word most strongly connected to business is management (28 occurrences in the same line). By clicking on the left side of the panel, a researcher can choose any keyword and, by clicking on the right side, see the correlated keywords shown by EvidenceSET. Finally, Figure 4 shows a Sankey diagram that depicts the relationship between a keyword (in this case, governance) and the secondary studies that mention it.
Figure 3. Word tree representation: the most frequent keywords in the SECO field.
We observe that Alves et al. (2017) has the highest weight because it is an SLR study. By clicking on the size bar of the Sankey diagram, a researcher can see how many times a keyword appears in any study.
Figure 4. Sankey diagram with the relationships between keywords and studies.
4. Comparison with Similar Tools
In the SE area, a previous SMS on the existing tools for aiding systematic reviews found
that a range of visualization and text mining tools had been used to support study
selection, data extraction, and data synthesis (Marshall & Brereton, 2015). Such tools
include basic productivity tools, such as word processors and spreadsheets, reference
managers, statistics packages, and purpose-built tools, which cover all SMS’s phases
(Marshall & Brereton, 2014). Marshall & Brereton (2015) performed a SLR to find what
tooling characteristics are most important to reviewers. They aimed to explore the scope
and use of SLR tools in other disciplines. Software tools were categorized into several
groups such as Reference Management Tools as Mendeley; Basic Productivity Tools as
Microsoft Word and Excel; and Advanced Analysis Software as Stata data analysis and
statistical software. The most important features referred to visualization and text
mining tools used to support study selection, data extraction and data synthesis.
Regarding synthesis approaches, Noblit & Hare (1988) state that approaches can be integrative or interpretive. Integrative synthesis combines data to create generalizations; it involves quantifying and integrating data with techniques such as meta-analysis. Interpretive synthesis organizes concepts identified in primary studies into a higher-order theoretical structure. Both kinds of synthesis are usually used in secondary studies such as SLRs and SMSs. In a list of methods for qualitative and mixed-methods evidence synthesis, Cruzes & Dybå (2011) discussed two methods: Thematic Analysis/Synthesis and Content Analysis. The first is a method for identifying, analyzing, and reporting patterns (themes) in a dataset; it minimally organizes the dataset, adds rich detail, and frequently interprets several aspects of a research topic. The second is a systematic way of categorizing and coding studies under broad thematic headings by using extraction tools designed to support reproducibility. Following Cruzes & Dybå (2011), after using EvidenceSET, researchers can obtain a suitable data source extracted from a list of secondary studies to support the creation of themes for a research topic.
5. Conclusion and Future Work
This paper presented EvidenceSET, a tool for supporting the analysis of evidence in SE. To exemplify its use, we analyzed a set of studies from the SECO field and obtained an initial thematic synthesis. The procedure consisted of collecting secondary studies (PDF files as input), identifying specific text segments, automatically labeling these text segments, and reducing overlaps to translate codes into themes. We selected a set of six secondary studies. We found that a tool with different code extraction visualizations could be useful to support researchers in performing evidence analysis and the synthesis of primary and secondary studies. In our SECO example, we did not find governance to be the central topic, as we had argued in the first step of our research (Barbosa et al., 2016). On the other hand, its occurrence in secondary studies suggests that it is an important emerging concept in the field. As future work, we aim to make EvidenceSET easier to set up. In addition, we will provide other visualizations, such as additional network graph views, and carry out tests with a massive dataset to evaluate scalability.
Acknowledgements
The first author thanks ATI for partially supporting this research. The second author thanks DPq/PROPG/UNIRIO for partially supporting this research.
References
Alves, C., Aline, J., Jansen, S. (2017) “Software Ecosystems Governance: A Systematic
Literature Review and Research Agenda”. In: Proceedings of the 19th International
Conference on Enterprise Information Systems, Porto, Portugal, pp. 215-226.
Axelsson, J., Skoglund, M. (2016) “Quality Assurance in Software Ecosystems: A
Systematic Literature Mapping and Research Agenda”. The Journal of Systems and
Software 114(2016):69-81.
Barbosa, O., Santos, R., Alves, C., Werner, C., Jansen, S. (2013) “A Systematic
Mapping Study on Software Ecosystems through a Three-dimensional Perspective”.
In: Jansen, S., Cusumano, M., Brinkkemper, S. (eds.) Software Ecosystems:
Analyzing and Managing Business Networks in the Software Industry, Edward Elgar
Publishing, pp. 59-81.
Barbosa, O., Santos, R., Viana, D. (2016) “Preliminary Findings of Secondary Studies
on the Software Ecosystems Research”. In: Anais do I Workshop sobre Aspectos
Sociais, Humanos e Econômicos de Software, Maceió, Brasil, pp. 81-85.
Budgen, D., Turner, M., Brereton, P., Kitchenham, B. (2008) “Using Mapping Studies
in Software Engineering”. In: Proceedings of PPIG 2008, Lancaster University, UK,
pp. 195-204.
Cruzes, D.S., Dybå, T. (2011) “Recommended Steps for Thematic Synthesis in Software
Engineering”. In: Proceedings of the 5th International Symposium on Empirical
Software Engineering and Measurement, Banff, Canada, pp. 275-284.
Degenne, A., Forse, M. (1999) “Introducing Social Networks”. SAGE Publications.
Fotrousi, F., Fricker, S.A., Fiedler, M., Le-Gall, F. (2014) “KPIs for Software
Ecosystems: A Systematic Mapping Study”. In: Proceedings of the 5th International
Conference on Software Business, Paphos, Cyprus, pp. 194-211.
Kitchenham, B. (2004) “Procedures for Performing Systematic Reviews”. Joint
Technical Report, SE Group, Keele University, and Empirical Software Eng., Nat'l
ICT Australia.
Kitchenham, B., Charters, S. (2007) “Guidelines for performing Systematic Literature
Reviews in Software Engineering”. TR - Department of CS, University of Durham,
Durham.
Kitchenham, B., Pfleeger, S.L., Pickard, L.M., Jones, P.W., Hoaglin, D.C., El Emam, K.,
Rosenberg, J. (2002) “Preliminary guidelines for empirical research in software
engineering”. IEEE Transactions on Software Engineering 28(8):721-734.
Kitchenham, B., Brereton, P., Budgen, D., Turner, M., Bailey, J., Linkman, S.
(2009) “Systematic literature reviews in software engineering – A systematic
literature review”. Information and Software Technology 51(1):7-15.
Manikas, K. (2016) “Revisiting Software Ecosystems Research: A Longitudinal
Literature Study”. The Journal of Systems and Software 117(2016):84-103.
Manikas, K., Hansen, K.M. (2013) “Software Ecosystems – A Systematic Literature
Review”. The Journal of Systems and Software 86(5):1294-1306.
Marshall, C., Brereton, P. (2014) “Tools to support systematic literature reviews in
software engineering: A mapping study”. In: Proceedings of the International
Symposium on Empirical Software Engineering and Measurement, Baltimore, USA,
pp. 296-299.
Marshall, C., Brereton, P. (2015) “Systematic Review Toolbox: A Catalogue of Tools to
Support Systematic Reviews”. In: Proceedings of the 19th International Conference
on Evaluation and Assessment in Software Engineering, Nanjing, China, pp. 1-6.
Noblit, G., Hare, R. (1988) “Meta-ethnography: Synthesizing Qualitative Studies”.
SAGE Publications.
Petersen, K., Vakkalanka, S., Kuzniarz, L. (2015) “Guidelines for Conducting
Systematic Mapping Studies in Software Engineering: An Update”. Information and
Software Technology 64(2015):1-18.
Riehmann, P., Hanfler, M., Froehlich, B. (2005) “Interactive sankey diagrams”. In:
Proceedings of IEEE Symposium on Information Visualization, Washington, USA,
pp. 233-240.
Wasserman, S., Faust, K. (1994) “Social network analysis: methods and applications”.
Cambridge University Press.
Wieringa, R., Maiden, N., Mead, N., Rolland, C. (2006) “Requirements engineering
paper classification and evaluation criteria: a proposal and a discussion”.
Requirements Engineering 11(1):102-107.
80
... Do ponto de vista da academia, verifica-se a existência de diversas definições para ECOS na literatura, além da existência de mais de uma dezena de estudos secundários sobre o assunto [73]. As seguintes definições foram identificadas, embora a de Jansen et al. [20] e a de Bosch [12] Ressalta-se que um ECOS constituído sobre uma plataforma proprietária não precisa ser fechado, ou seja, esses exemplos ilustram de maneira simplista o conceito e negligenciam as diferentes configurações (ou variações) de um ECOS [75]. ...
... Juntamente com os relatos, esses estudos compreendem mais de 80% das publicações em ECOS até 2013 [70]. Até 2017, são pelo menos dez estudos secundários, entre mapeamentos e revisões sistemáticas [73]. Apesar de quase uma década de investigação e interesse crescente em ECOS, pela natureza do assunto, as pesquisas ainda têm caráter exploratório, sobretudo por estarem vinculadas diretamente com a indústria de software e por terem sido tratadas na prática [1]. ...
... Do ponto de vista da academia, verifica-se a existência de diversas definições para ECOS na literatura, além da existência de mais de uma dezena de estudos secundários sobre o assunto [73]. As seguintes definições foram identificadas, embora a de Jansen et al. [20] e a de Bosch [12] Ressalta-se que um ECOS constituído sobre uma plataforma proprietária não precisa ser fechado, ou seja, esses exemplos ilustram de maneira simplista o conceito e negligenciam as diferentes configurações (ou variações) de um ECOS [75]. ...
... Juntamente com os relatos, esses estudos compreendem mais de 80% das publicações em ECOS até 2013 [70]. Até 2017, são pelo menos dez estudos secundários, entre mapeamentos e revisões sistemáticas [73]. Apesar de quase uma década de investigação e interesse crescente em ECOS, pela natureza do assunto, as pesquisas ainda têm caráter exploratório, sobretudo por estarem vinculadas diretamente com a indústria de software e por terem sido tratadas na prática [1]. ...
Conference Paper
Full-text available
Um ecossistema de software (ECOS) é um conjunto de atores e artefatos, internos e externos a uma organização ou comunidade, que trocam recursos e informações centrados em uma plataforma tecnológica comum. Este contexto tem afetado decisões de arquitetura, de governança e de colaboração, nos mais variados domínios de aplicação. Torna-se necessário integrar mecanismos e ferramentas para apoiar a troca de informações, recursos e artefatos, bem como assegurar a comunicação e interação entre desenvolvedores e usuários. O objetivo deste capítulo é apresentar como o contexto de ECOS afeta o projeto e desenvolvimento de plataformas para jogos e entretenimento digital. Para isso, os principais conceitos e estratégias de ECOS serão discutidos, seguidos pela organização de mecanismos e ferramentas para modelar/analisar plataformas para jogos e entretenimento digital.
... To support the identification of the elements of this hierarchy, we used tools such as EvidneceSET 9 , Diagrams 10 and FreeMind 11 . According to Barbosa et al. (2017), EvidenceSET is a web-based tool to support the identification of research themes from a set of primary or secondary studies. EvidenceSET provides a set of views based on the number of frequencies that words appear in a set of studies. ...
Preprint
Full-text available
A software ecosystem (SECO) is an interaction, communication, cooperation, and synergy among a set of players. Depending on the actors type of interaction with others, each one can play a different role. These interactions provide a set of positive relationships (symbiosis) between actors who work together around a common technology platform or a service. SECO has been explored in several studies, some related to their general characteristics and others focusing on a specific topic (e.g., requirements, governance, open-source, mobile). There are many literature reviews of different natures (e.g., systematic literature reviews and systematic mapping studies). This study presents the status of the SECO field motivated by analyzing several secondary studies published over the years. To do so, we conducted a tertiary study. From an initial set of 518 studies on the subject, we selected 22 studies. We identified the theoretical foundations used by researchers and their influences and relationships with other ecosystems. We performed a thematic synthesis and identified one high-order theme, 5 themes, 10 subthemes, and 206 categories. As a result, we proposed a thematic model for SECO containing five themes, namely: social, technical, business, management, and an evaluation theme named Software Ecosystems Assessment Models (SEAM). Our main conclusion is that relationships between SECO themes should not be seen in isolation, and it must be interpreted in a holistic approach, given the number of implications to other themes mainly related to the distinction of governance and management activities in the SECO interactions. Finally, this work provides an overview of the field and points out areas for future research, such as the need of SECO community to further investigate the results from other ecosystems, mainly from the Digital Ecosystem and Digital Business Ecosystem communities.
... • STEP 5 -Perform Analysis and Discussion: studies are classified using the EvidenceSET tool (Barbosa et al., 2017). EvidenceSET requires uploading PDF files analyzed by different code extracting visualizations. ...
Thesis
Full-text available
In Information Systems (IS), accountability encompasses strategies that hold responsible behaviors considering IS purposes. In fact, accountability focus on setting and holding people-process-technology to a common expectation by maintaining all levels of responsibility for accomplishing SoIS purposes. However, its implications remain unclear when it focuses on evaluation strategies, particularly considering several IS working together to accomplish an organizational objective, such as SoIS (Systemsof-Information Systems). In order to fulfill these premises, an accountability evaluation approach is proposed to support the SoIS context understanding and facilitate decisionmaking in scenarios investigated by managers. To achieve this goal, concepts of systems thinking, SoIS, and accountability are used. Therefore, to characterize accountability evaluation, a conceptual model for understanding SoIS from the accountability perspective was developed, with elements extracted from systematic mapping studies in the IS domain, exploratory studies, and an evaluation by 21 specialists (IS managers and researchers). As a result, a framework is developed, called as AESoIS (Accountability Evaluation in Systems-of-Information Systems). AESoIS aims to support SoIS managers to understand organizational scenarios and propose solutions related to accountability evaluation based on three criteria: engagement, management, and regulation. In addition, a tool was developed and focuses on modeling strategies to support SoIS scenarios. AESoIS solutions were evaluated with managers in educational organizations supported by SoIS based on a feasibility study. The relevance of understanding and modeling SoIS scenarios is highlighted, as well as the AESoIS effectiveness.
Article
In the biomedical domain, visualizing the document embeddings of an extensive corpus has been widely used in informationseeking tasks. However, three key challenges with existing visualizations make it difficult for clinicians to find information efficiently. First, the document embeddings used in these visualizations are generated statically by pretrained language models, which cannot adapt to the user's evolving interest. Second, existing document visualization techniques cannot effectively display how the documents are relevant to users' interest, making it difficult for users to identify the most pertinent information. Third, existing embedding generation and visualization processes suffer from a lack of interpretability, making it difficult to understand, trust and use the result for decision-making. In this paper, we present a novel visual analytics pipeline for user-driven document representation and iterative information seeking (VADIS). VADIS introduces a prompt-based attention model (PAM) that generates dynamic document embedding and document relevance adjusted to the user's query. To effectively visualize these two pieces of information, we design a new document map that leverages a circular grid layout to display documents based on both their relevance to the query and the semantic similarity. Additionally, to improve the interpretability, we introduce a corpus-level attention visualization method to improve the user's understanding of the model focus and to enable the users to identify potential oversight. This visualization, in turn, empowers users to refine, update and introduce new queries, thereby facilitating a dynamic and iterative information-seeking experience. We evaluated VADIS quantitatively and qualitatively on a real-world dataset of biomedical research papers to demonstrate its effectiveness.
Article
A systematic review (SR) is essential with up-to-date research evidence to support clinical decisions and practices. However, the growing literature volume makes it challenging for SR reviewers and clinicians to discover useful information efficiently. Many human-in-the-loop information retrieval approaches (HIR) have been proposed to rank documents semantically similar to users' queries and provide interactive visualizations to facilitate document retrieval. Given that the queries are mainly composed of keywords and keyphrases retrieving documents that are semantically similar to a query does not necessarily respond to the clinician's need. Clinicians still have to review many documents to find the solution. The problem motivates us to develop a visual analytics system, DocFlow, to facilitate information-seeking. One of the features of our DocFlow is accepting natural language questions. The detailed description enables retrieving documents that can answer users' questions. Additionally, clinicians often categorize documents based on their backgrounds and with different purposes (e.g., populations, treatments). Since the criteria are unknown and cannot be pre-defined in advance, existing methods can only achieve categorization by considering the entire information in documents. In contrast, by locating answers in each document, our DocFlow can intelligently categorize documents based on users' questions. The second feature of our DocFlow is a flexible interface where users can arrange a sequence of questions to customize their rules for document retrieval and categorization. The two features of this visual analytics system support a flexible information-seeking process. The case studies and the feedback from domain experts demonstrate the usefulness and effectiveness of our DocFlow.
Conference Paper
The field of software ecosystems is a growing discipline that has been investigated from managerial, social, and technological perspectives. The governance of software ecosystems requires a careful balance between the control and the autonomy given to players. Orchestrators that are able to balance their own interests while bringing joint benefits to other players are likely to create healthy ecosystems. Selecting appropriate governance mechanisms is a key problem in the management of both proprietary and open source ecosystems. This article summarizes the current literature on software ecosystem governance by framing prevalent definitions, classifying governance mechanisms, and proposing a research agenda. We performed a systematic literature review of 63 primary studies. Several studies describe governance mechanisms, which were classified into three categories: value creation, coordination of players, and organizational openness and control. The number of studies indicates that the domain of software ecosystems and their governance is maturing. However, further studies are needed to address the central challenges involved in implementing appropriate governance mechanisms that can nurture the health of ecosystems. We present a research agenda with several opportunities for researchers and practitioners to explore these issues.
Article
The objective of this report is to propose comprehensive guidelines for systematic literature reviews appropriate for software engineering researchers, including PhD students. A systematic literature review is a means of evaluating and interpreting all available research relevant to a particular research question, topic area, or phenomenon of interest. Systematic reviews aim to present a fair evaluation of a research topic by using a trustworthy, rigorous, and auditable methodology. The guidelines presented in this report were derived from three existing guidelines used by medical researchers, two books produced by researchers with social science backgrounds, and discussions with researchers from other disciplines who are involved in evidence-based practice. The guidelines have been adapted to reflect the specific problems of software engineering research. They cover three phases of a systematic literature review: planning the review, conducting the review, and reporting the review. They provide a relatively high-level description. They do not consider the impact of the research questions on the review procedures, nor do they specify in detail the mechanisms needed to perform meta-analysis.
Conference Paper
Systematic review is a widely used research method in software engineering, and in other disciplines, for identifying and analysing empirical evidence. The method is data intensive and time consuming, and hence is usually supported by a wide range of software-based tools. However, systematic reviewers have found that finding and selecting tools can be quite challenging. In this paper, we present the Systematic Review Toolbox; a web-based catalogue of tools, to help reviewers find appropriate tools based on their particular needs.
Conference Paper
To create value with a software ecosystem (SECO), a platform owner has to ensure that the SECO is healthy and sustainable. Key Performance Indicators (KPIs) are used to assess whether and how well such objectives are met and what the platform owner can do to improve. This paper gives an overview of existing research on KPI-based SECO assessment using a systematic mapping of research publications. The study identified 34 relevant publications, from which KPI research and KPI practice were extracted and mapped. It describes the strengths and gaps of the research published so far, and describes which KPIs are measured, analyzed, and used for decision-making from the researcher's point of view. For researchers, the maps capture the state of knowledge and can be used to plan further research. For practitioners, the generated map points to studies that describe how to use KPIs for managing a SECO.
Conference Paper
Background: Systematic literature reviews (SLRs) have become an established methodology in software engineering (SE) research; however, they can be very time-consuming and error-prone. Aim: The aims of this study are to identify and classify tools that can help to automate part or all of the SLR process within the SE domain. Method: A mapping study was performed using an automated search strategy plus snowballing to locate relevant papers. A set of known papers was used to validate the search string. Results: Fourteen papers were accepted into the final set. Eight presented text mining tools and six discussed the use of visualisation techniques. The stage most commonly targeted was study selection. Only two papers reported an independent evaluation of the tool presented. The majority were evaluated through small experiments and examples of their use. Conclusions: A variety of tools are available to support the SLR process, although many are in the early stages of development and usage.
Article
‘Software ecosystems’ is argued to have first appeared as a concept more than 10 years ago, and software ecosystem research started to take off in 2010. We conduct a systematic literature study, based on the most extensive literature review in the field to date, with two primary aims: (a) to provide an updated overview of the field and (b) to document evolution in the field. In total, we analyze 231 papers from 2007 until 2014 and provide an overview of the research in software ecosystems. Our analysis reveals a field that is rapidly growing, both in volume and empirical focus, while becoming more mature. We identify signs of field maturity from the increase in: (i) the number of journal articles, (ii) the empirical models within the last two years, and (iii) the number of ecosystems studied. However, we note that the field is far from mature and identify a set of challenges that are preventing the field from evolving. We propose means for future research and for the community to address them. Finally, our analysis shows that the field has evolved beyond the existing definitions of software ecosystems, and we thus propose an update to the definition of software ecosystems.
Article
Software ecosystems are becoming a common model for software development in which different actors cooperate around a shared platform. However, it is not clear what the implications for software quality are when moving from a traditional approach to an ecosystem, and this is becoming increasingly important as ecosystems emerge in critical domains such as embedded applications. Therefore, this paper investigates the challenges related to quality assurance in software ecosystems and identifies which approaches have been proposed in the literature. The research method used is a systematic literature mapping, which, however, yielded only a small set of six papers. The literature findings are complemented with a constructive approach in which areas that merit further research are identified, resulting in a set of research topics that form a research agenda for quality assurance in software ecosystems. The agenda spans the entire system life cycle and focuses on challenges particular to an ecosystem setting, which mainly result from interactions across organizational borders and from dynamic system integration controlled by the users.
Article
Context: Systematic mapping studies are used to structure a research area, while systematic reviews are focused on gathering and synthesizing evidence. The most recent guidelines for systematic mapping are from 2008. Since that time, many suggestions have been made on how to improve systematic literature reviews (SLRs). There is a need to evaluate how researchers conduct the systematic mapping process and to identify how the guidelines should be updated based on the lessons learned from existing systematic maps and SLR guidelines. Objective: To identify how the systematic mapping process is conducted (including search, study selection, analysis and presentation of data, etc.), and to identify potential improvements in conducting the systematic mapping process and update the guidelines accordingly. Method: We conducted a systematic mapping study of systematic maps, also considering some practices from systematic review guidelines (in particular in relation to defining the search and conducting a quality assessment). Results: In a large number of studies, multiple guidelines are used and combined, which leads to different ways of conducting mapping studies. The reason for combining guidelines was that they differed in the recommendations given. Conclusion: The most frequently followed guidelines are not sufficient on their own. Hence, there was a need to provide an update on how to conduct systematic mapping studies. New guidelines have been proposed that consolidate existing findings.
Article
A software ecosystem is the interaction of a set of actors on top of a common technological platform that results in a number of software solutions or services. Arguably, software ecosystems are gaining importance with the advent of, e.g., the Google Android, Apache, and Salesforce.com ecosystems. However, there exists no systematic overview of the research done on software ecosystems from a software engineering perspective. We performed a systematic literature review of software ecosystem research, analyzing 90 papers on the subject taken from a gross collection of 420. Our main conclusions are that while research on software ecosystems is increasing (a) there is little consensus on what constitutes a software ecosystem, (b) few analytical models of software ecosystems exist, and (c) little research is done in the context of real-world ecosystems. This work provides an overview of the field, while identifying areas for future research.