Figure 7 - uploaded by Martin Leinberger
Source publication
Graph data models are interesting in various domains, in part because of the intuitiveness and flexibility they offer compared to relational models. Specialized query languages, such as Cypher for property graphs or SPARQL for RDF, facilitate their use. In this paper, we present an empirical study on the usage of graph-based query languages in open...
Contexts in source publication
Context 1
... Figure 7 shows the distribution of SELECT (returns result sequences), ASK (returns Boolean value), DESCRIBE (returns specific graphs) and CONSTRUCT (returns graphs) queries over the 12 SPARQL applications. Only four repositories use more than 50 queries, with the average being Table 6. ...
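As a rough illustration of how queries could be sorted into the four SPARQL query forms named above, the following minimal sketch skips any PREFIX/BASE prologue and reads the first form keyword. It is not the method used in the source publication; the function names `query_form` and `count_forms`, the sample queries, and the "OTHER" bucket are assumptions made for this example, and the comment handling is deliberately simplistic (full-line comments only).

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Prologue declarations (PREFIX / BASE) that may precede the query form keyword.
_PROLOGUE = re.compile(
    r"\s*(?:PREFIX\s+[^\s:]*:\s*<[^>]*>|BASE\s+<[^>]*>)", re.IGNORECASE
)
# The four SPARQL query forms discussed in the text.
_FORM = re.compile(r"\s*(SELECT|ASK|DESCRIBE|CONSTRUCT)\b", re.IGNORECASE)


def query_form(query: str) -> Optional[str]:
    """Return the query form in upper case, or None if none is recognized."""
    # Drop full-line comments only; a '#' inside an IRI must be left alone.
    text = "\n".join(
        line for line in query.splitlines() if not line.lstrip().startswith("#")
    )
    pos = 0
    # Skip over every prologue declaration at the start of the query.
    while (m := _PROLOGUE.match(text, pos)) is not None:
        pos = m.end()
    m = _FORM.match(text, pos)
    return m.group(1).upper() if m else None


def count_forms(queries: Iterable[str]) -> Counter:
    """Count queries per form; unrecognized queries go to 'OTHER' (hypothetical label)."""
    return Counter(query_form(q) or "OTHER" for q in queries)


# Hypothetical sample queries, for illustration only.
SAMPLE = [
    "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\nSELECT ?n WHERE { ?p foaf:name ?n }",
    "ASK { ?s ?p ?o }",
    "DESCRIBE <http://example.org/resource>",
    "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }",
]

if __name__ == "__main__":
    print(count_forms(SAMPLE))
```

A real study would more likely rely on a full SPARQL parser, since a keyword match like this can misfire on unusual inputs.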
Similar publications
Recently, a wide range of Web applications (e.g. DBPedia, Uniprot,
and Probase) are built on top of vast RDF knowledge bases and
using the SPARQL query language. The continuous growth of these
knowledge bases led to the investigation of new paradigms and
technologies for storing, accessing, and querying RDF data. In practice, modern big data system...
Citations
... We can find comparable practice in the following papers (Kamei et al. 2013; Falcão et al. 2020; Vasilescu et al. 2015; Borges et al. 2016; Fang et al. 2022; Seifer et al. 2019; Mockus and Weiss 2000; Yan et al. 2020; Jiarpakdee et al. 2021; Nagappan et al. 2010; Rahman et al. 2016; Zimmermann et al. 2007; Zimmermann and Nagappan 2008; Thongtanunam et al. 2016; Tsay et al. 2014; Tantithamthavorn and Hassan 2018). ...
... The next example will focus on one of our previous works that is presented in Seifer et al. (2019). We will illustrate the implications of choosing the wrong output distribution for a regression model. ...
... We will illustrate the implications of choosing the wrong output distribution for a regression model. It is a mistake that we made in Seifer et al. (2019). ...
Empirical Software Engineering studies apply methods, like linear regression, statistical tests, or correlation analysis, to better understand software engineering scenarios. Assuring the validity of such methods and corresponding results is challenging but critical. This is also reflected by quality criteria on the validity that are part of the reviewing process for the corresponding research results. However, such criteria are often hard to define operationally and thus hard for reviewers to judge. In this paper, we describe a new strategy to define and communicate the validity of methods and results. We conceptually decompose a study into an empirical scenario, a used method, and the produced results. Validity can only be described as the relationship between the three parts. To make the empirical scenario fully operational, we convert informal assumptions on it into executable simulation code that leverages artificial data to replace (or complement) our real data. We can then run the method on the artificial data and examine the impact of our assumptions on the quality of results. This may operationally i) support the validity of a method for a valid result, ii) threaten the validity of a method for an invalid result if assumptions are controversial, or iii) invalidate a method for an invalid result if assumptions are plausible. We encourage researchers to submit simulations as additional artifacts to the reviewing process to make such statements explicit. Rating whether a simulated scenario is plausible or controversial is subjective and may benefit from involving a reviewer. We show that existing empirical software engineering studies can benefit from such additional validation artifacts.
... • Neo4j's scalability, capable of handling significant data volumes and traffic, ensures performance and reliability in data-intensive smart city applications by seamlessly adapting to their demands. • Neo4j's potent Cypher query language, specifically designed for graph data, facilitates complex searches and analyses on semantically annotated smart city data, enabling efficient extraction of valuable insights and the full utilization of data potential [27][28][29]. • Neo4j's schema-less property graph model provides crucial flexibility for semantic annotation of raw data in smart city applications, allowing for easy adaptation to evolving data structures and relationships over time. ...
The development of smart city applications often encounters a variety of challenges. These include the need to address complex requirements such as integrating diverse data sources and incorporating geographical data that reflect the physical urban environment. Platforms designed for smart cities hold a pivotal position in materializing these applications, given that they offer a suite of high-level services, which can be repurposed by developers. Although a variety of platforms are available to aid the creation of smart city applications, most fail to couple their services with geographical data, do not offer the ability to execute semantic queries on the available data, and possess restrictions that could impede the development process. This paper introduces SEDIA, a platform for developing smart applications based on diverse data sources, including geographical information, to support a semantically enriched data model for effective data analysis and integration. It also discusses the efficacy of SEDIA in a proof-of-concept smart city application related to air quality monitoring. The platform utilizes ontology classes and properties to semantically annotate collected data, and the Neo4j graph database facilitates the recognition of patterns and relationships within the data. This research also offers empirical data demonstrating the performance evaluation of SEDIA. These contributions collectively advance our understanding of semantically enriched data integration within the realm of smart city applications.
... In this sense, in 2015, GraphQL emerged as an alternative to solve several reported problems of REST [24]; the application of this paradigm in software development has had a growing interest in academia, and industry [21] [28]. Therefore, we have observed that several studies compare various quality characteristics between REST and GraphQL; however, we found that no study compares these paradigms in the context of consumption from mobile applications. ...
Currently, GraphQL has emerged as a query language for developing web APIs that proposes to improve several data access problems of RESTful APIs. The present paper aims to study the effects on software quality of APIs developed with REST and GraphQL architectures consumed from mobile applications. For this, we design a computational experiment that compares the quality characteristic "performance efficiency" of mobile application consumption of three APIs: one GraphQL API and two REST APIs (one exposes complex queries on several endpoints, the other exposes complex queries on a single endpoint). The results show that the software quality of the API developed with GraphQL architecture is higher than that developed with REST architecture.
Keywords: GraphQL API, REST API, Quality evaluation, Mobile application
... In the past decade, some query languages such as SPARQL, Cypher, Gremlin, and GraphQL have emerged as alternatives for API data access [50]. In this study, we focus on GraphQL, because there is a growing interest in the industry since it has proven to be an alternative to solve the problems found in traditional REST technology [50,55]. ...
... In the past decade, some query languages such as SPARQL, Cypher, Gremlin, and GraphQL have emerged as alternatives for API data access [50]. In this study, we focus on GraphQL, because there is a growing interest in the industry since it has proven to be an alternative to solve the problems found in traditional REST technology [50,55]. GraphQL started in 2012 as an internally developed specification at Facebook. ...
GraphQL is a query language and execution engine for web APIs proposed as an alternative to improve data access problems and versioning of REST APIs. In this article, we thoroughly study the GraphQL field, first describing the GraphQL paradigm and its conceptual framework, and then conducting a systematic mapping study (SMS) of 84 primary studies selected from an original set of 3,185. Our work analyzes trends or knowledge gaps about GraphQL by general classification of the studies and specific classification of this research topic. The study’s main conclusions show that GraphQL adoption is growing in the community as a strong alternative to implement APIs. However, we identified the need to strengthen the amount and rigor of empirical evidence collection in applied industry and government studies. In addition, we revealed the opportunity for specific studies on most GraphQL components, especially the consumption of GraphQL API services.
... The usage of EMF in Open Source projects hosted in GitHub is addressed in [23]. Similarly, the usage of graph query languages in Java projects is studied in [49], and the projects are classified according to their application domain. ...
The application of machine learning (ML) algorithms to address problems related to model-driven engineering (MDE) is currently hindered by the lack of curated datasets of software models. There are several reasons for this, including the lack of large collections of good quality models, the difficulty to label models due to the required domain expertise, and the relative immaturity of the application of ML to MDE. In this work, we present ModelSet, a labelled dataset of software models intended to enable the application of ML to address software modelling problems. To create it we have devised a method designed to facilitate the exploration and labelling of model datasets by interactively grouping similar models using off-the-shelf technologies like a search engine. We have built an Eclipse plug-in to support the labelling process, which we have used to label 5,466 Ecore meta-models and 5,120 UML models with its category as the main label plus additional secondary labels of interest. We have evaluated the ability of our labelling method to create meaningful groups of models in order to speed up the process, improving the effectiveness of classical clustering methods. We showcase the usefulness of the dataset by applying it in a real scenario: enhancing the MAR search engine. We use ModelSet to train models able to infer useful metadata to navigate search results. The dataset and the tooling are available at https://figshare.com/s/5a6c02fa8ed20782935c and a live version at http://modelset.github.io.
... (b) Imbalance of data. Due to the intensive labor and language-specific expertise required in annotation (Li et al., 2020), in spite of the various datasets prepared in SPARQL (Talmor and Berant, 2018; Dubey et al., 2019; Keysers et al., 2019), very few works target the semantic parsing of other graph query languages, such as Cypher and Gremlin, that are commonly used in industries (Seifer et al., 2019). Moreover, datasets of different languages are also isolated since no existing tools can support the conversion (Agrawal et al., 2022). ...
... Few-shot learning. In practice, it is important for a parser to remain robust in a novel task domain lacking data annotations. Therefore, we reconstruct the METAQA dataset into Cypher, a graph query language commonly used in the industries but rarely studied in previous semantic parsing research works (Seifer et al., 2019), and assess our models under the few-shot learning setting. We adjust the data to ensure only 1, 3 and 5 samples of each question type appear in the training set respectively under the 1-shot, 3-shot and 5-shot settings. ...
Subject to the semantic gap lying between natural and formal language, neural semantic parsing is typically bottlenecked by the paucity and imbalance of data. In this paper, we propose a unified intermediate representation (IR) for graph query languages, namely GraphQ IR. With the IR's natural-language-like representation that bridges the semantic gap and its formally defined syntax that maintains the graph structure, neural semantic parser can more effectively convert user queries into our GraphQ IR, which can be later automatically compiled into different downstream graph query languages. Extensive experiments show that our approach can consistently achieve state-of-the-art performance on benchmarks KQA Pro, Overnight and MetaQA. Evaluations under compositional generalization and few-shot learning settings also validate the promising generalization ability of GraphQ IR with at most 11% accuracy improvement.
... There is currently no standardised format for property graphs, although an abstract model has been proposed [6]. However, one of the most common property graphs is Cypher [7], developed by neo4j 1 . Cypher is used in the study described in this paper. ...
... Cypher is used in the study described in this paper. Based on a survey of open-source projects on GitHub, Seifer et al. [7] observed "higher activity in SPARQL related repositories". However, activity "in Cypher grew faster", i.e. ...
This study compares participant acceptance of the property graph and edge-labelled graph paradigms, as represented by Cypher and the proposed extensions to the W3C standards, RDF* and SPARQL*. In general, modelling preferences are consistent across the two paradigms. When presented with location information, participants preferred to create nodes to represent cities, rather than use metadata; although the preference was less marked for Cypher. In Cypher, participants showed little difference in preference between representing dates or population size as nodes. In RDF*, this choice was not necessary since both could be represented as literals. However, there was a significant preference for using the date as metadata to describe a triple containing population size, rather than vice versa. There was no significant difference overall in accuracy of interpretation of queries in the two paradigms; although in one specific case, the use of a reverse arrow in Cypher was interpreted significantly more accurately than the ^ symbol in SPARQL. Based on our results and on the comments of participants, we make some recommendations for modellers. Techniques for reifying RDF have attracted a great deal of research. Recently, a hybrid approach, employing some of the features of property graphs, has claimed to offer an improved technique for RDF reification. Query-time reasoning is also a requirement which has prompted a number of proposed extensions to SPARQL and which is only possible to a limited extent in the property graph paradigm. Another recent development, the hypergraph paradigm, enables more powerful query-time reasoning. There is a need for more research into the user acceptance of these various more powerful approaches to modelling and querying. Such research should take account of complex modelling situations.
... And under-fetching (occurs when the data provider does not offer in a query all the information that the client needs, therefore it must make more requests to obtain the complete information) [6]. Several technological options have emerged to improve the REST problems, such as SPARQL, Cypher, Gremlin, and the most popular of these GraphQL [7]. GraphQL is a query language and execution engine for data in client-server applications that has been accepted in the technology community because it was developed and used in the products of the company Facebook [8]. ...
The software development trend uses service-oriented software architecture (SOA), which provides efficiency, agility, and ease of growth. The architectural design most commonly used in SOA application development is REST (Representational State Transfer); however, some data management problems have been identified in its Application Programming Interface called API-REST. Several technological options have emerged to appease these problems, such as SPARQL, Cypher, Gremlin, and the most popular GraphQL. GraphQL was developed by Facebook in 2012 and released in 2015 to the community as an open-source project, used by companies such as GitHub, Airbnb, Amazon, Apollo, IBM, and Facebook. The goal of this research is to demonstrate whether GraphQL implementations work. Therefore, we based the research design on Design Science Research (DSR) to evaluate the quality-in-use of a GraphQL implementation that automated the systematic mapping studies (SMS) process for technology researchers at Universidad Técnica del Norte - Ecuador. We used the ISO/IEC 25000 series of standards to evaluate the quality in use; the results showed that the implementation met 84.11% of the established quality model’s expected value. The detailed evaluation by quality characteristics was: Effectiveness 96.62%, Efficiency 78.90%, and Satisfaction 70.26%.
... In industry, aside from SPARQL, many other graph query languages such as Cypher (Francis et al., 2018) and Gremlin (Rodriguez, 2015) are equally or even more commonly used in graph database interaction (Angles, 2012; Seifer et al., 2019). However, most graph query semantic parsing works only support SPARQL (Talmor and Berant, 2018; Dubey et al., 2019; Keysers et al., 2020). [Figure 1: A property graph extracted from Wikidata (Vrandecic and Krötzsch, 2014).] ...
... Low-resource Generalization. To verify whether GraphQ IR can aid the semantic parsing of low-resource languages, we reconstruct the METAQA dataset into Cypher, a graph query language commonly used in the industry but rarely studied in previous semantic parsing works (Seifer et al., 2019). To simulate the low-resource scenario, we adjust the data split to ensure that only 1, 3, and 5 samples of each question type appear in the training set under the 1-, 3-, and 5-shot settings. ...
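The k-shot split described in this excerpt can be sketched as follows. This is only an illustration of the general idea (keep at most k training examples per question type), not the cited authors' procedure; the function name `k_shot_split`, the `question_type` field, and the fixed seed are assumptions introduced here.

```python
import random
from collections import defaultdict
from typing import Dict, List


def k_shot_split(examples: List[Dict], k: int, seed: int = 0) -> List[Dict]:
    """Keep at most k randomly chosen examples of each question type."""
    rng = random.Random(seed)  # seeded so the split is reproducible
    by_type = defaultdict(list)
    for example in examples:
        # 'question_type' is a hypothetical field name for this sketch.
        by_type[example["question_type"]].append(example)

    train = []
    # Iterate in sorted order so the result does not depend on input order of types.
    for question_type in sorted(by_type):
        group = by_type[question_type]
        train.extend(rng.sample(group, min(k, len(group))))
    return train


if __name__ == "__main__":
    # Hypothetical toy data: three question types with six examples each.
    data = [
        {"question_type": t, "id": i}
        for t in ("1-hop", "2-hop", "3-hop")
        for i in range(6)
    ]
    for k in (1, 3, 5):
        print(k, len(k_shot_split(data, k)))
```

Sampling per type rather than over the whole pool ensures that every question type stays represented, even in the 1-shot setting.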
... Recent property-graph data model (and query language) proposals include G-CORE [2] and the upcoming GQL standard [12], as well as the recently established openCypher standard [17]. They have attracted a lot of research interest and popularity in practical use-cases [19]. ...
Property graphs constitute data models for representing knowledge graphs. They allow for the convenient representation of facts, including facts about facts, represented by triples in subject or object position of other triples. Knowledge graphs such as Wikidata are created by a diversity of contributors and a range of sources leaving them prone to two types of errors. The first type of error, falsity of facts, is addressed by property graphs through the representation of provenance and validity, making triples occur as first-order objects in subject position of metadata triples. The second type of error, violation of domain constraints, has not been addressed with regard to property graphs so far. In RDF representations, this error can be addressed by shape languages such as SHACL or ShEx, which allow for checking whether graphs are valid with respect to a set of domain constraints. Borrowing ideas from the syntax and semantics definitions of SHACL, we design a shape language for property graphs, ProGS, which allows for formulating shape constraints on property graphs including their specific constructs, such as edges with identities and key-value annotations to both nodes and edges. We define a formal semantics of ProGS, investigate the resulting complexity of validating property graphs against sets of ProGS shapes, compare with corresponding results for SHACL, and implement a prototypical validator that utilizes answer set programming.