André Petermann's research while affiliated with University of Leipzig and other places

Publications (16)

Article
We demonstrate Gradoop, an open source framework that combines and extends features of graph database systems with the benefits of distributed graph processing. Using a rich graph data model and powerful graph operators, users can declaratively express graph analytical programs for distributed execution without needing advanced programming experien...
Conference Paper
Despite the growing popularity of techniques related to graph summarization, a general operator for the flexible nesting of graphs is still missing. We propose a novel nested graph data model and a powerful graph nesting operator. In contrast to existing approaches, our approach is able to summarize vertices and paths among vertex groups within a s...
Conference Paper
Transactional frequent subgraph mining identifies frequent structural patterns in a collection of graphs. This research problem has wide applicability and increasingly requires higher scalability over single machine solutions to address the needs of Big Data use cases. We introduce DIMSpan, an advanced approach to frequent subgraph mining that util...
Conference Paper
Frequent pattern mining is an important research field and can be applied to different labeled data structures ranging from itemsets to graphs. There are scenarios where a label can be assigned to a taxonomy and generalized patterns can be mined by replacing labels by their ancestors. In this work, we propose a novel approach to generalized frequen...
Conference Paper
Graph pattern matching is an important and challenging operation on graph data. Typical use cases are related to graph analytics. Since analysts are often non-programmers, a graph system will only gain acceptance if there is a comprehensible way to declare pattern matching queries. However, respective query languages are currently only supported b...
Conference Paper
Property graphs are an intuitive way to model, analyze and visualize complex relationships among heterogeneous data objects, for example, as they occur in social, biological and information networks. These graphs typically contain thousands or millions of vertices and edges and their entire representation can easily overwhelm an analyst. One way to...
Article
Transactional frequent subgraph mining identifies frequent subgraphs in a collection of graphs. This research problem has wide applicability and increasingly requires higher scalability over single machine solutions to address the needs of Big Data use cases. We introduce DIMSpan, an advanced approach to frequent subgraph mining that utilizes the f...
Chapter
Many big data applications in business and science require the management and analysis of huge amounts of graph data. Suitable systems to manage and to analyze such graph data should meet a number of challenging requirements including support for an expressive graph data model with heterogeneous vertices and edges, powerful query and graph mining c...
Conference Paper
Graph grouping supports data analysts in decision making based on the characteristics of large-scale, heterogeneous networks containing millions or even billions of vertices and edges. We demonstrate graph grouping with Gradoop, a scalable system supporting declarative programs composed from multiple graph operations. Using social network data, we...
Conference Paper
Graphs are an intuitive way to model complex relationships between real-world data objects. Thus, graph analytics plays an important role in research and industry. As graphs often reflect heterogeneous domain data, their representation requires an expressive data model including the abstraction of graph collections, for example, to analyze communi...
Article
Using graph data models for business intelligence applications is a novel and promising approach. In contrast to traditional data warehouse models, graph models enable the mining of relationship patterns. In our prior work, we introduced an approach to graph-based data integration and analytics called BIIIG (Business Intelligence with Integrated In...
Conference Paper
We present FoodBroker, a new data generator for benchmarking graph-based business intelligence systems and approaches. It covers two realistic business processes and their involved master and transactional data objects. The interactions are correlated in controlled ways to enable non-uniform distributions for data and relationships. For benchmarkin...
Article
Many Big Data applications in business and science require the management and analysis of huge amounts of graph data. Previous approaches for graph analytics such as graph databases and parallel graph processing systems (e.g., Pregel) either lack sufficient scalability or flexibility and expressiveness. We are therefore developing a new end-to-end...
Article
We demonstrate BIIIG (Business Intelligence with Integrated Instance Graphs), a new system for graph-based data integration and analysis. It aims at improving business analytics compared to traditional OLAP approaches by comprehensively tracking relationships between entities and making them available for analysis. BIIIG supports a largely automati...
Conference Paper
We propose a new graph-based framework for business intelligence called BIIIG supporting the flexible evaluation of relationships between data instances. It builds on the broad availability of interconnected objects in existing business information systems. Our approach extracts such interconnected data from multiple sources and integrates them int...

Citations

... The black dashed edges in the figure provide the output of the link creation. In vertex join as graph fusion [17] (generalized in [6]), the two graphs should be distinct connected components of the same graph and the matching vertices are now fused into one single vertex (Figure 1d). Path joins such as SMJoin [12] are used for traversing one single graph operand at a time [13,21], where each traversed or matched path ⇝ is replaced by one single edge → : this is possible because both the vertex set and edge set are considered as relational tables, and so the path traversal or match can be implemented as a multi-way join producing one single table, thus providing the new edges. ...
... For frequent pattern mining, specialization involves the extension of the set with a new frequent item [26]. For multi-dimensional frequent subgraph mining, the specialization entails either the specialization of the nodes appearing in a graph pattern with more specific terms according to a taxonomy [27] or the extension of the graph with further edges and nodes [28]. We can always slightly refine this algorithm to return the S-border containing maximally specific solutions within a data mining solution, thus returning the most specific hypotheses that still satisfy the quality criterion: ...
... While simplistic mining algorithms [20,21,25] enumerate and test all possible hypotheses in L h (Algorithm 1), more efficient ones [5,26,27] represent the whole hypothesis space as a lattice induced by the relationship while traversing from the most general hypothesis ( , Algorithm 2 Line 2) toward the most specific ones, while still validating the quality conditions, which are then returned (Line 7). These last solutions require the definition of a specialization operator ρ. ...
... We have to use the optional RETURN statement if we need to construct a graph as output. The syntax of GQL's RETURN clause is given in Query 4. GQL follows the definition in Cypher and also provides multiple semantics for graph pattern matching [121,122], i.e., isomorphism and homomorphism. In addition, GQL borrows the syntax of regular path expressions from PGQL [66], i.e., "(a:Person)-/:friend_of*/->(a:Person)". ...
... To overcome the limitations of these system approaches and to combine their strengths, as early as 2015 we started developing a new open-source distributed graph analysis platform called Gradoop (Graph Analytics on Hadoop), which has been continuously extended in recent years [29,39,40,41,43,65,66]. Gradoop is a distributed platform to achieve high scalability and parallel graph processing. ...
... It seems that around 2008 the general interest in novel algorithms faded and many people moved on to parallelizing existing algorithms [cf. 17,27] or solving different problem formulations using variations of the existing algorithms. Jiang et al. [17] concluded in 2013 that this is due to the maturity of the field. ...
... GDBs have been researched in both academia and industry [10,11,75,90,100,124,133], in terms of query languages [7,8,49,208], database management [49, 120,156,172,175], compression and data models [23,35,40,43,146,147,162], execution in novel environments such as the serverless setting [71,150,219], and others [77]. Many graph databases exist [6, 12-15, 46, 55, 56, 59, 65, 73, 78, 82, 87, 89, 125, 126, 151, 154, 155, 164-168, 178, 182, 183, 202, 209, 211, 214, 217, 218, 233, 238, 240]. ...
... Both of our methods are based on the gSpan algorithm [36], in particular an extension supporting directed multigraphs [24]. The most fundamental component of gSpan is the use of DFS codes to represent graph patterns in a canonical form. ...
... The foundation of GRADOOP is ongoing research in the fields of graph data management, graph analytics and graph mining. Regarding the latter, we investigate methods using more meaningful interestingness measures than minimum support, for example, to identify patterns correlating with certain business measure values [20]. Due to the huge data volume in many applications and the high computational complexity of graph pattern mining, our second focus is the efficient parallelization of graph algorithms using shared-nothing clusters. ...