Pascal Welke

  • Dr.
  • PostDoc Position at TU Wien

About

  • Publications: 46
  • Reads: 20,813
  • Citations: 881
Introduction
Pascal Welke currently works at the Institute for Computer Science III, University of Bonn. Pascal does research in Data Mining, Applied Graph Theory, Machine Learning, and Human-Computer Interaction. He wrote his PhD thesis on 'Probabilistic Frequent Subtree Mining'.
Current institution
TU Wien
Current position
  • PostDoc Position

Publications

Preprint
Full-text available
We investigate the distance function learned by message passing neural networks (MPNNs) in specific tasks, aiming to capture the functional distance between prediction targets that MPNNs implicitly learn. This contrasts with previous work, which links MPNN distances on arbitrary tasks to structural distances on graphs that ignore task-specific info...
Chapter
Full-text available
Nonwoven materials, characterized by a random fiber structure, are essential for various applications including insulation and filtering. An industrial long-term goal is to establish a framework for the simulation-based design of nonwovens. Due to the random structures, simulations of material properties on fiber network level are computational exp...
Conference Paper
Full-text available
We present a logic based interpretable model for learning on graphs and an algorithm to distill this model from a Graph Neural Network (GNN). Recent results have shown connections between the expressivity of GNNs and the two-variable fragment of first-order logic with counting quantifiers (C2). We introduce a decision-tree based model which leverag...
Preprint
Full-text available
We present a logic based interpretable model for learning on graphs and an algorithm to distill this model from a Graph Neural Network (GNN). Recent results have shown connections between the expressivity of GNNs and the two-variable fragment of first-order logic with counting quantifiers (C2). We introduce a decision-tree based model which leverag...
Chapter
Full-text available
The Rashomon Effect describes the following phenomenon: for a given dataset there may exist many models with equally good performance but with different solution strategies. The Rashomon Effect has implications for Explainable Machine Learning, especially for the comparability of explanations. We provide a unified view on three different comparison...
Preprint
Full-text available
The Rashomon Effect describes the following phenomenon: for a given dataset there may exist many models with equally good performance but with different solution strategies. The Rashomon Effect has implications for Explainable Machine Learning, especially for the comparability of explanations. We provide a unified view on three different comparison...
Preprint
Full-text available
We investigate novel random graph embeddings that can be computed in expected polynomial time and that are able to distinguish all non-isomorphic graphs in expectation. Previous graph embeddings have limited expressiveness and either cannot distinguish all graphs or cannot be computed efficiently for every graph. To be able to approximate arbitrary...
Preprint
Full-text available
Skilled employees are usually seen as the most important pillar of an organization. Despite this, most organizations face high attrition and turnover rates. While several machine learning models have been developed for analyzing attrition and its causal factors, the interpretations of those models remain opaque. In this paper, we propose the HR-DSS...
Chapter
Full-text available
In this work, we propose a method for computing generalized frequent subgraph patterns which is based on the graph edit distance. Graph data is often equipped with semantic information in the form of an ontology, for example when dealing with linked data or knowledge graphs. Previous work suggests exploiting this semantic information in order to comput...
Preprint
Full-text available
"Leichte Sprache", the German counterpart to Simple English, is a regulated language aiming to facilitate complex written language that would otherwise stay inaccessible to different groups of people. We present a new sentence-aligned monolingual corpus for Simple German -- German. It contains multiple document-aligned sources which we have aligned...
Article
Full-text available
Nonwoven fiber materials are omnipresent in diverse applications including insulation, clothing and filtering. Simulation of material properties from production parameters is an industry goal but a challenging task. We developed a machine learning based approach to predict the tensile strength of nonwovens from fiber lay-down settings via a regress...
Preprint
Full-text available
Most modern language models infer representations that, albeit powerful, lack both compositionality and semantic interpretability. Starting from the assumption that a large proportion of semantic content is necessarily relational, we introduce a neural language model that discovers networks of symbols (schemata) from text datasets. Using a variatio...
Article
Full-text available
After more than one decade, Weisfeiler-Lehman graph kernels are still among the most prevalent graph kernels due to their remarkable predictive performance and time complexity. They are based on a fast iterative partitioning of vertices, originally designed for deciding graph isomorphism with one-sided error. The Weisfeiler-Lehman graph kernels ret...
Article
Full-text available
The majority of popular graph kernels is based on the concept of Haussler's R-convolution kernel and defines graph similarities in terms of mutual substructures. In this work, we enrich these similarity measures by considering graph filtrations: Using meaningful orders on the set of edges, which allow us to construct a sequence of nested graphs, we ca...
Article
Nonwoven materials consist of random fiber structures. They are essential to diverse application areas such as clothing, insulation and filtering. A long term goal in industry is the simulation-based optimization of material properties in dependence of the manufacturing parameters. Recent works developed a framework to predict tensile strength prop...
Preprint
Full-text available
The majority of popular graph kernels is based on the concept of Haussler's $\mathcal{R}$-convolution kernel and defines graph similarities in terms of mutual substructures. In this work, we enrich these similarity measures by considering graph filtrations: Using meaningful orders on the set of edges, which allow to construct a sequence of nested g...
Technical Report
Full-text available
We approach least squares optimization from the point of view of gradient flows. As a practical example, we consider a simple linear regression problem, set up the corresponding differential equation, and show how to solve it using SciPy.
Technical Report
Full-text available
We show how max-sum diversification can be used to solve the k-clique problem, a well-known NP-complete problem. This reduction proves that max-sum diversification is NP-hard and provides a simple and practical method to find cliques of a given size using Hopfield networks.
Preprint
Full-text available
This survey presents an overview of integrating prior knowledge into machine learning systems in order to improve explainability. The complexity of machine learning models has elicited research to make them more explainable. However, most explainability methods cannot provide insight beyond the given data, requiring additional information about the...
Technical Report
Full-text available
Having previously considered sorting as a linear programming problem, we now cast it as a quadratic unconstrained binary optimization problem (QUBO). Deriving this formulation is a bit cumbersome but it allows for implementing neural networks or even quantum computing algorithms that sort. Here, however, we consider a simple greedy QUBO solver and...
Technical Report
Full-text available
Linear programming is a surprisingly versatile tool. That is, many problems we would not usually think of in terms of a linear programming problem can actually be expressed as such. In this note, we show that sorting is such a problem and discuss how to solve linear programs for sorting using SciPy.
Technical Report
Full-text available
Having previously discussed how SciPy allows us to solve linear programs, we can study further applications of linear programming. Here, we consider least absolute deviation regression and solve a simple parameter estimation problem deliberately chosen to expose potential pitfalls in using SciPy's optimization functions.
Technical Report
Full-text available
This note discusses how to solve linear programming problems with SciPy. As a practical use case, we consider the task of computing the Chebyshev center of a bounded convex polytope.
Preprint
The Weisfeiler-Lehman graph kernels are among the most prevalent graph kernels due to their remarkable time complexity and predictive performance. Their key concept is based on an implicit comparison of neighborhood representing trees with respect to equality (i.e., isomorphism). This binary valued comparison is, however, arguably too rigid for def...
Article
Full-text available
We test the hypothesis that the extent to which one obtains information on a given topic through Wikipedia depends on the language in which it is consulted. Controlling the size factor, we investigate this hypothesis for 25 subject areas. Since Wikipedia is a central part of the web-based information landscape, this indicates a language...
Preprint
Full-text available
We test the hypothesis that the extent to which one obtains information on a given topic through Wikipedia depends on the language in which it is consulted. Controlling the size factor, we investigate this hypothesis for 25 subject areas. Since Wikipedia is a central part of the web-based information landscape, this indicates a language...
Article
Full-text available
Motivated by the impressive predictive power of simple patterns, we consider the problem of mining frequent subtrees in arbitrary graphs. Although the restriction of the pattern language to trees does not resolve the computational complexity of frequent subgraph mining, in a recent work we have shown that it gives rise to an algorithm generating pr...
Chapter
Full-text available
One of the main differences between inductive logic programming (ILP) and graph mining lies in the pattern matching operator applied: While it is mainly defined by relational homomorphism (i.e., subsumption) in ILP, subgraph isomorphism is the most common pattern matching operator in graph mining. Using the fact that subgraph isomorphisms are injec...
Article
Full-text available
Frequent subgraphs proved to be powerful features for graph classification and prediction tasks. Their practical use is, however, limited by the computational intractability of pattern enumeration and that of graph embedding into frequent subgraph feature spaces. We propose a simple probabilistic technique that resolves both limitations. In particu...
Article
Full-text available
We describe some necessary conditions for the existence of a Hamiltonian path in any graph (in other words, for a graph to be traceable). These conditions result in a linear time algorithm to decide the Hamiltonian path problem for cactus graphs. We apply this algorithm to several molecular databases to report the numbers of graphs that are traceab...
Conference Paper
Full-text available
We consider the problem of affinity prediction for protein ligands. For this purpose, small molecule candidates can easily become regression algorithm inputs if they are represented as vectors indexed by a set of physico-chemical properties or structural features of their molecular graphs. There are plenty of so-called molecular fingerprints, each...
Conference Paper
Full-text available
We propose a fast algorithm for approximating graph similarities. For its advantageous semantic and algorithmic properties, we define the similarity between two graphs by the Jaccard-similarity of their images in a binary feature space spanned by the set of frequent subtrees generated for some training dataset. Since the feature space embedding is...
Conference Paper
Full-text available
Tracking users across websites and apps is as desirable to the marketing industry as it is unalluring to users. The central challenge lies in identifying users from the perspective of different apps/sites. While there are methods to identify users via technical settings of their phones, these are prone to countermeasures. Yet, in this paper, we sho...
Conference Paper
Full-text available
We propose a new probabilistic graph kernel. It is defined by the set of frequent subtrees generated from a small random sample of spanning trees of the transaction graphs. In contrast to the ordinary frequent subgraph kernel it can be computed efficiently for any arbitrary graphs. Due to its probabilistic nature, the embedding function correspondi...
Chapter
Full-text available
We study the complexity of frequent subtree mining in very simple graphs beyond forests. We show for d-tenuous outerplanar graphs that frequent subtrees can be listed with polynomial delay if the cycle degree, i.e., the maximum number of blocks that share a common vertex, is bounded by some constant. The crucial step in the proof of this positive r...
Article
Spectral vegetation indices (SVIs) have been shown to be useful for an indirect detection of plant diseases. However, these indices have not been evaluated to detect or to differentiate between plant diseases on crop plants. The aim of this study was to develop specific spectral disease indices (SDIs) for the detection of diseases in crops. Sugar b...
