
Christian L. Staudt
- Dr. rer. nat. (=PhD)
- Analyst at https://clstaudt.me/
About
35 Publications
57,215 Reads
1,002 Citations
Introduction
I have left academia for industry data science.
My PhD research focused on algorithms for the analysis of large complex networks, many of which are part of the open-source software package NetworKit (http://network-analysis.info).
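For illustration, a minimal sketch of a typical session with NetworKit's Python frontend (the calls follow NetworKit's documented API, though details may vary between versions):

```python
import networkit as nk

# Generate a random graph as a stand-in for a real data set;
# reading a file would use nk.readGraph(...) instead.
G = nk.generators.ErdosRenyiGenerator(10000, 0.001).generate()
print(G.numberOfNodes(), "nodes,", G.numberOfEdges(), "edges")

# Detect disjoint communities with the default modularity-based algorithm.
communities = nk.community.detectCommunities(G)
print("communities found:", communities.numberOfSubsets())
```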
Current institution
https://clstaudt.me/
Current position
- Analyst
Additional affiliations
October 2012 - June 2016
Publications (35)
We introduce NetworKit, an open-source software package for analyzing the structure of large complex networks. Appropriate algorithmic solutions are required to handle increasingly common large graph data sets containing up to billions of connections. We describe the methodology applied to develop scalable solutions to network analysis problems, in...
Research on generative models plays a central role in the emerging field of network science, studying how statistical patterns found in real networks can be generated by formal rules. During the last two decades, a variety of models has been proposed with an ultimate goal of achieving comprehensive realism for the generated networks. In this study,...
Research on generative models plays a central role in the emerging field of network science, studying how statistical patterns found in real networks could be generated by formal rules. Output from these generative models is then the basis for designing and evaluating computational methods on networks including verification and simulation studies....
Complex networks are relational data sets commonly represented as graphs. The analysis of their intricate structure is relevant to many areas of science and commerce, and data sets may reach sizes that require distributed storage and processing. We describe and compare programming models for distributed computing with a focus on graph algorithms fo...
Sparsification reduces the size of networks while preserving structural and statistical properties of interest. Various sparsifying algorithms have been proposed in different contexts. We contribute the first systematic conceptual and experimental comparison of edge sparsification methods on a diverse set of network properties. It is shown that th...
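To make the setting concrete, here is a rough sketch of one family of approaches such a comparison covers: a local-degree-style filter that keeps, for each node, edges to its highest-degree neighbors. The default exponent and the either-endpoint union rule are assumptions of this sketch, not the paper's implementation:

```python
import math

def local_degree_sparsify(adj, alpha=0.5):
    """Keep, for each node v, edges to its ceil(deg(v)**alpha)
    highest-degree neighbors; an edge survives if either endpoint
    keeps it. `adj` maps each node to a set of neighbors."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    kept = set()
    for v, nbrs in adj.items():
        k = math.ceil(deg[v] ** alpha)
        # Neighbors ranked by degree; ties broken arbitrarily.
        top = sorted(nbrs, key=lambda u: deg[u], reverse=True)[:k]
        kept.update(frozenset((u, v)) for u in top)
    return kept  # set of surviving edges as frozenset pairs
```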
A partition is a subdivision of a set into disjoint subsets called parts. Partitions are instrumental in tasks such as classification, pattern recognition and network analysis. In the past, a wide spectrum of similarity measures for pairs of partitions P = {P_1, ..., P_{|P|}} and P' = {P'_1, ..., P'_{|P'|}} of the same set V have been proposed. Such a...
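To make this concrete, a short sketch of one classic measure of this kind, the Rand index, which scores two partitions by the fraction of element pairs they treat consistently (my illustration, not code from the paper):

```python
from itertools import combinations

def rand_index(p1, p2):
    """Rand index: the fraction of element pairs on which two
    partitions agree (grouped together in both, or separated in both).
    p1, p2 map each element of the common ground set to a part label."""
    elems = list(p1)
    agree = sum(
        (p1[u] == p1[v]) == (p2[u] == p2[v])
        for u, v in combinations(elems, 2)
    )
    pairs = len(elems) * (len(elems) - 1) // 2
    return agree / pairs

# Both partitions split {a, b, c} the same way, so the index is 1.0:
print(rand_index({"a": 0, "b": 0, "c": 1}, {"a": 1, "b": 1, "c": 0}))
```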
We present NetworKit, an open-source software package for high-performance analysis of large complex networks. Complex networks are equally attractive and challenging targets for data mining, and novel algorithmic solutions, including parallelization, are required to handle data sets containing billions of connections. Our goal for NetworKit is to...
Complex networks have become increasingly popular for modeling real-world phenomena, ranging from web hyperlinks to interactions between people. Realistic generative network models are important in this context as they avoid privacy concerns of real data and simplify complex network research regarding data sharing, reproducibility, and scalability...
The amount of graph-structured data has recently experienced an enormous growth in many applications. To transform such data into useful information, fast analytics algorithms and software tools are necessary. One common graph analytics kernel is disjoint community detection (or graph clustering). Despite extensive research on heuristic solvers for...
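For context, the quality index such solvers typically optimize is modularity; a small self-contained sketch of the standard computation (an illustration, not the paper's code):

```python
from collections import defaultdict

def modularity(edges, part):
    """Newman-Girvan modularity Q = sum_c (L_c/m - (d_c/(2m))^2) for a
    simple undirected graph: L_c counts intra-cluster edges, d_c is the
    total degree of cluster c, and m the number of edges. `part` maps
    each node to its cluster id."""
    m = len(edges)
    intra = defaultdict(int)  # L_c
    deg = defaultdict(int)    # d_c
    for u, v in edges:
        deg[part[u]] += 1
        deg[part[v]] += 1
        if part[u] == part[v]:
            intra[part[u]] += 1
    return sum(intra[c] / m - (deg[c] / (2 * m)) ** 2 for c in deg)
```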
The detection of communities (internally dense sub-graphs) is a network analysis task with manifold applications. The special task of selective community detection is concerned with finding high-quality communities locally around seed nodes. Given the lack of conclusive experimental studies, we perform a systematic comparison of different previousl...
Betweenness centrality ranks the importance of nodes by their participation in all shortest paths of the network. Therefore computing exact betweenness values is impractical in large networks. For static networks, approximation based on randomly sampled paths has been shown to be significantly faster in practice. However, for dynamic networks, no a...
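In the spirit of the static sampling approach the abstract mentions, here is a self-contained sketch that samples node pairs and credits the interior nodes of one uniformly chosen shortest path per pair; the dynamic algorithms in the paper itself are considerably more refined:

```python
import random
from collections import deque

def approx_betweenness(adj, samples=1000, seed=0):
    """Estimate betweenness centrality by sampling node pairs and
    crediting the interior nodes of one uniformly random shortest
    path per pair. `adj` maps each node to an iterable of neighbors."""
    rng = random.Random(seed)
    nodes = list(adj)
    score = {v: 0.0 for v in nodes}
    for _ in range(samples):
        s, t = rng.sample(nodes, 2)
        # BFS from s, tracking shortest-path counts (sigma) and predecessors.
        dist, sigma, preds = {s: 0}, {s: 1}, {s: []}
        q = deque([s])
        while q:
            v = q.popleft()
            if v == t:
                break
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    sigma[w] = sigma[v]
                    preds[w] = [v]
                    q.append(w)
                elif dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        if t not in dist:
            continue  # t not reachable from s in this sample
        # Walk back from t, choosing predecessors with probability
        # proportional to sigma; this yields a uniform shortest path.
        v = t
        while v != s:
            v = rng.choices(preds[v], weights=[sigma[p] for p in preds[v]])[0]
            if v != s:
                score[v] += 1.0 / samples
    return score
```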
Watch the recording: http://www.youtube.com/watch?v=RtZyHCGyeIk
We introduce NetworKit, an open-source software package for high-performance analysis of large complex networks. Complex networks are equally attractive and challenging targets for data mining, and novel algorithmic solutions, including parallelization, are required to handle data sets containing billions of connections. Our goal for NetworKit is t...
The amount of graph-structured data has recently experienced an enormous growth in many applications. To transform such data into useful information, high-performance analytics algorithms and software tools are necessary. One common graph analytics kernel is community detection (or graph clustering). Despite extensive research on heuristic solvers...
Collaboration networks arise when we map the connections between scientists which are formed through joint publications. These networks thus display the social structure of academia, and also allow conclusions about the structure of scientific knowledge. Using the computer science publication database DBLP, we compile relations between authors and...
Maximizing the quality index modularity has become one of the primary methods for identifying the clustering structure within a graph. Since many contemporary networks are not static but evolve over time, traditional static approaches can be inappropriate for specific tasks. In this work, we pioneer the NP-hard problem of online dynamic modularity...
A planted partition graph is an Erdős-Rényi type random graph, where, based on a given partition of the vertex set, vertices in the same part are linked with a higher probability than vertices in different parts. Graphs of this type are frequently used to evaluate graph clustering algorithms, i.e., algorithms that seek to partition the vertex s...
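A minimal generator sketch to make the definition concrete (an illustration only; actual evaluation setups add dynamics and further refinements):

```python
import random

def planted_partition(parts, p_in, p_out, seed=0):
    """Link vertices in the same part with probability p_in and
    vertices in different parts with probability p_out (p_out < p_in).
    `parts` is a list of lists of vertex ids; returns an edge list."""
    rng = random.Random(seed)
    label = {v: i for i, part in enumerate(parts) for v in part}
    nodes = [v for part in parts for v in part]
    edges = []
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            p = p_in if label[u] == label[v] else p_out
            if rng.random() < p:
                edges.append((u, v))
    return edges

# Two planted communities of three vertices each:
edges = planted_partition([[0, 1, 2], [3, 4, 5]], p_in=0.9, p_out=0.1)
```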
In scientometrics, the quantitative study of science, network analysis has become a prominent tool. The kinds of networks most frequently examined have been citation networks (mapping links between publications based on references) and collaboration networks (mapping the collaborative relationships between researchers based on joint publications). Co...
Maximizing the quality index modularity has become one of the primary methods for identifying the clustering structure within a graph. As contemporary networks are not static but evolve over time, traditional static approaches can be inappropriate for specific tasks. In this work we pioneer the NP-hard problem of online dynamic modularity maximizat...
The experimental evaluation of many graph algorithms for practical use involves both tests on real-world data and on artificially generated data sets. In particular the latter are useful for systematic and very specific evaluations. Roughly speaking, we are interested in the generation of dynamic random graphs that feature a community structure of...
Questions (7)
A line of research (e.g. [1]) has considered the question of self-similarity and self-dissimilarity in complex networks.
Very generally, a self-similar object is similar to a part of itself. A natural application of this definition to networks would be to say that a network is self-similar if its graph is composed of subgraphs that are structurally similar to the graph as a whole. Still, this is not yet a precise, quantifiable definition - for instance, what does it mean to say that the graph is "composed" of subgraphs?
Please comment and discuss.
Humans are good at spotting patterns visually, which can be used for exploratory data analysis, but we can be overwhelmed by visual complexity. The usual drawing methods for networks result in pretty pictures that illustrate "complexity", though arguably not much else [1]. Typical graph drawing methods are inadequate for large networks with thousands of edges (and many interesting data sets are considerably larger than that), as they mostly result in "hairballs".
It seems to me that in order to draw large networks meaningfully, we have to go beyond the typical node-link diagram with some type of force-directed layout. Ideas in this direction are hive plots [2] and hierarchical edge bundling [3].
Which visualization methods do you know that are able to expose important features of large complex networks more effectively?
There is basically no limit to the phenomena that can be modeled and analyzed in terms of complex networks: entities and the relationships between them, represented as the nodes and edges of a graph, forming a non-trivial pattern. So let's make this a small survey:
- Where in your research do you employ complex networks and network analysis methods?
- What are your data sources? How big are they?
- Which tools do you use for the network analysis process?
- What did you learn from the network analysis?
Nowadays complex network data sets are reaching enormous sizes, so that their analysis challenges algorithms, software and hardware. One possible approach to this is to reduce the amount of data while preserving important information. I'm specifically interested in methods that, given a complex network as input, filter out a significant fraction of the edges while preserving structural properties such as degree distribution, components, clustering coefficients, community structure and centrality. Another term used in this context is "backbone", a subset of important edges that represent the network structure.
There are methods to sparsify/filter/sample edges and preserve a specific property. But are there any methods that aim to preserve a large set of diverse properties?
In the wake of the Snowden leaks, the public has learned that the sheer technical scale of surveillance goes beyond what has usually been thought possible, such as a "full take" of internet communication. The picture emerges that intelligence agencies are prominent users and drivers of "big data" technology. But from the media reporting, capabilities remain rather unclear, certainly also because of the variety of different surveillance programs, the complexity of the technology, and of course secrecy.
I would like to know more about what the technologies employed are and how far technical capabilities go. Since it may be a vast topic, let me narrow the question down: I would like to learn about data analysis capabilities, not about the use of cryptology or computer security research for surveillance. The focus is on mass surveillance rather than spying on specific target persons. I am interested in efforts to make society as a whole "machine-readable" by using techniques such as data mining, machine learning, social network analysis etc.
Especially interesting are things like:
- the intent to build a comprehensive social graph of large population segments [1]
- the use of social media and other data streams to anticipate political events such as protests, civil unrest etc., e.g. [2]
Can you point to publications? Work done in the service of surveillance and contributions of computer scientists to the critical debate around the issue would be equally interesting.