About
221
Publications
17,318
Reads
3,178
Citations
Publications (221)
In this work we propose a low-rank approximation of areal, particularly three-dimensional, data utilizing additional weights. This way we enable effective compression if additional information indicates that parts of the data are of higher interest than others. The guiding examples are high-fidelity finite element simulations of an abdominal aortic...
Image classification is often prone to labelling uncertainty. To generate suitable training data, images are labelled according to evaluations of human experts. This can result in ambiguities, which will affect subsequent models. In this work, we aim to model the labelling uncertainty in the context of remote sensing and the classification of satel...
Spatial smoothing makes use of spatial information to obtain better estimates in regression models. In particular, flexible smoothing with B-splines and penalties, as popularized by Eilers and Marx (1996), provides strong tools for including available spatial information. We consider alternative smoothing methods in spatial...
The COVID-19 pandemic brought about a massive wave of disinformation, exacerbating polarization in the increasingly divided landscape of online discourse. In this context, popular social media users play a major role, as they have the ability to broadcast messages to large audiences and influence public opinion. In this paper, we make use of openly...
Abstract
The new rent index act (Mietspiegelgesetz) permits so-called extra-legal characteristics, such as length of tenancy and type of landlord, to be taken into account when rent indices (Mietspiegel) are compiled. In future rent indices, these extra-legal characteristics may influence the compilation process and the model selection, but not the concrete rent index model itself. ...
Quantifying the number of deaths caused by the COVID-19 crisis has been an ongoing challenge for scientists, and no gold standard for doing so has yet been established. We propose a robust approach to calculate age-adjusted yearly excess mortality, and apply it to obtain estimates and uncertainty bounds for 28 countries with publicly available data....
Machine Learning and Deep Learning have achieved an impressive standard today, enabling us to answer questions that were inconceivable a few years ago. Besides these successes, it has become clear that, beyond pure prediction, which is the primary strength of most supervised machine learning algorithms, the quantification of uncertainty is relevant an...
With the beginning of the COVID-19 pandemic, we became aware of the need for comprehensive data collection and its provision to scientists and experts for proper data analyses. In Germany, the Robert Koch Institute (RKI) has tried to keep up with this demand for data on COVID-19, but there were (and still are) relevant data missing that are needed...
The use of language is innately political and often a vehicle of cultural identity as well as the basis for nation building. Here, we examine language choice and tweeting activity of Ukrainian citizens based on more than 4 million geo-tagged tweets from over 62,000 users before and during the Russian-Ukrainian War, from January 2020 to October 2022...
In this work we propose a low-rank approximation of high-fidelity finite element simulations by utilizing weights corresponding to areas of high stress levels for an abdominal aortic aneurysm, i.e. a deformed blood vessel. We focus on the von Mises stress, which corresponds to the rupture risk of the aorta. This is modeled as a Gaussian Markov rand...
This paper focuses on the comparison of networks on the basis of statistical inference. For that purpose, we rely on smooth graphon models as a nonparametric modeling strategy that is able to capture complex structural patterns. The graphon itself can be viewed more broadly as a density or intensity function on networks, making the model a natural ch...
To explore the driving forces behind innovation, we analyse the dynamic bipartite network of all inventors and patents registered within the field of electrical engineering in Germany in the past two decades. To deal with the sheer size of the data, we decompose the network by exploiting the fact that most inventors tend to only stay active for a r...
Networks are ubiquitous in economic research on organizations, trade, and many other topics. However, while economic theory extensively considers networks, no general framework for their empirical modeling has yet emerged. We thus introduce two different statistical models for this purpose -- the Exponential Random Graph Model (ERGM) and the Additi...
Over the course of the COVID-19 pandemic, Generalized Additive Models (GAMs) have been successfully employed on numerous occasions to obtain vital data-driven insights. In this article we further substantiate the success story of GAMs, demonstrating their flexibility by focusing on three relevant pandemic-related issues. First, we examine the inter...
As early as March 2020, the authors of this letter started to work on surveillance data to obtain a clearer picture of the pandemic’s dynamic. This letter outlines the lessons learned during this peculiar time, emphasizing the benefits that better data collection, management, and communication processes would bring to the table. We further want to...
As relational event models are an increasingly popular model for studying relational structures, the reliability of large-scale event data collection becomes more and more important. Automated or human-coded events often suffer from non-negligible false-discovery rates in event identification. And most sensor data are primarily based on actors’ spa...
The COVID-19 pandemic brought about a massive wave of disinformation, exacerbating polarization in the increasingly divided landscape of online discourse. In this context, popular social media users play a major role, as they have the ability to broadcast messages to large audiences, thus influencing public opinion. We make use of publicly available...
We examine the inclusion of specific nodal random effects for first- and second-mode nodes in an ERGM for bipartite networks. The inclusion of such node-specific random effects in the ERGM accounts for unobserved heterogeneity in the bipartite network and ensures stable estimation results, especially for large-scale bipartite networks. Moreove...
Abstract
Aim of the study: This work examines the effect of mandatory COVID-19 tests for in-person teaching at schools. In Bavaria, this testing requirement has been in place since the end of the 2021 Easter holidays. The first week after the Easter holidays yields a natural experiment that allows us to examine the effect of mandatory testing at schools on...
The authors make an important contribution, presenting a comprehensive and thoughtful overview of the many different aspects of data, statistics and data analyses during the recent COVID-19 pandemic and discussing all relevant topics. The paper certainly provides a very valuable reflection on what has been done, what could have been done and wha...
Substantive research in the Social Sciences regularly investigates signed networks, where edges between actors are either positive or negative. For instance, schoolchildren can be friends or rivals, just as countries can cooperate or fight each other. This research often builds on structural balance theory, one of the earliest and most prominent ne...
The analysis of network data has gained considerable interest in recent years. This also includes the analysis of large, high-dimensional networks with hundreds and thousands of nodes. While Exponential Random Graph Models (ERGMs) serve as a workhorse for network data analyses, their applicability to very large networks is problematic via classic...
The paper proposes the combination of stochastic blockmodels with smooth graphon models. The former allow for partitioning the set of individuals in a network into blocks, which represent groups of nodes that presumably connect in a stochastically equivalent manner and are therefore often also called communities. Smooth graphon models instead assume that the network'...
In this short note, we apply the method of De Nicola et al. (2022) to the most recent available data, thereby providing up-to-date estimates of all-cause excess mortality in Germany for 2021. The analysis reveals a preliminary excess mortality of approximately 2.3% for the calendar year considered. The excess is mainly driven by significantly highe...
The presence of unobserved node-specific heterogeneity in exponential random graph models (ERGM) is a general concern, both with respect to model validity as well as estimation instability. We, therefore, include node-specific random effects in the ERGM that account for unobserved heterogeneity in the network. This leads to a mixed model with param...
We analyse the bipartite dynamic network of inventors and patents registered within the main area of electrical engineering in Germany to explore the driving forces behind innovation. The data at hand leads to a bipartite network, where an edge between an inventor and a patent is present if the inventor is a co-owner of the respective patent. Since...
Governments around the world continue to act to contain and mitigate the spread of COVID-19. The rapidly evolving situation compels officials and executives to continuously adapt policies and social distancing measures depending on the current state of the spread of the disease. In this context, it is crucial for policymakers to have a firm grasp o...
Coronavirus disease 2019 (COVID-19) is associated with a very high number of casualties in the general population. Assessing the exact magnitude of this number is a non-trivial problem, as relying only on officially reported COVID-19 associated fatalities runs the risk of incurring several kinds of bias. One of the ways to approach the issue i...
In this work, we predict the outcomes of high fidelity multivariate computer simulations from low fidelity counterparts using function-to-function regression. The high fidelity simulation takes place on a high definition mesh, while its low fidelity counterpart takes place on a coarsened and truncated mesh. We showcase our approach by applying it t...
Over the course of the COVID-19 pandemic, Generalised Additive Models (GAMs) have been successfully employed on numerous occasions to obtain vital data-driven insights. In this paper we further substantiate the success story of GAMs, demonstrating their flexibility by focusing on three relevant pandemic-related issues. First, we examine the interde...
This paper proposes the estimation of a smooth graphon model for network data analysis using principles of the EM algorithm. The approach considers both variability with respect to ordering the nodes of a network and smooth estimation of the graphon by nonparametric regression. To do so, (linear) B-splines are used, which allow for smooth estimatio...
Accurate and interpretable forecasting models predicting spatially and temporally fine-grained changes in the numbers of intrastate conflict casualties are of crucial importance for policymakers and international non-governmental organizations (NGOs). Using a count data approach, we propose a hierarchical hurdle regression model to address the corr...
Since the primary mode of respiratory virus transmission is person‐to‐person interaction, we are required to reconsider physical interaction patterns to mitigate the number of people infected with COVID‐19. While research has shown that non‐pharmaceutical interventions (NPI) had an evident impact on national mobility patterns, we investigate the re...
Up to this point, we have mainly focused our efforts on univariate distributions. This was mostly just to keep the notation simple. Multivariate data, however, appear often in practice and multivariate distributions are eminently useful and important. It is time now to formalise multivariate distributions and explicitly discuss models for multivari...
In the last chapter, we explored Maximum Likelihood as a common approach for estimating model parameters given a sample from the population. In this chapter, we examine in more detail this estimate as well as the likelihood and its related functions: the log-likelihood, score function and Fisher information. We also derive properties of their asymp...
We already briefly introduced the Bayesian approach to statistical inference in Chap. 3 and in this chapter we will dive deeper into this methodology. Whole books have been written about the different techniques in Bayesian statistics, which is a huge and very well developed field that we could not hope to cover in a single chapter. For this reason...
Now that we have introduced probability models, we have the tools we need to put our statisticians’ hats on. We want to make a probabilistic model that best describes the world around us. How is it that we can best move from our set of observations to a good model—a model that not only describes our samples, but the process that generated them? In...
We begin our work with a summary of many essential concepts from probability theory. The chapter starts with different viewpoints on the very definition of probability and moves on to a number of foundational concepts in probability theory: expected values and variance, independence, conditional probability, Bayes theorem, random variables and vect...
In the early days of statistics, calculations had to be done by hand or by unwieldy mainframe computers with very limited memory and computational power. Consequently, statisticians were permanently looking for ways to simplify calculations through approximation. Time consuming calculations, e.g. computation of quantiles and distribution functions,...
Data are often collected and analysed to answer a particular question. For instance, when modifying an existing product, one wants to know whether said modification has improved the product’s quality. In the social or medical sciences, one wants to know whether an intervention has a positive (or negative) impact. One might also want to know whether...
This chapter focuses on the quality and completeness of data, which is an often overlooked but essential part of data analysis. In fact, achieving the necessary quality of data and the required completeness is often more time consuming than the data analysis itself. One key aspect is missing data, referred to as “missingness” in technical literatur...
A central question in statistical data analysis is when and how one can draw causal conclusions. In the typical setting, we want to ascertain how a covariate X influences an outcome Y. However, we also want to be certain that our statistical model represents a true causal process and not just some observed and possibly spurious correlation. A comm...
In Chaps. 4 and 5 we explored Maximum Likelihood estimation and Bayesian statistics and, given a particular model, used our data to estimate the unknown parameter θ. The validity of the model itself was not questioned, except for a brief detour into the Bayes factor. In this chapter we will delve a little deeper into this idea and explore common ro...
We propose a novel tie-oriented model for longitudinal event network data. The generating mechanism is assumed to be a multivariate Poisson process that governs the onset and repetition of yearly observed events with two separate intensity functions. We apply the model to a network obtained from the yearly dyadic number of international deliveries...
As relational event models are an increasingly popular model for studying relational structures, the reliability of large-scale event data collection becomes increasingly important. Automated or human-coded events often suffer from relatively low sensitivity in event identification. At the same time, most sensor data is primarily based on actors' s...
Mixture models are probabilistic models aimed at uncovering and representing latent subgroups within a population. In the realm of network data analysis, the latent subgroups of nodes are typically identified by their connectivity behaviour, with nodes behaving similarly belonging to the same community. In this context, mixture modelling is pursued...
The paper deals with the scenario where some covariates are observed by design for a subset of the observations only. In the example treated in the paper this occurs with a two phase sampling scheme where in the first phase a relatively large sample is drawn to record a response variable Y and a set of (cheap) covariates x. In a second phase a smal...
The case detection ratio of coronavirus disease 2019 (COVID‐19) infections varies over time due to changing testing capacities, different testing strategies, and the evolving underlying number of infections itself. This note shows a way of quantifying these dynamics by jointly modeling the reported number of detected COVID‐19 infections with nonfat...
Many studies suggest that searching for parking is associated with significant direct and indirect costs. Therefore, it is appealing to reduce the time which car drivers spend on finding an available parking lot, especially in urban areas where the space for all road users is limited. The prediction of on-street parking lot occupancy can provide dr...
In this paper, we use a censored regression model to investigate data on the international trade of small arms and ammunition provided by the Norwegian Initiative on Small Arms Transfers. Taking a network‐based view on the transfers, we do not only rely on exogenous covariates but also estimate endogenous network effects. We apply a spatial autocor...
The development and application of models, which take the evolution of network dynamics into account, are receiving increasing attention. We contribute to this field and focus on a profile likelihood approach to model time-stamped event data for a large-scale dynamic network. We investigate the collaboration of inventors using EU patent data. As ev...
In this paper, we tackle the problem of splitting a long (potentially time consuming) questionnaire into two parts, where each participant only responds to a fraction of the questions, and all respondents obtain a common portion of questions. We propose a method that combines regression models to the two independent samples (questionnaires) in the...
The paper motivates high-dimensional smoothing with penalized splines and its efficient numerical calculation. If smoothing is carried out over three or more covariates, the classical tensor product spline bases explode in their dimension, bringing the estimation to its numerical limits. A recent approach by Siebenborn and Wagner (2019) circ...
This textbook provides a comprehensive introduction to statistical principles, concepts and methods that are essential in modern statistics and data science. The topics covered include likelihood-based inference, Bayesian statistics, regression, statistical tests and the quantification of uncertainty. Moreover, the book addresses statistical ideas...
The case detection ratio of COVID-19 infections varies over time due to changing testing capacities, modified testing strategies and also, apparently, due to the dynamics in the number of infected itself. In this paper we investigate these dynamics by jointly looking at the reported number of detected COVID-19 infections with non-fatal and fatal ou...
Accurate and interpretable forecasting models predicting spatially and temporally fine-grained changes in the numbers of intrastate conflict casualties are of crucial importance for policymakers and international non-governmental organisations (NGOs). Using a count data approach, we propose a hierarchical hurdle regression model to address the corr...
Estimation of latent network flows is a common problem in statistical network analysis. The typical setting is that we know the margins of the network, that is, in- and outdegrees, but the flows are unobserved. In this article, we develop a mixed regression model to estimate network flows in a bike-sharing network if only the hourly differences of...
In this paper, we analyse the network of international major conventional weapons (MCW) transfers from 1950 to 2016, based on data from the Stockholm International Peace Research Institute (SIPRI). The dataset consists of yearly bilateral arms transfers between pairs of countries, which allows us to conceive of the individual relationships as part...
Since the primary mode of respiratory virus transmission is person-to-person interaction, we are required to reconsider physical interaction patterns to mitigate the number of people infected with COVID-19. While non-pharmaceutical interventions (NPI) had an evident impact on national mobility patterns, only the relative regional mobility behaviour...
Abstract
The article discusses the various methods of data collection for rent indices (Mietspiegel). The advantages and disadvantages of the methods found in practice are discussed and examined from a statistical point of view. We pursue three questions: Who is surveyed? How are they surveyed? How is the sample drawn? ...
We analyse the temporal and regional structure in mortality rates related to COVID-19 infections. We relate the fatality date of each deceased patient to the corresponding day of registration of the infection, leading to a nowcasting model which allows us to estimate the number of present-day infections that will, at a later date, prove to be fatal...
We propose a novel tie-oriented model for longitudinal event network data. The generating mechanism is assumed to be a multivariate Poisson process that governs the onset and repetition of yearly observed events with two separate intensity functions. We apply the model to a network obtained from the number of international deliveries of combat airc...
There are two main approaches to carrying out prediction in the context of penalized regression: with low-rank bases and penalties, or through smooth mixed models. In this article, we give further insight into the case of P-splines, showing the influence of the penalty on the prediction. In the context of mixed models, we can connect the new predic...
In the past decades, the growing amount of network data has led to many novel statistical models. In this paper we consider so-called geometric networks. Typical examples are road networks or other infrastructure networks. But the neurons or the blood vessels in a human body can also be interpreted as a geometric network embedded in a three-dimens...
Estimation of latent network flows is a common problem in statistical network analysis. The typical setting is that we know the margins of the network, i.e. in- and outdegrees, but the flows are unobserved. In this paper, we develop a mixed regression model to estimate network flows in a bike-sharing network if only the hourly differences of in- an...
Abstract
Data science is the new buzzword; after big data and digitalization, it is now data science. Job boards are full of postings, data scientists are desperately sought, and many an applicant now adds data science to their profile to improve their job prospects. But what is data science, actually? The following article...
The presence of unobserved node specific heterogeneity in Exponential Random Graph Models (ERGM) is a general concern, both with respect to model validity as well as estimation instability. We therefore extend the ERGM by including node specific random effects that account for unobserved heterogeneity in the network. This leads to a mixed model wit...
Given the growing number of available tools for modeling dynamic networks, the choice of a suitable model becomes central. The goal of this survey is to provide an overview of tie‐oriented dynamic network models. The survey is focused on introducing binary network models with their corresponding assumptions, advantages, and shortfalls. The models a...
To capture the systemic complexity of international financial systems, network data is an important prerequisite. However, dyadic data is often not available, raising the need for methods that allow for reconstructing networks based on limited information. In this paper, we review different methods that are designed for the estimation of mat...
The development and application of models that take the evolution of networks with a dynamical structure into account are receiving increasing attention. Our research focuses on a profile likelihood approach to model time-stamped event data for a large-scale network, applied to patent collaborations. As an event, we consider the submission of a joint...
The paper proposes the use of parallel computing for Markov graphs as a subclass of exponential random graph models where the network statistics induce a conditional independence structure amongst the edges of the network. This conditional independence allows simulation of edges in parallel using multiple computing cores. Simulation in Markov model...
Given the growing number of available tools for modeling dynamic networks, the choice of a suitable model becomes central. It is often difficult to compare the different models with respect to their applicability and interpretation. The goal of this survey is to provide an overview of popular dynamic network models. The survey is focused on introdu...
Network (or matrix) reconstruction is a general problem which occurs if the margins of a matrix are given and the matrix entries need to be predicted. In this paper we show that the predictions obtained from the iterative proportional fitting procedure (IPFP) or equivalently maximum entropy (ME) can be obtained by restricted maximum likelihood esti...
The paper proposes the estimation of a graphon function for network data using principles of the EM algorithm. The approach considers both variability with respect to ordering the nodes of a network and estimation of the unique representation of a graphon. To do so, (linear) B-splines are used, which make it easy to accommodate constraints in the...