# Panpan Zhang · Vanderbilt University · Department of Biostatistics

Panpan Zhang

PhD

Looking for collaborators who are interested in network data analysis and missing data problems.

## About

Publications: 47

Reads: 6,144

Citations: 193

Introduction

My research interest lies at the interface of applied probability, applied statistics, and machine learning.

Additional affiliations

May 2015 - May 2016

August 2012 - present

August 2010 - May 2012

## Publications

Preferential attachment (PA) network models have a wide range of applications in various scientific disciplines. Efficient generation of large-scale PA networks helps uncover their structural properties and facilitates the development of associated analytical methodologies. Existing software packages only provide limited functions for this purpose w...

Assortativity coefficients are important metrics for analyzing both directed and undirected networks. In general, it is not guaranteed that the fitted model will always agree with the assortativity coefficients in the given network, and the structure of directed networks is more complicated than that of undirected ones. Therefore, we provide a remedy by p...

Betweenness centrality plays an important role in input–output analysis. Existing betweenness measures are mostly defined based on the intermediate flow network, without accounting for node-level information, which can be of great value in some applications. Here we propose a novel betweenness centrality measure that incorporates available node-...

Motivated by the complexity of network data, we propose a directed hybrid random network that mixes preferential attachment (PA) rules with uniform attachment rules. When a new edge is created, with probability p∈(0,1), it follows the PA rule. Otherwise, this new edge is added between two uniformly chosen nodes. Such a mixture makes the in- and out-d...
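The edge-creation rule described in this abstract can be illustrated with a small simulation. The following is only a minimal sketch, not the authors' model or software: the abstract is truncated, so the seed graph (a single edge 0 → 1), the choice to create a new source node at each PA step, and the PA weight "in-degree + 1" are all assumptions made for illustration, and the function name `grow_hybrid_network` is hypothetical.

```python
import random


def grow_hybrid_network(n_steps, p, seed=None):
    """Grow a directed network by mixing PA and uniform attachment steps.

    Assumptions (NOT taken from the source abstract):
      * the seed graph is the single edge 0 -> 1;
      * a PA step creates a new node that sends an edge to an existing
        node chosen with probability proportional to in-degree + 1;
      * a uniform step adds an edge between two distinct existing nodes
        chosen uniformly at random.
    Returns (edges, n_nodes), where edges is a list of (source, target).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    rng = random.Random(seed)
    edges = [(0, 1)]
    n_nodes = 2
    # Node i appears in_degree(i) + 1 times, so a uniform draw from this
    # list is a draw proportional to in-degree + 1.
    pool = [0, 1, 1]
    for _ in range(n_steps):
        if rng.random() < p:
            # PA step: pick the target first, then create the new source.
            target = rng.choice(pool)
            source = n_nodes
            n_nodes += 1
            pool.append(source)  # weight-1 entry for the new node
        else:
            # Uniform step: two distinct existing nodes.
            source, target = rng.sample(range(n_nodes), 2)
        edges.append((source, target))
        pool.append(target)  # target gained one unit of in-degree
    return edges, n_nodes


if __name__ == "__main__":
    edges, n_nodes = grow_hybrid_network(10_000, p=0.7, seed=42)
    print(len(edges), "edges on", n_nodes, "nodes")
```

Keeping a pool with one entry per unit of weight makes each PA draw constant-time at the cost of memory linear in the number of edges, which is a common trade-off for small simulations of this kind.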

The spreading pattern of COVID-19 in the early months of the pandemic differs a lot across the states in the US under different quarantine measures and reopening policies. We proposed to cluster the US states into distinct communities based on the daily new confirmed case counts from March 22 to July 25 via a nonnegative matrix factorization (NMF)...

In this paper, we analyze the time series data of the case and death counts of COVID-19 that broke out in China in December 2019. The study period is during the lockdown of Wuhan. We exploit functional data analysis methods to analyze the collected time series data. The analysis is divided into three parts. First, the functional principal component a...

In chemical graph theory, caterpillar trees have been an appealing model to represent the molecular structures of benzenoid hydrocarbons. Meanwhile, topological indices have been regarded as a powerful tool for modeling quantitative structure-property and quantitative structure-activity relationships between molecules in chemical compounds. In this ar...

PageRank (PR) is a fundamental tool for assessing the relative importance of the nodes in a network. In this paper, we propose a measure, weighted PageRank (WPR), extended from the classical PR for weighted, directed networks with possible non-uniform node-specific information that is dependent or independent of network structure. A tuning paramete...

A multi-regional input–output table (MRIOT) containing the transactions among the region-sectors in an economy defines a weighted and directed network. Using network analysis tools, we analyze the regional and sectoral structure of the Chinese economy with the province-sector MRIOTs of China in 2007 and 2012. Global analyses are done with network t...

Assortativity measures the tendency of a vertex in a network to be connected to other vertices with respect to some vertex-specific features. Classical assortativity coefficients are defined for unweighted and undirected networks with respect to vertex degree. We propose a class of assortativity coefficients that capture the assortative characteris...

In this paper, we investigate several random structures, namely two classes of random lobster trees (RLTs) and a class of random spider trees (RSTs). The first class of RLTs grow with a fixed probability, whereas those from the second class evolve in a dynamic manner underlying a flavor of semi-opposite reinforcement. For these two classes, we char...

A multi-regional input-output table (MRIOT) containing the transactions among the region-sectors in an economy defines a weighted and directed network. Using network analysis tools, we analyze the regional and sectoral structure of the Chinese economy and their temporal dynamics from 2007 to 2012 via the MRIOTs of China. Global analyses are done wi...

Motivated by the complexity of network data, we propose a directed hybrid random network that mixes preferential attachment (PA) rules with uniform attachment (UA) rules. When a new edge is created, with probability $p\in [0,1]$, it follows the PA rule. Otherwise, this new edge is added between two uniformly chosen nodes. Such a mixture makes the in-...

The spreading pattern of COVID-19 differs a lot across the US states under different quarantine measures and reopening policies. We proposed to cluster the US states into distinct communities based on the daily new confirmed case counts via a nonnegative matrix factorization (NMF) followed by a k-means clustering procedure on the coefficients of the...

In this article, we investigate several properties of high-dimensional random Apollonian networks, including two types of degree profiles, the small-world effect (clustering property), sparsity and three distance-based metrics. The characterizations of the degree profiles are based on several rigorous mathematical and probabilistic methods, such as...

The COVID-19 pandemic has so far caused huge negative impacts on different areas all over the world, and the United States (US) is one of the most affected countries. In this paper, we use methods from functional data analysis to look into the COVID-19 data in the US. We explore the modes of variation of the data through a functional principal...

TAR-DNA binding protein-43 (TDP-43) proteinopathy is seen in multiple brain diseases. A standardized terminology was recommended recently for common age-related TDP-43 proteinopathy: limbic-predominant, age-related TDP-43 encephalopathy (LATE) and the underlying neuropathological changes, LATE-NC. LATE-NC may be co-morbid with Alzheimer's disease n...

A mean regression model can be inadequate if the probability distribution of the observed responses is not symmetric. In such situations, quantile regression turns out to be a more robust alternative for accommodating outliers and misspecification of the error distribution, since it characterizes the entire conditional distribution of the outcome...

As the COVID-19 pandemic has strongly disrupted people's daily work and life, a great amount of scientific research has been conducted to understand the key characteristics of this new epidemic. In this manuscript, we focus on four crucial epidemic metrics with regard to COVID-19, namely the basic reproduction number, the incubation period, the...

We investigate the joint distribution of nodes of small degrees and the degree profile in preferential dynamic attachment circuits. In particular, we study the joint asymptotic distribution of the number of the nodes of outdegree $0$ (terminal nodes) and outdegree $1$ in a very large circuit. The expectation and variance of the number of those two...

In this paper, several properties of a class of trees exhibiting the preferential attachment phenomenon, plane-oriented recursive trees (PORTs), are uncovered. Specifically, we investigate the degree profile of a PORT by determining the exact probability mass function of the degree of a node with a fixed label. We compute the expectation and the variance...

We study a class of unbalanced constant-differentials Pólya processes on white and blue balls. We show that the number of white balls, the number of blue balls, and the total number of balls, when appropriately scaled, all converge in distribution to gamma random variables with parameters depending on the differential index and the amount of ball a...

There is an unproven duality theory hypothesizing that random discrete trees and their poissonized embeddings in continuous time share fundamental properties. We give additional evidence in favor of this theory by showing that several classes of random trees growing in discrete time and their poissonized counterparts have the same limiting degree G...

In this article, we investigate several properties of high-dimensional random Apollonian networks (HDRANs), including two types of degree profiles, the small-world effect (clustering property), sparsity, and several distance-based properties. The methods that we use to characterize the degree profiles are a two-dimensional mathematical induction, a...

In this paper, we investigate the degree profile and Gini index of random caterpillar trees (RCTs). We consider RCTs which evolve in two different manners: uniform and nonuniform. The degrees of the vertices on the central path (i.e., the degree profile) of a uniform RCT follow a multinomial distribution. For nonuniform RCTs, we focus on those gro...

We propose an elementary but effective approach to studying a general class of Poissonized tenable and balanced urns on two colors. We characterize the asymptotic behavior of the process via a partial differential equation that governs the process, coupled with the method of moments applied in a bootstrapped manner. We show that the limiting distri...

We study a class of unbalanced constant-differentials Pólya processes on white and blue balls. We show that the number of white balls, the number of blue balls, and the total number of balls, when appropriately scaled, all converge in distribution to gamma random variables with parameters depending on the differential index and the amount of...

In this paper, we investigate the degree profile and Gini index of random caterpillar trees (RCTs). We consider RCTs which evolve in two different manners: uniform and nonuniform. The degrees of the vertices on the central path (i.e., the degree profile) of a uniform RCT follow a multinomial distribution. For nonuniform RCTs, we focus on those grow...

In this paper, we develop some clique-based methods for social network clustering. The quality of a clustering result is measured by a novel clique-based index, which is adapted from the modularity index proposed in [Newman 2006]. We design an effective algorithm based on recursive bipartition in order to maximize the objective function of the prop...

We propose a novel model-based method for social network clustering in this paper. More precisely, we cluster a set of entities in a social network into disjoint communities based on a newly adopted distance function. Our model not only allows mixed membership for each entity, but also provides reliable statistical inference on network structure. We d...

In this note, we investigate the degree profile of nodes in plane-oriented recursive trees. More precisely, we determine the probability mass function of the degree of a node with a fixed label. We also look into the moments of the degree random variable, and compute the exact expectation and variance. Phase transitions of the asymptotic expectatio...

We propose a novel approach to studying poissonized tenable and balanced urns on two colors. Our strategy is to produce asymptotic mixed moments of the process via a partial differential equation that governs the process, coupled with the method of moments applied in a bootstrapped manner. We analyze the number of balls of (two) different colors in...

We study a class of Pólya processes that underlie terminal nodes in a random Apollonian network. We calculate the exact first and second moments of the number of terminal nodes by solving ordinary differential equations. These equations are derived from the partial differential equation governing the process. In fact, the partial differential equa...

This study explores the probabilistic and statistical properties of several popular random networks. We focus on (random) Apollonian networks, their sister structures k-trees, and preferential (dynamic) attachment circuits. We also study Apollonian processes, a class of Pólya processes obtained via embedding Apollonian networks in continuous time....

We investigate the degree profile and total weight in Apollonian networks. We study the distribution of the degrees of vertices as they age in the evolutionary process. Asymptotically, the (suitably-scaled) degree of a node with a fixed label has a Mittag-Leffler-like limit distribution. The degrees of nodes of later ages have different asymptotic...

We investigate terminal nodes and the degree profile in preferential dynamic attachment circuits. We study the distribution of the number of terminal nodes, which are the nodes that have not recruited other nodes, as the circuit ages. The expectation and variance of the number of terminal nodes are both linear with respect to the age of the circuit...

Two-color triangular urn models have been investigated recently. Moments of ball counts of a particular color have been obtained exactly and asymptotically in a number of sources (Janson, 2006, Kuba and Panholzer, 2014, and Flajolet et al., 2006). Exact factorial moments are in Kuba and Panholzer (2014). While Flajolet et al. (2006) gives an exact...

In this paper, I am going to introduce statistical self-similarity for discrete time series. My thesis is divided into three parts: In the first part, I will give a mathematical definition of self-similarity and detect the main properties of self-similar processes. At the end of this part, fractional Brownian motion (fBm) will be used as an example...

## Projects

Our goal is to uncover the structure of a random $k$-ary tree by investigating several interesting properties, such as the total external path, the height, and the degree distribution.

Caterpillar trees arise in combinatorial graph theory. They have various applications to physics and chemistry. The goal of this project is to study the degree profile and other probabilistic properties of caterpillar trees.