Preprint

Correlation bounds, mixing and m-dependence under random time-varying network distances with an application to Cox-Processes


Abstract

We consider multivariate stochastic processes indexed either by the vertices or by the pairs of vertices of a dynamic network. By a dynamic network we mean a network with a fixed vertex set and an edge set that changes randomly over time. We assume that the spatial dependence structure of the processes is linked to the network in the following way: two neighbouring vertices (or two adjacent pairs of vertices) are dependent, while the dependence decreases as the distance in the network increases. We make this intuition mathematically precise by considering three concepts based on correlation, beta-mixing with time-varying beta-coefficients, and conditional independence. We then use these concepts to prove weak-dependence results, e.g. an exponential inequality, which might be of independent interest. To demonstrate the use of these concepts in an application, we study the asymptotics (for growing networks) of a goodness-of-fit test in a dynamic interaction network model based on a multiplicative Cox-type hazard model. The model is then applied to bike-sharing data.
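To fix ideas, a multiplicative Cox-type intensity for the interaction counting process of a vertex pair (i,j), consistent with the description above, could take the form
\[ \lambda_{ij}(t) = Y_{ij}(t)\, \alpha_0(t)\, \exp\bigl(\theta^\top x_{ij}(t)\bigr), \]
where $Y_{ij}(t)$ is an at-risk indicator, $\alpha_0$ a baseline rate and $x_{ij}(t)$ a vector of time-varying covariates; the notation is illustrative, since the abstract itself does not display the model.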


References
Article
The main objective of this paper is to introduce and illustrate relational event models, a new class of statistical models for the analysis of time-stamped data with complex temporal and relational dependencies. We outline the main differences between recently proposed relational event models and more conventional network models based on the graph-theoretic formalism typically adopted in empirical studies of social networks. Our main contribution involves the definition and implementation of a marked point process extension of currently available models. According to this approach, the sequence of events of interest is decomposed into two components: (a) event time and (b) event destination. This decomposition transforms the problem of selection of event destination in relational event models into a conditional multinomial logistic regression problem. The main advantages of this formulation are the possibility of controlling for the effect of event-specific data and a significant reduction in the estimation time of currently available relational event models. We demonstrate the empirical value of the model in an analysis of interhospital patient transfers within a regional community of health care organizations. We conclude with a discussion of how the models we presented help to overcome some of the limitations of statistical models for networks that are currently available.
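As a sketch of this decomposition (the notation is illustrative, not taken from the paper): conditional on sender $i$ emitting an event at time $t$, the destination choice over the risk set $\mathcal{R}(i,t)$ of eligible receivers is the multinomial logit
\[ P(j \mid i, t) = \frac{\exp\bigl(\beta^\top s_{ij}(t)\bigr)}{\sum_{k \in \mathcal{R}(i,t)} \exp\bigl(\beta^\top s_{ik}(t)\bigr)}, \]
with $s_{ij}(t)$ a vector of event-specific statistics, so the destination component can be estimated with standard conditional logistic regression software.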
Article
We use lasso methods to shrink, select and estimate the network linking the publicly-traded subset of the world's top 150 banks, 2003-2014. We characterize static network connectedness using full-sample estimation and dynamic network connectedness using rolling-window estimation. Statistically, we find that global banking connectedness is clearly linked to bank location, not bank assets. Dynamically, we find that global banking connectedness displays both secular and cyclical variation. The secular variation corresponds to gradual increases/decreases during episodes of gradual increases/decreases in global market integration. The cyclical variation corresponds to sharp increases during crises, involving mostly cross-country, as opposed to within-country, bank linkages.
Article
We present a general principle for estimating a regression function nonparametrically, allowing for a wide variety of data filtering, for example, repeated left truncation and right censoring. Both the mean and the median regression cases are considered. The method works by first estimating the conditional hazard function or conditional survivor function and then integrating. We also investigate improved methods that take account of model structure such as independent errors and show that such methods can improve performance when the model structure is true. We establish the pointwise asymptotic normality of our estimators.
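For a nonnegative response $T$, the integration step rests on a standard identity (stated here as a sketch; the estimators plug a filtered-data estimate of the conditional survivor function into it):
\[ m(x) = E[T \mid X = x] = \int_0^\infty S(t \mid x)\, dt, \qquad S(t \mid x) = \exp\Bigl(-\int_0^t h(s \mid x)\, ds\Bigr), \]
while the median regression case takes $m(x) = \inf\{t : S(t \mid x) \le 1/2\}$.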
Article
This article incorporates a political decision process into an urban land use model to predict the likely location of a public good. It fills an important gap in the literature by modeling the endogenous location of open space. The article compares open space decisions made under a majority-rules voting scheme with a welfare-improving criterion and finds that households tied to a location in space compete against each other for public goods located nearer them. Significant differences emerge between the two decision criteria, indicating that requiring referenda for open space decisions is likely to lead to inefficient outcomes. Specifically, many open space votes that would lead to welfare improvements are likely to fail, and any open space decisions that do pass will require amenities larger than needed to achieve the social optimum. The more dispersed and large the population, the larger is the gap between the socially efficient level and the level needed for a public referendum to pass.
Article
The Cox regression model for censored survival data specifies that covariates have a proportional effect on the hazard function of the life-time distribution of an individual. In this paper we discuss how this model can be extended to a model where covariate processes have a proportional effect on the intensity process of a multivariate counting process. This permits a statistical regression analysis of the intensity of a recurrent event allowing for complicated censoring patterns and time dependent covariates. Furthermore, this formulation gives rise to proofs with very simple structure using martingale techniques for the asymptotic properties of the estimators from such a model. Finally an example of a statistical analysis is included.
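In the notation that has become standard for this extension (a sketch; the paper states precise regularity conditions), the intensity of the $i$-th component of the counting process is modelled as
\[ \lambda_i(t) = Y_i(t)\, \lambda_0(t)\, \exp\bigl(\beta^\top Z_i(t)\bigr), \]
with $Y_i(t)$ a predictable at-risk indicator encoding the censoring pattern, $Z_i(t)$ a time-dependent covariate process and $\lambda_0$ an unspecified baseline intensity.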
Article
We introduce a new kernel hazard estimator in a nonparametric model where the stochastic hazard depends on the current value of time and on the current value of a time dependent covariate or marker. We establish the pointwise and global convergence of our estimator.
Article
A semiparametric hazard model with parametrized time but general covariate dependency is formulated and analyzed inside the framework of counting process theory. A profile likelihood principle is introduced for estimation of the parameters: the resulting estimator is $n^{1/2}$-consistent, asymptotically normal and achieves the semiparametric efficiency bound. An estimation procedure for the nonparametric part is also given and its asymptotic properties are derived. We provide an application to mortality data.
Article
Scholars, advertisers and political activists see massive online social networks as a representation of social interactions that can be used to study the propagation of ideas, social bond dynamics and viral marketing, among others. But the linked structures of social networks do not reveal actual interactions among people. Scarcity of attention and the daily rhythms of life and work make people default to interacting with those few that matter and that reciprocate their attention. A study of social interactions within Twitter reveals that the driver of usage is a sparse and hidden network of connections underlying the declared set of friends and followers.
Article
In this paper we discuss weak dependence and mixing properties of some popular models. We also develop some of their econometric applications. Autoregressive models, autoregressive conditional heteroskedasticity (ARCH) models, and bilinear models are widely used in econometrics. More generally, stationary Markov modeling is often used. Bernoulli shifts also generate many useful stationary sequences, such as autoregressive moving average (ARMA) or ARCH(∞) processes. For Volterra processes, mixing properties obtain given additional regularity assumptions on the distribution of the innovations. We recall associated probability limit theorems and investigate the nonparametric estimation of those sequences. We first thank the editor for the huge amount of additional editorial work provided for this review paper. The efficiency of the numerous referees was especially useful. The error pointed out in Hall and Horowitz (1996) was the origin of the present paper, and we thank the referees for asking for a more detailed treatment of a correct proof for this paper in Section 2.3. Also we thank Marc Henry and Rafal Wojakowski for a very careful rereading of the paper. An anonymous referee has been particularly helpful in the process of revision of the paper. The authors thank him for his numerous suggestions of improvement, including important results on negatively associated sequences and a thorough update in standard English.
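For reference, the absolute regularity (beta-mixing) coefficient between two sigma-fields $\mathcal{A}$ and $\mathcal{B}$, the notion also underlying the time-varying beta-coefficients of the preprint above, is
\[ \beta(\mathcal{A}, \mathcal{B}) = \tfrac{1}{2} \sup \sum_{i=1}^{I} \sum_{j=1}^{J} \bigl| P(A_i \cap B_j) - P(A_i) P(B_j) \bigr|, \]
the supremum being taken over all finite partitions $(A_i)_{i \le I} \subset \mathcal{A}$ and $(B_j)_{j \le J} \subset \mathcal{B}$.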
Article
Proportional hazard models for survival data, even though popular and numerically handy, suffer from the restrictive assumption that covariate effects are constant over survival time. A number of tests have been proposed to check this assumption. This paper contributes to this area by employing local estimates that allow fitting hazard models in which covariate effects vary smoothly with time. A formal test is derived to check for proportional hazards against smooth hazards as the alternative. The test proves to possess omnibus power in that it is powerful against arbitrary but smooth alternatives. Comparative simulations and two data examples accompany the presentation. Extensions are provided to multiple covariate settings, where the focus of interest is to decide which of the covariate effects vary with time.
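Schematically, the testing problem contrasts the proportional hazards specification with a smoothly time-varying alternative (illustrative notation):
\[ H_0:\; h(t \mid x) = h_0(t) \exp(\beta^\top x) \quad \text{versus} \quad H_1:\; h(t \mid x) = h_0(t) \exp\bigl(\beta(t)^\top x\bigr), \]
with $\beta(\cdot)$ an arbitrary smooth function; the local estimates fit the alternative, and the test measures how far $\hat\beta(\cdot)$ is from a constant.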
Article
We propose new procedures for estimating the component functions in both additive and multiplicative nonparametric marker-dependent hazard models. We work with a full counting process framework that allows for left truncation, right censoring and time-varying covariates. Our procedures are based on kernel hazard estimation as developed by Nielsen and Linton and on the idea of marginal integration. We provide a central limit theorem for the marginal integration estimator. We then define estimators based on finite-step backfitting in both additive and multiplicative cases and prove that these estimators are asymptotically normal and have smaller variance than the marginal integration method.
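The marginal integration device can be sketched as follows (illustrative notation): from a pilot kernel estimator $\hat\alpha(t,z)$ of the hazard surface, the time component of an additive specification $\alpha(t,z) = \alpha_1(t) + \alpha_2(z)$ is recovered, up to a constant, by integrating the other argument out,
\[ \tilde\alpha_1(t) = \int \hat\alpha(t, z)\, w(z)\, dz, \]
with $w$ a fixed weight density; the multiplicative case is handled analogously, and the finite-step backfitting estimators use such a pilot as their starting value.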
Article
We establish both uniform and nonuniform error bounds of the Berry-Esseen type in normal approximation under local dependence. These results are of an order close to the best possible if not best possible. They are more general or sharper than many existing ones in the literature. The proofs couple Stein's method with the concentration inequality approach.
Article
We have analyzed the fully-anonymized headers of 362 million messages exchanged by 4.2 million users of Facebook, an online social network of college students, during a 26 month interval. The data reveal a number of strong daily and weekly regularities which provide insights into the time use of college students and their social lives, including seasonal variations. We also examined how factors such as school affiliation and informal online friend lists affect the observed behavior and temporal patterns. Finally, we show that Facebook users appear to be clustered by school with respect to their temporal messaging patterns.
Book
This snapshot of the current frontier of statistics and network analysis focuses on the foundational topics of modeling, sampling, and design. Primarily for graduate students and researchers in statistics and closely related fields, emphasis is not only on what has been done, but on what remains to be done.
Book
In nonparametric and high-dimensional statistical models, the classical Gauss–Fisher–Le Cam theory of the optimality of maximum likelihood estimators and Bayesian posterior inference does not apply, and new foundations and ideas have been developed in the past several decades. This book gives a coherent account of the statistical theory in infinite-dimensional parameter spaces. The mathematical foundations include self-contained 'mini-courses' on the theory of Gaussian and empirical processes, approximation and wavelet theory, and the basic theory of function spaces. The theory of statistical inference in such models - hypothesis testing, estimation and confidence sets - is presented within the minimax paradigm of decision theory. This includes the basic theory of convolution kernel and projection estimation, but also Bayesian nonparametrics and nonparametric maximum likelihood estimation. In a final chapter the theory of adaptive inference in nonparametric models is developed, including Lepski's method, wavelet thresholding, and adaptive inference for self-similar functions. Winner of the 2017 PROSE Award for Mathematics.
Article
This paper is concerned with cross-sectional dependence arising because observations are interconnected through an observed network. Following Doukhan and Louhichi (1999), we measure the strength of dependence by covariances of nonlinearly transformed variables. We provide a law of large numbers and central limit theorem for network dependent variables. We also provide a method of calculating standard errors robust to general forms of network dependence. For that purpose, we rely on a network heteroskedasticity and autocorrelation consistent (HAC) variance estimator, and show its consistency. The results rely on conditions characterized by tradeoffs between the rate of decay of dependence across a network and the network's denseness. Our approach can accommodate data generated by network formation models, random fields on graphs, conditional dependency graphs, and large functional-causal systems of equations.
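A network HAC variance estimator of the kind described can be sketched as (simplified, illustrative notation)
\[ \hat V = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n} K\Bigl(\frac{d_n(i,j)}{b_n}\Bigr)\, \hat u_i \hat u_j^\top, \]
where $d_n(i,j)$ is the network distance between units, $K$ a kernel downweighting distant pairs, $b_n$ a bandwidth and $\hat u_i$ estimated residual-type summands; consistency hinges on the trade-off between dependence decay and network denseness mentioned above.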
Article
We consider models for time-to-event data that allow that an event, e.g., a relapse of a disease, never occurs for a certain percentage p of the population, called the cure rate. We suppose that these data are subject to random right censoring and we model the data using a mixture cure model, in which the survival function of the uncured subjects is left unspecified. The aim is to test whether the cure rate p, as a function of the covariates, satisfies a certain parametric model. To do so, we propose a test statistic that is inspired by a goodness-of-fit test for a regression function due to Härdle & Mammen (1993). We show that the statistic is asymptotically normally distributed under the null hypothesis that the model is correctly specified, and under local alternatives. A bootstrap procedure is proposed to implement the test. The good performance of the approach is confirmed with simulations. For illustration we apply the test to data on the times between first and second births.
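In the usual mixture cure notation (a sketch; the paper leaves the survival function of the uncured unspecified), the population survival function is
\[ S(t \mid x) = p(x) + \bigl(1 - p(x)\bigr) S_u(t \mid x), \]
with cure rate $p(x)$ and survival function $S_u$ of the uncured subjects; the null hypothesis specifies a parametric form such as the logistic $p(x;\gamma) = \{1 + \exp(-\gamma^\top x)\}^{-1}$.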
Article
We introduce LASSO‐type regularization for large‐dimensional realized covariance estimators of log‐prices. The procedure consists of shrinking the off‐diagonal entries of the inverse realized covariance matrix towards zero. This technique produces covariance estimators that are positive definite and with a sparse inverse. We name the estimator realized network, since estimating a sparse inverse realized covariance matrix is equivalent to detecting the partial correlation network structure of the daily log‐prices. The large sample consistency and selection properties of the estimator are established. An application to a panel of US blue chip stocks shows the advantages of the estimator for out‐of‐sample GMV asset allocation.
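The shrinkage step is, in essence, a graphical lasso applied to realized (co)variation. A minimal sketch in Python, with scikit-learn's GraphicalLasso standing in for the paper's realized-network estimator (the simulated data, penalty level and threshold are illustrative assumptions):

    import numpy as np
    from sklearn.covariance import GraphicalLasso

    rng = np.random.default_rng(0)
    # placeholder panel of intraday returns: 390 one-minute intervals x 10 assets
    returns = rng.standard_normal((390, 10))

    # l1-penalized estimation of the inverse covariance matrix; a larger alpha
    # shrinks more off-diagonal entries of the precision matrix to exactly zero
    model = GraphicalLasso(alpha=0.1).fit(returns)
    precision = model.precision_  # sparse inverse covariance estimate

    # nonzero off-diagonal entries of the precision matrix define the
    # partial-correlation network between the assets
    adjacency = (np.abs(precision) > 1e-10) & ~np.eye(10, dtype=bool)
    print(adjacency.sum() // 2, "edges in the estimated network")

The estimate is positive definite by construction, mirroring the positive definiteness property emphasized in the abstract.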
Article
Many modern network datasets arise from processes of interactions in a population, such as phone calls, email exchanges, co-authorships, and professional collaborations. In such interaction networks, the edges comprise the fundamental statistical units, making a framework for edge-labeled networks more appropriate for statistical analysis. In this context we initiate the study of edge exchangeable network models and explore their basic statistical properties. Several theoretical and practical features make edge exchangeable models better suited to many applications in network analysis than more common vertex-centric approaches. In particular, edge exchangeable models allow for sparse structure and power law degree distributions, both of which are widely observed empirical properties that cannot be handled naturally by more conventional approaches. Our discussion culminates in the Hollywood model, which we identify here as the canonical family of edge exchangeable distributions. The Hollywood model is computationally tractable, admits a clear interpretation, exhibits good theoretical properties, and performs reasonably well in estimation and prediction as we demonstrate on real network datasets. As a generalization of the Hollywood model, we further identify the vertex components model as a nonparametric subclass of models with a convenient stick breaking construction.
Article
A flexible approach for modeling both dynamic event counting and dynamic link-based networks based on counting processes is proposed, and estimation in these models is studied. We consider nonparametric likelihood based estimation of parameter functions via kernel smoothing. The asymptotic behavior of these estimators is rigorously analyzed by allowing the number of nodes to tend to infinity. The finite sample performance of the estimators is illustrated through an empirical analysis of bike share data.
Book
Presenting tools to aid understanding of asymptotic theory and weakly dependent processes, this book is devoted to inequalities and limit theorems for sequences of random variables that are strongly mixing in the sense of Rosenblatt, or absolutely regular. The first chapter introduces covariance inequalities under strong mixing or absolute regularity. These covariance inequalities are applied in Chapters 2, 3 and 4 to moment inequalities, rates of convergence in the strong law, and central limit theorems. Chapter 5 concerns coupling. In Chapter 6 new deviation inequalities and new moment inequalities for partial sums via the coupling lemmas of Chapter 5 are derived and applied to the bounded law of the iterated logarithm. Chapters 7 and 8 deal with the theory of empirical processes under weak dependence. Lastly, Chapter 9 describes links between ergodicity, return times and rates of mixing in the case of irreducible Markov chains. Each chapter ends with a set of exercises. The book is an updated and extended translation of the French edition entitled "Théorie asymptotique des processus aléatoires faiblement dépendants" (Springer, 2000). It will be useful for students and researchers in mathematical statistics, econometrics, probability theory and dynamical systems who are interested in weakly dependent processes.
Article
We propose various self-exciting point process models for the times when e-mails are sent between individuals in a social network. Using an EM-type approach, we fit these models to an e-mail network dataset from West Point Military Academy and the Enron e-mail dataset. We argue that the self-exciting models adequately capture major temporal clustering features in the data and perform better than traditional stationary Poisson models. We also investigate how accounting for diurnal and weekly trends in e-mail activity improves the overall fit to the observed network data. A motivation and application for fitting these self-exciting models is to use parameter estimates to characterize important e-mail communication behaviors such as the baseline sending rates, average reply rates, and average response times. A primary goal is to use these features, estimated from the self-exciting models, to infer the underlying leadership status of users in the West Point and Enron networks.
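The self-exciting (Hawkes) intensity underlying such models has the generic form (a sketch; the paper additionally models diurnal and weekly variation in the baseline)
\[ \lambda(t) = \mu + \sum_{t_i < t} g(t - t_i), \qquad \text{e.g. } g(s) = \alpha \beta e^{-\beta s}, \]
so each e-mail at time $t_i$ temporarily raises the rate of further e-mails; here $\mu$ is the baseline sending rate, $\alpha$ the expected number of triggered replies and $1/\beta$ the typical response-time scale.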
Article
Incomplete failure data consisting of times to failure on failed units and differing running times on unfailed units are called multiply censored. Data on units operating in the field, for example, are usually multiply censored. Presented in this paper is a method of plotting multiply censored data on hazard paper to obtain engineering information on the distribution of time to failure. Step-by-step instructions on how to plot and interpret data on hazard paper are given with the aid of examples based on real and simulated data. Hazard paper is presented here for the exponential, Weibull, normal, log normal, and extreme value distributions. The theory underlying hazard paper and plotting is presented in an appendix.
Chapter
This chapter is divided into three main parts. We first present results for general Markov processes; some consequences are provided in § 2.4.0.1 for a class of nonlinear processes, dynamical systems are explored in § 2.4.0.2, and a class of nonhomogeneous processes is considered in § 2.4.0.3. The main consequences are provided in two sections devoted to polynomial processes (§ 2.4.1) and explicit examples of nonlinear processes (§ 2.4.2).
Article
Dependent phenomena, such as relational, spatial and temporal phenomena, tend to be characterized by local dependence in the sense that units which are close in a well-defined sense are dependent. In contrast with spatial and temporal phenomena, though, relational phenomena tend to lack a natural neighbourhood structure in the sense that it is unknown which units are close and thus dependent. Owing to the challenge of characterizing local dependence and constructing random graph models with local dependence, many conventional exponential family random graph models induce strong dependence and are not amenable to statistical inference. We take first steps to characterize local dependence in random graph models, inspired by the notion of finite neighbourhoods in spatial statistics and M-dependence in time series, and we show that local dependence endows random graph models with desirable properties which make them amenable to statistical inference. We show that random graph models with local dependence satisfy a natural domain consistency condition which every model should satisfy, but conventional exponential family random graph models do not satisfy. In addition, we establish a central limit theorem for random graph models with local dependence, which suggests that random graph models with local dependence are amenable to statistical inference. We discuss how random graph models with local dependence can be constructed by exploiting either observed or unobserved neighbourhood structure. In the absence of observed neighbourhood structure, we take a Bayesian view and express the uncertainty about the neighbourhood structure by specifying a prior on a set of suitable neighbourhood structures. We present simulation results and applications to two real world networks with ‘ground truth’.
Article
This is a concise text developed from lecture notes and ready to be used for a course on the graduate level. The main idea is to introduce the fundamental concepts of the theory while maintaining the exposition suitable for a first approach in the field. Therefore, the results are not always given in the most general form but rather under assumptions that lead to shorter or more elegant proofs. The book has three chapters. Chapter 1 presents basic nonparametric regression and density estimators and analyzes their properties. Chapter 2 is devoted to a detailed treatment of minimax lower bounds. Chapter 3 develops more advanced topics: Pinsker's theorem, oracle inequalities, Stein shrinkage, and sharp minimax adaptivity. This book will be useful for researchers and graduate students interested in theoretical aspects of smoothing techniques. Many important and useful results on optimal and adaptive estimation are provided. As one of the leading mathematical statisticians working in nonparametrics, the author is an authority on the subject.
Article
The natural habitat of most Bayesian methods is data represented by exchangeable sequences of observations, for which de Finetti's theorem provides the theoretical foundation. Dirichlet process clustering, Gaussian process regression, and many other parametric and nonparametric Bayesian models fall within the remit of this framework; many problems arising in modern data analysis do not. This expository paper provides an introduction to Bayesian models of graphs, matrices, and other data that can be modeled by random structures. We describe results in probability theory that generalize de Finetti's theorem to such data and discuss the relevance of these results to nonparametric Bayesian modeling. With the basic ideas in place, we survey example models available in the literature; applications of such models include collaborative filtering, link prediction, and graph and network analysis. We also highlight connections to recent developments in graph theory and probability, and sketch the more general mathematical foundation of Bayesian methods for other types of data beyond sequences and arrays.
Article
A stochastic model is proposed for social networks in which the actors in a network are partitioned into subgroups called blocks. The model provides a stochastic generalization of the blockmodel. Estimation techniques are developed for the special case of a single relation social network, with blocks specified a priori. An extension of the model allows for tendencies toward reciprocation of ties beyond those explained by the partition. The extended model provides a one degree-of-freedom test of the model. A numerical example from the social network literature is used to illustrate the methods.
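In modern notation, the core assumption is that tie probabilities depend on the actors only through their block memberships $z_i$ (a sketch of the simplest single-relation case, without the reciprocation extension):
\[ P(Y_{ij} = 1) = \theta_{z_i z_j}, \]
so a $k \times k$ matrix $\theta$ of within- and between-block tie probabilities parameterizes the whole network.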
Article
The analysis of censored failure times is considered. It is assumed that on each individual are available values of one or more explanatory variables. The hazard function (age‐specific failure rate) is taken to be a function of the explanatory variables and unknown regression coefficients multiplied by an arbitrary and unknown function of time. A conditional likelihood is obtained, leading to inferences about the unknown regression coefficients. Some generalizations are outlined.
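The model and the conditional likelihood it leads to are, in modern notation,
\[ \lambda(t \mid z) = \lambda_0(t) \exp(\beta^\top z), \qquad L(\beta) = \prod_i \frac{\exp\bigl(\beta^\top z_{(i)}\bigr)}{\sum_{j \in R(t_{(i)})} \exp\bigl(\beta^\top z_j\bigr)}, \]
where the product runs over the observed failure times $t_{(i)}$ and $R(t_{(i)})$ denotes the risk set just before $t_{(i)}$; the arbitrary baseline $\lambda_0$ cancels, which is what permits inference about $\beta$ without specifying it.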
Article
Social behavior over short time scales is frequently understood in terms of actions, which can be thought of as discrete events in which one individual emits a behavior directed at one or more other entities in his or her environment (possibly including himself or herself). Here, we introduce a highly flexible framework for modeling actions within social settings, which permits likelihood-based inference for behavioral mechanisms with complex dependence. Examples are given for the parameterization of base activity levels, recency, persistence, preferential attachment, transitive/cyclic interaction, and participation shifts within the relational event framework. Parameter estimation is discussed both for data in which an exact history of events is available, and for data in which only event sequences are known. The utility of the framework is illustrated via an application to dynamic modeling of responder radio communications during the early hours of the World Trade Center disaster.
Book
Networks have permeated everyday life through everyday realities like the Internet, social networks, and viral marketing. As such, network analysis is an important growth area in the quantitative sciences, with roots in social network analysis going back to the 1930s and graph theory going back centuries. Measurement and analysis are integral components of network research. As a result, statistical methods play a critical role in network analysis. This book is the first of its kind in network research. It can be used as a stand-alone resource in which multiple R packages are used to illustrate how to conduct a wide range of network analyses, from basic manipulation and visualization, to summary and characterization, to modeling of network data. The central package is igraph, which provides extensive capabilities for studying network graphs in R. This text builds on Eric D. Kolaczyk's book Statistical Analysis of Network Data (Springer, 2009).
Chapter
Suppose dissimilarity data have been collected on a set of n objects or individuals, where there is a value of dissimilarity measured for each pair. The dissimilarity measure used might be a subjective judgement made by a judge, where for example a teacher subjectively scores the strength of friendship between pairs of pupils in her class; or, as an alternative and more objective measure, she might count the number of contacts made in a day between each pair of pupils. In other situations the dissimilarity measure might be based on a data matrix. The general aim of multidimensional scaling is to find a configuration of points in a space, usually Euclidean, where each point represents one of the objects or individuals, and the distances between pairs of points in the configuration match as well as possible the original dissimilarities between the pairs of objects or individuals. Such configurations can be found using metric and non-metric scaling, which are covered in Sects. 2 and 3. A number of other techniques are covered by the umbrella title of multidimensional scaling (MDS), and here the techniques of Procrustes analysis, unidimensional scaling, individual differences scaling, correspondence analysis and reciprocal averaging are briefly introduced and illustrated with pertinent data sets.
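The 'match as well as possible' criterion is typically a least-squares loss; a common choice (illustrative of the metric case, with non-metric variants replacing the dissimilarities by monotone transformations) is Kruskal's stress,
\[ \mathrm{Stress} = \left( \sum_{i<j} (d_{ij} - \delta_{ij})^2 \Big/ \sum_{i<j} d_{ij}^2 \right)^{1/2}, \]
where $\delta_{ij}$ are the observed dissimilarities and $d_{ij}$ the distances in the fitted configuration.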
Article
This paper investigates the problem of density estimation for absolutely regular observations. In a first part, we state two important results: a new variance inequality and a Rosenthal-type inequality. This allows us to study the $L^p$-integrated risk, $p \ge 2$, of a large class of density estimators including kernel or projection estimators. Under the summability condition on the mixing coefficients $\sum_{k \ge 0} (k+1)^{p-2} \beta_k < \infty$, the rates obtained are those known to be optimal in the independent setting.
Chapter
Networks of relationships help determine the careers that people choose, the jobs they obtain, the products they buy, and how they vote. The many aspects of our lives that are governed by social networks make it critical to understand how they impact behavior, which network structures are likely to emerge in a society, and why we organize ourselves as we do. In Social and Economic Networks, Matthew Jackson offers a comprehensive introduction to social and economic networks, drawing on the latest findings in economics, sociology, computer science, physics, and mathematics. He provides empirical background on networks and the regularities that they exhibit, and discusses random graph-based models and strategic models of network formation. He helps readers to understand behavior in networked societies, with a detailed analysis of learning and diffusion in networks, decision making by individuals who are influenced by their social neighbors, game theory and markets on networks, and a host of related subjects. Jackson also describes the varied statistical and modeling techniques used to analyze social networks. Each chapter includes exercises to aid students in their analysis of how networks function.
Article
Network data often take the form of repeated interactions between senders and receivers tabulated over time. A primary question to ask of such data is which traits and behaviors are predictive of interaction. To answer this question, a model is introduced for treating directed interactions as a multivariate point process: a Cox multiplicative intensity model using covariates that depend on the history of the process. Consistency and asymptotic normality are proved for the resulting partial-likelihood-based estimators under suitable regularity conditions, and an efficient fitting procedure is described. Multicast interactions--those involving a single sender but multiple receivers--are treated explicitly. The resulting inferential framework is then employed to model message sending behavior in a corporate e-mail network. The analysis gives a precise quantification of which static shared traits and dynamic network effects are predictive of message recipient selection.
Article
The kernel function method developed during the last twenty-five years to estimate a probability density function essentially is a way of smoothing the empirical distribution function. This paper shows how one can generalize this method to estimate counting process intensities using kernel functions to smooth the nonparametric Nelson estimator for the cumulative intensity. The properties of the estimator for the intensity itself are investigated, and uniform consistency and asymptotic normality are proved. We also give an illustrative numerical example.
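Concretely, with $\hat A(t)$ denoting the Nelson estimator of the cumulative intensity, the smoothed estimator of the intensity itself takes the form
\[ \hat\alpha(t) = \frac{1}{b} \int K\Bigl(\frac{t - s}{b}\Bigr)\, d\hat A(s), \]
with kernel $K$ and bandwidth $b$, in direct analogy with kernel smoothing of the empirical distribution function in density estimation.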
Article
Let $\mathbf{N} = (N_1, \cdots, N_k)$ be a multivariate counting process and let $\mathscr{F}_t$ be the collection of all events observed on the time interval $[0, t]$. The intensity process is given by $\Lambda_i(t) = \lim_{h \downarrow 0} \frac{1}{h} E(N_i(t + h) - N_i(t) \mid \mathscr{F}_t)$, $i = 1, \cdots, k$. We give an application of the recently developed martingale-based approach to the study of $\mathbf{N}$ via $\mathbf{\Lambda}$. A statistical model is defined by letting $\Lambda_i(t) = \alpha_i(t) Y_i(t)$, $i = 1, \cdots, k$, where $\mathbf{\alpha} = (\alpha_1, \cdots, \alpha_k)$ is an unknown nonnegative function while $\mathbf{Y} = (Y_1, \cdots, Y_k)$, together with $\mathbf{N}$, is a process observable over a certain time interval. Special cases are time-continuous Markov chains on finite state spaces, birth and death processes and models for survival analysis with censored data. The model is termed nonparametric when $\mathbf{\alpha}$ is allowed to vary arbitrarily except for regularity conditions. The existence of complete and sufficient statistics for this model is studied. An empirical process estimating $\beta_i(t) = \int_0^t \alpha_i(s)\, ds$ is given and studied by means of the theory of stochastic integrals. This empirical process is intended for plotting purposes; it generalizes the empirical cumulative hazard rate from survival analysis and is related to the product limit estimator. Consistency and weak convergence results are given. Tests for comparison of two counting processes, generalizing the two sample rank tests, are defined and studied. Finally, an application to a set of biological data is given.
Article
We obtain an exponential probability inequality for martingales and a uniform probability inequality for the process $\int g\, dN$, where $N$ is a counting process and where $g$ varies within a class $\mathscr{G}$ of predictable functions. For the latter, we use techniques from empirical process theory. The uniform inequality is shown to hold under certain entropy conditions on $\mathscr{G}$. As an application, we consider rates of convergence for (nonparametric) maximum likelihood estimators for counting processes. A similar result for discrete time observations is also presented.
Article
The objective of the present article is to propose and evaluate a probabilistic approach based on Bayesian networks for modelling non-homogeneous and non-linear gene regulatory processes. The method is based on a mixture model, using latent variables to assign individual measurements to different classes. The practical inference follows the Bayesian paradigm and samples the network structure, the number of classes and the assignment of latent variables from the posterior distribution with Markov Chain Monte Carlo (MCMC), using the recently proposed allocation sampler as an alternative to RJMCMC. We have evaluated the method using three criteria: network reconstruction, statistical significance and biological plausibility. In terms of network reconstruction, we found improved results both for a synthetic network of known structure and for a small real regulatory network derived from the literature. We have assessed the statistical significance of the improvement on gene expression time series for two different systems (viral challenge of macrophages, and circadian rhythms in plants), where the proposed new scheme tends to outperform the classical BGe score. Regarding biological plausibility, we found that the inference results obtained with the proposed method were in excellent agreement with biological findings, predicting dichotomies that one would expect to find in the studied systems. Two supplementary papers on theoretical (T) and experimental (E) aspects and the datasets used in our study are available from http://www.bioss.ac.uk/associates/marco/supplement/
Article
Very often in survival analysis one has to study martingale integrals where the integrand is not predictable and where the counting process theory of martingales is not directly applicable, as for example in nonparametric and semiparametric applications where the integrand is based on a pilot estimate. We call this the predictability issue in survival analysis. The problem has been resolved by approximations of the integrand by predictable functions which have been justified by ad hoc procedures. We present a general approach to the solution of this problem. The usefulness of the approach is shown in three applications. In particular, we argue that earlier ad hoc procedures do not work in higher-dimensional smoothing problems in survival analysis.
Article
Cox's proportional hazards model is routinely used in many applied fields, sometimes, however, with too little emphasis on the fit of the model. In this paper, we suggest some new tests for investigating whether or not covariate effects vary with time. These tests are a natural and integrated part of an extended version of the Cox model. An important new feature of the suggested test is that time constancy for a specific covariate is examined in a model where some effects of other covariates are allowed to vary with time and some are constant, thus making successive testing of time-dependency possible. The proposed techniques are illustrated with the well-known Mayo liver disease data, and a small simulation study investigates the finite sample properties of the tests.
Learning the structure of dynamic probabilistic networks
  • N Friedman
  • K Murphy
  • S Russell
N. Friedman, K. Murphy, and S. Russell. Learning the structure of dynamic probabilistic networks. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, UAI'98, pages 139-147, San Francisco, CA, USA, 1998. Morgan Kaufmann Publishers Inc.