
Brenda Betancourt - University of Florida
26 Publications
1,044 Reads
189 Citations
Publications (26)
Recent advances in Bayesian models for random partitions have led to the formulation and exploration of Exchangeable Sequences of Clusters (ESC) models. Under ESC models, it is the cluster sizes that are exchangeable, rather than the observations themselves. This property is particularly useful for obtaining microclustering behavior, whereby cluste...
Network data arises naturally in a wide variety of applications in different fields. In this article we discuss in detail the statistical modeling of financial networks. The structure of such networks has not been studied thoroughly in the past, mainly due to limited accessible data. We explore the structure of a real trading network correspond...
Record linkage is the task of combining records from multiple files which refer to overlapping sets of entities when there is no unique identifying field. In streaming record linkage, files arrive sequentially in time and estimates of links are updated after the arrival of each file. This problem arises in settings such as longitudinal surveys, ele...
Entity resolution (ER), comprising record linkage and deduplication, is the process of merging noisy databases in the absence of unique identifiers to remove duplicate entities. One major challenge of analysis with linked data is identifying a representative record among determined matches to pass to an inferential or predictive task, referred to a...
In database management, record linkage aims to identify multiple records that correspond to the same individual. Record linkage can be treated as a clustering problem in which one or more noisy database records are associated with a unique latent entity. In contrast to traditional clustering applications, a large number of clusters with a few obser...
A Bayesian statistical model to simultaneously characterize two or more social networks defined over a common set of actors is proposed. The key feature of the model is a hierarchical prior distribution that allows the user to represent the entire system jointly, achieving a compromise between dependent and independent networks. Among other things...
In this work, we propose a Bayesian statistical model to simultaneously characterize two or more social networks defined over a common set of actors. The key feature of the model is a hierarchical prior distribution that allows us to represent the entire system jointly, achieving a compromise between dependent and independent networks. Among others...
Traditional Bayesian random partition models assume that the size of each cluster grows linearly with the number of data points. While this is appealing for some applications, this assumption is not appropriate for other tasks such as entity resolution, modeling of sparse networks, and DNA sequencing tasks. Such applications require models that yie...
In database management, record linkage aims to identify multiple records that correspond to the same individual. This task can be treated as a clustering problem, in which a latent entity is associated with one or more noisy database records. However, in contrast to traditional clustering applications, a large number of clusters with a few observat...
Over recent years there has been a growing interest in using financial trading networks to understand the microstructure of financial markets. Most of the methodologies that have been developed so far for this have been based on the study of descriptive summaries of the networks such as the average node degree and the clustering coefficient. In con...
Appointment no-shows have a negative impact on patient health and have caused substantial loss in resources and revenue for health care systems. Intervention strategies to reduce no-show rates can be more effective if targeted to the subpopulations of patients at higher risk of missing their appointments. We use electronic health records (...
Record linkage (entity resolution or deduplication) is the process of merging noisy databases to remove duplicate entities. While record linkage removes duplicate entities from the data, many researchers are interested in performing inference, prediction or post-linkage analysis on the linked data, which we call the downstream task. Depending on...
In decision theory, we start with the end in mind – how does one use inference results to make decisions and what consequences do these decisions have? Such statistical decision problems are becoming increasingly important as data are readily available at our fingertips. We outline the fundamental ideas of loss for decision theoretic tasks, Bayesia...
Over the last few years there has been a growing interest in using financial trading networks to understand the microstructure of financial markets. Most of the methodologies developed so far for this purpose have been based on the study of descriptive summaries of the networks such as the average node degree and the clustering coefficient. In cont...
We propose a multinomial logistic regression model for link prediction in a time series of directed binary networks. To account for the dynamic nature of the data we employ a dynamic model for the model parameters that is strongly connected with the fused lasso penalty. In addition to promoting sparseness, this prior allows us to explore the presen...
We develop a sparse autologistic model for investigating the impact of diversification and disintermediation strategies in the evolution of financial trading networks. In order to induce sparsity in the model estimates and address substantive questions about the underlying processes, the model includes an $L^1$ regularization penalty. This makes imp...
We develop a sparse autologistic model for investigating the impact of diversification and disintermediation strategies in the evolution of financial trading networks. In order to induce sparsity in the model estimates and address substantive questions about the underlying processes the model includes an L1 regularization penalty. This makes implem...
Most generative models for clustering implicitly assume that the number of data points in each cluster grows linearly with the total number of data points. Finite mixture models, Dirichlet process mixture models, and Pitman--Yor process mixture models make this assumption, as do all other infinitely exchangeable clustering models. However, for some...
We explore the Cauchy prior and a new heavy-tailed prior (Fuquene, Perez and Pericchi (2011)) to estimate proportions in small areas. Hierarchical models and the Binomial likelihood in exponential family form are used. We believe that heavy-tailed priors in survey sampling settings could be more effective than the choice of noninformative priors...