Carter T. Butts's research while affiliated with University of California and other places

Publications (218)

Preprint
Despite the popular narrative that the United States is a "land of mobility," its internal migration rates have declined for decades, and reached a historical low. Economic and related factors were able to account for a portion of this trend, but the bulk has remained unexplained. Here, we propose a systemic, relational model of internal migration...
Preprint
When subjected to a sudden, unanticipated threat, human groups characteristically self-organize to identify the threat, determine potential responses, and act to reduce its impact. Central to this process is the challenge of coordinating information sharing and response activity within a disrupted environment. In this paper, we consider coordinatio...
Article
The uneven spread of COVID-19 has resulted in disparate experiences for marginalized populations in urban centers. Using computational models, we examine the effects of local cohesion on COVID-19 spread in social contact networks for the city of San Francisco, finding that more early COVID-19 infections occur in areas with strong local cohesion. Th...
Preprint
Graph processes that unfold in continuous time are of obvious theoretical and practical interest. Particularly useful are those whose long-term behavior converges to a graph distribution of known form. Here, we review some of the conditions for such convergence, and provide examples of novel and/or known processes that do so. These include subfamil...
Preprint
Signal maps are essential for the planning and operation of cellular networks. However, the measurements needed to create such maps are expensive and often biased, do not always reflect the metrics of interest, and pose privacy risks. In this paper, we develop a unified framework for predicting cellular signal maps from limited measurements. We prop...
Article
Using social network data from the American Social Fabric Project (ASFP), this study examines how the distance to social alters may lead to different perceptions of neighborhood and city attachment among urban versus rural residents, and considers which types of relations play influential roles in shaping attachment. Overall, a key finding is that...
Article
Coarse-graining is a powerful tool for extending the reach of dynamic models of proteins and other biological macromolecules. Topological coarse-graining, in which biomolecules or sets thereof are represented via graph structures, is a particularly useful way of obtaining highly compressed representations of molecular structures, and simulations op...
Preprint
The exponential-family random graph models (ERGMs) have emerged as an important framework for modeling social and other networks. ERGMs for valued networks are less well-studied than their unvalued counterparts, and pose particular computational challenges. Networks with edge values on the non-negative integers (count-valued networks) are an import...
Article
The hydroxyl radical is the primary reactive oxygen species produced by the radiolysis of water, and is a significant source of radiation damage to living organisms. Mobility of the hydroxyl radical at low temperatures and/or high pressures is hence a potentially important factor in determining the challenges facing psychrophilic and/or barophilic...
Preprint
The exponential family random graph modeling (ERGM) framework provides a flexible approach for the statistical analysis of networks. As ERGMs typically involve normalizing factors that are costly to compute, practical inference relies on a variety of approximations or other workarounds. Markov Chain Monte Carlo maximum likelihood (MCMC MLE) provide...
Article
Static light scattering is a popular physical chemistry technique that enables calculation of physical attributes such as the radius of gyration and the second virial coefficient for a macromolecule (e.g., a polymer or a protein) in solution. The second virial coefficient is a physical quantity that characterizes the magnitude and sign of pairwise...
Preprint
Coarse-graining is a powerful tool for extending the reach of dynamic models of proteins and other biological macromolecules. Topological coarse-graining, in which biomolecules or sets thereof are represented via graph structures, is a particularly useful way of obtaining highly compressed representations of molecular structure, and simulations ope...
Preprint
The hydroxyl radical is the primary reactive oxygen species produced by the radiolysis of water, and is a significant source of radiation damage to living organisms. Mobility of the hydroxyl radical at low temperatures and/or high pressures is hence a potentially important factor in determining the challenges facing psychrophilic and/or barophilic...
Article
Adolescent drinking remains a prominent public health and socioeconomic issue in the USA with costly consequences. While numerous drinking intervention programs have been developed, there is little guidance on whether certain strategies of participant recruitment are more effective than others. The current study aims at addressing this gap in the lite...
Article
This paper presents the design and study of a first-in-class cyclic peptide inhibitor against the SARS-CoV-2 main protease (Mpro). The cyclic peptide inhibitor is designed to mimic the conformation of a substrate at a C-terminal autolytic cleavage site of Mpro. The cyclic peptide contains a [4-(2-aminoethyl)phenyl]-acetic acid (AEPA) linker that is...
Article
Despite decades of research on adolescent friendships, little is known about adolescents who are more likely to form ties outside of school. We examine multiple social and ecological contexts including parents, the school, social networks, and the neighborhood to understand the origins and health significance of out of school ties using survey data...
Article
In this paper, we investigate how message construction, style, content, and the textual content of embedded images impacted message retransmission over the course of the first 8 months of the coronavirus disease 2019 (COVID-19) pandemic in the United States. We analyzed a census of public communications (n = 372,466) from 704 public health agencies...
Article
Communication network connectivity is central to organizational performance, but maintaining connectivity can be difficult during periods of disruption. During the World Trade Center (WTC) disaster of September 11th, 2001, both emergency response-specialized organizations and organizations without such specialization forcibly adapted to a radically...
Article
Public health threats require effective communication. Evaluating effectiveness during a situation that requires emergency risk communication is difficult, however, because these events require an immediate response and collecting data may be secondary to more immediate needs. In this article, we draw on research analyzing the effectiveness of soci...
Preprint
Exponential-family Random Graph Models (ERGMs) constitute a large statistical framework for modeling sparse and dense random graphs, short- and long-tailed degree distributions, covariates, and a wide range of complex dependencies. Special cases of ERGMs are generalized linear models (GLMs), Bernoulli random graphs, β-models, p1-models, and models...
Article
Exponential-family Random Graph Models (ERGMs) constitute a large statistical framework for modeling dense and sparse random graphs with short- or long-tailed degree distributions, covariate effects and a wide range of complex dependencies. Special cases of ERGMs include network equivalents of generalized linear models (GLMs), Bernoulli random graph...
Preprint
Static light scattering is a popular physical chemistry technique that enables calculation of physical attributes such as the radius of gyration and the second virial coefficient for a macromolecule (e.g., a polymer or a protein) in solution. The second virial coefficient is a physical quantity that characterizes the magnitude and sign of pairwise...
Article
In this paper, we capture, identify, and describe the patterns of longitudinal risk communication from public health communicating agencies on Twitter during the first 60 days of the response to the novel coronavirus disease 2019 (COVID-19) pandemic. We collected 138,546 tweets from 696 targeted accounts from February 1 to March 31, 2020, employing...
Article
A rich literature has explored the modeling of homophily and other forms of nonuniform mixing associated with individual-level covariates within the exponential family random graph (ERGM) framework. Such differential mixing does not fully explain phenomena such as stigma, however, which involve the active maintenance of social boundaries by ostraci...
Article
Amyloid fibril formation is central to the etiology of a wide range of serious human diseases, such as Alzheimer's disease and prion diseases. Despite an ever growing collection of amyloid fibril structures found in the Protein Data Bank (PDB) and numerous clinical trials, therapeutic strategies remain elusive. One contributing factor to the lack o...
Article
As the most visible face of health expertise to the general public, health agencies have played a central role in alerting the public to the emerging COVID-19 threat, providing guidance for protective action, motivating compliance with health directives, and combating misinformation. Social media platforms such as Twitter have been a critical tool...
Article
The SARS-CoV-2 main protease (Mpro) is essential to viral replication and cleaves highly specific substrate sequences, making it an obvious target for inhibitor design. However, as for any virus, SARS-CoV-2 is subject to constant neutral drift and selection pressure, with new Mpro mutations arising over time. Identification and structural charac...
Article
Standard epidemiological models for COVID-19 employ variants of compartment (SIR or susceptible-infectious-recovered) models at local scales, implicitly assuming spatially uniform local mixing. Here, we examine the effect of employing more geographically detailed diffusion models based on known spatial features of interpersonal networks, most parti...
Article
The validity of survey-based reports of social relationships is a critical assumption for much social network research. Research on informant accuracy has shown that observational data and recalled behavior by informants are imperfectly correlated, which calls into question whether complex relations like friendship and advice-seeking can be accurat...
Article
Microblogging sites have become important data sources for studying network dynamics and information transmission. Both areas of study, however, require accurate counts of indegree, or follower counts; unfortunately, collection of complete time series on follower counts can be limited by application programming interface constraints, system failure...
Article
Many social and other networks exhibit stable size scaling relationships, such that features such as mean degree or reciprocation rates change slowly or are approximately constant as the number of vertices increases. Statistical network models built on top of simple Bernoulli baseline (or reference) measures often behave unrealistically in this res...
Article
The Droserasins, aspartic proteases from the carnivorous plant Drosera capensis, contain a 100-residue plant-specific insert (PSI) that is post-translationally cleaved and independently acts as an antimicrobial peptide. PSIs are of interest not only for their inhibition of microbial growth, but also because they modify the size of lipid vesicles an...
Preprint
Standard epidemiological models for COVID-19 employ variants of compartment (SIR) models at local scales, implicitly assuming spatially uniform local mixing. Here, we examine the effect of employing more geographically detailed diffusion models based on known spatial features of interpersonal networks, most particularly the presence of a long-taile...
Preprint
The SARS-CoV-2 main protease (Mpro) is essential to viral replication and cleaves highly specific substrate sequences, making it an obvious target for inhibitor design. However, as for any virus, SARS-CoV-2 is subject to constant selection pressure, with new Mpro mutations arising over time. Identification and structural characterization of M...
Preprint
Bayesian inference for exponential family random graph models (ERGMs) is a doubly-intractable problem because of the intractability of both the likelihood and posterior normalizing factor. The auxiliary variable based Markov Chain Monte Carlo (MCMC) method is asymptotically exact but computationally demanding, and is difficult to extend to modified ERG...
Article
Although it is well known that some exponential family random graph model (ERGM) families exhibit phase transitions (in which small parameter changes lead to qualitative changes in graph structure), the behavior of other models is still poorly understood. Recently, Krivitsky and Morris have reported a previously unobserved phase transition in the e...
Article
The recent popularity of models that capture the dynamic coevolution of both network structure and behavior has driven the need for summary indices to assess the adequacy of these models to reproduce dynamic properties of scientific or practical importance. Whereas there are several existing indices for assessing the ability of the model to reprodu...
Article
Retrospective life history designs are among the few practical approaches for collecting longitudinal network information from large populations, particularly in the context of relationships like sexual partnerships that cannot be measured via digital traces or documentary evidence. While all such designs afford the ability to “peer into the past”...
Preprint
Although it is well known that some exponential family random graph model (ERGM) families exhibit phase transitions (in which small parameter changes lead to qualitative changes in graph structure), the behavior of other models is still poorly understood. Recently, Krivitsky and Morris have reported a previously unobserved phase transition in the e...
Preprint
Many social and other networks exhibit stable size scaling relationships, such that features such as mean degree or reciprocation rates change slowly or are approximately constant as the number of vertices increases. Statistical network models built on top of simple Bernoulli baseline (or reference) measures often behave unrealistically in this res...
Article
The current study examines crisis communication on social media by observing how twelve National Weather Service (NWS) offices use Twitter to facilitate engagement with stakeholders during threat and nonthreat periods. Using content analytic methods, we examine message features related to content and structure during a 3‐month period in spring 2016...
Preprint
Ensembles of networks arise in many scientific fields, but currently there are few statistical models aimed at understanding their generative processes. To fill in this gap, we propose characterizing network ensembles via finite mixtures of exponential family random graph models, employing a Metropolis-within-Gibbs algorithm to conduct Bayesian inf...
Article
Community structure is an important property that captures inhomogeneities common in large networks, and modularity is one of the most widely used metrics for such community structure. In this paper, we introduce a principled methodology, the Spectral Graph Forge, for generating random graphs that preserve community structure from a real network o...
Preprint
Exponential family Random Graph Models (ERGMs) can be viewed as expressing a probability distribution on graphs arising from the action of competing social forces that make ties more or less likely, depending on the state of the rest of the graph. Such forces often lead to a complex pattern of dependence among edges, with non-trivial large-scale st...
Preprint
Statistical models for networks with complex dependencies pose particular challenges for model selection and evaluation. In particular, many well-established statistical tools for selecting between models assume conditional independence of observations and/or conventional asymptotics, and their theoretical foundations are not always applicable in a...
Article
The mechanisms leading to aggregation of the crystallin proteins of the eye lens remain largely unknown. We use atomistic multiscale molecular simulations to model the solution-state conformational dynamics of γD-crystallin and its cataract-related W42R variant at both infinite dilution and at physiologically relevant concentrations. We find that t...
Preprint
A rich literature has explored the modeling of homophily and other forms of nonuniform mixing associated with individual-level covariates within the exponential family random graph (ERGM) framework. Such differential mixing does not fully explain phenomena such as stigma, however, which involve the active maintenance of social boundaries by ostraci...
Article
Networked social media provide governmental organizations, such as the National Weather Service (NWS), the opportunity to communicate directly with stakeholders over long periods of time as a form of online engagement. Typologies of engagement include aspects of message content that provide information, contribute to community building, and inspire...
Article
Simulations of intrinsically disordered proteins (IDPs) pose numerous challenges to comparative analysis, prominently including highly dynamic conformational states and a lack of well-defined secondary structure. Machine learning (ML) algorithms are especially effective at discriminating among high-dimensional inputs whose differences are extremely...
Article
Amyloid fibrils are locally ordered protein aggregates that self-assemble under a variety of physiological and in vitro conditions. Their formation is of fundamental interest as a physical chemistry problem and plays a central role in Alzheimer’s disease, Type II diabetes, and other human diseases. As the number of known amyloid fibril structures h...
Conference Paper
Signal strength maps are of great importance to cellular providers for network planning and operation; however, they are expensive to obtain and possibly limited or inaccurate in some locations. In this paper, we develop a prediction framework based on random forests to improve signal strength maps from limited measurements. First, we propose a rand...
Article
A machine learning-based methodology for the prediction of chemical reaction products, along with automated elucidation of mechanistic details via phase space analysis of reactive trajectories, is introduced using low dimensional heuristic models and then applied to ab-initio computer simulations of the photodissociation of acetaldehyde, an importa...
Article
In this paper, we study the problem of generating synthetic graphs that resemble real-world graphs in terms of their degree correlations and potentially additional properties. We present an algorithmic framework that generates simple undirected graphs with the exact target joint degree matrix, which we refer to as 2K graphs, in linear time in the n...
Article
Social media platforms have the potential to facilitate the dissemination of cancer prevention and control messages following celebrity cancer diagnoses. However, cancer communicators have yet to systematically leverage these naturally occurring interventions on social media as these events are difficult to identify as they are unfolding and little...
Article
In plants, esterase/lipases perform transesterification reactions, playing an important role in the synthesis of useful molecules, such as those comprising the waxy coatings of leaf surfaces. Plant genomes and transcriptomes have provided a wealth of data about expression patterns and the circumstances under which these enzymes are upregulated, e.g...
Article
Different social processes give rise to network structures with distinctive properties. In this paper our goal is to identify the social processes that give rise to distinct network structures (specifically, subgroups). We examine particular structural meta-relations by identifying the properties of individuals associated with specific subgroups. C...
Article
Social media platforms like Twitter and Facebook provide risk communicators with the opportunity to quickly reach their constituents at the time of an emerging infectious disease. On these platforms, messages gain exposure through message passing (called “sharing” on Facebook and “retweeting” on Twitter). This raises the question of how to optimize...
Article
The concurrent or sequential usage of multiple substances during adolescence is a serious public health problem. Given the importance of understanding interdependence in substance use during adolescence, the purpose of this study is to examine the co-evolution of cigarette smoking, alcohol, and marijuana use within the ever-changing landscape of ad...
Data
Stochastic actor-based models of friendship networks and substance use of cigarettes, alcohol, and marijuana with friends' average smoking/drinking/marijuana use level effects. (PDF)
Data
The simulation results of smoking, drinking, and marijuana use levels in Jefferson High under various conditions. (PDF)
Article
Statistical methods for dynamic network analysis have advanced greatly in the past decade. This article extends current estimation methods for dynamic network logistic regression (DNR) models, a subfamily of the Temporal Exponential-family Random Graph Models, to network panel data which contain missing data in the edge and/or vertex sets. We begin...
Article
Data collection designs for social network studies frequently involve asking both parties to a potential relationship to report on the presence or absence of that relationship, resulting in two measurements per potential tie. When inferring the underlying network, is it better to estimate the tie as present only when both parties report it as prese...
Article
Exponential family random graph models (ERGMs) can be understood in terms of a set of structural biases that act on an underlying reference distribution. This distribution determines many aspects of the behavior and interpretation of the ERGM families incorporating it. One important innovation in this area has been the development of an ERGM refere...
Article
Multi-omic approaches promise to supply the power to detect genes underlying disease and fitness-related phenotypes. Optimal use of the resulting profusion of data requires detailed investigation of individual candidate genes, a challenging proposition. Here, we combine transcriptomic and genomic data with molecular modelling of candidate enzymes t...
Article
Community structure is an important property that captures inhomogeneities common in large networks, and modularity is one of the most widely used metrics for such community structure. In this paper, we introduce a principled methodology, the Spectral Graph Forge, for generating random graphs that preserve community structure from a real network o...
Article
Purpose: The aim of this project was to describe and evaluate the levels of lung cancer communication across the cancer prevention and control continuum for content posted to Twitter during a 10-day period (September 30 to October 9) in 2016. Methods: Descriptive and inferential statistics were used to identify relationships between tweet charac...
Article
Several recent implementations of algorithms for sampling reaction pathways employ a strategy for placing interfaces or milestones across the reaction coordinate manifold. Interfaces can be introduced such that the full feature space describing the dynamics of a macromolecule is divided into Voronoi (or other) cells, and the global kinetics of the...