Peter Dodds

University of Vermont | UVM · Department of Mathematics and Statistics

PhD

About

162
Publications
45,834
Reads
11,944
Citations
Additional affiliations
August 2006 - present
University of Vermont
Position
  • Professor (Full)
August 2003 - July 2006
Columbia University
Position
  • Researcher
August 2002 - August 2003
Columbia University
Position
  • PostDoc Position
Education
August 1994 - June 2000
Massachusetts Institute of Technology
Field of study
  • Mathematics

Publications (162)
Article
When building a global brand of any kind—a political actor, clothing style, or belief system— developing widespread awareness is a primary goal. Short of knowing any of the stories or products of a brand, being talked about in whatever fashion—raw fame—is, as Oscar Wilde would have it, better than not being talked about at all. Here, we measure, ex...
Article
Full-text available
A common task in computational text analyses is to quantify how two corpora differ according to a measurement like word frequency, sentiment, or information content. However, collapsing the texts’ rich stories into a single number is often conceptually perilous, and it is difficult to confidently interpret interesting or unexpected textual patterns...
Preprint
We define 'ousiometrics' to be the study of essential meaning in whatever context that meaningful signals are communicated, and 'telegnomics' as the study of remotely sensed knowledge. From work emerging through the middle of the 20th century, the essence of meaning has become generally accepted as being well captured by the three orthogonal dimens...
Preprint
Full-text available
Data scientists across disciplines are increasingly in need of exploratory analysis tools for data sets with a high volume of features. We expand upon graph mining approaches for exploratory analysis of high-dimensional data to introduce Sirius, a visualization package for researchers to explore feature relationships among mixed data types using mu...
Article
Full-text available
Human mortality is in part a function of multiple socioeconomic factors that differ both spatially and temporally. Adjusting for other covariates, the human lifespan is positively associated with household wealth. However, the extent to which mortality in a geographical region is a function of socioeconomic factors in both that region and its neigh...
Article
Full-text available
Since the shooting of Black teenager Michael Brown by White police officer Darren Wilson in Ferguson, Missouri, the protest hashtag #BlackLivesMatter has amplified critiques of extrajudicial killings of Black Americans. In response to #BlackLivesMatter, other Twitter users have adopted #AllLivesMatter, a counter-protest hashtag whose content argues...
Article
Full-text available
We developed computational models to predict the emergence of depression and Post-Traumatic Stress Disorder in Twitter users. Twitter data and details of depression history were collected from 204 individuals (105 depressed, 99 healthy). We extracted predictive features measuring affect, linguistic style, and context from participant tweets (N=279,...
Article
Full-text available
Herbert Simon's classic rich-gets-richer model is one of the simplest empirically supported mechanisms capable of generating heavy-tail size distributions for complex systems. Simon argued analytically that a population of flavored elements growing by either adding a novel element or randomly replicating an existing one would afford a distribution...
Article
Full-text available
The task of text segmentation may be undertaken at many levels in text analysis—paragraphs, sentences, words, or even letters. Here, we focus on a relatively fine scale of segmentation, hypothesizing it to be in accord with a stochastic model of language generation, as the smallest scale where independent units of meaning are produced. Our goals in...
Article
Solicited public opinion surveys reach a limited subpopulation of willing participants and are expensive to conduct, leading to poor time resolution and a restricted pool of expert-chosen survey topics. In this study, we demonstrate that unsolicited public opinion polling through sentiment analysis applied to Twitter correlates well with a range of...
Data
European Union E-cigarette Ban Political Debate (#EUecigBan). (Left) Word shift graph comparing tweets tagged #EUecigBan against 2013 English Organic User Tweets (untagged). (top-right) The automated and Organic tagged tweet distributions are plotted. A histogram displays the counts per language and user class. (bottom-right) Word clouds compare ra...
Data
Electronic Cigarette Table of Key Words. List of all key words used in the analysis. Flavors compiled from https://crazyvapors.com/e-liquid-flavor-list/. Keywords other than ‘General Twitter Scrape’ were applied to categorize automated account tweets. (PDF)
Data
Twitter IDs. List of all Twitter IDs appearing in the analysis. (TXT)
Article
Full-text available
Advances in computing power, natural language processing, and digitization of text now make it possible to study a culture's evolution through its texts using a "big data" lens. Our ability to communicate relies in part upon a shared emotional experience, with stories often following distinct emotional trajectories, forming patterns that are me...
Article
Full-text available
Instabilities and long term shifts in seasons, whether induced by natural drivers or human activities, pose great disruptive threats to ecological, agricultural, and social systems. Here, we propose, measure, and explore two fundamental markers of location-sensitive seasonal variations: the Summer and Winter Teletherms, the on-average annual dates o...
Article
Apples, porcupines, and the most obscure Bob Dylan song: is every topic a few clicks from Philosophy? Within Wikipedia, the surprising answer is yes: nearly all paths lead to Philosophy. Wikipedia is the largest, most meticulously indexed collection of human knowledge ever amassed. More than information about a topic, Wikipedia is a web of naturall...
Article
Full-text available
Identifying and communicating relationships between causes and effects is important for understanding our world, but is affected by language structure, cognitive and emotional biases, and the properties of the communication medium. Despite the increasing importance of social media, much remains unknown about causal statements made online. To study...
Data
Computational Details and Explicit Equations Used. (PDF)
Article
Full-text available
The emergence and global adoption of social media has rendered possible the real-time estimation of population-scale sentiment, bearing profound implications for our understanding of human behavior. Given the growing assortment of sentiment measuring instruments, comparisons between them are evidently required. Here, we perform detailed tests of 6...
Article
Full-text available
Although climate change and energy are intricately linked, their explicit connection is not always prominent in public discourse and the media. Disruptive extreme weather events, including hurricanes, focus public attention in new and different ways, offering a unique window of opportunity to analyze how a focusing event influences public opinion....
Article
Full-text available
A thermal convection loop is an annular chamber filled with water, heated on the bottom half and cooled on the top half. With sufficiently large forcing of heat, the direction of fluid flow in the loop oscillates chaotically, dynamics analogous to the Earth's weather. As is the case for state-of-the-art weather models, we only observe the statistics...
Article
Full-text available
The field of neuroimaging has truly become data rich, and novel analytical methods capable of gleaning meaningful information from large stores of imaging data are in high demand. Those methods that might also be applicable on the level of individual subjects, and thus potentially useful clinically, are of special interest. In the present study, we...
Article
Full-text available
Instabilities and long term shifts in seasons, whether induced by natural drivers or human activities, pose great disruptive threats to ecological, agricultural, and social systems. Here, we propose, quantify, and explore two fundamental markers of seasonal variations: the Summer and Winter Teletherms, the on-average annual dates of the hottest an...
Article
Full-text available
Background: Twitter has become the "wild-west" of marketing and promotional strategies for advertisement agencies. Electronic cigarettes have been heavily marketed across Twitter feeds, offering discounts, "kid-friendly" flavors, algorithmically generated false testimonials, and free samples. Methods: All electronic cigarette keyword related twe...
Article
Full-text available
We propose and develop a Lexicocalorimeter: an online, interactive instrument for measuring the "caloric content" of social media and other large-scale texts. We do so by constructing extensive yet improvable tables of food and activity related phrases, and respectively assigning them with sourced estimates of caloric intake and expenditure. We sho...
Article
Full-text available
Sports are spontaneous generators of stories. Through skill and chance, the script of each game is dynamically written in real time by players acting out possible trajectories allowed by a sport's rules. By properly characterizing a given sport's ecology of 'game stories', we are able to capture the sport's capacity for unfolding interesting narrat...
Article
Full-text available
We demonstrate that the concerns expressed by Garcia et al. are misplaced, due to (1) a misreading of our findings in [1]; (2) a widespread failure to examine and present words in support of asserted summary quantities based on word usage frequencies; and (3) a range of misconceptions about word usage frequency, word rank, and expert-constructed wo...
Article
Twitter, a popular social media outlet, has evolved into a vast source of linguistic data, rich with opinion, sentiment, and discussion. Due to the increasing popularity of Twitter, its perceived potential for exerting social influence has led to the rise of a diverse community of automatons, commonly referred to as bots. These inorganic and semi-o...
Article
Full-text available
The consequences of anthropogenic climate change are extensively debated through scientific papers, newspaper articles, and blogs. Newspaper articles may lack accuracy, while the severity of findings in scientific papers may be too opaque for the public to understand. Social media, however, is a forum where individuals of diverse backgrounds can sh...
Article
Full-text available
The Google Books corpus contains millions of books in a variety of languages. Due to its incredible volume and its free availability, it is a treasure trove for linguistic research. In a previous work, we found the unfiltered English data sets from both the 2009 and 2012 versions of the corpus are both heavily saturated with scientific literature,...
Article
Full-text available
In an effort to better understand meaning from natural language texts, we explore methods aimed at organizing lexical objects into contexts. A number of these methods for organization fall into a family defined by word ordering. Unlike demographic or spatial partitions of data, these collocation models are of special importance for their universal...
Article
Full-text available
Natural languages are full of rules and exceptions. One of the most famous quantitative rules is Zipf’s law, which states that the frequency of occurrence of a word is approximately inversely proportional to its rank. Though this “law” of ranks has been found to hold across disparate texts and forms of data, analyses of increasingly large corpora s...
Article
Full-text available
It is tempting to treat frequency trends from Google Books data sets as indicators for the true popularity of various words and phrases. Doing so allows us to draw novel conclusions about the evolution of public perception of a given topic, such as time and gender. However, sampling published works by availability and ease of digitization leads to...
Article
Full-text available
Profiting from the emergence of web-scale social data sets, numerous recent studies have systematically explored human mobility patterns over large populations and large time scales. Relatively little attention, however, has been paid to mobility and activity over smaller time-scales, such as a day. Here, we use Twitter to identify people's frequen...
Article
Full-text available
Attacks by drones (i.e., unmanned combat air vehicles) continue to generate heated political and ethical debates. Here we examine instead the quantitative nature of drone attacks, focusing on how their intensity and frequency compare to other forms of human conflict. In contrast to the power-law distribution found recently for insurgent and terror...
Article
Full-text available
With Zipf’s law being originally and most famously observed for word frequency, it is surprisingly limited in its applicability to human language, holding over no more than three to four orders of magnitude before hitting a clear break in scaling. Here, building on the simple observation that phrases of one or more words comprise the most coherent...
Article
Full-text available
Using human evaluation of 100,000 words spread across 24 corpora in 10 languages diverse in origin and culture, we present evidence of a deep imprint of human sociality in language, observing that (1) the words of natural human language possess a universal positivity bias; (2) the estimated emotional content of words is consistent between languages...
Article
Full-text available
Complex networks underlie an enormous variety of social, biological, physical, and virtual systems. A profound complication for the science of complex networks is that in most cases, observing all nodes and all network interactions is impossible. Yet previous work addressing the impacts of partial network data is surprisingly limited, focuses prima...
Article
Full-text available
Complex, dynamic networks underlie many systems, and understanding these networks is the concern of a great span of important scientific and engineering problems. Quantitative description is crucial for this understanding yet, due to a range of measurement problems, many real network datasets are incomplete. Here we explore how accidentally missing...
Article
Full-text available
The patterns of life exhibited by large populations have been described and modeled both as a basic science exercise and for a range of applied goals such as reducing automotive congestion, improving disaster response, and even predicting the location of individuals. However, these studies have had limited access to conversation content, rendering...
Article
We study binary state dynamics on a network where each node acts in response to the average state of its neighborhood. By allowing varying amounts of stochasticity in both the network and node responses, we find different outcomes in random and deterministic versions of the model. In the limit of a large, dense network, however, we show that these...
Article
Full-text available
Reflective of income and wealth distributions, philanthropic gifting appears to follow an approximate power-law size distribution as measured by the size of gifts received by individual institutions. We explore the ecology of gifting by analysing data sets of individual gifts for a diverse group of institutions dedicated to education, medicine, art...
Article
Full-text available
We conduct a detailed investigation of correlations between real-time expressions of individuals made across the United States and a wide range of emotional, geographic, demographic, and health characteristics. We do so by combining (1) a massive, geo-tagged data set comprising over 80 million words generated in 2011 on the social network service T...
Article
Many real world, complex phenomena have underlying structures of evolving networks where nodes and links are added and removed over time. A central scientific challenge is the description and explanation of network dynamics, with a key test being the prediction of short and long term changes. For the problem of short-term link prediction, existing...
Article
Full-text available
We study binary state dynamics on a network where each node acts in response to the average state of its neighborhood. Allowing varying amounts of stochasticity in both the network and node responses, we find different outcomes in random and deterministic versions of the model. In the limit of a large, dense network, however, we show that these dyn...
Article
Full-text available
The metabolic theory of ecology (MTE) predicts the effects of body size and temperature on metabolism through considerations of vascular distribution networks and biochemical kinetics. MTE has also been extended to characterise processes from cellular to global levels. MTE has generated both enthusiasm and controversy across a broad range of resear...
Article
Full-text available
We study a family of binary state, socially-inspired contagion models which incorporate imitation limited by an aversion to complete conformity. We uncover rich behavior in our models whether operating with either probabilistic or deterministic individual response functions on both dynamic and fixed random networks. In particular, we find significa...
Article
Full-text available
Over the last million years, human language has emerged and evolved as a fundamental instrument of social communication and semiotic representation. People use language in part to convey emotional information, leading to the central and contingent questions: (1) What is the emotional spectrum of natural language? and (2) Are natural languages neutr...
Data
Example words for Twitter as a function of usage frequency rank and standard deviation of happiness estimates. (TIFF)
Data
Example words for the New York Times as a function of usage frequency rank and standard deviation of happiness estimates. (TIFF)
Data
Example words for the Music Lyrics corpus as a function of usage frequency rank and standard deviation of happiness estimates. (TIFF)
Data
Full-text available
The 50 most negative words in our data set. (PDF)
Data
Example words for the Google Books corpus as a function of usage frequency rank and average happiness. (TIFF)