
Nicholas Jon Horton
Amherst College · Department of Mathematics and Statistics
ScD
About
292
Publications
96,085
Reads
15,054
Citations
Introduction
I am a biostatistician and data scientist interested in developing methodology for the analysis of missing or incomplete data, longitudinal or repeated measures data, and multiple informant and multiple outcome data, as well as in statistical computing and statistics and data science education.
Additional affiliations
July 2013 - March 2016
January 2012 - December 2013
July 2003 - June 2013
Education
June 1994 - December 1998
Publications (292)
Objective
This prospective cohort study aimed to empirically derive phenotypes of children and adolescents with overweight and obesity.
Methods
Latent class analyses using Mplus were carried out in the Growing Up Today Study. Information on participants' weight status, disordered eating behaviors, body image and weight concerns, depressive symptom...
Although parent ratings, adolescent ratings, and observations are all utilized to measure parent emotion socialization during adolescence, there is a lack of research examining measurement differences and concordance. Thus, the present study compared three measures of parent supportive and nonsupportive emotion socialization and examined whether pa...
Text provides a compelling example of unstructured data that can be used to motivate and explore classification problems. Challenges arise regarding the representation of features of text and student linkage between text representations as character strings and identification of features that embed connections with underlying phenomena. In order to...
Description of the ear canal's geometry is essential for describing peripheral sound flow, yet physical measurements of the canal's geometry are lacking and recent measurements suggest that older-adult-canal areas are systematically larger than previously assumed. Methods to measure ear-canal geometry from multi-planar reconstructions of high-resol...
Objective:
To summarize absorbance and impedance angles from normal-hearing ears within the 2015-2016 and 2017-2020 US National Health and Nutrition Examination Surveys (NHANES).
Design:
Two publicly available NHANES datasets were analyzed. Ears meeting criteria for normal hearing and valid absorbance and impedance angle measurements were identi...
There has been heightened interest in identifying critical windows of exposure for adverse health outcomes; that is, time points during which exposures have the greatest impact on a person's health. Multiple informant models implemented using generalized estimating equations (MIM GEEs) have been applied to address this research question because the...
A substantial fraction of students who complete their college education at a public university in the United States begin their journey at one of the 935 public two-year colleges. While the number of four-year colleges offering bachelor's degrees in data science continues to increase, data science instruction at many two-year colleges lags behind....
Many data science students and practitioners are reluctant to adopt good coding practices as long as the code "works". However, code standards are an important part of modern data science practice, and they play an essential role in the development of "data acumen". Good coding practices lead to more reliable code and often save more time than they...
Introduction
Child abuse is associated with adult obesity. Yet, it is unknown how the developmental timing and combination of abuse types affect this risk. This report examined how distinct child and adolescent abuse patterns were associated with incident obesity in young adulthood.
Methods
Data came from 7,273 participants in the Growing Up Today...
This is a review of Agresti and Kateri's book that was published in JASA.
The world is full of text data, yet text analytics has not traditionally played a large part in statistics education. We consider four different ways to provide students with opportunities to explore whether email messages are unwanted correspondence (spam). Text from subject lines is used to identify features that can be used in classification. T...
While coursework provides undergraduate data science students with some relevant analytic skills, many are not given the rich experiences with data and computing they need to be successful in the workplace. Additionally, students often have limited exposure to team-based data science and the principles and tools of collaboration that are encountere...
Background
Child maltreatment may be an important risk factor for eating disorder (ED) behaviors. However, most previous research has been limited to clinical, female, and cross-sectional samples, and has not adequately accounted for complex abuse patterns.
Objective
To determine whether women and men with distinct patterns of child and adolescent...
Computing makes up a large and growing component of data science and statistics courses. Many of those courses, especially when taught by faculty who are statisticians by training, teach R as the programming language. A number of instructors have opted to build much of their teaching around the use of the tidyverse. The tidyverse, in the words of i...
Nicholas J. Horton describes some further applications of the expectation-maximisation (EM) algorithm, demonstrating its flexibility and popularity as a statistical tool.
While coursework introduces undergraduate data science students to some relevant analytic skills, many are not given the myriad experiences with data and computing they need to be successful in the workplace. Additionally, students often have little background with team-based data science and the principles and tools of collaboration that are encou...
Purpose
Male weight concerns tend to focus on shape and muscularity as opposed to a desire for thinness and remain underdetected by conventional eating disorder assessments. We aimed to describe the longitudinal course of weight concerns and disordered eating behaviors among males across adolescence and young adulthood.
Methods
We used prospective...
We strongly believe that real-world training and experiences cannot be reserved just for graduate students. The same primary argument made by Kolaczyk et al. (2021)—that graduate students need a richer understanding of the interplay of theory and practice than we have historically offered—also applies to undergraduate students. Not including these...
Nolan and Temple Lang (2010) argued for the fundamental role of computing in the statistics curriculum. In the intervening decade the statistics education community has acknowledged that computational skills are as important to statistics and data science practice as mathematics. There remains a notable gap, however, between our intentions and our...
A version control system records changes to a file or set of files over time so that changes can be tracked and specific versions of a file can be recalled later. As such, it is an essential element of a reproducible workflow that deserves due consideration among the learning objectives of statistics courses. This paper describes experiences and im...
Wideband acoustic immittance (WAI) measures are noninvasive diagnostic measurements that require an estimate of the ear canal's area at the measurement location. Yet, physical measurements of the area at WAI probe locations are lacking. Methods to measure ear-canal areas from silicone molds were developed and applied to 169 subjects, ages 18–75 yea...
Background
Individuals can have vastly different maltreatment experiences depending on the types, developmental timing, and duration of abuse. Women and men may be differentially affected by distinct abuse patterns.
Objective
To examine whether maltreatment subgroups could be identified based on the types, developmental timing, and duration of abu...
Version control is an essential element of a reproducible workflow that deserves due consideration among the learning objectives of statistics courses. This paper describes experiences and implementation decisions of four contributing faculty who are teaching different courses at a variety of institutions. Each of these faculty has set version con...
Purpose:
The aim of the study was to assess whether girls with mothers who have had an eating disorder (ED) have greater odds of developing ED symptoms and whether girls with ED symptoms have greater odds of receiving ED treatment if their mothers have an ED history.
Methods:
Data came from 3,649 females in the Growing Up Today Study. Data were...
Aims
To examine the impact of multiple psychiatric disorders over the lifetime on risk of mortality in the general population.
Methods
Data came from a random community-based sample of 1397 adults in Atlantic Canada, recruited in 1992. Major depression, dysthymia, panic disorder, generalised anxiety disorder and alcohol use disorders were assessed...
Importance
Eating meals, particularly dinner, with family members has been associated with improved dietary intake among youths. However, existing studies have not examined how family functioning may moderate or confound this association.
Objective
To examine whether level of family functioning is associated cross-sectionally with frequency of fam...
Purpose:
To quantify eating disorder (ED) stability and diagnostic transition among a community-based sample of adolescents and young adult females in the United States.
Methods:
Using 11 prospective assessments from 9,031 U.S. females ages 9-15 years at baseline of the Growing Up Today Study, we classified cases of the following EDs involving b...
Objective
Patient-reported outcomes (PROs) are important in oncology research; however, missing data can pose a threat to the validity of results. Psycho‐oncology researchers should be aware of the statistical options for handling missing data robustly. One rarely used set of methods, which includes extensions for handling missing data, is generali...
Background
Depression and anxiety disorders are highly comorbid, and share significant symptom overlap. Whereas depression has been consistently associated with excess mortality, the association between anxiety and mortality is less clear. Our aim was to identify constellations of anxious and depressive symptoms and examine their associations wit...
The goal of the “Keeping Data Science Broad” series of webinars and workshops was to garner community input into pathways for keeping data science education broadly inclusive across sectors, institutions, and populations. Input was collected from data science programs across the nation, either traditional or alternative, and from a range of institu...
This chapter introduces the basics of how to wrangle data in R.
What is statistics? We attempt to answer this question as it relates to grounding research in statistics education. We discuss the nature of statistics as the science of learning from data, its history and traditions, what characterizes statistical thinking and how it differs from mathematics, connections with computing and data science, why learni...
Donoho's JCGS (in press) paper is a spirited call to action for statisticians, who he points out are losing ground in the field of data science by refusing to accept that data science is its own domain. (Or, at least, a domain that is becoming distinctly defined.) He calls on writings by John Tukey, Bill Cleveland, and Leo Breiman, among others, to...
Background:
Many studies have shown that depression increases mortality risk. We aimed to investigate the duration of time over which depression is associated with increased risk of mortality, secular trends in the association between depression and mortality, and sex differences in the association between depression and mortality.
Methods:
We c...
Data wrangling is a critical foundation of data science, and wrangling of categorical data is an important component of this process. However, categorical data can introduce unique issues in data wrangling, particularly in real-world settings with collaborators and periodically-updated dynamic data. This paper discusses common problems arising from...
The mosaic package provides a simplified and systematic introduction to the core functionality related to descriptive statistics, visualization, modeling, and simulation-based inference required in first and second courses in statistics. This introduction to the package describes some of the guiding principles behind the design of the package and p...
Since the 2005 American Statistical Association's (ASA) endorsement of the Guidelines for Assessment and Instruction in Statistics Education (GAISE) College Report, changes in the statistics field and statistics education have had a major impact on the teaching and learning of statistics. We now live in a world where "Statistics - the science of le...
In a world awash with data, the ability to think and compute with data has become an important skill for students in many fields. For that reason, inclusion of some level of statistical computing in many introductory-level courses has grown more common in recent years. Existing literature has documented multiple success stories of teaching statisti...
Most analyses of randomised trials with incomplete outcomes make untestable assumptions and should therefore be subjected to sensitivity analyses. However, methods for sensitivity analyses are not widely used. We propose a mean score approach for exploring global sensitivity to departures from missing at random or other assumptions about incomplete...
Created to foster inclusive excellence, Smith College's Achieving Excellence in Mathematics, Engineering, and Science (AEMES) Scholars program provides early faculty-mentored research opportunities and other programming as a way to foster success in academic outcomes for underrepresented women in science. Using academic record data, we compared Sch...
Diagnostic criteria for eating disorders (ED) remain largely based on clinical presentations, but do not capture the full range of behaviours in the population. We aimed to derive an empirically based ED behaviour classification using behavioural and body mass index (BMI) indicators at three time-points in adolescence, and to validate classes inves...
Confidence intervals provide a way to determine plausible values for a population parameter. They are omnipresent in research articles involving statistical analyses. Appropriately, a key statistical literacy learning objective is the ability to interpret and understand confidence intervals in a wide range of settings. As instructors, we devote a c...
One learning goal of the introductory statistics course is to develop the ability to make sense of research findings in published papers. The Atlantic magazine regularly publishes a feature called "Study of Studies" that summarizes multiple articles published in a particular domain. We describe a classroom activity to develop this capacity using th...
Background:
A noninvasive method to monitor changes in intracranial pressure (ICP) is required for astronauts on long-duration spaceflight who are at risk of developing the Visual Impairment/Intracranial Pressure syndrome, which has some, but not all, of the features of idiopathic intracranial hypertension. We assessed the validity of distortion prod...
Background
Little is known about how factors within the general family environment are associated with weight and related behaviors among adolescents/young adults.
Methods
We studied 3768 females and 2614 males, 14–24 years old in 2011, participating in the Growing Up Today Study 2. We used generalized mixed models to examine cross-sectional assoc...
Objective
Research on the manifestations and health correlates of eating disorder symptoms among males is lacking. This study identified patterns of appearance concerns and eating disorder behaviors from adolescence through young adulthood and their health correlates.
Method
Participants were 7,067 males from the prospective Growing Up Today Study...
Objective:
The objective is to develop methods to utilize newborn reflectance measures for the identification of middle-ear transient conditions (e.g., middle-ear fluid) during the newborn period and ultimately during the first few months of life. Transient middle-ear conditions are a suspected source of failure to pass a newborn hearing screening...
Eating meals, particularly dinner, with family members has been found to be associated with improved dietary intake, lower prevalence of disordered eating behaviors, lower levels of substance abuse, and improved academic outcomes among adolescents. Limited research has examined how the frequency of family meals has changed over time. The objective...
This is an exciting time to be a statistician. The contribution of the discipline of statistics to scientific knowledge is widely recognized (McNutt 2014) with increasingly positive public perception. Many feel “daunted by the challenge of extracting understanding from floods of disconnected data that threaten to swamp every discipline” (Yamamoto 2...