Brit Youngmann

Massachusetts Institute of Technology | MIT · Computer Science and Artificial Intelligence Laboratory

Doctor of Philosophy

About

Publications: 26
Reads: 1,772
Citations: 72
Citations since 2016: 26 research items, 72 citations (yearly chart, 2016 to 2022)

Publications (26)
Preprint
Full-text available
When analyzing large datasets, analysts are often interested in the explanations for surprising or unexpected results produced by their queries. In this work, we focus on aggregate SQL queries that expose correlations in the data. A major challenge that hinders the interpretation of such queries is confounding bias, which can lead to an unexpected...
Article
We demonstrate EDA4Sum, a framework dedicated to generating guided multi-step data summarization pipelines for very large datasets. Data summarization is the process of producing interpretable and representative subsets of an input dataset. It is usually performed following a one-shot process with the purpose of finding the best summary. EDA4Sum le...
Article
Full-text available
Data analytics often make sense of large data sets by generalization: aggregating from the detailed data to a more general context. Given a dataset, misleading generalizations can sometimes be drawn from a cherry-picked level of aggregation to obscure substantial subgroups that oppose the generalization. Our goal is to detect and explain cherry-pic...
Conference Paper
Full-text available
We demonstrate EDA4Sum, a framework dedicated to generating guided multi-step data summarization pipelines for very large datasets. Data summarization is the process of producing interpretable and representative subsets of an input dataset. It is usually performed following a one-shot process with the purpose of finding the best summary. EDA4Sum le...
Preprint
Full-text available
Data summarization is the process of producing interpretable and representative subsets of an input dataset. It is usually performed following a one-shot process with the purpose of finding the best summary. A useful summary contains k individually uniform sets that are collectively diverse to be representative. Uniformity addresses interpretabilit...
Article
Data summarization is the process of producing interpretable and representative subsets of an input dataset. It is usually performed following a one-shot process with the purpose of finding the best summary. A useful summary contains k individually uniform sets that are collectively diverse to be representative. Uniformity addresses interpretabilit...
Article
Intimate partner violence (IPV) is a major public health concern with serious consequences for victims’ physical and mental health. Despite the high prevalence of IPV, describing it and detecting people suffering from it is difficult due to its sensitive nature and stigma associated with it. Existing tools for screening and tracking IPV victims are...
Article
Full-text available
Generalizing from detailed data to statements in a broader context is often critical for users to make sense of large data sets. Correspondingly, poorly constructed generalizations might convey misleading information even if the statements are technically supported by the data. For example, a cherry-picked level of aggregation could obscure substan...
Article
Full-text available
Search advertising, a popular method for online marketing, has been employed to improve health by eliciting positive behavioral change. However, writing effective advertisements requires expertise and experimentation, which may not be available to health authorities wishing to elicit such changes, especially when dealing with public health crises s...
Conference Paper
The use of probabilistic datalog programs has been advocated for applications that involve recursive computation and uncertainty. While using such programs allows for a flexible knowledge derivation, it makes the analysis of query results a challenging task. Particularly, given a set O of output tuples and a number k, one would like to understand w...
Conference Paper
In applications with large userbases such as crowdsourcing, social networks or recommender systems, selecting users is a common and challenging task. Different applications require different policies for selecting users, and implementing such policies is application-specific and laborious. To this end, we introduce a novel declarative framework that...
Article
Full-text available
Objective: To develop, apply, and evaluate a novel web-based classifier for screening for Parkinson's disease among a large cohort of search engine users. Methods: A supervised machine learning classifier learned to distinguish web users with self-reported Parkinson's disease from controls based on their interactions with a search engine (Bing, M...
Preprint
Full-text available
Search advertising is one of the most commonly used methods of advertising. Past work has shown that search advertising can be employed to improve health by eliciting positive behavioral change. However, writing effective advertisements requires expertise and (possibly expensive) experimentation, both of which may not be available to public health...
Conference Paper
Full-text available
Influence Maximization (IM) is the problem of finding a set of influential users in a social network, so that their aggregated influence is maximized. IM has natural applications in viral marketing and has been the focus of extensive recent research. One critical problem, however, is that while existing IM algorithms serve the goal of reaching a la...
Conference Paper
Parkinson's disease (PD) is a slowly progressing neurodegenerative disease with early manifestation of motor signs. Recently, there has been a growing interest in developing automatic tools that can assess motor function in PD patients. Here we show that mouse tracking data collected during people's interaction with a search engine can be used to d...
Preprint
Full-text available
Parkinson's disease (PD) is a slowly progressing neurodegenerative disease with early manifestation of motor signs. Recently, there has been a growing interest in developing automatic tools that can assess motor function in PD patients. Here we show that mouse tracking data collected during people's interaction with a search engine can be used to d...
Conference Paper
Full-text available
People seeking information through search engines are assumed to behave similarly, regardless of the topic they are searching for. Here we use mouse tracking, which is correlated with gaze, to show that the information seeking patterns of people differ dramatically depending on their level of anxiety at the time of the search. We investigate the...
Conference Paper
Full-text available
With the proliferation of social image-sharing applications, image search becomes an increasingly common activity. In this work, we focus on a particular class of images that convey semantic meaning beyond the visual appearance, and whose search presents particular challenges. A prominent example is Memes, an emerging popular type of captioned pict...
Article
Adequate crowd selection is an important factor in the success of crowdsourcing platforms, increasing the quality and relevance of crowd answers and their performance in different tasks. The optimal crowd selection can greatly vary depending on properties of the crowd and of the task. To this end, we present December, a declarative platform with no...
