MyFitnessPal Food Diary Dataset
Abstract
This dataset contains 587,187 days of food diary records logged by 9.9K MyFitnessPal users from September 2014 through April 2015. Each line is a tab-separated list of:
- Anonymized user ID
- Diary date
- List of food entries and nutrients (as JSON objects)
- Daily aggregate of nutrient intake and goal (as JSON objects).
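A minimal sketch of reading one record, assuming the four tab-separated fields appear in the order listed above. The description here does not give the JSON schema or the date format, so the keys and values in the synthetic sample (`name`, `calories`, `total`, `goal`) and the names `DiaryDay` and `parse_diary_line` are hypothetical, chosen only for illustration.

```python
import json
from typing import Any, NamedTuple


class DiaryDay(NamedTuple):
    user_id: str
    date: str          # kept as a string; the actual date format is not assumed
    entries: Any       # parsed JSON: food entries and their nutrients
    daily_total: Any   # parsed JSON: daily aggregate of intake and goal


def parse_diary_line(line: str) -> DiaryDay:
    """Split one tab-separated record into its four fields and decode the JSON."""
    # Raises ValueError if the line does not have exactly four fields.
    user_id, date, entries_json, total_json = line.rstrip("\n").split("\t")
    return DiaryDay(user_id, date, json.loads(entries_json), json.loads(total_json))


# Synthetic record: the JSON keys below are invented, not the dataset's schema.
sample = "\t".join([
    "1",
    "2014-09-14",
    json.dumps([{"name": "oatmeal", "calories": 150}]),
    json.dumps({"total": {"calories": 150}, "goal": {"calories": 2000}}),
])

day = parse_diary_line(sample)
print(day.user_id, day.date, len(day.entries))
```

To process the real file, the same function could be applied to each line of an opened file handle, after inspecting a few records to learn the actual JSON keys.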
If you use the dataset in a scientific publication, a citation to the following paper would be greatly appreciated:
- Ingmar Weber and Palakorn Achananuparp. 2016. Insights from Machine-Learned Diet Success Prediction. In Proceedings of Pacific Symposium on Biocomputing (PSB).
Content uploaded by Palakorn Achananuparp
... Data Availability Statement: The original raw dataset [7] is publicly available at [30]. The custom code developed for this study is available from the corresponding author on reasonable request. ...
Self-regulation of food intake is necessary for maintaining a healthy body weight. One of the characteristics of self-regulation is calorie compensation. Calorie compensation refers to adjusting the current meal's energy content based on the energy content of the previous meal(s). Preload-test studies measure a single instance of compensation in a controlled setting. The measurement of calorie compensation in free-living conditions has largely remained unexplored. This paper proposes a methodology that leverages extensive app-based observational food diary data to measure an individual's calorie compensation profile in free-living conditions. Instead of the single compensation index used in preload-test studies, we present the compensation profile as a distribution of days a user exhibits under-compensation, overcompensation, non-compensation, and precise compensation. We applied our methodology to the public food diary data of 1622 MyFitnessPal users. We empirically established that four weeks of food diaries were sufficient to characterize a user's compensation profile accurately. We observed that meal compensation was more likely than day compensation. Dinner compensation had a higher likelihood than lunch compensation. Precise compensation was the least likely. Users were more likely to overcompensate for missing calories than for additional calories. The consequences of poor compensatory behavior were reflected in their adherence to their daily calorie goal. Our methodology could be applied to food diaries to discover behavioral phenotypes of poor compensatory behavior toward forming an early behavioral marker for weight gain.
... The original raw dataset [16] is publicly available at [37]. The custom code developed for this study is available from the corresponding author on reasonable request. ...
Humans are creatures of habit, and hence one would expect habitual components in our diet. However, there is scant research characterizing habitual behavior in food consumption quantitatively. Longitudinal food diaries contributed by app users are a promising resource to study habitual behavior in food selection. We developed computational measures that leverage recurrence in food choices to describe the habitual component. The relative frequency and span of individual food choices are computed and used to identify recurrent choices. We proposed metrics to quantify the recurrence at both food-item and meal levels. We obtained the following insights by employing our measures on a public dataset of food diaries from MyFitnessPal users. Food-item recurrence is higher than meal recurrence. While food-item recurrence increases with the average number of food-items chosen per meal, meal recurrence decreases. Recurrence is the strongest at breakfast, weakest at dinner, and higher on weekdays than on weekends. Individuals with relatively high recurrence on weekdays also have relatively high recurrence on weekends. Our quantitatively observed trends are intuitive and aligned with common notions surrounding habitual food consumption. As a potential impact of the research, profiling habitual behaviors using the proposed recurrent consumption measures may reveal unique opportunities for accessible and sustainable dietary interventions.
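As a rough illustration of the "relative frequency and span" idea in the abstract above, the sketch below computes both per food item. The abstract does not define either quantity, so the definitions used here are assumptions: relative frequency is the share of logged days on which an item appears, and span is the inclusive number of days between its first and last appearance. The function name `frequency_and_span` and the sample diary are hypothetical.

```python
from datetime import date


def frequency_and_span(diary: dict[date, set[str]]) -> dict[str, tuple[float, int]]:
    """Map each food item to (relative frequency, span in days).

    Assumed definitions, not taken from the paper:
    - relative frequency = days containing the item / total logged days
    - span = inclusive day count from first to last appearance
    """
    n_days = len(diary)
    seen: dict[str, list[date]] = {}
    for day, items in diary.items():
        for item in items:
            seen.setdefault(item, []).append(day)
    return {
        item: (len(days) / n_days, (max(days) - min(days)).days + 1)
        for item, days in seen.items()
    }


# Synthetic four-day diary for a single user.
diary = {
    date(2014, 9, 1): {"oatmeal", "coffee"},
    date(2014, 9, 2): {"coffee"},
    date(2014, 9, 3): {"oatmeal", "coffee", "salad"},
    date(2014, 9, 4): {"coffee"},
}
stats = frequency_and_span(diary)
print(stats["coffee"])
```

Under these assumed definitions, an item could then be flagged as recurrent when both values exceed chosen thresholds; the thresholds themselves would have to come from the paper.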
... We used 3 datasets for our evaluation: MyFitnessPal [28], collected Fitbit dataset and Fitbit-GAN dataset. An overview of the datasets in terms of scale is shown in Table I. ...
Privacy preservation plays a vital role in health care applications, as the privacy requirements in this domain are very strict. With the rapid increase in the amount, quality and detail of health data being gathered with smart devices, new mechanisms are required that can cope with the challenges of large scale and real-time processing requirements. Federated learning (FL) is one of the conventional approaches that facilitate the training of AI models without access to the raw data. However, recent studies have shown that FL alone does not guarantee sufficient privacy. Differential privacy (DP) is a well-known approach for privacy guarantees; however, because of the noise addition, DP needs to make a trade-off between privacy and accuracy. In this work, we design and implement an end-to-end pipeline using DP and FL for the first time in the context of health data streams. We propose a clustering mechanism to leverage the similarities between users to improve the prediction accuracy as well as significantly reduce the model training time. Depending on the dataset and features, our predictions deviate from the ground-truth value by no more than 0.025% of the value range. Moreover, our clustering mechanism brings a significant reduction in the training time, with up to 49% reduction in prediction error in the best case, as compared to training a single model on the entire dataset. Our proposed privacy-preserving mechanism at best introduces a decrease of ≈ 2% in the prediction accuracy of the trained models. Furthermore, our proposed clustering mechanism reduces the prediction error even in highly noisy settings by as much as 38% as compared to using a single federated private model.
Linked Research
Conference Paper
January 2016