July 2024
Researchers are increasingly using machine learning to study physiological markers of emotional experience. In the present work, we evaluated the promises and limitations of this approach via a big team science competition. Twelve teams of researchers competed to predict self-reported core affective experiences using a multi-modal set of peripheral nervous system features. Models were trained and tested in multiple ways: with data divided by participants, emotions, inductions, and time periods. In 100% of tests, teams outperformed baseline models that made random predictions. In 46% of tests, teams also outperformed baseline models that relied on the simple average of ratings from training datasets. In a follow-up, three models judged most promising by competition organizers exhibited lower prediction accuracy when re-tested on data with simulated physiological randomness. These results bolster claims that machine learning can capture physiological markers of affective experience. More notably, though, results uncovered a methodological challenge: multiplicative constraints on generalizability. Inferences about the accuracy and theoretical implications of machine learning efforts depended not only on their architecture, but also on how they were trained, tested, and evaluated. For example, some teams performed better when tested on observations from subjects different from (vs. the same as) those seen during training. Such results could be interpreted as evidence of biologically innate patterns that are sensitive to context. However, such conclusions would be premature because other teams exhibited the opposite pattern. Taken together, results illustrate how big team science can be leveraged to understand the promises and limitations of machine learning methods in affective science and beyond.
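To make the two baselines concrete, the following is a minimal sketch, not the competition's actual evaluation code. It assumes root-mean-square error as the accuracy metric and a 1 to 9 rating scale, neither of which is stated above; the function names and the toy ratings are hypothetical and chosen only for illustration. It shows a participant-wise split, a baseline that predicts the training-set average, and a baseline that predicts at random.

```python
import random
from statistics import mean


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5


def mean_baseline(train_ratings, n_test):
    """Predict the training-set average rating for every test observation."""
    avg = mean(train_ratings)
    return [avg] * n_test


def random_baseline(n_test, low=1.0, high=9.0, seed=0):
    """Predict uniformly random ratings on an ASSUMED [low, high] scale."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(n_test)]


def split_by_participant(rows, test_ids):
    """Hold out whole participants so no test subject appears in training."""
    train = [r for r in rows if r["participant"] not in test_ids]
    test = [r for r in rows if r["participant"] in test_ids]
    return train, test


# Toy, made-up ratings used purely for illustration (not study data).
rows = [
    {"participant": "p1", "rating": 4.0},
    {"participant": "p1", "rating": 6.0},
    {"participant": "p2", "rating": 5.0},
    {"participant": "p2", "rating": 7.0},
    {"participant": "p3", "rating": 3.0},
    {"participant": "p3", "rating": 5.0},
]
train, test = split_by_participant(rows, test_ids={"p3"})
y_train = [r["rating"] for r in train]
y_test = [r["rating"] for r in test]

# A submitted model would "beat" a baseline if its error were lower than these.
mean_error = rmse(y_test, mean_baseline(y_train, len(y_test)))
random_error = rmse(y_test, random_baseline(len(y_test)))
```

Swapping `split_by_participant` for a split by emotion, induction, or time period changes which observations count as unseen. That is one way the same model can look more or less accurate depending on how it is tested.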