November 2024
Prior analyses and assessments of the impact of scientific research have mainly relied on analyzing its scope within academia and its influence within scholarly circles. However, by not considering the broader societal, economic, and policy implications of research projects, these studies overlook the ways in which scientific discoveries contribute to technological innovation, public health improvements, environmental sustainability, and other areas of real-world application. We expand upon this prior work by developing and validating a conceptual and computational solution to automatically identify and categorize the impact of scientific research within and especially beyond academia based on text data. We first empirically develop and evaluate an annotation schema to capture and classify the impact of research projects based on research reports from different scientific domains. We then annotate a large dataset of more than 45,000 sentences extracted from research reports for the developed impact categories. We examine the annotated dataset for patterns in the distribution of impact categories across different scientific domains, co-occurrences of impact categories, and signal words of impact. Using the annotated texts and the novel classification schema, we investigate the performance of large language models (LLMs) for automated impact classification. Our results show that fine-tuning the models on our annotated datasets statistically significantly outperforms zero- and few-shot prompting approaches. This indicates that state-of-the-art LLMs without fine-tuning may not work well for novel classification schemas such as our impact classification schema, and in turn highlights the importance of diligent manual annotations as an empirical basis in the field of computational social science.
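The comparison described above — fine-tuned models versus zero- and few-shot prompting — is typically scored with a per-category metric such as macro-averaged F1. The sketch below illustrates that evaluation setup and a minimal zero-shot prompt template; the category names and prompt wording are illustrative assumptions, not the paper's actual schema or prompts.

```python
# Illustrative sketch: macro-F1 evaluation for sentence-level impact
# classification, plus a hypothetical zero-shot prompt template.
# The category labels below are assumptions for demonstration only.

CATEGORIES = ["academic", "societal", "economic", "policy"]  # illustrative labels

def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over the label set."""
    f1_scores = []
    for c in CATEGORIES:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)

def zero_shot_prompt(sentence):
    """Hypothetical zero-shot instruction for an instruction-tuned LLM."""
    return (
        "Classify the research-impact category of the sentence below.\n"
        f"Categories: {', '.join(CATEGORIES)}.\n"
        f"Sentence: {sentence}\n"
        "Category:"
    )
```

Under this framing, the paper's finding amounts to the fine-tuned model's macro F1 exceeding that of the zero- and few-shot prompted baselines on the held-out annotated sentences.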