Jana Diesner’s research while affiliated with Technical University of Munich and other places


Publications (140)


Examining homophily for each diversity index by comparing the observed data with the randomized data
Each column corresponds to a diversity index d_x, where x ∈ {eth, gen, age, Log. exp}, and each row compares the observed to the randomized data based on a specific experiment. A: Cumulative distribution of d_x. B: Count values of d_x across the span of possible values. C: Change in mean diversity index value over time. D: Change in mean diversity index value over author count.
Regression fit between each diversity index and the article’s scientific impact, including a dummy variable
Each curve is overlaid on a histogram of the diversity distribution, with an asterisk (*) marking the mean for each bin. The y-axis indicates the logarithmic value of RCR as used in the final model.
Change in coefficient of diversity indices at each model-fitting step
Each subplot shows the contribution of the specific diversity index (linear, quadratic, and author interaction) at each iterative model fitting process, post-evaluating the best combination of variables. In all, confounding factors minimize the effect size of the diversity index and the diversity-author interaction.
Regression fit between expertise diversity attributes and scientific impact
(a) Second-order regression fit of standardized expertise diversity attributes—variety, balance, and disparity against scientific impact (RCR). (b) Distribution of articles over variety, balance, and disparity, highlighting an initial decline then increase in impact with variety, while balance and disparity mostly reduce impact.
Descriptive statistics for all diversity indices (Normalized)


Patterns of diversity in biomedical coauthorships: An analysis across authors’ ethnicity, gender, age, and expertise
  • Article
  • Full-text available

January 2025 · 10 Reads

Apratim Mishra · Haejin Lee · Sullam Jeoung · [...] · Jana Diesner

Multiple studies have linked diversity in scientific collaborations to innovative and impactful research. Here, we explore how different diversity indices—ethnicity, gender, academic age, and topical expertise—interact and thereby influence scientific impact. Leveraging nearly 900,000 biomedical journal articles from PubMed, published in major journals between 1991 and 2014, we investigate the nuanced relationships among these diversity indices and their collective influence on research outcomes. By systematically varying model parametrizations, we assess the robustness of the observed relationships and examine multiple methodological choices. Our findings reveal a consistent pattern of demographic homophily, where scientists tend to collaborate with others who share similar ethnic and gender backgrounds. While each diversity index correlates significantly with impact when considered individually, gender diversity and topical expertise emerge as the strongest positive predictors of impact after accounting for key covariates. However, the association between diversity and impact is moderated by the number of collaborating authors, with larger teams sometimes showing opposite trends due to interactions between the computed diversity indices and team size. Despite this complexity, the practical drivers of scientific impact for an article remain the journal of publication, authors’ prior citation rate, and the number of co-authors. On further examining expertise diversity through three separate dimensions: variety, balance, and disparity, our findings indicate that impactful teams balance a wide range of subject matter expertise while maintaining a focused connection on closely related topics. These findings highlight the importance of strategic team composition and underline the significance of team diversity in scientific research.


Figure 4: Author Name Disambiguation Methods Distribution Barchart
Figure 5: Distribution of Gender Identification Methods
Author Name Disambiguation Methods Paper Categorization
Revisiting gender bias research in bibliometrics: Standardizing methodological variability using Scholarly Data Analysis (SoDA) Cards

January 2025 · 17 Reads

Gender biases in scholarly metrics remain a persistent concern, despite numerous bibliometric studies exploring their presence and absence across productivity, impact, acknowledgment, and self-citations. However, methodological inconsistencies, particularly in author name disambiguation and gender identification, limit the reliability and comparability of these studies, potentially perpetuating misperceptions and hindering effective interventions. A review of 70 relevant publications over the past 12 years reveals a wide range of approaches, from name-based and manual searches to more algorithmic and gold-standard methods, with no clear consensus on best practices. This variability, compounded by challenges such as accurately disambiguating Asian names and managing unassigned gender labels, underscores the urgent need for standardized and robust methodologies. To address this critical gap, we propose the development and implementation of “Scholarly Data Analysis (SoDA) Cards.” These cards will provide a structured framework for documenting and reporting key methodological choices in scholarly data analysis, including author name disambiguation and gender identification procedures. By promoting transparency and reproducibility, SoDA Cards will facilitate more accurate comparisons and aggregations of research findings, ultimately supporting evidence-informed policymaking and enabling the longitudinal tracking of analytical approaches in the study of gender and other social biases in academia.



Figure 1: An example of our proposed approach. We designate responses to the Empirical Question from self-identified Democrats and Republicans as Empirical while using Beliefs to signify the perspectives generated by LM agents on Beliefs Questions for Democrats or Republicans.
Representative Heuristic Result. R corresponds to Republicans (ϵ X+, from Eq. 5) and D corresponds to Democrats (ϵ X−, from Eq. 6). Colors indicate the intensity of the values, namely ϵ > 3, ϵ > 1, and ϵ < 0. The values are averaged ϵ, with standard deviations in parentheses. MFQ, the FEEDBACK method showed the largest mitigation. These findings suggest that the prompt styles introduced in Section 4 can either improve or reduce κ, indicating their potential in mitigating stereotypes. Detailed results are available in Table 15.
Human Evaluation Result. The averaged scores and the standard deviations in parentheses.
Misinformation Detection Data Description
Examining Alignment of Large Language Models through Representative Heuristics: The Case of Political Stereotypes

January 2025 · 17 Reads

Examining the alignment of large language models (LLMs) has become increasingly important, particularly when these systems fail to operate as intended. This study explores the challenge of aligning LLMs with human intentions and values, with a specific focus on their political inclinations. Previous research has highlighted LLMs' propensity to display political leanings, and their ability to mimic certain political parties' stances on various issues. However, the extent and conditions under which LLMs deviate from empirical positions have not been thoroughly examined. To address this gap, our study systematically investigates the factors contributing to LLMs' deviations from empirical positions on political issues, aiming to quantify these deviations and identify the conditions that cause them. Drawing on cognitive science findings related to representativeness heuristics -- where individuals readily recall the representative attribute of a target group in a way that leads to exaggerated beliefs -- we scrutinize LLM responses through this heuristics lens. We conduct experiments to determine how LLMs exhibit stereotypes by inflating judgments in favor of specific political parties. Our results indicate that while LLMs can mimic certain political parties' positions, they often exaggerate these positions more than human respondents do. Notably, LLMs tend to overemphasize representativeness to a greater extent than humans. This study highlights the susceptibility of LLMs to representativeness heuristics, suggesting potential vulnerabilities to political stereotypes. We propose prompt-based mitigation strategies that demonstrate effectiveness in reducing the influence of representativeness in LLM responses.


Fig. 1 Ratio of impact indicating sentences in our dataset, numbers in %.
Fig. 3 Distribution of impact categories among domains, in %.
Fig. 5 Most frequent words per impact category (absolute numbers).
Comparison of the performance of the classification models on the main categories across different methods, domains, and intensities (without impact-irrelevant sentences)
Accuracy and F1 of few-shot prompting in ChatGPT across different domains and intensities. The scores of high-intensity impact in the domain of mobility are excluded due to an insufficient number of instances for comparison.
Impact Classification within and beyond Academia: Domain-Robust Annotation and the Capacity of Large Language Models

November 2024 · 25 Reads

Prior analyses and assessments of the impact of scientific research have mainly relied on analyzing its scope within academia and its influence within scholarly circles. However, by not considering the broader societal, economic, and policy implications of research projects, these studies overlook the ways in which scientific discoveries contribute to technological innovation, public health improvements, environmental sustainability, and other areas of real-world application. We expand upon this prior work by developing and validating a conceptual and computational solution to automatically identify and categorize the impact of scientific research within and especially beyond academia based on text data. We first empirically develop and evaluate an annotation schema to capture and classify the impact of research projects based on research reports from different scientific domains. We then annotate a large dataset of more than 45k sentences extracted from research reports for the developed impact categories. We examine the annotated dataset for patterns in the distribution of impact categories across different scientific domains, co-occurrences of impact categories, and signal words of impact. Using the annotated texts and the novel classification schema, we investigate the performance of large language models (LLMs) for automated impact classification. Our results show that fine-tuning the models on our annotated datasets statistically significantly outperforms zero- and few-shot prompting approaches. This indicates that state-of-the-art LLMs without fine-tuning may not work well for novel classification schemas such as our impact classification schema, and in turn highlights the importance of diligent manual annotations as an empirical basis in the field of computational social science.


Recommendations for sharing network data and materials

October 2024 · 42 Reads · 2 Citations

Network Science

One of the goals of open science is to promote the transparency and accessibility of research. Sharing data and materials used in network research is critical to these goals. In this paper, we present recommendations for whether, what, when, and where network data and materials should be shared. We recommend that network data and materials should be shared, but access to or use of shared data and materials may be restricted if necessary to avoid harm or comply with regulations. Researchers should share the network data and materials necessary to reproduce reported results via a publicly accessible repository when an associated manuscript is published. To ensure the adoption of these recommendations, network journals should require sharing, and network associations and academic institutions should reward sharing.


Figure 2: Performance comparison of few-shot methods over three datasets in Table 1. We report the mean accuracy of each setting. Our method shows high stability in the accuracy distribution compared to the considered baseline models.
Figure 4: Various numbers of label terms across four datasets under three phrases.
SciPrompt: Knowledge-augmented Prompting for Fine-grained Categorization of Scientific Topics

October 2024 · 41 Reads

Prompt-based fine-tuning has become an essential method for eliciting information encoded in pre-trained language models for a variety of tasks, including text classification. For multi-class classification tasks, prompt-based fine-tuning under low-resource scenarios has resulted in performance levels comparable to those of full fine-tuning methods. Previous studies have used crafted prompt templates and verbalizers, mapping from the label terms space to the class space, to solve the classification problem as a masked language modeling task. However, cross-domain and fine-grained prompt-based fine-tuning with an automatically enriched verbalizer remains unexplored, mainly due to the difficulty and costs of manually selecting domain label terms for the verbalizer, which requires humans with domain expertise. To address this challenge, we introduce SciPrompt, a framework designed to automatically retrieve scientific topic-related terms for low-resource text classification tasks. To this end, we select semantically correlated and domain-specific label terms within the context of scientific literature for verbalizer augmentation. Furthermore, we propose a new verbalization strategy that uses correlation scores as additional weights to enhance the prediction performance of the language model during model tuning. Our method outperforms state-of-the-art, prompt-based fine-tuning methods on scientific text classification tasks under few- and zero-shot settings, especially in classifying fine-grained and emerging scientific topics.


LERCause: Deep learning approaches for causal sentence identification from nuclear safety reports

August 2024 · 60 Reads · 1 Citation

Identifying causal sentences from nuclear incident reports is essential for advancing nuclear safety research and applications. Nonetheless, accurately locating and labeling causal sentences in text data is challenging, and might benefit from the usage of automated techniques. In this paper, we introduce LERCause, a labeled dataset combined with labeling methods meant to serve as a foundation for the classification of causal sentences in the domain of nuclear safety. We applied three BERT models (BERT, BioBERT, and SciBERT) to 10,608 annotated sentences from the Licensee Event Report (LER) corpus to predict sentence labels (Causal vs. non-Causal). We also used a keyword-based heuristic strategy, three standard machine learning methods (Logistic Regression, Gradient Boosting, and Support Vector Machine), and a deep learning approach (Convolutional Neural Network; CNN) for comparison. We found that the BERT-centric models outperformed all other tested models in terms of all evaluation metrics (accuracy, precision, recall, and F1 score). BioBERT resulted in the highest overall F1 score of 94.49% from the ten-fold cross-validation. Our dataset and coding framework can provide a robust baseline for assessing and comparing new causal sentence extraction techniques. As far as we know, our research breaks new ground by leveraging BERT-centric models for causal sentence classification in the nuclear safety domain and by openly distributing labeled data and code to enable reproducibility in subsequent research.


Triad census (including only triads that are transitive and balanced). Example of MAN census label - 030T: Mutual=0, Asymmetric=3, Null=0, Letter Label: T=Transitive, D=Down, U=Up
Balanced and imbalanced triples
Case study: College House C network as a directed network (left) and undirected network (right)
Examining differences in network analysis outcomes with and without considering directionality in real-world data
Case study: College House B network as a directed network (left) and undirected network (right). Edge Legends: all indicates the whole network, and V2 egonet indicates the egonet of node V2
Structural balance in real-world social networks: incorporating direction and transitivity in measuring partial balance

August 2024 · 73 Reads · 1 Citation

Social Network Analysis and Mining

Structural balance theory predicts that triads in networks gravitate towards stable configurations. This theory has been verified for undirected graphs. Since real-world networks are often directed, we introduce a novel method for considering both transitivity and sign consistency for evaluating partial balance in signed digraphs. We test our approach on graphs constructed by using different methods for identifying edge signs: natural language processing to infer signs from underlying text data, and self-reported survey data. Our results show that for various social contexts and edge sign detection methods, partial balance of these digraphs is moderately high, ranging from 61 to 96%. Our approach not only enhances the theoretical framework of structural balance but also provides practical insights into the stability of social networks, enabling a deeper understanding of interpersonal and group dynamics across different communication platforms.
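
The idea of combining transitivity and sign consistency can be illustrated with a small sketch. It is not the authors' procedure: it assumes that a triple counts as transitive when a→b, b→c, and a→c all exist, that such a triple is balanced when the sign of a→c equals the product of the other two signs, and that partial balance is the share of balanced triples. The function name `partial_balance` and the edge dictionary layout are hypothetical.

```python
from itertools import permutations


def partial_balance(edges):
    """Share of transitive triples whose signs are consistent.

    edges: dict mapping (source, target) -> +1 or -1 (assumed layout).
    Returns None when the graph contains no transitive triple.
    """
    nodes = {node for edge in edges for node in edge}
    balanced = 0
    total = 0
    for a, b, c in permutations(nodes, 3):
        # Transitive triple (assumed definition): a->b, b->c, and a->c exist.
        if (a, b) in edges and (b, c) in edges and (a, c) in edges:
            total += 1
            # Sign consistency (assumed rule): the direct tie a->c carries
            # the product of the signs along the two-step path a->b->c.
            if edges[(a, b)] * edges[(b, c)] == edges[(a, c)]:
                balanced += 1
    return balanced / total if total else None


# Toy signed digraph with two transitive triples, one of them balanced.
toy_edges = {
    ("A", "B"): 1,
    ("B", "C"): 1,
    ("A", "C"): 1,
    ("A", "D"): 1,
    ("D", "C"): -1,
}
score = partial_balance(toy_edges)  # 0.5
```

Enumerating all ordered triples is cubic in the number of nodes, which is acceptable for small networks but would need a neighbor-based search for larger ones.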


From plan to practice: Interorganizational crisis response networks from governmental guidelines and real‐world collaborations during hurricane events

July 2024 · 21 Reads · 4 Citations

Journal of Contingencies and Crisis Management

Crisis response involves extensive planning and coordination within and across a multitude of agencies and organisations. This study explores how on‐the‐ground crisis response efforts align with crisis response guidelines. These guidelines are key to the effectiveness of crisis response. To this end, we construct, analyse and compare emergency response networks by using network analysis and natural language processing methods. Differences between plans and practice, that is, false positives (actions delivered but not prescribed) and false negatives (actions prescribed but not delivered), can impact response evaluation and policy revisions. We investigate collaboration networks at the federal, state and local level extracted from official documents (prescribed networks) and empirical data (observed networks) in the form of situational reports (n = 109) and tweets (n = 28,050) from responses to major hurricanes that made landfall in the United States. Our analyses reveal meaningful differences between prescribed and observed collaboration networks (mean node overlap ~9.94%, edge overlap ~3.94%). The observed networks most closely resemble federal‐level networks in terms of node and edge overlap, highlighting the prioritisation of federal response guidelines. We also observed a high ratio of false positives, that is, nongovernmental, nonprofit and volunteer organizations, that play a critical role in crisis response and are not mentioned in response plans. These findings enable us to evaluate the current best practices for response and inform emergency response policy planning.
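
The comparison of prescribed and observed networks can be sketched as set operations. The abstract does not say how overlap is computed, so the Jaccard index used here is an assumption, as are the undirected treatment of ties, the function names, and the placeholder organisation names. The false positive and false negative definitions follow the abstract: ties observed but not prescribed, and ties prescribed but not observed.

```python
def jaccard(a, b):
    """Items shared by both sets divided by items in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def compare_networks(prescribed_edges, observed_edges):
    """Compare two collaboration networks given as lists of (org, org) pairs."""
    # Assumption: ties are undirected, so (u, v) and (v, u) are the same tie.
    prescribed = {frozenset(edge) for edge in prescribed_edges}
    observed = {frozenset(edge) for edge in observed_edges}
    prescribed_nodes = set().union(*prescribed)
    observed_nodes = set().union(*observed)
    return {
        "node_overlap": jaccard(prescribed_nodes, observed_nodes),
        "edge_overlap": jaccard(prescribed, observed),
        # Delivered but not prescribed.
        "false_positives": observed - prescribed,
        # Prescribed but not delivered.
        "false_negatives": prescribed - observed,
    }


# Placeholder organisations, not taken from the study's data.
prescribed = [("Agency A", "Agency B"), ("Agency B", "Agency C")]
observed = [("Agency B", "Agency A"), ("Agency A", "Volunteer Group")]
result = compare_networks(prescribed, observed)
# result["node_overlap"] is 0.5: two shared organisations out of four.
```

Here the volunteer group appears only in the observed network, so its tie lands in `false_positives`, mirroring the abstract's point about organisations that respond without being named in the plans.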


Citations (60)


... We considered the occurrence of gender bias and hallucinations as follows. First, we considered an LLM output to be biased toward gender if it contains female or male words [91] instead of avoiding them, as instructed by the prompts (see Figures 2 and 3). We considered the following female and male words to quantify gender bias for the LLMs in English ('female', 'male', 'Ms', 'Mrs', 'Mr', 'woman', 'man') and German ('weiblich', 'männlich', 'Frau', 'Herr', 'Frau', 'Mann'). ...

Reference:

Large Language Models for Electronic Health Record De-Identification in English and German
Beyond Binary Gender Labels: Revealing Gender Bias in LLMs through Gender-Neutral Name Predictions

... For instance, researchers suggest that organizational factors, such as crisis plans and teams, culture, leadership, identification, learning, and the dyadic positive relationships with stakeholders are crucial for effective crisis management (see Kovoor-Misra, 2020;Pearson & Clair, 1998). In addition, the field of public administration has studied interorganizational relationships primarily between various governmental agencies during crises, such as wildfires, hurricanes, and bombings (e.g., Curtis, 2018;Dinh et al., 2024;Moynihan, 2009). They focus on the role of crisis response networks that typically are centralized with an incident command leader who has largely dyadic relationships with network members. ...

From plan to practice: Interorganizational crisis response networks from governmental guidelines and real‐world collaborations during hurricane events

Journal of Contingencies and Crisis Management

... We use the explanations described in the AWARENESS REASONING Introducing the suffix "Please give reasons for your answer" prompts the model to provide a rationale for its response. This choice is inspired by observed variations in model responses when engaging in a reasoning process, as documented in prior studies (Wei et al., 2022;Jeoung et al., 2023). Graham et al. (2013) measures the perceived moral foundations a respondent may possess on five moral foundations (e.g. ...

StereoMap: Quantifying the Awareness of Human-like Stereotypes in Large Language Models
  • Citing Conference Paper
  • January 2023

... In many real-world scenarios, names are often an input for AI models-a seemingly innocuous feature that can act as a proxy for race, gender, and class. However, AI systems have been found to exhibit name biases [1,18,20,34,35,39,40,43], which exacerbate inequities, widen opportunity gaps, deepen racial segregation, and perpetuate inequality and discrimination. While a number of studies have examined first-name bias, comparatively little attention has been paid to bias based on last names, and even less to the combined effect of first and last names, despite their profound impact on perceptions and judgments. ...

Examining the Causal Impact of First Names on Language Models: The Case of Social Commonsense Reasoning
  • Citing Conference Paper
  • January 2023

... The bibliodata can also be valuable humanities data in this context. In addition, numerous digital humanities work that apply computational methods to study cultural phenomena and patterns demands more open, machine-readable, and reusable data in volumes (e.g., Underwood, 2019;Hu et al., 2023;Walsh & Antoniak, 2021). As illustrated in these studies, book reviews data and social media data, for example, also make emerging, valuable humanities and cultural datasets. ...

Complexities of leveraging user-generated book reviews for scholarly research: transiency, power dynamics, and cultural dependency

International Journal on Digital Libraries

... A more recent paper casts further doubt on the plausibility of the Jung et al. (2014) argument. Dinh et al. (2023) examined the implications of referring to hurricanes by their gendered names in news coverage, and found that the choice of adjectives and verbs in the description of female-named hurricanes was more negative than for male ones. ...

Are we projecting gender biases to ungendered things? Differences in referring to female versus male named hurricanes in 33 years of news coverage
  • Citing Article
  • January 2023

Computational Communication Research

... For example, are all reviewers merely book readers (Landow, 2006)? To what extent are reviews incentivized (Hu et al., 2023), copied (David and Pinch, 2006), or generated by AI? Why do some reviewers rate their favourite books with 1 star and less favoured ones with 5 stars (citation anonymized for blind review)? ...

Research with User-Generated Book Review Data: Legal and Ethical Pitfalls and Contextualized Mitigations
  • Citing Chapter
  • March 2023

Lecture Notes in Computer Science

... Other research has focused on the theoretical and technical aspects of sentiment analysis, demonstrated in the study by Dinh et al. (2022). They propose improvements to structural equilibrium theory for social network analysis, allowing for a more accurate interpretation of emotional dynamics within groups. ...

Enhancing structural balance theory and measurement to analyze signed digraphs of real-world social networks

Frontiers in Human Dynamics

... Information plays a crucial role in crisis response, serving as the backbone of effective decision-making and coordination between responding agencies and the public. With the advent of social media and microblogging platforms, there is increasingly abundant information available to crisis responders, affected communities, and the general public that allows each of these groups to gain actionable knowledge about a disaster, known as situational awareness (Sarol et al., 2021;Bruns & Burgess, 2014;Imran et al., 2018). Extant literature in crisis informatics has found that Twitter (now X) is the most frequently used platform by citizens, volunteers, and emergency responders, where situational awareness information such as requests for help and resources (Sarol et al., 2020), updates on locations of shelters (Imran et al., 2013), and coordinating donation efforts (Olteanu et al., 2015) were actively shared. ...

Variation in Situational Awareness Information due to Selection of Data Source, Summarization Method, and Method Implementation
  • Citing Article
  • May 2021

Proceedings of the International AAAI Conference on Web and Social Media

... A different perspective has been considered by Diesner et al. [32]; herein, the authors consider social networks constructed from records of social interactions. Potential ambiguities of social entities may greatly affect the network construction process: for instance, nodes associated with the same string could be wrongly merged despite they are associated with distinct individuals. ...

Impact of Entity Disambiguation Errors on Social Network Properties
  • Citing Article
  • August 2021

Proceedings of the International AAAI Conference on Web and Social Media