Yong-Yeol Ahn

Indiana University Bloomington | IUB · Center for Complex Networks and Systems Research

About

80
Publications
59,309
Reads
8,178
Citations
Citations since 2017
49 Research Items
4,166 Citations
[Chart: citations per year, 2017–2023; vertical axis 0–600]

Publications (80)
Preprint
Full-text available
Recent advances in machine learning offer new ways to represent and study scholarly works and the space of knowledge. Graph and text embeddings provide a convenient vector representation of scholarly works based on citations and text. Yet, it is unclear whether their representations are consistent or provide different views of the structure of scie...
Preprint
Full-text available
Research and development investments are key to scientific and economic development and to the well-being of society. Because scientific research demands significant resources, national scientific investment is a crucial driver of scientific production. As scientific production becomes increasingly multinational, it is critically important to study...
Preprint
Full-text available
Narrative is a foundation of human cognition and decision making. Because narratives play a crucial role in societal discourses and spread of misinformation and because of the pervasive use of social media, the narrative dynamics on social media can have profound societal impact. Yet, systematic and computational understanding of online narratives...
Preprint
Full-text available
Recent advances in machine learning research have produced powerful neural graph embedding methods, which learn useful, low-dimensional vector representations of network data. These neural methods for graph embedding excel in graph machine learning tasks and are now widely adopted. However, how and why these methods work -- particularly how network...
Preprint
ChatGPT, the first large language model (LLM) with mass adoption, has demonstrated remarkable performance in numerous natural language tasks. Despite its evident usefulness, evaluating ChatGPT's performance in diverse problem domains remains challenging due to the closed nature of the model and its continuous updates via Reinforcement Learning from...
Preprint
Social contagion is a ubiquitous and fundamental process that drives social changes. Although social contagion arises as a result of cognitive processes and biases, the integration of cognitive mechanisms with the theory of social contagion remains an open challenge. In particular, studies on social phenomena usually assume contagion dynamics to...
Article
Full-text available
Visiting multiple prescribers is a common method for obtaining prescription opioids for nonmedical use and has played an important role in fueling the United States opioid epidemic, leading to increased drug use disorder and overdose. Recent studies show that centrality of the bipartite network formed by prescription ties between patients and presc...
Article
Full-text available
Science is essential to innovation and economic prosperity. Although studies have shown that national scientific development is affected by geographic, historic and economic factors, it remains unclear whether there are universal structures and trajectories of national scientific development that can inform forecasting and policy-making. Here, by e...
Article
On social media, due to complex interactions between users' attention and recommendation algorithms, the visibility of users' posts can be unpredictable and vary wildly, sometimes creating unexpected viral events for 'ordinary' users. How do such events affect users' subsequent behaviors and long-term visibility on the platform? We investigate thes...
Article
What science does, what science could do, and how to make science work? If we want to know the answers to these questions, we need to be able to uncover the mechanisms of science, going beyond metrics that are easily collectible and quantifiable. In this perspective piece, we link metrics to mechanisms by demonstrating how emerging metrics of scien...
Article
Full-text available
Importance: During the pandemic, access to medical care unrelated to COVID-19 was limited because of concerns about viral spread and corresponding policies. It is critical to assess how these conditions affected modes of pain treatment, given the addiction risks of prescription opioids. Objective: To assess the trends in opioid prescription and nonp...
Article
Full-text available
The COVID-19 pandemic is a global crisis that has been testing every society and exposing the critical role of local politics in crisis response. In the United States, there has been a strong partisan divide between the Democratic and Republican parties' narratives about the pandemic, which resulted in polarization of individual behaviors and diverge...
Preprint
Full-text available
Graph embedding maps a graph into a convenient vector-space representation for graph analysis and machine learning applications. Many graph embedding methods hinge on a sampling of context nodes based on random walks. However, random walks can be a biased sampler due to the structural properties of graphs. Most notably, random walks are biased by t...
Preprint
We investigate predictors of anti-Asian hate among Twitter users throughout COVID-19. With the rise of xenophobia and polarization that has accompanied widespread social media usage in many nations, online hate has become a major social issue, attracting many researchers. Here, we apply natural language processing techniques to characterize social...
Article
Full-text available
This quality improvement study assesses the comorbidities associated with COVID-19 diagnostic codes in US health insurance claims.
Article
Every second, the thoughts and feelings of millions of people across the world are recorded in the form of 140-character tweets using Twitter. However, despite the enormous potential presented by this remarkable data source, we still do not have an understanding of the Twitter population itself: Who are the Twitter users? How representative of the...
Article
Full-text available
Framing is a process of emphasizing a certain aspect of an issue over the others, nudging readers or listeners towards different positions on the issue even without making a biased argument. Here, we propose FrameAxis, a method for characterizing documents by identifying the most relevant semantic axes ("microframes") that are overrepresented in th...
Article
Background and aims: Prescription drug seeking (PDS) from multiple prescribers is a primary means of obtaining prescription opioids; however, PDS behavior has likely evolved in response to policy shifts, and there is little agreement about how to operationalize it. We systematically compared the performance of traditional and novel PDS indicators....
Article
Full-text available
Network embedding is a general-purpose machine learning technique that encodes network structure in vector spaces with tunable dimension. Choosing an appropriate embedding dimension – small enough to be efficient and large enough to be effective – is challenging but necessary to generate embeddings applicable to a multitude of tasks. Existing strat...
Article
Full-text available
Effective control of an epidemic relies on the rapid discovery and isolation of infected individuals. Because many infectious diseases spread through interaction, contact tracing is widely used to facilitate case discovery and control. However, what determines the efficacy of contact tracing has not been fully understood. Here we reveal that, compa...
Article
Recent advancements in data science technologies have allowed researchers to utilize large-scale records of human mobility to study various topics from city growth models to tracing outbreaks and analyzing the labor market. In this paper, after introducing recent studies on human mobility using transportation data, we briefly review the existing st...
Preprint
Full-text available
Science is considered essential to innovation and economic prosperity. Understanding how nations build scientific capacity is therefore crucial to promote economic growth and national development. Although studies have shown that national scientific development is affected by geographic, historic, and economic factors, it remains unclear whether th...
Article
Full-text available
Graph embedding techniques, which learn low-dimensional representations of a graph, are achieving state-of-the-art performance in many graph mining tasks. Most existing embedding algorithms assign a single vector to each node, implicitly assuming that a single representation is enough to capture all characteristics of the node. However, across many...
Preprint
Full-text available
The COVID-19 pandemic is a global crisis that has been testing every society and exposing the critical role of local politics in crisis response. In the United States, there has been a strong partisan divide which resulted in polarization of individual behaviors and divergent policy adoption across regions. Here, to better understand such divide, w...
Article
Full-text available
To what extent can we predict the structure of online conversation trees? We present a generative model to predict the size and evolution of threaded conversations on social media by combining machine learning algorithms. The model is evaluated using datasets that span two topical domains (cryptocurrency and cyber-security) and two platforms (Reddi...
Article
Full-text available
Importance: In response to the increase in opioid overdose deaths in the United States, many states recently have implemented supply-controlling and harm-reduction policy measures. To date, an updated policy evaluation that considers the full policy landscape has not been conducted. Objective: To evaluate 6 US state-level drug policies to ascertain...
Preprint
Framing is an indispensable narrative device for news media because even the same facts may lead to conflicting understandings if deliberate framing is employed. Therefore, identifying media framing is a crucial step to understanding how news media influence the public. Framing is, however, difficult to operationalize and detect, and thus tradition...
Preprint
We propose FrameAxis, a method of characterizing the framing of a given text by identifying the most relevant semantic axes ("microframes") defined by antonym word pairs. In contrast to the traditional framing analysis, which has been constrained by a small number of manually annotated general frames, our unsupervised approach provides much more de...
Preprint
We propose a method for extracting hierarchical backbones from a bipartite network. Our method leverages the observation that a hierarchical relationship between two nodes in a bipartite network is often manifested as an asymmetry in the conditional probability of observing the connections to them from the other node set. Our method estimates both...
Article
Full-text available
This paper examines network prominence in a co-prescription network as an indicator of opioid doctor shopping (i.e., fraudulent solicitation of opioids from multiple prescribers). Using longitudinal data from a large commercially insured population, we construct a network where a tie between patients is weighted by the number of shared opioid presc...
Preprint
Simulating and predicting planetary-scale techno-social systems poses heavy computational and modeling challenges. The DARPA SocialSim program set the challenge to model the evolution of GitHub, a large collaborative software-development ecosystem, using massive multi-agent simulations. We describe our best performing models and our agent-based sim...
Article
Full-text available
Groups of firms often achieve a competitive advantage through the formation of geo-industrial clusters. Although many exemplary clusters are the subjects of case studies, systematic approaches to identify and analyze the hierarchical structure of geo-industrial clusters at the global scale are scarce. In this work, we use LinkedIn's employment hist...
Chapter
Simulating and predicting planetary-scale techno-social systems poses heavy computational and modeling challenges. The DARPA SocialSim program set the challenge to model the evolution of GitHub, a large collaborative software-development ecosystem, using massive multi-agent simulations. We describe our best performing models and our agent-based sim...
Article
Full-text available
Clustering is one of the most universal approaches for understanding complex data. A pivotal aspect of clustering analysis is quantitatively comparing clusterings; clustering comparison is the basis for many tasks such as clustering evaluation, consensus clustering, and tracking the temporal evolution of clusters. In particular, the extrinsic evalu...
Preprint
The nature of what people enjoy is not just a central question for the creative industry, it is a driving force of cultural evolution. It is widely believed that successful cultural products balance novelty and conventionality: they provide something familiar but at least somewhat divergent from what has come before, and occupy a satisfying middle...
Article
Full-text available
The neural network is a powerful computing framework that has been exploited by biological evolution and by humans for solving diverse problems. Although the computational capabilities of neural networks are determined by their structure, the current understanding of the relationships between a neural network’s architecture and function is still pr...
Preprint
Groups of firms often achieve a competitive advantage through the formation of geo-industrial clusters. Although many exemplary clusters, such as Hollywood or Silicon Valley, have been frequently studied, systematic approaches to identify and analyze the hierarchical structure of the geo-industrial clusters at the global scale are rare. In this wor...
Chapter
Full-text available
Social media and social networking platforms have flourished with the rapid development of mobile technology and the ubiquitous use of the Internet. As a result, memes, or pieces of information spreading from person to person, can be reshared among users quickly and gain huge popularity. As viral memes have tremendous social and economic impact, de...
Chapter
In this final chapter, we consider the state-of-the-art for spreading in social systems and discuss the future of the field. As part of this reflection, we identify a set of key challenges ahead. The challenges include the following questions: how can we improve the quality, quantity, extent, and accessibility of datasets? How can we extract more i...
Chapter
In this chapter, we apply the theoretical framework introduced in the previous chapter to study how the modular structure of the social network affects the spreading of complex contagion. In particular, we focus on the notion of optimal modularity, that predicts the occurrence of global cascades when the network exhibits just the right amount of mo...
Preprint
Because word semantics can substantially change across communities and contexts, capturing domain-specific word semantics is an important challenge. Here, we propose SEMAXIS, a simple yet powerful framework to characterize word semantics using many semantic axes in word-vector spaces beyond sentiment. We demonstrate that SEMAXIS can capture nuance...
Preprint
Full-text available
Clustering is a central approach for unsupervised learning. After clustering is applied, the most fundamental analysis is to quantitatively compare clusterings. Such comparisons are crucial for the evaluation of clustering methods as well as other tasks such as consensus clustering. It is often argued that, in order to establish a baseline, cluster...
Article
Full-text available
We investigate the predictability of successful memes using their early spreading patterns in the underlying social networks. We propose and analyze a comprehensive set of features and develop an accurate model to predict future popularity of a meme given its early spreading patterns. Our paper provides the first comprehensive comparison of existin...
Article
Full-text available
Food occupies a central position in every culture and it is therefore of great interest to understand the evolution of food culture. The advent of the World Wide Web and online recipe repositories has begun to provide unprecedented opportunities for data-driven, quantitative study of food culture. Here we harness an online database documenting rec...
Article
Full-text available
How does network structure affect diffusion? Recent studies suggest that the answer depends on the type of contagion. Complex contagions, unlike infectious diseases (simple contagions), are affected by social reinforcement and homophily. Hence, the spread within highly clustered communities is enhanced, while diffusion across communities is hampere...
Article
Online social networks exhibit small-world network characteristics, implying that information can spread in the network quickly and widely. This ability to spread information rapidly has led to high expectations for word-of-mouth and viral campaigns in online social networks. However, a recent study of the Flickr social network has shown that popul...
Article
Full-text available
The cultural diversity of culinary practice, as illustrated by the variety of regional cuisines, raises the question of whether there are any general patterns that determine the ingredient combinations used in food today or principles that transcend individual tastes and recipes. We introduce a flavor network that captures the flavor compounds shar...
Article
Full-text available
Plants have unique features that evolved in response to their environments and ecosystems. A full account of the complex cellular networks that underlie plant-specific functions is still missing. We describe a proteome-wide binary protein-protein interaction map for the interactome network of the plant Arabidopsis thaliana containing about 6200 hig...
Article
Many systems, from power grids and the internet, to the brain and society, can be modeled using networks of coupled overlapping modules. The elements of these networks perform individual and collective tasks such as generating and consuming electrical load or transmitting data. We study the robustness of these systems using percolation theory: a ra...
Article
Many complex systems, from power grids and the internet, to the brain and society, can be modeled using modular networks. Modules, densely interconnected groups of elements, often overlap due to elements that belong to multiple modules. The elements and modules of these networks perform individual and collective tasks such as generating and consumi...
Conference Paper
Full-text available
Every second, the thoughts and feelings of millions of people across the world are recorded in the form of 140-character tweets using Twitter. However, despite the enormous potential presented by this remarkable data source, we still do not have an understanding of the Twitter population itself: Who are the Twitter users? How representative of the...
Article
Networks have become a key approach to understanding systems of interacting objects, unifying the study of diverse phenomena including biological organisms and human society. One crucial step when studying the structure and dynamics of networks is to identify communities: groups of related nodes that correspond to functional subunits such as protei...
Article
Full-text available
Social network analysis has long been an untiring topic of sociology. However, until the era of information technology, the availability of data, mainly collected by the traditional method of personal survey, was highly limited and prevented large-scale analysis. Recently, the exploding amount of automatically generated data has completely changed...
Article
Full-text available
User generated content (UGC), now with millions of video producers and consumers, is reshaping the way people watch video and TV. In particular, UGC sites are creating new viewing patterns and social interactions, empowering users to be more creative, and generating new business opportunities. Compared to traditional video-on-demand (VoD) systems,...
Article
Identifying modular network structure is generally a problem of finding the correct community membership of each node in a network. An alternative approach, clustering links, naturally accounts for real world characteristics such as strong community overlap, multi-partite structure, and hierarchical organization. By introducing a pair-wise link sim...
Article
Modular and hierarchical organization are two of the most important organizing principles observed in many complex networks. It has often been assumed that detecting a hierarchy also implies finding modular structure. However, highly overlapping community structure, present in many real networks including social and biological networks, interferes...
Conference Paper
Full-text available
Online social networking services are among the most popular Internet services according to Alexa.com and have become a key feature in many Internet services. Users interact through various features of online social networking services: making friend relationships, sharing their photos, and writing comments. These friend relationships are expected...
Conference Paper
Full-text available
User Generated Content (UGC) is re-shaping the way people watch video and TV, with millions of video producers and consumers. In particular, UGC sites are creating new viewing patterns and social interactions, empowering users to be more creative, and developing new business opportunities. To better understand the impact of UGC systems, we have a...
Article
Full-text available
Social networking services are a fast-growing business on the Internet. However, it is unknown if online relationships and their growth patterns are the same as in real-life social networks. In this paper, we compare the structures of three online social networking services: Cyworld, MySpace, and orkut, each with more than 10 million users, respect...
Article
Full-text available
We study the nonequilibrium phase transition in a model for epidemic spreading on scale-free networks. The model consists of two particle species A and B, and the coupling between them is taken to be asymmetric; A induces B while B suppresses A. This model describes the spreading of an epidemic on networks equipped with a reactive immune system. We...
Article
Full-text available
Today's social networking services have tens of millions of users, and are growing fast. Their sheer size poses a significant challenge in capturing and analyzing their topological characteristics. Snowball sampling is a popular method to crawl and sample network topologies, but requires a high sampling ratio for accurate estimation of certain metr...