Jon Kleinberg's research while affiliated with Cornell University and other places

Publications (378)

Preprint
The advent of machine learning models that surpass human decision-making ability in complex domains has initiated a movement towards building AI systems that interact with humans. Many building blocks are essential for this activity, with a central one being the algorithmic characterization of human behavior. While much of the existing work focuses...
Preprint
An emerging theme in artificial intelligence research is the creation of models to simulate the decisions and behavior of specific people, in domains including game-playing, text generation, and artistic expression. These models go beyond earlier approaches in the way they are tailored to individuals, and the way they are designed for interaction r...
Conference Paper
Opportunities such as higher education can promote intergenerational mobility, leading individuals to achieve levels of socioeconomic status above that of their parents. In this work, which is an extended abstract of a longer paper in the proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, we develop a dynamic mode...
Preprint
Full-text available
Fairness, Accountability, and Transparency (FAccT) for socio-technical systems has been a thriving area of research in recent years. An ACM conference bearing the same name has been the central venue for scholars in this area to come together, provide peer feedback to one another, and publish their work. This reflexive study aims to shed light on F...
Conference Paper
Full-text available
We introduce a random hypergraph model for core-periphery structure. By leveraging our model's sufficient statistics, we develop a novel statistical inference algorithm that is able to scale to large hypergraphs with runtime that is practically linear wrt. the number of nodes in the graph after a preprocessing step that is almost linear in the numb...
Preprint
Full-text available
We introduce a random hypergraph model for core-periphery structure. By leveraging our model's sufficient statistics, we develop a novel statistical inference algorithm that is able to scale to large hypergraphs with runtime that is practically linear wrt. the number of nodes in the graph after a preprocessing step that is almost linear in the numb...
Preprint
Full-text available
This paper considers the Pointer Value Retrieval (PVR) benchmark introduced in [ZRKB21], where a 'reasoning' function acts on a string of digits to produce the label. More generally, the paper considers the learning of logical functions with gradient descent (GD) on neural networks. It is first shown that in order to learn logical functions with gr...
Preprint
Full-text available
We study the problem of designing dynamic intervention policies for minimizing networked defaults in financial networks. Formally, we consider a dynamic version of the celebrated Eisenberg-Noe model of financial network liabilities, and use this to study the design of external intervention policies. Our controller has a fixed resource budget in eac...
Preprint
Full-text available
The tendency for individuals to form social ties with others who are similar to themselves, known as homophily, is one of the most robust sociological principles. Since this phenomenon can lead to patterns of interactions that segregate people along different demographic dimensions, it can also lead to inequalities in access to information, resourc...
Conference Paper
Full-text available
We study the problem of financial assistance (bailouts, stimulus payments, or subsidy allocations) in a network where individuals experience income shocks. These questions are pervasive both in policy domains and in the design of new Web-enabled forms of financial interaction. We build on the financial clearing framework of Eisenberg and Noe that a...
Article
We explore the implications of two central human biases studied in behavioral economics, reference points and loss aversion, in optimal stopping problems. In such problems, people evaluate a sequence of options in one pass, either accepting the option and stopping the search or giving up on the option forever. Here we assume that the best option se...
Preprint
A fundamental task underlying many important optimization problems, from influence maximization to sensor placement to content recommendation, is to select the optimal group of $k$ items from a larger set. Submodularity has been very effective in allowing approximation algorithms for such subset selection problems. However, in several applications,...
Preprint
Online platforms have a wealth of data, run countless experiments and use industrial-scale algorithms to optimize user experience. Despite this, many users seem to regret the time they spend on these platforms. One possible explanation is that incentives are misaligned: platforms are not optimizing for user happiness. We suggest the problem runs de...
Article
Full-text available
Scientific communities confer many forms of credit on their successful members. The motivation provided by these forms of credit helps shaping a community’s collective attention toward different lines of research. The allocation of scientific credit, however, has also been the focus of long-documented pathologies: certain research questions are sai...
Preprint
Interactions involving multiple objects simultaneously are ubiquitous across many domains. The systems these interactions inhabit can be modelled using hypergraphs, a generalization of traditional graphs in which each edge can connect any number of nodes. Analyzing the global and static properties of these hypergraphs has led to a plethora of novel...
Preprint
Full-text available
In many real-world situations, data is distributed across multiple locations and can't be combined for training. Federated learning is a novel distributed learning approach that allows multiple federating agents to jointly learn a model. While this approach might reduce the error each agent experiences, it also raises questions of fairness: to what...
Preprint
Minimizing a sum of simple submodular functions of limited support is a special case of general submodular function minimization that has seen numerous applications in machine learning. We develop fast techniques for instances where components in the sum are cardinality-based, meaning they depend only on the size of the input set. This variant is o...
Preprint
The Friendship Paradox--the principle that ``your friends have more friends than you do''--is a combinatorial fact about degrees in a graph; but given that many Web-based social activities are correlated with a user's degree, this fact has been taken more broadly to suggest the empirical principle that ``your friends are also more active than you a...
Preprint
The successes of deep learning critically rely on the ability of neural networks to output meaningful predictions on unseen data -- generalization. Yet despite its criticality, there remain fundamental open questions on how neural networks generalize. How much do neural networks rely on memorization -- seeing highly similar training examples -- and...
Preprint
Full-text available
We analyze a dataset of retinal images using linear probes: linear regression models trained on some "target" task, using embeddings from a deep convolutional (CNN) model trained on some "source" task as input. We use this method across all possible pairings of 93 tasks in the UK Biobank dataset of retinal images, leading to ~164k different models....
Preprint
Contact tracing is a key tool for managing epidemic diseases like HIV, tuberculosis, and COVID-19. Manual investigations by human contact tracers remain a dominant way in which this is carried out. This process is limited by the number of contact tracers available, who are often overburdened during an outbreak or epidemic. As a result, a crucial de...
Preprint
Homophily -- the tendency of nodes to connect to others of the same type -- is a central issue in the study of networks. Here we take a local view of homophily, defining notions of first-order homophily of a node (its individual tendency to link to similar others) and second-order homophily of a node (the aggregate first-order homophily of its neig...
Article
Full-text available
Homophily—the tendency of nodes to connect to others of the same type—is a central issue in the study of networks. Here we take a local view of homophily, defining notions of first-order homophily of a node (its individual tendency to link to similar others) and second-order homophily of a node (the aggregate first-order homophily of its neighbors)...
Preprint
Full-text available
In light of increasing recent attention to political polarization, understanding how polarization can arise poses an important theoretical question. While more classical models of opinion dynamics seem poorly equipped to study this phenomenon, a recent novel approach by H\k{a}z{\l}a, Jin, Mossel, and Ramnarayan (HJMR) proposes a simple geometric mo...
Preprint
Full-text available
We present a novel model for capturing the behavior of an agent exhibiting sunk-cost bias in a stochastic environment. Agents exhibiting sunk-cost bias take into account the effort they have already spent on an endeavor when they evaluate whether to continue or abandon it. We model planning tasks in which an agent with this type of bias tries to re...
Preprint
Federated learning is a distributed learning paradigm where multiple agents, each only with access to local data, jointly learn a global model. There has recently been an explosion of research aiming not only to improve the accuracy rates of federated learning, but also provide certain guarantees around social good properties such as total error. O...
Preprint
Full-text available
We study the problem of allocating bailouts (stimulus, subsidy allocations) to people participating in a financial network subject to income shocks. We build on the financial clearing framework of Eisenberg and Noe that allows the incorporation of a bailout policy that is based on discrete bailouts motivated by the types of stimulus checks people r...
Preprint
People are often reluctant to sell a house, or shares of stock, below the price at which they originally bought it. While this is generally not consistent with rational utility maximization, it does reflect two strong empirical regularities that are central to the behavioral science of human decision-making: a tendency to evaluate outcomes relative...
Article
Computational social science is more than just large repositories of digital data and the computational methods needed to construct and analyse them. It also represents a convergence of different fields with different ways of thinking about and doing science. The goal of this Perspective is to provide some clarity around how these approaches differ...
Preprint
Finding dense subgraphs of a large graph is a standard problem in graph mining that has been studied extensively both for its theoretical richness and its many practical applications. In this paper we introduce a new family of dense subgraph objectives, parameterized by a single parameter $p$, based on computing generalized means of degree sequence...
Article
As algorithms are increasingly applied to screen applicants for high-stakes decisions in employment, lending, and other domains, concerns have been raised about the effects of algorithmic monoculture, in which many decision-makers all rely on the same algorithm. This concern invokes analogies to agriculture, where a monocultural system runs the ris...
Preprint
Homophily is the seemingly ubiquitous tendency for people to connect with similar others, which is fundamental to how society organizes. Even though many social interactions occur in groups, homophily has traditionally been measured from collections of pairwise interactions involving just two individuals. Here, we develop a framework using hypergra...
Preprint
Full-text available
Many policies allocate harms or benefits that are uncertain in nature: they produce distributions over the population in which individuals have different probabilities of incurring harm or benefit. Comparing different policies thus involves a comparison of their corresponding probability distributions, and we observe that in many instances the poli...
Preprint
In the analysis of large-scale network data, a fundamental operation is the comparison of observed phenomena to the predictions provided by null models: when we find an interesting structure in a family of real networks, it is important to ask whether this structure is also likely to arise in random networks with similar characteristics to the real...
Preprint
Full-text available
Opportunities such as higher education can promote intergenerational mobility, leading individuals to achieve levels of socioeconomic status above that of their parents. We develop a dynamic model for allocating such opportunities in a society that exhibits bottlenecks in mobility; the problem of optimal allocation reflects a trade-off between the...
Preprint
Full-text available
As algorithms are increasingly applied to screen applicants for high-stakes decisions in employment, lending, and other domains, concerns have been raised about the effects of algorithmic monoculture, in which many decision-makers all rely on the same algorithm. This concern invokes analogies to agriculture, where a monocultural system runs the ris...
Article
In this letter, we summarize our recent work examining the incentives produced by algorithmic decision-making. Drawing upon principal-agent models in the mechanism design literature, we construct and analyze a model of strategic behavior under algorithmic evaluation. We characterize which behaviors can be incentivized by any reasonable mechanism, s...
Article
Algorithms are often used to produce decision-making rules that classify or evaluate individuals. When these individuals have incentives to be classified a certain way, they may behave strategically to influence their outcomes. We develop a model for how strategic agents can invest effort in order to change the outcomes they receive, and we give a...
Preprint
A long line of work in social psychology has studied variations in people's susceptibility to persuasion -- the extent to which they are willing to modify their opinions on a topic. This body of literature suggests an interesting perspective on theoretical models of opinion formation by interacting parties in a network: in addition to considering i...
Preprint
Federated learning is a setting where agents, each with access to their own data source, combine models learned from local data to create a global model. If agents are drawing their data from different distributions, though, federated learning might produce a biased global model that is not optimal for each agent. This means that agents face a fund...
Preprint
Even when machine learning systems surpass human ability in a domain, there are many reasons why AI systems that capture human-like behavior would be desirable: humans may want to learn from them, they may need to collaborate with them, or they may expect them to serve as partners in an extended interaction. Motivated by this goal of human-like AI...
Article
Full-text available
Preventing discrimination requires that we have means of detecting it, and this can be enormously difficult when human beings are making the underlying decisions. As applied today, algorithms can increase the risk of discrimination. But as we argue here, algorithms by their nature require a far greater level of specificity than is usually possible...
Preprint
In recent years, hypergraph generalizations of many graph cut problems have been introduced and analyzed as a way to better explore and understand complex systems and datasets characterized by multiway relationships. Recent work has made use of a generalized hypergraph cut function which for a hypergraph $\mathcal{H} = (V,E)$ can be defined by asso...
Preprint
As artificial intelligence becomes increasingly intelligent---in some cases, achieving superhuman performance---there is growing potential for humans to learn from and collaborate with algorithms. However, the ways in which AI systems approach problems are often different from the ways people do, and thus may be uninterpretable and hard to learn fr...
Article
There are widespread concerns that the growing use of machine learning algorithms in important decisions may reproduce and reinforce existing discrimination against legally protected groups. Most of the attention to date on issues of “algorithmic bias” or “algorithmic fairness” has come from computer scientists and machine learning researchers. We...
Article
Poverty and economic hardship are understood to be highly complex and dynamic phenomena. Due to the multi-faceted nature of welfare, assistance programs targeted at alleviating hardship can face challenges, as they often rely on simpler welfare measurements, such as income or wealth, that fail to capture to full complexity of each family's state. H...
Article
Machine learning is often used to produce decision-making rules that classify or evaluate individuals. When these individuals have incentives to be classified a certain way, they may behave strategically to influence their outcomes. We develop a model for how strategic agents can invest effort to change the outcomes they receive, and we give a tigh...
Preprint
We study the connections between network structure, opinion dynamics, and an adversary's power to artificially induce disagreements. We approach these questions by extending models of opinion formation in the social sciences to represent scenarios, familiar from recent events, in which external actors seek to destabilize communities through sophist...
Preprint
There is inherent information captured in the order in which we write words in a list. The orderings of binomials --- lists of two words separated by `and' or `or' --- has been studied for more than a century. These binomials are common across many areas of speech, in both formal and informal text. In the last century, numerous explanations have be...
Preprint
Local graph clustering algorithms are designed to efficiently detect small clusters of nodes that are biased to a localized region of a large graph. Although many techniques have been developed for local clustering in graphs, very few algorithms have been designed to detect local clusters in hypergraphs, which better model complex systems involving...
Preprint
The minimum $s$-$t$ cut problem in graphs is one of the most fundamental problems in combinatorial optimization, and graph cuts underlie algorithms throughout discrete mathematics, theoretical computer science, operations research, and data science. While graphs are a standard model for pairwise relationships, hypergraphs provide the flexibility to...
Preprint
Full-text available
A recent normative turn in computer science has brought concerns about fairness, bias, and accountability to the core of the field. Yet recent scholarship has warned that much of this technical work treats problematic features of the status quo as fixed, and fails to address deeper patterns of injustice and inequality. While acknowledging these cri...
Preprint
We use machine learning to provide a tractable measure of the amount of predictable variation in the data that a theory captures, which we call its "completeness." We apply this measure to three problems: assigning certain equivalents to lotteries, initial play in games, and human generation of random sequences. We discover considerable variation i...
Preprint
Resource allocation problems are a fundamental domain in which to evaluate the fairness properties of algorithms, and the trade-offs between fairness and utilization have a long history in this domain. A recent line of work has considered fairness questions for resource allocation when the demands for the resource are distributed across multiple gr...
Preprint
There has been rapidly growing interest in the use of algorithms for employment assessment, especially as a means to address or mitigate bias in hiring. Yet, to date, little is known about how these methods are being used in practice. How are algorithmic assessments built, validated, and examined for bias? In this work, we document and assess the c...
Conference Paper
Algorithms can be a powerful aid to decision-making - particularly when decisions rely, even implicitly, on predictions [7]. We are already seeing algorithms play this role in domains including hiring, education, lending, medicine, and criminal justice [2, 6, 10]. As is typical in machine learning applications, accuracy is an important measure for...
Conference Paper
Algorithms are often used to produce decision-making rules that classify or evaluate individuals. When these individuals have incentives to be classified a certain way, they may behave strategically to influence their outcomes. We develop a model for how strategic agents can invest effort in order to change the outcomes they receive, and we give a...
Preprint
In various application areas, networked data is collected by measuring interactions involving some specific set of core nodes. This results in a network dataset containing the core nodes along with a potentially much larger set of fringe nodes that all have at least one interaction with a core node. In many settings, this type of data arises for st...
Conference Paper
Data collection often involves the partial measurement of a larger system. A common example arises in collecting network data: we often obtain network datasets by recording all of the interactions among a small set of core nodes, so that we end up with a measurement of the network consisting of these core nodes along with a potentially much larger...
Preprint
In a wide array of areas, algorithms are matching and surpassing the performance of human experts, leading to consideration of the roles of human judgment and algorithmic prediction in these domains. The discussion around these developments, however, has implicitly equated the specific task of prediction with the general task of automation. We argu...
Article
Computer algorithms are increasingly being used to predict people's preferences and make recommendations. Although people frequently encounter these algorithms because they are cheap to scale, we do not know how they compare to human judgment. Here, we compare computer recommender systems to human recommenders in a domain that affords humans many a...
Preprint
With the increasingly varied applications of deep learning, transfer learning has emerged as a critically important technique. However, the central question of how much feature reuse in transfer is the source of benefit remains unanswered. In this paper, we present an in-depth analysis of the effects of transfer, focusing on medical imaging, which...
Preprint
The law forbids discrimination. But the ambiguity of human decision-making often makes it extraordinarily hard for the legal system to know whether anyone has actually discriminated. To understand how algorithms affect discrimination, we must therefore also understand how they affect the problem of detecting discrimination. By one measure, algorith...
Article
Social network research has begun to take advantage of fine-grained communications regarding coordination, decision-making, and knowledge sharing. These studies, however, have not generally analyzed how external events are associated with a social network’s structure and communicative properties. Here, we study how external events are associated wi...
Article
Recent discussion in both the academic literature and the public sphere about classification by algorithms has involved tension between competing notions of what it means for such a classification to be fair to different groups [1, 2, 5-7]. We consider several of the key fairness conditions that lie at the heart of these debates, and discuss recent...
Article
The echoes of power Understanding social interaction within groups is key to analyzing online communities. Most current work focuses on structural properties: who talks to whom, and how such interactions form larger network structures. The interactions themselves, however, generally take place in the form of natural language – either spoken or writ...