
Avinatan Hassidim, Bar Ilan University | BIU · Department of Computer Science
159 Publications
19,239 Reads
7,491 Citations
Publications (159)
In this paper, we investigate whether the privacy mechanism of periodically changing the pseudorandom identities of Bluetooth Low Energy (BLE) beacons is sufficient to ensure privacy. We consider a new natural privacy notion for BLE broadcasting beacons which we call "Timed-sequence-indistinguishability" of beacons. This new privacy definition i...
Robust medical Machine Learning (ML) models have the potential to revolutionize healthcare by accelerating clinical research, improving workflows and outcomes, and producing novel insights or capabilities. Developing such ML models from scratch is cost prohibitive and requires substantial compute, data, and time (e.g., expert labeling). To address...
Two prominent objectives in social choice are utilitarian (maximizing the sum of agents' utilities) and leximin (maximizing the smallest agent's utility, then the second-smallest, etc.). Utilitarianism is typically computationally easier to attain but is generally viewed as less fair. This paper presents a general reduction scheme that, given a uti...
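To make the two objectives concrete, here is a minimal sketch that picks the best outcome from a small finite list of candidate utility profiles. It only illustrates the definitions; it is not the reduction scheme the paper presents, and the function names and the example profiles are assumptions made for illustration.

```python
from typing import Sequence, Tuple

Profile = Tuple[float, ...]  # one utility value per agent


def utilitarian_best(candidates: Sequence[Profile]) -> Profile:
    """Return the profile with the largest sum of utilities."""
    return max(candidates, key=sum)


def leximin_best(candidates: Sequence[Profile]) -> Profile:
    """Return the leximin-optimal profile.

    Sorting each profile in ascending order and comparing the sorted
    tuples lexicographically maximizes the smallest utility first,
    then the second-smallest, and so on.
    """
    return max(candidates, key=lambda profile: tuple(sorted(profile)))


# Hypothetical utility profiles for three agents.
candidates = [(1, 9, 9), (3, 3, 4), (3, 4, 4)]
utilitarian_choice = utilitarian_best(candidates)
leximin_choice = leximin_best(candidates)
```

In this toy instance the utilitarian choice is (1, 9, 9), which has the largest total but leaves one agent with utility 1, while the leximin choice is (3, 4, 4).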
A conversation following an overly predictable pattern is likely boring and uninformative; conversely, if it lacks structure, it is likely nonsensical. The delicate balance between predictability and surprise has been well studied using information theory during speech perception, focusing on how listeners predict upcoming words based on context an...
Despite recent advancements in Large Language Models (LLMs), their performance on tasks involving long contexts remains sub-optimal. In-Context Learning (ICL) with few-shot examples may be an appealing solution to enhance LLM performance in this scenario; however, naively adding ICL examples with long context introduces challenges, including substa...
Reinforcement Learning from Human Feedback (RLHF) has become the standard approach for aligning Large Language Models (LLMs) with human preferences, allowing LLMs to demonstrate remarkable abilities in various tasks. Existing methods work by emulating the preferences at the single decision (turn) level, limiting their capabilities in settings that...
Background: AI models have shown promise in performing many medical imaging tasks. However, our ability to explain what signals these models have learned is severely lacking. Explanations are needed in order to increase the trust of doctors in AI-based models, especially in domains where AI prediction capabilities surpass those of humans. Moreover,...
Contextual embeddings, derived from deep language models (DLMs), provide a continuous vectorial representation of language. This embedding space differs fundamentally from the symbolic representations posited by traditional psycholinguistics. We hypothesize that language areas in the human brain, similar to DLMs, rely on a continuous embedding spac...
Floods are one of the most common natural disasters, with a disproportionate impact in developing countries that often lack dense streamflow gauge networks¹. Accurate and timely warnings are critical for mitigating flood risks², but hydrological simulation models typically must be calibrated to long data records in each watershed. Here we show that...
Leximin is a common approach to multi-objective optimization, frequently employed in fair division applications. In leximin optimization, one first aims to maximize the smallest objective value; subject to this, one maximizes the second-smallest objective; and so on. Often, even the single-objective problem of maximizing the smallest value cannot b...
Forecasting the timing of earthquakes is a long-standing challenge. Moreover, it is still debated how to formulate this problem in a useful manner, or to compare the predictive power of different models. Here, we develop a versatile neural encoder of earthquake catalogs, and apply it to the fundamental problem of earthquake rate prediction, in the...
Floods are one of the most common and impactful natural disasters, with a disproportionate impact in developing countries that often lack dense streamflow monitoring networks. Accurate and timely warnings are critical for mitigating flood risks, but accurate hydrological simulation models typically must be calibrated to long data records in each wa...
Humans effortlessly use the continuous acoustics of speech to communicate rich linguistic meaning during everyday conversations. In this study, we leverage 100 hours (half a million words) of spontaneous open-ended conversations and concurrent high-quality neural activity recorded using electrocorticography (ECoG) to decipher the neural basis of re...
AI models have shown promise in many medical imaging tasks. However, our ability to explain what signals these models have learned is severely lacking. Explanations are needed in order to increase the trust in AI-based models, and could enable novel scientific discovery by uncovering signals in the data that are not yet known to experts. In this pa...
Despite the seeming success of contemporary grounded text generation systems, they often tend to generate factually inconsistent text with respect to their input. This phenomenon is emphasized in tasks like summarization, in which the generated summaries should be corroborated by their source article. In this work, we leverage recent progress on te...
High-quality datasets are essential to support hydrological science and modeling. Several CAMELS (Catchment Attributes and Meteorology for Large-sample Studies) datasets exist for specific countries or regions; however, these datasets lack standardization, which makes global studies difficult. This paper introduces a dataset called Caravan (a series...
“Exposure Notification (EN) Systems”, which have been envisioned by a number of academic and industry groups, are useful in aiding health authorities worldwide to fight the spread of the COVID-19 pandemic via contact tracing. Among these systems, many rely on the BLE-based Google-Apple Exposure Notification (GAEN) API (for iPhones and Android systems). We a...
A streaming algorithm is said to be adversarially robust if its accuracy guarantees are maintained even when the data stream is chosen maliciously, by an adaptive adversary. We establish a connection between adversarial robustness of streaming algorithms and the notion of differential privacy. This connection allows us to design new adversarially...
Google's operational flood forecasting system was developed to provide accurate real-time flood warnings to agencies and the public with a focus on riverine floods in large, gauged rivers. It became operational in 2018 and has since expanded geographically. This forecasting system consists of four subsystems: data validation, stage forecasting, inu...
Despite recent advances in natural language understanding and generation, and decades of research on the development of conversational bots, building automated agents that can carry on rich open-ended conversations with humans "in the wild" remains a formidable challenge. In this work we develop a real-time, open-ended dialogue system that uses rei...
Deep language models (DLMs) provide a novel computational paradigm for how the brain processes natural language. Unlike symbolic, rule-based models from psycholinguistics, DLMs encode words and their context as continuous numerical vectors. These "embeddings" are constructed by a sequence of layered computations to ultimately capture surprisingly s...
High-quality datasets are essential to support hydrological science and modeling. Several CAMELS (Catchment Attributes and Meteorology for Large-sample Studies) datasets exist for specific countries or regions; however, these datasets lack standardization, which makes global studies difficult. This paper introduces a dataset called Caravan (a...
Physicians record their detailed thought-processes about diagnoses and treatments as unstructured text in a section of a clinical note called the assessment and plan. This information is more clinically rich than structured billing codes assigned for an encounter but harder to reliably extract given the complexity of clinical language and documenta...
Grounded text generation systems often generate text that contains factual inconsistencies, hindering their real-world applicability. Automatic factual consistency evaluation may help alleviate this limitation by accelerating evaluation cycles, filtering inconsistent outputs and augmenting training data. While attracting increasing attention, such...
Contextual embeddings, derived from deep language models (DLMs), provide a continuous vectorial representation of language. This embedding space differs fundamentally from the symbolic representations posited by traditional psycholinguistics. Do language areas in the human brain, similar to DLMs, rely on a continuous embedding space to represent la...
Departing from traditional linguistic models, advances in deep learning have resulted in a new type of predictive (autoregressive) deep language models (DLMs). Using a self-supervised next-word prediction task, these models generate appropriate linguistic responses in a given context. In the current study, nine participants listened to a 30-min pod...
Beacons are small devices which are playing an important role in the Internet of Things (IoT), connecting “things” without IP connection to the Internet via Bluetooth Low Energy (BLE) communication. In this paper we present the first private end-to-end encryption protocol called the Eddystone-Ephemeral-ID (Eddystone-EID) protocol. This protocol ena...
The operational flood forecasting system by Google was developed to provide accurate real-time flood warnings to agencies and the public, with a focus on riverine floods in large, gauged rivers. It became operational in 2018 and has since expanded geographically. This forecasting system consists of four subsystems: data validation, stage forecastin...
This paper studies repetitive negotiation over the execution of an exploration process between two self-interested, fully rational agents in a full information environment with side payments. A key aspect of the protocol is that the exploration’s execution may interleave with the negotiation itself, inflicting some degradation on the exploration’s fle...
In this paper, we introduce adversarially robust streaming algorithms for central machine learning and algorithmic tasks, such as regression and clustering, as well as their more general counterparts, subspace embedding, low-rank approximation, and coreset construction. For regression and other numerical linear algebra related tasks, we consider th...
Given the ubiquity of negative campaigning in recent political elections, we find it important to study its properties from a computational perspective. To this end, we present a model where elections can be manipulated by convincing voters to demote specific non-favored candidates, and study its properties in the classic setting of scoring rules....
Image classification models can depend on multiple different semantic attributes of the image. An explanation of the decision of the classifier needs to both discover and visualize these properties. Here we present StylEx, a method for doing this, by training a generative model to specifically explain multiple attributes that underlie classifier de...
Departing from classical rule-based linguistic models, advances in deep learning have led to the development of a new family of self-supervised deep language models (DLMs). These models are trained using a simple self-supervised autoregressive objective, which aims to predict the next word in the context of preceding words in real-life corpora. Aft...
Floods are among the most common and deadly natural disasters in the world, and flood warning systems have been shown to be effective in reducing harm. Yet the majority of the world's vulnerable population does not have access to reliable and actionable warning systems, due to core challenges in scalability, computational costs, and data availabili...
This paper studies the set of stable allocations in college admissions markets where students can attend the same college under different financial terms. The stable deferred acceptance mechanism implicitly allocates funding based on merit. In Hungary, where the centralized mechanism is based on deferred acceptance, an alternate stable algorithm wo...
Classic cake-cutting algorithms enable people with different preferences to divide among them a heterogeneous resource (“cake”) such that the resulting division is fair according to each agent’s individual preferences. However, these algorithms either ignore the geometry of the resource altogether or assume it is one-dimensional. In practice, it is...
We consider the classic problem of $(\epsilon,\delta)$-PAC learning a best arm where the goal is to identify with confidence $1-\delta$ an arm whose mean is an $\epsilon$-approximation to that of the highest mean arm in a multi-armed bandit setting. This problem is one of the most fundamental problems in statistics and learning theory, yet somewhat...
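As a point of reference for the $(\epsilon,\delta)$-PAC definition above, the following sketch implements the naive uniform-sampling baseline, not the algorithm studied in the paper. It assumes rewards lie in $[0, 1]$ and that "$\epsilon$-approximation" is additive; under those assumptions, Hoeffding's inequality with a union bound shows that pulling each of $K$ arms $\lceil (2/\epsilon^2) \ln(2K/\delta) \rceil$ times and returning the empirically best arm suffices. The function names and the Bernoulli test arms are hypothetical.

```python
import math
import random
from typing import Callable


def samples_per_arm(num_arms: int, epsilon: float, delta: float) -> int:
    """Pulls per arm so every empirical mean is within epsilon/2 of its
    true mean with probability at least 1 - delta (Hoeffding + union bound)."""
    return math.ceil((2.0 / epsilon ** 2) * math.log(2.0 * num_arms / delta))


def naive_pac_best_arm(pull: Callable[[int], float], num_arms: int,
                       epsilon: float, delta: float) -> int:
    """Pull every arm the same number of times; return the empirical best."""
    n = samples_per_arm(num_arms, epsilon, delta)
    means = [sum(pull(arm) for _ in range(n)) / n for arm in range(num_arms)]
    return max(range(num_arms), key=means.__getitem__)


# Hypothetical Bernoulli arms; the seed only makes the run reproducible.
rng = random.Random(0)
true_means = [0.2, 0.5, 0.9]


def bernoulli_pull(arm: int) -> float:
    return 1.0 if rng.random() < true_means[arm] else 0.0


best = naive_pac_best_arm(bernoulli_pull, len(true_means), epsilon=0.1, delta=0.05)
```

This baseline uses on the order of $(K/\epsilon^2)\log(K/\delta)$ pulls in total; it is included only to make the guarantee in the definition tangible.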
We study the problem of clustering the vertices of a weighted hypergraph such that on average the vertices of each edge can be covered by a small number of clusters. This problem has many applications, such as for designing medical tests, clustering files on disk servers, and placing network services on servers. The edges of the hypergraph model gr...
Organizations often require agents’ private information to achieve critical goals such as efficiency or revenue maximization, but frequently it is not in the agents’ best interest to reveal this information. Strategy-proof mechanisms give agents incentives to truthfully report their private information. In the context of matching markets, they elim...
The free-form portions of clinical notes are a significant source of information for research, but before they can be used, they must be de-identified to protect patients' privacy. De-identification efforts have focused on known identifier types (names, ages, dates, addresses, IDs, etc.). However, a note can contain residual “Demographic Traits” (...
A streaming algorithm is said to be adversarially robust if its accuracy guarantees are maintained even when the data stream is chosen maliciously, by an adaptive adversary. We establish a connection between adversarial robustness of streaming algorithms and the notion of differential privacy. This connection allows us to design new adversarially r...
We show that many-to-one matching markets with contracts where colleges’ preferences satisfy the hidden substitutes condition of Hatfield and Kominers (2015) may not be embedded, in the sense of Echenique (2012), into a Kelso and Crawford (1982) matching-with-salaries market. Our proof relies on a configuration of preferences that is observed in man...
We study a classic algorithmic problem through the lens of statistical learning. That is, we consider a matching problem where the input graph is sampled from some distribution. This distribution is unknown to the algorithm; however, an additional graph which is sampled from the same distribution is given during a training phase (preprocessing). Mo...
The final step in getting an Israeli MD is performing a year-long internship in one of the hospitals in Israel. Internships are decided upon by a lottery, which is known as the Internship Lottery. In 2014, we redesigned the lottery, replacing it with a more efficient one. This article presents the market, the redesign process, and the new mechanism...
In many practical uses of reinforcement learning (RL) the set of actions available at a given state is a random variable, with realizations governed by an exogenous stochastic process. Somewhat surprisingly, the foundations for such sequential decision processes have been unaddressed. In this work, we formalize and investigate MDPs with stochastic...
Motivated by applications such as stock exchanges and spectrum auctions, there is a growing interest in mechanisms for arranging trade in two-sided markets. However, existing mechanisms are either not truthful, do not guarantee an asymptotically-optimal gain-from-trade, rely on a prior on the traders' valuations, or operate in limited settings such...
We study the problem of controlling linear time-invariant systems with known noisy dynamics and adversarially chosen quadratic losses. We present the first efficient online learning algorithms in this setting that guarantee $O(\sqrt{T})$ regret under mild assumptions, where $T$ is the time horizon. Our algorithms rely on a novel SDP relaxation for...
The classic bribery problem is to find a minimal subset of voters who need to change their vote to make some preferred candidate win. We find an approximate solution for this problem for a broad family of scoring rules (which includes Borda and t-approval), in the following sense: if there is a strategy which requires bribing k voters, we efficientl...
In a seminal paper, McAfee (1992) presented a truthful mechanism for double auctions, attaining asymptotically-optimal gain-from-trade without any prior information on the valuations of the traders. McAfee's mechanism handles single-parametric agents, allowing each seller to sell a single unit and each buyer to buy a single unit. This paper present...
A seminal theorem of Myerson and Satterthwaite (1983) proves that, in a game of bilateral trade between a single buyer and a single seller, no mechanism can be simultaneously individually-rational, budget-balanced, incentive-compatible and socially-efficient. However, the impossibility disappears if the price is fixed exogenously and the social-eff...
We study the problem of coalitional manipulation (where $k$ manipulators try to manipulate an election on $m$ candidates) under general scoring rules, with a focus on the Borda protocol. We do so both in the weighted and unweighted settings. We focus on minimizing the maximum score obtainable by a non-preferred candidate. In the strongest, most g...
Ranking alternatives is a natural way for humans to explain their preferences. It is being used in many settings, such as school choice (NY, Boston), course allocations, and the Israeli medical lottery. In some cases (such as the latter two), several "items" are given to each participant. Without having any information on the underlying cardinal...
Prior to 2014, the admission to Master's and PhD programs in psychology in Israel was a mostly decentralized process. In 2013, in response to concerns about the existing procedure, we proposed to use a mechanism that is both stable and strategy-proof for applicants. The first part of this paper describes how we successfully centralized this market,...
Ranking alternatives is a natural way for humans to explain their preferences. It is being used in many settings, such as school choice (NY, Boston), course allocations, and the Israeli medical lottery. In some cases (such as the latter two), several "items" are given to each participant. Without having any information on the underlying cardinal ut...
We report on the centralization of a two-sided matching-with-contracts market, in which pre-existing choice functions violate the substitutes condition. The ability to accommodate these choice functions was critical for the success of our design. The new mechanism is stable and strategy-proof for applicants. It is well accepted by both sides of the...
Honesty is the best policy in the face of a strategy-proof mechanism: irrespective of others' behavior, the best course of action is to report one's preferences truthfully. We review evidence from different markets in different countries and find that a substantial percentage of participants do not report their true preferences to the strategy-proo...
Often one would like to allocate shared resources in a fair way. A common and well-studied notion of fairness is Max-Min Fairness, where we first maximize the smallest allocation, and subject to that the second smallest, and so on. We consider a networking application where multiple commodities compete over the capacity of a network. In our setting...
We consider the classic problem of fairly dividing a heterogeneous good (“cake”) among several agents with different valuations. Classic cake-cutting procedures either allocate each agent a collection of disconnected pieces, or assume that the cake is a one-dimensional interval. In practice, however, the two-dimensional shape of the allotted pieces...
In a large, possibly infinite population, each subject is colored red with probability $p$, independently of the others. Then, a finite sub-population is selected, possibly as a function of the coloring. The imbalance in the sub-population is defined as the difference between the number of reds in it and $p$ times its size. This paper presents high-p...
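The imbalance defined above can be written down directly. This is a minimal sketch of the definition only, not of the paper's bounds; the function name and the boolean encoding of colors (True for red) are assumptions made for illustration.

```python
from typing import Sequence


def imbalance(is_red: Sequence[bool], p: float) -> float:
    """Number of reds in the selected sub-population minus p times its size."""
    reds = sum(1 for red in is_red if red)
    return reds - p * len(is_red)


# Hypothetical selected sub-population of four subjects, three of them red.
example = imbalance([True, True, False, True], p=0.5)
```

With three reds among four selected subjects and $p = 0.5$, the imbalance is $3 - 0.5 \cdot 4 = 1$.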
We consider the classic problem of envy-free division of a heterogeneous good ("cake") among several agents. It is known that, when the allotted pieces must be connected, the problem cannot be solved by a finite algorithm for three or more agents. The impossibility result, however, assumes that the entire cake must be allocated. In this article, we...
We consider a simple simultaneous first price auction for two identical items in a complete information setting. Our goal is to analyze this setting for a simple, yet highly interesting, AND-OR game, where one agent is single minded and the other is unit demand. We find a mixed equilibrium of this game and show that every other equilibrium admits t...
In a seminal paper, McAfee (1992) presented the first dominant strategy truthful mechanism for double auction. His mechanism attains nearly optimal gain-from-trade when the market is sufficiently large. However, his mechanism may leave money on the table, since the price paid by the buyers may be higher than the price paid to the sellers. This mone...
We consider the problem of fairly dividing a two-dimensional heterogeneous resource among several agents with different preferences. Potential applications include dividing land-estates among heirs, museum space among presenters or space in print and electronic media among advertisers. Classic cake-cutting procedures either consider a one-dimension...
We introduce the notion of local computation mechanism design—designing game-theoretic mechanisms that run in polylogarithmic time and space. Local computation mechanisms reply to each query in polylogarithmic time and space, and the replies to different queries are consistent with the same global feasible solution. When the mechanism employs payme...
A mechanism is said to be strategy-proof if no agent has an incentive to misrepresent her true preferences. This property is considered highly desirable for mechanisms that are used in real-life markets. And indeed, many of the great success stories of market design employ strategy-proof mechanisms, such as the second-price sealed-bid auction (Vick...
We consider the class of valuations on indivisible items called gross-substitute (GS). This class was introduced by Kelso and Crawford (1982) and is widely used in studies of markets with indivisibilities. GS is a condition on the demand-flow in a specific scenario: some items become more expensive while other items retain their price. We prove tha...