
Yehuda Afek - Tel Aviv University
Publications: 177
Reads: 14,203
Citations: 5,132
Publications (177)
We present a novel yet simple and comprehensive DNS cache POisoning Prevention System (POPS), designed to integrate as a module in Intrusion Prevention Systems (IPS). POPS addresses statistical DNS poisoning attacks, including those documented from 2002 to the present, and offers robust protection against similar future threats. It consists of two...
This paper presents a new localhost browser-based vulnerability and a corresponding attack that opens the door to new attacks on private networks and local devices. We show that this new vulnerability may put hundreds of millions of internet users and their IoT devices at risk. We demonstrate the viability of the attack on a real product, "Folding@Ho...
This paper presents a new localhost browser-based vulnerability and a corresponding attack that opens the door to new attacks on private networks and local devices. We show that this new vulnerability may put hundreds of millions of internet users and their IoT devices at risk. Following the attack presentation, we suggest three new protection mechan...
The Domain Name System (DNS) infrastructure, one of the most critical systems the Internet depends on, has recently been the target of various DDoS and other cyber-attacks, e.g., the notorious Mirai botnet. While these attacks can be destructive to both recursive and authoritative DNS servers, little is known about how recursive resolvers operate under su...
A new scalable ISP-level system architecture to secure and protect all IoT devices in a large number of homes is presented. The system is based on whitelisting, as in the Manufacturer Usage Description (MUD) framework, implemented as a VNF. Unlike common MUD suggestions that place the whitelist application at the home/enterprise network, our approa...
We think of a tournament $T=([n], E)$ as a communication network where in each round of communication processor $P_i$ sends its information to $P_j$, for every directed edge $ij \in E(T)$. By Landau's theorem (1953) there is a King in $T$, i.e., a processor whose initial input reaches every other processor in two rounds or less. Namely, a processor...
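A classical proof of Landau's theorem shows that any processor of maximum out-degree (score) is a King. The minimal Python sketch below uses that textbook argument, not anything specific to the paper: it builds a random tournament, picks a maximum-score vertex, and simulates the flooding rounds described above to confirm that its input reaches every processor within two rounds. The helper names and the adjacency-set representation are illustrative assumptions.

```python
import random
from itertools import combinations


def random_tournament(n, seed=0):
    """Out-neighbour sets of a random tournament on processors 0..n-1."""
    rng = random.Random(seed)
    out = {v: set() for v in range(n)}
    for i, j in combinations(range(n), 2):
        # Each pair of processors gets exactly one directed edge.
        if rng.random() < 0.5:
            out[i].add(j)
        else:
            out[j].add(i)
    return out


def max_score_vertex(out):
    """Landau's argument: a vertex with maximum out-degree is a King."""
    return max(out, key=lambda v: len(out[v]))


def rounds_to_reach_all(out, source):
    """Flood from source: each round, every informed P_i sends to every P_j with edge ij."""
    informed, rounds = {source}, 0
    while len(informed) < len(out):
        reached = {j for i in informed for j in out[i]}
        if reached <= informed:
            return None  # the source's input never reaches some processor
        informed |= reached
        rounds += 1
    return rounds


if __name__ == "__main__":
    t = random_tournament(9, seed=42)
    king = max_score_vertex(t)
    print(king, rounds_to_reach_all(t, king))
```

The argument behind `max_score_vertex`: if some processor u were not reachable from a maximum-score vertex v within two rounds, u would beat v and every out-neighbour of v, giving u a strictly larger score.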
In this paper we present three attacks on private internal networks behind a NAT and a corresponding new protection mechanism, Internal Network Policy, to mitigate a wide range of attacks that penetrate internal networks behind a NAT. In the attack scenario, a victim is tricked into visiting the attacker's website, which contains a malicious script that...
We present a basic tool for zero-day attack signature extraction. Given two large sets of messages, P, the messages captured in the network at peacetime (i.e., mostly legitimate traffic), and A, the messages captured during attack time (i.e., containing many attack messages), we present a tool for extracting a set S of strings that are frequently found...
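As a rough illustration of the P/A contrast (not the extraction algorithm of the paper), the sketch below computes, for each fixed-length substring, the fraction of messages in each set that contain it, and keeps substrings that are common in A and rare in P. The substring length `n` and the thresholds `alpha` and `beta` are hypothetical parameters; a real tool would also have to handle variable-length strings and overlapping candidates, which this naive baseline does not.

```python
from collections import Counter


def ngram_message_freq(messages, n):
    """Fraction of messages containing each length-n substring (counted once per message)."""
    counts = Counter()
    for m in messages:
        counts.update({m[i:i + n] for i in range(len(m) - n + 1)})
    total = max(len(messages), 1)
    return {g: c / total for g, c in counts.items()}


def extract_signatures(P, A, n=4, alpha=0.3, beta=0.01):
    """n-grams present in at least alpha of attack messages and at most beta of peacetime ones."""
    fa = ngram_message_freq(A, n)
    fp = ngram_message_freq(P, n)
    return sorted(g for g, f in fa.items() if f >= alpha and fp.get(g, 0.0) <= beta)


if __name__ == "__main__":
    peace = ["GET /index.html", "GET /about.html", "POST /login"]
    attack = ["GET /index.html", "GET /x?cmd=EVIL1", "GET /y?cmd=EVIL2", "GET /z?cmd=EVIL3"]
    print(extract_signatures(peace, attack, n=4, alpha=0.5, beta=0.0))
```

Substrings such as `"GET "` are frequent in both sets and are filtered out by the peacetime threshold, which is the point of contrasting A with P.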
Random Subdomain DDoS attacks on the Domain Name System (DNS) infrastructure are becoming a popular vector in recent attacks (e.g., the recent Mirai attack on Dyn). In these attacks, many queries are sent for a single or a few victim domains, yet they include highly varying, randomly generated non-existent subdomains.
Motivated by these attacks, we desig...
Efficient algorithms and techniques to detect and identify large flows in a high-throughput traffic stream in the SDN match-and-action model are presented. This is in contrast to previous work that either deviated from the match-and-action model by requiring additional switch-level capabilities or did not exploit the SDN data plane. Our constructio...
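For context, a standard software baseline for large-flow (heavy hitter) detection is the Misra-Gries summary, which uses k−1 counters and is guaranteed to retain every flow that accounts for more than a 1/k fraction of the packets. The sketch below shows only that classic algorithm; it is not the SDN match-and-action construction the paper proposes, and flow identifiers are assumed to be any hashable key, such as a 5-tuple.

```python
def misra_gries(stream, k):
    """Misra-Gries summary with at most k-1 counters.

    Every item occurring more than len(stream)/k times is guaranteed to be
    among the returned keys; each count underestimates by at most len(stream)/k.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all (implicitly discarding this item too).
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    packets = ["flowA"] * 6 + ["flowB"] * 3 + ["x", "y", "z"]
    print(misra_gries(packets, k=3))
```

Each decrement step removes k packets' worth of count (k−1 counters plus the discarded arrival), which is where the len(stream)/k error bound comes from.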
Motivated by a new type of randomized Distributed Denial of Service (DDoS) attack on the Domain Name System (DNS), we develop novel and efficient distinct heavy hitters algorithms and build an attack identification system that uses our algorithms. Heavy hitter detection in streams is a fundamental problem with many applications, including...
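A distinct heavy hitter here is a key (for example a victim domain) associated with an unusually large number of distinct sub-keys (random subdomains), no matter how often each sub-key repeats. The exact, memory-hungry baseline below only makes that definition concrete; doing it efficiently in small space, as the paper does, is not attempted. Taking the last two labels as the parent domain is a simplifying assumption (it mishandles suffixes such as co.uk), and `threshold` is a hypothetical parameter.

```python
from collections import defaultdict


def split_query(qname):
    """Hypothetical split: parent = last two labels, sub-key = everything before them.
    This ignores public suffixes such as co.uk and is for illustration only."""
    labels = qname.rstrip(".").lower().split(".")
    return ".".join(labels[-2:]), ".".join(labels[:-2])


def distinct_heavy_hitters(queries, threshold):
    """Exact baseline: parent domains seen with at least `threshold` distinct subdomains.
    Memory grows with the number of distinct names, unlike a small-space sketch."""
    subdomains = defaultdict(set)
    for q in queries:
        parent, sub = split_query(q)
        subdomains[parent].add(sub)
    return {p: len(s) for p, s in subdomains.items() if len(s) >= threshold}


if __name__ == "__main__":
    qs = [f"r{i}.victim.example" for i in range(100)] + ["www.popular.example"] * 1000
    print(distinct_heavy_hitters(qs, threshold=50))
```

In this example the popular name is queried ten times more often but always with the same subdomain, so only `victim.example` (100 distinct subdomains) is reported; a plain heavy hitters count would have flagged the wrong domain.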
Hardware lock-elision (HLE) introduces concurrency into legacy lock-based code by optimistically executing critical sections in a fast-path as hardware transactions. Its main limitation is that in case of repeated aborts, it reverts to a fallback-path that acquires a serial lock. This fallback-path lacks hardware-software concurrency, because all f...
Configuring range-based packet classification rules in network switches is crucial to all core network functions, such as firewalls and routing. However, OpenFlow, the leading management protocol for SDN switches, lacks an interface to configure range rules directly and provides only mask-based rules, named flow entries. In this work we pres...
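Because OpenFlow offers only value/mask matches, a range must be expressed as one or more masked flow entries. The textbook approach is prefix expansion, which greedily covers [lo, hi] with aligned power-of-two blocks and needs at most 2w−2 entries for a w-bit field. The sketch below shows only that standard baseline for comparison; it is not the encoding presented in the paper, and the function name is illustrative.

```python
def range_to_masks(lo, hi, width):
    """Cover the integer range [lo, hi] of a width-bit field with (value, mask) entries.

    A key x matches entry (v, m) when x & m == v. Uses greedy prefix expansion.
    """
    if not 0 <= lo <= hi < (1 << width):
        raise ValueError("range must lie inside the field")
    full = (1 << width) - 1
    entries = []
    while lo <= hi:
        # Largest power-of-two block aligned at lo (lowest set bit of lo, or the whole field if lo == 0)...
        size = lo & -lo if lo else 1 << width
        # ...shrunk until it no longer overshoots hi.
        while size > hi - lo + 1:
            size >>= 1
        entries.append((lo, full ^ (size - 1)))
        lo += size
    return entries


if __name__ == "__main__":
    for value, mask in range_to_masks(3, 10, 4):
        print(f"{value:04b}/{mask:04b}")
```

For example, `range_to_masks(3, 10, 4)` yields four entries, `(3, 0b1111)`, `(4, 0b1100)`, `(8, 0b1110)` and `(10, 0b1111)`, while the worst case `[1, 254]` on an 8-bit field needs 14 entries, which is the blow-up that motivates better range encodings.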
This paper introduces a temporally bounded total store ordering (TBTSO) memory model, and shows that it enables nonblocking fence-free solutions to asymmetric synchronization problems, such as those arising in memory reclamation and biased locking.
TBTSO strengthens the TSO memory model by bounding the time it takes a store to drain from the store...
A method for processing communication traffic includes receiving an incoming stream of compressed data conveyed by a sequence of data packets, each containing a respective portion of the compressed data. The respective portion of the compressed data contained in the first packet is stored in a buffer, having a predefined buffer size. Upon receiving...
A non-transactional load (NTL) is a load instruction that is invisible to the transactional system, even if executed within a transaction. It ignores, and is supposed to be ignored by, other concurrent transactions; thus an NTL does not introduce any conflicts. The potential benefits of NTL and the issues in introducing it are analyzed and discussed....
A recursive and fast construction of an n-element priority queue from exponentially smaller hardware priority queues and size-n RAM is presented. All priority queue implementations to date require either O(log n) instructions per operation, exponential (with key size) space, or expensive special hardware whose cost and latency dramaticall...
Work stealing is the method of choice for load balancing in task parallel programming languages and frameworks. Yet despite considerable effort invested in optimizing work stealing task queues, existing algorithms issue a costly memory fence when removing a task, and these fences are believed to be necessary for correctness.
This paper refutes this...
We present a basic tool for zero-day attack signature extraction. Given two large sets of messages, P, the messages captured in the network at peacetime (i.e., mostly legitimate traffic), and A, the messages captured during attack time (i.e., containing many attack messages), we present a tool for extracting a set S of strings that are frequently found in A and not...
A recursive and fast construction of an n-element priority queue from exponentially smaller hardware priority queues and size-n RAM is presented. All priority queue implementations to date require either O(log n) instructions per operation, exponential (with key size) space, or expensive special hardware whose cost and latency dramatically incre...
Conventional wisdom in designing concurrent data structures is to use the most powerful synchronization primitive, namely compare-and-swap (CAS), and to avoid contended hot spots. In building concurrent FIFO queues, this reasoning has led researchers to propose combining-based concurrent queues.
This paper takes a different approach, showing how to...
We present a simple yet effective technique for improving performance of lock-based code using the hardware lock elision (HLE) feature in Intel's upcoming Haswell processor.
We also describe how to extend Haswell's HLE mechanism to achieve a similar effect to our lock elision scheme entirely in hardware.
This paper takes advantage of the emerging multi-core computer architecture to design a general framework for mitigating network-based complexity attacks. In complexity attacks, an attacker carefully crafts "heavy" messages (or packets) such that each heavy message consumes substantially more resources than a normal message. Then, it sends a suffic...
Read-write locks are one of the most prevalent lock forms in concurrent applications because they allow read accesses to locked code to proceed in parallel. However, they do not offer any parallelism between reads and writes.
This paper introduces pessimistic lock-elision (PLE), a new approach for non-speculatively replacing read-write locks with p...
We present the CB tree, a counting-based self-adjusting binary search tree in which, as in splay trees, more frequently accessed items move closer to the root. In a sequential execution, after m operations of which c(v) access item v, an access of v traverses a path of length O(1 + log(m/c(v))) while doing few if any rotations. Unlike the original splay...
In the Musical Chairs game $MC(n,m)$, a team of $n$ players plays against an adversarial scheduler. The scheduler wins if the game proceeds indefinitely, while termination after a finite number of rounds is declared a win for the team. At each round of the game, each player occupies one of the $m$ available chairs. Termination...
The Java™ developers kit requires a size() operation for all objects, tracking the number of elements in the object. Unfortunately, the best known solution, available in the Java concurrency package, has a blocking concurrent implementation that does not scale. This paper presents a highly scalable wait-free implementation of a concurrent size()...
We consider the problem of computing a maximal independent set (MIS) in an extremely harsh broadcast model that relies only on carrier sensing. The model consists of an anonymous broadcast network in which nodes have no knowledge about the topology of the network or even an upper bound on its size. Furthermore, it is assumed that an adversary choos...
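For reference, a maximal independent set is a set of nodes, no two adjacent, to which no further node can be added. The sketch below computes one with a sequential simulation of a Luby-style randomized algorithm: each round, every remaining node draws a random value and joins if it beats all of its remaining neighbours. It assumes full knowledge of neighbours and their values, so it deliberately ignores the anonymous, carrier-sensing-only model studied in the paper; it only illustrates the object being computed.

```python
import random


def luby_mis(adj, seed=0):
    """Luby-style randomized MIS, simulated sequentially.

    adj: dict mapping each node to the set of its neighbours (undirected graph).
    """
    rng = random.Random(seed)
    remaining = set(adj)
    mis = set()
    while remaining:
        # Every remaining node draws a fresh random value this round.
        r = {v: rng.random() for v in remaining}
        # A node joins if its value is smaller than those of all its remaining neighbours.
        winners = {v for v in remaining
                   if all(r[v] < r[u] for u in adj[v] if u in remaining)}
        mis |= winners
        # Winners and their neighbours drop out of later rounds.
        removed = set(winners)
        for v in winners:
            removed |= adj[v]
        remaining -= removed
    return mis


if __name__ == "__main__":
    ring = {i: {(i - 1) % 8, (i + 1) % 8} for i in range(8)}
    print(sorted(luby_mis(ring, seed=1)))
```

Two adjacent nodes can never both hold the strict local minimum, which gives independence; a node leaves only when it or a neighbour joins, which gives maximality.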
In this paper we focus on the process of deep packet inspection of compressed web traffic. The major limiting factor that compression imposes on this process is the high memory requirement of 32 KB per connection. This leads to requirements of hundreds of megabytes to gigabytes of main memory in a multi-connection setting. We introduce new...
A synchronous message passing complete network with an adversary that may purge messages is used to precisely model tasks that are read-write wait-free computable.
In the past, adversaries that reduce the computational power of a system as they purge messages were studied in the context of their ability to foil consensus. This paper considers the o...
The Java™ developers kit requires a size() operation for all objects, tracking the number of elements in the object. Unfortunately, the best known solution, available in the Java concurrency package, has a blocking concurrent implementation that does not scale. This paper presents a highly scalable wait-free implementation of a concu...
It is well known that guaranteeing program consistency when accessing shared data comes at the price of degraded performance and scalability.
This paper initiates the investigation of consistency oblivious programming (COP). In COP, sections of concurrent code that meet certain criteria are executed without checking for consistency. However, checkp...
Poor placement of data blocks in memory may negatively impact application performance because of an increase in the cache conflict miss rate [18]. For dynamically allocated structures this placement is typically determined by the memory allocator. Cache index-oblivious allocators may inadvertently place blocks on a restricted fraction of the availa...
In an asymmetric rendezvous system, such as an unfair synchronous queue or an elimination array, threads of two types, consumers and producers, show up and are matched, each with a unique thread of the other type. Here we present a new highly scalable, high-throughput asymmetric rendezvous system that outperforms prior synchronous queue and elimin...
We consider the problem of computing a maximal independent set (MIS) in an extremely harsh broadcast model that relies only on carrier sensing. The model consists of an anonymous broadcast network in which nodes have no knowledge about the topology of the network or even an upper bound on its size. Furthermore, it is assumed that nodes wake up asyn...
We introduce oblivious protocols, a new framework for distributed computation with limited communication. Within this model we consider the musical chairs task MC(n,m), involving n players (processors) and m chairs. Initially, players occupy arbitrary chairs. Two players are in conflict if they both occupy the same chair. The task terminates when t...
Humans are very good at optimizing solutions for specific problems. Biological processes, on the other hand, have evolved to handle multiple constrained distributed environments and so they are robust and adaptable. Inspired by observations made in a biological system, we have recently presented a simple new randomized distributed MIS algorithm \cit...
We consider the power of objects in the unbounded concurrency shared memory model, where there is an infinite set of processes and the number of processes active concurrently may increase without bound. By studying this model we obtain new results and observations that are relevant and meaningful to the standard bounded concurrency model. First we...
Lock-based software transactional memory algorithms do not perform well in workloads with a high rate of context switches, caused, for example, by scheduling events or page faults. This occurs because threads that are switched out by the operating system while holding locks block other threads from progressing, causing their transactions to ab...
Compressing web traffic using standard GZIP is becoming both popular and challenging due to the huge increase in wireless web devices, where bandwidth is limited. Security and other content-based networking devices are required to decompress the traffic of tens of thousands of concurrent connections in order to inspect the content for different signat...
Computational and biological systems are often distributed so that processors (cells) jointly solve a task, without any of them receiving all inputs or observing all outputs. Maximal independent set (MIS) selection is a fundamental distributed computing procedure that seeks to elect a set of local leaders in a network. A variant of this problem is...
Linearizability, the key correctness condition that most optimized concurrent object implementations comply with, imposes tight synchronization between the object's concurrent operations. This tight synchronization usually comes with a performance and scalability price. Yet, these implementations are often employed in an environment where a more rela...
Working on shared mutable data requires synchronization through barriers, locks or transactional memory mechanisms. To avoid this overhead a thread may privatize part of the data and work on it locally. By privatizing a data item a thread is guaranteed that it is the only one accessing this data, i.e., that it accesses the data item in exclusion.
T...
Software Transactional Memory (STM) compilers commonly instrument memory accesses by transforming them into calls to STM library functions. Done naïvely, this instrumentation imposes a large overhead, slowing down the transaction execution. Many compiler optimizations have been proposed in an attempt to lower this overhead. In this paper we attempt...
Many linearizable and optimized concurrent algorithms are available for known algorithms and data structures, such as Queue, Tree, Stack, Counter and HashTable. However, these implementations are sometimes used in a more relaxed environment, as part of a larger design pattern where relaxed linearizability suffices rather than a strict one...
Producer-consumer pools, that is, collections of unordered objects or tasks, are a fundamental element of modern multiprocessor software and a target of extensive research and development. For example, there are three common ways to implement such pools in the Java JDK6.0: the SynchronousQueue, the LinkedBlockingQueue, and the ConcurrentLinkedQueue...
We present view transactions, a model for relaxed consistency checks in software transactional memory (STM). View transactions always operate on a consistent snapshot of memory but may commit in a different snapshot. They are therefore simpler to reason about, provide opacity and maintain composability. In addition, view transactions avoid many of...
This paper introduces and investigates the k-simultaneous consensus problem: each process participates at the same time in k independent consensus instances until it decides in any one of them. Two results are presented. The first shows that the k-simultaneous consensus problem and the k-set agreement problem are wait-free equivalent in read/write...
We address two problems, the g-tight group renaming task and what we call the safe-consensus task, and show the relations between them. We show that any g-tight group renaming task, the first problem, implements g-process consensus. We show this by introducing an intermediate task, the safe-consensus task, the second problem, and showing that g-tigh...
This paper explores the power of failure detectors in read write shared memory systems with n processes whose names are drawn from the set {1...m}, m ≥ 2n−1. We do so by making an additional assumption, name obliviousness, on top of the three failure detector assumptions introduced by Zieliński. We present name non-oblivious failure detectors that a...
We study the group renaming task, which is a natural generalization of the renaming task. An instance of this task consists of n processors, partitioned into m groups, each of at most g processors. Each processor knows the name of its group, which is in { 1, ..., M }. The task of each processor is to choose a new name for its group such that proces...
This paper extends Common2, the family of objects that implement and are wait-free implementable from 2 consensus objects, in two ways: First, the stack object is shown to be in the family, refuting a conjecture to the contrary [6]. Second, Common2 is investigated in the unbounded concurrency model, whereas until now it was considered only in an n-...
Deterministic collect algorithms are presented that are adaptive to total contention and are efficient with respect to both the number of registers used and the step complexity. One of them has optimal O(k) step and O(n) space complexities, but assumes that processes’ identifiers are in O(n), where n is the total number of processes in the system a...
We address the problem of solving a task $T=(T_1,\ldots,T_m)$ (called $(m,1)$-BG), in which a processor returns in an arbitrary one of $m$ simultaneous consensus subtasks $T_1,\ldots,T_m$. Processor $p_i$ submits to $T$ an input vector of proposals $(prop_{i,1},\ldots,prop_{i,m})$, one entry per subtask, and outputs, from just one subtask $\ell$, a pair $(\ell, prop_{j,\ell})$ for some $j$...
What characteristics of an object determine its consensus number? Here we analyze how the consensus power of various objects changes without changing their functionality, but by placing certain restrictions on the object usage. For example, it is shown that the consensus number of either a bounded-use queue or stack is 3 while the consensus number o...
Common2, the family of objects that implement and are wait-free implementable from 2 consensus objects, is extended here in two ways: First, the stack object, which was conjectured not to be in the family, is added to it. Second, Common2 is investigated in the unbounded concurrency model, whereas until now it was considered onl...
A simple, general and optimal procedure to adapt algorithms designed for fixed topology networks to run on a network with dynamically changing topology is presented. The communication and time complexities of the procedure, per topological change, are independent of the number of topological changes and are linearly bounded by the size of the subne...
This paper presents an economical, randomized, wait-free construction of an n-process test-and-set bit from read write registers. The test-and-set shared object has two atomic operations, test&set, which atomically reads the bit and sets its value to 1, and the reset operation that resets the bit to 0.
We identify two new complexity measures by whi...
Deterministic collect algorithms that are adaptive to total contention and efficient in both space and step complexity are presented. One of them has optimal O(k) step and O(n) space complexities, but restricts the size of process identifiers to O(n), where n is the total number of processes in the system and k is the total contention, the total number of processes ac...
Labovitz et al. (2001) and Labovitz et al. (2000) noticed that sometimes it takes border gateway protocol (BGP) a substantial amount of time and messages to converge and stabilize following the failure of some node in the Internet. In this paper, we suggest a minor modification to BGP that eliminates the problem pointed out and substantially reduce...
In (Ref.1), (Ref.2) it was noticed that sometimes it takes BGP a substantial amount of time and messages to converge and stabilize following the failure of some node in the Internet. In this paper we suggest a minor modification to BGP that eliminates the problem pointed out and substantially reduces the convergence time and communication complexit...
Layered communication protocols frequently implement a FIFO message facility on top of an unreliable non-FIFO service such as that provided by a packet-switching network. This paper investigates the possibility of implementing a reliable message layer on top of an underlying layer that can lose packets and deliver them out of order, with the additi...
The notion of Internet Policy Atoms has been recently introduced in [1], [2] as groups of prefixes sharing a common BGP AS path at any Internet backbone router. In this paper we further research these 'Atoms'. First we offer a new method for computing the Internet policy atoms, and use the RIPE RIS database [6] to derive their structure. Second, we...
this paper. Motivated by their first work, Moir and Anderson developed renaming algorithms in the read/write model for the case where such a bound on the maximum number of processes is known in advance. This led to a sequence of works on the renaming problem in this model [MA95, MG96, BGHM95] that led to a long-lived (2K−1)-renaming algorithm with O(K )...
Though it is common practice to treat synchronization primitives for multiprocessors as abstract data types, they are in reality machine instructions on registers. A crucial theoretical question with practical implications is the relationship between the size of the register and its computational power. We wish to study this question and choose as...
Long-lived and adaptive implementations of mutual exclusion and renaming in the read/write shared memory model are presented. An implementation of a task is adaptive if the step complexity of any operation in the implementation is a function of the number of processes that take steps concurrently with the operation. The renaming algorithm assigns a...
A new general theory about restoration of network paths is first introduced. The theory pertains to restoration of shortest paths in a network following failure, e.g., we prove that a shortest path in a network after removing k edges is the concatenation of at most k+1 shortest paths in the original network. The theory is then combined with efficie...
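For the single-failure case (k = 1) on unweighted, undirected graphs, the statement can be checked directly: remove one edge, take a BFS shortest path in the remaining graph, and greedily cut it into maximal pieces that are shortest paths in the original graph. Because every sub-path of a shortest path is itself a shortest path, the greedy cut uses as few pieces as possible, so for this case it should never need more than two. The sketch below is only such an empirical check on a small grid; it is not the restoration algorithm of the paper, and the adjacency-dict representation and helper names are assumptions.

```python
from collections import deque


def bfs(adj, src):
    """Hop distances and BFS-tree parents from src (unweighted, undirected)."""
    dist, parent = {src: 0}, {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v], parent[v] = dist[u] + 1, u
                queue.append(v)
    return dist, parent


def path_to(parent, t):
    """Walk BFS parents back from t to the source."""
    path = []
    while t is not None:
        path.append(t)
        t = parent[t]
    return path[::-1]


def remove_edges(adj, edges):
    """Copy of adj with the given undirected edges deleted."""
    g = {u: set(vs) for u, vs in adj.items()}
    for u, v in edges:
        g[u].discard(v)
        g[v].discard(u)
    return g


def pieces_in_original(adj, path):
    """Greedily cut path into maximal segments that are shortest paths in adj."""
    pieces, start = 0, 0
    while start < len(path) - 1:
        dist, _ = bfs(adj, path[start])
        end = start + 1
        # path[start..end+1] is a shortest path in adj iff its length equals the distance.
        while end + 1 < len(path) and dist.get(path[end + 1]) == end + 1 - start:
            end += 1
        pieces += 1
        start = end
    return pieces


if __name__ == "__main__":
    n = 4
    grid = {(r, c): {(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                     if 0 <= r + dr < n and 0 <= c + dc < n}
            for r in range(n) for c in range(n)}
    g2 = remove_edges(grid, [((1, 1), (1, 2))])
    _, par = bfs(g2, (1, 0))
    p = path_to(par, (1, 3))
    print(p, pieces_in_original(grid, p))
```

Checking larger k in the same way would require care, since the greedy split of an arbitrary replacement path is only guaranteed to be minimal, not to match the bound stated for the path the paper constructs.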
A new paradigm for the design of self-stabilizing distributed algorithms, called local detection, is introduced. The essence of the paradigm is in defining a local condition based on the state of a processor and its immediate neighborhood such that the system is in a globally legal state if and only if the local condition is satisfied at all the no...
A new general theory about restoration of network paths is first introduced. The theory pertains to restoration of shortest paths in a network following failure, e.g., we prove that a shortest path in a network after removing k edges is the concatenation of at most k + 1 shortest paths in the original network. The theory is then combined with effici...
This paper presents Phantom, a simple constant space algorithm for rate-based flow control. As shown by our simulations, it converges fast to a fair rate allocation while generating a moderate queue length. While our approach can be easily implemented in ATM switches for managing available bit rate (ABR) traffic, it is also suitable for flow contro...
A new general theory about restoration of network paths is first introduced. The theory pertains to restoration of shortest paths in a network following failure, e.g., we prove that a shortest path in a network after removing k edges is the concatenation of at most k + 1 shortest paths in the original network. The theory is then combined with efficient...
In this paper we prove: For any constant d there is a large enough n such that there is no long-lived adaptive implementation of collect or renaming in the read write model with n processes that uses d or fewer MWMR registers. In other words, there is no implementation of a long-lived and adaptive renaming or collect object in the atomic read/write m...
Long-lived and adaptive to point contention implementations of snapshot and immediate snapshot objects in the read/write shared-memory model are presented. In [2] we presented adaptive algorithms for mutual exclusion, collect and snapshot. However, the collect and snapshot algorithms were adaptive only when the number of local primitive operations...
Trainet, a new scheme to extend MPLS (multi-protocol label switching), is presented. The scheme works much like the subway system in a large metropolitan area. Each (unidirectional) subway line corresponds to a labeled path, and a route in the network is defined by either a pair 〈label, count-value〉, where count specifies how many hops a packet stil...
Several adaptive algorithms are automatically generated via a simple transformation from single-writer multi-reader algorithms, using the O(k) adaptive collect algorithm of Attiya and Fouren [AF98a]. Among these algorithms are an adaptive snapshot algorithm with step complexity O(k^2), and three algorithms solving (2k−1)-renaming, but with h...
We suggest a new simple forwarding technique to speed up IP destination address lookup. The technique is a natural extension of IP, requires 5 bits in the IPv4 header (7 in IPv6), and performs IP lookup nearly as fast as IP/Tag switching but with a smaller memory requirement and a much simpler protocol. The basic idea is that each router adds a...
We consider shared memory systems that support multiobject operations in which processes may simultaneously access several objects in one atomic operation. We provide upper and lower bounds on the synchronization power (consensus number) of multiobject systems as a function of the type and the number of objects that may be simultaneously accessed i...
Two implementations of an adaptive, wait-free, and long-lived renaming task in the read/write shared memory model are presented. Implementations of long-lived and adaptive objects were previously known only in the much stronger model of load-linked and store-conditional (i.e., read-modify-write) shared memory. In read/write shared-memory only one-s...
An implementation of a fast, wait-free, long-lived, and dynamic renaming task in the read/write shared memory model is presented. The algorithm assigns a new unique id in the range {1, ..., 4k^2} to any process whose initial unique name is taken from a set of size N, for an arbitrary N, where k is the number of processors that actually take steps or ho...
This paper studies basic properties of rate-based flow-control algorithms and of the max-min fairness criteria. For the algorithms we suggest a new approach for their modeling and analysis, which may be considered more “optimistic” and realistic than traditional approaches. Three variations of the approach are presented, and their rate of convergen...
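Progressive filling ("water-filling") is the standard centralized way to compute the max-min fair allocation that rate-based flow-control algorithms aim to converge to: raise the rates of all unfrozen flows equally until some link saturates, freeze every flow crossing a saturated link, and repeat. The sketch below computes that target allocation for reference only; it is not one of the distributed algorithms modelled in the paper, and the `capacity`/`routes` dictionaries are an assumed input format.

```python
def max_min_fair(capacity, routes, eps=1e-12):
    """capacity: link -> capacity; routes: flow -> set of links it crosses.
    Returns flow -> max-min fair rate, computed by progressive filling."""
    assert all(routes.values()), "every flow must cross at least one link"
    rate = {f: 0.0 for f in routes}
    residual = dict(capacity)
    active = set(routes)
    while active:
        # How many still-growing flows share each link.
        load = {l: sum(1 for f in active if l in routes[f]) for l in residual}
        # Largest equal increment before the first link saturates.
        inc = min(residual[l] / n for l, n in load.items() if n > 0)
        for f in active:
            rate[f] += inc
        for l, n in load.items():
            residual[l] -= inc * n
        saturated = {l for l, n in load.items() if n > 0 and residual[l] <= eps}
        # Flows crossing a saturated link are frozen at their current rate.
        active = {f for f in active if not (routes[f] & saturated)}
    return rate


if __name__ == "__main__":
    caps = {"A": 10.0, "B": 4.0}
    flows = {"f1": {"A"}, "f2": {"A", "B"}, "f3": {"B"}}
    print(max_min_fair(caps, flows))
```

In this example link B is shared by f2 and f3, so both are capped at 2; f1 then takes the remaining 8 units of link A, which is the allocation a correct rate-based scheme should converge to.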