# Jérémy Félix Barbay

University of Chile · Departamento de Ciencias de la Computación

Ph. D. in Computer Science

## About

Publications: 77

Reads: 3,333

Citations: 1,032

Introduction

Theoretical Computer Science (TCS)
Learning Management Systems (LMS)
Animal Computer Interaction (ACI)

Education

September 1998 - August 2002

## Publications

When comparing two sets of objects, one can compare their cardinalities, or map one set to the other. Which of the two comparison methods an individual (human or not) is using depends on the cardinalities compared and on whether the representation is well known or not. While the discrimination ability is well documented for various species, less in...

Validating that non-human animals can communicate with humans using Augmentative and Alternative Communication requires extensive logging, and traditional techniques are costly in resources and time. We propose to implement 1) a configurable "communication board" application aimed at small non-human animals able to use touch interfaces, which not o...

Some video games were developed to entertain non-human animals while measuring their abilities, with the results logged in a file that can be analyzed later. Using such games to measure the limits of such abilities is problematic, as it requires the subjects to be exposed to instances that they cannot solve, potentially frustrating them. Could presenting the subje...

Ain et al. measured the ability of three African Grey parrots (Psittacus erithacus) to discriminate between discrete and continuous quantities. Some features of their experimental protocol make it difficult to apply to other subjects and/or species without introducing a risk of bias, as subjects could read cues from the experimenter (even though t...

Motivated by the analysis of range queries in databases, we introduce the computation of the depth distribution of a set B of n d-dimensional boxes (i.e., axis-aligned d-dimensional hyperrectangles), which generalizes the computation of Klee's measure and of the maximum depth of B. We present an algorithm to compute the depth distribution running in t...

There are no silver bullets in algorithm design, and no single algorithmic idea is powerful and flexible enough to solve every computational problem. Nor are there silver bullets in algorithm analysis, as the most enlightening method for analyzing an algorithm often depends on the problem and the application. However, typical algorithms courses rel...

The game of Hangman is a classical asymmetric two-player game in which one player, the setter, chooses a secret word from a language, which the other player, the guesser, tries to discover through single-letter matching queries, each answered by all occurrences of this letter, if any. In the Evil Hangman variant, the setter can change the secret word duri...

Given a set B of d-dimensional boxes (i.e., axis-aligned hyperrectangles), a minimum coverage kernel is a subset of B of minimum size covering the same region as B. Computing it is NP-hard, but as for many similar NP-hard problems (e.g., Box Cover, and Orthogonal Polygon Covering), the problem becomes solvable in polynomial time under restrictions...

The Swap-Insert Correction distance from a string S of length n to another string L of length m ≥ n on the alphabet [1‥σ] is the minimum number of insertions, and swaps of pairs of adjacent symbols, converting S into L. Unlike other correction distances, computing it is NP-Hard in the size σ of the alphabet. We describe an alg...

We consider the Minimum Coverage Kernel problem: given a set \(\mathcal {B}\) of d-dimensional boxes, find a subset of \(\mathcal {B}\) of minimum size covering the same region as \(\mathcal {B}\). This problem is \(\mathsf {NP}\)-hard, but as for many \(\mathsf {NP}\)-hard problems on graphs, the problem becomes solvable in polynomial time under r...

We describe and analyze the first adaptive algorithm for merging k convex hulls in the plane. This merging algorithm in turn yields a synergistic algorithm to compute the convex hull of a set of planar points, taking advantage both of the positions of the points and their order in the input. This synergistic algorithm asymptotically outperforms all...

There are efficient dynamic programming solutions to the computation of the Edit Distance from $S\in[1..\sigma]^n$ to $T\in[1..\sigma]^m$, for many natural subsets of edit operations, typically in time within $O(nm)$ in the worst-case over strings of respective lengths $n$ and $m$ (which is likely to be optimal), and in time within $O(n{+}m)$ in so...

The discrete Fréchet distance is a measure of similarity between point sequences which abstracts away differences of resolution between the two curves, approximating the original Fréchet distance between curves. Such distance between sequences of respective length $n$ and $m$ can be computed in time within $O(nm)$ and space within $O(n...
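As an illustration of the $O(nm)$ bound mentioned in this abstract, here is a minimal sketch of the textbook dynamic program for the discrete Fréchet distance. It is not the algorithm of the paper; the function name `discrete_frechet`, the use of Euclidean distance, and the choice of keeping only two rows of the table are assumptions made for this example.

```python
from math import dist  # Euclidean distance, Python 3.8+


def discrete_frechet(p, q):
    """Discrete Frechet distance between two non-empty point sequences.

    Textbook dynamic program: O(len(p) * len(q)) time, and O(len(q)) space
    because only the previous and the current row of the table are kept.
    """
    n, m = len(p), len(q)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    prev = [0.0] * m  # row for the prefix of p ending at index i - 1
    for i in range(n):
        cur = [0.0] * m  # row for the prefix of p ending at index i
        for j in range(m):
            d = dist(p[i], q[j])
            if i == 0 and j == 0:
                cur[j] = d
            elif i == 0:
                cur[j] = max(cur[j - 1], d)
            elif j == 0:
                cur[j] = max(prev[j], d)
            else:
                # Cheapest way to reach (i, j): advance on p, on q, or on both.
                cur[j] = max(min(prev[j], prev[j - 1], cur[j - 1]), d)
        prev = cur
    return prev[m - 1]
```

For two parallel horizontal polylines at vertical distance 1, sampled at the same abscissas, the function returns 1.0.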

We consider the Minimum Coverage Kernel problem: given a set B of $d$-dimensional boxes, find a subset of B of minimum size covering the same region as B. This problem is $\mathsf{NP}$-hard, but as for many $\mathsf{NP}$-hard problems on graphs, the problem becomes solvable in polynomial time under restrictions on the graph induced by $B$. We consi...

Motivated by the analysis of range queries in databases, we introduce the computation of the Depth Distribution of a set $\mathcal{B}$ of axis-aligned boxes, whose computation generalizes that of Klee's Measure and of the Maximum Depth. In the worst case over instances of fixed input size $n$, we describe an algorithm of complexity within $O({n...

We prove the existence of an algorithm A for computing 2D or 3D convex hulls that is optimal for every point set in the following sense: for every sequence σ of n points and for every algorithm A′ in a certain class A, the running time of A on input σ is at most a constant factor times the running time of A′ on the worst possible permutation of σ f...

Refinements of the worst case complexity over instances of fixed input size consider the input order or the input structure, but rarely both at the same time. Barbay et al. [2016] described "synergistic" solutions on multisets, which take advantage of the input order and the input structure, so as to asymptotically outperform any comparable sol...

We study the problem of computing the Maxima of a set of $n$ $d$-dimensional points. For dimensions 2 and 3, there are algorithms to solve the problem with order-oblivious instance-optimal running time. However, in higher dimensions there is still room for improvement. We present an algorithm sensitive to the structural entropy of the inp...

The action of teaching reinforces one's learning but requires some external quality control when done by nonprofessionals (e.g., a professor supervising teaching assistants). This quality control is costly and has limited the adoption of peer teaching in schools. Our solution to this problem is to ask students to create pedagogical material that wi...

Karp et al. (1988) described Deferred Data Structures for Multisets as "lazy" data structures which partially sort data, so as to support online rank and select queries, with the minimum amount of work in the worst case over instances of fixed size $n$ and fixed number of queries $q$ (i.e., the query size). Barbay et al. (2016) refined this approach to take...

The problem of the Hanoi Tower is a classic exercise in recursive programming: the solution has a simple recursive definition, and its complexity and the matching lower bound are the solution of a simple recursive function (the solution is so easy that most students memorize it and regurgitate it at exams without truly understanding it). We describ...
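The "simple recursive definition" referred to in this abstract can be sketched as follows for the classical three-peg puzzle. The function name `hanoi` and the peg labels are assumptions made for this example, and the variant studied in the paper is not reproduced here.

```python
def hanoi(n, source="A", target="C", spare="B"):
    """Return the list of moves (from_peg, to_peg) solving the classical puzzle.

    The number of moves T(n) satisfies T(0) = 0 and T(n) = 2 T(n - 1) + 1,
    which solves to 2**n - 1.
    """
    if n == 0:
        return []
    return (
        hanoi(n - 1, source, spare, target)    # clear the n - 1 smaller disks
        + [(source, target)]                   # move the largest disk
        + hanoi(n - 1, spare, target, source)  # put the smaller disks back on top
    )
```

The length of the returned list matches the closed form of the recurrence, for instance 7 moves for 3 disks.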

We describe an algorithm computing an optimal prefix-free code for $n$ unsorted positive weights in time within $O(n(1+\lg \alpha))\subseteq O(n\lg n)$, where the alternation $\alpha\in[1..n-1]$ measures the amount of sorting required by the computation. This asymptotic complexity is within a constant factor of the optimal in the algebraic decisi...

Divide-and-Conquer is a central paradigm for the design of algorithms,
through which fundamental computational problems like sorting arrays and
computing convex hulls are solved in optimal time within $\Theta(n\log{n})$ in
the worst case over instances of size $n$. A finer analysis of those problems
yields complexities within $O(n(1 + \mathcal{H}(n...

Klee's Measure of $n$ axis-parallel boxes in $d$-dimensional space is the volume of their union. It can be computed in time within $O(n^{d/2})$ in the worst case. We describe three techniques to boost its computation: two based on some type of "degeneracy" of the input, and one, more technical, on the inherent "easiness" of some instance. The f...

The Swap-Insert String-to-String Correction distance from a string $S$ to
another string $L$ on the alphabet $[1..d]$ is the minimum number of insertions
and swaps of pairs of adjacent symbols converting $S$ into $L$. We describe an
algorithm computing this distance in time polynomial in the lengths $n$ of $S$
and $m$ of $L$, so that it is as good...

We consider the dynamic version of the online multiselection problem for internal and external memory, in which q selection queries are requested on an unsorted array of N elements. Our internal memory result is 1-competitive with the offline result of Kaligosi et al. [ICALP 2005]. In particular, we extend the results of Barbay et al. [ESA 2013] by su...

Given a set P of n points in $\mathbb{R}^d$, where each point p of P is associated with a weight $w(p)$ (positive or negative), the Maximum-Weight Box problem is to find an axis-aligned box B maximizing $\sum_{p\in B\cap P} w(p)$. We describe algorithms for this problem in two dimensions that run in the worst case in $O(n^2)$ time, and much less on more specif...

We introduce a new online algorithm for the multiselection problem which performs a sequence of selection queries on a given unsorted array. We show that our online algorithm is 1-competitive in terms of data comparisons. In particular, we match the bounds (up to lower order terms) from the optimal offline algorithm proposed by Kaligosi et al. [ICA...

A deterministic algorithm is correct if it solves each instance in a valid way. This chapter shows the analysis techniques used to study the complexity of probabilistic algorithms can be just as easily used to analyze the approximation quality of combinatorial optimization algorithms. It gives a more formal definition of the concepts and notations...

In many cases, the relation between encoding space and execution time translates into combinatorial lower bounds on the computational complexity of algorithms in the comparison or external memory models. We describe a few cases which illustrate this relation in a distinct direction, where fast algorithms inspire compressed encodings or data structu...

We introduce an online version of the multiselection problem, in which q
selection queries are requested on an unsorted array of n elements. We provide
the first online algorithm that is 1-competitive with Kaligosi et al. [ICALP
2005] in terms of comparison complexity. Our algorithm also supports online
search queries efficiently.
We then extend ou...

Adaptive analysis is a well known technique in computational geometry, which refines the traditional worst case analysis over all instances of fixed input size by taking into account some other parameters, such as the size of the output in the case of output sensitive analysis. We present two adaptive techniques for the computation of the convex hull i...

Given a set $P$ of $n$ planar points, two axes and a real-valued score
function $f()$ on subsets of $P$, the Optimal Planar Box problem consists in
finding a box (i.e. axis-aligned rectangle) $H$ maximizing $f(H\cap P)$. We
consider the case where $f()$ is monotone decomposable, i.e. there exists a
composition function $g()$ monotone in its two arg...

What is the difference between struggling for achievements and competing for success? What is the effect of competitions on a scientific field? What are the specific implications on TOC? In this opinionated essay, I address these questions and related ...

Binary relations are an important abstraction arising in many data
representation problems. The data structures proposed so far to represent them
support just a few basic operations required to fit one particular application.
We identify many of those operations arising in applications and generalize
them into a wide set of desirable queries for a...

We define and design succinct indexes for several abstract data types (ADTs). The concept is to design auxiliary data structures that ideally occupy asymptotically less space than the information-theoretic lower bound on the space required to encode the given data, and support an extended set of operations using the basic operators defined in the A...

Previous compact representations of permutations have focused on adding a small index on top of the plain data $\langle\pi(1), \pi(2), \ldots, \pi(n)\rangle$, in order to efficiently support the application of the inverse or the iterated permutation. In this paper we initiate the study of techniques that exploit the compressibility of the data itself, while retainin...

LRM-Trees are an elegant way to partition a sequence of values into sorted consecutive blocks, and to express the relative
position of the first element of each block within a previous block. They were used to encode ordinal trees and to index integer
arrays in order to support range minimum queries on them. We describe how they yield many other co...

We present a data structure that stores a string s[1..n] over the alphabet [1..σ] in $nH_0(s) + o(n)(H_0(s) + 1)$ bits, where $H_0(s)$ is the zero-order entropy of s. This data structure supports the queries access and rank in time $O(\lg\lg\sigma)$, and the select query in constant time. This result improves on previously know...

LRM-Trees are an elegant way to partition a sequence of values into sorted consecutive blocks, and to express the relative position of the first element of each block within a previous block. They were used to encode ordinal trees and to index integer arrays in order to support range minimum queries on them. We describe how they yield many other co...

Binary relations are an important abstraction arising in a number of data representation problems. Each existing data structure specializes in the few basic operations required by one single application, and takes only limited advantage of the inherent redundancy of binary relations. We show how to support more general operations efficiently, while ta...

The intersection of large ordered sets is a common problem in the context of the evaluation of boolean queries to a search engine. In this paper we propose several improved algorithms for computing the intersection of sorted arrays, and in particular for searching sorted arrays in the intersection context. We perform an experimental comparison wit...

We present a data structure that stores a sequence $s[1..n]$ over alphabet $[1..\sigma]$ in $nH_0(s) + o(n)(H_0(s){+}1)$ bits, where $H_0(s)$ is the zero-order entropy of $s$. This structure supports the queries access, rank and select, which are fundamental building blocks for many other compressed data structures, in worst-case time $\Oh{\lg\...

In the context of queries to indexed search engines such as Google, Barbay and Kenyon introduced and solved threshold set queries, answered by the set of references associated with at least $t$ keywords out of the $k$ given as input, for some constant parameter $t$. We slightly generalize those results to the easy case where weights are associated...

We prove the existence of an algorithm A for computing 2-d or 3-d convex hulls that is optimal for every point set in the following sense: for every set S of n points and for every algorithm A' in a certain class A, the running time of A on the worst permutation of S for A is at most a constant factor times the running time of A' on the worst permut...

We explore various techniques to compress a permutation π over n integers, taking advantage of ordered subsequences in π, while supporting its application π(i) and the application of its inverse π^{-1}(i) in small time. Our compression schemes yield several interesting byproducts, in many cases matching, improving or extending the best existing results on ap...

We present an optimal adaptive algorithm for context queries in tagged content. The queries consist of locating instances of a tag within a context specified by the query using patterns with preorder, ancestor-descendant and proximity operators in the document tree implied by the tagged content. The time taken to resolve a query $Q$ on a document t...

Traditionally the analysis of algorithms measures the complexity of a problem or algorithm in terms of the worst-case behavior over all inputs of a given size. However, in certain cases an improved algorithm can be obtained by considering a finer partition of the input space. As this idea has been independently rediscovered in many areas, the works...

The intersection of sorted arrays problem has applications in search engines such as Google. Previous work has proposed and compared deterministic algorithms for this problem, in an adaptive analysis based on the encoding size of a certificate of the result (cost analysis). We define the alternation analysis, based on the nondeterministic complexity...

We prove a tight asymptotic bound of Θ(δ log(n/δ)) on the worst case computational complexity of the convex hull of the union of two convex objects of sizes summing to n requiring δ orientation tests to certify the answer. Our algorithm is deterministic, it uses portions of the convex hull of input objects to describe the final convex hull, and it tak...

Problem Definition. Figure 1: A permutation on \(\{1,\ldots,8\}\), with two cycles and three back pointers. The full black lines correspond to the permutation, the dashed lines to the back pointers and the gray lines to the edges traversed to compute \(\pi^{-1}(3)\). A succinct data structure for a given data type is a representation of the u...

In many applications, the properties of an object being modeled are stored as labels on vertices or edges of a graph. In this
paper, we consider succinct representation of labeled graphs. Our main results are the succinct representations of labeled
and multi-labeled graphs (we consider vertex labeled planar triangulations, as well as edge labeled p...

We define and design succinct indexes for several abstract data types (ADTs). The concept is to design auxiliary data structures that ideally occupy asymptotically less space than the information-theoretic lower bound on the space required to encode the given data, and support an extended set of operations using the basic operators defined in the...

In many applications, the properties of an object being modeled are stored as labels on vertices or edges of a graph. In this
paper, we consider succinct representation of labeled graphs. Our main results are the succinct representations of labeled
and multi-labeled graphs (we consider planar triangulations, planar graphs and k-page graphs) to supp...

The most heavily used methods to answer conjunctive queries on binary relations (such as the one associating keywords with
web pages) are based on inverted lists stored in sorted arrays and use variants of binary search. We show that a succinct
representation of the binary relation permits much better results, while using space within a lower order...

The intersection of large ordered sets is a common problem in the context of the evaluation of boolean queries to a search engine. In this paper we engineer a better algorithm for this task, which improves over those proposed by Demaine, Munro and Lopez-Ortiz (SODA 2000/ALENEX 2001), by using a variant of interpolation search. More specifically,...

The intersection of large ordered sets is a common problem in the context of the evaluation of boolean queries to a search engine. In this paper we propose several improved algorithms for computing the intersection of sorted arrays, and in particular for searching sorted arrays in the intersection context. We perform an experimental comparison wit...

We consider in this paper the problem of encoding XML documents in small space while still supporting XPath Location steps efficiently. We model XML documents as multi-labeled trees, and propose for those an encoding which takes space close to the lower bound suggested by information theory, while still supporting the search for the ancestors, de...

Given k sorted arrays, the t-Threshold problem, which is motivated by indexed search engines, consists of finding the elements which are present in at least t of the arrays. We present a new deterministic algorithm for it and prove that, asymptotically in the sizes of the arrays, it is optimal in the alternation model used to study adaptive algorit...
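To make the definition of the t-Threshold problem concrete, here is a minimal non-adaptive baseline: it merges the k sorted arrays and counts how many arrays contain each value. It is not the adaptive algorithm of the paper. The function name `t_threshold` is hypothetical, and the sketch assumes that each array is sorted and contains no repeated element.

```python
from heapq import merge
from itertools import groupby


def t_threshold(arrays, t):
    """Return, in sorted order, the values present in at least t of the arrays.

    Assumes each array is sorted and free of duplicates, so that the length
    of a run of equal values in the merged stream is the number of arrays
    containing that value. Unlike an adaptive algorithm, this baseline
    always reads every element of every array.
    """
    result = []
    for value, run in groupby(merge(*arrays)):
        if sum(1 for _ in run) >= t:
            result.append(value)
    return result
```

With t equal to the number of arrays, the same function computes their intersection; with t = 1 it computes their union.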

The "Intersection of sorted arrays" problem has applications in indexed search engines such as Google. Previous works propose and compare deterministic algorithms for this problem, and offer lower bounds on the randomized complexity in different models (cost model, alternation model). We refine the alternation model into the redundancy model to pr...

Consider the problem of computing the intersection of k sorted sets. In the comparison model, we prove a new lower bound which depends on the non-deterministic complexity of the instance, and implies that the algorithm of Demaine, López-Ortiz and Munro [2] is usually optimal in this "adaptive" sense. We extend the lower bound and the algorith...

Consider the problem of computing the intersection of k sorted sets whose sizes sum to n. In the comparison model, we prove a new lower bound which depends on the non-deterministic complexity of the instance, and implies that the algorithm of Demaine, López-Ortiz and Munro [1] is optimal in this "adaptive" sense (for k much smaller than n). We exten...

We propose a discrete variant of the Bak-Sneppen model for self-organized criticality. In this process, a configuration is an n-bit word, and at each step one chooses a random bit of minimum value (usually a zero) and replaces it and its two neighbors by independent Bernoulli variables with parameter p. We prove bounds on the average number of ones...
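The process described in this abstract can be simulated in a few lines. The sketch below is only an illustration under stated assumptions: the word is treated as cyclic (so that every bit has two neighbors), the initial configuration is all zeros, and the names `step` and `average_ones` are hypothetical. It estimates the density of ones empirically and says nothing about the bounds proved in the paper.

```python
import random


def step(bits, p, rng):
    """One step: resample a random minimum bit and its two cyclic neighbors."""
    n = len(bits)
    lowest = min(bits)  # 0 unless the word is all ones
    i = rng.choice([k for k, b in enumerate(bits) if b == lowest])
    for k in (i - 1, i, i + 1):
        # Independent Bernoulli(p) draw for each of the three positions.
        bits[k % n] = 1 if rng.random() < p else 0


def average_ones(n, p, steps, seed=0):
    """Empirical fraction of ones, averaged over `steps` steps from all zeros."""
    rng = random.Random(seed)
    bits = [0] * n
    total = 0
    for _ in range(steps):
        step(bits, p, rng)
        total += sum(bits)
    return total / (steps * n)
```

Running `average_ones` for increasing values of p gives a rough empirical picture of how the density of ones depends on the parameter, without any burn-in correction.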

We propose a discrete variant of the Bak-Sneppen model for self-organized criticality. In this process, a configuration is an n-bit word, and at each step one chooses a random bit of minimum value and replaces it and its two neighbors by independent Bernoulli variables with parameter p. We prove bounds on the average number of ones in the stationary...

We propose an adaptive algorithm for context queries (queries expressed as preorder and ancestor-descendant relations on labeled nodes), which can be used to find patterns in XML documents. Our algorithm takes advantage of the correlation between terms of the query without any preprocessed information, and it runs in time (kd(lg lg min(n,s)+lg lg(...

The intersection of large ordered sets is a common problem in the context of the evaluation of boolean queries to a search engine. In this paper we propose several parallel algorithms for computing the intersection of sorted arrays, taking advantage of the parallelism provided by the new generation of multicore processors and of the programming lib...

Any sorting algorithm in the comparison model defines an encoding scheme for permutations. As adaptive sorting algorithms perform o(n lg n) comparisons on restricted classes of permutations, each defines one or more compression schemes for permutations. In the case of the compression schemes inspired by Adaptive Merge Sort, a small amount of additi...

Considering indexes and algorithms to answer XPath queries over XML data, we propose an index structure and a related algorithm, both adapted to the comparison model, where elements can be accessed non-sequentially. The indexing scheme uses classical labelling techniques, but structurally represents the ancestor-descendant relationships of nodes of...

The index of an XML document typically consists of a set of lists of node references. For each node type, a list gives the references of all nodes of this type, in the order defined by the prefix traversal of the document. A twig pattern matching query is a labeled tree structure (a "twig pattern"); it is answered by the list of all occurrences of...

Crossover is believed to initiate at specific sites called hotspots, by a combinational-repair mechanism in which the initiating hotspot is replaced by a copy of its homologue. Boulton et al. studied through simulation the effect of this mechanism, and observed in their model that active hotspot alleles are rapidly replaced by inactive alleles. This...

From 19.01. to 24.04.2009, the Dagstuhl Seminar 09171 "Adaptive, Output Sensitive, Online and Parameterized Algorithms" was held in Schloss Dagstuhl - Leibniz Center for Informatics. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given...

## Projects

Projects (10)

We aim to carefully design applications to entertain Other Animals Than Humans (OATHs) while getting useful data about their sensory and cognitive abilities.

Initiated by Christina Hunger and Stella, the 2020s have seen a trend of using AAC techniques designed for humans to improve communication between humans and Other Animals Than Humans (OATHs), but with very little formal data being gathered. We aim to provide solutions to capture, and to help analyze, detailed logs of such interspecies communication, through the design of 1) easy-to-reproduce specialized "Big Keys" keyboards, 2) a configurable "Communication Board" application, and 3) a server gathering the usage logs of the various instances of communication boards, with some basic tools to analyze them.