Gonzalo Del C Navarro

Pontificia Universidad Javeriana · Department of Architecture

Professor
Architecture from a complex-systems perspective. Rural land-use planning in Colombia.

266
Publications
15,593
Reads
6,829
Citations

Publications (266)
Preprint
The $r$-index (Gagie et al., JACM 2020) represented a breakthrough in compressed indexing of repetitive text collections, outperforming its alternatives by orders of magnitude. Its space usage, $\mathcal{O}(r)$ where $r$ is the number of runs in the Burrows-Wheeler Transform of the text, is however larger than Lempel-Ziv and grammar-based indexes,...
Article
Shannon’s entropy is a clear lower bound for statistical compression. The situation is not so well understood for dictionary-based compression. A plausible lower bound is $\boldsymbol{b}$, the least number of phrases of a general bidirectional parse of a text, where phrases can be copied from anywhere else in the text. Since computing $\boldsy...
Article
Full-text available
Some of the anatomical and functional basis of cognitive impairment in multiple sclerosis (MS) currently remains unknown. In particular, there is scarce knowledge about modulations in induced EEG (nonphase activity) for diverse frequency bands related to attentional deficits in this pathology. The present study analyzes phase and nonphase alpha and...
Preprint
Full-text available
We describe a grammar for DNA sequencing reads from which we can compute the BWT directly. Our motivation is to perform in succinct space genomic analyses that require complex string queries not yet supported by repetition-based self-indexes. Our approach is to store the set of reads as a grammar, but when required, compute its BWT to carry out the...
Article
The predecessor problem is a key component of the fundamental sorting-and-searching core of algorithmic problems. While binary search is the optimal solution in the comparison model, more realistic machine models on integer sets open the door to a rich universe of data structures, algorithms, and lower bounds. In this article, we review the evoluti...
Preprint
The research on indexing repetitive string collections has focused on the same search problems used for regular string collections, though they can make little sense in this scenario. For example, the basic pattern matching query "list all the positions where pattern $P$ appears" can produce huge outputs when $P$ appears in an area shared by many d...
Preprint
Let a text $T[1..n]$ be the only string generated by a context-free grammar with $g$ (terminal and nonterminal) symbols, and of size $G$ (measured as the sum of the lengths of the right-hand sides of the rules). Such a grammar, called a grammar-compressed representation of $T$, can be encoded using essentially $G\lg g$ bits. We introduce the first...
Article
Full-text available
We introduce a compressed suffix array representation that, on a text $T$ of length $n$ over an alphabet of size $\sigma$, can be built in $O(n)$ deterministic time, within $O(n\log\sigma)$ bits of working space, and counts the number of occurrences of any pattern $P$ in $T$ in time $O(|P| + \log\log_w \sigma)$ on a RAM machine of $w=\Omega(\log n)...
Article
Motivation: Genome repositories are growing faster than our storage capacities, challenging our ability to store, transmit, process and analyze them. While genomes are not very compressible individually, those repositories usually contain myriads of genomes or genome reads of the same species, thereby creating opportunities for orders-of-magnitude...
Chapter
Suffix trees are a fundamental data structure in stringology, but their space usage, though linear, is an important problem in applications. We design and implement a new compressed suffix tree targeted to highly repetitive texts, such as large genomic collections of the same species. Our suffix tree builds on Block Trees, a recent Lempel-Ziv-bound...
Chapter
Document listing on string collections is the task of finding all documents where a pattern appears. It is regarded as the most fundamental document retrieval problem, and is useful in various applications. Many of the fastest-growing string collections are composed of very similar documents, such as versioned code and document collections, genome...
Article
The Burrows-Wheeler Transform (BWT) has, since its introduction, become a key tool for representing large text collections in compressed space while supporting indexed searching: on a text of length n over an alphabet of size σ, it requires O(n lg σ) bits of space, instead of the O(n lg n) bits required by classical indexes. A challenge for its adopt...
Article
Full-text available
Periodontitis is the inflammation of the periodontal tissues, which can destroy the supporting tissues of the tooth and lead to bone and tooth loss. Chronic periodontitis is caused by periodontopathogenic bacteria such as Porphyromonas gingivalis, Aggregatibacter actinomycetemcomitans, Tannerella forsythia, and Treponema denticola,...
Preprint
Worst-case optimal join algorithms have gained a lot of attention in the database literature. We now have several different algorithms that have all been shown to be optimal in the worst case, and many of them have also been implemented and tested in practice. However, the implementation of these algorithms often requires an enhanced indexing...
Article
Signaling pathways are responsible for the regulation of cell processes, such as monitoring the external environment, transmitting information across membranes, and making cell fate decisions. Given the increasing amount of biological data available and the recent discoveries showing that many diseases are related to the disruption of cellular sign...
Article
Venezuela is living through a tragedy that did not begin, as many believe, in 1999. It began earlier, in 1993, when the justice system was used to settle political scores. The "barras" you are about to read do not go back that far. They begin with the opposition's victory over the government in the parliamentary elections of December 2015. These were writ...
Article
Document retrieval structures index a collection of string documents, to retrieve those that are relevant to query strings p: document listing retrieves all documents where p appears; top-k retrieval retrieves the k most relevant of those. Classical structures use too much space in practice. Most current research uses compressed suffix arrays, but...
Preprint
Document listing on string collections is the task of finding all documents where a pattern appears. It is regarded as the most fundamental document retrieval problem, and is useful in various applications. Many of the fastest-growing string collections are composed of very similar documents, such as versioned code and document collections, genome...
Preprint
Suffix trees are a fundamental data structure in stringology, but their space usage, though linear, is an important problem for their applications. We design and implement a new compressed suffix tree targeted to highly repetitive texts, such as large genomic collections of the same species. Our suffix tree builds on Block Trees, a recent Lempel...
Article
Full-text available
Argentina lacks regulation on the preservation of water reservoirs, let alone on promoting them as a way to soften the climate and environmental change produced by the growing anthropization of the region. This makes it crucial and strategic for the Pampas in the Buenos Aires Province to properly understand the b...
Book
This manual presents the results of research carried out during 2017 and 2018 by a team from the Universidad Católica Boliviana in the terrestrial micro-basin of a heavily eutrophicated Andean urban lagoon (Laguna Alalay). It updates the biogeophysical characterization of the lagoon's terrestrial surroundings, ...
Preprint
The Burrows-Wheeler Transform (BWT) is an important technique both in data compression and in the design of compact indexing data structures. It has been generalized from single strings to collections of strings and some classes of labeled directed graphs, such as tries and de Bruijn graphs. The BWTs of repetitive datasets are often compressible us...
Article
Full-text available
What we already know about this topic: Hospital mortality in acute respiratory distress syndrome is approximately 40%, but mortality and trajectory in "mild" acute respiratory distress syndrome (classified only since 2012) are unknown, and many cases are not detected. What this article tells us that is new: Approximately 80% of cases of mild acute...
Article
Identification of shrubland types based on their floristic composition and on ecological factors in Central and southern Bolivian Altiplano (Bolivia, central-western South America). Vascular plants were recorded in a field survey of 101 relevés. Relevés were subjected to hierarchical agglomerative classification to define numerical vegetation group...
Article
Irreducible grammars are a class of context-free grammars with well-known representatives, such as Repair (with a few tweaks), Longest Match, Greedy, and Sequential. We show that a grammar-based compression method described by Kieffer and Yang (2000) is upper bounded by the high-order empirical entropy of the string when the underlying grammar is i...
Preprint
Full-text available
The advent of high-throughput sequencing has resulted in massive genomic datasets, some consisting of assembled genomes but others consisting of raw reads. We consider how to reduce the amount of space needed to index a set of reads, in particular how to reduce the number of runs in the Burrows-Wheeler Transform (BWT) that is the basis of FM-indexi...
Article
Full-text available
Many proteins work together with others in groups called complexes in order to achieve a specific function. Discovering protein complexes is important for understanding biological processes and predicting protein functions in living organisms. Large-scale and throughput techniques have made it possible to compile protein-protein interaction networks (PPI...
Data
Table A1: Mining algorithm. Discovering DSGs in DAPG. Table A2: Detection of an DSG starting at a given node in DAPG. Table A3: Algorithms for redundancy-filtering. Table A4-A17: DAPG results with different parameters and input PPI networks. Table A18-A29: Other method results with different parameters and input PPI networks. (PDF)
Conference Paper
The Block Tree is a recently proposed data structure that reaches compression close to Lempel-Ziv while supporting efficient direct access to text substrings. In this paper we show how a self-index can be built on top of a Block Tree so that it provides efficient pattern searches while using space proportional to that of the original data structure...
Article
We consider document listing on string collections, that is, finding in which strings a given pattern appears. In particular, we focus on repetitive collections: a collection of size $N$ over alphabet $[1,\sigma]$ is composed of $D$ copies of a string of size $n$, and $s$ single-character or block edits are applied on ranges of copies. We introduce...
Conference Paper
Full-text available
The Metropolitan Area of Buenos Aires City (AMBA) has no regulation that prioritizes the need to preserve water reservoirs, still less to create them to compensate for the growing anthropization of the territory and to mitigate the effects of climate change. The objective of this work is to evaluate the impact of altering the m...
Article
Full-text available
Indexing highly repetitive texts --- e.g., genomic databases, software repositories and versioned text collections --- has become an important problem since the turn of the millennium. A simple solution is to use an FM-index based on the run-length compressed Burrows-Wheeler Transform (RLBWT) of the text, which achieves excellent compression in pra...
Article
Given an array A[1, n] of elements with a total order, we consider the problem of building a data structure that solves two queries: (a) selection queries receive a range [i, j] and an integer k and return the position of the kth largest element in A[i, j]; (b) top-k queries receive [i, j] and k and return the positions of the k largest elements in...
Article
Full-text available
In the range $\alpha$-majority query problem, we preprocess a given sequence $S[1..n]$ for a fixed threshold $\alpha \in (0, 1]$, such that given a query range $[i..j]$, the symbols that occur more than $\alpha (j-i+1)$ times in $S[i..j]$ can be reported efficiently. We design the first compressed solution to this problem in dynamic settings. Our d...
Article
Full-text available
Blelloch and Farzan (2010) showed how we can represent succinctly any planar embedding of a connected planar simple graph while still supporting constant-time navigation queries, but their representation does not allow multi-edges. Other authors have shown how to represent any connected planar multigraph compactly while supporting fast navigation,...
Article
Fischer and Heun [SICOMP 2011] proposed the first Range Minimum Query (RMQ) data structure on an array A[1,n] that uses 2n+o(n) bits and answers queries in O(1) time without accessing A. Their scheme converts the Cartesian tree of A into a general tree, which is represented using DFUDS. We show that, by using instead the BP representation, the formula becomes simpler...
Article
Recent compressed suffix trees targeted to highly repetitive sequence collections reach excellent compression performance, but operation times are very high. We design a new suffix tree representation for this scenario that still achieves very low space usage, only slightly larger than the best previous one, but supports the operations orders of ma...
Article
The Block Tree is a recently proposed data structure that reaches compression close to Lempel-Ziv while supporting efficient direct access to text substrings. In this paper we show how a self-index can be built on top of a Block Tree so that it provides efficient pattern searches while using space proportional to that of the original data structure...
Conference Paper
Compressed data structures provide the same functionality as their classical counterparts while using entropy-bounded space. While they have succeeded in a wide range of static structures, which do not undergo updates, they are less mature in the dynamic case, where the theory-versus-practice gap is wider. We implement compressed dynamic bitvectors...
Article
The fully-functional succinct tree representation of Navarro and Sadakane (ACM Transactions on Algorithms, 2014) supports a large number of operations in constant time using $2n+o(n)$ bits. However, the full idea is hard to implement. Only a simplified version with $O(\log n)$ operation time has been implemented and shown to be practical and compet...
Conference Paper
Full-text available
In this paper, we seek to determine the validity of a set of arguments and criteria designed to assess the possible progress of a science of consciousness (contemporary to the "hard problem of consciousness" proposed by Chalmers). For the purposes of this analysis, two theories or models of great impact on current neuroscience and psychology of con...
Article
Full-text available
We face the problem of designing a data structure that can report all $\tau$-majorities within any range of an array $A[1,n]$, without storing $A$. A $\tau$-majority in a range $A[i,j]$, for $0<\tau< 1$, is an element that occurs more than $\tau(j-i+1)$ times in $A[i,j]$. We show that $\Omega(n\log(1/\tau))$ bits are necessary for such a data struc...
Article
Full-text available
This work concerns the preparation and characterization of RF-sputtered Cu films on cotton using a magnetron sputter source and a 99.995% purity Cu target at room temperature. Cotton fabric samples with sputtering times of 1, 2 and 4 min, at a discharge pressure of 1×10⁻² Torr and a target-to-sample distance of 8 cm, were used. The main...
Article
Proximity searching is the problem of retrieving, from a given database, those objects closest to a query. To avoid exhaustive searching, data structures called indexes are built on the database prior to serving queries. The curse of dimensionality is a well-known problem for indexes: in spaces with sufficiently concentrated distance histograms, no...
Article
Full-text available
The computational cost of transfer matrix methods for the Potts model is directly related to the problem of \textit{in how many ways two adjacent blocks of a lattice can be connected}. Answering this question leads to the generation of a combinatorial set of lattice configurations. This set defines the \textit{configuration space} of the problem,...
Article
Full-text available
Network packet tracing has been used for many different purposes during the last few decades, such as network software debugging, networking performance analysis, forensic investigation, and so on. Meanwhile, the size of packet traces becomes larger, as the speed of network rapidly increases. Thus, to handle huge amounts of traces, we need not only...
Conference Paper
Full-text available
Let $\mathcal{D} = \{T_1, T_2, \dots, T_D\}$ be a collection of $D$ string documents of $n$ characters in total, that are drawn from an alphabet set $\Sigma=[\sigma]$. The \emph{top-$k$ document retrieval problem} is to preprocess $\mathcal{D}$ into a data structure that, given a query $(P[1..p],k)$, can return the $k$ documents of $\mathcal{D}$ most relevant to pa...
Article
We consider the problem of preprocessing an array A[1..n] to answer range selection and range top-k queries. Given a query interval [i..j] and a value k, the former query asks for the position of the kth largest value in A[i..j], whereas the latter asks for the positions of all the k largest values in A[i..j]. We consider the encoding version of th...
Conference Paper
Given a collection of strings (called documents), the top-k document retrieval problem is that of, given a string pattern p, finding the k documents where p appears most often. This is a basic task in most information retrieval scenarios. The best current implementations require 20–30 bits per character (bpc) and k to 4k microseconds per query, or...
Conference Paper
Let \({\cal D}\) be a collection of string documents of n characters in total. The top-k document retrieval problem is to preprocess \({\cal D}\) into a data structure that, given a query (P,k), can return the k documents of \({\cal D}\) most relevant to pattern P. The relevance of a document d for a pattern P is given by a predefined ranking funct...
Article
We address the problem of indexing a collection D={T1,T2,...,TD} of D string documents of total length n, so that we can efficiently answer top-k queries: retrieve the k documents most relevant to a pattern P of length p given at query time. There exist linear-space data structures, that is, using O(n) words, that answer such queri...
Article
Full-text available
The {\em wavelet tree} is a flexible data structure that permits representing sequences $S[1,n]$ of symbols over an alphabet of size $\sigma$, within compressed space and supporting a wide range of operations on $S$. When $\sigma$ is significant compared to $n$, current wavelet tree representations incur noticeable space or time overheads. In th...
Article
We consider the problem of encoding range minimum queries (RMQs): given an array A[1..n] of distinct totally ordered values, to pre-process A and create a data structure that can answer the query RMQ(i,j), which returns the index containing the smallest element in A[i..j], without access to the array A at query time. We give a data structure whose...
Conference Paper
Full-text available
This work concerns the preparation and characterization of RF-sputtered Cu films on cotton using a magnetron sputter source and a 99.995% purity Cu target at room temperature. Cotton fabric samples with sputtering times of 1, 2 and 4 min, at a discharge pressure of 1×10⁻² Torr and a target-to-sample distance of 8 cm, were used. The main goal wa...
Conference Paper
Let \(\mathcal{D} = \{d_1, d_2, \ldots, d_D\}\) be a given set of D string documents of total length n. Our task is to index \(\mathcal{D}\) such that the k most relevant documents for an online query pattern P of length p can be retrieved efficiently. There exist linear space data structures of O(n) words for answering such queries in optimal O(p + k) time. In th...
Article
Full-text available
We consider the problem of encoding range minimum queries (RMQs): given an array A[1..n] of distinct totally ordered values, to pre-process A and create a data structure that can answer the query RMQ(i,j), which returns the index containing the smallest element in A[i..j], without access to the array A at query time. We give a data structure whose...
Conference Paper
Document listing is the problem of preprocessing a set of sequences, called documents, so that later, given a short string called the pattern, we retrieve the documents where the pattern appears. While optimal-time and linear-space solutions exist, the current emphasis is on reducing the space requirements. Current document listing solutions build...
Conference Paper
We consider the problem of retrieving the k documents from a collection of strings where a given pattern P appears most often. We show that, by representing the collection using a Compressed Suffix Array CSA, a data structure using the asymptotically optimal |CSA|+o(n) bits can answer queries in the time needed by CSA to find the suffix array inter...
Conference Paper
We study the problem of encoding the positions of the top-k elements of an array A[1..n] for a given parameter 1≤k≤n. Specifically, for any i and j, we wish to create a data structure that reports the positions of the largest k elements in A[i..j] in decreasing order, without accessing A at query time. This is a natural extension of the well-known encodi...
Article
Compressed representations have become effective to store and access large Web and social graphs, in order to support various graph querying and mining tasks. The existing representations exploit various typical patterns in those networks and provide basic navigation support. In this paper, we obtain unprecedented results by finding “dense subgraph...
Conference Paper
We introduce a new representation of the inverted index that performs faster ranked unions and intersections while using less space. Our index is based on the treap data structure, which allows us to intersect/merge the document identifiers while simultaneously thresholding by frequency, instead of the costlier two-step classical processing methods...