Alexandre P. Francisco's research while affiliated with Inesc-ID and other places

Publications (104)

Preprint
Full-text available
Epidemiological surveillance and phylogenetic studies nowadays rely on processing and analysing huge volumes of data. Processing consists of running and refining a series of intertwined computational tasks. And, despite the existence of several web applications for data processing and interactive visualization for phylogenetic studies, integrating...
Article
Full-text available
Drawing on discussions about the manifestation of incivility in online news comments sections, our research operationalizes the concept of incivility and suggests a methodological approach that relies on manual and automated text analysis and regression analysis to assess its prevalence and identify its predictors. Relying on a data analysis of ove...
Chapter
Full-text available
The brain’s functional networks can be assessed using imaging techniques like functional magnetic resonance imaging (fMRI) and electroencephalography (EEG). Recent studies have suggested a link between the dynamic functional connectivity (dFC) captured by these two modalities, but the exact relationship between their spatiotemporal organization is...
Article
Energy forecasting covers a wide range of prediction problems in the utility industry, such as forecasting demand, generation, price, and power load over time horizons and different power levels. Short-term load forecasting allows the system operator to make important decisions during network management and planning, which represents an economic im...
Article
This article presents a fast SIMD Hilbert space-filling curve generator that supports a new cache-oblivious blocking-scheme technique applied to the out-of-place transposition of general matrices. Matrix operations found in high performance computing libraries are usually parameterized based on host microprocessor specifications in order to minimi...
Article
Full-text available
Computing the product of the (binary) adjacency matrix of a large graph with a real-valued vector is an important operation that lies at the heart of various graph analysis tasks, such as computing PageRank. In this paper, we show that some well-known webgraph and social graph compression formats are computation-friendly, in the sense that they all...
Article
Given a pattern string p of size m and a text string t, the problem of order-preserving pattern matching (OPPM) is to find all substrings of t that satisfy one of the possible orderings defined by p. This problem is well-studied and has several applications in time series analysis. However, given its strict nature, this model is unable to deal with i...
Article
Full-text available
Data are an important asset that the electric power industry has available today to support management decisions, excel in operational efficiency, and be more competitive. The advent of smart grids has increased power grid sensorization and so, too, the data availability. However, the inability to recognize the value of data beyond the siloed appl...
Article
Full-text available
Graphs are found in a plethora of domains, including online social networks, the World Wide Web and the study of epidemics, to name a few. With the advent of greater volumes of information and the need for continuously updated results under temporal constraints, it is necessary to explore alternative approaches that further enable performance impro...
Conference Paper
Full-text available
When we address phylogeny, telecommunication/electric networks, among other networks, we are often interested in studying measures that go beyond shortest path/fixed distance properties, for instance when we want to know how strongly connected a network is, i.e., which links are fundamental to keep the network connected and which are redundant. In algorithms for p...
Article
We address the problem of representing dynamic graphs using k²-trees. The k²-tree data structure is one of the succinct data structures proposed for representing static graphs, and binary relations in general. It relies on compact representations of bit vectors. Hence, by relying on compact representations of dynamic bit vectors, we can also repres...
Article
Full-text available
From social contracts to climate agreements, individuals engage in groups that must collectively reach decisions with varying levels of equality and fairness. These dilemmas also pervade distributed artificial intelligence, in domains such as automated negotiation, conflict resolution, or resource allocation, which aim to engineer self-organized gr...
Article
Full-text available
We consider the problem of identifying tandem scattered subsequences within a string. Our algorithm identifies a longest subsequence which occurs twice without overlap in a string. This algorithm is based on the Hunt-Szymanski algorithm; therefore, its performance improves if the string is not self-similar, which occurs naturally on strings over lar...
Conference Paper
Full-text available
From employment contracts to climate agreements, individuals often engage in groups that must reach decisions with varying levels of fairness. These dilemmas also pervade AI, e.g. in automated negotiation, conflict resolution or resource allocation. As evidenced by the Ultimatum Game, payoff maximization is frequently at odds with fairness. Elicit...
Article
Full-text available
The value of graph-based big data can be unlocked by exploring the topology and metrics of the networks they represent, and the computational approaches to this exploration take on many forms. For the use-case of performing global computations over a graph, it is first ingested into a graph processing system from one of many digital representations...
Preprint
Full-text available
From social contracts to climate agreements, individuals engage in groups that must collectively reach decisions with varying levels of equality and fairness. These dilemmas also pervade Distributed Artificial Intelligence, in domains such as automated negotiation, conflict resolution or resource allocation. As evidenced by the well-known Ultimatum...
Preprint
Full-text available
With growing exchanges of people and merchandise between countries, epidemics have become an issue of increasing importance and huge amounts of data are being collected every day. Hence, analyses that were usually run on personal computers and desktops are no longer feasible. It is now common to run such tasks in high-performance computing (HPC) en...
Preprint
Full-text available
Evolutionary relationships between species are usually inferred through phylogenetic analysis, which provides phylogenetic trees computed from allelic profiles built by sequencing specific regions of the sequences and abstracting them to categorical indexes. With growing exchanges of people and merchandise, epidemics have become increasingly import...
Article
Parity check matrices (PCMs) are used to define linear error correcting codes and ensure reliable information transmission over noisy channels. The set of codewords of such a code is the null space of this binary matrix. We consider the problem of minimizing the number of one-entries in parity-check matrices. In the maximum-likelihood (ML) decoding...
Article
Typing methods are widely used in the surveillance of infectious diseases, outbreak investigation and studies of the natural history of an infection. Moreover, their use is becoming standard, in particular with the introduction of high-throughput sequencing. On the other hand, the data being generated are massive and many algorithms have been prop...
Preprint
We consider the problem of identifying tandem scattered subsequences within a string. Our algorithm identifies a longest subsequence which occurs twice without overlap in a string. This algorithm is based on the Hunt-Szymanski algorithm; therefore, its performance improves if the string is not self-similar. This occurs naturally on strings over larg...
Preprint
Full-text available
Typing methods are widely used in the surveillance of infectious diseases, outbreak investigation and studies of the natural history of an infection. And their use is becoming standard, in particular with the introduction of High Throughput Sequencing (HTS). On the other hand, the data being generated is massive and many algorithms have been propo...
Preprint
We consider the complexity properties of three modern puzzle games: Hexiom, Cut the Rope and Back to Bed. The complexity of games plays an important role in the type of experience they provide to players. Back to Bed is shown to be PSPACE-hard and the first two are shown to be NP-hard. These results give further insight into the structure of these games...
Preprint
Parity check matrices (PCMs) are used to define linear error correcting codes and ensure reliable information transmission over noisy channels. The set of codewords of such a code is the null space of this binary matrix. We consider the problem of minimizing the number of one-entries in parity-check matrices. In the maximum-likelihood (ML) decoding...
Preprint
We consider the problem of updating the information about multiple longest common subsequences. This kind of subsequence is used to highlight information that is shared across several information sequences; it is therefore used extensively, namely in bioinformatics and computational genomics. In this paper we propose a way to maintain this inform...
Preprint
Lempel-Ziv is an easy-to-compute member of a wide family of so-called macro schemes; it restricts pointers to go in one direction only. Optimal bidirectional macro schemes are NP-complete to find, but they may provide much better compression on highly repetitive sequences. We consider the problem of approximating optimal bidirectional macro schemes...
Conference Paper
Full-text available
We address the problem of representing dynamic graphs using k²-trees. The k²-tree data structure is one of the succinct data structures proposed for representing static graphs, and binary relations in general. It relies on compact representations of bit vectors. Hence, by relying on compact representations of dynamic bit vectors, we can also repres...
Preprint
Large graphs can be processed with single high-memory or distributed systems, focusing on querying the graph or executing algorithms using high-level APIs. For systems focused on processing graphs, common use-cases consist in executing algorithms such as PageRank or community detection on top of distributed systems that read from storage (local or...
Preprint
We address the problem of representing dynamic graphs using k²-trees. The k²-tree data structure is one of the succinct data structures proposed for representing static graphs, and binary relations in general. It relies on compact representations of bit vectors. Hence, by relying on compact representations of dynamic bit vectors, we can also...
Chapter
With the advent of high-throughput sequencing methods, new ways of visualizing and analyzing increasingly large amounts of data are needed. Although some software already exists, it does not scale well or requires advanced skills to be useful in phylogenetics. The aim of this thesis was to implement three community finding algorithms – Louvain, Infomap and...
Poster
Full-text available
Social media often reveals a complex interplay between positive and negative ties. And real online social networks are proven to show high social balance. Yet, the origin of such complex patterns of interaction remains largely elusive. In this work we study how third parties may sway our perception of others. We build a model of peer-influence rely...
Presentation
With the advent of high-throughput sequencing methods, new ways of visualizing and analyzing increasingly large amounts of data are needed. Although some software already exists, it does not scale well or requires advanced skills to be useful in phylogenetics. The aim of this thesis was to implement three community finding algorithms – Louvain, Infomap and...
Preprint
Given an indeterminate string pattern p and an indeterminate string text t, the problem of order-preserving pattern matching with character uncertainties (μOPPM) is to find all substrings of t that satisfy one of the possible orderings defined by p. When the text and pattern are determinate strings, we are in the presence of the well-st...
Chapter
Full-text available
Measuring the inner characteristics of financial market risks has proven to be key to understanding what promotes financial instability and volatility swings. Advances in complex network analysis have shown the capability to characterize the specificities of financial networks, ranging from credit networks, volatility networks, and supply-ch...
Article
Full-text available
RNA-Seq is a Next-Generation Sequencing (NGS) protocol for sequencing the messenger RNA in a cell and generates millions of short sequence fragments, reads, in a single run. These reads can be used to measure levels of gene expression and to identify novel splice variants of genes. One of the critical steps in an RNA-Seq experiment is mapping NGS r...
Preprint
Graphs are found in a plethora of domains, including online social networks, the World Wide Web and the study of epidemics, to name a few. With the advent of greater volumes of information and the need for continuously updated results under temporal constraints, it is necessary to explore novel approaches that further enable performance improvement...
Article
Full-text available
Current methods struggle to reconstruct and visualize the genomic relationships of large numbers of bacterial genomes. GrapeTree facilitates the analyses of large numbers of allelic profiles by a static "GrapeTree Layout" algorithm which supports interactive visualizations of large trees within a web browser window. GrapeTree also implements a novel...
Article
The understanding of bacterial population genetics and evolution is crucial in epidemic outbreak studies and pathogen surveillance. However, all epidemiological studies are limited by their sampling capacities, which, being usually biased or limited due to economic constraints, can hamper real knowledge of the bacterial population structure o...
Conference Paper
Full-text available
Community networks (CNs) have seen an increase in the last fifteen years. Their members contact nodes which operate Internet proxies, web servers, user file storage and video streaming services, to name a few. Detecting communities of nodes with properties (such as co-location) and assessing node eligibility for service placement is thus a key-fact...
Article
Full-text available
Ischemic stroke is a leading cause of disability and death worldwide among adults. The individual prognosis after stroke is extremely dependent on treatment decisions physicians take during the acute phase. In the last five years, several scores such as the ASTRAL, DRAGON, and THRIVE have been proposed as tools to help physicians predict the patien...
Article
Full-text available
Background: Microbial typing methods are commonly used to study the relatedness of bacterial strains. Sequence-based typing methods are a gold standard for epidemiological surveillance due to the inherent portability of sequence and allelic profile data, fast analysis times and their capacity to create common nomenclatures for strains or clones. Thi...
Article
Full-text available
We consider the problem of uniformly generating a spanning tree of a connected undirected graph. This process is useful to compute statistics, namely for phylogenetic trees. We describe a Markov chain for producing these trees. For cycle graphs we prove that this approach significantly outperforms existing algorithms. For general graphs we obtain...
Preprint
Full-text available
Current methods struggle to reconstruct and visualise the genomic relationships of ≥100,000 bacterial genomes. GrapeTree facilitates the analyses of allelic profiles from 10,000’s of core genomes within a web browser window. GrapeTree implements a novel minimum spanning tree algorithm to reconstruct genetic relationships despite missing data togeth...
Article
Computing the product of the adjacency (binary) matrix of a large graph with a real-valued vector is an important operation that lies at the heart of various graph analysis tasks, such as computing PageRank. In this paper we show that some well-known Web and social graph compression formats are computation-friendly, in the sense that they all...
Conference Paper
Biosciences have been revolutionised by NGS technologies in recent years, leading to new perspectives in medical, industrial and environmental applications. And although our motivation comes from biosciences, the following is true for many areas of science: published results are usually hard to reproduce, delaying the adoption of new methodologies an...
Conference Paper
Typing methods are widely used in the surveillance of infectious diseases, outbreak investigation and studies of the natural history of an infection. And their use is becoming standard, in particular with the introduction of High Throughput Sequencing (HTS). On the other hand, the data being generated is massive and many algorithms have been propo...
Conference Paper
Full-text available
Large-scale population genetics studies are fundamental for phylogenetic and epidemiology analysis of pathogens. And the validation of both evolutionary models and methods used in such studies depends on large data analysis. It is, however, unrealistic to work with large datasets as only rather small samples of the real pathogen population are avail...
Preprint
Full-text available
Position weight matrices (PWMs) are the standard way to model binding site affinities in bioinformatics. However, they assume that symbol occurrences are position independent and, hence, they do not take into account symbol co-occurrence at different sequence positions. To address this problem, we propose to construct finite-state machines (FSMs)...
Preprint
Full-text available
Community network micro-clouds (CNMCs) have seen an increase in the last fifteen years. Their members contact nodes which operate Internet proxies, web servers, user file storage and video streaming services, to name a few. Detecting communities of nodes with properties (such as co-location) and assessing node eligibility for service placement is t...
Article
Full-text available
Graphs may be used to represent many different problem domains; a concrete example is that of detecting communities in social networks, which are represented as graphs. With big data and more sophisticated applications becoming widespread in recent years, graph processing has seen an emergence of requirements pertaining to data volume and volatility...
Conference Paper
Full-text available
Social media often reveals a complex interplay between positive and negative ties. Yet, the origin of such complex patterns of interaction remains largely elusive. In this paper we study how third parties may sway our perception of others. Our model relies on the analysis of all triadic relations taking into account the influence and relations with...
Article
Full-text available
Biosciences have been revolutionized by next generation sequencing (NGS) technologies in recent years, leading to new perspectives in medical, industrial and environmental applications. And although our motivation comes from biosciences, the following is true for many areas of science: published results are usually hard to reproduce either because da...
Article
Full-text available
High Throughput Sequencing provides a cost effective means of generating high resolution data for hundreds or even thousands of strains, and is rapidly superseding methodologies based on a few genomic loci. The wealth of genomic data deposited on public databases such as Sequence Read Archive/European Nucleotide Archive provides a powerful resource...
Poster
Full-text available
In this work we address the simulation of genetic evolution of bacterial populations in the presence of host contact networks. In particular, we consider traditional evolution models combined with well mixed and not well mixed host populations, the latter being more realistic. To our knowledge, this is the first approach to consider not well mixed host po...
Article
Full-text available
We extend the functionality of the quick hypervolume (QHV) algorithm. Given a set of d-dimensional points this algorithm determines the hypervolume of the dominated space, a useful measure for multiobjective evolutionary algorithms (MOEAs). We extend QHV in two ways: adapt it to compute the exclusive hypervolume of each point, and speed it up with...
Article
Full-text available
High-throughput sequencing methods have generated allele and single nucleotide polymorphism information for thousands of bacterial strains that are publicly available in online repositories, and have created the possibility of generating similar information for hundreds to thousands more strains in a single study. Minimum spanning tree analysis of allelic...
Article
We address the publication of a large academic information dataset while ensuring privacy. We evaluate anonymization techniques achieving the intended protection, while retaining the utility of the anonymized data. The published data can help to infer behaviors and study interaction patterns in an academic population. These could subsequently be us...
Poster
Full-text available
We present a study about spanning edge betweenness, an edge-based metric for complex network analysis that is defined as the probability of an edge being part of a minimum spanning tree. This probability reflects how redundant an edge is with respect to the connectivity of a given network and, hence, its value gives information about the network top...
Chapter
Web-based social relations mirror several known phenomena identified by Social Sciences, such as Homophily. Social circles are inferable from those relations and there are already solutions to find the underlying sentiment of social interactions. We present an empirical study that combines existing Graph Clustering and Sentiment Analysis techniques...
Chapter
Full-text available
In this paper we present a study about spanning edge betweenness, an edge-based metric for complex network analysis that is defined as the probability of an edge being part of a minimum spanning tree. This probability reflects how redundant an edge is with respect to the connectivity of a given network and, hence, its value gives information abou...
Article
Dynamic networks, in particular Delay Tolerant Networks (DTNs), are characterized by a lack of end-to-end paths at any given instant. Because of that, DTN routing protocols employ a store-carry-and-forward approach, holding messages until a suitable node to forward them is found. But the selection of the best forwarding node poses a considerable c...
Article
Full-text available
Trees, including minimum spanning trees (MSTs), are commonly used in phylogenetic studies. But, for the research community, it may be unclear that the presented tree is just a hypothesis, chosen from among many possible alternatives. In this scenario, it is important to quantify our confidence in both the trees and the branches/edges included in su...
Article
Several methods have previously been proposed for mapping and enabling the understanding of the brain’s organization. A widely used class of such methods consists in reconstructing brain functional connectivity networks from imaging data, such as fMRI data, which is then analysed with appropriate graph theory algorithms. If the imaging datasets are...
Code
Quick HyperVolume Implementations, including parallel versions and the exclusive version. Quick HyperVolume is an algorithm that computes the HyperVolume occupied by a set of hyper-rectangles that share the "lower leftmost" vertex. A full description of the non-parallel and non-exclusive algorithm is published in IEEE Transactions on Evolutionary...
Conference Paper
We address the publication of a large academic information dataset while addressing privacy issues. We evaluate anonymization techniques achieving the intended protection, while retaining the utility of the anonymized data. The released data could help infer behaviors and subsequently find solutions for daily planning activities, such as cafeteria attend...
Article
Full-text available
Bacterial identification and characterization at subspecies level is commonly known as Microbial Typing. Currently, these methodologies are fundamental tools in Clinical Microbiology and bacterial population genetics studies to track outbreaks and to study the dissemination and evolution of virulence or pathogenicity factors and antimicrobial resis...
Article
Full-text available
The YEASTRACT (http://www.yeastract.com) information system is a tool for the analysis and prediction of transcription regulatory associations in Saccharomyces cerevisiae. Last updated in June 2013, this database contains over 200 000 regulatory associations between transcription factors (TFs) and target genes, including 326 DNA binding sites for 1...
Conference Paper
Full-text available
Amyotrophic Lateral Sclerosis is a devastating neurodegenerative disease characterized by a usually fast progression of muscular denervation, generally leading to death in a few years from onset. In this context, any significant improvement of the patient's life expectancy and quality is of major relevance. Several studies have been made to address...
Conference Paper
Full-text available
We present a new edge betweenness metric for undirected and weighted graphs. This metric is defined as the fraction of minimum spanning trees where a given edge is present and it was motivated by the necessity of evaluating phylogenetic trees. Moreover, we provide results and methods concerning the exact computation of this metric based on the well...
Article
Full-text available
Human interaction through the web generates both implicit and explicit knowledge. An example of an implicit contribution is searching, as people contribute their knowledge by clicking on retrieved documents. When this information is available, an important and interesting challenge is to extract relations from query logs, and, in particula...