Alexandre P. Francisco's research while affiliated with Inesc-ID and other places
Publications (104)
Epidemiological surveillance and phylogenetic studies rely nowadays on processing and analysing huge volumes of data. Processing tasks consist of running and refining a series of intertwined computational tasks. And, despite the existence of several web applications for data processing and interactive visualization for phylogenetic studies, integrating...
Drawing on discussions about the manifestation of incivility in online news comments sections, our research operationalizes the concept of incivility and suggests a methodological approach that relies on manual and automated text analysis and regression analysis to assess its prevalence and identify its predictors. Relying on a data analysis of ove...
The brain’s functional networks can be assessed using imaging techniques like functional magnetic resonance imaging (fMRI) and electroencephalography (EEG). Recent studies have suggested a link between the dynamic functional connectivity (dFC) captured by these two modalities, but the exact relationship between their spatiotemporal organization is...
Energy forecasting covers a wide range of prediction problems in the utility industry, such as forecasting demand, generation, price, and power load over time horizons and different power levels. Short-term load forecasting allows the system operator to make important decisions during network management and planning, which represents an economic im...
This article presents a fast SIMD Hilbert space-filling curve generator that supports a new cache-oblivious blocking technique applied to the out-of-place transposition of general matrices. Matrix operations found in high performance computing libraries are usually parameterized based on host microprocessor specifications in order to minimi...
Computing the product of the (binary) adjacency matrix of a large graph with a real-valued vector is an important operation that lies at the heart of various graph analysis tasks, such as computing PageRank. In this paper, we show that some well-known webgraph and social graph compression formats are computation-friendly, in the sense that they all...
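As a point of reference for the operation named above, here is a minimal sketch of multiplying a binary adjacency matrix by a real-valued vector directly on plain adjacency lists, without materializing the matrix. It does not use any of the compression formats studied in the paper; the function name and the list-of-lists input layout are assumptions made for illustration.

```python
def adjacency_times_vector(adj, x):
    """Compute y = A x for a binary adjacency matrix A given as adjacency lists.

    adj[u] lists the out-neighbours v of node u (A[u][v] = 1), so
    y[u] is the sum of x[v] over those neighbours.
    Hypothetical helper, not the paper's compressed-format algorithm.
    """
    return [sum(x[v] for v in neighbours) for neighbours in adj]


# Tiny directed graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
example_adj = [[1, 2], [2], [0]]
example_y = adjacency_times_vector(example_adj, [1.0, 2.0, 3.0])
```

The cost is proportional to the number of edges, which is why a representation that can be traversed without full decompression matters for tasks such as PageRank.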
Given a pattern string p of size m and a text string t, the problem of order-preserving pattern matching (OPPM) is to find all substrings of t that satisfy one of the possible orderings defined by p. This problem is well studied and has several applications in time series analysis. However, given its strict nature, this model is unable to deal with i...
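The basic OPPM problem can be sketched with a naive scan, assuming the usual reading in which a window of t matches when its elements have the same relative order as p, and assuming that equal values share a rank. This is a quadratic-time illustration with hypothetical function names, not the method of the paper.

```python
def order_signature(window):
    """Map each element to its rank among the distinct values of the window."""
    ranks = {value: rank for rank, value in enumerate(sorted(set(window)))}
    return tuple(ranks[value] for value in window)


def oppm_naive(pattern, text):
    """Return start indices of windows of text with the same relative order as pattern."""
    m = len(pattern)
    target = order_signature(pattern)
    return [i for i in range(len(text) - m + 1)
            if order_signature(text[i:i + m]) == target]


# Pattern "low, high, middle" searched in a short numeric series.
example_matches = oppm_naive([1, 3, 2], [10, 20, 15, 30, 25, 5])
```

Each window is re-ranked from scratch here; efficient OPPM algorithms avoid that recomputation.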
Data are an important asset that the electric power industry has available today to support management decisions, excel in operational efficiency, and be more competitive. The advent of smart grids has increased power grid sensorization and so, too, the data availability. However, the inability to recognize the value of data beyond the siloed appl...
Graphs are found in a plethora of domains, including online social networks, the World Wide Web and the study of epidemics, to name a few. With the advent of greater volumes of information and the need for continuously updated results under temporal constraints, it is necessary to explore alternative approaches that further enable performance impro...
When we address phylogeny, telecommunication/electric networks, among other networks, we are often interested in studying measures that go beyond shortest path/fixed distance properties, for example if we want to know how strongly connected a network is, i.e., which links are fundamental to keep the network connected and which are redundant. In algorithms for p...
We address the problem of representing dynamic graphs using k²-trees. The k²-tree data structure is one of the succinct data structures proposed for representing static graphs, and binary relations in general. It relies on compact representations of bit vectors. Hence, by relying on compact representations of dynamic bit vectors, we can also repres...
From social contracts to climate agreements, individuals engage in groups that must collectively reach decisions with varying levels of equality and fairness. These dilemmas also pervade distributed artificial intelligence, in domains such as automated negotiation, conflict resolution, or resource allocation, which aim to engineer self-organized gr...
We consider the problem of identifying tandem scattered subsequences within a string. Our algorithm identifies a longest subsequence which occurs twice without overlap in a string. This algorithm is based on the Hunt-Szymanski algorithm; therefore, its performance improves if the string is not self-similar, which occurs naturally on strings over lar...
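The problem statement can be made concrete with a simple baseline, assuming that "occurs twice without overlap" means the first occurrence ends before the second begins. Under that assumption the answer is the best longest common subsequence (LCS) between a prefix and the remaining suffix, over all split points. The sketch below uses a textbook dynamic-programming LCS rather than the Hunt-Szymanski approach mentioned in the abstract, and its function names are hypothetical.

```python
def lcs_length(a, b):
    """Length of a longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def longest_tandem_scattered(s):
    """Length of a longest subsequence occurring twice in s without overlap.

    Tries every split point and compares the prefix with the suffix,
    which costs cubic time overall; illustration only.
    """
    return max((lcs_length(s[:i], s[i:]) for i in range(1, len(s))), default=0)


example_length = longest_tandem_scattered("abcabc")
```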
From employment contracts to climate agreements, individuals often engage in groups that must reach decisions with varying levels of fairness. These dilemmas also pervade AI, e.g. in automated negotiation, conflict resolution or resource allocation. As evidenced by the Ultimatum Game, payoff maximization is frequently at odds with fairness. Elicit...
The value of graph-based big data can be unlocked by exploring the topology and metrics of the networks they represent, and the computational approaches to this exploration take on many forms. For the use-case of performing global computations over a graph, it is first ingested into a graph processing system from one of many digital representations...
From social contracts to climate agreements, individuals engage in groups that must collectively reach decisions with varying levels of equality and fairness. These dilemmas also pervade Distributed Artificial Intelligence, in domains such as automated negotiation, conflict resolution or resource allocation. As evidenced by the well-known Ultimatum...
With growing exchanges of people and merchandise between countries, epidemics have become an issue of increasing importance and huge amounts of data are being collected every day. Hence, analyses that were usually run on personal computers and desktops are no longer feasible. It is now common to run such tasks in high-performance computing (HPC) en...
Evolutionary relationships between species are usually inferred through phylogenetic analysis, which provides phylogenetic trees computed from allelic profiles built by sequencing specific regions of the sequences and abstracting them to categorical indexes. With growing exchanges of people and merchandise, epidemics have become increasingly import...
Parity check matrices (PCMs) are used to define linear error correcting codes and ensure reliable information transmission over noisy channels. The set of codewords of such a code is the null space of this binary matrix. We consider the problem of minimizing the number of one-entries in parity-check matrices. In the maximum-likelihood (ML) decoding...
Typing methods are widely used in the surveillance of infectious diseases, outbreak investigations and studies of the natural history of an infection. Moreover, their use is becoming standard, in particular with the introduction of high-throughput sequencing. On the other hand, the data being generated are massive and many algorithms have been prop...
We consider the problem of identifying tandem scattered subsequences within a string. Our algorithm identifies a longest subsequence which occurs twice without overlap in a string. This algorithm is based on the Hunt-Szymanski algorithm; therefore, its performance improves if the string is not self-similar. This occurs naturally on strings over larg...
Typing methods are widely used in the surveillance of infectious diseases, outbreak investigations and studies of the natural history of an infection. And their use is becoming standard, in particular with the introduction of High Throughput Sequencing (HTS). On the other hand, the data being generated is massive and many algorithms have been propo...
We consider the complexity properties of three modern puzzle games: Hexiom, Cut the Rope and Back to Bed. The complexity of games plays an important role in the type of experience they provide to players. Back to Bed is shown to be PSPACE-hard and the first two are shown to be NP-hard. These results give further insight into the structure of these games...
We consider the problem of updating the information about multiple longest common subsequences. Subsequences of this kind are used to highlight information that is shared across several information sequences, and are therefore used extensively, notably in bioinformatics and computational genomics. In this paper we propose a way to maintain this inform...
Lempel-Ziv is an easy-to-compute member of a wide family of so-called macro schemes; it restricts pointers to go in one direction only. Optimal bidirectional macro schemes are NP-complete to find, but they may provide much better compression on highly repetitive sequences. We consider the problem of approximating optimal bidirectional macro schemes...
We address the problem of representing dynamic graphs using k²-trees. The k²-tree data structure is one of the succinct data structures proposed for representing static graphs, and binary relations in general. It relies on compact representations of bit vectors. Hence, by relying on compact representations of dynamic bit vectors, we can also repres...
Large graphs can be processed with single high-memory or distributed systems, focusing on querying the graph or executing algorithms using high-level APIs. For systems focused on processing graphs, common use-cases consist in executing algorithms such as PageRank or community detection on top of distributed systems that read from storage (local or...
We address the problem of representing dynamic graphs using $k^2$-trees. The $k^2$-tree data structure is one of the succinct data structures proposed for representing static graphs, and binary relations in general. It relies on compact representations of bit vectors. Hence, by relying on compact representations of dynamic bit vectors, we can also...
With the advent of high-throughput sequencing methods, new ways of visualizing and analyzing increasing amounts of data are needed. Although some software already exists, it does not scale well or requires advanced skills to be useful in phylogenetics. The aim of this thesis was to implement three community finding algorithms – Louvain, Infomap and...
Social media often reveals a complex interplay between positive and negative ties. And real online social networks are proven to show high social balance. Yet, the origin of such complex patterns of interaction remains largely elusive. In this work we study how third parties may sway our perception of others. We build a model of peer-influence rely...
With the advent of high-throughput sequencing methods, new ways of visualizing and analyzing increasing amounts of data are needed. Although some software already exists, it does not scale well or requires advanced skills to be useful in phylogenetics. The aim of this thesis was to implement three community finding algorithms – Louvain, Infomap and...
Given an indeterminate string pattern $p$ and an indeterminate string text $t$, the problem of order-preserving pattern matching with character uncertainties ($\mu$OPPM) is to find all substrings of $t$ that satisfy one of the possible orderings defined by $p$. When the text and pattern are determinate strings, we are in the presence of the well-st...
Measuring the inner characteristics of financial market risks has proven to be key to understanding what promotes financial instability and volatility swings. Advances in complex network analysis have shown the capability to characterize the specificities of financial networks, ranging from credit networks, volatility networks, and supply-ch...
RNA-Seq is a Next-Generation Sequencing (NGS) protocol for sequencing the messenger RNA in a cell and generates millions of short sequence fragments, reads, in a single run. These reads can be used to measure levels of gene expression and to identify novel splice variants of genes. One of the critical steps in an RNA-Seq experiment is mapping NGS r...
Graphs are found in a plethora of domains, including online social networks, the World Wide Web and the study of epidemics, to name a few. With the advent of greater volumes of information and the need for continuously updated results under temporal constraints, it is necessary to explore novel approaches that further enable performance improvement...
Current methods struggle to reconstruct and visualize the genomic relationships of large numbers of bacterial genomes. GrapeTree facilitates the analyses of large numbers of allelic profiles by a static "GrapeTree Layout" algorithm which supports interactive visualizations of large trees within a web browser window. GrapeTree also implements a novel...
The understanding of bacterial population genetics and evolution is crucial in epidemic outbreak studies and pathogen surveillance. However, all epidemiological studies are limited to their sampling capacities which, by being usually biased or limited due to economic constraints, can hamper the real knowledge of the bacterial population structure o...
Community networks (CNs) have seen an increase in the last fifteen years. Their members contact nodes which operate Internet proxies, web servers, user file storage and video streaming services, to name a few. Detecting communities of nodes with properties (such as co-location) and assessing node eligibility for service placement is thus a key-fact...
Ischemic stroke is a leading cause of disability and death worldwide among adults. The individual prognosis after stroke is extremely dependent on treatment decisions physicians take during the acute phase. In the last five years, several scores such as the ASTRAL, DRAGON, and THRIVE have been proposed as tools to help physicians predict the patien...
Background: Microbial typing methods are commonly used to study the relatedness of bacterial strains. Sequence-based typing methods are a gold standard for epidemiological surveillance due to the inherent portability of sequence and allelic profile data, fast analysis times and their capacity to create common nomenclatures for strains or clones. Thi...
We consider the problem of uniformly generating a spanning tree of a connected undirected graph. This process is useful to compute statistics, namely for phylogenetic trees. We describe a Markov chain for producing these trees. For cycle graphs we prove that this approach significantly outperforms existing algorithms. For general graphs we obtain...
Current methods struggle to reconstruct and visualise the genomic relationships of ≥100,000 bacterial genomes. GrapeTree facilitates the analyses of allelic profiles from 10,000’s of core genomes within a web browser window. GrapeTree implements a novel minimum spanning tree algorithm to reconstruct genetic relationships despite missing data togeth...
Computing the product of the adjacency (binary) matrix of a large graph with a real-valued vector is an important operation that lies at the heart of various graph analysis tasks, such as computing PageRank. In this paper we show that some well-known Web and social graph compression formats are computation-friendly, in the sense that they all...
Biosciences have been revolutionised by NGS technologies in recent years, leading to new perspectives in medical, industrial and environmental applications. And although our motivation comes from biosciences, the following is true for many areas of science: published results are usually hard to reproduce, delaying the adoption of new methodologies an...
Large-scale population genetics studies are fundamental for phylogenetic and epidemiology analysis of pathogens. And the validation of both evolutionary models and methods used in such studies depends on large data analysis. It is, however, unrealistic to work with large datasets as only rather small samples of the real pathogen population are avail...
Position weight matrices (PWMs) are the standard way to model binding site affinities in bioinformatics. However, they assume that symbol occurrences are position independent and, hence, they do not take into account symbol co-occurrence at different sequence positions. To address this problem, we propose to construct finite-state machines (FSMs)...
Community network micro-clouds (CNMCs) have seen an increase in the last fifteen years. Their members contact nodes which operate Internet proxies, web servers, user file storage and video streaming services, to name a few. Detecting communities of nodes with properties (such as co-location) and assessing node eligibility for service placement is t...
Graphs may be used to represent many different problem domains -- a concrete example is that of detecting communities in social networks, which are represented as graphs. With big data and more sophisticated applications becoming widespread in recent years, graph processing has seen an emergence of requirements pertaining to data volume and volatility...
Social media often reveals a complex interplay between positive and negative ties. Yet, the origin of such complex patterns of interaction remains largely elusive. In this paper we study how third parties may sway our perception of others. Our model relies on the analysis of all triadic relations taking into account the influence and relations with...
Biosciences have been revolutionized by next generation sequencing (NGS) technologies in recent years, leading to new perspectives in medical, industrial and environmental applications. And although our motivation comes from biosciences, the following is true for many areas of science: published results are usually hard to reproduce either because da...
High Throughput Sequencing provides a cost effective means of generating high resolution data for hundreds or even thousands of strains, and is rapidly superseding methodologies based on a few genomic loci. The wealth of genomic data deposited on public databases such as Sequence Read Archive/European Nucleotide Archive provides a powerful resource...
In this work we address the simulation of genetic evolution of bacterial populations in the presence of host contact networks. In particular we consider traditional evolution models combined with well mixed and not well mixed host populations, the latter being more realistic. To our knowledge this is the first approach to consider not well mixed host po...
We extend the functionality of the quick hypervolume (QHV) algorithm. Given a set of d-dimensional points this algorithm determines the hypervolume of the dominated space, a useful measure for multiobjective evolutionary algorithms (MOEAs). We extend QHV in two ways: adapt it to compute the exclusive hypervolume of each point, and speed it up with...
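The two quantities named above, the hypervolume of the dominated space and the exclusive hypervolume of each point, can be illustrated in two dimensions with a simple sweep. This is not the QHV algorithm. It is a sketch that assumes maximization, a reference point at the origin by default, and hypothetical function names.

```python
def hypervolume_2d(points, ref=(0.0, 0.0)):
    """Area dominated by 2-D points (maximization) relative to ref.

    Sweeps the points by decreasing x and adds the new strip each point
    contributes above the best y seen so far.
    """
    rx, ry = ref
    area = 0.0
    best_y = ry
    for x, y in sorted(points, reverse=True):
        if x > rx and y > best_y:
            area += (x - rx) * (y - best_y)
            best_y = y
    return area


def exclusive_hypervolume_2d(points, ref=(0.0, 0.0)):
    """Area dominated only by each point: total minus the total without it."""
    total = hypervolume_2d(points, ref)
    return [total - hypervolume_2d(points[:i] + points[i + 1:], ref)
            for i in range(len(points))]


example_front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
example_total = hypervolume_2d(example_front)
example_exclusive = exclusive_hypervolume_2d(example_front)
```

Recomputing the total once per point is the simplest way to get exclusive contributions, and it is also the redundancy that a dedicated exclusive-hypervolume algorithm aims to avoid.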
High-throughput sequencing methods generated allele and single nucleotide polymorphism information for thousands of bacterial strains that are publicly available in online repositories and created the possibility of generating similar information for hundreds to thousands of strains more in a single study. Minimum spanning tree analysis of allelic...
We address the publication of a large academic information dataset while ensuring privacy. We evaluate anonymization techniques achieving the intended protection, while retaining the utility of the anonymized data. The published data can help to infer behaviors and study interaction patterns in an academic population. These could subsequently be us...
We present a study about spanning edge betweenness, an edge-based metric for complex network analysis that is defined as the probability of an edge being part of a minimum spanning tree. This probability reflects how redundant an edge is with respect to the connectivity of a given network and, hence, its value gives information about the network top...
Web-based social relations mirror several known phenomena identified by Social Sciences, such as Homophily. Social circles are inferable from those relations and there are already solutions to find the underlying sentiment of social interactions. We present an empirical study that combines existing Graph Clustering and Sentiment Analysis techniques...
In this paper we present a study about spanning edge betweenness, an edge-based metric for complex network analysis that is defined as the probability of an edge being part of a minimum spanning tree. This probability reflects how redundant an edge is with respect to the connectivity of a given network and, hence, its value gives information abou...
Dynamic networks, in particular Delay Tolerant Networks (DTNs), are characterized by a lack of end-to-end paths at any given instant. Because of that, DTN routing protocols employ a store-carry-and-forward approach, holding messages until a suitable node to forward them is found. But the selection of the best forwarding node poses a considerable c...
Trees, including minimum spanning trees (MSTs), are commonly used in phylogenetic studies. But, for the research community, it may be unclear that the presented tree is just a hypothesis, chosen from among many possible alternatives. In this scenario, it is important to quantify our confidence in both the trees and the branches/edges included in su...
Several methods have previously been proposed for mapping and enabling the understanding of the brain’s organization. A widely used class of such methods consists in reconstructing brain functional connectivity networks from imaging data, such as fMRI data, which is then analysed with appropriate graph theory algorithms. If the imaging datasets are...
Quick HyperVolume Implementations, including parallel versions and the exclusive version. Quick HyperVolume is an algorithm that computes the HyperVolume occupied by a set of hyper-rectangles that share the "lower leftmost" vertex. A full description of the non-parallel and non-exclusive algorithm is published in IEEE Transactions on Evolutionary...
We address the publication of a large academic information dataset addressing privacy issues. We evaluate anonymization techniques achieving the intended protection, while retaining the utility of the anonymized data. The released data could help infer behaviors and subsequently find solutions for daily planning activities, such as cafeteria attend...
Bacterial identification and characterization at subspecies level is commonly known as Microbial Typing. Currently, these methodologies are fundamental tools in Clinical Microbiology and bacterial population genetics studies to track outbreaks and to study the dissemination and evolution of virulence or pathogenicity factors and antimicrobial resis...
The YEASTRACT (http://www.yeastract.com) information system is a tool for the analysis and prediction of transcription regulatory associations in Saccharomyces cerevisiae. Last updated in June 2013, this database contains over 200 000 regulatory associations between transcription factors (TFs) and target genes, including 326 DNA binding sites for 1...
Amyotrophic Lateral Sclerosis is a devastating neurodegenerative disease characterized by a usually fast progression of muscular denervation, generally leading to death in a few years from onset. In this context, any significant improvement of the patient's life expectancy and quality is of major relevance. Several studies have been made to address...
We present a new edge betweenness metric for undirected and weighted graphs. This metric is defined as the fraction of minimum spanning trees where a given edge is present and it was motivated by the necessity of evaluating phylogenetic trees. Moreover we provide results and methods concerning the exact computation of this metric based on the well...
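The definition above, the fraction of minimum spanning trees that contain a given edge, can be checked by brute force on a very small graph. The sketch below enumerates every spanning tree, so it is exponential in the number of edges and is not the exact computation method the abstract refers to. The function names, the `(u, v, weight)` edge format with distinct edges, and the use of integer weights for exact comparison are assumptions made for illustration.

```python
from itertools import combinations


def is_spanning_tree(n, edges):
    """True if the given n - 1 edges are acyclic on nodes 0..n-1 (union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v, _ in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False  # this edge would close a cycle
        parent[root_u] = root_v
    return True  # n - 1 acyclic edges connect all n nodes


def spanning_edge_betweenness(n, edges):
    """Map each edge to the fraction of minimum spanning trees containing it."""
    trees = [t for t in combinations(edges, n - 1) if is_spanning_tree(n, t)]
    if not trees:
        return {}
    best = min(sum(w for _, _, w in t) for t in trees)
    msts = [t for t in trees if sum(w for _, _, w in t) == best]
    return {e: sum(e in t for t in msts) / len(msts) for e in edges}


# Triangle with equal weights: three minimum spanning trees, each edge in two.
example_seb = spanning_edge_betweenness(3, [(0, 1, 1), (1, 2, 1), (0, 2, 1)])
```

In the equal-weight triangle every edge gets the same value, reflecting that each one is equally replaceable; making one edge heavier drives its value to zero and the other two to one.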
The human interaction through the web generates both implicit and explicit knowledge. An example of an implicit contribution is searching, as people contribute with their knowledge by clicking on retrieved documents. When this information is available, an important and interesting challenge is to extract relations from query logs, and, in particula...