Giovanni Felici

Italian National Research Council | CNR · Institute for Systems Analysis and Computer Science "Antonio Ruberti" IASI

PhD

About

127
Publications
28,349
Reads
2,424
Citations
Additional affiliations
September 1995 - July 1996
The University of Texas at Dallas
Position
  • Research Assistant
January 1997 - present
Italian National Research Council
Position
  • Senior Researcher

Publications (127)
Article
The present study aims to clarify the role of the fraction of patients under antiretroviral therapy (ART) achieving viral suppression (VS) (i.e. having plasma viral load below the detectability threshold) on the human immunodeficiency virus (HIV) epidemic in Italy. Based on the hypothesis that VS makes the virus untransmittable, we extend a previou...
Article
Full-text available
Background In the Next Generation Sequencing (NGS) era a large amount of biological data is being sequenced, analyzed, and stored in many public databases, whose interoperability is often required to allow an enhanced accessibility. The combination of heterogeneous NGS genomic data is an open challenge: the analysis of data from different experimen...
Article
Full-text available
Background The high growth of Next Generation Sequencing data currently demands new knowledge extraction methods. In particular, the RNA sequencing gene expression experimental technique stands out for case-control studies on cancer, which can be addressed with supervised machine learning techniques able to extract human interpretable models compo...
Chapter
Thanks to Next Generation Sequencing (NGS) techniques, publicly available genomic data on cancer are growing quickly. Indeed, the largest public cancer database, The Cancer Genome Atlas (TCGA), contains huge amounts of biomedical big data to be analyzed with advanced knowledge extraction methods. In this work, we focus on the NGS experiment of...
Article
Full-text available
In this paper we consider a particular graph-optimization problem. Given an edge-colored graph and a set of constraints on the sequence of the colors, one is to find the longest path whose colored edges obey the constraints on the sequence of the colors. In the actual formulation, the problem generalizes already known NP-Complete problems, and, evi...
Article
Full-text available
Background: Alzheimer's Disease (AD) is a neurodegenerative disorder characterized by progressive dementia, for which no cure is currently known. Early detection of patients affected by AD can be obtained by analyzing their electroencephalography (EEG) signals, which show a reduction of the complexity, a perturbation of the synchrony, and a sl...
Article
Full-text available
Common operation scheduling (COS) problems arise in real-world applications, such as industrial processes of material cutting or component dismantling. In COS, distinct jobs may share operations, and when an operation is done, it is done for all the jobs that share it. We here propose a 0-1 LP formulation with exponentially many inequalities to min...
Conference Paper
Full-text available
There are several examples of dual propulsion vehicles: hybrid cars, bi-fuel vehicles, electric bikes. Computing a path from a starting point to a destination for these types of vehicles requires the evaluation of many alternatives. In this paper we develop a mathematical model, able to compute paths for dual propulsion vehicles, that takes in accou...
Article
Full-text available
The analysis of high throughput gene expression patients/controls experiments is based on the determination of differentially expressed genes according to standard statistical tests. A typical bioinformatics approach to this problem is composed of two separate steps: first, a subset of genes with altered expression level is identified; then the pat...
Conference Paper
A substantial connection exists between supervised learning from data represented in logic form and the solution of the Minimum Cost Satisfiability Problem (MinCostSAT). Methods based on such connection have been developed and successfully applied in many contexts. The deployment of such methods to large-scale learning problem is often hindered by...
Article
Full-text available
Increasing evidence points to a key role played by epithelial-mesenchymal transition (EMT) in cancer progression and drug resistance. In this study, we used wet and in silico approaches to investigate whether EMT phenotypes are associated with resistance to targeted therapy in a non-small cell lung cancer model system harboring activating mutations of...
Conference Paper
Full-text available
Data integration is one of the most challenging research topics in many knowledge domains, and biology is surely one of them. However, theory and state-of-the-art methods make this task complex for most small research centers. Fortunately, several organizations are focusing on collecting heterogeneous data, making it an easier task to design analy...
Article
In the present paper we propose a simple time-varying ODE model to describe the evolution of HIV epidemic in Italy. The model considers a single population of susceptibles, without distinction of high-risk groups within the general population, and accounts for the presence of immigration and emigration, modelling their effects on both the general d...
Preprint
Full-text available
Data mining is one of the main activities in bioinformatics, specifically to extract knowledge from massive data sets related to gene expression measurement, CNV, DNA strings, and others. A long array of methods is used to perform this task, ranging from the more established parametric statistical analysis to nonparametric techniques, to classi...
Article
Full-text available
Background Continuous improvements in next generation sequencing technologies led to ever-increasing collections of genomic sequences, which have not been easily characterized by biologists, and whose analysis requires huge computational effort. The classification of species emerged as one of the main applications of DNA analysis and has been addre...
Conference Paper
Due to the great advances of Next Generation Sequencing (NGS) techniques, bioinformaticians are faced with large amounts of genomic and clinical data, which are growing exponentially. A striking example is The Cancer Genome Atlas (TCGA), whose aim is to provide a comprehensive archive of biomedical data about tumors. Indeed, TCGA contains more than...
Conference Paper
The Closest String Problem (CSP) calls for finding an n-string that minimizes its maximum distance from m given n-strings. Integer linear programming (ILP) proved to be able to solve large CSPs under the Hamming distance, whereas for the Levenshtein distance, preferred in computational biology, no ILP formulation has so far been investigated. Recent...
Article
Full-text available
In this paper we present a new bound obtained with the probabilistic method for the solution of the Set Covering problem with unit costs. The bound is valid for problems of fixed dimension, thus extending previous similar asymptotic results, and it depends only on the number of rows of the coefficient matrix and the row densities. We also consider...
Conference Paper
Full-text available
Table of contents A1 Highlights from the eleventh ISCB Student Council Symposium 2015 Katie Wilkins, Mehedi Hassan, Margherita Francescatto, Jakob Jespersen, R. Gonzalo Parra, Bart Cuypers, Dan DeBlasio, Alexander Junge, Anupama Jigisha, Farzana Rahman O1 Prioritizing a drug’s targets using both gene expression and structural similarity Griet Laene...
Article
Full-text available
We propose a new Robust Optimization method for the energy offering problem of a price-taker generating company that wants to build offering curves for its generation units, in order to maximize its profit while taking into account the uncertainty of market price. Our investigations have been motivated by a critique to another Robust Optimization m...
Chapter
The analysis of gene expression profiles from microarray/RNA sequencing (RNA-Seq) experimental samples demands new efficient methods from statistics and computer science. This chapter considers two main types of gene expression data analysis such as gene clustering and experiment classification. It introduces the transcriptome analysis, highlightin...
Conference Paper
Global sourcing in complex assembly production systems entails the management of potentially high variability and multiple risks in costs, quality and lead times. Additionally, the current strategies of many companies and environmental regulatory frameworks require, or will require, industries worldwide to take control, among others, of CO2 emissions...
Article
Full-text available
Alignment-free algorithms can be used to estimate the similarity of biological sequences and hence are often applied to the phylogenetic reconstruction of genomes. Most of these algorithms rely on comparing the frequency of all the distinct substrings of fixed length (k-mers) that occur in the analyzed sequences. In this paper, we present Logic Ali...
Article
Random mutations and natural selection have driven the evolution of the protein amino acid sequences that we observe at present in nature. The question of which is the dominant force in protein evolution still lacks an unambiguous answer. Random mutations tend to randomize protein sequences while, in order to have the correct functionality,...
Article
Full-text available
Motivation: Knowledge extraction methods for Next Generation Sequencing data are currently in high demand. In this work, we focus on RNA-seq gene expression analysis and specifically on case-control studies with rule-based supervised classification algorithms that build a model able to discriminate cases from controls. State of the art algorith...
Article
In this paper we propose a new method to measure the contribution of discretized features for supervised learning and discuss its applications to biological data analysis. We restrict the description and the experiments to the most representative case of discretization in two intervals and of samples belonging to two classes. In order to test the v...
Article
Full-text available
The EURO Working Group on Operations Research in Computational Biology, Bioinformatics and Medicine held its fourth conference in Poznan-Biedrusko, Poland, June 26–28, 2014. The editorial board of RAIRO-OR invited submissions of papers to a special issue on Recent Advances in Operations Research in Computational Biology, Bioinformatics and Medicine...
Article
Full-text available
Feature selection methods are used in machine learning and data analysis to select a subset of features that may be successfully used in the construction of a model for the data. These methods are applied under the assumption that often many of the available features are redundant for the purpose of the analysis. In this paper, we focus on a partic...
Article
Full-text available
Many approaches exist to integrate protein-protein interaction data with other sources of information, most notably with gene co-expression data, to obtain information on network dynamics. It is of interest to look at groups of interacting gene products that form a protein complex. We were interested in applying new tools to the characterization of...
Conference Paper
Full-text available
Alzheimer's Disease (AD) and its preliminary stage - Mild Cognitive Impairment (MCI) - are the most widespread neurodegenerative disorders, and their investigation remains an open challenge. ElectroEncephalography (EEG) appears as a non-invasive and repeatable technique to diagnose brain abnormalities. Despite technical advances, the analysis of EE...
Article
Full-text available
Next Generation Sequencing (NGS) machines extract from a biological sample a large number of short DNA fragments (reads). These reads are then used for several applications, e.g., sequence reconstruction, DNA assembly, gene expression profiling, mutation analysis. We propose a method to evaluate the similarity between reads. This method does not rely...
Conference Paper
We study an operation scheduling problem where a finite set of jobs with due dates must be completed by one machine: each job is completed as soon as a specific subset of unit operations is done. Distinct jobs may share operations, and when an operation is done, it is done for all the jobs that share it. The goal is to schedule operations so that t...
Article
Full-text available
To understand a network's function, it is necessary to understand its topology, since the topology is designed to best support the function, and the efficiency of the network function is influenced by its topology. For this reason, topological analysis of complex networks has been an intensely researched area in the last decade. Result...
Conference Paper
Full-text available
Objective: Alzheimer's Disease (AD) is the most common form of dementia, for which no cure is currently known [1]. Different studies have shown that AD has (at least) three major effects on electroencephalography (EEG) signals: enhanced complexity, slowing of signals, and perturbations in EEG synchrony [2]. The aim of this work is to achieve an a...
Article
Much of the valuable information supporting decision-making processes originates in text-based documents. Although these documents can be effectively searched and ranked by modern search engines, actionable knowledge needs to be extracted and transformed into a structured form before being used in a decision process. In this paper we describe how t...
Conference Paper
Full-text available
Alignment-free methods are routinely used in large-scale, gene-independent phylogeny reconstruction. Such methods measure the similarity of two genomes by comparing the frequency of all their distinct substrings of length k. In this paper we apply logic data mining methods to discover a minimal subset of k-mers whose frequency information is suffici...
Conference Paper
In this paper we address the issue of solving a Unit Commitment (UC) problem including the transmission network with Active Switching (AS). The switching operation consists in a dynamic reconfiguration of the network, i.e. a tripping of some lines; this paradigm is named UC with Optimal Transmission Switching (UCOTS). The UCOTS is a novel way to le...
Article
Full-text available
Specific fragments, coming from short portions of DNA (e.g., mitochondrial, nuclear, and plastid sequences), have been defined as DNA Barcode and can be used as markers for organisms of the main life kingdoms. Species classification with DNA Barcode sequences has been proven effective on different organisms. Indeed, specific gene regions have been...
Conference Paper
Full-text available
In this paper we introduce the Mathematical Desk for Italian Industry, a project based on applied and industrial mathematics developed by a team of researchers from the Italian National Research Council in collaboration with two major Italian associations for applied mathematics, SIMAI and AIRO. The scope of this paper is to clarify the motivations...
Article
Full-text available
The large integration of wind energy into electrical systems poses important challenges to the power operators in the scheduling of the production and in the management of the network. This leads to the necessity to modify the current industry procedures, such as the Unit Commitment (UC) and the Economic Dispatch (ED), to take into account large am...
Article
Experimental co-expression data and protein-protein interaction networks are frequently used to analyze the interactions among genes or proteins. Recent studies have investigated methods to integrate these two sources of information. We propose a new method to integrate co-expression data obtained through DNA microarray analysis (MA) and p...
Article
The Orderly Colored Longest Path (OCLP) problem consists of finding the longest path in a graph whose edges are colored with a given number of colors, under the constraint that the path follows a predefined order of colors. The problem has not been widely studied in the previous literature, especially for more than two colors in th...
Conference Paper
Full-text available
Background / Purpose: We propose a filtering method for read pairs based on alignment free distance. The similarity of two reads is assessed by comparing the frequencies of their substrings of fixed dimensions (k-mers). Main conclusion: We present computational results that show the efficacy of an alignment free distance in estimating a good r...
Conference Paper
Full-text available
The wide spread of electronic data collection in medical environments leads to an exponential growth of clinical data extracted from heterogeneous patient samples. Collecting, managing, integrating and analyzing these data are essential activities in order to shed light on diseases and on related therapies. The major issues in clinical data analysi...
Article
Full-text available
Machine Learning (ML) algorithms are used to train computers to perform a variety of complex tasks and improve with experience. Computers learn how to recognize patterns, make unintended decisions, or react to a dynamic environment. Certain trained machines may be more effective than others because they are based on more suitable ML algorithms or b...
Technical Report
Full-text available
The Sensor Networks Localization Problem (SNLP) consists in seeking an embedding in R^2 of a given weighted undirected graph where the vertices represent the sensors in a global coordinate system and the weight of each edge is the Euclidean distance between two sensors. The SNLP belongs to the class of the problems of Distance Geometry (DGP). In thi...
Article
In this paper we describe an effective approach to design, implement, and operate a traffic control system based on logic programming. With this approach it is possible to implement very flexible control strategies that can be easily developed by traffic engineers using a simple description language. An important feature of the system is the use of...
Article
BLOG (Barcoding with LOGic) is a diagnostic and character-based DNA Barcode analysis method. Its aim is to classify specimens to species based on DNA Barcode sequences and on a supervised machine learning approach, using classification rules that compactly characterize species in terms of DNA Barcode locations of key diagnostic nucleotides. The BLO...
Conference Paper
Full-text available
Methods: DNA sequence assembly. The DNA sequence assembly process is based on the alignment and merging of reads (stretches of sequence) in order to reconstruct the original primary structure of the DNA sample sequences. Given a set of sequences S = {s1, s2, …, sn}, where s ∈ S is a fragment of the primary structure of DNA (read) (e.g., s = {ATTCGA... CTGACT})...
Data
Microarray Logic Analyzer (MALA) is a clustering and classification software, particularly engineered for microarray gene expression analysis. The aims of MALA are to cluster the microarray gene expression profiles in order to reduce the amount of data to be analyzed and to classify the microarray experiments. To fulfil this objective MALA uses a...
Article
Full-text available
Differences in genomic sequences are crucial for the classification of viruses into different species. In this work, viral DNA sequences belonging to the human polyomaviruses BKPyV, JCPyV, KIPyV, WUPyV, and MCPyV are analyzed using a logic data mining method in order to identify the nucleotides which are able to distinguish the five different human...
Data
Separating formulas for ST gene region. All the discriminating base pairs for the virus classification within the gene region ST.
Data
Separating formulas for LT gene region. All the discriminating base pairs for the virus classification within the gene region LT.
Data
Appendix. Test Plan and statistical experiments.
Article
Full-text available
Recently diverged species are challenging for identification, yet they are frequently of special interest scientifically as well as from a regulatory perspective. DNA barcoding has proven instrumental in species identification, especially in insects and vertebrates, but for the identification of recently diverged species it has been reported to be...
Data
Relative method performance based on simulated data for all species. Boxplots of query identification success (N = 300) of six methods that were applied to ‘recently diverged’ species in simulated query data sets. NJ = neighbor joining, PAR = parsimony, NN = nearest neighbor. Success scores not significantly different in post-hoc pairwise Wilcoxon...
Data
Full-text available
Influence of divergence time on species identification success per method compared. (PDF)
Data
Simulated ultrametric species tree. Tree with 50 species simulated under the Yule model and with a total tree depth of 1 million generations. Terminal branches subtending species considered as ‘recently diverged’ are in red, those subtending species considered as ‘old’ are in blue. (TIF)
Data
Full-text available
Method performance based on simulated data for all species. (PDF)
Data
Full-text available
Results for all 112 species represented by 5 or more sequences in the Cypraeidae empirical data set. (PDF)
Data
Full-text available
Influence of effective population size ( Ne ) on species identification success per method compared. (PDF)
Data
Full-text available