About
69
Publications
9,496
Reads
922
Citations
Introduction
Additional affiliations
January 2012 - present
January 2007 - December 2011
October 2011 - February 2012
Publications (69)
Background
Patient-derived xenograft (PDX) mouse models play an important role in preclinical trials and personalized medicine. Sharing data on the models is highly valuable for numerous reasons: ethical, economic, research cross-validation, etc. The EurOPDX Consortium was established 8 years ago to share such information and avoid duplicating ef...
Patient-derived xenografts (PDXs) are resected human tumors engrafted into mice for preclinical studies and therapeutic testing. It has been proposed that the mouse host affects tumor evolution during PDX engraftment and propagation, affecting the accuracy of PDX modeling of human cancer. Here, we exhaustively analyze copy number alterations (CNAs)...
In recent years, Laboratory Information Management Systems (LIMS) have grown from mere inventory systems into increasingly comprehensive software platforms, spanning functionalities as diverse as data search, annotation and analysis. In 2011, our institution started a LIMS project named the Laboratory Assistant Suite with the purpose of ass...
Each cancer is a complex system with unique molecular features determining its dynamics, such as its prognosis and response to therapies. Understanding the role of these biological traits is fundamental in order to personalize cancer clinical care according to the characteristics of each patient's disease. To achieve this, translational researchers...
Rapid technological evolution is providing biomedical research laboratories with huge amounts of complex and heterogeneous data. The LIMS project Laboratory Assistant Suite (LAS), started by our Institution, aims to assist researchers throughout all of their laboratory activities, providing graphical tools to support decision-making tasks and build...
Research laboratories produce a huge amount of complex and heterogeneous data typically managed by Laboratory Information Management Systems (LIMSs). Although many LIMSs are available, it is often difficult to identify a product that covers all the requirements and peculiarities of a specific institution. To address this gap, the Candido Cancer...
Stromal content heavily impacts the transcriptional classification of colorectal cancer (CRC), with clinical and biological implications. Lineage-dependent stromal transcriptional components could therefore dominate over more subtle expression traits inherent to cancer cells. Since in patient-derived xenografts (PDXs) stromal cells of the human tum...
Supplementary figures and supplementary tables.
Classification of the CRC-LM dataset according to three public transcriptional classifiers
Classification of the CRC PDX dataset according to three public transcriptional classifiers
Sample set enrichment analysis of ligands/receptor pairs' expression in CRIS classes
80 CRC liver metastases annotated for clinical response to cetuximab monotherapy
CRIS-TSP and CRIS-NTP80 classification on CRC samples
Principal component analysis on the TCGA dataset
Correlation between PDX and CRC-LM gene expression profiles
Classification of the CRC-LM dataset according to CRIS
Sample set enrichment analysis (SSEA) of curated signatures' expression across CRIS classes
GSE14333, a clinically annotated dataset of primary CRC tumors
GSEA of hallmark gene sets in CRIS classes
Gene pairs for CRIS-TSP and CRIS-NTP80 classifiers
The CRISclassifier, an R-Bioconductor package to classify independent gene expression datasets according to either CRIS-NTP or CRIS-TSP algorithms
Identification of CRC subtypes in the PDX dataset and NTP-based CRIS classification
CRIS classification of public CRC gene expression datasets
Clinical annotation of public gene expression datasets of CRC primary tumors
Nowadays, huge amounts of high-throughput molecular data are available for analysis and provide novel and useful insights into complex biological systems, through the acquisition of a high-resolution picture of their molecular status in defined experimental conditions. In this context, microarrays are a powerful tool to analyze thousands of gene e...
Automatic Vehicle Monitoring (AVM) systems are exploited by public transport companies to manage and control their fleet of vehicles. However, these systems are usually based on background knowledge of the transport network, which can change over time and in some cases can be missing or erroneous. GPS data and other information captured by...
Multidocument summarization addresses the selection of a compact subset of highly informative sentences, i.e., the summary, from a collection of textual documents. To perform sentence selection, two parallel strategies have been proposed: (a) apply general-purpose techniques relying on data mining or information retrieval techniques, and/or (b) per...
With the diffusion of online newspapers and social media, users are becoming capable of retrieving dozens of news articles covering the same topic in a short time. News article summarization is the task of automatically selecting a worthwhile subset of news' sentences that users could easily explore. Promising research directions in this field are...
Classical data-mining indexes are applied to evaluate clustering results. Such measures are divided into internal and external measures. The internal measures are based on the concept of distance among cluster objects and evaluate cluster results in terms of intracluster cohesion and intercluster separation. The external measures assume that the tr...
Tag recommendation is focused on recommending useful tags to a user who is annotating a Web resource. A relevant research issue is the recommendation of additional tags to partially annotated resources, which may be based on either personalized or collective knowledge. However, since the annotation process is usually not driven by any controlled vo...
Personalized tag recommendation focuses on helping users find desirable keywords (tags) to annotate Web resources based on both user profiles and main resource characteristics. Flickr is a popular online photo service whose resource sharing system significantly relies on annotations. However, recommending tags to a Flickr user who is annotating a p...
Social networks and online communities are taking a primary role in enabling communication and content sharing (e.g., posts, documents, photos, videos) among Web users. Knowledge discovery from user-generated content is becoming an increasingly appealing research context. Many different approaches have been devoted to addressing this issue. This cha...
Sentence-based multi-document summarization is the task of generating a succinct summary of a document collection, which consists of the most salient document sentences. In recent years, the increasing availability of semantics-based models (e.g., ontologies and taxonomies) has prompted researchers to investigate their usefulness for improving summ...
User-generated content (UGC) coming from social networks and online communities continuously grows and changes. By analyzing relevant patterns from the UGC, analysts may discover peculiar user behaviors and interests which can be used to personalize Web-oriented applications. In the last several years, the use of dynamic mining techniques has captu...
The outstanding growth of the Internet has made available to analysts a huge and increasing amount of Web documents (e.g., news articles) and user-generated content (e.g., social network posts) coming from social networks and online communities that are worth considering together. On one hand, the need for novel and more effective approaches to summ...
In this chapter we present the analysis of the Wikipedia collection by means of the ELiDa framework with the aim of enriching linked data. ELiDa is based on association rule mining, an exploratory technique to discover relevant correlations hidden in the analyzed data. To compactly store the large volume of extracted knowledge and efficiently retri...
The increasing availability of user-generated content coming from online communities allows the analysis of common user behaviors and trends in social network usage. This paper presents the TweM Tweet Miner framework that entails the discovery of hidden and high level correlations, in the form of generalized association rules, among the content and...
The past few years have witnessed the rapid proliferation of Web communities such as social networking sites, wikis, blogs, and media sharing communities. The published social content is commonly characterized by a high dynamicity and reflects the most recent trends and common user behaviors. The Data mining and Knowledge Discovery (KDD) process fo...
During recent years, the outstanding growth of social network communities has caught the attention of the research community. A huge amount of user-generated content is shared among community users and gives researchers the unique opportunity to thoroughly investigate social community behavior. Many studies have been focused on both developing mode...
The rapid technological evolution in the biomedical and molecular oncology fields is providing research laboratories with huge amounts of complex and heterogeneous data. Automated systems are needed to manage and analyze this knowledge, allowing the discovery of new information related to tumors and the improvement of medical treatments. This paper...
Selecting a small number of discriminative genes from thousands is a fundamental task in microarray data analysis. An effective feature selection allows biologists to investigate only a subset of genes instead of the entire set, thus avoiding insignificant, noisy, and redundant features. This paper presents the MaskedPainter feature selection metho...
A summary is a succinct and informative description of a data collection. In the context of multi-document summarization, selecting the most relevant and non-redundant sentences from a collection of textual documents is a challenging task. Frequent itemset mining is a well-established data mining technique to discover corr...
Every day, online communities and social networks are accessed by millions of Web users, who produce a huge amount of user-generated content (UGC). The UGC and its publication context typically evolve over time and reflect the actual user interests and behaviors. Thus, the application of data mining techniques to discover the evolution of common user...
Microarray technology provides a simple way for collecting huge amounts of data on the expression level of thousands of genes. Detecting similarities among genes is a fundamental task, both to discover previously unknown gene functions and to focus the analysis on a limited set of genes rather than on thousands of genes. Similarity between genes is...
The widespread popularity of the Web has supported collaborative efforts to build large collections of community-contributed media. For example, social video-sharing communities like YouTube are incorporating ever-increasing amounts of user-contributed media, or photo-sharing communities like Flickr are managing a huge photographic database at a la...
Microarray technology is a powerful tool to analyze thousands of gene expression values in a single experiment. Due to the huge amount of data, most recent studies focus on the analysis and extraction of useful and interesting information from microarray data. Examples of applications include detecting genes highly correlated to dise...
BioSumm is a summarization environment that supports user queries on online repositories of scientific publications by providing abstract descriptions of focused document groups. The summarization approach is driven by a grading function which evaluates the occurrences of domain dictionary terms. The demonstrated system enables users to query and d...
The availability of increasingly large repositories of biomedical and biological texts requires effective techniques to manage the huge mass of unstructured information they contain. The availability of ad-hoc document summaries, targeted to specific topics, may assist researchers in inferring previously undisclosed knowledge and in performing t...
Research activity in the life science area is becoming increasingly data intensive. Huge amounts of highly heterogeneous data, including high-throughput experiment results, publication collections, and clinical records, are generated at a fast pace by researchers all over the world. The capability of correlating heterogeneous information stored in s...
A fundamental problem in microarray analysis is to identify relevant genes from large amounts of expression data. Feature selection aims at identifying a subset of features for building robust learning models. However, finding the optimal number of features is a challenging problem, as it is a trade-off between information loss when pruning excessi...
When analyzing the relationship between genes under different scenarios, the integration of different microarray experiments becomes a relevant task. This paper presents a framework to address some intrinsic problems of integration, due for instance to scaling issues, error bias, different experimental conditions or technology and protocols. Our ap...
Feature selection is a fundamental task in microarray data analysis. It aims at identifying the genes that are most strongly associated with a tissue category, disease state or clinical outcome. An effective feature selection reduces computation costs and increases classification accuracy. This paper presents a novel multi-class approach to feature selec...