Giuseppe Jurman

Fondazione Bruno Kessler | FBK · Data Science for Health (DSH)

Ph.D.

About

249
Publications
78,076
Reads
17,379
Citations
Additional affiliations
June 2001 - December 2001
Università degli Studi di Trento
Position
  • PostDoc Position
January 2021 - present
Fondazione Bruno Kessler
Position
  • Head of Department
February 1999 - February 2001
Australian National University
Position
  • PostDoc Position
Education
October 1994 - November 1998
Università degli Studi di Trento
Field of study
  • Mathematics
October 1989 - July 1993
Università degli Studi di Trento
Field of study
  • Mathematics

Publications (249)
Article
Full-text available
Evolving multiplex networks are a powerful model for representing the dynamics over time of different phenomena, such as social networks, power grids and biological pathways. However, exploring the structure of the multiplex network time series is still an open problem. Here we propose a two-step strategy to tackle this problem based on the concept o...
Article
Full-text available
Cell adaptability to environmental changes is conferred by complex transcriptional regulatory networks, which respond to external stimuli by modulating the expression dynamics of each gene. Hence, deciphering the network of transcriptional regulation is remarkably important, but proves to be extremely challenging, mainly due to the unfavorable rati...
Article
Full-text available
Here we introduce a novel web-infrastructure for differential network analysis. The aim of the web-site is to provide a comprehensive collection of tools for network inference, network comparison and network reproducibility analysis. Four main processes are available through the web service: the network inference process, which includes 11 reconstruc...
Article
Full-text available
Gene coexpression networks inferred by correlation from high-throughput profiling such as microarray data represent a simple but effective technique for discovering and interpreting linear gene relationships. In recent years, several approaches have been proposed to tackle the problem of deciding when the resulting correlation values are statistical...
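To illustrate the significance question raised above, one textbook option is the Fisher z-transform test for a Pearson correlation. This is an assumption for illustration, not necessarily one of the approaches the article refers to. The sketch computes Pearson's r between two expression profiles and a two-sided p-value from the normal approximation; `pearson` and `fisher_z_pvalue` are hypothetical names.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)

def fisher_z_pvalue(r, n):
    """Two-sided p-value for H0: rho = 0, using Fisher's z = atanh(r),
    which is approximately normal with standard error 1 / sqrt(n - 3)."""
    if n <= 3:
        raise ValueError("need n > 3 samples")
    if abs(r) >= 1.0:
        return 0.0  # perfect correlation: the statistic diverges
    z = math.atanh(r) * math.sqrt(n - 3)
    # Two-sided normal tail: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))

r = pearson([1.0, 2.0, 3.0, 4.0, 5.0], [2.1, 3.9, 6.2, 8.1, 9.8])
print(r, fisher_z_pvalue(0.5, 28))
```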
Article
Full-text available
Due to the ever-rising importance of the network paradigm across several areas of science, comparing and classifying graphs represent essential steps in the network analysis of complex systems. Both tasks have recently been tackled via quite different strategies, even tailored ad hoc to the investigated problem. Here we deal with both operations...
Poster
The NeuroArtP3 (NET-2018-12366666) is a multicenter study funded by the Italian Ministry of Health. The aim of the project is to identify the prognostic trajectories of Alzheimer's disease (AD) through the application of artificial intelligence (AI). In the literature, only a few studies have investigated the variables associated with cognitive worsening in A...
Article
Full-text available
Background The burden of Parkinson Disease (PD) represents a key public health issue and it is essential to develop innovative and cost-effective approaches to promote sustainable diagnostic and therapeutic interventions. From this perspective, the adoption of a P3 (predictive, preventive and personalized) medicine approach seems to be pivotal. The Ne...
Article
Full-text available
Autosomal dominant polycystic kidney disease (ADPKD) is a monogenic, rare disease, characterized by the formation of multiple cysts that grow out of the renal tubules. Despite intensive attempts to develop new drugs or repurpose existing ones, there is currently no definitive cure for ADPKD. This is primarily due to the complex and variable pathoge...
Article
Full-text available
Background Discrimination between patients affected by inflammatory bowel diseases and healthy controls on the basis of endoscopic imaging is a challenging problem for machine learning models. This task is used here as the testbed for a novel deep learning classification pipeline, powered by a set of solutions enhancing characterising elements suc...
Preprint
Synthetic data has recently risen as a valuable new item in the computational pathologist's toolbox, supporting several tasks such as helping with data scarcity or augmenting training sets in deep learning. Nonetheless, the use of such novel resources requires carefully planned construction and evaluation, to avoid pitfalls such as the generation...
Preprint
Full-text available
Representation bias in health data can lead to unfair decisions, compromising the generalisability of research findings and impeding underrepresented subpopulations from benefiting from clinical discoveries. Several approaches have been developed to mitigate representation bias, ranging from simple resampling methods, such as SMOTE, to recent appro...
Article
Even if assessing binary classifications is a common task in scientific research, no consensus on a single statistic summarizing the confusion matrix has been reached so far. In recent studies, we demonstrated the advantages of the Matthews correlation coefficient (MCC) over other popular rates such as cross-entropy error, F1 score, accuracy, balan...
Article
Full-text available
Background Parkinson’s disease is a common neurodegenerative disorder that has been studied from multiple perspectives using several data modalities. Given the size and complexity of these data, machine learning has emerged as a useful approach to analyze them for different purposes. These methods have been successfully applied in a broad range of appl...
Article
Full-text available
Neuroblastoma is a childhood neurological tumor which affects hundreds of thousands of children worldwide, and information about its prognosis can be pivotal for patients, their families, and clinicians. One of the main goals in the related bioinformatics analyses is to provide stable genetic signatures able to include genes whose expression levels...
Article
Full-text available
Bioinformatics has become a key aspect of the biomedical research programmes of many hospitals’ scientific centres, and the establishment of bioinformatics facilities within hospitals has become a common practice worldwide. Bioinformaticians working in these facilities provide computational biology support to medical doctors and principal investiga...
Article
Full-text available
Binary classification is a common task for which machine learning and computational statistics are used, and the area under the receiver operating characteristic curve (ROC AUC) has become the common standard metric to evaluate binary classifications in most scientific fields. The ROC curve has true positive rate (also called sensitivity or recall)...
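The ROC curve described above plots the true positive rate against the false positive rate as the decision threshold varies. A minimal sketch of how the curve and its area are usually computed: sort examples by decreasing score, sweep the threshold over each distinct score to collect (FPR, TPR) points, and integrate with the trapezoidal rule. The function names are hypothetical, and tied scores are grouped into a single threshold, so the area equals the probability that a random positive is scored above a random negative (ties counting one half).

```python
def roc_points(labels, scores):
    """Return the ROC curve as (FPR, TPR) points, one per distinct threshold."""
    P = sum(1 for y in labels if y == 1)
    N = len(labels) - P
    if P == 0 or N == 0:
        raise ValueError("need both positive and negative examples")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Move every example scoring exactly `threshold` to the predicted-positive side.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / N, tp / P))
    return points

def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.1]
print(auc_trapezoid(roc_points(labels, scores)))  # 8/9, about 0.889
```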
Article
Full-text available
Deep Learning (DL) is rapidly permeating the field of Digital Pathology with algorithms successfully applied to ease daily clinical practice and to discover novel associations. However, most DL workflows for Digital Pathology include custom code for data preprocessing, usually tailored to data and tasks of interest, resulting in software that is er...
Article
Full-text available
Background The SI-CURA project (Soluzioni Innovative per la gestione del paziente e il follow up terapeutico della Colite UlceRosA) is an Italian initiative aimed at the development of artificial intelligence solutions to discriminate pathologies of different nature, including inflammatory bowel disease (IBD), namely Ulcerative Colitis (UC) and Cro...
Article
Full-text available
Cancer is one of the leading causes of death worldwide and can be caused by environmental aspects (for example, exposure to asbestos), by human behavior (such as smoking), or by genetic factors. To understand which genes might be involved in patients’ survival, researchers have invented prognostic genetic signatures: lists of genes that can be use...
Article
Full-text available
Systemic lupus erythematosus and primary Sjögren's syndrome are complex systemic autoimmune diseases that are often misdiagnosed. In this article, we demonstrate the potential of machine learning to perform differential diagnosis of these similar pathologies using gene expression and methylation data from 651 individuals. Furthermore, we analyzed t...
Preprint
Full-text available
Several approaches have been developed to mitigate algorithmic bias stemming from health data poverty, where minority groups are underrepresented in training datasets. Augmenting the minority class using resampling (such as SMOTE) is a widely used approach due to the simplicity of the algorithms. However, these algorithms decrease data variability...
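As a reminder of how SMOTE creates synthetic minority samples, here is a minimal, dependency-free sketch: pick a minority sample, pick one of its k nearest minority neighbours, and place a new point at a random position on the segment between them. The function name `smote` and its defaults are hypothetical; production code would normally use a maintained implementation such as imbalanced-learn's `SMOTE`.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic samples by SMOTE-style interpolation (sketch)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x (Euclidean distance, excluding x itself)
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nb = rng.choice(neighbours)
        lam = rng.random()  # uniform in [0, 1)
        synthetic.append([xi + lam * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic

minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(smote(minority, 3, k=2, seed=1))
```

Because every synthetic point lies on a segment between two existing minority samples, the generated data stay inside the region the minority class already occupies, which is one way to see the loss of variability mentioned in the abstract.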
Article
Full-text available
Emerging evidence suggests that the prognosis of patients with lung adenocarcinoma can be determined from germline variants and transcript levels in non-tumoral lung tissue. Gene expression data from non-involved lung tissue of 483 lung adenocarcinoma patients were tested for correlation with overall survival using multivariable Cox proportional ha...
Article
Full-text available
Functional enrichment analysis or pathway enrichment analysis (PEA) is a bioinformatics technique which identifies the most over-represented biological pathways in a list of genes compared to those that would be associated with them by chance. These biological functions are found on bioinformatics annotated databases such as The Gene Ontology or KE...
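A common way to score over-representation for one pathway is the one-sided hypergeometric test (equivalently, a one-sided Fisher's exact test); it is assumed here for illustration, and the paper may discuss other statistics. With N genes in the background, K of them annotated to the pathway, and a list of n genes of which k fall in the pathway, the p-value is the probability of observing k or more pathway genes by chance. The function name is hypothetical.

```python
from math import comb

def hypergeometric_enrichment_p(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated genes, n drawn)."""
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(n, K)):
        raise ValueError("inconsistent counts")
    upper = min(n, K)
    # Sum the probabilities of drawing i = k, ..., upper pathway genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

print(hypergeometric_enrichment_p(20, 5, 4, 3))  # 155/4845, about 0.032
```

When many pathways are tested against the same gene list, the resulting p-values are normally adjusted for multiple testing (for example with the Benjamini-Hochberg procedure).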
Article
Full-text available
We introduce here a novel machine learning (ML) framework to address the issue of the quantitative assessment of the immune content in neuroblastoma (NB) specimens. First, the EUNet, a U-Net with an EfficientNet encoder, is trained to detect lymphocytes on tissue digital slides stained with the CD3 T-cell marker. The training set consists of 3782 i...
Article
Full-text available
Regression analysis makes up a large part of supervised machine learning, and consists of the prediction of a continuous dependent target from a set of other predictor variables. The difference between binary classification and regression is in the target range: in binary classification, the target can have only two values (usually encoded as 0 a...
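To make the regression setting concrete, the sketch below computes some widely used error measures for a continuous target: mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE) and the coefficient of determination R^2 = 1 - SS_res / SS_tot. The selection of metrics and the helper name `regression_scores` are assumptions for illustration.

```python
import math

def regression_scores(y_true, y_pred):
    """Common error measures for a continuous target (hypothetical helper)."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for a constant target")
    mse = ss_res / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(r) for r in residuals) / n,
        "R2": 1 - ss_res / ss_tot,  # 1 = perfect fit; 0 = no better than predicting the mean
    }

print(regression_scores([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0]))
```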
Article
Full-text available
Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of transcription start sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. Here, we probe these unassig...
Article
Full-text available
Inflammatory bowel diseases (IBDs) are a group of disorders causing chronic inflammation of the small intestine and colon, the most common of which are Crohn’s disease and ulcerative colitis. Patients suffering from IBD are more likely to experience an arterial event, such as a stroke or an acute coronary syndrome. In this setting, computat...
Article
Full-text available
Even if measuring the outcome of binary classifications is a pivotal task in machine learning and statistics, no consensus has been reached yet about which statistical rate to employ to this end. In the last century, the computer science and statistics communities have introduced several scores summing up the correctness of the predictions with res...
Article
Full-text available
Background Oncopanel genomic testing, which identifies important somatic variants, is increasingly common in medical practice and especially in clinical trials. Currently, there is a paucity of reliable genomic reference samples having a suitably large number of pre-identified variants for properly assessing oncopanel assay analytical quality and p...
Article
Full-text available
To assess the quality of a binary classification, researchers often take advantage of a four-entry contingency table called confusion matrix, containing true positives, true negatives, false positives, and false negatives. To recap the four values of a confusion matrix in a unique score, researchers and statisticians have developed several rates an...
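The four entries of the confusion matrix described above are enough to compute the usual summary rates. A minimal sketch with hypothetical function names computing accuracy, F1 score and the Matthews correlation coefficient, MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). When a denominator factor is zero the code returns 0, a common convention rather than part of the definition. The example is a made-up imbalanced case in which accuracy and F1 look high while MCC stays low, because the classifier gets almost none of the negatives right.

```python
import math

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def f1_score(tp, tn, fp, fn):
    # F1 ignores true negatives entirely.
    return 2 * tp / (2 * tp + fp + fn)

def mcc(tp, tn, fp, fn):
    denom = (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    if denom == 0:
        return 0.0  # convention for degenerate confusion matrices
    return (tp * tn - fp * fn) / math.sqrt(denom)

# Hypothetical imbalanced test set: 95 positives, 5 negatives.
cm = dict(tp=94, tn=1, fp=4, fn=1)
print(accuracy(**cm), f1_score(**cm), mcc(**cm))  # about 0.95, 0.974, 0.295
```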
Chapter
Reproducibility of AI models on biomedical data remains a major concern for their acceptance into clinical practice. Initiatives for reproducibility in the development of predictive biomarkers, such as the MAQC Consortium, already underlined the importance of appropriate Data Analysis Plans (DAPs) to control for different types of bias, includi...
Article
Full-text available
Evaluating binary classifications is a pivotal task in statistics and machine learning, because it can influence decisions in multiple areas, including for example prognosis or therapies of patients in critical conditions. The scientific community has not agreed on a general-purpose statistical indicator for evaluating two-class confusion matrices...
Article
Full-text available
Hepatitis C is an infectious disease that affects more than 70 million people worldwide, killing 400 thousand of them annually. To better understand this disease and its prognosis, medical doctors can take advantage of the electronic health records (EHRs) of patients, which contain data that computer-based approaches built on statistics and co...
Preprint
Full-text available
Over the last two decades, molecular biology has been changed by the introduction of high-throughput technologies. Data sharing requirements have prompted the establishment of persistent data archives. A standardized approach for recording and managing these data was first proposed in the Minimal Information About a Microarray Experiment (MIAME) gu...
Article
Full-text available
Tumor-infiltrating lymphocytes play an essential role in improving the clinical outcome of neuroblastoma (NB) patients, but their relationship with other tumor-infiltrating immune cells in the T cell-inflamed tumors remains poorly investigated. Here we show that dendritic cells (DCs) and natural killer (NK) cells are positively correlated with T-cell...
Article
Full-text available
Introduction: We introduce in this study CovMulNet19, a comprehensive COVID-19 network containing all available known interactions involving SARS-CoV-2 proteins, interacting-human proteins, diseases and symptoms that are related to these human proteins, and compounds that can potentially target them. Materials and Methods: Extensive network analysi...
Article
Full-text available
Sepsis is a life-threatening condition caused by an exaggerated reaction of the body to an infection, which can lead to organ failure or even death. Since sepsis can kill a patient within just one hour, survival prediction is an urgent priority for the medical community: even if laboratory tests and hospital analyses can provide insightful informati...
Article
Full-text available
We introduce here the Grape Berries Counting Net (GBCNet), a tool for accurate fruit yield estimation from smartphone cameras, by adapting Deep Learning algorithms originally developed for crowd counting. We test GBCNet using a cross-validation procedure on two original datasets, CR1 and CR2, of grape pictures taken in-field before veraison. A total of...
Article
Full-text available
We introduce TAASRAD19, a high-resolution radar reflectivity dataset collected by the Civil Protection weather radar of the Trentino South Tyrol Region, in the Italian Alps. The dataset includes 894,916 timesteps of precipitation from more than 9 years of data, offering a novel resource to develop and benchmark analog ensemble models and machine le...
Article
Retinal diseases affect an increasing number of patients worldwide because of the aging population. Demand for diagnostic imaging in ophthalmology is ramping up, while the number of specialists keeps shrinking. Cutting-edge technology embedding artificial intelligence (AI) algorithms is thus advocated to help ophthalmologists perform their clinica...
Article
Full-text available
Recent technological advances and international efforts, such as The Cancer Genome Atlas (TCGA), have made available several pan-cancer datasets encompassing multiple omics layers with detailed clinical information in large collections of samples. The need has thus arisen for the development of computational methods aimed at improving cancer subtypi...
Preprint
Full-text available
Recent technological advances and international efforts, such as The Cancer Genome Atlas (TCGA), have made available several pan-cancer datasets encompassing multiple omics layers with detailed clinical information in large collections of samples. The need has thus arisen for the development of computational methods aimed at improving cancer subtypi...
Article
Full-text available
One of the most crucial applications of radar-based precipitation nowcasting systems is the short-term forecast of extreme rainfall events such as flash floods and severe thunderstorms. While deep learning nowcasting models have recently been shown to provide better overall skill than traditional echo extrapolation models, they suffer from conditional b...
Article
Full-text available
Background: Drug-induced liver injury (DILI) is a major concern in drug development, as hepatotoxicity may not be apparent at early stages but can lead to life-threatening consequences. The ability to predict DILI from in vitro data would be a crucial advantage. In 2018, the Critical Assessment of Massive Data Analysis group proposed the CMap Drug Sa...
Article
Full-text available
Background: Cardiovascular diseases kill approximately 17 million people globally every year, and they mainly manifest as myocardial infarctions and heart failures. Heart failure (HF) occurs when the heart cannot pump enough blood to meet the needs of the body. Available electronic medical records of patients quantify symptoms, body features, and cl...
Article
Full-text available
Background: To evaluate binary classifications and their confusion matrices, scientific researchers can employ several statistical rates, according to the goal of the experiment they are investigating. Despite being a crucial issue in machine learning, no widespread consensus has been reached on a single preferred measure yet. Accuracy an...
Article
Full-text available
The use of analogs (similar weather patterns) for weather forecasting and analysis is an established method in meteorology. The most challenging aspect of using this approach in the context of operational radar applications is to be able to perform a fast and accurate search for similar spatiotemporal precipitation patterns in a large archive of histo...
Article
Full-text available
Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translate...
Preprint
Full-text available
The use of analogs (similar weather patterns) for weather forecasting and analysis is an established method in meteorology. The most challenging aspect of using this approach in the context of operational radar applications is to be able to perform a fast and accurate search for similar spatiotemporal precipitation patterns in a large archive of...
Preprint
Digital technologies ignited a revolution in the agrifood domain known as precision agriculture: a central question for enabling precision agriculture at scale is whether accurate product quality control can be made available at minimal cost, leveraging existing technologies and agronomists' skills. As a contribution along this direction we demonstrate a t...
Preprint
Full-text available
Bioinformatics of high-throughput omics data (e.g. microarrays and proteomics) was plagued by countless reproducibility issues at the start of the century. Concerns have motivated international initiatives such as the FDA-led MAQC Consortium, addressing reproducibility of predictive biomarkers by means of appropriate Data Analysis Pla...
Preprint
Full-text available
Climate change impacts could cause a progressive decrease in crop quality and yield, up to harvest failures. In particular, heat waves and other climate extremes can lead to localized food shortages and even threaten the food security of communities worldwide. In this study, we apply a deep learning architecture for high resolution forecasting (300 m, 10...
Poster
Autism Spectrum Disorders (ASD) are early onset pervasive neurodevelopmental disorders characterized by:
  • persistent deficits in social communication and interaction and restricted/repetitive interests and behaviour
  • incidence around 1-2% worldwide and diagnosis possible not earlier than 2-3 years of age, based only on a cognitive-behavioural ass...
Article
Full-text available
Artificial Intelligence is exponentially increasing its impact on healthcare. As deep learning is mastering computer vision tasks, its application to digital pathology is natural, with the promise of aiding in routine reporting and standardizing results across trials. Deep learning features inferred from digital pathology scans can improve vali...
Data
Impact of task complexity (VGG backend network). Performance decreases when the number of tissues increases. Adding more classes to the task is possibly complicated by the introduction of tissues with similar histological patterns. (PDF)
Data
Deep features and tissue of origin. Distributions of the values of the top-3 deep features computed with the VGG backend architecture for the 10 classes of the HINT10 dataset. (PDF)