Fabio Rinaldi

Dalle Molle Institute for Artificial Intelligence | IDSIA

PhD

About

219
Publications
31,526
Reads
3,010
Citations
Additional affiliations
January 2000 - present
University of Zurich
Position
  • Senior Researcher

Publications (219)
Preprint
Digital data play an increasingly important role in advancing medical research and care. However, most digital data in healthcare are in an unstructured and often not readily accessible format for research. Specifically, unstructured data are available in a non-standardized format and require substantial preprocessing and feature extraction to tran...
Conference Paper
In this paper, we discuss our contribution to the NII Testbeds and Community for Information Access Research (NTCIR)-16 Real-MedNLP shared task. Our team (ZuKyo) participated in the English subtask: Few-resource Named Entity Recognition. The main challenge in this low-resource task was a low number of training documents annotated with a high number...
Conference Paper
In this NTCIR-16 Real-MedNLP shared task paper, we present the methods of the ZuKyo-JA subteam for solving the Japanese part of Subtask1 and Subtask3 (Subtask1-CR-JA, Subtask1-RR-JA, Subtask3-RR-JA). Our solution is based on a sliding-window approach using a Japanese BERT pre-trained masked-language model, which was used as a common architecture f...
Article
Full-text available
Background Named Entity Recognition (NER) and Normalisation (NEN) are core components of any text-mining system for biomedical texts. In a traditional concept-recognition pipeline, these tasks are combined in a serial way, which is inherently prone to error propagation from NER to NEN. We propose a parallel architecture, where both NER and NEN are...
Article
Full-text available
Background Melanoma is one of the least common but the deadliest of skin cancers. This cancer begins when the genes of a cell suffer damage or fail, and identifying the genes involved in melanoma is crucial for understanding the melanoma tumorigenesis. Thousands of publications about human melanoma appear every year. However, while biological curat...
Article
Digital data play an increasingly important role in advancing medical research and care. However, most digital data in healthcare are in an unstructured and often not readily accessible format for research. Specifically, unstructured data are available in a non-standardized format and require substantial preprocessing and feature extraction to tran...
Article
Full-text available
The COST Action Gene Regulation Ensemble Effort for the Knowledge Commons (GREEKC, CA15205, www.greekc.org) organized nine workshops in a four-year period, starting September 2016. The workshops brought together a wide range of experts from all over the world working on various parts of the knowledge cycle that is central to understanding gene regu...
Article
Full-text available
Automatic document classification for highly interrelated classes is a demanding task that becomes more challenging when there is little labeled data for training. Such is the case of the coronavirus disease 2019 (COVID-19) Clinical repository—a repository of classified and translated academic articles related to COVID-19 and relevant to the clinic...
Article
Entity relation extraction plays an important role in the biomedical, healthcare, and clinical research areas. Recently, pre-trained models based on transformer architectures and their variants have shown remarkable performances in various natural language processing tasks. Most of these variants were based on slight modifications in the architectu...
Article
The number of published papers in a given field of biomedical research makes it rather impossible for a researcher to keep up to date. This is where manually curated databases facilitate the access to knowledge. However, the structure required by databases strongly limits the type of valuable information that can be incorporated. Here, we present L...
Conference Paper
Full-text available
Negation is a linguistic universal that poses difficulties for cognitive and computational processing. Despite many advances in text analytics, negation resolution remains an acute and continuously researched question in Natural Language Processing. Reliable negation parsing affects results in biomedical text mining, sentiment analysis, machine tra...
Article
Full-text available
Background Named Entity Recognition is a common task in Natural Language Processing applications, whose purpose is to recognize named entities in textual documents. Several systems exist to solve this task in the biomedical domain, based on Natural Language Processing techniques and Machine Learning algorithms. A crucial step of these applications...
Chapter
Full-text available
The Swiss Monitoring of Adverse Drug Events (SwissMADE) project is part of the SNSF-funded Smarter Health Care initiative, which aims at improving health services for the public. Its goal is to use text mining on electronic patient reports to automatically detect adverse drug events in hospitalised elderly patients who received anti-t...
Article
Full-text available
Automatic text summarization methods generate a shorter version of the input text to assist the reader in gaining a quick yet informative gist. Existing text summarization methods generally focus on a single aspect of text when selecting sentences, causing the potential loss of essential information. In this study, we propose a domain-specific meth...
Article
Full-text available
Pathology data have been reported to be important for surveillance, as they are crucial for correctly recognizing and identifying new or re-emerging diseases in animal populations. However, there are no reports in the literature of necropsy data being compared or complemented with other data. In our study, we compared cattle necropsy reports extrac...
Preprint
Lexical semantic change detection (also known as semantic shift tracing) is the task of identifying words that have changed their meaning over time. Unsupervised semantic shift tracing, a focal point of SemEval2020, is particularly challenging. Given the unsupervised setup, in this work, we propose to identify clusters among different occurrences of ea...
Poster
Recently, character-level embeddings have become popular in the Natural Language Processing community. These representations provide a description of a word which depends solely on its inner structure, i.e. the sequence of characters. Convolutional and recurrent neural networks are the undisputed protagonists in this context, and they represent the...
Preprint
Automatic text summarization methods generate a shorter version of the input text to assist the reader in gaining a quick yet informative gist. Existing text summarization methods generally focus on a single aspect of text when selecting the sentences, causing potential loss of essential information. We propose a domain-specific method that models...
Conference Paper
Full-text available
Detecting instances of negation in text is crucially important for several applications, yet it is often neglected. Several decades of research in automated negation detection have not yet provided a reliable solution, especially in a multilingual context. Negation scope resolution poses particular challenges since identifying the scope of influenc...
Preprint
Full-text available
The number of published papers in biomedical research makes it rather impossible for a researcher to keep up to date. This is where machine processing of scientific publications could help facilitate access to knowledge. How to make use of text mining capabilities and still preserve the high quality of manual curation is the challenge...
Preprint
Full-text available
Motivation: Named Entity Recognition (NER) and Normalisation (NEN) are core components of any text-mining system for biomedical texts. In a traditional concept-recognition pipeline, these tasks are combined in a serial way, which is inherently prone to error propagation from NER to NEN. We propose a parallel architecture, where both NER and NEN are...
Conference Paper
Full-text available
As our submission to the CRAFT shared task 2019, we present two neural approaches to concept recognition. We propose two different systems for joint named entity recognition (NER) and normalization (NEN), both of which model the task as a sequence labeling problem. Our first system is a BiLSTM network with two separate outputs for NER and NEN train...
Article
Full-text available
Objective: Author-centric analyses of fast-growing biomedical reference databases are challenging due to author ambiguity. This problem has been mainly addressed through author disambiguation using supervised machine-learning algorithms. Such algorithms, however, require adequately designed gold standards that reflect the reference database proper...
Preprint
Full-text available
Of the 2652 articles considered, 106 met the inclusion criteria. Review of the included papers resulted in identification of 43 chronic diseases, which were then further classified into 10 disease categories using ICD-10. The majority of studies focused on diseases of the circulatory system (n=38) while endocrine and metabolic diseases were fewest...
Conference Paper
Full-text available
We describe our submissions to the 4th edition of the Social Media Mining for Health Applications (SMM4H) shared task. Our team (UZH) participated in two sub-tasks: Automatic classifications of adverse effects mentions in tweets (Task 1) and Generalizable identification of personal health experience mentions (Task 4). For our submissions, we exploi...
Article
Full-text available
Dependency parsing is often used as a component in many text analysis pipelines. However, performance, especially in specialized domains, suffers from the presence of complex terminology. Our hypothesis is that including named entity annotations can improve the speed and quality of dependency parses. As part of BLAH5, we built a web service deliver...
Article
Full-text available
BLAH is organized annually by the Database Center for Life Science (DBCLS), Research Organization of Information and Systems (ROIS). The goal of the BLAH series is to enhance the interoperability of resources for biomedical text annotation and mining, which we believe is a key for the next breakthrough of biomedical text mining. This special issue...
Preprint
Full-text available
Email is the primary means of communication for scientists. However, scientific authors change email address over time. Using a new method, we have calculated that approximately 18% of all authors' contact email addresses in MEDLINE are invalid. While an unfortunate number, it is, however, lower than previously estimated. To mitigate this problem,...
Article
Full-text available
Background: Novel approaches that complement and go beyond evidence-based medicine are required in the domain of chronic diseases, given the growing incidence of such conditions in the worldwide population. A promising avenue is the secondary use of electronic health records (EHRs), where patient data are analyzed to conduct clinical and translati...
Poster
Full-text available
Necropsy reports contain detailed information on the cause of disease and death, and provide information on pathological changes. They can be of value for animal health surveillance by providing information about the pathology seen in different species, age classes, seasons, and geographical areas. However, the real representativeness of these data...
Article
Full-text available
Background: We present a text-mining tool for recognizing biomedical entities in scientific literature. OGER++ is a hybrid system for named entity recognition and concept recognition (linking), which combines a dictionary-based annotator with a corpus-based disambiguation component. The annotator uses an efficient look-up strategy combined with a n...
Article
Full-text available
Email is the primary means of communication for scientists. However, scientific authors change email address over time. Using a new method, we have calculated that approximately 18% of all authors’ contact email addresses in MEDLINE are invalid. While an unfortunate number, it is, however, lower than previously estimated. To mitigate this problem,...
Conference Paper
Full-text available
Our team at the University of Zurich participated in the first 3 of the 4 sub-tasks at the Social Media Mining for Health Applications (SMM4H) shared task. We experimented with different approaches for text classification, namely traditional feature-based classifiers (Logistic Regression and Support Vector Machines), shallow neural networks, RCNNs,...
Conference Paper
Full-text available
Biomedical Named Entity Recognition is a common task in Natural Language Processing applications, whose purpose is to recognize and categorize different types of entities in biomedical documents. Recently, the literature has shown effective methods based on combinations of Machine Learning algorithms and Natural Language Processing techniques. Howe...
Conference Paper
Despite that the majority of machine learning approaches aim to solve binary classification problems, several real-world applications require specialized algorithms able to handle many different classes, as in the case of single-label multi-class and multi-label classification problems. The Label Ranking framework is a generalization of the above...
Chapter
Full-text available
Despite that the majority of machine learning approaches aim to solve binary classification problems, several real-world applications require specialized algorithms able to handle many different classes, as in the case of single-label multi-class and multi-label classification problems. The Label Ranking framework is a generalization of the above m...
Preprint
BACKGROUND Worldwide, the burden of chronic diseases is growing, necessitating novel approaches that complement and go beyond evidence-based medicine. In this respect, a promising avenue is the secondary use of Electronic Health Records (EHR) data, where clinical data are analysed to conduct basic, clinical, and translational research. Methods bas...
Article
Full-text available
Background: Animal health data recorded in free text, such as in necropsy reports, can have valuable information for national surveillance systems. However, these data are rarely utilized because the text format requires labor-intensive classification of records before they can be analyzed using statistical or other software. In a previous st...
Article
Full-text available
Background: This article describes a high-recall, high-precision approach for the extraction of biomedical entities from scientific articles. Method: The approach uses a two-stage pipeline, combining a dictionary-based entity recognizer with a machine-learning classifier. First, the OGER entity recognizer, which has a bias towards high recall, anno...
Conference Paper
Full-text available
This short paper briefly presents an efficient implementation of a named entity recognition system for biomedical entities, which is also available as a web service. The approach is based on a dictionary-based entity recognizer combined with a machine-learning classifier which acts as a filter. We evaluated the efficiency of the approach through pa...
Conference Paper
Full-text available
Livestock necropsy reports from diagnostic laboratories may be of interest for disease surveillance. However, they are usually written in natural language, which complicates the extraction of relevant data. To evaluate necropsy reports for animal health surveillance, we first developed a text mining tool to automatically classify necropsy report...
Conference Paper
Full-text available
We present OGER, an annotation service built on top of OntoGene’s biomedical entity recognition system, which participates in the TIPS task (technical interoperability and performance of annotation servers) of the BeCalm (biomedical annotation metaserver) challenge. The annotation server is a web application tailored to the needs of the task, using...
Article
Full-text available
MicroRNAs (miRNAs) are small and non-coding RNA molecules that inhibit gene expression posttranscriptionally. They play important roles in several biological processes, and in recent years there has been an interest in studying how they are related to the pathogenesis of diseases. Although there are already some databases that contain information f...
Article
Full-text available
Experimentally generated biological information needs to be organized and structured in order to become meaningful knowledge. However, the rate at which new information is being published makes manual curation increasingly unable to cope. Devising new curation strategies that leverage upon data mining and text analysis is, therefore, a promising av...
Article
Full-text available
Experimentally generated biological information needs to be organized and structured in order to become meaningful knowledge. However, the rate at which new information is being published makes manual curation increasingly unable to cope. Devising new curation strategies that leverage upon data mining and text analysis is, therefore, a promising av...
Poster
Full-text available
Livestock necropsy reports from diagnostic laboratories may be of interest for disease surveillance. However, they are usually written in natural language, which complicates the extraction of relevant data. To evaluate necropsy reports for animal health surveillance, we first developed a text mining tool to automatically classify necropsy report...
Conference Paper
Full-text available
Author name disambiguation (AND) in publication and citation resources is a well-known problem. Often, information about email address and other details in the affiliation is missing. In cases where such information is not available, identifying the authorship of publications becomes very challenging. Consequently, there have been attempts to resol...
Article
Full-text available
Fully automated text mining (TM) systems promote efficient literature searching, retrieval, and review but are not sufficient to produce ready-to-consume curated documents. These systems are not meant to replace biocurators, but instead to assist them in one or more literature curation steps. To do so, the user interface is an important aspect that...
Conference Paper
Full-text available
This paper presents an approach towards high performance extraction of biomedical entities from the literature, which is built by combining a high recall dictionary-based technique with a high-precision machine learning filtering step. The technique is then evaluated on the CRAFT corpus. We present the performance we obtained, analyze the errors an...
Article
Full-text available
Automatic extraction of biological network information is one of the most desired and most complex tasks in biological and medical text mining. Track 4 at BioCreative V attempts to approach this complexity using fragments of large-scale manually curated biological networks, represented in Biological Expression Language (BEL), as training and test d...
Article
Full-text available
Background: The use of complementary and alternative medicine (CAM) among cancer patients is widespread and mostly self-administered. Today, one of the most relevant topics is the nondisclosure of CAM use to doctors. This general lack of communication exposes patients to dangerous behaviors and to less reliable information channels, such as the We...
Conference Paper
We present the first version of a corpus annotated for psychiatric disorders and their etiological factors. The paper describes the choice of text, annotated entities and events/relations as well as the annotation scheme and procedure applied. The corpus features a selection of focus psychiatric disorders including depressive disorder, anxiety...