Lenz Furrer

Lenz Furrer

Doctor of Philosophy

About

28
Publications
7,830
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
264
Citations

Publications

Publications (28)
Article
Full-text available
Background Named Entity Recognition (NER) and Normalisation (NEN) are core components of any text-mining system for biomedical texts. In a traditional concept-recognition pipeline, these tasks are combined in a serial way, which is inherently prone to error propagation from NER to NEN. We propose a parallel architecture, where both NER and NEN are...
Conference Paper
Full-text available
Detecting instances of negation in text is crucially important for several applications, yet it is often neglected. Several decades of research in automated negation detection have not yet provided a reliable solution, especially in a multilingual context. Negation scope resolution poses particular challenges since identifying the scope of influenc...
Preprint
Full-text available
Motivation: Named Entity Recognition (NER) and Normalisation (NEN) are core components of any text-mining system for biomedical texts. In a traditional concept-recognition pipeline, these tasks are combined in a serial way, which is inherently prone to error propagation from NER to NEN. We propose a parallel architecture, where both NER and NEN are...
Conference Paper
Full-text available
As our submission to the CRAFT shared task 2019, we present two neural approaches to concept recognition. We propose two different systems for joint named entity recognition (NER) and normalization (NEN), both of which model the task as a sequence labeling problem. Our first system is a BiLSTM network with two separate outputs for NER and NEN train...
Conference Paper
Full-text available
We describe our submissions to the 4th edition of the Social Media Mining for Health Applications (SMM4H) shared task. Our team (UZH) participated in two sub-tasks: Automatic classifications of adverse effects mentions in tweets (Task 1) and Generalizable identification of personal health experience mentions (Task 4). For our submissions, we exploi...
Article
Full-text available
Background: We present a text-mining tool for recognizing biomedical entities in scientific literature. OGER++ is a hybrid system for named entity recognition and concept recognition (linking), which combines a dictionary-based annotator with a corpus-based disambiguation component. The annotator uses an efficient look-up strategy combined with a n...
Article
Full-text available
Crowdsourcing approaches for post-correction of OCR output (Optical Character Recognition) have been successfully applied to several historical text collections. We report on our crowd-correction platform Kokos, which we built to improve the OCR quality of the digitized yearbooks of the Swiss Alpine Club (SAC) from the 19th century. This multilingu...
Article
Full-text available
Background: Animal health data recorded in free text, such as in necropsy reports, can have valuable information for national surveillance systems. However, these data are rarely utilized because the text format requires labor-intensive classification of records before they can be analyzed with using statistical or other software. In a previous st...
Article
Full-text available
Background: This article describes a high-recall, high-precision approach for the extraction of biomedical entities from scientific articles. Method: The approach uses a two-stage pipeline, combining a dictionary-based entity recognizer with a machine-learning classifier. First, the OGER entity recognizer, which has a bias towards high recall, anno...
Conference Paper
Full-text available
This short paper briefly presents an efficient implementation of a named entity recognition system for biomedical entities, which is also available as a web service. The approach is based on a dictionary-based entity recognizer combined with a machine-learning classifier which acts as a filter. We evaluated the efficiency of the approach through pa...
Conference Paper
Full-text available
Livestock necropsy reports from diagnostic laboratories may be of interest for disease surveillance. However, they are usually created using natural language making the extraction of relevant data complicated. To evaluate necropsy reports for animal health surveillance, we first developed a text mining tool to automatically classify necropsy report...
Conference Paper
Full-text available
We present OGER, an annotation service built on top of OntoGene’s biomedical entity recognition system, which participates in the TIPS task (technical interoperability and performance of annotation servers) of the BeCalm (biomedical annotation metaserver) challenge. The annotation server is a web application tailored to the needs of the task, using...
Poster
Full-text available
Livestock necropsy reports from diagnostic laboratories may be of interest for disease surveillance. However, they are usually created using natural language making the extraction of relevant data complicated. To evaluate necropsy reports for animal health surveillance, we first developed a text mining tool to automatically classify necropsy report...
Conference Paper
Full-text available
This paper presents an approach towards high performance extraction of biomedical entities from the literature, which is built by combining a high recall dictionary-based technique with a high-precision machine learning filtering step. The technique is then evaluated on the CRAFT corpus. We present the performance we obtained, analyze the errors an...
Conference Paper
Full-text available
Crowdsourcing approaches for post-correction of OCR output (Optical Character Recognition) have been successfully applied to several historic text collections. We report on our crowd-correction platform Kokos, which we built to improve the OCR quality of the digitized yearbooks of the Swiss Alpine Club (SAC) from the 19th century. This multilingual...
Conference Paper
Full-text available
Public health surveillance systems rely on the automated monitoring of large amounts of text. While building a text mining system for veterinary syndromic surveillance, we exploit automatic and semi-automatic methods for terminology construction at different stages. Our approaches include term extraction from free-text, grouping of term variants ba...
Conference Paper
Full-text available
In this paper, we present a large biomedical term resource automatically compiled from the terminology of a selection of biomedical databases. The resource has a very simple and intuitive format and therefore can be easily embedded into a system for biomedical text mining and used as a linguistic resource. It is continuously updated and a user inte...
Conference Paper
Full-text available
Challenging the assumption that traditional whitespace/punctuation-based tokenisation is the best solution for any NLP application, I propose an alternative approach to segmenting text into processable units. The proposed approach is nearly knowledge-free, in that it does not rely on language-dependent, man-made resources. The text segmentation app...
Article
Full-text available
Evidence in support of relationships among biomedical entities, such as protein-protein interactions, can be gathered from a multiplicity of sources. The larger the pool of evidence, the more likely a given interaction can be considered to be. In the context of biomedical text mining, this elementary observation can be translated into an approach t...
Conference Paper
Full-text available
Evidence in support of relationships among biomedical entities, such as protein-protein interactions, can be gathered from a multiplicity of sources. The larger the pool of evidence, the more likely a given interaction can be considered to be. In the context of biomedical text mining, this elementary observation can be translated into an approach t...
Article
Full-text available
In this paper, we describe our experiments in preposition disambiguation based on a – compared to a previous study – revised annotation scheme and new features derived from a matrix factorization approach as used in the field of distributional semantics. We report on the annotation and Maximum Entropy modelling of the word senses of two German prep...
Conference Paper
Full-text available
This paper describes the details of our system submitted to the SemEval-2013 shared task on sentiment analysis in Twitter. Our approach to predicting the sentiment of Tweets and SMS is based on supervised machine learning techniques and task-specific feature engineering. We used a linear classifier trained by stochastic gradient descent with hinge...
Article
Full-text available
In order to improve OCR quality in texts originally typeset in Gothic script, we have built an automated correction system which is highly specialized for the given text. Our approach includes external dictionary resources as well as information derived from the text itself. The focus lies on testing and improving different methods for classifying...
Chapter
Full-text available
In this paper we describe our efforts in reducing and correcting OCR errors in the context of building a large multilingual heritage corpus of Alpine texts which is based on digitizing the publications of various Alpine clubs. We have already digitized the yearbooks of the Swiss Alpine Club from its start in 1864 until 1995 with more than 75,000 pa...
Conference Paper
Full-text available
This paper describes our efforts to build a multilingual heritage corpus of alpine texts. Currently we digitize the yearbooks of the Swiss Alpine Club which contain articles in French, German, Italian and Romansch. Articles comprise mountaineering reports from all corners of the earth, but also scientific topics such as topography, geology or gla...

Projects

Project (1)