Michel Oleynik

BuildingMinds

Doctor of Philosophy

About

26 Publications
5,779 Reads
179 Citations
Citations since 2017: 17 research items, 174 citations
Additional affiliations
September 2020 - present
BuildingMinds
Position: Analyst
January 2020 - August 2020
Medical University of Graz
Position: Research Assistant
March 2011 - January 2015
Hospital A. C. Camargo
Position: Research and Development
Education
April 2015 - June 2020
Medical University of Graz
Field of study: Medical Informatics
May 2010 - April 2012
University of São Paulo
Field of study: Computer Science
January 2005 - December 2009

Publications (26)
Article
Full-text available
Objective: Automated clinical phenotyping is challenging because word-based features quickly turn it into a high-dimensional problem, in which small, privacy-restricted training datasets might lead to overfitting. Pretrained embeddings might solve this issue by reusing input representation schemes trained on a larger dataset. We sought to eva...
Article
Full-text available
Objectives: We survey recent developments in medical Information Extraction (IE) as reported in the literature from the past three years. Our focus is on the fundamental methodological paradigm shift from standard Machine Learning (ML) techniques to Deep Neural Networks (DNNs). We describe applications of this new paradigm concentrating on two basi...
Conference Paper
Full-text available
Clinical narratives in electronic health record systems are a rich resource of patient-based information. They constitute an ongoing challenge for natural language processing, due to their high compactness and abundance of short forms. German medical texts exhibit numerous ad-hoc abbreviations that terminate with a period character. The disambigu...
Article
Full-text available
Word embeddings have become the predominant representation scheme at the token level for various clinical natural language processing (NLP) tasks. More recently, character-level neural language models, exploiting recurrent neural networks, have again received attention, because they achieved similar performance on various NLP benchmarks. We inve...
Article
Full-text available
Acronyms frequently occur in clinical text, which makes their identification, disambiguation and resolution an important task in clinical natural language processing. This paper contributes to acronym resolution in Spanish through the creation of a set of sense inventories organized by clinical specialty containing acronyms, their expansions, and c...
Preprint
From 2017 to 2019, the Text REtrieval Conference (TREC) held a challenge task on precision medicine using documents from medical publications (PubMed) and clinical trials. Despite the many performance measurements carried out in these evaluation campaigns, the scientific community remains uncertain about the impact individual system features and...
Conference Paper
Full-text available
The 2019 Precision Medicine Track at TREC (TREC-PM) aimed at identifying relevant documents from two collections, namely PubMed (biomedical abstracts) and ClinicalTrials.gov (clinical trials), given 40 precision medicine topics representing (virtual) patients. The organizers also proposed a new subtask on treatment retrieval from PubMed. We describ...
Conference Paper
Full-text available
The TREC-PM challenge aims to advance information retrieval applied to precision medicine. Here we describe our experimental setup and the results we achieved in its 2018 edition. We explored the use of unsupervised topic models, supervised document classification, and rule-based query-time search term boosting and expansion. We part...
Poster
Full-text available
In 2017, the TREC Conference held the Precision Medicine Track, with the challenge of finding relevant documents from two collections, namely biomedical abstracts and clinical trials, given a set of 30 input topics representing cancer patients. We proposed a free and open-source (FOSS) Java framework for design, testing, and validation of ranking str...
Conference Paper
Full-text available
In this paper, we report on our participation in the TREC 2017 Precision Medicine track (team name: imi_mug). We submitted five fully automatic runs to both the biomedical articles and clinical trials subtasks, focusing strongly on the former. Our system was based on Elasticsearch, whose queries were generated modularly via our own open source framewo...
Article
Full-text available
Clinical narratives are typically produced under time pressure, which encourages the use of abbreviations and acronyms. Expanding such short forms correctly eases text comprehension and further semantic processing. We propose a completely unsupervised and data-driven algorithm for the resolution of non-lexicalised and potentially ambiguous abbr...
Conference Paper
Full-text available
Pathology reports are a main source of information regarding cancer diagnosis and are commonly written following semi-structured templates that include tumour localisation and behaviour. In this work, we evaluated the efficiency of support vector machines (SVMs) to classify pathology reports written in Portuguese into the International Classificati...
Conference Paper
Full-text available
TNM is a classification system for assessing the progression stage of malignant tumors. Upon examining the patient, the physician classifies a tumor using three variables: T, N and M. The definitions of the values for T, N and M depend on the tumor topography (or body part), specified as ICD-O codes. These values are then used to infer the Clinical Stage (C...
Conference Paper
Full-text available
This work develops an automated classifier of pathology reports that infers the topography and the morphology classes of a tumor using codes from the International Classification of Diseases for Oncology (ICD-O). Data from 94,980 patients of the A.C. Camargo Cancer Center were used for training and validation of Naive Bayes classifiers, evaluated b...
Conference Paper
Full-text available
Clinical trials are studies designed to assess whether a new intervention is better than the current alternatives. However, most of them fail to recruit participants on schedule. It is hard to use Electronic Health Record (EHR) data to find eligible patients; therefore, studies rely on manual assessment, which is time-consuming, inefficient and requ...
Presentation
Full-text available
This work aims to develop an automated classifier of pathology reports that should be able to infer the localization (topography) and the histological type (morphology) of a tumor as International Classification of Diseases for Oncology (ICD-O) codes. We used data provided by the A.C. Camargo Cancer Center, located in São Paulo, for training and v...
Thesis
Full-text available
Clinical reports are usually written in natural language due to its descriptive power and ease of communication among specialists. Processing such data for knowledge discovery and statistical analysis requires information retrieval techniques, which are already established for newswire texts but still rare in the medical subdomain. The present work aims at devel...
Conference Paper
Full-text available
Context: Many clinical documents are written in narrative form and often use free text. In order to identify information contained in clinical narratives, natural language processing (NLP) tools can be applied. An NLP tool can recognize sentences, determine individual words (tokens) from a text group (corpus) and tag them according to language sema...
Conference Paper
Full-text available
Part-of-speech taggers need a considerable amount of data to train their models. Such data are not readily available for medical texts in Portuguese. We evaluated the accuracy of a morphological tagger against a gold standard when trained with corpora of different sizes and domains. Accuracy was highest with a medical corpus during the complete...
Technical Report
Full-text available
Users commonly report that web page response times are too high. This delay is due to several factors and, although numerous efforts have been made to reduce it, it cannot be eliminated completely. Prefetching links in the interval between two requests is the usual solution. In our work, future access prediction is c...

Projects (1)
Archived project
CBmed: 1.2 - Innovative Use of Information for Clinical Care and Biomarker Research
Abstract
In CBmed Project 1.2, called IICCAB (Innovative Use of Information for Clinical Care and Biomarker Research), large-scale clinical data sets are processed for better reuse. Data for biomarker research come not only from specialized laboratories but also from various sources in which routine clinical data are stored. Merging these data requires automatic semantic normalization so that they can be aggregated and made available for innovative applications. The core of the system developed by IICCAB is a high-performance database built on SAP HANA technology. Since an important part of clinical information is found exclusively as narratives within free-text fields of clinical databases, human language technology methods are required to analyse this content and map it to a standardized vocabulary. Together with already available structured data such as lab parameters and disease codes, the data obtained allow semantically standardized patient profiles to be built. The processed data are the basis for four application scenarios:
1. "Recruiting" will facilitate the selection of patient cohorts by various characteristics, using advanced graphical interfaces for querying and visualization. This is a crucial prerequisite for all kinds of clinical research, especially regarding biomarkers and the use of biosamples.
2. "Prediction" will focus on predictive analytics based on semantically enriched patient profiles, in order to help estimate the probability of future events, such as hospital readmissions.
3. "Patient QuickView" will provide an automatic summary of decision-relevant patient data, depending on the preferences of each user group and task. This provides an alternative to time-consuming browsing through numerous documents in electronic health records.
4. "Coding" supports physicians in coding disease cases for administrative purposes. IICCAB analyses available clinical data and proposes appropriate disease and procedure codes.
The first phase of the FFG project IICCAB within the Austrian competence center for biomarker research (CBmed, http://www.cbmed.org/en) runs until the end of 2018 and is headed by Stefan Schulz, University Professor of Medical Informatics at the Medical University of Graz. Cooperation partners are the Styrian Hospital Association (KAGes), the Medical University of Graz, the Biobank Graz, and the German software company SAP.
Project Leader: Stefan Schulz
Duration: 01.07.2015-31.12.2018