About
37
Publications
4,505
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
366
Citations
Introduction
Skills and Expertise
Publications
Publications (37)
Clinical oncology generates vast, unstructured data that often contain inconsistencies, missing information, and ambiguities, making it difficult to extract reliable insights for data-driven decision-making. General-purpose large language models (LLMs) struggle with these challenges due to their lack of domain-specific reasoning, including speciali...
Capturing subtle speech disruptions across the psychosis spectrum is challenging because of the inherent variability in speech patterns. This variability reflects individual differences and the fluctuating nature of symptoms in both clinical and non-clinical populations. Accounting for uncertainty in speech data is essential for predicting symptom...
The widespread use of chest X-rays (CXRs), coupled with a shortage of radiologists, has driven growing interest in automated CXR analysis and AI-assisted reporting. While existing vision-language models (VLMs) show promise in specific tasks such as report generation or abnormality detection, they often lack support for interactive diagnostic capabi...
Many different methods for prompting large language models have been developed since the emergence of OpenAI's ChatGPT in November 2022. In this work, we evaluate six different few-shot prompting methods. The first set of experiments evaluates three frameworks that focus on the quantity or type of shots in a prompt: a baseline method with a simple...
International enterprises, organizations, or hospitals collect large amounts of multi-modal data stored in databases, text documents, images, and videos. While there has been recent progress in the separate fields of multi-modal data exploration as well as in database systems that automatically translate natural language questions to database query...
The potential for improvements brought by Large Language Models (LLMs) in Text-to-SQL systems is mostly assessed on monolingual English datasets. However, LLMs' performance for other languages remains vastly unexplored. In this work, we release the StatBot.Swiss dataset, the first bilingual benchmark for evaluating Text-to-SQL systems based on real...
Specialised pre-trained language models are becoming more frequent in Natural language Processing (NLP) since they can potentially outperform models trained on generic texts. BioBERT (Sanh et al., Distilbert, a distilled version of bert: smaller, faster, cheaper and lighter. arXiv preprint arXiv: 1910.01108 , 2019) and BioClinicalBERT (Alsentzer et...
In this manuscript, we describe our submission to the RR-TNM subtask of NTCIR-17 MedNLP-SC shared task. We took an approach to create extensive question-and-answer (Q&A) pairs related to TNM classification as a method of domain-specific augmentation. Compared to the result without data augmentation, improvement in the accuracy especially for the M...
We describe our submission to the RR-TNM subtask of the NTCIR-17 MedNLP-SC shared task. In the RR-TNM subtask, we developed our system for automatic extraction and classification of the TNM staging from Japanese radiology reports of lung cancers. In our system, zero-shot classification and prompt engineering were performed using ChatGPT and LangCha...
Recent transformer-based models have made significant strides in generating radiology reports from chest X-ray images. However, a prominent challenge remains: these models often lack prior knowledge, resulting in the generation of synthetic reports that mistakenly reference non-existent prior exams. This discrepancy can be attributed to a knowledge...
While several benefits were realized for multilingual vision-language pretrained models, recent benchmarks across various tasks and languages showed poor cross-lingual generalisation when multilingually pre-trained vision-language models are applied to non-English data, with a large gap between (supervised) English performance and (zero-shot) cross...
Current transformer-based models achieved great success in generating radiology reports from chest X-ray images. Nonetheless, one of the major issues is the model's lack of prior knowledge, which frequently leads to false references to non-existent prior exams in synthetic reports. This is mainly due to the knowledge gap between radiologists and th...
Background: Federated learning methods offer the possibility of training machine learning models on privacy-sensitive data sets, which cannot be easily shared. Multiple regulations pose strict requirements on the storage and usage of healthcare data, leading to data being in silos (i.e. locked-in at healthcare facilities). The application of federa...
While several benefits were realized for multilingual vision-language pretrained models, recent benchmarks across various tasks and languages showed poor cross-lingual generalisation when multilingually pre-trained vision-language models are applied to non-English data, with a large gap between (supervised) English performance and (zero-shot) cross...
Objectives:
The first objective of this study was to implement and assess the performance and reliability of a vision transformer (ViT)-based deep-learning model, an 'off-the-shelf' artificial intelligence solution, for identifying distinct signs of microangiopathy in nailfold capilloroscopy (NFC) images of patients with SSc. The second objective...
While several benefits were realized for multilingual vision-language pretrained models, recent benchmarks across various tasks and languages showed poor cross-lingual generalisation when multilingually pre-trained vision-language models are applied to non-English data, with a large gap between (supervised) English performance and (zero-shot) cross...
In this paper, we discuss our contribution to the NII Testbeds and Community for Information Access Research (NTCIR)-16 Real-MedNLP shared task. Our team (ZuKyo) participated in the English subtask: Few-resource Named Entity Recognition. The main challenge in this low-resource task was a low number of training documents annotated with a high number...
In this NTCIR-16 Real-MedNLP shared task paper, we present the methods of the ZuKyo-JA subteam for solving the Japanese part of Subtask1 and Subtask3 (Subtask1-CR-JA, Subtask1-RR-JA, Subtask3-RR-JA). Our solution is based on a sliding-window approach using a Japanese BERT pre-trained masked-language model., which was used as a common architecture f...
Background
An accurate assessment of nailfold capillaroscopy (NFC) images has great importance in the diagnosis and prognosis of systemic sclerosis (SSc). To overcome some of the inherent problems with NFC image analysis (operator/observer bias, time requirements), there is an interest to automate and standardize NFC image assessment using computer...
Inspired by Curriculum Learning, we propose a consecutive (i.e. image-to-text-to-text) generation framework where we divide the problem of radiology report generation into two steps. Contrary to generating the full radiology report from the image at once, the model generates global concepts from the image in the first step and then reforms them int...
Inspired by Curriculum Learning, we propose a consecutive (i.e., image-to-text-to-text) generation framework where we divide the problem of radiology report generation into two steps. Contrary to generating the full radiology report from the image at once, the model generates global concepts from the image in the first step and then reforms them in...
Real-world applications of natural language processing (NLP) are challenging. NLP models rely heavily on supervised machine learning and require large amounts of annotated data. These resources are often based on language data available in large quantities, such as English newswire. However, in real-world applications of NLP, the textual resources...
Learning what to share between tasks has been a topic of high importance recently, as strategic sharing of knowledge has been shown to improve the performance of downstream tasks. The same applies to sharing between languages, and is especially important when considering the fact that most languages in the world suffer from being under-resourced. I...
Existing named entity recognition (NER) systems rely on large amounts of human-labeled data for supervision. However, obtaining large-scale annotated data is challenging particularly in specific domains like health-care, e-commerce and so on. Given the availability of domain specific knowledge resources, (e.g., ontologies, dictionaries), distant su...
In short First we extract the shortest dependency path (sdp) between two entities. We introduce a convolutional neural network (CNN) which takes the shortest dependency path embeddings as input. This approach achieved overall F1 scores of 76.7 and 83.2 for relation classification on clean and noisy data, respectively. For combined relation extracti...
We investigate the use of different syntactic dependency representations in a neu-ral relation classification task and compare the CoNLL, Stanford Basic and Universal Dependencies schemes. We further compare with a syntax-agnostic approach and perform an error analysis in order to gain a better understanding of the results.
We investigate the use of different syntactic dependency representations in a neural relation classification task and compare the CoNLL, Stanford Basic and Universal Dependencies schemes. We further compare with a syntax-agnostic approach and perform an error analysis in order to gain a better understanding of the results.
This article presents the SIRIUS-LTG-UiO system for the SemEval 2018 Task 7 on Semantic Relation Extraction and Classification in Scientific Papers. First we extract the shortest dependency path (sdp) between two entities, then we introduce a convolutional neural network (CNN) which takes the shortest dependency path embeddings as input and perform...
The extraction and the disambiguation of knowledge guided by textual resources on the web is a crucial process to advance the Web of Linked Data. The goal of our work is to semantically enrich raw data by linking the mentions of named entities in the text to the corresponding known entities in knowledge bases. In our approach multiple aspects are c...
In the context of Social Media Analytics, Natural Language Processing tools face new challenges on on-line conversational text, such as microblogs, chat, or text messages, because of the specificity of the language used in these channels. This work addresses the problem of Part-Of-Speech tagging (initially for French but also for English) on noisy...
The main objective of this paper is to compare the sentiments that prevailed before and after the presidential elections, held in both US and France in the year 2012. To achieve this objective we extracted the content information from a social medium such as Twitter and used the tweets from electoral candidates and the public users (voters), collec...