
Jekaterina Novikova, PhD
Winterlight Labs · ML Research
Research in Natural Language Processing, Machine Learning for Health, Responsible AI, Evaluation and Metrics
About
78 Publications · 12,983 Reads
1,620 Citations
Introduction
Jekaterina Novikova is a Director of Machine Learning Research at Winterlight Labs, where she leads a research group in Natural Language Processing and Machine Learning, with a focus on the intersection of language technology and machine learning in healthcare. She is also a Science Lead at the ARVA non-profit organization, where she leads research efforts towards creating an open-source knowledge base of failure modes for AI models and datasets.
http://jeknov.github.io
Additional affiliations
November 2015 - April 2018
October 2012 - November 2015
Education
October 2012 - October 2015
August 2010 - June 2011
Blekinge Tekniska Högskola
Publications (78)
Robotic emotional expressions could benefit social communication between humans and robots, if the cues such expressions contain were to be intelligible to human observers. In this paper, we present a design framework for modelling emotionally expressive robotic movements. The framework combines approach-avoidance with Shape and Effort dimensions,...
Faragó et al. (hereafter FMS&G) draw attention to an important issue for researchers of human-robot interaction (HRI): can we conceive a scheme for making social robot behaviour both comprehensible and appropriate in human social settings? We agree with the authors concerning the potential utility of drawing on the example of domestic animals — part...
With the increasing demand for robots to assist humans in shared workspaces and environments designed for humans, research on human-robot interaction (HRI) gains more and more importance. Robots in shared environments must be safe and act in a way understandable by humans, through the way they interact and move. As visual cues such as facial expres...
In this paper, we describe a project that explores a novel experimental setup towards building a spoken, multi-modally rich, and human-like multiparty tutoring robot. A human-robot interaction setup is designed, and a human-human dialogue corpus is collected. The corpus targets the development of a dialogue system platform to study verbal and nonve...
Human-Robot Interaction requires coordination strategies that allow human and artificial agencies to interpret and interleave their actions. In this paper we consider the potential of artificial emotions to serve as coordination devices in human-robot teams. We propose an approach for modelling action selection based on artificial emotions and sign...
Detecting duplicate patient participation in clinical trials is a major challenge because repeated patients can undermine the credibility and accuracy of the trial's findings and result in significant health and financial risks. Developing accurate automated speaker verification (ASV) models is crucial to verify the identity of enrolled individuals...
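The speaker-verification idea above lends itself to a minimal illustration: map two recordings to fixed-size speaker embeddings and flag a possible duplicate enrolment when their cosine similarity exceeds a decision threshold. The embedding dimensionality and threshold below are placeholder assumptions, not values from the paper, and the random vectors merely stand in for embeddings from a real ASV model.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two speaker embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def same_speaker(emb_enrolled: np.ndarray, emb_candidate: np.ndarray,
                 threshold: float = 0.75) -> bool:
    """Flag a possible duplicate enrolment when two embeddings are more
    similar than a (hypothetical) decision threshold."""
    return cosine_similarity(emb_enrolled, emb_candidate) >= threshold

# Toy usage: random vectors stand in for embeddings produced by an ASV model.
rng = np.random.default_rng(0)
enrolled, candidate = rng.normal(size=192), rng.normal(size=192)
print(same_speaker(enrolled, candidate))
```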
Speech and language changes occur in Alzheimer's disease (AD), but few studies have characterized their longitudinal course. We analyzed open-ended speech samples from a prodromal-to-mild AD cohort to develop a novel composite score to characterize progressive speech changes. Participant speech from the Clinical Dementia Rating (CDR) i...
Mental distress like depression and anxiety contribute to the largest proportion of the global burden of diseases. Automated diagnosis systems of such disorders, empowered by recent innovations in Artificial Intelligence, can pave the way to reduce the sufferings of the affected individuals. Development of such systems requires information-rich and...
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technolog...
Depression is the most common psychological disorder and is considered as a leading cause of disability and suicide worldwide. An automated system capable of detecting signs of depression in human speech can contribute to ensuring timely and effective mental health care for individuals suffering from the disorder. Developing such automated system r...
Traditional screening practices for anxiety and depression pose an impediment to monitoring and treating these conditions effectively. However, recent advances in NLP and speech modelling allow textual, acoustic, and hand-crafted language-based features to jointly form the basis of future mental health screening and condition detection. Speech is a...
Novel automated tools for analyzing speech and language may provide new insights into Alzheimer’s disease (AD). Although speech and language changes occur in AD and other neurodegenerative diseases, current clinical assessments to monitor these symptoms can be burdensome and may have limited sensitivity. Through analyses of open‐ended naturalistic...
A significant number of studies apply acoustic and linguistic characteristics of human speech as prominent markers of dementia and depression. However, studies on discriminating depression from dementia are rare. Co-morbid depression is frequent in dementia and these clinical conditions share many overlapping symptoms, but the ability to distinguis...
Models that accurately detect depression from text are important tools for addressing the post-pandemic mental health crisis. BERT-based classifiers' promising performance and their off-the-shelf availability make them great candidates for this task. However, these models are known to suffer from performance inconsistencies and poor generalization. I...
Evaluation in machine learning is usually informed by past choices, for example which datasets or metrics to use. This standardization enables the comparison on equal footing using leaderboards, but the evaluation choices become sub-optimal as better alternatives arise. This problem is especially pertinent in natural language generation which requi...
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is v...
Research related to automatically detecting Alzheimer's disease (AD) is important, given the high prevalence of AD and the high cost of traditional methods. Since AD significantly affects the acoustics of spontaneous speech, speech processing and machine learning (ML) provide promising techniques for reliably detecting AD. However, speech audio may...
Understanding robustness and sensitivity of BERT models predicting Alzheimer's disease from text is important for both developing better classification models and for understanding their capabilities and limitations. In this paper, we analyze how a controlled amount of desired and undesired text alterations impacts performance of BERT. We show that...
Background
Language impairment is an important marker of neurodegenerative disorders. Despite this, there is no universal system of terminology used to describe these impairments and large inter-rater variability can exist between clinicians assessing language. The use of natural language processing (NLP) and automated speech analysis (ASA...
In this paper, we study the performance and generalizability of three approaches for AD detection from speech on the recent ADReSSo challenge dataset: 1) using conventional acoustic features 2) using novel pre-trained acoustic embeddings 3) combining acoustic features and embeddings. We find that while feature-based approaches have a higher precisi...
Introduction: Research related to the automatic detection of Alzheimer's disease (AD) is important, given the high prevalence of AD and the high cost of traditional diagnostic methods. Since AD significantly affects the content and acoustics of spontaneous speech, natural language processing, and machine learning provide promising techniques for re...
Background
Language impairment is an important marker of neurocognitive disorders. Despite this, there is no universal system of terminology used to describe speech impairment and large inter‐rater variability can exist between clinicians assessing speech. The role of automated speech analysis is emerging as a novel and potentially more objective m...
Background
The COVID‐19 pandemic has brought the need for reliable, remote assessments for clinical trials into sharp focus. Remote, home‐based assessment reduces patient and caregiver burden, enabling more frequent monitoring. There are concerns, however, regarding the quality of in‐clinic vs. remote assessments. In the present study, we compared...
Fine-tuned Bidirectional Encoder Representations from Transformers (BERT)-based sequence classification models have proven to be effective for detecting Alzheimer's Disease (AD) from transcripts of human speech. However, previous research shows it is possible to improve BERT's performance on various tasks by augmenting the model with additional inf...
Despite the widely reported success of embedding-based machine learning methods on natural language processing tasks, the use of more easily interpreted engineered features remains common in fields such as cognitive impairment (CI) detection. Manually engineering features from noisy text is time and resource consuming, and can potentially result in...
Research related to automatically detecting Alzheimer's disease (AD) is important, given the high prevalence of AD and the high cost of traditional methods. Since AD significantly affects the content and acoustics of spontaneous speech, natural language processing and machine learning provide promising techniques for reliably detecting AD. We compa...
Multi-language speech datasets are scarce and often have small sample sizes in the medical domain. Robust transfer of linguistic features across languages could improve rates of early diagnosis and therapy for speakers of low-resource languages when detecting health conditions from speech. We utilize out-of-domain, unpaired, single-speaker, healthy...
Understanding the vulnerability of linguistic features extracted from noisy text is important for both developing better health text classification models and for interpreting vulnerabilities of natural language models. In this paper, we investigate how generic language characteristics, such as syntax or the lexicon, are impacted by artificial text...
This paper provides a comprehensive analysis of the first shared task on End-to-End Natural Language Generation (NLG) and identifies avenues for future research based on the results. This shared task aimed to assess whether recent end-to-end NLG systems can generate more complex output by learning from datasets containing higher lexical richness, s...
We seek to improve the data efficiency of neural networks and present novel implementations of parameterized piece-wise polynomial activation functions. The parameters are the y-coordinates of n+1 Chebyshev nodes per hidden unit and Lagrangian interpolation between the nodes produces the polynomial on [-1, 1]. We show results for different methods...
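Because the abstract above gives an unusually concrete recipe (learnable y-values at n+1 Chebyshev nodes, combined by Lagrange interpolation on [-1, 1]), a short sketch can make it tangible. The PyTorch module below only illustrates that idea; the polynomial degree, initialization scale, and clamping of inputs to [-1, 1] are assumptions rather than details from the paper.

```python
import math
import torch
import torch.nn as nn

class ChebyshevLagrangeActivation(nn.Module):
    """Per-unit activation: a degree-n polynomial defined by learnable
    y-values at n+1 Chebyshev nodes on [-1, 1] and evaluated by Lagrange
    interpolation. A sketch of the idea, not the authors' implementation."""

    def __init__(self, num_units: int, degree: int = 4):
        super().__init__()
        k = torch.arange(degree + 1, dtype=torch.float32)
        # Chebyshev nodes of the first kind on [-1, 1]
        nodes = torch.cos((2 * k + 1) * math.pi / (2 * (degree + 1)))
        self.register_buffer("nodes", nodes)
        # Learnable y-coordinates: one set of degree+1 values per hidden unit
        self.y = nn.Parameter(0.1 * torch.randn(num_units, degree + 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_units); keep inputs inside the interpolation range
        x = torch.clamp(x, -1.0, 1.0).unsqueeze(-1)           # (B, U, 1)
        diff = x - self.nodes                                  # (B, U, n+1)
        basis = []
        for j in range(self.nodes.numel()):
            mask = torch.ones_like(self.nodes, dtype=torch.bool)
            mask[j] = False
            # Lagrange basis L_j(x) = prod_{m != j} (x - x_m) / (x_j - x_m)
            numerator = diff[..., mask].prod(dim=-1)
            denominator = (self.nodes[j] - self.nodes[mask]).prod()
            basis.append(numerator / denominator)
        basis = torch.stack(basis, dim=-1)                     # (B, U, n+1)
        return (basis * self.y).sum(dim=-1)                    # (B, U)

# Toy usage: a 16-unit layer output passed through the learnable polynomial.
act = ChebyshevLagrangeActivation(num_units=16)
print(act(torch.randn(4, 16)).shape)
```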
Automatic Speech Recognition (ASR) is a critical component of any fully-automated speech-based Alzheimer's disease (AD) detection model. However, despite years of speech recognition research, little is known about the impact of ASR performance on AD detection. In this paper, we experiment with controlled amounts of artificially generated ASR errors...
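The "controlled amounts of artificially generated ASR errors" mentioned above can be illustrated with a simple corruption routine: inject word-level deletions, substitutions, and insertions into a reference transcript at a chosen rate. This is a hypothetical sketch of such noise injection, not the error-generation procedure used in the paper.

```python
import random

def corrupt_transcript(words, target_error_rate=0.2, vocab=None, seed=0):
    """Inject word-level deletions, substitutions and insertions at roughly
    the requested rate. Illustrative only; real ASR errors are not uniform."""
    rng = random.Random(seed)
    vocab = vocab or words  # substitution/insertion pool; a toy assumption
    corrupted = []
    for word in words:
        if rng.random() < target_error_rate:
            op = rng.choice(["delete", "substitute", "insert"])
            if op == "delete":
                continue
            if op == "substitute":
                corrupted.append(rng.choice(vocab))
            else:  # insert an extra word after the current one
                corrupted.extend([word, rng.choice(vocab)])
        else:
            corrupted.append(word)
    return corrupted

print(" ".join(corrupt_transcript("the boy is stealing a cookie from the jar".split())))
```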
This paper provides a detailed summary of the first shared task on End-to-End Natural Language Generation (NLG) and identifies avenues for future research based on the results. This shared task aimed to assess whether recent end-to-end NLG systems can generate more complex output by learning from datasets containing higher lexical richness, syntact...
Speech datasets for identifying Alzheimer's disease (AD) are generally restricted to participants performing a single task, e.g. describing an image shown to them. As a result, models trained on linguistic features derived from such datasets may not be generalizable across tasks. Building on prior work demonstrating that same-task data of healthy p...
This paper summarises the experimental setup and results of the first shared task on end-to-end (E2E) natural language generation (NLG) in spoken dialogue systems. Recent end-to-end generation systems are promising since they reduce the need for data annotation. However, they are currently limited to small, delexicalised datasets. The E2E NLG share...
Linguistic features have shown promising applications for detecting various cognitive impairments. To improve detection accuracy, increasing the amount of data or the number of linguistic features are two applicable approaches. However, acquiring additional clinical data could be expensive, and hand-crafting features is burdensome. In this paper, we take...
One of the most prevalent symptoms among the elderly population, dementia, can be detected using linguistic features extracted from narrative transcripts. However, these linguistic features are impacted in a similar but different fashion by the normal aging process. It has been hard for machine learning classifiers to isolate the effects of confounding...
This paper introduces transductive consensus network (TCNs), as an extension of a consensus network (CN), for semi-supervised learning. TCN does multi-modal classification based on a few available labels by urging the {\em interpretations} of different modalities to resemble each other. We formulate the multi-modal, semi-supervised learning problem...
Human evaluation for natural language generation (NLG) often suffers from inconsistent user ratings. While previous research tends to attribute this problem to individual user preferences, we show that the quality of human judgements can also be improved by experimental design. We present a novel rank-based magnitude estimation method (RankME), whi...
Working in human populated environments requires fast and robust action selection and execution especially when deliberately trying to interact with humans. This work presents the combination of a high-level planner (ROSPlan) for action sequencing and automatically generated finite state machines (PNP) for execution. Using this combined system we a...
Most of today's task-based spoken dialogue systems perform poorly if the user goal is not within the system's task domain. On the other hand, chatbots cannot perform tasks involving robot actions but are able to deal with unforeseen user input. To overcome the limitations of each of these separate approaches and be able to exploit their strengths,...
Traditional automatic evaluation measures for natural language generation (NLG) use costly human-authored references to estimate the quality of a system output. In this paper, we propose a referenceless quality estimation (QE) approach based on recurrent neural networks, which predicts a quality score for a NLG system output by comparing it to the...
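The referenceless quality-estimation setup described above, scoring a generated utterance directly against its meaning representation (MR) instead of a human reference, can be sketched as two recurrent encoders feeding a small regression head. The GRU encoders, layer sizes, and scoring head below are illustrative assumptions, not the architecture reported in the paper.

```python
import torch
import torch.nn as nn

class ReferencelessQE(nn.Module):
    """Sketch of a referenceless quality estimator: encode the MR and the
    system output separately and regress a single quality score."""

    def __init__(self, vocab_size: int, emb_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.mr_encoder = nn.GRU(emb_dim, hidden, batch_first=True)
        self.text_encoder = nn.GRU(emb_dim, hidden, batch_first=True)
        self.score = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, mr_ids: torch.Tensor, out_ids: torch.Tensor) -> torch.Tensor:
        _, h_mr = self.mr_encoder(self.embed(mr_ids))        # (1, B, H)
        _, h_out = self.text_encoder(self.embed(out_ids))
        joint = torch.cat([h_mr[-1], h_out[-1]], dim=-1)     # (B, 2H)
        return self.score(joint).squeeze(-1)                  # quality per example

# Toy usage with dummy token ids for a batch of two MR/output pairs.
model = ReferencelessQE(vocab_size=1000)
print(model(torch.randint(1, 1000, (2, 12)), torch.randint(1, 1000, (2, 20))).shape)
```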
The majority of NLG evaluation relies on automatic metrics, such as BLEU. In this paper, we motivate the need for novel, system- and data-independent automatic evaluation methods: We investigate a wide range of metrics, including state-of-the-art word-based and novel grammar-based ones, and demonstrate that they only weakly reflect human judgement...
This paper describes the E2E data, a new dataset for training end-to-end, data-driven natural language generation systems in the restaurant domain, which is ten times bigger than existing, frequently used datasets in this area. The E2E dataset poses new challenges: (1) its human reference texts show more lexical richness and syntactic variation, in...
We argue that there are currently two major bottlenecks to the commercial use of statistical machine learning approaches for natural language generation (NLG): (a) The lack of reliable automatic evaluation metrics for NLG, and (b) The scarcity of high quality in-domain corpora. We address the first problem by thoroughly analysing current evaluation...
Recognition of social signals, from human facial expressions or prosody of speech, is a popular research topic in human-robot interaction studies. There is also a long line of research in the spoken dialogue community that investigates user satisfaction in relation to dialogue characteristics. However, very little research relates a combination of...
Recent advances in corpus-based Natural Language Generation (NLG) hold the promise of being easily portable across domains, but require costly training data, consisting of meaning representations (MRs) paired with Natural Language (NL) utterances. In this work, we propose a novel framework for crowdsourcing high quality NLG training data, using aut...
In the emerging world of human-robot interaction, people and robots will work together to achieve joint objectives. This paper discusses the design and validation of a general scheme for creating emotionally expressive behaviours for robots, in order that people might better interpret how a robot collaborator is succeeding or failing in its work. I...
In this paper, we describe a project that explores an open-sourced, enhanced robot simulator, SIGVerse, towards researching social human-robot interaction. Research on high-level social human-robot interaction systems that include collaboration and emotional intercommunication between people and robots requires a large amount of data based on embodie...
Coordination of human–robot joint activity must depend on the ability of human and artificial agencies to interpret and interleave their actions. In this paper we consider the potential of artificial emotions to serve as task-relevant coordination devices in human–robot teams. We present two studies aiming to understand whether a non-humanoid robot...
This paper describes a novel experimental setup exploiting state-of-the-art capture equipment to collect a multimodally rich game-solving collaborative multiparty dialogue corpus. The corpus is targeted and designed towards the development of a dialogue system platform to explore verbal and nonverbal tutoring strategies in multiparty spoken interac...
In this paper, we describe a project that explores a novel experimental setup towards building a spoken, multi-modally rich, and human-like multiparty tutoring robot. A human-robot interaction setup is designed, and a human-human dialogue corpus is collected. The corpus targets the development of a dialogue system platform to study verbal and nonv...
This project explores a novel experimental setup towards building a spoken, multi-modally rich, and human-like multiparty tutoring agent. A setup is developed and a corpus is collected that targets the development of a dialogue system platform to explore verbal and nonverbal tutoring strategies in multiparty spoken interactions with embodied agents....
Questions
Question (1)
I'm trying to build an affective Lego robot for my study of human-robot collaboration. Could you suggest what I should read about expressing basic affect in robots (especially with the help of eyebrows and the upper body)? I'd also be very grateful if you could answer this short survey so I can gather some data on my robot's current emotional expressions: http://kwiksurveys.com/s.asp?sid=x58ni0wxzxid65t122704 .