Cecilia Mascolo’s research while affiliated with University of Cambridge and other places


Publications (455)


ECG-DPM: Electrocardiogram Generation via a Spectrogram-based Diffusion Probabilistic Model
  • Conference Paper

December 2024 · 20 Reads

Lujundong Li · Tong Xia · [...] · Cecilia Mascolo
An electrocardiogram (ECG) records the electrical signals from the heart to assess various cardiovascular conditions. Deep learning methods have been proposed to model ECGs, but the insufficient availability of ECG data and annotations often hinders their performance. To address this challenge, this paper explores the latest data synthesis technique, i.e., diffusion probabilistic models (DPMs), to enable the generation of an unlimited number of ECGs representing various cardiovascular conditions. In contrast to previous approaches that treat ECGs as time series data or convert them into power spectrograms, we introduce a novel multi-channel spectrogram-based diffusion framework. In our method, the diffusion model enhances generation diversity, while the multi-channel spectrogram preserves both magnitude and phase information, ensuring high fidelity in the reconstructed ECGs. Extensive experiments conducted on real-world ECG data demonstrate the superiority of our approach. Notably, our ECG-DPM outperforms the best baseline by a margin ranging from 12.5% to 62.5% when generating ECGs for 30 seconds.
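The key representational claim above is that keeping both magnitude and phase as separate spectrogram channels makes the transform exactly invertible, unlike a power spectrogram, which discards phase. A minimal numpy sketch of that idea (not the paper's code; it uses non-overlapping rectangular frames for simplicity, whereas a real pipeline would use a windowed, overlapping STFT):

```python
import numpy as np

def to_mag_phase_channels(sig, frame=64):
    # Split the signal into non-overlapping frames, FFT each frame, and
    # stack magnitude and phase as two (time x freq-bin) channels.
    n = len(sig) - len(sig) % frame
    frames = sig[:n].reshape(-1, frame)
    spec = np.fft.rfft(frames, axis=1)               # complex spectrogram
    return np.stack([np.abs(spec), np.angle(spec)])  # shape (2, time, bins)

def from_mag_phase_channels(channels):
    # Recombine magnitude and phase into the complex spectrogram and invert.
    mag, phase = channels
    spec = mag * np.exp(1j * phase)
    return np.fft.irfft(spec, axis=1).reshape(-1)

t = np.linspace(0, 1, 512, endpoint=False)
ecg_like = np.sin(2 * np.pi * 5 * t) + 0.1 * np.sin(2 * np.pi * 40 * t)
channels = to_mag_phase_channels(ecg_like)
reconstructed = from_mag_phase_channels(channels)
print(np.allclose(reconstructed, ecg_like))  # True: exact round trip
```

Dropping the phase channel (the usual power-spectrogram route) would make this round trip impossible, which is the fidelity argument the abstract makes.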




Electrocardiogram-Language Model for Few-Shot Question Answering with Meta Learning
  • Preprint
  • File available

October 2024 · 21 Reads

Electrocardiogram (ECG) interpretation requires specialized expertise, often involving synthesizing insights from ECG signals with complex clinical queries posed in natural language. The scarcity of labeled ECG data coupled with the diverse nature of clinical inquiries presents a significant challenge for developing robust and adaptable ECG diagnostic systems. This work introduces a novel multimodal meta-learning method for few-shot ECG question answering, addressing the challenge of limited labeled data while leveraging the rich knowledge encoded within large language models (LLMs). Our LLM-agnostic approach integrates a pre-trained ECG encoder with a frozen LLM (e.g., LLaMA and Gemma) via a trainable fusion module, enabling the language model to reason about ECG data and generate clinically meaningful answers. Extensive experiments demonstrate superior generalization to unseen diagnostic tasks compared to supervised baselines, achieving notable performance even with limited ECG leads. For instance, in a 5-way 5-shot setting, our method using LLaMA-3.1-8B achieves accuracies of 84.6%, 77.3%, and 69.6% on the single-verify, single-choose, and single-query question types, respectively. These results highlight the potential of our method to enhance clinical ECG interpretation by combining signal processing with the nuanced language understanding capabilities of LLMs, particularly in data-constrained scenarios.
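The "trainable fusion module between a frozen encoder and a frozen LLM" pattern can be illustrated with a minimal numpy sketch. All dimensions and names here are hypothetical, and a learned linear map stands in for the fusion module; the real system trains this projection while the LLM weights stay frozen:

```python
import numpy as np

rng = np.random.default_rng(0)

d_ecg, d_llm, n_soft = 128, 256, 4   # hypothetical embedding sizes

# Trainable fusion: maps one pooled ECG embedding to n_soft "soft tokens"
# in the LLM's input embedding space. Only this matrix would be updated
# during meta-training; the LLM itself stays frozen.
W_fuse = rng.normal(0, 0.02, size=(d_ecg, n_soft * d_llm))

def fuse(ecg_embedding, text_token_embeddings):
    """Prepend projected ECG soft tokens to the LLM's token sequence."""
    soft_tokens = (ecg_embedding @ W_fuse).reshape(n_soft, d_llm)
    return np.concatenate([soft_tokens, text_token_embeddings], axis=0)

ecg_emb = rng.normal(size=d_ecg)          # from a pre-trained ECG encoder
question = rng.normal(size=(10, d_llm))   # embedded question tokens
llm_input = fuse(ecg_emb, question)
print(llm_input.shape)  # (14, 256): 4 soft tokens + 10 question tokens
```

The LLM then processes this mixed sequence exactly as it would plain text, which is what makes the approach LLM-agnostic.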


StatioCL: Contrastive Learning for Time Series via Non-Stationary and Temporal Contrast

October 2024 · 8 Reads

Contrastive learning (CL) has emerged as a promising approach for representation learning in time series data by embedding similar pairs closely while distancing dissimilar ones. However, existing CL methods often introduce false negative pairs (FNPs) by neglecting inherent characteristics and then randomly selecting distinct segments as dissimilar pairs, leading to erroneous representation learning, reduced model performance, and overall inefficiency. To address these issues, we systematically define and categorize FNPs in time series into semantic false negative pairs and temporal false negative pairs for the first time: the former arise from overlooking similarities in label categories, which correlate with similarities in non-stationarity, and the latter from neglecting temporal proximity. Moreover, we introduce StatioCL, a novel CL framework that captures non-stationarity and temporal dependency to mitigate both FNPs and rectify the inaccuracies in learned representations. By interpreting and differentiating non-stationary states, which reflect the correlation between trends or temporal dynamics and underlying data patterns, StatioCL effectively captures the semantic characteristics and eliminates semantic FNPs. Simultaneously, StatioCL establishes fine-grained similarity levels based on temporal dependencies to capture varying temporal proximity between segments and to mitigate temporal FNPs. Evaluated on real-world benchmark time series classification datasets, StatioCL demonstrates a substantial improvement over state-of-the-art CL methods, achieving a 2.9% increase in Recall and a 19.2% reduction in FNPs. Most importantly, StatioCL also shows enhanced data efficiency and robustness against label scarcity.
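The semantic-FNP idea above (segments in a similar non-stationary state should not be pushed apart as negatives) can be sketched with a simple heuristic. This is an illustrative stand-in, not StatioCL's actual scoring: here the non-stationarity of a segment is summarized as the variance of its rolling-window means, and pairs with similar scores are flagged as likely false negatives:

```python
import numpy as np

def nonstationarity(seg, win=20):
    # Variance of non-overlapping window means: near zero for a stationary
    # segment, large when the local mean drifts (i.e., a trend is present).
    means = [seg[i:i + win].mean() for i in range(0, len(seg) - win + 1, win)]
    return float(np.var(means))

def is_likely_false_negative(a, b, tol=0.5):
    # Heuristic: two segments in a similar non-stationary state are likely
    # to share semantics and should not be used as a negative pair.
    return abs(nonstationarity(a) - nonstationarity(b)) < tol

rng = np.random.default_rng(1)
stationary1 = rng.normal(size=200)
stationary2 = rng.normal(size=200)
trending = rng.normal(size=200) + np.linspace(0, 5, 200)

print(is_likely_false_negative(stationary1, stationary2))  # True
print(is_likely_false_negative(stationary1, trending))     # False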


Figure 1: Automated consultation and auscultation for respiratory health screening.
Figure 2: Multimodal models for respiratory health prediction. (a) Existing concatenation-based fusion method. (b) Our LLM-based fusion method.
Figure 4: The model architecture of RespLLM. Text embeddings from task prompts and personal DMS, along with audio embeddings from respiratory sounds, are sequentialized as input for the LLM consisting of multiple transformer blocks.
RespLLM: Unifying Audio and Text with Multimodal LLMs for Generalized Respiratory Health Prediction

October 2024 · 46 Reads

The high incidence and mortality rates associated with respiratory diseases underscore the importance of early screening. Machine learning models can automate clinical consultations and auscultation, offering vital support in this area. However, the data involved, spanning demographics, medical history, symptoms, and respiratory audio, are heterogeneous and complex. Existing approaches are insufficient and lack generalizability, as they typically rely on limited training data, basic fusion techniques, and task-specific models. In this paper, we propose RespLLM, a novel multimodal large language model (LLM) framework that unifies text and audio representations for respiratory health prediction. RespLLM leverages the extensive prior knowledge of pretrained LLMs and enables effective audio-text fusion through cross-modal attentions. Instruction tuning is employed to integrate diverse data from multiple sources, ensuring generalizability and versatility of the model. Experiments on five real-world datasets demonstrate that RespLLM outperforms leading baselines by an average of 4.6% on trained tasks, 7.9% on unseen datasets, and facilitates zero-shot predictions for new tasks. Our work lays the foundation for multimodal models that can perceive, listen to, and understand heterogeneous data, paving the way for scalable respiratory health diagnosis.
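The cross-modal attention fusion mentioned above can be sketched in a few lines of numpy. This is a toy illustration, not RespLLM's architecture: text-side tokens act as queries over audio-frame embeddings via scaled dot-product attention, with identity Q/K/V projections for brevity (a real model would use learned projection matrices):

```python
import numpy as np

def cross_attention(text_tokens, audio_tokens):
    # Scaled dot-product attention: each text token attends over all audio
    # frames and pulls in an audio-informed summary. Identity projections
    # are used here instead of learned Q/K/V weights.
    d = text_tokens.shape[-1]
    scores = text_tokens @ audio_tokens.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over frames
    return weights @ audio_tokens

rng = np.random.default_rng(2)
text = rng.normal(size=(6, 32))    # embedded prompt + patient metadata
audio = rng.normal(size=(50, 32))  # respiratory-sound frame embeddings
fused = cross_attention(text, audio)
print(fused.shape)  # (6, 32): one audio-informed vector per text token
```

The fused sequence can then be fed to the LLM's transformer blocks alongside the original text embeddings, which is the "sequentialized" input described in the Figure 4 caption.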




Electrocardiogram Report Generation and Question Answering via Retrieval-Augmented Self-Supervised Modeling

September 2024 · 22 Reads

Interpreting electrocardiograms (ECGs) and generating comprehensive reports remain challenging tasks in cardiology, often requiring specialized expertise and significant time investment. To address these critical issues, we propose ECG-ReGen, a retrieval-based approach for ECG-to-text report generation and question answering. Our method leverages self-supervised learning for the ECG encoder, enabling efficient similarity searches and report retrieval. By combining pre-training with dynamic retrieval and Large Language Model (LLM)-based refinement, ECG-ReGen effectively analyzes ECG data and answers related queries, with the potential of improving patient care. Experiments conducted on the PTB-XL and MIMIC-IV-ECG datasets demonstrate superior performance in both in-domain and cross-domain scenarios for report generation. Furthermore, our approach exhibits competitive performance on the ECG-QA dataset compared to fully supervised methods when utilizing off-the-shelf LLMs for zero-shot question answering. This approach, effectively combining a self-supervised encoder and LLMs, offers a scalable and efficient solution for accurate ECG interpretation, holding significant potential to enhance clinical decision-making.
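The retrieval step at the core of such a pipeline reduces to nearest-neighbour search in the encoder's embedding space. A minimal numpy sketch with toy 2-D embeddings and hypothetical report strings (not the paper's data or code); the retrieved reports would then be handed to an LLM for refinement:

```python
import numpy as np

def retrieve_reports(query_emb, corpus_embs, reports, k=2):
    # Cosine similarity between the query ECG embedding and every corpus
    # embedding; return the reports of the k most similar recordings.
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    sims = c @ q
    top = np.argsort(sims)[::-1][:k]
    return [reports[i] for i in top]

corpus = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
reports = ["sinus rhythm", "atrial fibrillation", "sinus rhythm, mild"]
print(retrieve_reports(np.array([1.0, 0.05]), corpus, reports))
# → ['sinus rhythm', 'sinus rhythm, mild']
```

Because the encoder is pre-trained with self-supervision, the corpus embeddings can be indexed once offline, making the per-query cost a single similarity search.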



Citations (51)


... OmniBuds could potentially integrate emotion recognition based on physiological changes, such as heart rate, body temperature, voice tone and facial expressions. For example, the device could monitor stress, anxiety and fatigue and adjust audio feedback accordingly, offering calming sounds or adjusting music tempo to match the user's emotional state [8]. This speculative application leverages multi-modal sensor data available on OmniBuds and their privacy-preserving design. ...

Reference:

OmniBuds: A Sensory Earable Platform for Advanced Bio-Sensing and On-Device Machine Learning
EarTune: Exploring the Physiology of Music Listening
  • Citing Conference Paper
  • October 2024

... However, this process is hindered by the phenomenon known as "catastrophic forgetting" (CF) [36,44], where learning new tasks with fresh data causes the model to overwrite its prior knowledge, leading to a drastic decline in performance on older tasks. To tackle this challenge, CL has gained significant attention in recent years [5,27,37,41,45,48,53]. CL aims to develop methods that enable models to learn from a continuous stream of data by balancing the retention of prior knowledge with the ability to adapt to new information. ...

Kaizen: Practical self-supervised continual learning with continual fine-tuning
  • Citing Conference Paper
  • January 2024

... Healthcare monitoring is crucial in our daily lives as it allows for the early detection of hidden diseases, enables timely interventions, and helps sustain well-being by continuously tracking physical status and other health metrics [1][2][3][4][5][6]. The growing development of artificial intelligence has led to significant attention on large language models (LLMs), which demonstrate the capability to comprehensively understand and analyze vast amounts of unstructured data, such as time-series data collected by wearable sensors, thereby facilitating health status predictions [7]. ...

UR2M: Uncertainty and Resource-Aware Event Detection on Microcontrollers
  • Citing Conference Paper
  • March 2024

... Butkow et al. 35 presented a novel approach for detecting PR which utilized an in-ear acoustic sensor. The methodology included a 4th-order Butterworth band-pass filter [0.5-50 Hz], followed by short-time Fourier transform in order to compute the spectrogram, which was normalized and then fed to a pre-trained U-Net model which attempted to map acoustic spectrograms to ECG spectrograms (obtained by Zephyr BioHarness 3.0 (Medtronic)). ...

An evaluation of heart rate monitoring with in-ear microphones under motion
  • Citing Article
  • March 2024

Pervasive and Mobile Computing

... Evidential Deep Learning. Evidential deep learning typically utilizes Dempster-Shafer Theory of Evidence (DST) [24] and Subjective Logic (SL) [25] for uncertainty estimation [19], [26]- [30]. In this setup, for a classification problem encompassing K classes, the softmax function typically used in the final layer of the classification model in party m is substituted with an activation function. ...

Uncertainty-Aware Health Diagnostics via Class-Balanced Evidential Deep Learning
  • Citing Article
  • February 2024

IEEE Journal of Biomedical and Health Informatics

... Hui Lu et al. achieved SOTA performance in murmur detection using a lightweight CNN with wide features and Mel-spectrograms [4], while LCNN, dual Bayesian ResNet, hidden semi-Markov models, and hierarchical multi-scale convolutional neural networks were also deployed to achieve competitive performance in the PhysioNet 2022 challenge [5,6,26,27]. Derivative works include the use of a new wav2vec model [28], multi-scale CNN with LSTM [29], and complex-valued networks [30]. Beyond the pursuit of higher classification performance, efforts have also been made to enhance model interpretability. ...

Embracing the Imaginary: Deep Complex-valued Networks for Heart Murmur Detection
  • Citing Conference Paper
  • November 2023

... For resource-constrained devices, previous research focused on optimizing the usage of limited hardware resources to facilitate the efficient on-device deployment of cloud-side CL solutions [26,36], such as saving storage through data quantization [29,56], accelerating data loading via hierarchical memory management [39,46], and speeding up computation by optimizing the allocation of hardware resources [35,40]. ...

LifeLearner: Hardware-Aware Meta Continual Learning System for Embedded Computing Platforms

... In addition to monitoring all five vital signs and their derivatives, the sensor suite in OmniBuds enables the detection of typical motion-based contexts commonly tracked by earable devices. These include physical activity [24,27], head tracking [16], head gestures [27], and facial expressions [26,30]. While primarily driven by the integrated 9-axis IMU, more complex activities such as dietary monitoring [24], energy expenditure [20], and mental fatigue assessment [22] require combining data from multiple sensors, such as PPG and microphones. ...

EarSet: A Multi-Modal Dataset for Studying the Impact of Head and Facial Movements on In-Ear PPG Signals

Scientific Data

... Sanchez et al. [4] evaluate the advantages and challenges of using causal machine learning (CML) in healthcare and clinical settings. Similarly, Dang et al. [5] contribute a review into AI's intersection with mobile health sensing, highlighting the role of key modalities such as audio, location and motion data. ...

Human-centred artificial intelligence for mobile health sensing: challenges and opportunities