Fig 4 - uploaded by Alireza Rafiei
Source publication
Objective: The challenge of irregular temporal data, which is particularly prominent for medication use in the critically ill, limits the performance of predictive models. The purpose of this evaluation was to pilot test integrating synthetic data within an existing dataset of complex medication data to improve machine learning model prediction of...
Context in source publication
Context 1
... increment continued for the AUROC, reaching a peak of 0.83 for the meta-model when CTGAN's synthetic data was added and representing a more balanced trade-off among the other performance metrics. Figure 4 illustrates the AUROC curve including a 95% confidence interval for the five developed models, which were trained and validated using all the original and synthetically generated datasets. The meta-model, RF, and XGB demonstrated a similar curve pattern and higher AUROC. ...
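The excerpt above reports AUROC with a 95% confidence interval but does not say how the interval was obtained. As a minimal sketch, assuming a percentile bootstrap (one common choice, not necessarily the authors' method), the calculation could look like the following. The function names `auroc` and `bootstrap_ci` and the toy labels and scores are hypothetical and for illustration only.

```python
import random

def auroc(labels, scores):
    """AUROC as the probability that a random positive is scored above
    a random negative (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUROC (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper

# Toy data, invented for illustration; not taken from the study.
toy_labels = [0, 0, 1, 1, 0, 1, 1, 0]
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.9, 0.5]
point_estimate = auroc(toy_labels, toy_scores)
ci_lower, ci_upper = bootstrap_ci(toy_labels, toy_scores)
```

With real validation data, the same two calls would be repeated per model to draw the shaded band around each curve.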
Citations
... In the medical domain, Serte et al. (2022) [43] proposed a data-efficient deep network for COVID-19 detection on CT images, showcasing the potential of artificial intelligence techniques in medical image analysis. Rafiei et al. (2023) [44] conducted a study on integrating synthetic data into electronic medical records to enhance machine learning predictions, specifically focusing on fluid overload prediction in ICU patients. Moreover, Kaabachi et al. (2024) [45] introduced a method for evaluating privacy risks in GANs by leveraging the discriminator outputs of the standard GAN architecture. ...
Credit score models are essential tools for evaluating creditworthiness and mitigating financial risks. However, the imbalanced nature of multi-class credit score datasets poses significant challenges for traditional classification algorithms, leading to poor performance in minority classes. This study explores the effectiveness of Generative Adversarial Network (GAN)-based oversampling methods, including CTGAN, CopulaGAN, WGAN-GP, and DraGAN, in addressing this issue. By synthesizing realistic data for minority classes and integrating it with majority class data, the study benchmarks these GAN-based methods across classical (KNN, Decision Tree, Logistic Regression) and ensemble machine learning models (XGBoost, Random Forest, LightGBM). Evaluation metrics such as accuracy and F1-score reveal that WGAN-GP consistently achieves superior performance, especially when combined with Random Forest, outperforming other methods in balancing dataset representation and enhancing classification accuracy. The results showed that WGAN-GP + RF achieved 0.873 in accuracy, 0.936 F1-score in the “good” class, 0.806 F1-score in the “poor” class, and 0.816 F1-score in the “standard” class. The findings underscore the potential of GAN-based oversampling in improving multi-class credit score classification and highlight future directions, including hybrid sampling and cost-sensitive learning, to address remaining challenges.
... [4][5][6][7] Given the complexity and prolific nature of medication use in the ICU, data-driven strategies are increasingly being employed to parse meaningful patterns for fluid overload prediction. [8][9][10] While research is ongoing regarding identification of predictors for fluid overload, minimal research has evaluated the impact of medications as potential contributors. 11,12 These studies have shown that medication regimen complexity, as measured by the medication regimen complexity-ICU (MRC-ICU), was related to fluid overload risk, using both traditional regression and supervised machine learning approaches. ...
... 11,12 These studies have shown that medication regimen complexity, as measured by the medication regimen complexity-ICU (MRC-ICU), was related to fluid overload risk, using both traditional regression and supervised machine learning approaches. [8][9][10] This score has also been shown to predict mortality 13 , LOS 14 , and prolonged duration of mechanical ventilation. 15-21 Moreover, pharmacophenotyping-based approaches including MRC-ICU and employing a common data model (CDM) for ICU medications (ICURx) have previously been created to allow for unsupervised cluster analysis machine learning that showed unique patterns of medication use and ICU complications, including FO. 22,23 Therefore, quantifying patient-specific, medication-related data may be an important strategy in the prediction of fluid overload in critically ill adults. ...
INTRODUCTION: Intravenous (IV) medications are a fundamental cause of fluid overload (FO) in the intensive care unit (ICU); however, the association between IV medication use (including volume), administration timing, and FO occurrence remains unclear.
METHODS: This retrospective cohort study included consecutive adults admitted to an ICU ≥72 hours with available fluid balance data. FO was defined as a positive fluid balance ≥7% of admission body weight within 72 hours of ICU admission. After reviewing medication administration record (MAR) data in three-hour periods, IV medication exposure was categorized into clusters using principal component analysis (PCA) and Restricted Boltzmann Machine (RBM). Medication regimens of patients with and without FO were compared within clusters to assess for temporal clusters associated with FO using the Wilcoxon rank sum test. Exploratory analyses of the medication cluster most associated with FO were conducted for medications that appeared frequently and were used in the first 24 hours.
RESULTS: FO occurred in 127/927 (13.7%) of the patients enrolled. Patients received a median (IQR) of 31 (13-65) discrete IV medication administrations over the 72-hour period. Across all 47,803 IV medication administrations, ten unique IV medication clusters were identified with 121-130 medications in each cluster. Among the ten clusters, cluster 7 had the greatest association with FO; the mean number of cluster 7 medications received was significantly greater in patients in the FO cohort compared to patients who did not experience FO (25.6 vs. 10.9, p<0.0001). Of the 127 medications in cluster 7, 51 (40.2%) appeared in more than 5 separate 3-hour periods during the 72-hour study window. The most common cluster 7 medications included continuous infusions, antibiotics, and sedatives/analgesics. Addition of cluster 7 medications to a prediction model with APACHE II score and receipt of diuretics improved the ability of the model to predict fluid overload (AUROC 5.65, p = 0.0004).
CONCLUSIONS: Using ML approaches, a unique IV medication cluster was strongly associated with FO. Incorporation of this cluster improved the ability to predict development of fluid overload in ICU patients compared with traditional prediction models. This method may be further developed into real-time clinical applications to improve early detection of adverse outcomes.
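The fluid overload definition and the three-hour binning described in the methods of this abstract can be sketched as below. This is an illustration under stated assumptions, not the study's code: it assumes fluid balance is recorded in millilitres and that 1 mL of fluid weighs about 1 g, so it can be compared against body weight in kilograms. The function names `is_fluid_overload` and `three_hour_bin` are hypothetical.

```python
def is_fluid_overload(net_balance_ml, admission_weight_kg, threshold=0.07):
    """Return True when the net positive fluid balance is at least
    `threshold` (7% by default) of admission body weight.

    Assumption: 1 mL of fluid is treated as 1 g, so mL / 1000 gives kg.
    """
    if admission_weight_kg <= 0:
        raise ValueError("admission weight must be positive")
    balance_kg = net_balance_ml / 1000.0
    return balance_kg >= threshold * admission_weight_kg

def three_hour_bin(hours_since_admission, window_hours=72, bin_hours=3):
    """Map an administration time to its 3-hour period (0 to 23) within
    the first 72 hours, or None if it falls outside the window."""
    if not 0 <= hours_since_admission < window_hours:
        return None
    return int(hours_since_admission // bin_hours)

# Toy values, invented for illustration: an 80 kg patient crosses the
# 7% threshold at a net balance of roughly 5.6 L.
overloaded = is_fluid_overload(6000, 80)
not_overloaded = is_fluid_overload(5000, 80)
```

Counting how many distinct bins a medication falls into would give the "appeared in more than 5 separate 3-hour periods" measure reported in the results.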
Machine learning algorithms are used in diverse domains, many of which face significant challenges due to data imbalance. Studies have explored various approaches to address the issue, such as data preprocessing, cost-sensitive learning, and ensemble methods. Generative Adversarial Networks (GANs) have shown immense potential as a data preprocessing technique that generates good-quality synthetic data. This study employs a systematic mapping methodology to analyze 3041 papers on GAN-based sampling techniques for imbalanced data sourced from four digital libraries. A filtering process identified 100 key studies spanning domains such as healthcare, finance, and cybersecurity. Through comprehensive quantitative analysis, this research introduces three categorization mappings (application domains, GAN techniques, and GAN variants) used to handle the imbalanced nature of the data. GAN-based over-sampling emerges as an effective preprocessing method. Advanced architectures and tailored frameworks have helped GANs improve further in the case of data imbalance. GAN variants like vanilla GAN, CTGAN, and CGAN show great adaptability in structured imbalanced data cases. Interest in GANs for imbalanced data has grown tremendously, peaking in recent years, with journals and conferences playing crucial roles in transmitting foundational theories and practical applications. Despite these advances, none of the reviewed studies explicitly explores hybridized GAN frameworks with diffusion models or reinforcement learning techniques. This gap points to a future research direction: developing innovative approaches for effectively handling data imbalance.