Malick Ebiele’s research while affiliated with Dublin City University and other places


Publications (8)


Figure 1. Readmission class distribution in the dataset, which is highly imbalanced toward the negative class.
Figure 2. Power analysis with different effect sizes.
Figure 3. ROC curves for all experiment models.
Figure 4. Effectiveness of each data features' inclusion and techniques applied across different models.
Figure 5. Overall feature ranking based on SHAP absolute values across models.


Forecasting Patient Early Readmission from Irish Hospital Discharge Records Using Conventional Machine Learning Models
  • Article
  • Full-text available

October 2024 · 17 Reads · 1 Citation · Diagnostics

Background/Objectives: Predicting patient readmission is an important task for healthcare risk management, as it can help prevent adverse events, reduce costs, and improve patient outcomes. In this paper, we compare various conventional machine learning models and deep learning models on a multimodal dataset of electronic discharge records from an Irish acute hospital. Methods: We evaluate the effectiveness of several widely used machine learning models that leverage patient demographics, historical hospitalization records, and clinical diagnosis codes to forecast future clinical risks. Our work focuses on addressing two key challenges in the medical field, data imbalance and the variety of data types, in order to boost the performance of machine learning algorithms. We also employ SHapley Additive exPlanations (SHAP) value visualization to interpret the model predictions and to identify both the key data features and the specific diagnosis codes that are significant predictors of readmission within 30 days. Results: Through extensive benchmarking and the application of a variety of feature engineering techniques, we improved the area under the receiver operating characteristic curve (AUROC) from 0.628 to 0.7 across our models on the test dataset. We also found that specific diagnoses, including cancer and COPD, as well as certain social factors, are significant predictors of 30-day readmission risk. Conversely, bacterial carrier status appeared to have minimal impact due to lower case frequencies. Conclusions: Our study demonstrates how routinely collected hospital data can be used effectively to forecast patient readmission with conventional machine learning, while explainable AI techniques are applied to explore the correlation between data features and the patient readmission rate.
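
As a reading aid for the AUROC figures in the abstract above, here is a minimal sketch of how AUROC can be computed from predicted scores. It uses the standard pairwise interpretation: the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as half. The function name `auroc` and the toy labels and scores are hypothetical and are not taken from the paper or its dataset.

```python
def auroc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 2 readmissions (label 1) among 10 discharges.
labels = [0, 0, 0, 0, 1, 0, 0, 1, 0, 0]
scores = [0.10, 0.20, 0.15, 0.30, 0.65, 0.05, 0.70, 0.40, 0.25, 0.12]
print(f"AUROC = {auroc(labels, scores):.3f}")
```

Because the metric depends only on how positives are ranked relative to negatives, it does not reward a model for simply predicting the majority class, which is one reason it is commonly reported for imbalanced problems. The quadratic pairwise loop is chosen for clarity; a rank-based implementation would be preferable for large datasets.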


Forecasting Patient Early Readmission from Irish Hospital Discharge Records Using Conventional Machine Learning Models

October 2024 · 15 Reads · 1 Citation

Predicting patient readmission is an important task for healthcare risk management, as it can help prevent adverse events, reduce costs, and improve patient outcomes. In this paper, we compare various conventional machine learning models on a multimodal dataset of electronic discharge records from an Irish acute hospital. We evaluate the effectiveness of several widely used machine learning models that leverage patient demographics, historical hospitalization records, and clinical diagnosis codes to forecast future clinical risks. Our work focuses on addressing two key challenges in the medical field, data imbalance and the variety of data types, in order to boost the performance of machine learning algorithms. Through extensive benchmarking and the application of a variety of feature engineering techniques, we improved the area under the receiver operating characteristic curve (AUROC) from 0.628 to 0.7 across our models on the test dataset. We also employ SHapley Additive exPlanations (SHAP) value visualization to interpret the model predictions and to identify both the key data features and the specific diagnosis codes that are significant predictors of readmission within 30 days. Our study demonstrates how routinely collected hospital data can be used effectively to forecast patient readmission with conventional machine learning, while explainable AI techniques are applied to explore the correlation between data features and the patient readmission rate.


A systems approach to managing the risk of healthcare acquired infection in an acute hospital setting supported by human factors ergonomics, data science, data governance and AI

September 2024 · 30 Reads

Innovative approaches are needed for managing risk and system change in healthcare. This paper presents a case study of a two-year project that took a systems approach to managing the risk of healthcare acquired infection in an acute hospital setting, supported by an Access Risk Knowledge Platform that brings together Human Factors Ergonomics, Data Science, Data Governance and AI expertise. Evidence for change, including meeting notes and use of the platform, was studied. The project first focused on systematically building a rich picture of the current situation from a transdisciplinary perspective. This allowed for understanding risk in context and developing a better capability to support enterprise risk management and accountability. From there, operational and risk data were linked, which led to a mapping of the risk pattern in the hospital.


Figure 1: Experimental design for personalized metadata-based data valuation.
Personalization of Dataset Retrieval Results using a Metadata-based Data Valuation Method

July 2024 · 39 Reads

In this paper, we propose a novel data valuation method for a Dataset Retrieval (DR) use case in Ireland's national mapping agency. To the best of our knowledge, data valuation has not yet been applied to dataset retrieval. By leveraging metadata and a user's preferences, we estimate the personal value of each dataset to facilitate dataset retrieval and filtering. We then validate the data value-based ranking against the stakeholders' ranking of the datasets. The proposed method and use case demonstrate that data valuation is promising for dataset retrieval. For instance, the best-performing dataset retrieval based on our approach obtained an NDCG@5 (Normalized Discounted Cumulative Gain truncated at rank 5) of 0.8207. This study is unique in its exploration of a data valuation-based approach to dataset retrieval and stands out because, unlike most existing methods, our approach is validated using the stakeholders' ranking of the datasets.
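
For readers unfamiliar with the NDCG@5 metric quoted in the abstract above, the following is a minimal sketch of one common formulation: linear gain with a log2 rank discount, normalized by the score of the ideal ordering. The abstract does not say which gain variant the paper uses, so the formulation here is an assumption, and the function names and relevance grades are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k items (linear gain, log2 discount)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG@k divided by the DCG@k of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical stakeholder relevance grades, in the order a system ranked the datasets.
ranked_relevances = [3, 2, 3, 0, 1, 2]
score = ndcg_at_k(ranked_relevances, k=5)
print(f"NDCG@5 = {score:.4f}")
```

A score of 1.0 means the top five results appear in the same order as the ideal ranking; lower values indicate that highly relevant datasets were placed further down the list.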




Fig. 1: Extended ML Process for Investigating the Impact of Data Valuation Metrics on Feature Importance
The number of entries per class before and after a stratified random sampling
Model performance report in percentage (%) and training and SHAP values time.
The Impact Of Data Valuation On Feature Importance In Classification Models

November 2023 · 164 Reads

This paper investigates the impact of data valuation metrics (variability and coefficient of variation) on feature importance in classification models. Data valuation is an emerging topic in the fields of data science, accounting, data quality, and information economics concerned with methods to calculate the value of data. Feature importance, or feature ranking, is important both for explaining how black-box machine learning models make predictions and for selecting the most predictive features when training these models. Existing feature importance algorithms are either computationally expensive (e.g. SHAP values) or biased (e.g. Gini importance in tree-based models). No previous investigation of the impact of data valuation metrics on feature importance has been conducted. Five popular machine learning models (eXtreme Gradient Boosting (XGB), Random Forest (RF), Logistic Regression (LR), Multi-Layer Perceptron (MLP), and Naive Bayes (NB)) have been used, as well as six widely implemented feature ranking techniques ([...] and SHAP values), to investigate the relationship between feature importance and data valuation metrics for a clinical use case. XGB outperforms the other models with a weighted F1-score of 79.72%. The findings suggest that features with variability greater than 0.4 or a coefficient of variation greater than 23.4 have little to no value; therefore, these features can be filtered out during feature selection. This result, if generalisable, could simplify feature selection and data preparation.
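
The filtering idea in the abstract above can be illustrated with a minimal sketch, under stated assumptions. The abstract does not give the paper's exact definitions, so this example assumes the textbook coefficient of variation (sample standard deviation divided by the absolute mean), expressed as a percentage, and uses 23.4 only as a default threshold taken from the abstract. The "variability" metric is omitted because its definition is not stated. Function names and feature columns are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation over the absolute mean, as a percentage (assumed definition)."""
    mean = statistics.fmean(values)
    if mean == 0:
        return float("inf")  # CV is undefined for a zero mean; treat as maximally dispersed
    return statistics.stdev(values) / abs(mean) * 100.0


def filter_features(columns, cv_threshold=23.4):
    """Keep only the features whose CV does not exceed the threshold."""
    return {
        name: vals
        for name, vals in columns.items()
        if coefficient_of_variation(vals) <= cv_threshold
    }


# Hypothetical toy feature columns (not from the paper's clinical dataset).
features = {
    "age": [61.0, 64.0, 58.0, 70.0, 66.0],
    "prior_admissions": [0.0, 5.0, 1.0, 0.0, 9.0],
}
kept = filter_features(features)
print(sorted(kept))
```

A filter of this kind needs only one pass over each column, which is the practical appeal relative to model-based importance scores such as SHAP values.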


Data Value Dimensions included in Data Value Models Studied
A Systematic Survey of Data Value: Models, Metrics, Applications and Research Challenges

January 2023 · 204 Reads · 4 Citations · IEEE Access

Data is central to modern decision making and value creation. Society creates, consumes and collects data at an increasing pace. Despite advances in processing power, data is expensive to maintain and curate, so it is imperative to have methods and tools to distinguish between data based on its value. Yet there is no consensus on what characterises the value of data or how this data value should be assessed. This results in heterogeneous data value models and inconsistent measurement techniques that are siloed in specific application domains, which limits the formalisation and exploitation of these concepts. In this paper, we present a methodical literature analysis that discusses data value models, assessment metrics and current applications. We also highlight challenges hindering the development and exploitation of data value as a concept. This leads to the identification of a set of research questions to help researchers contribute to this emerging field. The aim of this article is to stimulate further research and deployment of quantitative data value models and value-driven applications.

Citations (2)


... The automatic prediction of readmission requires machine learning algorithms capable of integrating different types of information (numerical, categorical, etc.). Conventional models [13] include feature extraction modules adapted to each type of information. ...

Reference:

Transformer-Based Prediction of Hospital Readmissions for Diabetes Patients
Forecasting Patient Early Readmission from Irish Hospital Discharge Records Using Conventional Machine Learning Models

Diagnostics

... This also aligns with (Bendechache et al., 2023), who conducted a systematic review of research exploring the concept of data value and its application in decision-making processes. They found that despite its conceptual origins dating back to at least 1980, the field remains immature, lacking commonly agreed terminologies, models, and approaches. ...

A Systematic Survey of Data Value: Models, Metrics, Applications and Research Challenges

IEEE Access