Figure - available via license: Creative Commons Attribution 4.0 International
Source publication
Natural language processing (NLP) technology has recently been used to predict the properties of substances from their Simplified Molecular-Input Line-Entry System (SMILES) representations. We aimed to develop a model that predicts human skin sensitizers by integrating text features derived from SMILES with in vitro test outcomes. The dataset on SMILES, physicochemical proper...
Citations
... As for patient risk stratification, Kwon et al. 18 recently developed an algorithm that uses NLP to derive text features from the Simplified Molecular-Input Line-Entry System (SMILES) and integrates them with in vitro test outcomes to predict the risk of skin sensitization. Such an algorithm could be particularly useful for AD patients, who notably have a higher risk of cutaneous sensitization. ...
Background
Natural Language Processing (NLP) is a field of both computational linguistics and artificial intelligence (AI) dedicated to the analysis and interpretation of human language.
Objectives
This systematic review aims to explore the applications of NLP techniques in the dermatological setting.
Methods
An extensive search for ‘natural language processing’ and ‘dermatology’ was performed on the MEDLINE and Scopus electronic databases. Only journal articles with the full text electronically available and an English translation were considered. The PICO (Population, Intervention or exposure, Comparison, Outcome) algorithm was applied to our study protocol.
Results
NLP techniques have been applied across various dermatological domains, including atopic dermatitis, acne/rosacea, skin infections, non‐melanoma skin cancers (NMSCs), melanoma and skincare. NLP proved versatile in extracting data from diverse sources such as electronic health records (EHRs), social media platforms and online forums. These applications show its potential to inform diagnosis, treatment optimization, patient preferences and unmet needs in dermatological research and clinical practice.
Conclusions
While NLP shows promise in enhancing dermatological research and clinical practice, challenges such as data quality, ambiguity, lack of standardization and privacy concerns necessitate careful consideration. Collaborative efforts between dermatologists, data scientists and ethicists are essential for addressing these challenges and maximizing the potential of NLP in dermatology.
This study aimed to develop a model incorporating natural language processing analysis of the simplified molecular-input line-entry system (SMILES) to predict clearance (CL) and volume of distribution at steady state (Vd,ss) in humans. The CL and Vd,ss prediction models were constructed using data from 435 and 439 compounds, respectively. The machine learning features included animal pharmacokinetic data, in vitro experimental data, molecular descriptors, and SMILES, and XGBoost was used as the algorithm. The ChemBERTa model was used to analyze substance SMILES, and the last hidden layer embedding of ChemBERTa was examined as a feature. The models were evaluated using geometric mean fold error (GMFE), r², root mean squared error (RMSE), and accuracy within 2- and 3-fold error. For CL prediction, performance was best when animal pharmacokinetic data, in vitro experimental data, and SMILES were used as features, yielding a GMFE of 1.768, an r² of 0.528, and an RMSE of 0.788, with accuracies within 2-fold and 3-fold error of 75.8% and 81.8%, respectively. For Vd,ss prediction, performance was best when animal pharmacokinetic data and in vitro experimental data were used as features, yielding a GMFE of 1.401, an r² of 0.902, and an RMSE of 0.413, with accuracies within 2-fold and 3-fold error of 93.8% and 100%, respectively. This study developed a highly predictive model for CL and Vd,ss. In particular, incorporating SMILES information into the model contributed to predictive performance for CL.
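The fold-error metrics named in this abstract can be illustrated with a minimal sketch. The abstract does not state the formulas, so the definitions below are assumptions based on common usage in pharmacokinetic prediction: the fold error of one compound is taken as max(predicted/observed, observed/predicted), and GMFE as the geometric mean of those fold errors. The function names and the example values are hypothetical and are not data from the study.

```python
import math


def fold_errors(observed, predicted):
    """Per-compound fold error, assumed here to be max(pred/obs, obs/pred)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = []
    for obs, pred in zip(observed, predicted):
        if obs <= 0 or pred <= 0:
            raise ValueError("values must be positive to form a ratio")
        ratio = pred / obs
        errors.append(max(ratio, 1.0 / ratio))
    return errors


def gmfe(observed, predicted):
    """Geometric mean fold error: 10 ** mean(|log10(pred / obs)|)."""
    errs = fold_errors(observed, predicted)
    return 10 ** (sum(math.log10(e) for e in errs) / len(errs))


def within_fold(observed, predicted, fold):
    """Fraction of compounds whose fold error is at most `fold`."""
    errs = fold_errors(observed, predicted)
    return sum(e <= fold for e in errs) / len(errs)


# Hypothetical values for illustration only (not from the study).
observed = [1.0, 2.0, 4.0, 10.0]
predicted = [2.0, 2.0, 1.0, 10.0]

print("GMFE:", gmfe(observed, predicted))
print("within 2-fold:", within_fold(observed, predicted, 2))
print("within 3-fold:", within_fold(observed, predicted, 3))
```

Under these definitions, a GMFE of 1 would mean perfect agreement, and a GMFE of 2 would mean predictions are off by a factor of two on average, in either direction.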