Minjung Kim’s research while affiliated with Ewha Womans University and other places


Publications (2)


Prediction of human pharmacokinetic parameters incorporating SMILES information
  • Article

November 2024 · 7 Reads · Archives of Pharmacal Research

Jae-Hee Kwon · Ja-Young Han · Minjung Kim · [...]
This study aimed to develop a model incorporating natural language processing analysis of the simplified molecular-input line-entry system (SMILES) to predict clearance (CL) and volume of distribution at steady state (Vd,ss) in humans. The construction of the CL and Vd,ss prediction models involved data from 435 and 439 compounds, respectively. In machine learning, features such as animal pharmacokinetic data, in vitro experimental data, molecular descriptors, and SMILES were utilized, with XGBoost employed as the algorithm. The ChemBERTa model was used to analyze substance SMILES, and the last hidden-layer embedding of ChemBERTa was examined as a feature. The model was evaluated using geometric mean fold error (GMFE), r2, root mean squared error (RMSE), and accuracy within 2- and 3-fold error. The model demonstrated optimal performance for CL prediction when incorporating animal pharmacokinetic data, in vitro experimental data, and SMILES as features, yielding a GMFE of 1.768, an r2 of 0.528, and an RMSE of 0.788, with accuracies within 2-fold and 3-fold error reaching 75.8% and 81.8%, respectively. The model's performance in Vd,ss prediction was optimized by leveraging animal pharmacokinetic data and in vitro experimental data as features, yielding a GMFE of 1.401, an r2 of 0.902, and an RMSE of 0.413, with accuracies within 2-fold and 3-fold error reaching 93.8% and 100%, respectively. This study developed a highly predictive model for CL and Vd,ss. Specifically, incorporating SMILES information improved the model's predictive power for CL.
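The abstract evaluates predictions with GMFE (the exponential of the mean absolute log fold error) and accuracy within 2- and 3-fold error. As a minimal illustrative sketch (not the authors' code; the function names `gmfe` and `within_fold` are assumptions), these metrics can be computed as:

```python
import math

def gmfe(pred, obs):
    """Geometric mean fold error: exp of the mean |ln(pred/obs)|."""
    return math.exp(
        sum(abs(math.log(p / o)) for p, o in zip(pred, obs)) / len(pred)
    )

def within_fold(pred, obs, fold=2.0):
    """Fraction of predictions within a given fold of the observed value."""
    return sum(
        1 for p, o in zip(pred, obs) if max(p / o, o / p) <= fold
    ) / len(pred)

# Example: one prediction off by exactly 2-fold, one exact.
print(gmfe([2.0, 1.0], [1.0, 1.0]))         # sqrt(2) ≈ 1.414
print(within_fold([2.0, 1.0], [1.0, 1.0]))  # 1.0 (both within 2-fold)
```

A GMFE of 1.0 would indicate perfect predictions; values below 2 are commonly treated as acceptable in pharmacokinetic prediction.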


Figure 1. Study flow diagram.
Figure 4. t-SNE visualization of the first and last hidden-layer embeddings before fine-tuning (upper left, upper right) and after fine-tuning (lower left, lower right).
Tables: BERT models used in the study; classification performance of BERT models; predicted results with the BERTweet-large model.
Fine-Tuning BERT Models to Classify Misinformation on Garlic and COVID-19 on Twitter
  • Article
  • Full-text available

April 2022 · 160 Reads · 14 Citations
Garlic-related misinformation is prevalent whenever a virus outbreak occurs. With the outbreak of COVID-19, garlic-related misinformation is spreading through social media, including Twitter. Bidirectional Encoder Representations from Transformers (BERT) can be used to classify misinformation from a vast number of tweets. This study aimed to apply the BERT model for classifying misinformation on garlic and COVID-19 on Twitter, using 5929 original tweets mentioning garlic and COVID-19 (4151 for fine-tuning, 1778 for test). Tweets were manually labeled as 'misinformation' and 'other.' We fine-tuned five BERT models (BERT-base, BERT-large, BERTweet-base, BERTweet-COVID-19, and BERTweet-large) using a general COVID-19 rumor dataset or a garlic-specific dataset. Accuracy and F1 score were calculated to evaluate the performance of the models. The BERT models fine-tuned with the COVID-19 rumor dataset showed poor performance, with a maximum accuracy of 0.647. BERT models fine-tuned with the garlic-specific dataset showed better performance. BERTweet models achieved accuracies of 0.897–0.911, while BERT-base and BERT-large achieved accuracies of 0.887–0.897. BERTweet-large showed the best performance, with a maximum accuracy of 0.911 and an F1 score of 0.894. Thus, BERT models showed good performance in classifying misinformation. The results of our study will help detect misinformation related to garlic and COVID-19 on Twitter.
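The abstract reports accuracy and F1 score for the binary 'misinformation' vs. 'other' labels. As a minimal sketch of how those two metrics are computed for this label scheme (the function name `accuracy_f1` is an assumption, not from the study's code):

```python
def accuracy_f1(y_true, y_pred, positive="misinformation"):
    """Accuracy and F1 score treating 'misinformation' as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    acc = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return acc, f1

labels = ["misinformation", "other", "misinformation", "other"]
preds = ["misinformation", "misinformation", "other", "other"]
print(accuracy_f1(labels, preds))  # (0.5, 0.5)
```

F1 (the harmonic mean of precision and recall) complements accuracy when the two classes are imbalanced, as is typical for misinformation datasets.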


Citations (1)


... Transformer models exhibit a high degree of adaptability to transfer learning, a process where a pre-trained model on one task or dataset can be fine-tuned on a different, often smaller, dataset for a specific task [27]. In a previous study, ChemBERTa demonstrated its ability to identify toxic chemicals from the ClinTox dataset and p53 stress-response pathway activators from the Tox21 dataset, achieving AUC-ROC values of 0.733 and 0.728, respectively [14]. ...

Reference:

Integration of the Natural Language Processing of Structural Information Simplified Molecular-Input Line-Entry System Can Improve the In Vitro Prediction of Human Skin Sensitizers
Fine-Tuning BERT Models to Classify Misinformation on Garlic and COVID-19 on Twitter