Yuqi Guo’s research while affiliated with University of North Carolina at Charlotte and other places




Publications (3)


Social media data retrieval, sampling, and pre-processing, using Twitter as an example of an online social platform.
Social media data mining, NLP, and integration with epidemic modeling.
Content and sentiment surveillance (CSI): A critical component for modeling modern epidemics
  • Article
  • Full-text available

March 2023 · 41 Reads · Frontiers in Public Health · Shuhua Jessica Yin · Yuqi Guo · [...] · Dongsong Zhang

Comprehensive surveillance systems are the key to providing accurate data for effective modeling. Traditional symptom-based case surveillance has been joined by recent genomic, serologic, and environmental surveillance to provide more integrated disease surveillance systems. A major gap in comprehensive disease surveillance is the accurate, real-time monitoring of potential population behavioral changes. Population-wide behaviors, such as compliance with various interventions and vaccination acceptance, significantly influence and drive the overall epidemic dynamics in society. Infoveillance originally utilized online query data (e.g., Google and Wikipedia searches on a specific topic such as an epidemic) and later focused on large volumes of online discourse data from social media platforms to further augment epidemic modeling. It mainly uses the number of posts to approximate public awareness of the disease, and compares that number with observed epidemic dynamics for better projection. The COVID-19 pandemic shows that there is an urgent need to further harness the rich, detailed content and sentiment information, which can provide more accurate and granular information on public awareness of and perceptions toward multiple aspects of the disease, especially various interventions. In this perspective paper, we describe a novel conceptual analytical framework of content and sentiment infoveillance (CSI) and its integration with epidemic modeling. This CSI framework includes data retrieval and pre-processing; information extraction via natural language processing to identify and quantify detailed time, location, content, and sentiment information; and integration of infoveillance with common epidemic modeling techniques, both mechanistic and data-driven. CSI complements and significantly enhances current epidemic models for more informed decisions by integrating behavioral aspects from detailed, instantaneous infoveillance of massive social media data.


Will You Be Vaccinated? A Methodology for Annotating and Analyzing Twitter Data to Measure the Stance Towards COVID-19 Vaccination

March 2022 · 16 Reads · 2 Citations

People turn to social media to express their opinion towards different topics and issues. This makes social media a valuable resource for mining public opinion. Stance detection is an approach to analyzing social media users’ content to determine public opinion. In this paper, we present a replicable methodology for coding tweets for stance detection towards the COVID-19 vaccination. The methodology includes a codebook for coding the stance towards COVID-19 vaccination and 2 approaches to sampling Twitter data for manual coding: keywords and hashtags. The codebook provides a template for other researchers to code social media data for stance towards vaccination. Our analysis of the results from 2 sampling approaches shows that sampling with hashtags leads to high inter-coder agreement. We analyze the stance and compare it with the results from sentiment analysis on the same dataset to highlight the distinction of our methodology for stance analysis when compared to sentiment analysis towards vaccination. The major contributions of this paper are: a replicable methodology for annotating stance towards COVID-19 vaccination with a codebook and dataset, and a comparison of sampling Twitter data with keywords and hashtags on the inter-coder agreement along with the resulting distribution of stance.

Keywords: COVID-19, Vaccination, Public opinion, Stance
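
The abstract above reports inter-coder agreement but does not say which statistic was used. As an illustration only, the sketch below computes Cohen's kappa, one common agreement measure for two coders. The function name `cohens_kappa` and the stance labels `favor`, `against`, and `neutral` are hypothetical and are not taken from the paper's codebook.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each coder's label
    frequencies. This is an assumed choice of statistic, not the paper's.
    """
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty item set")
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    # Chance agreement: probability both coders pick the same label independently.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical stance labels for six tweets (toy data, not from the paper).
coder_a = ["favor", "favor", "against", "neutral", "against", "favor"]
coder_b = ["favor", "against", "against", "neutral", "against", "favor"]
print(round(cohens_kappa(coder_a, coder_b), 4))  # 0.7391
```

Kappa corrects raw percent agreement for agreement that would occur by chance, which matters when one stance label dominates a sample.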


Predictors of underutilization of lung cancer screening: a machine learning approach

January 2022 · 23 Reads · 6 Citations · European Journal of Cancer Prevention

Lung cancer is the second most common cancer and a leading cause of cancer-related death in the US. Despite this, the prevalence of using low-dose computed tomography (LDCT) for lung cancer prevention in the US has remained below 4% over time. The purpose of this study is to develop machine learning models to analyze interactive pathways of factors associated with the use of lung cancer screening with LDCT. The study was based on data retrieved from the 2018 Behavioral Risk Factor Surveillance System. After handling missing values, 86 variables and 710 samples were included in the decision tree model and the random forest model. The data were randomly split into training (569/710, 80%) and testing (141/710, 20%) sets. Gini impurity was used to select the optimal split at each node of the model. Machine learning performance was evaluated by model accuracy, sensitivity, specificity, F1 score, etc. Averaged over 100 runs, the decision tree model achieved an accuracy of 67.78%, an F1 score of 65.76%, a sensitivity of 62.52%, and a specificity of 73.57%. In the decision tree model, nine interactive pathways were identified among the following factors: average drinks per month, BMI, diabetes, first smoke age, years of smoking, year(s) quit smoking, sex, last sigmoidoscopy or colonoscopy, last dental visit, general health, insurance, education, and last Pap test. Lung cancer screening utilization is the result of the interplay of multiple factors. Lung cancer screening programs in clinical settings should not only focus on patients' smoking behaviors but also consider other socioeconomic factors.
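
The abstract above names Gini impurity as the split criterion and accuracy, sensitivity, specificity, and F1 score as the evaluation metrics. The following is a minimal sketch of how those quantities are defined, using toy labels rather than the study's data. The function names (`gini_impurity`, `split_impurity`, `binary_metrics`) are illustrative assumptions and do not reflect the authors' implementation.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a node: 1 - sum over classes of p_k squared."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(left, right):
    """Size-weighted Gini impurity of a candidate binary split.

    A decision tree picks, at each node, the split that minimizes this value.
    """
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity, and F1 from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Toy example (not the study's data): 1 = screened, 0 = not screened.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(gini_impurity([1, 1, 0, 0]))  # 0.5, the maximum for two balanced classes
print(metrics)
```

In the study these metrics were averaged over 100 runs; repeating a random 80/20 split and averaging `binary_metrics` over the runs would follow the same pattern.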


Citations (1)


... For the overall population, the algorithm achieved an AUC score of 0.90 and 0.87 for patients over the age of 55 years. Guo et al. trained ML models on low-dose CT and found an accuracy of 0.6778, a F1 score of 0.6575, a sensitivity of 0.6252, and a specificity of 0.7357 [45]. More notably, the interactive pathways were BMI, DM, first smoke age, average drinks per month, years of smoking, year(s) since quitting smoking, sex, last dental visit, general health, insurance, education, last PAP test, and last sigmoidoscopy or colonoscopy. ...

Reference:

Machine-Learning-Based Prediction Modelling in Primary Care: State-of-the-Art Review
Predictors of underutilization of lung cancer screening: a machine learning approach
  • Citing Article
  • January 2022

European Journal of Cancer Prevention