Shannon S. Hubany’s research while affiliated with Nemours Children’s Health System and other places


Publications (1)


Fig. 1. This flowchart delineates the exclusion criteria applied to the questions considered in our study's analysis.
Fig. 2. The percentage of correct answers of ChatGPT-4's responses for examination years 2018–2023, as well as the average percentage correct. Additionally, this table shows the percentage of correct answers in each of the five examination sections, as well as the average percentage correct.
Fig. 3. The percentile rankings of ChatGPT-4's performance across examinations from 2018 to 2023, utilizing each year's normal distribution for comparison against the performance of plastic surgery residents, alongside aggregated averages.
Fig. 4. Comparison of the percentage of correct answers between ChatGPT-3.5 and ChatGPT-4 for examination years 2018–2022, as well as a comparison of the overall average percentage correct. Subsequent statistical analysis through a paired t test revealed a P value of less than 0.001 and a mean difference of 18.54.
Fig. 5. Illustration of the difference in the percentage of correct answers between ChatGPT-3.5 and ChatGPT-4 for examination years 2018–2022, as well as a comparison of the overall average.


ChatGPT-4 Surpasses Residents: A Study of Artificial Intelligence Competency in Plastic Surgery In-service Examinations and Its Advancements from ChatGPT-3.5
  • Article
  • Full-text available

September 2024 · 114 Reads · 1 Citation

Shannon S. Hubany · Fernanda D. Scala · Kiana Hashemi · [...] · Angelo A. Leto Barone

Background: ChatGPT, launched in 2022 and updated to Generative Pre-trained Transformer 4 (GPT-4) in 2023, is a large language model trained on extensive data, including medical information. This study compares ChatGPT's performance on Plastic Surgery In-Service Examinations with that of medical residents nationally, as well as with its earlier version, ChatGPT-3.5.

Methods: This study reviewed 1500 questions from the Plastic Surgery In-Service Examinations from 2018 to 2023. After excluding image-based, unscored, and inconclusive questions, 1292 were analyzed. The question stem and each multiple-choice answer were input verbatim into ChatGPT-4.

Results: ChatGPT-4 correctly answered 961 (74.4%) of the included questions. Performance by section was best in core surgical principles (79.1% correct) and lowest in craniomaxillofacial (69.1%). ChatGPT-4 ranked between the 61st and 97th percentiles compared with all residents. ChatGPT-4 also significantly outperformed ChatGPT-3.5 on the 2018–2022 examinations (P < 0.001): whereas ChatGPT-3.5 averaged 55.5% correct, ChatGPT-4 averaged 74%, a mean difference of 18.54%. In 2021, ChatGPT-3.5 ranked in the 23rd percentile of all residents, whereas ChatGPT-4 ranked in the 97th percentile. ChatGPT-4 outperformed 80.7% of residents on average and scored above the 97th percentile among first-year residents. Its performance was comparable with that of sixth-year integrated residents, ranking in the 55.7th percentile on average. These results show significant improvements in ChatGPT-4's application of medical knowledge within six months of ChatGPT-3.5's release.

Conclusion: This study reveals ChatGPT-4's rapid development, advancing from a first-year medical resident's level to surpassing independent residents and matching a sixth-year resident's proficiency.
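The two statistical steps described above — a paired t test on per-year scores and a percentile rank computed from each year's normal distribution of resident scores — can be sketched in Python. This is a minimal illustration only: the per-year scores and the resident mean/standard deviation below are assumed placeholder values, not the study's data (the study's actual figures appear in Figs. 2–5).

```python
from statistics import NormalDist, mean, stdev

# Hypothetical per-year percent-correct scores, 2018-2022 (illustrative only,
# NOT the study's data).
gpt35 = [54.0, 56.0, 57.0, 53.5, 57.0]
gpt4 = [73.0, 74.5, 75.0, 72.5, 75.5]

# Paired t statistic: t = mean(d) / (stdev(d) / sqrt(n)),
# where d are the per-year score differences.
diffs = [b - a for a, b in zip(gpt35, gpt4)]
n = len(diffs)
d_bar = mean(diffs)
t_stat = d_bar / (stdev(diffs) / n ** 0.5)

# Percentile rank of a score against one year's resident scores, modeled as
# a normal distribution with assumed parameters.
resident_dist = NormalDist(mu=62.0, sigma=8.0)  # assumed mean/sd
percentile = resident_dist.cdf(74.0) * 100
```

The t statistic would then be compared against a t distribution with n − 1 degrees of freedom to obtain the P value; a statistics package such as SciPy's `ttest_rel` performs the full test in one call.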
