Soo-Yong Shin
Sungkyunkwan University | SKKU · Department of Digital Health

PhD

About

Publications: 140
Reads: 17,628
Citations: 2,048

Publications (140)
Chapter
As the number of COVID-19 cases continues to grow at an unprecedented rate, COVID-19 screening is becoming more important. In this study, we trained machine learning models on the Israel COVID-19 dataset and compared models that used surveillance indices of COVID-19 with those that did not. The AUC scores were 0.8478±0.0037 and 0.8062±0.005 with and wi...
Chapter
It is very important to ensure the reliable performance of deep learning models on future healthcare datasets. This is more pronounced in the case of patient-generated health data such as patient-reported symptoms, which are not collected in a controlled environment. Since there has been a big difference in influenza incidence since the COVID-19 pan...
Article
Objectives: The outlook of artificial intelligence for healthcare (AI4H) is promising. However, no studies have yet discussed the issues from the perspective of stakeholders in Korea. This research aimed to identify stakeholders' requirements for AI4H to accelerate the business and research of AI4H. Methods: We identified research funding trends...
Article
Background: The most important aspect of a retrospective cohort study is the operational definition (OP) of the disease. We developed a detailed OP for the detection of diabetic ketoacidosis (DKA) related to sodium-glucose cotransporter-2 inhibitors (SGLT2i). The OP was systematically verified and analyzed. Methods: All patients prescribed SGLT2i...
Article
Full-text available
In clinical practice, assessing digital health literacy is important to identify patients who may encounter difficulties adapting to digital health through digital technologies and services. We developed the Digital Health Technology Literacy Assessment Questionnaire (DHTL-AQ) to assess the ability to use digital health technology, services, and data. T...
Article
Full-text available
Purpose: This study aims to identify factors associated with the adoption and compliance of electronic patient-reported outcome measure (ePROM) use among cancer patients in a real-world setting. Methods: This prospective cohort study was conducted at the Samsung Medical Center in Seoul, Korea, from September 2018 to January 2019. Cancer patients aged 1...
Article
Full-text available
Protecting patients’ privacy is one of the most important tasks when developing medical artificial intelligence models since medical data is the most sensitive personal data. To overcome this privacy protection issue, diverse privacy-preserving methods have been proposed. We proposed a novel method for privacy-preserving Gated Recurrent Unit (GRU)...
Article
Full-text available
Purpose: Triple-negative breast cancer (TNBC) is well known for its aggressive course and poor prognosis. In this study, we sought to investigate clinical, demographic, and pathologic characteristics and treatment outcomes of patients with refractory, metastatic TNBC selected by a clinical data warehouse (CDW) approach. Patients and methods: Dat...
Article
Personal medical information is an essential resource for research; however, there are laws that regulate its use, and it typically has to be pseudonymized or anonymized. When data are anonymized, the quantity and quality of extractable information decrease significantly. From the perspective of a clinical researcher, a method of achieving pseudony...
Article
This study was conducted as a pilot project to evaluate the feasibility of building an integrated dementia platform that converges preexisting dementia cohorts at several variable levels. The following four cohorts were used to develop this pilot platform: 1) Clinical Research Center for Dementia of South Korea (CREDOS), 2) Korean Brain Aging Study fo...
Article
Objectives: An increasing emphasis has been placed on the integration of clinical data and patient-generated health data (PGHD), which are generated outside of hospitals. This study explored the possibility of using standard terminologies to represent PGHD for data integration. Methods: We chose the 2020 general health checkup questionnaire of t...
Article
Background: The aim of this study was to investigate the relationship between changes in breast density during menopause and breast cancer (BC) risk. Methods: This study was a retrospective, longitudinal cohort study of women over 30 years of age who had undergone serial mammography at baseline and postmenopause during regular health c...
Article
e24085 Background: Breakthrough cancer pain (BTcP), a transitory flare of pain that occurs on a background of relatively well-controlled baseline pain, is a challenging clinical problem in managing cancer pain. We hypothesized that BTcP could be predicted from patients' previously observed patterns. In this study, we report on the d...
Article
Full-text available
Purpose: The purpose of the study was to validate the Korean version of Patient-Reported Outcomes Measurement Information System 29 Profile V2.1 (K-PROMIS-29 V2.1) among cancer survivors. Materials and methods: Participants were recruited from outpatient clinics of the Comprehensive Cancer Center at the Samsung Medical Center in Seoul, South Kor...
Preprint
Full-text available
Background Precision medicine (PM) is a growing area of interest in cancer care. However, relatively little is known about the public attitudes toward PM and the factors associated with the willingness to participate in the construction of national registries for PM. Methods A cross-sectional survey was conducted with 1,500 cancer patients and 1,4...
Article
Full-text available
This study aimed to analyze the proportion, characteristics and prognosis of untreated hepatocellular carcinoma (HCC) patients in a large representative nationwide study. A cohort study was conducted using the National Health Insurance Service (NHIS) database in Korea. A total of 63,668 newly-diagnosed HCC patients between January 2008 and December...
Preprint
BACKGROUND Colorectal cancer is a leading cause of cancer deaths. Several screening tests such as colonoscopy can be used to find polyps or colorectal cancer. Colonoscopy reports are often written in unstructured narrative text. The information embedded in the reports can be used for various purposes, including colorectal cancer risk prediction, fo...
Article
Full-text available
In recent years, artificial intelligence (AI) technologies have greatly advanced and become a reality in many areas of our daily lives. In the health care field, numerous efforts are being made to implement AI technology in practical medical treatment. With the rapid developments in machine learning algorithms and improvements in hardware per...
Article
Full-text available
Background: Despite the great benefits of mobile health applications (mHAs) in managing non-communicable diseases (NCDs) internationally, studies have documented general challenges to broad adoption of mHAs among older age groups. By focusing on broad adoption, these studies have been limited in their evaluation of adults aged 50 and older who hav...
Article
Background Federated learning (FL) is a newly proposed machine-learning method that uses a decentralized dataset. Since data transfer is not necessary for the learning process in FL, there is a significant advantage in protecting personal privacy. Therefore, many studies are being actively conducted in the applications of FL for diverse areas. Obj...
Preprint
BACKGROUND Screening for influenza in primary care is challenging due to the low sensitivity of rapid antigen tests and a lack of proper screening tests. OBJECTIVE We developed a machine learning-based screening tool using patient-generated health data (PGHD) obtained from a mobile application (mHealth app). METHODS We trained a deep learning mod...
Article
Background Screening for influenza in primary care is challenging due to the low sensitivity of rapid antigen tests and the lack of proper screening tests. Objective The aim of this study was to develop a machine learning–based screening tool using patient-generated health data (PGHD) obtained from a mobile health (mHealth) app. Methods We traine...
Article
Full-text available
Hepatocellular carcinoma (HCC) is one of the most common malignant tumors and a leading cause of cancer-related death worldwide. We propose a fully automated deep learning model to detect HCC using hepatobiliary phase magnetic resonance images from 549 patients who underwent surgical resection. Our model used a fine-tuned convolutional neural netwo...
Preprint
Full-text available
BACKGROUND Federated learning (FL) is a newly proposed machine learning framework that uses decentralized datasets. Since data transfer is not necessary for the learning process in FL, FL has a great advantage in protecting personal privacy. Due to this merit, many studies have been actively performed in diverse application areas. OBJECTI...
Article
Background: Electronic health record (EHR) systems have been widely adopted in hospitals. However, since current EHRs mainly focus on lowering the number of paper documents used, they have suffered from poor search function and reusability capabilities. To overcome these drawbacks, structured clinical templates have been proposed; however, they ar...
Article
Full-text available
This study aimed to investigate awareness, attitudes, and perspectives on precision medicine among health professionals in Korea and to identify issues that need to be addressed before implementing precision medicine. Mixed methods research was applied. For qualitative research, a semi-structured focus group interview was conducted with six health...
Preprint
BACKGROUND Despite the great benefits of mobile health applications (mHAs) in managing non-communicable diseases (NCDs) internationally, studies have documented general challenges to broad adoption of mHAs among middle to older age groups. By focusing on broad adoption, these studies have been limited in their evaluation of patients who can benefit...
Article
Recently, digital health has gained the attention of physicians, patients, and healthcare industries. Digital health, a broad umbrella term, can be defined as an emerging health area that uses new digital or medical technologies involving genomics, big data, wearables, mobile applications, and artificial intelligence. Digital health has been...
Article
Background: Most bioinformatic tools for next-generation sequencing (NGS) data are computationally intensive, requiring a large amount of computational power for processing and analysis. Here, the utility of graphics processing units (GPUs) for NGS data computation is assessed. Method: In a previous study, we developed a probabilistic evolutionary...
Article
Full-text available
Intratumoral heterogeneity (ITH) refers to the presence of distinct tumor cell populations. It provides vital information for the clinical prognosis, drug responsiveness, and personalized treatment of cancer patients. As genomic ITH in various cancers affects the expression patterns of genes, the expression profile could be utilized for determining...
Preprint
BACKGROUND Despite the rapid adoption of genomic sequencing in clinical practice, clinical sequencing reports in electronic health record (EHR) systems are currently being written in unstructured formats such as PDF or free text. These formats hinder the implementation of a clinical decision support system and secondary research applications. There...
Article
Full-text available
Background: To implement standardized machine-processable clinical sequencing reports in an electronic health record (EHR) system, the International Organization for Standardization Technical Specification (ISO/TS) 20428 international standard was proposed for a structured template. However, there are no standard implementation guidelines for data...
Preprint
BACKGROUND The analytical capacity and speed of next-generation sequencing (NGS) technology have been improved. Many genetic variants associated with various diseases have been discovered using NGS. Therefore, applying NGS to clinical practice results in precision or personalized medicine. However, as clinical sequencing reports in electronic...
Article
Background: The analytical capacity and speed of next-generation sequencing (NGS) technology have been improved. Many genetic variants associated with various diseases have been discovered using NGS. Therefore, applying NGS to clinical practice results in precision or personalized medicine. However, as clinical sequencing reports in electronic hea...
Article
Full-text available
Abstract The 17th International NETTAB workshop was held in Palermo, Italy, on October 16-18, 2017. The special topic for the meeting was “Methods, tools and platforms for Personalised Medicine in the Big Data Era”, but the traditional topics of the meeting series were also included in the event. About 40 scientific contributions were presented, in...
Article
Full-text available
Background: There has been significant effort in attempting to use health care data. However, laws that protect patients' privacy have restricted data use because health care data contain sensitive information. Thus, discussions on privacy laws now focus on the active use of health care data beyond protection. However, current literature does not...
Preprint
BACKGROUND Data standardization is essential in electronic health records (EHRs) for both clinical practice and retrospective research. However, it is still not easy to standardize EHR data because of nonidentical duplicates, typographical errors, or inconsistencies. To overcome this drawback, standardization efforts have been undertaken for collec...
Article
Background: Data standardization is essential in electronic health records (EHRs) for both clinical practice and retrospective research. However, it is still not easy to standardize EHR data because of nonidentical duplicates, typographical errors, or inconsistencies. To overcome this drawback, standardization efforts have been undertaken for coll...
Preprint
BACKGROUND Precision medicine (PM) is a growing area of interest in cancer care. Although the terms ‘precision medicine’ and ‘personalized medicine’ are used interchangeably, the former may be new both to cancer patients and the general population. Most previous studies evaluated people's attitudes towards genetic testing as a part of personalized...
Preprint
BACKGROUND Electronic health record (EHR) systems have been widely adopted in hospitals. However, since the current EHRs mainly focus on removing paper documents, they have suffered from poor search function and reusability capabilities. To overcome these drawbacks, structured clinical templates have been proposed; however, they are not widely used...
Article
Full-text available
MicroRNA (miRNA) binding is primarily based on sequence, but structure-specific binding is also possible. Various prediction algorithms have been developed for predicting miRNA target genes; the results, however, have relatively high levels of false positives, and the degree of overlap between predicted targets from different methods is poor or nul...
Data
Additional File 2: the resulting lists of 377 GO terms.
Data
Additional File 1: the complete list of 57 miRNAs.
Article
Full-text available
Despite the growing adoption of mobile health (mHealth) applications (apps), few studies address concerns about low retention rates. This study aimed to investigate how the usage patterns of mHealth app functions affect user retention. We collected individual usage logs for 1,439 users of a single tethered personal health record app, which spanned...
Data
Variable descriptions and summary statistics. (DOCX)
Data
The difference between demographic groups. (DOCX)
Data
Distribution of inactive users of the mPHR app. (DOCX)
Data
The impacts of different mPHR functions on the probability of users abandoning the app. (DOCX)
Article
Motivation: Single individual haplotyping (SIH) is critical in genomic association studies and genetic disease analysis. However, most genomic analysis studies do not perform haplotype-phasing analysis due to its complexity. Several computational methods have been developed to solve the SIH problem, but these approaches have not generated suffici...
Article
Full-text available
Objectives For earlier detection of infectious disease outbreaks, a digital syndromic surveillance system based on search queries or social media should be utilized. By using real-time data sources, a digital syndromic surveillance system can overcome the limitation of time-delay in traditional surveillance systems. Here, we introduce an approach t...
Article
Full-text available
The Middle East respiratory syndrome coronavirus (MERS-CoV) was exported to Korea in 2015, resulting in a threat to neighboring nations. We evaluated the possibility of using a digital surveillance system based on web searches and social media data to monitor this MERS outbreak. We collected the number of daily laboratory-confirmed MERS cases and q...
Article
Full-text available
BioC is a simple XML format for text, annotations and relations, and was developed to achieve interoperability for biomedical text processing. Following the success of BioC in BioCreative IV, the BioCreative V BioC track addressed a collaborative task to build an assistant system for BioGRID curation. In this paper, we describe the framework of the...
Article
Full-text available
Ventricular tachycardia (VT) is a potentially fatal tachyarrhythmia, which causes a rapid heartbeat as a result of improper electrical activity of the heart. This is a potentially life-threatening arrhythmia because it can cause low blood pressure and may lead to ventricular fibrillation, asystole, and sudden cardiac death. To prevent VT, we develo...
Article
Full-text available
BioC is an XML-based format designed to provide interoperability for text mining tools and manual curation results. A challenge of BioC as a standard format is to align annotations from multiple systems. Ideally, this should not be a major problem if users follow guidelines given by BioC key files. Nevertheless, the misalignment between text and an...
Article
Full-text available
Background: Mobile mental-health trackers are mobile phone apps that gather self-reported mental-health ratings from users. They have received great attention from clinicians as tools to screen for depression in individual patients. While several apps that ask simple questions using face emoticons have been developed, there has been no study exami...
Article
Full-text available
Background: Digital surveillance using internet search queries can improve both the sensitivity and timeliness of the detection of a health event, such as an influenza outbreak. While it has recently been estimated that the mobile search volume surpasses the desktop search volume and mobile search patterns differ from desktop search patterns, the...
Data
Time series plots of the number of search queries with a strong correlation (r ≥ 0.7) with KCDC ILI data from lag correlation analysis (search queries preceding by two weeks). (TIF)
Data
Time series plots of the number of search queries with a strong correlation (r ≥ 0.7) with KCDC ILI data from lag correlation analysis (search queries preceding by one week). (TIF)
Data
Time series plots of the number of search queries with a strong correlation (r ≥ 0.7) with KCDC virologic data from lag correlation analysis (search queries preceding by one week). (TIF)
Data
All combined queries and raw data of this study. (XLS)
Data
Lag correlation analysis (search queries preceding by one week) between search query data and KCDC ILI data. (DOCX)