Han-Chin Shing’s research while affiliated with University of Maryland, College Park and other places


Publications (8)


Entity Anchored ICD Coding
  • Preprint

August 2022 · 33 Reads

Han-Chin Shing · Luyang Kong · [...]

Medical coding is a complex task, requiring assignment of a subset of over 72,000 ICD codes to a patient's notes. Modern natural language processing approaches to these tasks have been challenged by the length of the input and size of the output space. We limit our model inputs to a small window around medical entities found in our documents. From those local contexts, we build contextualized representations of both ICD codes and entities, and aggregate over these representations to form document-level predictions. In contrast to existing methods which use a representation fixed either in size or by codes seen in training, we represent ICD codes by encoding the code description with local context. We discuss metrics appropriate to deploying coding systems in practice. We show that our approach is superior to existing methods in both standard and deployable measures, including performance on rare and unseen codes.
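The aggregation idea in the abstract can be sketched in plain Python. All names here are illustrative, and the simple token-overlap similarity is a crude stand-in for the learned contextual encoders the paper actually uses: windows are clipped around detected entity mentions, each window is compared against a code description, and the entity-level scores are max-pooled into a document-level score.

```python
from typing import List

def local_windows(tokens: List[str], entity_positions: List[int],
                  radius: int = 3) -> List[List[str]]:
    """Clip a small context window around each detected entity mention."""
    windows = []
    for pos in entity_positions:
        lo, hi = max(0, pos - radius), min(len(tokens), pos + radius + 1)
        windows.append(tokens[lo:hi])
    return windows

def jaccard(a: List[str], b: List[str]) -> float:
    """Token-overlap stand-in for similarity between an entity's local
    context and an ICD code description."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def document_score(tokens: List[str], entity_positions: List[int],
                   code_description: List[str]) -> float:
    """Aggregate (max-pool) entity-level similarities into one
    document-level score for a single code."""
    windows = local_windows(tokens, entity_positions)
    return max((jaccard(w, code_description) for w in windows), default=0.0)
```

Because the code side is represented by encoding its description rather than by a fixed learned embedding, the same scoring path works for codes never seen in training, which is the property the abstract highlights for rare and unseen codes.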


Figure 5: Faithfulness before and after correcting summaries with the Reviser, controlling for extractiveness.
Figure 6: The distribution of reference-level (left) and reference sentence-level (right) hallucination rates, i.e., the fraction of entity mentions not present in the source text.
Figure 8: ReDRESS model outputs. Green marks non-hallucinated entities; red marks entities not present in the input sentence, and a second shade of red those also not present in any of the source notes. The orange box shows BERTScore F1 relative to the original. Due to the topical nature of the distractor set, all hallucinations except "terminal ileitis" exist elsewhere in the source notes.
Figure 9: Correlation of sampled noise levels to control codes.
Hospital-Admission Summarization Dataset.

Learning to Revise References for Faithful Summarization
  • Preprint
  • File available

April 2022 · 89 Reads · 2 Citations

In many real-world scenarios with naturally occurring datasets, reference summaries are noisy and contain information that cannot be inferred from the source text. On large news corpora, removing low quality samples has been shown to reduce model hallucinations. Yet, this method is largely untested for smaller, noisier corpora. To improve reference quality while retaining all data, we propose a new approach: to revise--not remove--unsupported reference content. Without ground-truth supervision, we construct synthetic unsupported alternatives to supported sentences and use contrastive learning to discourage/encourage (un)faithful revisions. At inference, we vary style codes to over-generate revisions of unsupported reference sentences and select a final revision which balances faithfulness and abstraction. We extract a small corpus from a noisy source--the Electronic Health Record (EHR)--for the task of summarizing a hospital admission from multiple notes. Training models on original, filtered, and revised references, we find (1) learning from revised references reduces the hallucination rate substantially more than filtering (18.4% vs. 3.8%), (2) learning from abstractive (vs. extractive) revisions improves coherence, relevance, and faithfulness, (3) beyond redress of noisy data, the revision task has standalone value: as a pre-training objective and as a post-hoc editor.
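The inference-time selection step described above (over-generate revisions, then pick one balancing faithfulness and abstraction) can be sketched as follows. The scoring functions are deliberately crude stand-ins: token overlap with the source approximates faithfulness, non-overlap with the original reference approximates abstraction, and the mixing weight `alpha` is a made-up knob, not the paper's actual selection criterion.

```python
from typing import List

def faithfulness(candidate: str, source: str) -> float:
    """Fraction of candidate tokens supported by the source text."""
    cand, src = set(candidate.lower().split()), set(source.lower().split())
    return len(cand & src) / len(cand) if cand else 0.0

def abstraction(candidate: str, reference: str) -> float:
    """Fraction of candidate tokens NOT copied from the original reference."""
    cand, ref = set(candidate.lower().split()), set(reference.lower().split())
    return 1.0 - (len(cand & ref) / len(cand) if cand else 0.0)

def select_revision(candidates: List[str], source: str,
                    reference: str, alpha: float = 0.7) -> str:
    """Rank over-generated revisions by a weighted combination of
    faithfulness (to the source) and abstraction (from the reference)."""
    return max(candidates,
               key=lambda c: alpha * faithfulness(c, source)
                             + (1 - alpha) * abstraction(c, reference))
```

With `alpha` above 0.5, a candidate fully supported by the source wins over a more novel but partly unsupported one, mirroring the faithfulness-first trade-off the abstract describes.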



Figure 1: A medical encounter is an interaction between a patient and a healthcare provider.
Figure 2: An extractive-abstractive summarization pipeline. The recall-oriented extractor extracts relevant sentences from prior documents, while the abstractor smooths out irrelevant or duplicated information.
Figure 3: Relationship between source documents, reference summary, and system-generated summary.
Figure 4: ROUGE-L of summarization models vs. average word lengths of the medical sections. Sections (dotted vertical lines) from short to long: (A) Chief complaint, (B) Family history, (C) Social history, (D) Medications on admission, (E) Past medical history, (F) History of present illness, and (G) Brief hospital course.
Figure 5: NER-based incorrect hallucination rate of abstractive models vs. average word lengths. Extractors do not hallucinate. Section order is the same as in Figure 4.
Towards Clinical Encounter Summarization: Learning to Compose Discharge Summaries from Prior Notes

April 2021 · 260 Reads · 3 Citations

The records of a clinical encounter can be extensive and complex, thus placing a premium on tools that can extract and summarize relevant information. This paper introduces the task of generating discharge summaries for a clinical encounter. Summaries in this setting need to be faithful, traceable, and scale to multiple long documents, motivating the use of extract-then-abstract summarization cascades. We introduce two new measures, faithfulness and hallucination rate for evaluation in this task, which complement existing measures for fluency and informativeness. Results across seven medical sections and five models show that a summarization architecture that supports traceability yields promising results, and that a sentence-rewriting approach performs consistently on the measure used for faithfulness (faithfulness-adjusted F_3) over a diverse range of generated sections.
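The F_3 in the abstract is the standard F-beta measure with beta = 3, which weights recall nine times as heavily as precision. A minimal sketch of the generic computation (the paper's faithfulness adjustment of the underlying precision and recall is not reproduced here):

```python
def f_beta(precision: float, recall: float, beta: float = 3.0) -> float:
    """Generic F-beta score. With beta = 3, recall is weighted
    beta**2 = 9 times as heavily as precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

The recall emphasis fits the clinical setting: omitting a supported fact from a discharge summary is costlier than including a redundant one, so a system with high recall scores well even at moderate precision.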



Assigning Medical Codes at the Encounter Level by Paying Attention to Documents

November 2019 · 21 Reads

The vast majority of research in computer assisted medical coding focuses on coding at the document level, but a substantial proportion of medical coding in the real world involves coding at the level of clinical encounters, each of which is typically represented by a potentially large set of documents. We introduce encounter-level document attention networks, which use hierarchical attention to explicitly take the hierarchical structure of encounter documentation into account. Experimental evaluation demonstrates improvements in coding accuracy as well as facilitation of human reviewers in their ability to identify which documents within an encounter play a role in determining the encounter level codes.
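The two-level pooling behind an encounter-level document attention network can be illustrated in plain Python. Real models learn the attention scores from document representations; here the scores are taken as inputs, and all names are illustrative rather than the paper's.

```python
import math
from typing import List

def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(vectors: List[List[float]], scores: List[float]) -> List[float]:
    """Weighted sum of vectors under softmax-normalized attention weights."""
    weights = softmax(scores)
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors))
            for i in range(dim)]

def encounter_representation(doc_vectors: List[List[float]],
                             doc_scores: List[float]) -> List[float]:
    """Second attention level: pool per-document vectors (themselves built
    by a lower attention level over words) into one encounter vector."""
    return attend(doc_vectors, doc_scores)
```

Because the document-level attention weights are explicit, a reviewer can inspect them to see which documents within an encounter drove the predicted codes, which is the interpretability benefit the abstract reports.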


Unsupervised System Combination for Set-Based Retrieval with Expectation Maximization

August 2019 · 33 Reads · 4 Citations

Lecture Notes in Computer Science

System combination has been shown to improve overall performance on many rank-based retrieval tasks, often by combining results from multiple systems into a single ranked list. In contrast, set-based retrieval tasks call for a technique to combine results in ways that require decisions on whether each document is in or out of the result set. This paper presents a set-generating unsupervised system combination framework that draws inspiration from evaluation techniques in sparse data settings. It argues for the existence of a duality between evaluation and system combination, and then capitalizes on this duality to perform unsupervised system combination. To do this, the framework relies on the consensus of the systems to estimate latent “goodness” for each system. An implementation of this framework using data programming is compared to other unsupervised system combination approaches to demonstrate its effectiveness on CLEF and MATERIAL collections.
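A toy illustration of the consensus idea described above: each system's latent "goodness" is estimated from its agreement with an unweighted majority vote, and the final in/out decision for each document is then a weighted vote. This is a single EM-style iteration with illustrative names, not the paper's data-programming implementation.

```python
from typing import Dict, List

def estimate_weights(votes: Dict[str, List[int]]) -> Dict[str, float]:
    """votes[system] is a 0/1 in-set decision per document. A system's
    weight is its agreement rate with the unweighted majority vote."""
    n_docs = len(next(iter(votes.values())))
    majority = [1 if sum(v[i] for v in votes.values()) * 2 > len(votes) else 0
                for i in range(n_docs)]
    return {system: sum(d == m for d, m in zip(decisions, majority)) / n_docs
            for system, decisions in votes.items()}

def combine(votes: Dict[str, List[int]]) -> List[int]:
    """Final set decision: include a document when the weighted mass of
    systems voting 'in' exceeds half the total weight."""
    weights = estimate_weights(votes)
    n_docs = len(next(iter(votes.values())))
    total = sum(weights.values())
    return [1 if sum(weights[s] * votes[s][i] for s in votes) * 2 > total else 0
            for i in range(n_docs)]
```

Iterating the two steps (re-estimate weights from the weighted consensus, then re-vote) gives the EM flavor of the framework; a single pass already down-weights systems that disagree with the consensus.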


Citations (6)


... ROUGE scores, which fail to evaluate summaries properly [20,21,22,23,11,24,25], suffer from another significant drawback: heavy reliance on reference summaries. Recent research has highlighted that the quality of reference summaries in abstractive summarization is often subpar [7,34]. Thus, to measure the omission and hallucination of the summaries precisely, we employ UniEval [11] and human evaluation for multi-dimensional evaluation, and ChatGPT evaluation [35] to examine the presence of inconsistencies in the summaries. ...

Reference:

Key-Element-Informed sLLM Tuning for Document Summarization
Learning to Revise References for Faithful Summarization
  • Citing Conference Paper
  • January 2022

... The BHC is a succinct summary of a patient's entire journey through the hospital and is embedded within complex discharge summaries. Efforts in compiling large-scale datasets for the generation of these BHC sections (Adams et al., 2021), including those with synthetic data (Adams et al., 2022), have led to subsequent contrastive learning methods for aligning generation models (Adams et al., 2023). Finally, methods leveraging heuristics to increase factuality (e.g., retrieval and ontology referencing) have also been developed (Adams et al., 2024; Hartman et al., 2023). ...

Learning to Revise References for Faithful Summarization

... On the basis of the survey of related works on summarization in the medical domain in general and in mental health in particular, we present a taxonomy of task formulations for summarization tasks in the medical domain (Figure 1 [11,15,22-24,34,37,39,40,45,47,49-68]). In general, medical text summarization is divided into research articles [49-52], reports, patient health questions, electronic health records, and dialogue summarization. ...

Towards Clinical Encounter Summarization: Learning to Compose Discharge Summaries from Prior Notes

... The alternative hypothesis was defined as the judgment variable distribution median being greater in the 'higher' anxiety group than the 'lower' anxiety group, or vice versa. The 'higher' anxiety group had higher medians for loss aversion (threshold = 35, p < 0.05), ante (threshold = 45, 55, p < 0.05), Peak PR (threshold = 45, p < 0.0083; 55, p < 0.05), and Total RR (threshold = 35, p < 0.05; 45, 55, p < 0.0083) when compared to the 'lower' anxiety group [65-67,71-75]. Fourth, all contextual variables exhibited significant differences in anxiety scores, and 11 of the 15 judgment variables differed when assessed for median shifts across 'higher' and 'lower' anxiety groups, indicating that a constellation of judgment alterations are predictive of anxiety levels. ...

A Prioritization Model for Suicidality Risk Assessment
  • Citing Conference Paper
  • January 2020

... Türe and Boschee [177] later extended this approach to include learned query-specific combination weights for a similar set of three translation probabilities (two with language model context and one with no context), finding further improvements. Shing et al. [168] explored an alternative approach, learning late fusion combination weights without supervised (retrieval effectiveness) training by instead using data programming to estimate system weights. Nair et al. [122] compared early and late fusion using two approaches (one with language model context, one without), finding similar improvements over the best single system from both approaches. ...

Unsupervised System Combination for Set-Based Retrieval with Expectation Maximization
  • Citing Chapter
  • August 2019

Lecture Notes in Computer Science

... They trained their models on a Reddit dataset obtained from Kaggle and achieved F1 scores of 94.95 percent and 97.69 percent with a Bi-LSTM/CNN-based network and a GPT-2 model, respectively. Allen et al. [13] leveraged Linguistic Inquiry and Word Count (LIWC) features for suicide risk prediction, trained their proposed CNN architecture using 10-fold cross-validation, and achieved a macro F1 score of 50 percent on the test set of the dataset proposed by Shing et al. [14] during the CLPsych 2019 competition [15]. Lastly, Bitew et al. [16] used weighted TF-IDF features, extracted emotion features from the text using the DeepMoji model, and used them to train a logistic regression model, an SVM model, and an ensemble of all the features and models. ...

Expert, Crowdsourced, and Machine Assessment of Suicide Risk via Online Postings