Citation: Frihat, S.; Beckmann, C.L.; Hartmann, E.M.; Fuhr, N. Document Difficulty Aspects for Medical Practitioners: Enhancing Information Retrieval in Personalized Search Engines. Appl. Sci. 2023, 13, 10612. https://doi.org/10.3390/app131910612

Academic Editor: Pentti Nieminen

Received: 17 August 2023
Revised: 15 September 2023
Accepted: 21 September 2023
Published: 23 September 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Article
Document Difficulty Aspects for Medical Practitioners:
Enhancing Information Retrieval in Personalized
Search Engines
Sameh Frihat 1,*, Catharina Lena Beckmann 2, Eva Maria Hartmann 2 and Norbert Fuhr 1

1 Department of Information Engineering, University of Duisburg-Essen, 47057 Duisburg, Germany; norbert.fuhr@uni-due.de
2 Department of Computer Science, University of Applied Sciences and Arts Dortmund, 44227 Dortmund, Germany
* Correspondence: sameh.frihat@uni-due.de
Abstract:
Timely and relevant information enables clinicians to make informed decisions about
patient care outcomes. However, discovering related and understandable information from the vast
medical literature is challenging. To address this problem, we aim to enable the development of
search engines that meet the needs of medical practitioners by incorporating text difficulty features.
We collected a dataset of 209 scientific research abstracts from different medical fields, available in
both English and German. To determine the difficulty aspects of readability and technical level of each
abstract, 216 medical experts annotated the dataset. We used a pre-trained BERT model, fine-tuned
to our dataset, to develop a regression model predicting those difficulty features of abstracts. To
highlight the strength of this approach, the model was compared to readability formulas currently in
use. Analysis of the dataset revealed that German abstracts are more technically complex and less
readable than their English counterparts. Our baseline model showed greater efficacy than current
readability formulas in predicting domain-specific readability aspects. Conclusion: Incorporating
these text difficulty aspects into the search engine will provide healthcare professionals with reliable
and efficient information retrieval tools. Additionally, the dataset can serve as a starting point for
future research.
Keywords: personalized information retrieval; medical practitioners; readability assessment; medical literature aspects
1. Introduction
Search engines play a crucial role in facilitating information retrieval (IR) and sup-
porting users’ information-seeking tasks across various domains. However, the field of
healthcare poses unique challenges for medical practitioners when it comes to accessing
timely and comprehensible search results, as the information they acquire can significantly
impact their decision making processes and ultimately influence patient care outcomes.
Despite the widespread availability of search engines, medical practitioners often face
difficulties in finding the precise and contextually relevant information they need within
the constraints of their demanding schedules. These challenges arise from the specialized
nature of medical knowledge, the vast amount of scientific literature available, and the
requirement for accurate and easily understandable information.
Research conducted by Entin and Klare [1] has revealed the significant influence of
factors such as a reader’s level of interest, prior knowledge, and the readability of a text on
their comprehension of the material. It is essential to clarify that ease of reading primarily
pertains to text understandability, while technicality is more focused on the concepts and
domain-specific knowledge within the text [2–4]. This understanding underscores the need
to address the readability and technicality aspects in the development of search engines
tailored specifically to the needs of medical practitioners. To improve IR for healthcare
professionals, it is necessary to incorporate these aspects into search engines. By retrieving
comprehensible information based on their language proficiency and domain knowledge,
these search engines can enhance the efficiency and effectiveness of the information pro-
vided, leading to informed decisions and optimal care. Similarly, Ref. [5] proposed to use
the readability aspect for accepting or revising health-related documents.
While previous studies [6–8] have focused primarily on developing personalized
search engines for health information consumers, such as laypeople and patients, there
is a clear gap in adapting to the specific requirements of medical practitioners. Unlike
laypeople, medical practitioners possess specialized expertise and language proficiency
in their respective fields. Therefore, our research project aims to contribute to filling this
gap by developing a model capable of extracting and classifying the ease of reading and
technicality levels of medical research articles.
It is important to note that ease of reading and technicality are not mutually exclusive
aspects [3]. Texts can exhibit varying degrees of both characteristics, leading to different
combinations that may arise between these aspects. For instance, a text can be easy to read
while still containing high technicality, or it can be difficult to read with low technicality, as
shown in the following examples. (All examples have been reviewed and approved by a
senior medical practitioner. Examples of easy-to-read content are from [9], while examples
of harder-to-read content are from [10]).
Easy to read and high technicality
Autologous hematopoietic stem cell transplantation has emerged as a promising ther-
apeutic intervention for individuals with refractory multiple myeloma. This treatment
approach has shown remarkable advancements in terms of progression-free survival
and overall response rates, signifying its potential in improving patient outcomes.
Hard to read and high technicality
The pathophysiological mechanisms underlying idiopathic pulmonary fibrosis involve
aberrant activation of transforming growth factor-beta signaling pathways, leading to
excessive deposition of extracellular matrix components and subsequent progressive
scarring of lung tissue.
Easy to read and low technicality
Regular physical exercise has been widely recognized as a key lifestyle intervention
for the prevention of cardiovascular diseases, with numerous studies demonstrating
its positive impact on reducing the risk of heart attacks, stroke, and hypertension.
Hard to read and low technicality
Carcinogenesis is a multifactorial process characterized by the dysregulation of cellular
homeostasis, involving intricate interactions between oncogenes and tumor suppressor
genes that disrupt normal cell growth control mechanisms, resulting in uncontrolled
proliferation and the formation of malignant tumors.
Integrating difficulty aspects into search engines empowers medical practitioners with
tailored and relevant search results that are aligned with their expertise and language
proficiency. Without these aspects, search engines may fail to effectively address the
specific needs of medical professionals, leading to limitations and challenges. For example,
search results may include highly technical papers and articles with varying levels of
readability, requiring manual sifting and wasting valuable time. The lack of customization
based on technicality and ease of reading hinders precision, relevance, and quick access
to necessary information. Furthermore, complex language and dense scientific jargon can
impede comprehension for practitioners without specialized expertise, hindering decision
making [11]. By considering difficulty aspects, search engines not only improve information
accessibility and ensure comprehensibility but also optimize decision making and patient
care. Similarly, studies [12,13] presented other features that could potentially enhance
personalization in medical search engines. This enhancement in IR empowers medical
professionals by making it easier for them to identify their target audience, thus facilitating
the efficient utilization of their valuable time and expertise [14].
Readability formulas have already been developed for measuring the readability of a
given text. However, the most commonly used readability formulas were not developed
for technical materials [15]. Moreover, traditional readability formulas are too simplistic
to deal with technical materials [16]. Therefore, to accomplish our objective, we leverage
the power of pre-trained language models and fine-tuning techniques for predicting the
difficulty aspects of a given document.
An intriguing application of our research lies in the integration of these assessed
aspects into the IR process. This can be achieved by either filtering out search results that
exceed a certain threshold of technicality or ease of reading, ensuring that the retrieved
documents align with the user’s preferred level of comprehension. Additionally, these
aspects can contribute to the calculation of relevance scores [17], allowing documents
that match the desired technicality and readability criteria to be ranked higher in search
results. This integration has the potential to enhance the efficiency and precision of IR for
medical professionals, aiding them in accessing documents that align with their specific
requirements and facilitating informed decision making.
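As an illustration of this integration, the following sketch shows one way the predicted difficulty aspects could be used for threshold-based filtering and for blending into a relevance score. The functions relevance and predict_readability, the document attribute abstract, and the weighting scheme are hypothetical placeholders rather than components of an existing system.

```python
# Hedged sketch: combining predicted difficulty aspects with relevance.
# `relevance` and `predict_readability` are hypothetical stand-ins for an
# IR scoring function (e.g., BM25) and a fine-tuned difficulty predictor.

def rank_results(query, documents, min_readability=50, weight=0.3):
    """Drop documents below a readability threshold, then blend the
    readability score (0 = hard, 100 = easy) into the relevance score."""
    scored = []
    for doc in documents:
        readability = predict_readability(doc.abstract)
        if readability < min_readability:      # filtering variant
            continue
        base = relevance(query, doc)           # assumed normalized to [0, 1]
        blended = (1 - weight) * base + weight * (readability / 100.0)
        scored.append((blended, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]
```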
However, we encountered a challenge in finding datasets specifically designed for
medical practitioners, as most existing datasets target laypeople. Therefore, we compiled a
dataset of medical abstracts from PubMed (PubMed is an extensive online database that
grants users access to a diverse range of scientific literature in the field of biomedical and
life sciences. It serves as a valuable resource for researchers, medical practitioners, and
individuals seeking in-depth scholarly articles, abstracts, and citations pertaining to diverse
medical disciplines, https://pubmed.ncbi.nlm.nih.gov/, accessed on 1 June 2023) and
sought the expertise of medical doctors and medical students to annotate each article with
ease of reading and technicality scores.
Through our research, we aim to demonstrate the ability of language models to capture
and classify the ease of reading and technicality levels of medical documents. Furthermore,
we investigate the differences in language complexity and technicality between English and
German abstracts, finding that German abstracts tend to be harder to read and exhibit higher
levels of technicality when compared to their English counterparts.
To sum up, our research article makes contributions in the following areas: (a) we ad-
dress a problem concerning the trade-off between comprehensibility and relevance within
the field of IR for medical practitioners; (b) we present a new dataset containing
medical research articles annotated with ease of reading and technicality scores; (c) we
develop models capturing these aspects using pre-trained language models and
compare them with known readability formulas.
By addressing the specific needs of medical practitioners and integrating the difficulty
aspects into search engines, we strive to enhance the accessibility and relevance of search
results, ultimately empowering medical professionals with efficient and reliable IR tools.
2. Literature Review
2.1. Comprehensibility Aspects in Information Retrieval
The consideration of comprehension aspects in IR, particularly in domain-specific
contexts, has received significant attention. Researchers have recognized the challenges
faced by both domain experts and average users when searching for domain-specific
information, such as medical and health-related content, from online resources [18].
A common issue encountered by users in IR systems is the presence of search results
that encompass documents with varying levels of readability [16,19,20]. This poses a
challenge, particularly for users with limited domain knowledge or lower education levels,
as well as those facing physical, psychological, or emotional stress [16]. Consequently, there
is a need for IR systems that not only retrieve relevant documents but also prioritize those
with higher readability, adapting to the diverse needs of users [21].
To address this challenge, various approaches have been explored. Some
studies [22,23] have investigated computational models of readability, aiming to develop
efficient methods for assessing the readability of technical materials encountered in domain-
specific IR. Traditional readability formulas, although widely used, are often insufficient
for handling technical texts. On the other hand, more advanced algorithms, such as textual
coherence models, may offer improved accuracy but suffer from computational complexity
when applied to large-scale document re-ranking scenarios.
The importance of domain-specific readability computation in IR has been empha-
sized [24]. Technical terms and the need for efficient computations for large document
collections are among the challenges identified. By integrating concept-based readability
and domain-specific knowledge into the search process, researchers aim to enhance the
accessibility and relevance of search results. These efforts contribute to empowering users,
including both domain experts and average users, with efficient and reliable IR tools [16].
2.2. Readability Formulas
Since the early 20th century, researchers in this field have developed a variety of read-
ability formulas aimed at laypeople. Many of these formulas are still widely used today [25].
Among the most commonly used formulas are Simple Measure of Gobbledygook (SMOG),
the Dale-Chall Readability formula, the Flesch Reading Ease formula, the Fog Index, and
the Fry Readability Graph. More details about readability methods can be found in [26].
These formulas typically analyze syntactic complexity and semantic difficulty. Syntac-
tic complexity is often evaluated by examining sentence length, while semantic difficulty is
measured using factors such as syllable count or word frequency lists. Other factors that
have been found to influence readability include the presence of prepositional phrases,
the use of personal pronouns, and the number of indeterminate clauses. However, it is
important to note that readability formulas specifically targeting medical practitioners are
currently lacking. Further research is needed to develop readability formulas tailored to
the unique needs and expertise of medical professionals.
2.2.1. Simple Measure of Gobbledygook (SMOG)
The SMOG Index was introduced by clinical psychologist G. Harry McLaughlin in
1969 [27]. It is designed to estimate the years of education required to comprehend a piece of
written text accurately by counting the words of three or more syllables in three ten-sentence
samples. The formula calculates the reading grade level based on a simple mathematical
equation that incorporates the count of polysyllabic words within a sample text.
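For reference, the commonly published form of the SMOG grade is 1.0430 × sqrt(polysyllable count × 30 / sentence count) + 3.1291. The sketch below implements this formula with a crude vowel-group syllable heuristic; validated implementations (for example, the Textstat package used later in this paper) use more careful counting.

```python
import math
import re

def count_syllables(word):
    # Rough heuristic: count vowel groups; real tools use dictionaries
    # or language-specific rules.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def smog_index(text):
    """SMOG grade = 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return 1.0430 * math.sqrt(polysyllables * 30 / len(sentences)) + 3.1291
```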
The SMOG formula has been widely used in various fields, including education,
healthcare, and IR [28]. Its simplicity and ease of application make it a popular choice for
estimating readability levels. However, it is important to note that the SMOG formula
may have limitations when applied to specific domains, such as technical or scientific texts,
as it does not consider the domain-specific terminology and nuances that might impact
comprehension [29].
Nonetheless, SMOG stands out as a well-suited formula for healthcare applications. It
consistently aligns results with expected comprehension levels, employs validation criteria,
and maintains simplicity in its application [28]. These factors make it a reliable choice for
assessing the readability of healthcare-related documents. Researchers frequently use the
SMOG formula as a reference point when evaluating alternative readability models or
proposing new formulas tailored to specific domains.
2.2.2. Dale-Chall Readability Formula
The Dale-Chall Readability Formula is a widely used readability measure that provides
an estimate of the comprehension difficulty of a given text. Developed by Edgar Dale and
Jeanne Chall in 1948 [30], this formula takes into account both the length of sentences and
the familiarity of words to determine the readability level.
The Dale-Chall readability formula calculates its final score by examining the propor-
tion of words in a given text that do not belong to a predefined list of commonly known
words. This list comprises 3000 words that are generally understood by fourth-grade stu-
dents. The formula calculates the readability score by incorporating the average sentence
length and the percentage of unfamiliar words.
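As a worked illustration, the sketch below follows the commonly published new Dale-Chall formulation: 0.1579 × percentage of unfamiliar words + 0.0496 × average sentence length, with 3.6365 added when more than 5% of the words are unfamiliar. The familiar-word list of roughly 3000 entries must be supplied by the caller; it is not reproduced here.

```python
import re

def dale_chall_score(text, familiar_words):
    """Approximate new Dale-Chall score; `familiar_words` is the list of
    ~3000 commonly known words (supplied by the caller)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = [w.lower() for w in re.findall(r"[A-Za-z]+", text)]
    pct_difficult = 100 * sum(w not in familiar_words for w in words) / len(words)
    avg_sentence_len = len(words) / len(sentences)
    score = 0.1579 * pct_difficult + 0.0496 * avg_sentence_len
    if pct_difficult > 5:
        score += 3.6365   # adjustment for texts with many unfamiliar words
    return score
```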
In summary, the Dale-Chall formula’s notable advantage lies in its focus on word
familiarity, enhancing its ability to assess readability, especially for less experienced readers
or those with limited vocabulary. This practical and accessible approach considers both
word familiarity and sentence length, offering insights into comprehension difficulty for
readers of different proficiency levels. However, it is crucial to acknowledge the formula’s
limitations and its suitability for specific contexts.
2.3. Readability for Health Consumers
Health literacy and effective communication of health information are crucial in
ensuring that patients can access and understand important medical content [31]. With
the increasing reliance on online resources for health-related information, it is essential to
assess the readability of online materials, particularly those aimed at health consumers.
Several studies have examined the readability and quality of medical content targeting
health consumers [32–34]. These studies highlight the challenges associated with readability
in health-related content, indicating that a significant portion of medical content is difficult
for the average layperson to understand. This issue extends to other languages as well,
such as German [35].
The findings from these studies underscore the need for greater attention to readability
and clear communication in online health resources. Collaborative efforts among healthcare
professionals, researchers, and organizations are crucial for enhancing the readability of
health materials, including Wikipedia pages and patient education resources. By improving
the readability of these resources, we can enhance health literacy and empower patients to
make informed decisions about their health [33].
In conclusion, the related work in this subsection underscores the significance of com-
prehensibility in domain-specific IR. The exploration of computational models, readability
formulas, and approaches tailored for specific domains, such as health and medical IR,
reflects a growing recognition of the importance of addressing comprehension aspects.
The results of experiments conducted in this domain showcase the potential benefits of
integrating readability considerations into IR systems. However, further research is needed
to generalize these findings to other domains and explore additional factors influencing
word-level relatedness, document cohesion, and sentence-level readability computation.
3. Materials and Methods
Our process started with the creation of a dataset. This required extracting articles
from PubMed and then having medical experts annotate them. Next, we analyzed the
dataset for its general characteristics, level of difficulty, and language variations. Finally,
we utilized these data to fine-tune pretrained BERT models.
3.1. Data Collection
The dataset creation process started with the extraction of 10,000 articles from PubMed,
specifically targeting those with available abstracts in both German and English. The data
were then stored in a MongoDB database and afterwards extended with information about the
readability and technicality of those abstracts. This was accomplished in an annotation process
by 216 medical students and practitioners using an annotation tool developed specifically for this
purpose. In this process, a total of 209 annotated articles were gathered. An
overview of the data acquisition is shown in Figure 1.
Figure 1. The dataset creation process includes extracting articles from PubMed and expertly annotating abstracts for technicality and readability assessment.
3.1.1. Documents
We have chosen to focus on research articles’ abstracts as they serve as concise sum-
maries of the main research findings and are widely utilized for initial screening and IR
purposes. In light of this, the PubMed database proves to be an ideal resource for our
specific use case.
PubMed consists of an extensive collection of scientific literature spanning various
medical disciplines, making it a comprehensive repository of valuable medical information.
To ensure a manageable dataset, we opted to download a subset of articles from the
vast PubMed database. As selection criteria, we chose 10,000 random articles that had
abstracts available and were written by the respective authors in both the German and
English languages.
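A minimal sketch of such a download pipeline, assuming Biopython’s Entrez client and a local MongoDB instance, is given below. The search query, field names, and storage layout are illustrative assumptions; the authors’ own scripts (see the Data Availability Statement) may differ in detail.

```python
from Bio import Entrez          # Biopython wrapper around NCBI E-utilities
from pymongo import MongoClient

Entrez.email = "you@example.org"   # NCBI requires a contact address (placeholder)

def fetch_pubmed_abstracts(query, n=100):
    """Search PubMed and return (PMID, abstract text) pairs."""
    handle = Entrez.esearch(db="pubmed", term=query, retmax=n)
    pmids = Entrez.read(handle)["IdList"]
    handle = Entrez.efetch(db="pubmed", id=",".join(pmids),
                           rettype="abstract", retmode="xml")
    records = Entrez.read(handle)
    results = []
    for article in records["PubmedArticle"]:
        citation = article["MedlineCitation"]
        abstract = citation["Article"].get("Abstract", {}).get("AbstractText", [""])
        results.append((str(citation["PMID"]), " ".join(str(t) for t in abstract)))
    return results

# Store the fetched abstracts in MongoDB for the later annotation step.
collection = MongoClient()["pubmed"]["abstracts"]
for pmid, text in fetch_pubmed_abstracts("hasabstract AND ger[lang]", n=100):
    collection.update_one({"pmid": pmid}, {"$set": {"abstract": text}}, upsert=True)
```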
3.1.2. Participants
A total of 216 participants were recruited from university hospitals for the annotation
process. These can be divided into four categories, as shown in Table 1: 70 medical students
up to the 6th semester (junior students), 59 medical students in the 7th semester or higher
(senior students), 59 medical doctors with up to 2 years of experience (junior doctors), and
28 doctors with more than 2 years of experience (senior doctors). All participants either
studied at a German university or worked in a German hospital, ensuring they possessed
the necessary domain knowledge. Additionally, all participants were asked to assess their
German and English language proficiency based on the Common European Framework
of Reference (CEFR) (CEFR is an international standard for describing language ability.
It describes language ability on a six-point scale, from A1 for beginners, up to C2 for
those who have mastered a language), ranging from B1 (Intermediate level) to C2 (Native
level). Furthermore, in accordance with institutional regulations and to maintain complete
anonymity, no further questions were asked.
Table 1. Annotation process participants’ distribution among four categories.

Category        Med. Student              Med. Doctor
                Junior      Senior        Junior      Senior      Total
Participants    70          59            59          28          216
                32.4%       27.3%         27.3%       13%         100%
3.1.3. Annotation
In the initial phase of the study, we developed a web-based application using Python
and MongoDB to facilitate the evaluation process. This application allowed participants
to log in anonymously, access clear guidelines, and review the criteria for rating abstracts’
readability and technicality. Participants had no time constraints, providing flexibility in
completing the evaluation. Each participant assessed 2–3 different abstracts, rating them
for ease of reading and technicality on a scale from 0 to 100 in 5-point increments (0, 5,
10, ..., 100). They also identified relevant medical disciplines. To ensure reliability and
minimize bias, three independent participants evaluated each abstract, and their scores
were averaged to represent text complexity and technicality fairly. Annotations were
conducted independently, enhancing the reliability and consistency of ratings.
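Conceptually, the aggregation step reduces to averaging the three independent ratings per abstract. The sketch below illustrates this with a hypothetical long-format export of the annotation tool; the column names and values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical export: one row per (abstract, annotator) with both ratings.
ratings = pd.DataFrame({
    "abstract_id":     [1, 1, 1, 2, 2, 2],
    "ease_of_reading": [60, 70, 65, 35, 40, 30],
    "technicality":    [25, 30, 35, 20, 15, 25],
})

# Gold-standard scores: the mean of the three independent annotations.
gold = ratings.groupby("abstract_id")[["ease_of_reading", "technicality"]].mean()
print(gold)
```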
3.2. Data Analysis
The analysis subsection provides insights into the composition of the dataset and
sheds light on the ease of reading and technicality aspects of the abstracts.
3.2.1. Dataset Overview
The annotation process proved to be a resource-intensive task, primarily due to
the challenges associated with securing time from busy medical professionals, including
both practicing physicians and medical students. With their commitments ranging from
extended working hours and on-call duties to direct patient care responsibilities and
rigorous exam preparation, allocating time for annotation was constrained. As a result,
only 209 abstracts were annotated for technicality and readability aspects. Each annotated
abstract included both English and German versions, ensuring comprehensive coverage
of the research literature across languages. The dataset encompassed various medical
disciplines, creating a representative subset of medical research publications.
3.2.2. Descriptive Statistics
To gain a better understanding of the dataset, we conducted descriptive statistics on
the abstracts. The average length of the abstracts was found to be 215 (SD = 90) words
for English and 190 (SD = 77) words for German abstracts. It is noteworthy that, while
English abstracts have a higher word count, German abstracts tend to be longer in terms
of character count. This observation is due to the nature of the German language, where
words often contain more characters compared to English [36]. Specifically, the average
character count for English abstracts was 1480 (SD = 585), while, for German abstracts,
it was 1587 (SD = 616) characters. Additionally, the annotations were accompanied by
significant statistical insights. The average annotation time for each document was 176
(SD = 73) seconds, highlighting variability in annotation durations. The annotation pro-
cess was also conducted at an average rate of 113 (SD = 12) words per minute (WPM),
emphasizing diverse annotation speeds, which is consistent with Klatt et al.’s study [37].
Furthermore, the mean intraclass correlation coefficient (ICC) for annotations was found to
be 0.81 (SD = 0.08) (according to Koo et al.’s guideline [38], ICC values below 0.5, between
0.5 and 0.75, between 0.75 and 0.9, and exceeding 0.90 indicate poor, moderate, good, and
excellent reliability, respectively), indicating substantial consistency and agreement among
the annotations provided by medical professionals.
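For readers who wish to reproduce this kind of agreement analysis, an ICC can be computed from a long-format rating table, for instance with the pingouin package; the variable and column names below are assumptions for illustration.

```python
import pingouin as pg

# `ratings_long` is assumed to hold one row per (abstract, rater) pair with
# columns "abstract_id", "rater_id", and the assigned "ease_of_reading" score.
icc = pg.intraclass_corr(data=ratings_long, targets="abstract_id",
                         raters="rater_id", ratings="ease_of_reading")
print(icc[["Type", "ICC", "CI95%"]])   # report e.g. ICC2k for averaged ratings
```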
3.2.3. Ease of Reading Analysis
The ease of reading aspect was assessed by medical professionals, who assigned ease
of reading scores to each abstract, where 0 means hard to read and 100 means easy to read.
The average ease of reading scores across all abstracts were found to be 64.21 (SD = 21.54)
and 61.50 (SD = 21.67) for English and German abstracts, respectively. Figure 2a
shows a visual representation of the distribution of ease of reading scores and identifies
any potential outliers or patterns.
[Figure 2 panels: (a) Ease of Reading and (b) Technicality; score distributions for English and German abstracts; y-axis: Score.]
Figure 2. Aspects of scores’ distributions on English and German abstracts.
3.2.4. Technicality Analysis
The technicality level of the abstracts was evaluated based on the ratings provided by
the medical professionals, where 0 means high technicality and 100 means low technicality.
The average technicality scores for the dataset were 30.55 (SD = 10.02) and
26.72 (SD = 12.81) for English and German abstracts, respectively. Figure 2b shows
a visual representation of the distribution of technicality scores and identifies any potential
outliers or patterns.
3.2.5. Comparison of German and English Abstracts
To gain deeper insights into the dataset, we conducted a comparative analysis of
the technicality and ease of reading levels between German and English medical abstracts
using paired t-tests. The goal was to investigate potential differences in the linguistic
characteristics of the two languages and their impact on the ease of reading and technicality
of the abstracts.
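A minimal sketch of such a paired comparison, assuming aligned arrays with the averaged per-abstract scores for the two language versions, is shown below.

```python
from scipy import stats

# `english_scores` and `german_scores` are assumed arrays holding the averaged
# annotator ratings for the same abstracts in the two languages (same order).
t_stat, p_value = stats.ttest_rel(english_scores, german_scores)
print(f"paired t-test: t = {t_stat:.2f}, p = {p_value:.4f}")
```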
For the technicality level, our analysis revealed that German medical abstracts tended
to exhibit higher levels of technicality compared to their English counterparts. This finding
aligns with previous studies [39,40] highlighting the inherent complexity of the German
language, particularly in the medical domain. The higher technicality level of German
abstracts can be attributed to the frequent usage of specialized medical terminology and
the structural intricacies of the German language itself. Figure 3a shows the difference
between English and German technicality, where the positive side of the graph shows
articles with higher technicality in German, and articles with higher technicality in English
on the negative side.
In terms of readability, we observed that English medical abstracts were slightly
easier to read compared to German abstracts. This disparity can be attributed to several
factors. First, the English language generally exhibits a more straightforward and concise
writing style, which may enhance readability for a wider audience. Second, English has
a larger presence in the global scientific community, leading to greater standardization
and familiarity among medical practitioners. Consequently, English abstracts may be
tailored to a broader readership, including non-native English speakers. Figure 3b shows
the difference between English and German ease of reading, where the positive side of the
graph shows articles that are easier to read in English and the negative side shows articles
that are easier to read in German.
[Figure 3 panels: (a) Technicality and (b) Ease of Reading; histograms of per-article score differences (English - German scores) against number of documents.]
Figure 3. Difference between English and German scores per article. Scores are between 0 and 20.
3.2.6. Limitations
It is important to acknowledge the limitations of the dataset analysis. Out of the 10,000
downloaded abstracts, the sample size of 209 may be inadequate to represent the entire
breadth of medical research literature. Moreover, the annotations provided by medical
professionals might introduce some level of subjectivity or bias. Nevertheless, we took
steps to minimize such limitations by involving multiple annotators per abstract. These
limitations must be considered when interpreting the results and generalizing findings
from the dataset. Furthermore, we should acknowledge that our analysis focused only
on technicality and readability and did not explore other factors that might impact the
document’s understandability.
3.3. Model
The model subsection provides insights into the developed models and the evaluation
metric.
3.3.1. Our Model
To establish a baseline for our dataset, we employed pretrained BERT models designed
for medical text, “PubMedBERT” for English abstracts [41] and “German-MedBERT” for
German abstracts [42]. BERT, known for its impressive performance in various Natural
Language Processing (NLP) tasks, was a fitting choice for our project.
The fine-tuning process for the pretrained BERT models, specifically “BERT-Readability”
for assessing ease of reading and “BERT-Technicality” for evaluating technicality, involved
using our dataset, which includes annotated medical abstracts, each assigned scores on a
scale from 0 to 100 for ease of reading and technicality. Our main objective was to train
these models to predict these scores based on the textual content within the abstracts.
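A minimal fine-tuning sketch using the Hugging Face transformers library is given below. The checkpoint name, hyperparameters, and dataset variables are indicative assumptions, not the exact configuration behind BERT-Readability and BERT-Technicality; with num_labels=1 the sequence-classification head becomes a single regression output trained with mean squared error.

```python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Indicative checkpoint; German-MedBERT would be used for the German abstracts.
checkpoint = "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=1, problem_type="regression")

def encode(batch):
    enc = tokenizer(batch["abstract"], truncation=True,
                    padding="max_length", max_length=512)
    enc["labels"] = [float(score) for score in batch["ease_of_reading"]]
    return enc

# `train_ds` and `eval_ds` are assumed Hugging Face datasets built from the
# annotated abstracts and mapped through `encode`.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-readability",
                           num_train_epochs=4,
                           per_device_train_batch_size=8,
                           evaluation_strategy="epoch"),
    train_dataset=train_ds,
    eval_dataset=eval_ds,
)
trainer.train()
```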
To assess the performance of “BERT-Readability” and “BERT-Technicality,” we em-
ployed the root mean square error (RMSE) metric. RMSE measures the average difference
between predicted and actual scores, with lower RMSE values indicating better alignment
between predictions and actual scores.
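For reference, RMSE is the square root of the mean squared difference between predicted and annotated scores, which keeps the error on the same 0 to 100 scale as the annotations:

```python
import numpy as np

def rmse(predicted, actual):
    """Root mean square error between predicted and annotated scores."""
    predicted, actual = np.asarray(predicted), np.asarray(actual)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))
```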
By leveraging pretrained BERT models, we established a foundational framework
for predicting ease of reading and technicality in medical abstracts. “BERT-Readability”
and “BERT-Technicality” serve as baseline models and reference points for future
analyses. This enables us to assess the effectiveness of any forthcoming advancements or
novel techniques introduced into our work.
3.3.2. Common Readability Formulas
To judge the performance of our models, we specifically selected readability formulas
that have been utilized in the domain of medical literature [43,44]. These commonly used
readability formulas were tested on the same test set as our models.
Prior to evaluation, we took the necessary steps to ensure compatibility between the
outputs of the readability formulas and the ground truth values of our dataset. To achieve
this, we employed rescaling/normalization techniques to align the results of each formula
with the range and distribution of the dataset’s ground truth scores. This approach allowed
us to establish a fair and consistent basis for comparison. Subsequently, we evaluated the
performance of each readability formula using the same evaluation metrics employed for
our models (“BERT-Readability” and “BERT-Technicality”).
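One straightforward variant of this rescaling, using the Textstat implementation of a formula and min-max normalization onto the 0 to 100 annotation range, is sketched below; the authors’ exact normalization procedure may differ, and for grade-level formulas the scale may additionally need to be inverted so that higher values mean easier text.

```python
import numpy as np
import textstat

def rescaled_formula_scores(abstracts, formula=textstat.smog_index):
    """Apply a readability formula and min-max rescale its raw outputs onto
    the 0-100 range of the ground-truth annotations."""
    raw = np.array([formula(text) for text in abstracts], dtype=float)
    return (raw - raw.min()) / (raw.max() - raw.min()) * 100
```

The rescaled scores can then be compared against the annotated ground truth with the same RMSE metric used for the BERT models.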
By incorporating a comparison with these readability formulas, we are able to gain
a broader perspective on the strengths and limitations of our models. This comparative
analysis allows us to assess whether our models outperform or align with the established
readability formulas in the specific context of medical documents. The objective is to posi-
tion the performance of our models within the broader landscape of readability assessment
methods utilized in the medical domain.
4. Results
In this study, our contributions fall into two parts: a dataset, and regression models for
predicting the ease of reading and technicality of a scientific research abstract.
4.1. Dataset
For this study, we curated a comprehensive dataset of scientific research abstracts
written in English and German from various medical disciplines (such as Immunology,
Dermatology, Radiology, Emergency medicine, Internal medicine, Neurology, etc.). The
dataset comprises 209 abstracts collected from the PubMed database. Each abstract was
presented to three medical practitioners (student or doctor). This dataset can be used to
improve the readability aspect of any NLP or IR system targeting the medical domain.
4.2. BERT-Readability and BERT-Technicality
We developed BERT-Readability and BERT-Technicality regression models to predict
ease of reading and technicality levels in scientific research abstracts on a prediction
scale between 0 and 100. These models were trained using the curated dataset, which
included annotations for ease of reading and technicality scores for each abstract. We
evaluated the performance of BERT-Readability and BERT-Technicality using the RMSE
metric. The results demonstrated the models’ effectiveness in predicting ease of reading
and technicality scores.
As shown in Table 2, the RMSE for ease of reading is 10.61 for English abstracts and
11.80 for German abstracts. For technicality, the RMSE is 9.42 for English abstracts and
10.07 for German abstracts.
Furthermore, we conducted a comparative assessment, pitting our models against
widely adopted and well-established readability formulas commonly utilized in healthcare
literature [45,46], as shown in Table 2. The results consistently demonstrated that our
models outperformed these traditional formulas in terms of prediction accuracy. This
underscores the critical advantage of employing specialized models tailored explicitly for
medical experts.
The existing readability formulas assessed in our comparison rely on a set of general
features, such as sentence length, syllable count, and word complexity, which are designed
for assessing text readability across various domains. In contrast, our models have been
meticulously fine-tuned to account for the specific needs and nuances of medical research
abstracts, offering a more precise and effective solution for this specialized context.
Table 2. Formulas and our models’ performance on the dataset: RMSE scores comparison.
Formula English German
BERT-Technicality 9.42 10.07
BERT-Readability 10.61 11.80
Coleman-Liau Index 19.96 20.22
SMOG Index 21.28 23.59
Gunning Fog Index 26.13 30.66
Dale-Chall Readability Score 26.77 23.61
Flesch-Kincaid Grade Level 27.93 28.38
Automated Readability Index 30.95 30.26
Gutierrez de Polini Index 33.90 31.98
Szigriszt-Pazos Index 32.93 30.94
Fernandez-Huerta Index 32.30 31.17
Flesch Reading Ease 34.34 32.60
Gulpease Index 34.46 35.48
All formulas are available in the Textstat library: https://github.com/textstat/textstat, accessed on 1 August 2023.
Overall, the dataset and models presented in this study offer valuable resources for
assessing the ease of reading and technicality of scientific research abstracts in both English
and German. These models serve as valuable tools for personalized search engines, whether
by enabling the use of filtering-based techniques or contributing to the ranking algorithm
used in the retrieval process. This enables medical practitioners to access relevant research
findings that align with their language proficiency and expertise.
5. Discussion
The primary objective of this research paper was to tackle the challenges faced by
medical practitioners when seeking timely and comprehensible search results within the
healthcare domain. We specifically delved into the realms of ease of reading and tech-
nicality in medical research articles, with the ultimate aim of enhancing IR for medical
professionals, thereby augmenting their decision making capabilities and improving patient
care outcomes.
The dataset compiled for this study consists of 209 scientific research abstracts from
diverse medical disciplines, available in both English and German. Each abstract underwent
annotation by medical practitioners, who assigned ease of reading and technicality scores.
The dataset analysis provided valuable insights into the linguistic characteristics of medical
abstracts and revealed differences in technicality and ease of reading between English and
German abstracts. Notably, our findings revealed that German abstracts tended to be more
technically challenging, while English abstracts were slightly easier to read.
Our study introduced two regression models, namely BERT-Readability and BERT-
Technicality, which proved highly effective in predicting ease of reading and technical-
ity scores for the abstracts. These models outperformed existing readability formulas
commonly employed in the literature, underscoring their significance in predicting domain-
specific readability aspects. This highlights the critical importance of employing domain-
specific models tailored to scientific research abstracts for precise readability assessment.
The integration of ease of reading and technicality aspects into search engines has
practical implications for medical practitioners. Personalized search results, based on
language proficiency and expertise, empower medical professionals with efficient and
relevant IR tools. This customization enhances the relevance and accessibility of informa-
tion, optimizing decision making and patient care. Our research paper underscores the
significance of tailoring readability assessment to the specific needs of medical practitioners,
leading to improved information utilization and overall usability of search engines in the
medical domain.
5.1. Future Directions
A promising avenue for future research is to explore more sophisticated models and
to utilize advanced transfer learning techniques. These efforts aim to improve the accuracy
and applicability of readability assessment in the context of medical research abstracts.
Another compelling area of future study involves investigating the direct influence
of readability assessment on the decision making processes and patient care outcomes
of medical professionals. Understanding the tangible benefits of improved readability in
medical literature can further underscore the importance of our research.
One interesting possibility for future research is the incorporation of text complexity
factors into the IR ranking algorithm. Although our study has demonstrated their potential
advantages, additional research is required to examine the feasibility of this integration. A
future study could focus on implementing text difficulty aspects into IR ranking algorithms
to improve the retrieval of relevant medical information for practitioners. The study should
address algorithmic refinement, adaptability, and user experience assessment for practical
implementation.
5.2. Limitations
One notable limitation of this study concerns the dataset’s sample size. Although
the dataset offered useful insights, its limited size may restrict the generalizability of our
findings to a wider medical literature context and various medical disciplines.
Our research focused on English and German abstracts, which may not represent the
full linguistic diversity of medical literature. Future studies could expand to include a more
extensive range of languages to enhance the scope of applicability.
While our regression models demonstrated superior performance, their complexity
may pose challenges in real-world implementation. Future research should address ways
to streamline these models for practical use.
6. Conclusions
In conclusion, this research paper addresses a crucial gap in the field of healthcare IR
and readability assessment. By providing a comprehensive dataset and introducing the
integration of ease of reading and technicality aspects into personalized search engines, we
have taken significant strides toward enhancing the tools available to medical practitioners.
Our work not only offers efficient and reliable IR solutions but also contributes to the
broader goal of improving patient care and facilitating informed decision making within
the healthcare domain.
The dataset compiled for this study serves as a valuable resource for future research
and development in the realm of medical literature analysis. Its provision underscores our
commitment to advancing the state of the art in IR and readability assessment.
Author Contributions:
Conceptualization, C.L.B., E.M.H., S.F. and N.F.; annotation protocol, C.L.B.;
software, S.F., C.L.B. and E.M.H.; validation, S.F. and N.F.; formal analysis, S.F.; supervising annotation
process, S.F.; writing—original draft preparation, S.F., C.L.B., E.M.H. and N.F.; writing—review and
editing, S.F., C.L.B. and E.M.H.; supervision, N.F.; project administration, N.F.; funding acquisition,
S.F. All authors have read and agreed to the published version of the manuscript.
Funding:
This work was funded by Ph.D. grants from the DFG Research Training Group 2535
‘Knowledge- and data-based personalization of medicine at the point of care (WisPerMed)’, University
of Duisburg-Essen, Germany. We also acknowledge support by the Open Access Publication Fund of
the University of Duisburg-Essen.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement:
Annotation web framework, dataset, annotations, and analysis scripts can be found in the following GitHub repository: https://github.com/samehfrihat/TechnicalityLevelAnnotationTool, accessed on 1 August 2023.
Acknowledgments:
We extend our sincere gratitude to Georg Lodde for his invaluable support and
insights as a medical practitioner, which greatly enriched the quality of this research project.
Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design
of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or
in the decision to publish the results.
Abbreviations
The following abbreviations are used in this manuscript:
IR Information Retrieval
NLP Natural Language Processing
RMSE Root Mean Square Error
BERT Bidirectional Encoder Representations from Transformers
CEFR Common European Framework of Reference
SD Standard Deviation
ICC Intraclass Correlation Coefficient
WPM Words Per Minute
SMOG Simple Measure Of Gobbledygook
References
1. Entin, E.B.; Klare, G.R. Relationships of Measures of Interest, Prior Knowledge, and Readability to Comprehension of Expository Passages. Adv. Read./Lang. Res. 1985, 3, 9–38.
2. Vydiswaran, V.V.; Mei, Q.; Hanauer, D.A.; Zheng, K. Mining consumer health vocabulary from community-generated text. In Proceedings of the AMIA Annual Symposium Proceedings, American Medical Informatics Association, San Diego, CA, USA, 30 October–3 November 2014; Volume 2014, p. 1150.
3. Chall, J. Readability: An Appraisal of Research and Application; Bureau of Educational Research Monographs: Columbus, OH, USA, 1958.
4. Hätty, A.; Schlechtweg, D.; Dorna, M.; im Walde, S.S. Predicting degrees of technicality in automatic terminology extraction. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, London, UK, 5–10 July 2020; pp. 2883–2889.
5. Hedman, A.S. Using the SMOG formula to revise a health-related document. Am. J. Health Educ. 2008, 39, 61–64. [CrossRef]
6. Liu, Y.; Ji, M.; Lin, S.S.; Zhao, M.; Lyv, Z. Combining readability formulas and machine learning for reader-oriented evaluation of online health resources. IEEE Access 2021, 9, 67610–67619. [CrossRef]
7. Goeuriot, L.; Jones, G.J.; Kelly, L.; Leveling, J.; Hanbury, A.; Müller, H.; Salanterä, S.; Suominen, H.; Zuccon, G. ShARe/CLEF eHealth Evaluation Lab 2013, Task 3: Information retrieval to address patients’ questions when reading clinical reports. CLEF Online Work. Notes 2013, 4, 191–201.
8. O’Sullivan, L.; Sukumar, P.; Crowley, R.; McAuliffe, E.; Doran, P. Readability and understandability of clinical research patient information leaflets and consent forms in Ireland and the UK: A retrospective quantitative analysis. BMJ Open 2020, 10, e037994. [CrossRef]
9. Veltri, L.W.; Milton, D.R.; Delgado, R.; Shah, N.; Patel, K.; Nieto, Y.; Kebriaei, P.; Popat, U.R.; Parmar, S.; Oran, B.; et al. Outcome of autologous hematopoietic stem cell transplantation in refractory multiple myeloma. Cancer 2017, 123, 3568–3575. [CrossRef]
10. Wynn, T.A.; Ramalingam, T.R. Mechanisms of fibrosis: Therapeutic translation for fibrotic disease. Nat. Med. 2012, 18, 1028–1040. [CrossRef]
11. Ott, N.; Meurers, D. Information retrieval for education: Making search engines language aware. Themes Sci. Technol. Educ. 2011, 3, 9–30.
12. Tomažič, T.; Čelofiga, A.K. The Role of Different Behavioral and Psychosocial Factors in the Context of Pharmaceutical Cognitive Enhancers’ Misuse. Healthcare 2022, 10, 972. [CrossRef]
13. Frihat, S. Context-sensitive, personalized search at the Point of Care. In Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries, Cologne, Germany, 20–24 June 2022; pp. 1–2.
14. Basch, C.H.; Fera, J.; Garcia, P. Readability of influenza information online: Implications for consumer health. Am. J. Infect. Control 2019, 47, 1298–1301. [CrossRef]
15. Klare, G.R. The formative years. In Readability: Its Past, Present, and Future; International Reading Association: Newark, DE, USA, 1988; pp. 14–34.
16. Yan, X.; Song, D.; Li, X. Concept-based document readability in domain specific information retrieval. In Proceedings of the 15th ACM International Conference on Information and Knowledge Management, Arlington, VA, USA, 13–16 August 2006; pp. 540–549.
17. Ceri, S.; Bozzon, A.; Brambilla, M.; Della Valle, E.; Fraternali, P.; Quarteroni, S.; Ceri, S.; Bozzon, A.; Brambilla, M.; Della Valle, E.; et al. An introduction to information retrieval. Web Inf. Retr. 2013, 3, 3–11.
18. Selvaraj, P.; Burugari, V.K.; Sumathi, D.; Nayak, R.K.; Tripathy, R. Ontology based recommendation system for domain specific seekers. In Proceedings of the 2019 Third International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), Palladam, India, 12–14 December 2019; pp. 341–345.
19. Jameel, S.; Qian, X. An unsupervised technical readability ranking model by building a conceptual terrain in LSI. In Proceedings of the 2012 Eighth International Conference on Semantics, Knowledge and Grids, Beijing, China, 22–24 October 2012; pp. 39–46.
20. Palotti, J.; Goeuriot, L.; Zuccon, G.; Hanbury, A. Ranking health web pages with relevance and understandability. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, Pisa, Italy, 17–21 July 2016; pp. 965–968.
21. van der Sluis, F.; van den Broek, E.L. Using complexity measures in information retrieval. In Proceedings of the Third Symposium on Information Interaction in Context, New Brunswick, NJ, USA, 18–21 August 2010; pp. 383–388.
22. Kane, L.; Carthy, J.; Dunnion, J. Readability applied to information retrieval. In Proceedings of the European Conference on Information Retrieval, London, UK, 4–10 April 2006; Springer: Berlin/Heidelberg, Germany, 2006; pp. 523–526.
23. Taranova, A.; Braschler, M. Textual complexity as an indicator of document relevance. In Proceedings of the Advances in Information Retrieval: 43rd European Conference on IR Research, ECIR 2021, Virtual Event, 28 March–1 April 2021; Springer: Berlin/Heidelberg, Germany, 2021; Volume 43, Part II, pp. 410–417.
24. Lopes, C.T. Health Information Retrieval–State of the art report. arXiv 2022, arXiv:2205.09083.
25. Fung, A.C.H.; Lee, M.H.L.; Leung, L.; Chan, I.H.Y.; Kenneth, W. Internet Health Resources on Nocturnal Enuresis—A Readability, Quality and Accuracy Analysis. Eur. J. Pediatr. Surg. 2023. [CrossRef] [PubMed]
26. DuBay, W.H. The Principles of Readability; Online Submission; Impact Information: Costa Mesa, CA, USA, 2004.
27. Mc Laughlin, G.H. SMOG grading-a new readability formula. J. Read. 1969, 12, 639–646.
28. Wang, L.W.; Miller, M.J.; Schmitt, M.R.; Wen, F.K. Assessing readability formula differences with written health information materials: Application, results, and recommendations. Res. Soc. Adm. Pharm. 2013, 9, 503–516. [CrossRef]
29. Willis, L.; Gosain, A. Readability of patient and family education materials on pediatric surgical association websites. Pediatr. Surg. Int. 2023, 39, 156. [CrossRef]
30. Dale, E.; Chall, J.S. A formula for predicting readability: Instructions. Educ. Res. Bull. 1948, 5, 37–54.
31. Basch, C.H.; Mohlman, J.; Hillyer, G.C.; Garcia, P. Public health communication in time of crisis: Readability of on-line COVID-19 information. Disaster Med. Public Health Prep. 2020, 14, 635–637. [CrossRef]
32. Diviani, N.; van den Putte, B.; Giani, S.; van Weert, J.C. Low health literacy and evaluation of online health information: A systematic review of the literature. J. Med. Internet Res. 2015, 17, e112. [CrossRef]
33. Modiri, O.; Guha, D.; Alotaibi, N.M.; Ibrahim, G.M.; Lipsman, N.; Fallah, A. Readability and quality of wikipedia pages on neurosurgical topics. Clin. Neurol. Neurosurg. 2018, 166, 66–70. [CrossRef]
34. Tan, S.S.L.; Goonawardene, N. Internet health information seeking and the patient-physician relationship: A systematic review. J. Med. Internet Res. 2017, 19, e9. [CrossRef] [PubMed]
35. Zowalla, R.; Pfeifer, D.; Wetter, T. Readability and topics of the German Health Web: Exploratory study and text analysis. PLoS ONE 2023, 18, e0281582. [CrossRef] [PubMed]
36. Behrens, H. How Difficult are Complex Verbs? Evidence from German, Dutch and English. Linguistics 1998, 36, 679–712. [CrossRef]
37. Klatt, E.C.; Klatt, C.A. How much is too much reading for medical students? Assigned reading and reading rates at one medical school. Acad. Med. 2011, 86, 1079–1083. [CrossRef] [PubMed]
38. Koo, T.K.; Li, M.Y. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J. Chiropr. Med. 2016, 15, 155–163. [CrossRef]
39. Hockett, C.F. A Course in Modern Linguistics; The Macmillan Company: New York, NY, USA, 1958.
40. Grimm, A.; Hübner, J. Nonword repetition by bilingual learners of German: The role of language-specific complexity. Biling. Specif. Lang. Impair. Bi-SLI 2017, 201, 288.
41. Gu, Y.; Tinn, R.; Cheng, H.; Lucas, M.; Usuyama, N.; Liu, X.; Naumann, T.; Gao, J.; Poon, H. Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing. arXiv 2020, arXiv:2007.15779.
42. Deepset-AI. State-of-the-Art German BERT Model Trained from Scratch. Available online: https://www.deepset.ai/german-bert (accessed on 1 August 2023).
43. Worrall, A.P.; Connolly, M.J.; O’Neill, A.; O’Doherty, M.; Thornton, K.P.; McNally, C.; McConkey, S.J.; De Barra, E. Readability of online COVID-19 health information: A comparison between four English speaking countries. BMC Public Health 2020, 20, 100231. [CrossRef]
44. Fajardo, M.A.; Weir, K.R.; Bonner, C.; Gnjidic, D.; Jansen, J. Availability and readability of patient education materials for deprescribing: An environmental scan. Br. J. Clin. Pharmacol. 2019, 85, 1396–1406. [CrossRef]
45. Powell, L.; Krivanek, T.; Deshpande, S.; Landis, G. Assessing Readability of FDA-Required Labeling for Breast Implants. Aesthetic Surg. J. Open Forum 2023, 5, ojad027-009. [CrossRef]
46. Szmuda, T.; Özdemir, C.; Ali, S.; Singh, A.; Syed, M.T.; Słoniewski, P. Readability of online patient education material for the novel coronavirus disease (COVID-19): A cross-sectional health literacy study. Public Health 2020, 185, 21–25. [CrossRef] [PubMed]
Disclaimer/Publisher’s Note:
The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
Article
Full-text available
Purpose Globally, pediatric surgical association websites present patient/family education materials on an extensive list of conditions, including descriptions of the condition, signs and symptoms, diagnostic modalities, and treatment options. The purpose of this project was to assess the readability of pediatric surgical association websites’ patient/family education materials. Methods With IRB approval, we accessed all patient/family education materials on pediatric surgical association websites from around the globe and used multiple grade-level assessments and readability assessments to determine the reading level at which the information is presented. Results The American Pediatric Surgical Association (APSA) website and the British Association of Paediatric Surgeons (BAPS) present publicly accessible patient/family education materials. Seventy-four (74) conditions on APSA’s website were analyzed. Three grade-level assessments and the Flesch Reading Ease assessment indicated that the articles are written at high school reading levels. No articles were available in languages other than English. BAPS presented 6 conditions, most of which were more readable than their APSA counterparts. Conclusions Our analysis indicates that the patient/family education materials available on pediatric surgical association websites may not be written at a level that is comprehensible by the general population. Potential solutions include re-writing the materials with an emphasis on readability and presenting materials in languages other than English. Level of evidence V.
Article
Full-text available
Background The internet has become an increasingly important resource for health information, especially for lay people. However, the information found does not necessarily comply with the user’s health literacy level. Therefore, it is vital to (1) identify prominent information providers, (2) quantify the readability of written health information, and (3) to analyze how different types of information sources are suited for people with differing health literacy levels. Objective In previous work, we showed the use of a focused crawler to “capture” and describe a large sample of the “German Health Web”, which we call the “Sampled German Health Web” (sGHW). It includes health-related web content of the three mostly German speaking countries Germany, Austria, and Switzerland, i.e. country-code top-level domains (ccTLDs) “.de”, “.at” and “.ch”. Based on the crawled data, we now provide a fully automated readability and vocabulary analysis of a subsample of the sGHW, an analysis of the sGHW’s graph structure covering its size, its content providers and a ratio of public to private stakeholders. In addition, we apply Latent Dirichlet Allocation (LDA) to identify topics and themes within the sGHW. Methods Important web sites were identified by applying PageRank on the sGHW’s graph representation. LDA was used to discover topics within the top-ranked web sites. Next, a computer-based readability and vocabulary analysis was performed on each health-related web page. Flesch Reading Ease (FRE) and the 4th Vienna formula (WSTF) were used to assess the readability. Vocabulary was assessed by a specifically trained Support Vector Machine classifier. Results In total, n = 14,193,743 health-related web pages were collected during the study period of 370 days. The resulting host-aggregated web graph comprises 231,733 nodes connected via 429,530 edges (network diameter = 25; average path length = 6.804; average degree = 1.854; modularity = 0.723). Among 3000 top-ranked pages (1000 per ccTLD according to PageRank), 18.50%(555/3000) belong to web sites from governmental or public institutions, 18.03% (541/3000) from nonprofit organizations, 54.03% (1621/3000) from private organizations, 4.07% (122/3000) from news agencies, 3.87% (116/3000) from pharmaceutical companies, 0.90% (27/3000) from private bloggers, and 0.60% (18/3000) are from others. LDA identified 50 topics, which we grouped into 11 themes: “Research & Science”, “Illness & Injury”, “The State”, “Healthcare structures”, “Diet & Food”, “Medical Specialities”, “Economy”, “Food production”, “Health communication”, “Family” and “Other”. The most prevalent themes were “Research & Science” and “Illness & Injury” accounting for 21.04% and 17.92% of all topics across all ccTLDs and provider types, respectively. Our readability analysis reveals that the majority of the collected web sites is structurally difficult or very difficult to read: 84.63% (2539/3000) scored a WSTF ≥ 12, 89.70% (2691/3000) scored a FRE ≤ 49. Moreover, our vocabulary analysis shows that 44.00% (1320/3000) web sites use vocabulary that is well suited for a lay audience. Conclusions We were able to identify major information hubs as well as topics and themes within the sGHW. Results indicate that the readability within the sGHW is low. As a consequence, patients may face barriers, even though the vocabulary used seems appropriate from a medical perspective. In future work, the authors intend to extend their analyses to identify trustworthy health information web sites.
Article
Full-text available
In an effort for better memory, greater motivation, and concentration, otherwise healthy individuals use pharmaceutical cognitive enhancers (PCEs), medicines for the treatment of cognitive deficits of patients with various disorders and health problems, to achieve greater productivity, efficiency, and performance. We examined the use of PCEs among 289 students at the Slovenian Faculty of Electrical Engineering and Computer Science in the behavioral and psychosocial context (students' attitudes towards study, parents, health, leisure time, and work). Furthermore, we also addressed the immediate reasons, or the hypothesized connections of behavioral and psychosocial aspects, related to PCE misuse. The study consisted of a structured questionnaire, and chi-squared tests were used. An analysis of student statements revealed differences in students' and parents' attitudes toward good academic grades. In addition, students chose among 17 values related to relationships with parents, friends, partners, careers, study obligations, leisure, hobbies, material goods, appearance, and the future, and assessed their importance. Regardless of the group they belonged to, young people cited the same values among the most important. Good grades and parental opinions have proven to be key factors in the context of PCE abuse. This research was the first study to examine the relation between PCE misuse and the role of different behavioral and psychosocial factors.
Article
Full-text available
Websites are rich resources for the public to access health information, and readability ensures whether the information can be comprehended. Apart from the linguistic features originated in traditional readability formulas, the reading ability of an individual is also influenced by other factors such as age, morbidities, cultural and linguistic background. This paper presents a reader-oriented readability assessment by combining readability formula scores with machine learning techniques, while considering reader background. Machine learning algorithms are trained by a dataset of 7 readability formula scores for 160 health articles in official health websites. Results show that the proposed assessment tool can provide a reader-oriented assessment to be more effective in proxy the health information readability. The key significance of the study includes its reader centeredness, which incorporates the diverse backgrounds of readers, and its clarification of the relative effectiveness and compatibility of different medical readability tools via machine learning.
Article
Full-text available
Background The internet is now the first line source of health information for many people worldwide. In the current Coronavirus Disease 2019 (COVID-19) global pandemic, health information is being produced, revised, updated and disseminated at an increasingly rapid rate. The general public are faced with a plethora of misinformation regarding COVID-19 and the readability of online information has an impact on their understanding of the disease. The accessibility of online healthcare information relating to COVID-19 is unknown. We sought to evaluate the readability of online information relating to COVID-19 in four English speaking regions: Ireland, the United Kingdom, Canada and the United States, and compare readability of website source provenance and regional origin. Methods The Google® search engine was used to collate the first 20 webpage URLs for three individual searches for ‘COVID’, ‘COVID-19’, and ‘coronavirus’ from Ireland, the United Kingdom, Canada and the United States. The Gunning Fog Index (GFI), Flesch-Kincaid Grade (FKG) Score, Flesch Reading Ease Score (FRES), Simple Measure of Gobbledygook (SMOG) score were calculated to assess the readability. Results There were poor levels of readability webpages reviewed, with only 17.2% of webpages at a universally readable level. There was a significant difference in readability between the different webpages based on their information source (p < 0.01). Public Health organisations and Government organisations provided the most readable COVID-19 material, while digital media sources were significantly less readable. There were no significant differences in readability between regions. Conclusion Much of the general public have relied on online information during the pandemic. Information on COVID-19 should be made more readable, and those writing webpages and information tools should ensure universal accessibility is considered in their production. Governments and healthcare practitioners should have an awareness of the online sources of information available, and ensure that readability of our own productions is at a universally readable level which will increase understanding and adherence to health guidelines.
Article
Introduction Nocturnal enuresis is a common yet quality-of-life-limiting pediatric condition. There is an increasing trend for parents to obtain information on the disease's nature and treatment options via the internet. However, the quality of health-related information on the internet varies greatly and is largely uncontrolled and unregulated. With this study, a readability, quality, and accuracy evaluation of the health information regarding nocturnal enuresis is carried out. Materials and Methods A questionnaire was administered to parents and patients with nocturnal enuresis to determine their use of the internet to research their condition. The most common search terms were determined, and the first 30 websites returned by the most popular search engines were used to assess the quality of information about nocturnal enuresis. Each site was categorized by type and assessed for readability using the Gunning fog score, Simple Measure of Gobbledygook (SMOG) index, and Dale–Chall score; for quality using the DISCERN score; and for accuracy by comparison to the International Children's Continence Society guidelines by three experienced pediatric urologists and nephrologists. Results A total of 30 websites were assessed and classified into five categories: professional (n = 13), nonprofit (n = 8), commercial (n = 4), government (n = 3), and other (n = 2). The information was considered difficult for the public to comprehend, with mean Gunning fog, SMOG index, and Dale–Chall scores of 12.1 ± 4.3, 14.1 ± 4.3, and 8.1 ± 1.3, respectively. The mean summed DISCERN score was 41 ± 11.6 out of 75. Only seven (23%) websites were considered of good quality (DISCERN score > 50). The mean accuracy score of the websites was 3.2 ± 0.6 out of 5. Commercial websites were of the poorest quality and accuracy. Websites generally scored well in providing their aims and identifying treatment benefits and options, while they lacked references and information regarding treatment risks and mechanisms. Conclusion Online information about nocturnal enuresis exists for parents; however, most websites are of suboptimal quality, readability, and accuracy. Pediatric surgeons should be aware of parents' health-information-seeking behavior and be proactive in guiding parents to identify high-quality resources.
Article
Pretraining large neural language models, such as BERT, has led to impressive gains on many natural language processing (NLP) tasks. However, most pretraining efforts focus on general domain corpora, such as newswire and Web. A prevailing assumption is that even domain-specific pretraining can benefit by starting from general-domain language models. In this article, we challenge this assumption by showing that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains over continual pretraining of general-domain language models. To facilitate this investigation, we compile a comprehensive biomedical NLP benchmark from publicly available datasets. Our experiments show that domain-specific pretraining serves as a solid foundation for a wide range of biomedical NLP tasks, leading to new state-of-the-art results across the board. Further, in conducting a thorough evaluation of modeling choices, both for pretraining and task-specific fine-tuning, we discover that some common practices are unnecessary with BERT models, such as using complex tagging schemes in named entity recognition. To help accelerate research in biomedical NLP, we have released our state-of-the-art pretrained and task-specific models for the community, and created a leaderboard featuring our BLURB benchmark (short for Biomedical Language Understanding & Reasoning Benchmark) at https://aka.ms/BLURB .
Chapter
We study the textual complexity of documents as an aspect of the Information Retrieval process that influences retrieval effectiveness. Our experiments show that in many cases user queries allow determining which linguistic competency level best suits an underlying information need. The paper investigates promising first approaches on how to do so automatically and compares them to an idealistic baseline. By filtering out documents of unexpected textual complexity, we find improved search results mainly when using precision-oriented effectiveness measures.