Assessing the Readability of Policy Documents: The Case of Terms of Use of Online Services

Wassim Derguech
Insight Centre for Data Analytics
National University of Ireland, Galway
Ireland
wassim.derguech@insight-centre.org
Syeda Sana e Zainab
Insight Centre for Data Analytics
National University of Ireland, Galway
Ireland
syeda.sanaezainab@insight-centre.org
Mathieu D’Aquin
Insight Centre for Data Analytics
National University of Ireland, Galway
Ireland
mathieu.daquin@insight-centre.org
ABSTRACT
Whether for using online services or dealing with legal issues,
citizens are often requested to sign/accept policy documents that
are intended to commit them to specific rights and duties. Usually
such documents are difficult to read due to their nature, the length
of sentences, complex terms used, etc. Since understanding is a
prerequisite to making an informed decision, it is important to
assess whether citizens, on average, would have the ability to
comprehend those texts. We consider here that the authors of
those documents should keep in mind their target audience and
tailor their texts to their profiles. A good indicator that can be
used in this context is the average education level of the citizens
of a country. On the other hand, readability measures/scores of
texts have been used to assess if educational material is suitable
for the intended students’ level of education attainment. In this
paper, we use this information to assess the readability of terms of
use of online services and correlate this information with the
education attainment of the countries of their target audience. Our
analysis shows that more efforts need to be put into making such
policy documents understandable by a broader audience,
uncovering a need for standards and tools in this area.
CCS CONCEPTS
Applied computing → E-government
Applied computing → E-learning
KEYWORDS
Terms of Service, Policy, Readability, Analytics,
Open Data, Education
ACM Reference format:
<keep blank / do not remove>. 2018. <keep blank / do not remove>. In
Proceedings of the 11th International Conference on Theory and Practice
of Electronic Governance, Galway, Ireland, April 2018 (ICEGOV’18),
<keep blank / do not remove> pages.
DOI: <keep blank / do not remove>
1 INTRODUCTION
According to Internet Live Stats¹, whereas only 1% of the world
population was connected to the Internet in 1995, around 40%
(estimated at more than 3.4 billion people in 2016) is connected
nowadays. These users consume online services for various
purposes: shopping, learning, reading, socialising, etc. The use
of those services is usually regulated by a document that
describes the terms of use and privacy policies.
As per the Federal Trade Commission (FTC)², policies must
define the domain’s information practices in a “clear and
conspicuous” way, be located in one place on a website, and be
reachable by clicking on an icon or hyperlink. In other words,
policies
should be easily accessible and reasonably understandable by the
general audience of the website or the online service. However, as
observed in the FTC report [1], even when consumers get access
to policy texts, they cannot understand them. Indeed, policies are
“long and incomprehensible, placing too high a burden on
consumers to read, understand, and then exercise meaningful
choices based on them.” [1].
Organizations and communities such as the Plain Language
Action and Information Network (PLAIN)³ support the idea of
clear writing of legal documents (more precisely in the context of
government communications). The idea is to propose a set of
guidelines, such as using short sections and sentences with
simple and short words [2]. In this area, text readability
assessment techniques can be used to help identify parts of
policy/legal documents that require attention.
Readability measures such as the Flesch–Kincaid readability tests
[3], the Gunning Fog Index [4] and the SMOG Grading Index [5] were
proposed for the assessment of educational resources, to assist
teachers, curriculum committees or librarians in judging their
grade-level appropriateness. Such measures can influence the
selection of library books, the design of instructional materials,
the choice of the appropriate reading texts, etc.
With policy documents, such as terms and conditions, the use
of readability measures can help policymakers decide on the
appropriateness of the document with respect to the target
audience. Knowing the level of education of the target audience
can help set the target grade level of the policy text. Such
information can be found in open and free data resources such as
UIS.Stat⁴ from the UNESCO Institute for Statistics or the World
Bank Open Data⁵.
¹ http://www.internetlivestats.com
² https://www.ftc.gov/
³ http://www.plainlanguage.gov/
ICEGOV’18, April 2018, Galway, Ireland
In this context, we are interested in studying the level of
readability of terms of use of online services and applications with
respect to their target audience. We specifically examine the
policy texts in the TOSBack dataset⁶ of terms of use of several
hundred services by evaluating their readability. Each of the
entries of this dataset is linked to its country of operation, whose
average education level is used to verify whether those policy
texts are properly written for the target population.
The remainder of the paper is structured as follows: Section 2
analyses related work dealing with readability assessment of legal
documents. Section 3 describes the concept of readability of a text
and more specifically introduces the SMOG Grading Index that
we will use in the rest of this work. Section 4 describes our
proposed policy data processing and analysis pipeline that is
implemented and discussed in Section 5. Section 6 reports on our
experimental results. Finally, Section 7 concludes the paper and
identifies future research directions.
2 RELATED WORK ANALYSIS
Individuals are often requested to sign contracts and other
policies that are intended to commit them to specific rights and
duties. Many of those individuals however do not truly
comprehend what they are signing. The difficulty of reading such
documents may bring individuals into making commitments that
they do not understand and might not have any desire to make.
Surprisingly, very little research has been done in the area of the
readability of such policy documents.
Scott and Suchan [6] examined contract understandability.
They analyzed how easily officers and first-line
administrators could understand agreements and found that these
agreements required at least college graduate abilities to be
comprehended. They propose that agreements should be
written at the reading comprehension level of the target group.
In the U.S., the use of plain English in legal documents has been
promoted. State assemblies, for example in Maryland, Florida,
Michigan and Texas, have started to consider and in some cases
pass "simple language" rules for official documents. As reported
by McDonald [7], the State of California has additionally started
to create rules after an investigation found that 90% of citizens
and lawyers needed less complex legal language.
⁴ http://data.uis.unesco.org/
⁵ https://data.worldbank.org/topic/education
⁶ https://tosback.org/
In a study of Indian legal documents, Bhatia [8] explained
that, even though English is not considered a native language in
India, it is generally used in the practice of law.
Surprisingly, the Indian judiciary has never objected to the
defects of the unclear, archaic mode of legal language and there is
no parliamentary law stating that legal language should be plain.
Fairclough [9] argues that the benefits of plain legal language are
that it tends to influence others, builds clear communication,
helps legal exactness and enhances a document's substance.
Julie E. Howe and Michael S. Wogalter [10] conducted two
studies to assess the understandability of legal documents that
people are often asked to sign. Their study showed that
participants with an average education level of two years of
college read and understood contracts moderately well. However,
the participants' basic complaint was the technical nature of the
documents, and the authors recommended improving the
understandability of legal documents by reducing their
technical aspects.
This paper is a step towards the systematic evaluation of the
understandability of legal documents of various countries by
measuring their readability using the SMOG Index. After this
computation, the score is compared with the average education
attainment of each country's population. This shows the reading
complexity of legal documents that individuals are facing in each
country. This is a timely topic as it relates to public education
and awareness in the context of policy design practices [11].
3 MEASURING THE READABILITY OF A TEXT
Readability formulas have been introduced to predict the level
of comprehension for large passages of texts in a document [12].
The level of comprehension of texts is linked to the prediction of
the required level of reading skills to easily use it and understand
it. The readability of a text depends on its content and presentation
(e.g., font and size). In this paper, we are interested in the content,
leaving the presentation aspect for future work. More specifically,
we examine texts with respect to their syntactic composition:
sentence length, use of complex words, etc. In this particular
context, the literature proposes a wide range of readability
formulas such as the Flesch–Kincaid readability tests [3], the
Gunning Fog Index [4], the SMOG Grading Index [5], etc. In our
work, we use the SMOG Grading Index for the evaluation of the
readability of policy texts.
McLaughlin created the SMOG Readability Formula in 1969
as an improved version of previous formulas. The objective is to
estimate the number of years of education a person needs to
comprehend a text. Formula (1) is the version that we consider in
our work. It counts the number of complex words in three
10-sentence samples, normalizes this count to 30 sentences,
computes its square root, multiplies it by 1.0430 and adds
3.1291 [5]. A complex word is defined as a polysyllabic word,
i.e., a word with 3 or more syllables.

$$\text{grade} = 1.0430 \sqrt{\text{complex words} \times \frac{30}{\text{sentences}}} + 3.1291 \qquad (1)$$

The SMOG readability formula can be applied as follows⁷:
Step 1: Take the input text and select 10 sentences from
the beginning, 10 from the middle and 10 from the end.
The total should be 30 sentences.
Step 2: Count the number of complex words in each
group of sentences. A complex word is a word that has
three or more syllables.
Step 3: Calculate the score using Formula (1).
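The steps above can be sketched in a few lines of Python. This is only an illustrative sketch and not the evaluator used in this work: the function names are hypothetical, the syllable counter is a rough vowel-group heuristic, and the whole text is scored instead of a 30-sentence sample (the 30/sentences factor of Formula (1) normalizes the count).

```python
import math
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count groups of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent "e" (as in "service") usually adds no syllable.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def smog_grade(text: str) -> float:
    """Apply Formula (1) to all sentences of the given text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        raise ValueError("text contains no sentences")
    words = re.findall(r"[A-Za-z]+", text)
    # Complex words are words with three or more syllables.
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    return 1.0430 * math.sqrt(complex_words * 30 / len(sentences)) + 3.1291
```

A text without any complex word receives the minimum grade of 3.1291, since the square-root term vanishes.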
The resulting grade can be compared to the American school
grading system. Table 1 shows an approximate conversion
between the SMOG Index and the Education level.
Table 1: Conversion Table of SMOG Scores to Education Levels

SCORE/Grade    Education Level
1-4            Elementary School
5-8            Middle School
9-12           High School
13-16          Undergraduate
17+            Graduate
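The conversion of Table 1 can be expressed as a small lookup. The sketch below is illustrative; the function name is hypothetical, and assigning fractional scores to the band of their integer part (e.g., 12.9 to High School) is our assumption, as Table 1 only lists whole grades.

```python
def education_level(smog_score: float) -> str:
    """Map a SMOG grade to the education level bands of Table 1."""
    # Assumption: a fractional score belongs to the band of its integer part.
    if smog_score < 5:
        return "Elementary School"
    if smog_score < 9:
        return "Middle School"
    if smog_score < 13:
        return "High School"
    if smog_score < 17:
        return "Undergraduate"
    return "Graduate"
```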
4 READABILITY OF POLICY DOCUMENTS OF TERMS AND
CONDITIONS OF ONLINE SERVICES
We propose, in this section, a data processing pipeline that
takes as input a policy document that is advertised online. Policy
documents constitute terms and conditions of use of an online
service or application that users should agree to before using
that service. The idea here is to create a high-level data flow
without ties to any technological choices or data formatting
(except for the input data that should be in HTML). The proposed
pipeline (or data flow) is shown in Fig. 1 and details about the
different steps shown on this diagram are explained in the
following.
⁷ http://www.readabilityformulas.com/smog-readability-formula.php
Figure 1: Data Processing for Policy Data Analytics
1. Operation: HTML Parsing and String Processing;
Input: HTML file;
Output: Raw text;
The first step of the data processing pipeline consists of parsing
the content of the web page containing the policy text. A policy
text is an HTML document (either as a downloaded file on the
disk or available online using its URL). The document is parsed to
remove all the HTML tags and the irrelevant parts such as
JavaScript code. Further filtering operations can be done at this
stage, such as removing very short sentences (e.g., sentences that
have fewer than 2 words). The more structured the document is, the
faster this step would be.
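As an illustration of this first step, the following sketch strips HTML tags and skips script and style content using only the Python standard library. It is a minimal example under our own assumptions (the class and function names are hypothetical) and is not the custom parser used in our implementation.

```python
from html.parser import HTMLParser


class PolicyTextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> content."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def html_to_text(html):
    """Return the raw text of an HTML document."""
    parser = PolicyTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```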
2. Operation: Readability Evaluation;
Input: Raw Text;
Output: Readability Score;
The second step consists in measuring the readability score of
the parsed policy text. This operation can be done using any of
several readability scores. The choice can be made depending on
the use case. For example, the readability of the entire text
can be measured at once, or each paragraph can be measured
separately. The results of this step feed into the policy
metadata that contains other information collected in the other
steps of the data processing pipeline.
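The two granularities mentioned above can be sketched as follows. The function name is hypothetical, the scorer is passed in as a parameter so that any readability formula can be plugged in, and splitting paragraphs on blank lines is an assumption about the raw text format.

```python
from typing import Callable, List


def score_document(text: str,
                   scorer: Callable[[str], float],
                   per_paragraph: bool = False) -> List[float]:
    """Score the whole text at once, or each paragraph separately."""
    if not per_paragraph:
        return [scorer(text)]
    # Assumption: paragraphs are separated by blank lines.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [scorer(p) for p in paragraphs]
```

Scoring per paragraph returns one value per paragraph, which can help locate the passages that need attention.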
3. Operation: Metadata Enrichment;
Input: Website URL;
Output: Additional metadata (e.g., target countries and
their education levels);
Proper enrichment of the metadata of the websites from which
the policy text has been extracted is required. Such information
makes it possible to identify the target audience of the website.
Possible implementations of this step include the analysis of the
policy text to identify the target country, the use of services
such as AWS Alexa⁸ or Google Analytics, or the combination of
multiple other online resources/data such as whois.com or
opencorporates.com. In addition, further information about the
education attainment levels of the audience is required. To get
this information for each of the identified target countries (or
countries of operation) of the website, we can use open data
resources such as UIS.Stat⁹ from the UNESCO Institute for
Statistics or the World Bank Open Data¹⁰.
4. Operation: Visualization and Statistical Analysis;
Input: Structured metadata related to the policy texts;
Output: Visual Analytics Components
After the enrichment of the data, a dedicated analysis toolset
can be used to carry out various types of analysis. Visual analytics
would be the easiest and most obvious way to use the output of
the previous operations. Using various visualization techniques
helps policymakers identify the level of readability of their texts
and potentially identify paragraphs that require further attention
and a better writing style. The data enrichment part helps to
compare the readability of the texts with the average level of
education attainment in the countries that the website targets.
This information can drive decision makers to further tailor
their policy text to reach a wider audience.
5 EXPERIMENTAL AND COMPUTATIONAL
DETAILS
5.1 Methodology
To test the proposed data processing pipeline and carry out
the analysis of real policy texts, we implemented the data
processing pipeline discussed earlier in Section 4 following the
technology choices shown in Fig. 2. We use the ToSback Terms of
Service dataset that will be detailed in Section 5.2. This dataset
is parsed and its metadata is enriched using custom Java
applications. Educational data is taken from UIS.Stat¹¹ from the
UNESCO Institute for Statistics, which will be further detailed in
Section 5.3.
The analysis of the results is carried out using R and will be
detailed in Section 6. Furthermore, we propose a web-based
application for navigating the results of our analysis using various
visual components that will be detailed in Section 5.4.
⁸ https://aws.amazon.com/en/awis/
⁹ http://data.uis.unesco.org/
¹⁰ https://data.worldbank.org/topic/education
¹¹ http://data.uis.unesco.org/
Figure 2: Technical Choices for the Implementation of the
Data Processing and Analytics Pipeline
5.2 Policies Dataset Processing
ToSback is the result of a collaboration between the Electronic
Frontier Foundation, the Internet Society and Terms of Service;
Didn’t Read. The project aims to continuously check the terms and
conditions of many online services and monitor any changes to
them. ToSback provides an openly available dataset¹² that
contains web pages of the terms of service of 903 websites. The
total number of files crawled by this project is 711,049.
The HTML parsing and string processing step consists in
eliminating all the files that are not relevant for our analysis,
such as CSS and JavaScript files. The final set of files that we
analyzed is reduced to 1587.
As the dataset contains websites that have multiple documents
referring to various policies, refer to other websites’ policies or
have moved their policies to other URLs, we detected these issues
in the parsing operation and applied filtering rules such as
keeping only files that contain more than 10 sentences and keeping
only sentences that contain more than 2 words. Shorter sentences
usually represent titles of sections and do not bring much
information about the readability of the text.
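These two filtering rules can be sketched as follows. The function names and the sentence splitter are hypothetical, and the order in which the rules are applied (short sentences are dropped first, then the file-level threshold is checked) is our assumption for this illustration.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def filter_policy_text(text):
    """Return the kept sentences, or None when the file is discarded."""
    # Rule 1: keep only sentences that contain more than 2 words.
    kept = [s for s in split_sentences(text) if len(s.split()) > 2]
    # Rule 2: keep only files that contain more than 10 sentences.
    if len(kept) <= 10:
        return None
    return kept
```

With these rules, a page that only contains a short notice is discarded entirely, and one-word section titles are removed from the remaining files.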
¹² https://github.com/pde/tosback2-data
The readability evaluator developed for this work is part of the
AFEL project¹³. It takes as input a raw text or the URL of a
webpage and provides its readability score using multiple
formulas; in this work we use only the SMOG Grading Index
described earlier in Section 3. This application (as seen in Fig. 3)
also extracts important concepts found in the provided document.
These concepts can also be used to identify complex words
used in the text. In the current version we do not make use of this
feature and we simply use the SMOG Index.
Figure 3: UI of the Readability Evaluator used in this Work
5.3 Educational Attainment Levels
In this work, we are interested in analyzing the readability of
policy texts with respect to the target audience. For each of the
websites that we consider, we identified the top three target
countries using analytics services such as AWS Alexa. Predicting
the suitability of policy texts for the target audience
requires further information about the average level of education
of that audience. Such information is openly available through
UIS.Stat from the UNESCO Institute for Statistics. We initially
considered 55 countries in our analysis, which were reduced to 24
as we decided to keep only countries that are part of the target
audience of more than 5 websites.
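The country selection rule can be illustrated with a short sketch. The function name, the input layout (a mapping from website to its list of target countries) and the example domains used to exercise it are hypothetical; only the threshold of more than 5 websites comes from our analysis.

```python
from collections import Counter


def frequent_target_countries(site_countries, min_exclusive=5):
    """Keep countries that are a target of more than `min_exclusive` websites."""
    counts = Counter(
        country
        for countries in site_countries.values()
        for country in set(countries)  # count each website once per country
    )
    return {country for country, n in counts.items() if n > min_exclusive}
```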
Attainment levels as per UIS.Stat dataset indicate 11 levels of
education programs ranging from no schooling, incomplete
primary, primary, to master’s and doctoral levels.
¹³ AFEL project (Analytics for Everyday Learning) - http://afel-project.eu.
5.4 EnDRiV for data visualization
EnDRiV (Education and Document Readability Visualization)
is a tool for visualizing the SMOG readability index of the terms
and conditions documents of various websites. The
documents are clustered by countries representing the target
audience of those online services/websites. With this tool one can
compare the SMOG index with the education attainment percentages
of a country’s population. It helps to find out the complexity of
the terms and conditions documents that individuals should read
before using those online services in each country.
The main features of EnDRiV are as follows:
- Visual Representation
EnDRiV represents countries on a World map where a user
can hover over each country to get information related to its
population’s education attainment levels and the year on which
those measures are taken. This education data for each country is
displayed on pie and bar charts.
The most visited websites/documents of each country are
displayed on a force-directed layout graph where a country node is
linked to its websites/documents. By clicking on a
website/document, statistical information about the
policy text of that website is displayed in a table, i.e., the
number of words, sentences, syllables and characters.
- Technology
To build this tool, a variety of web technologies are used
including HTML5, CSS, SASS, JavaScript, jQuery¹⁴, SVG¹⁵,
Highcharts¹⁶, Highmaps¹⁷, amCharts¹⁸ and JSON¹⁹. The map
visualization is based on the JavaScript library Highmaps. The
education visualization uses the JavaScript library Highcharts,
whereas the JavaScript library amCharts is used for the
candlestick representation of the SMOG Readability Index. The open
source JavaScript library D3.js²⁰ is used to implement the
force-directed layout graph for visualizing each country’s most
visited websites. All the data used for visualization is in JSON
format.
- Availability
The EnDRiV tool can be accessed at
http://vmafel01.insight-centre.org/EnDRiV/
¹⁴ https://jquery.com/
¹⁵ www.w3schools.com/svg/
¹⁶ https://www.highcharts.com/
¹⁷ https://www.highcharts.com/blog/products/highmaps/
¹⁸ https://www.amcharts.com/
¹⁹ http://json.org/
²⁰ https://d3js.org/
In the following, an example scenario is discussed to
demonstrate the interaction with EnDRiV: France Education and
Document Readability Visualization.
In order to visualize the education and document readability
data for documents targeting France, one can select France from
the world map (Fig. 4, window A). All the other sections of this
window then display the related information. For example, the
most visited websites are displayed in the force-directed layout
graph where France is linked with websites/documents (Fig. 4,
window E). By clicking on a website, the statistical information
about the policy text of that website is displayed in window F of
Fig. 4 (i.e., counts of words, characters, syllables and
sentences). The SMOG readability index of all
websites/documents is visualized in a candlestick chart showing
the aggregated scores in the form of Open, High, Close and Low
(Fig. 4, window D). The education information for the selected
country (in our case France) is shown in a pie chart in window B
and in a bar chart in window C. The proximity of windows C and D
is deliberate as it shows on the one hand the education
attainment percentages and the average SMOG index on the other
hand. This gives a visual indication of the potential correlation
between the education level of the target audience of the website
and the readability of its policy text, which helps to visually
assess the difficulty of the texts for that target population.
Figure 4: France Education and Document Readability Visualization using EnDRiV
6 RESULTS AND DISCUSSION
6.1 Readability of Terms and Conditions for all
the Countries
The results of the analysis of the dataset used in our
experiment are listed per country in Table 2. A summarized
version of this table is shown in Table 3. The results show that
around 75% of the documents considered require an education level
of grade 9 or above (i.e., high school, undergraduate and
graduate levels). Around 45% of all documents require a third
level education to be understood. Such documents would be
difficult to understand for a population where the average
education level is very low. In countries such as India, where
41.3% of the population has no schooling, such documents are not
at all understandable for a large part of the population.
Table 3: Percentage of Documents for each Education Level
Identified based on the SMOG Index

SCORE    Education Level      Percentage of documents
1-4      Elementary School     0.09
5-8      Middle School        25.24
9-12     High School          29.79
13-16    Undergraduate        35.19
17+      Graduate              9.68
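The aggregate shares quoted in the text can be recomputed from the percentages of Table 3. The short check below is only a worked example; the variable names are ours.

```python
# Percentages of documents per education level, as listed in Table 3.
table3 = {
    "Elementary School": 0.09,
    "Middle School": 25.24,
    "High School": 29.79,
    "Undergraduate": 35.19,
    "Graduate": 9.68,
}

# Documents requiring grade 9 or above (high school and higher).
high_school_or_above = round(
    table3["High School"] + table3["Undergraduate"] + table3["Graduate"], 2)

# Documents requiring a third level education (undergraduate or graduate).
third_level = round(table3["Undergraduate"] + table3["Graduate"], 2)

# Documents at elementary or middle school level.
up_to_middle_school = round(
    table3["Elementary School"] + table3["Middle School"], 2)

# The published percentages sum to 99.99 because of rounding.
total = round(sum(table3.values()), 2)
```

This gives 74.66% for grade 9 or above, 44.87% for third level education and 25.33% for elementary and middle school levels, in line with the rounded figures of around 75%, 45% and 25%.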
Table 2: Distribution of Documents per Education Level for each Country using SMOG Index
On the other hand, the results show that a very limited number
of documents (3 documents) have a score of less than 4 and around
25% of the documents target a population with elementary and
middle school levels. By examining the content of some of these
documents, we found out that some of them are either written in
a language other than English (e.g., Korean), or contain very
little text, such as “For terms and conditions refer to this
link”, or were not properly crawled, so that “Page not found” was
the only available content. This identifies the following issues
in our proposed pipeline:
- A language detection module is necessary to detect the
language of the document being analyzed.
- A strategy is needed to deal with multilingual readability
evaluation, either by investigating other readability
formulas for each language or by removing the document
completely from the results of the analysis.
- The dataset used, i.e., TOSBack, does not provide any
metadata regarding the documents crawled. Information
regarding the language of the document could considerably
improve our analysis.
- A module that detects outliers based on their content is
necessary, for example to eliminate empty pages.
Assuming that all our data is correct and all outliers have been
removed from our dataset, we plot the candlestick chart for the
readability scores of documents for each country in Fig. 5. This
type of chart helps identify the density of readability scores of
the documents centered around a mean value representing the
majority of documents. Fig. 5 shows that the mean readability
score for most of the countries is between 10 and 15. In other
words, the average education level required for understanding
those documents is at a third level education (i.e., undergraduate
studies). In the following sections, we look more specifically at
the readability scores of documents of the United States and India.
6.2 Readability of Terms and Conditions in the
United States
We selected the United States for our analysis for multiple
reasons. First, this country is the main target for most of the
websites considered in the TOSBack dataset. As shown in Table
2, the United States is the target country for 1139 policy
documents, which constitute around 35% of the entries of our
results. Second, citizens of the United States are native English
speakers. Third, and most importantly, the SMOG index formula
has been initially designed to identify the required level of
education of texts using the education standards in the United
States.
Figure 5: Candlestick chart showing the readability scores of documents per country
For the analysis of the results of readability scores of the
Terms and Conditions of websites targeting this country, we use R
to display both education data and readability scores (see Fig. 6).
The education data is shown in a bar chart showing that most of
the population has an education attainment at the upper secondary
level (i.e., 46.1%). The SMOG index equivalent to this level is 10.
It is then recommended for policymakers to consider policy texts
with a SMOG index less than 10. However, according to our
analysis of terms and conditions that target this country, we found
that the mean value of the SMOG index is 12. This indicates that
policy texts targeting this country are difficult to read for a
large group, and consequently policymakers should pay attention
to this by following federal readability standards [13].
6.3 Readability of Terms and Conditions in India
We selected India as a second country for our analysis for
multiple reasons. First, this country is the second main target for
most of the websites considered in the TOSBack dataset. As
shown in Table 2, India is the target country for 593 policy
documents, which constitute around 18% of the entries of our
results. Second, as stated in the related work section, even
though English is not considered a native language in India, it is
generally used in the practice of law.
Similar to our analysis in Section 6.2, for the analysis of the
results of readability scores of the Terms and Conditions of
websites targeting India, we use R to display both education data
and readability scores (see Fig. 7). The education data is shown in
a bar chart showing that the largest share of the population has
no schooling (i.e., 41.3%).
The mean value of the SMOG index of the documents
targeting India is 10. As per Fig. 7, around 38% of the population
of India have the required level of education to understand the
Terms and Conditions of the considered websites. While this number
might seem sufficient for policymakers, it is still necessary to
consider the other part of the population. This might be more
challenging in the case of India, as much of the population does
not have access to education. Summarization of legal text [14] and
other means of communication need to be researched, such as the
use of graphical indicators, multimedia files such as videos, etc.
7 CONCLUSIONS
In summary, we have performed both an experimental and
theoretical study of the potential use of text readability formulas
for evaluating the level of readability of policy texts. More
specifically, we used terms and conditions of online services as a
test collection to evaluate their level of readability. Furthermore,
we have included in our test collection the information about the
target audience of the terms and conditions by looking at the
countries from which those web services are visited the most. For
each target audience (i.e., country) we used the education
attainment level to compare the readability scores of the policy
documents with the average education level of their target
countries. We found out that most web services for each country
propose terms and conditions that require a third level
education. Our results also show that for the United States the
mean value of the readability score of the considered documents
is 12 while 46.1% of its population’s education attainment is at
the upper secondary level (i.e., equivalent SMOG index = 10). The
situation is even more critical in India, with an average
readability score of documents of 10 and 41.3% of its population
having no schooling. These results show a need to establish both
standards and guidelines for the authors of policy documents to
support them in achieving an appropriate level of readability, and
tools like the ones presented in this paper to assess and identify
readability issues.
Figure 6: Education vs. Readability Attainment scores of Terms and Conditions for the United States
As part of our future work we plan to:
- Investigate solutions to the issues encountered
during the analysis of our data (see Section 6.1) by
extending our data management and analysis pipeline
with the modules required to remove outliers and
identify the languages of the documents.
- Test and validate our approach with other datasets
from government communications. Texts of government
communications tend to be written in the same style as
legal documents. However, they represent a more
“human readable” form of laws and government
regulations.
- Perform a multi-level analysis of the texts, e.g., at the
sentence or paragraph levels. This can help
policymakers identify the exact portions of text that
require more attention.
- Consider other readability scores that examine the
vocabulary used in the texts and not only the length of
sentences and the complexity of words (imposed in this
work by the SMOG index formula).
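The paragraph-level analysis mentioned above can be illustrated with a minimal sketch. It is not the pipeline used in this paper: the function names (`count_syllables`, `smog_grade`, `flag_paragraphs`) are hypothetical, syllables are estimated with a crude vowel-group heuristic, and the constants are those of the commonly cited SMOG formula, normalised to 30 sentences. SMOG was designed for samples of about 30 sentences, so scores on short paragraphs should be read as rough indicators only.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (assumption): count groups of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def smog_grade(text: str) -> float:
    """SMOG grade of a text, normalised to a 30-sentence sample."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    words = re.findall(r"[A-Za-z]+", text)
    # Polysyllables are words with three or more (estimated) syllables.
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return 1.0430 * math.sqrt(polysyllables * 30 / len(sentences)) + 3.1291


def flag_paragraphs(document: str, target_grade: float):
    """Return (paragraph index, grade) for paragraphs scoring above the target."""
    paragraphs = [p for p in re.split(r"\n\s*\n", document) if p.strip()]
    flagged = []
    for index, paragraph in enumerate(paragraphs):
        grade = smog_grade(paragraph)
        if grade > target_grade:
            flagged.append((index, round(grade, 1)))
    return flagged


if __name__ == "__main__":
    sample = (
        "You can use the site. We keep it safe.\n\n"
        "The licensee irrevocably indemnifies the proprietor against liabilities."
    )
    # A target of 10 mirrors the upper secondary equivalent used in this paper.
    for index, grade in flag_paragraphs(sample, target_grade=10):
        print(f"Paragraph {index} needs attention (estimated SMOG grade {grade})")
```

Splitting on blank lines and sentence-ending punctuation is a simplification; a production pipeline would likely rely on a proper sentence tokenizer and a pronunciation dictionary for syllable counts.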
ACKNOWLEDGMENTS
This work has received funding from the European Union's
Horizon 2020 research and innovation programme as part of the
AFEL (Analytics for Everyday Learning) project under grant
agreement No 687916.
REFERENCES
[1] Federal Trade Commission, “Protecting Consumer Privacy in an Era of
Rapid Change: A proposed framework for businesses and policymakers,”
2010.
[2] Plain Language Action and Information Network, “Federal Plain
Language Guidelines,” 2011.
[3] R. Flesch, “A new readability yardstick,” J. Appl. Psychol., 1948.
[4] R. Gunning, “The Fog Index After Twenty Years,” J. Bus. Commun.,
1968.
[5] H. G. McLaughlin, “SMOG grading - a new readability formula,” J.
Read., pp. 639–646, May 1969.
[6] C. Scott and J. Suchan, “Public sector collective bargaining agreements:
How readable are they?,” Public Pers. Manage., vol. 16, no. 1, pp. 15–22,
1987.
[7] M. McDonald, “Lawyers vs. language: Briefs are criminal,” The News &
Observer, Raleigh, NC, May 1992.
[8] P. D. K. L. Bhatia, Textbook on Legal Language and Legal Writing.
Universal Law Publishing, 2010.
[9] N. Fairclough, Language and Power, 3rd ed. New York: Pearson
Education, 2001.
[10] J. E. Howe and M. S. Wogalter, “The Understandability of Legal
Documents: Are they Adequate?,” Proc. Hum. Factors Ergon. Soc. Annu.
Meet., vol. 38, no. 8, pp. 438–442, 1994.
[11] A. E. Waldman, “A Statistical Analysis of Privacy Policy Design,” Notre
Dame Law Rev. Online, vol. 93, 2017.
[12] T. M. Duffy, “Readability Formulas: What’s the Use?,”
in Designing Usable Texts, ch. 6, 1985, pp. 113–143.
[13] D. D. Dyson and K. Schellenberg, “Access to Justice: The Readability of
Legal Services Corporation Legal Aid Internet Services,” J. Poverty, vol.
21, no. 2, pp. 142–165, 2017.
[14] A. Kanapala, S. Pal, and R. Pamula, “Text summarization from legal
documents: a survey,” Artif. Intell. Rev., pp. 1–32, 2017.
Figure 7: Education vs. Readability Attainment scores of Terms and Conditions for India