The Portable Document Format: An Analysis
of PDF Accessibility
Patricia Acosta-Vargas1,2(✉), Mario Gonzalez1,
Maria Rosa Zambrano1,3, Ana Medina1, Noah Zweig1, and
Luis Salvador-Ullauri2
1 Universidad de Las Américas, Vía a Nayón, Quito, Ecuador
{patricia.acosta,mario.gonzalez.rodriguez,
maria.zambrano.torres,anagabriela.medina, noah.zweig}@udla.edu.ec
2 Universidad de Alicante, San Vicente del Raspeig, Alicante, Spain
lasu1@alu.ua.es
3 Universidad Politécnica de Madrid, Madrid, Spain
Abstract. Today, PDFs are frequently used to preserve historical documents in libraries,
and they are also one of the most widely used formats for sharing information on the web.
Unfortunately, most shared documents are not accessible, especially for users with
disabilities. To address this problem, we propose to relate accessibility techniques for
PDF documents to the Web Content Accessibility Guidelines (WCAG) 2.1. As a case study, we
selected a random sample of 10 documents related to the modern architectural heritage of
Quito. The authors applied a combined method to check accessibility in PDFs with the help
of the PDF Accessibility Checker version 3.0. The results revealed that the accessibility
barriers repeated in most documents are related to the content and the natural language of
the analyzed PDFs. The analysis applied in this investigation can contribute to future
work on generating more inclusive PDF documents.
Keywords: Accessibility · Digital documents · Portable document · PDF techniques · WCAG 2.1
1 Introduction
Presently, the Portable Document Format (PDF) is an essential means of distributing
information. PDF documents are increasingly used to preserve historical documents in
libraries and are often shared on the web. Nevertheless, not all PDFs offer universal
access. To address this problem, we apply the PDF techniques of the Web Content
Accessibility Guidelines (WCAG) 2.1 [1]. As a case study, we take a random sample of 10
PDF documents that refer to the modern architectural heritage of Quito and are stored in
digital form. To evaluate the documents, we use the PDF Accessibility Checker version 3.0,
which showed that libraries had not been concerned about providing documents that meet
minimum accessibility standards. PDF became the first digital format used to distribute
documentation on the Internet; PDF files can combine various kinds of content, such as
text, images, videos, and forms.
© The Editor(s) (if applicable) and The Author(s), under exclusive license to
Springer Nature Switzerland AG 2020
I. L. Nunes (Ed.): AHFE 2020, AISC 1207, pp. 1–9, 2020.
https://doi.org/10.1007/978-3-030-51369-6_28
The rest of the article is structured as follows: Sect. 2 presents the background, Sect. 3
describes the methodology and the case study, Sect. 4 presents the results and the
discussion, and Sect. 5 gives our conclusions and proposes future work.
2 Background and Related Work
Accessibility refers to how easily users can communicate, interact, and navigate the web.
To improve the level of accessibility, the Web Content Accessibility Guidelines 2.1 (WCAG
2.1) propose 4 principles of accessibility, 13 guidelines, and 78 success criteria,
together with sufficient and advisory techniques. The four principles of web accessibility
are 1) perceivable, 2) operable, 3) understandable, and 4) robust [1].
Uebelbacher et al. [2] present the PDF Accessibility Checker 2.0, a tool that
automatically tests the 108 test conditions that can be checked fully automatically. The
tool promotes PDF accessibility among a broad group of users and has the potential to
increase the compliance of PDF documents with the respective accessibility standard.
Furthermore, Ahmetovic et al. [3] argue that accessing mathematical formulas inside
digital documents is a challenge for blind people; specifically, document formats designed
for printing, such as PDF, structure mathematical content for visual access only. While
there are accessibility features for presenting PDF content nonvisually, formula support
is limited to alternative text that can be read by a screen reader or shown on a braille
display. Moreover, adding alternative text is left to document creators, who seldom
provide it. Besides, at best, only descriptions of formulas are supplied, which makes it
almost impossible to convey a detailed understanding of a complex formula.
Devine et al. [4] suggest that, for documents to be accessible, navigation aids such as
bookmarks can be included, which are particularly useful for longer documents. The key to
creating accessible PDF documents is to design the source document with accessibility in
mind; they suggest applying the standard ISO 32000-1:2008.
In their previous studies, Acosta-Vargas et al. [5, 6] argue that, for PDF documents to be
universally accessible, the Web Content Accessibility Guidelines (WCAG) 2.0 must be
applied. The authors took as a case study the repositories of the Latin American
universities ranked highest by Webometrics. Their assessment of the PDFs showed that
universities have not been concerned with creating accessible documents.
WCAG 2.1 provides 23 PDF-specific techniques for making a PDF accessible [7]; Table 1
summarizes the success criteria associated with these techniques. With the techniques
recommended by WCAG 2.1, it is possible to examine the reading order of the tags, that is,
how the document is read aloud.

Table 1. Summary of the success criteria associated with PDF techniques [7].

| Success criterion | Level | PDF and general techniques |
|---|---|---|
| 1.1.1 Non-text Content | A | PDF1, PDF4 |
| 1.2.1 Audio-only and Video-only (Prerecorded) | A | General techniques |
| 1.2.2 Captions (Prerecorded) | A | General techniques |
| 1.2.3 Audio Description or Media Alternative (Prerecorded) | A | General techniques |
| 1.2.4 Captions (Live) | AA | General techniques |
| 1.2.5 Audio Description (Prerecorded) | AA | General techniques |
| 1.3.1 Info and Relationships | A | PDF6, PDF9, PDF10, PDF11, PDF12, PDF17, PDF20, PDF21 [7] |
| 1.3.2 Meaningful Sequence | A | PDF3 [7] |
| 1.3.3 Sensory Characteristics | A | General techniques |
| 1.4.1 Use of Color | A | General techniques |
| 1.4.2 Audio Control | A | General techniques |
| 1.4.3 Contrast (Minimum) | AA | General techniques |
| 1.4.4 Resize Text | AA | G142 [7] |
| 1.4.5 Images of Text | AA | PDF7, General techniques |
| 1.4.9 Images of Text (No Exception) | AAA | PDF7 |
| 2.1.1 Keyboard | A | PDF3, PDF11, PDF23 |
| 2.1.2 No Keyboard Trap | A | G21 |
| 2.1.3 Keyboard (No Exception) | AAA | PDF3, PDF11, PDF23 |
| 2.2.1 Timing Adjustable | A | PDF3, G133 |
| 2.2.2 Pause, Stop, Hide | A | General techniques |
| 2.3.1 Three Flashes or Below Threshold | A | General techniques |
| 2.4.1 Bypass Blocks | A | PDF9, General techniques |
| 2.4.2 Page Titled | A | PDF18 |
| 2.4.3 Focus Order | A | PDF3 |
| 2.4.4 Link Purpose (In Context) | A | PDF11, PDF13 |
| 2.4.5 Multiple Ways | AA | PDF2, General techniques |
| 2.4.6 Headings and Labels | AA | General techniques |
| 2.4.7 Focus Visible | AA | G149, G165, G195 |
| 2.4.8 Location | AAA | PDF14, PDF17 |
| 2.4.9 Link Purpose (Link Only) | AAA | PDF11, PDF13 |
| 3.1.1 Language of Page | A | PDF16, PDF19 [7] |
| 3.1.2 Language of Parts | AA | PDF19 [7] |
| 3.1.4 Abbreviations | AAA | PDF8 |
| 3.2.1 On Focus | A | General techniques |
| 3.2.2 On Input | A | PDF15 [7] |
| 3.2.3 Consistent Navigation | AA | PDF14, PDF17, G61 [7] |
| 3.2.4 Consistent Identification | AA | General techniques |
| 3.3.1 Error Identification | A | PDF5, PDF22 [7] |
| 3.3.2 Labels or Instructions | A | PDF5, PDF10 [7] |
| 3.3.3 Error Suggestion | AA | PDF5, PDF22 [7] |
| 3.3.4 Error Prevention (Legal, Financial, Data) | AA | General techniques |
| 4.1.1 Parsing | A | Not applicable to PDF |
| 4.1.2 Name, Role, Value | A | PDF10, PDF12 [7] |

To review accessibility in PDFs, several validators are available that assess the
accessibility of PDFs against WCAG 2.0 and the PDF/UA standard.
An additional tool is PDF Accessibility Checker 3.0, which is free and validates metadata,
tagging, security settings, bookmarks, reading order, and text contrast. This
investigation applied PDF Accessibility Checker 3.0 (see footnote 1) because it validates
PDFs against PDF/UA-1 (ISO 14289-1, which builds on ISO 32000-1) [8] and WCAG 2.1 [4],
offers a quick way to test the accessibility of PDFs, and supports both experts and end
users who perform accessibility evaluations.
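The mapping in Table 1 can be read as a lookup table. The following minimal sketch, which is not part of the original study, encodes a few rows of Table 1 and answers two questions: which success criteria rely on a given PDF technique, and which techniques are involved up to a given conformance level. The names `TABLE_1`, `criteria_for`, and `techniques_up_to` are hypothetical and chosen only for illustration.

```python
# Excerpt of Table 1: success criterion -> (conformance level, PDF techniques).
# Only a few rows are copied here; extend the dict to cover the full table.
TABLE_1 = {
    "1.1.1": ("A", ["PDF1", "PDF4"]),
    "1.3.2": ("A", ["PDF3"]),
    "1.4.9": ("AAA", ["PDF7"]),
    "2.1.1": ("A", ["PDF3", "PDF11", "PDF23"]),
    "2.4.2": ("A", ["PDF18"]),
    "2.4.3": ("A", ["PDF3"]),
    "3.1.1": ("A", ["PDF16", "PDF19"]),
    "3.1.2": ("AA", ["PDF19"]),
}

LEVEL_RANK = {"A": 1, "AA": 2, "AAA": 3}


def criteria_for(technique):
    """Return the success criteria (sorted as strings) that list the technique."""
    return sorted(sc for sc, (_, techs) in TABLE_1.items() if technique in techs)


def techniques_up_to(level):
    """Return the PDF techniques listed for criteria at or below `level`."""
    wanted = {
        tech
        for _, (sc_level, techs) in TABLE_1.items()
        if LEVEL_RANK[sc_level] <= LEVEL_RANK[level]
        for tech in techs
    }
    # Sort by technique number so that PDF3 comes before PDF11.
    return sorted(wanted, key=lambda tech: int(tech[3:]))


if __name__ == "__main__":
    print(criteria_for("PDF3"))
    print(techniques_up_to("AA"))
```

With the excerpt above, technique PDF3 is shared by three level-A criteria, which illustrates why a single tagging problem can surface under several success criteria at once.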
3 Method and Case Study
The case study uses a random sample of 10 PDF documents related to the modern
architectural heritage of Quito; Table 2 lists the documents evaluated.
Table 2. PDF documents used in accessibility evaluation.

| Id | File | Size (kB) | Title | Language | Tags | Pages |
|---|---|---|---|---|---|---|
| A | prueba_1.pdf | 453 | no title | no language | no tags | 23 |
| B | prueba_2.pdf | 3158 | no title | no language | no tags | 23 |
| C | prueba_3.pdf | 2795 | no title | no language | no tags | 23 |
| D | prueba_4.pdf | 4981 | no title | no language | no tags | 23 |
| E | prueba_5.pdf | 2137 | no title | no language | no tags | 23 |
| F | prueba_6.pdf | 355 | no title | es-ES | 525 | 12 |
| G | prueba_7.pdf | 16670 | no title | no language | no tags | 92 |
| H | prueba_8.pdf | 671 | no title | no language | no tags | 8 |
| I | prueba_9.pdf | 11328 | Yes | es-ES | 5519 | 130 |
| J | prueba_10.pdf | 2910 | no title | es-ES | 50 | 16 |

1 https://www.access-for-all.ch/en/pdf-lab/pdf-accessibility-checker-pac.html
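The three properties reported in Table 2 (title, language, tags) correspond to basic document-level requirements; in Table 1, the title relates to technique PDF18 and the default language to PDF16. As a minimal sketch, assuming the Table 2 values have already been read from the checker's report, the following code flags which properties each document lacks. The `PdfInfo` class and `missing_properties` function are hypothetical names introduced for illustration, and only three rows of Table 2 are encoded.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PdfInfo:
    """One row of Table 2, reduced to the three document-level properties."""
    doc_id: str
    has_title: bool            # False when the report says "no title"
    language: Optional[str]    # None when the report says "no language"
    tags: Optional[int]        # None when the report says "no tags"


def missing_properties(doc: PdfInfo) -> List[str]:
    """List the document-level properties that the PDF does not provide."""
    missing = []
    if not doc.has_title:
        missing.append("title")      # related technique in Table 1: PDF18
    if not doc.language:
        missing.append("language")   # related technique in Table 1: PDF16
    if not doc.tags:
        missing.append("tags")       # untagged PDFs expose no structure
    return missing


# Three rows copied from Table 2.
DOC_A = PdfInfo("A", has_title=False, language=None, tags=None)
DOC_F = PdfInfo("F", has_title=False, language="es-ES", tags=525)
DOC_I = PdfInfo("I", has_title=True, language="es-ES", tags=5519)

if __name__ == "__main__":
    for doc in (DOC_A, DOC_F, DOC_I):
        print(doc.doc_id, missing_properties(doc))
```

Note that an empty result for document I only means these three properties are present; as the later tables show, a tagged document can still fail many detailed checks.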
The method applied to evaluate accessibility in PDFs comprises five phases, as
presented in Fig. 1.
Fig. 1. Method to assess accessibility in PDFs.
Phase 1: Select the random sample of PDF documents. In this phase, we randomly selected
ten PDF documents that contain information related to the modern architectural heritage of
Quito; the evaluated documents are detailed in Table 2.
Phase 2: Review with PDF Accessibility Checker. We reviewed each document with PDF
Accessibility Checker 3.0, version 3.0.7.0. The tests performed are available in a data set
located in the Mendeley repository (see footnote 2).
Phase 3: Record the results. In Table 3, we record the evaluation data; the tests are
available in the Mendeley repository for reproducing the experiment. Table 3 contains the
number of barriers found in each evaluated PDF document, broken down by error category.
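Table 3. PDF documents failed.

| PDF (failed) | A | B | C | D | E | F | G | H | I | J | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Embedded files | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Metadata | 4 | 4 | 4 | 4 | 4 | 6 | 0 | 4 | 4 | 0 | 34 |
| Document settings | 4 | 4 | 4 | 4 | 4 | 2 | 2 | 2 | 14 | 4 | 44 |
| Fonts | 0 | 0 | 0 | 0 | 0 | 6 | 0 | 24 | 32 | 0 | 62 |
| Structure elements | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 332 | 4 | 336 |
| PDF syntax | 22 | 0 | 22 | 22 | 0 | 0 | 186 | 18 | 5791 | 86 | 6147 |
| Structure tree | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9998 | 4 | 10002 |
| Role mapping | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10752 | 196 | 10948 |
| Alternative descriptions | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 21608 | 198 | 21806 |
| Natural language | 916 | 10724 | 920 | 926 | 10296 | 0 | 0 | 0 | 109872 | 36 | 133690 |
| Content | 918 | 11368 | 922 | 928 | 10770 | 0 | 0 | 4692 | 246230 | 138 | 275966 |

2 https://data.mendeley.com/datasets/83n9xvgfcr/2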
Phase 4: Analyze the results. In this phase, we analyze the outcomes for the PDFs; Fig. 2
presents a summary of the analyzed PDF documents. It shows the parameters that fail and
therefore represent an accessibility barrier for users; the largest number of failures
corresponds to Content, followed by Natural language and Alternative descriptions.
Fig. 2. Parameters of failed PDF documents.
Table 4 shows, for each parameter, the number of checks that passed the accessibility
verification; no checks were recorded for Embedded files, and Metadata has the
next-lowest count.
Table 4. PDF documents passed.

| PDF (passed) | A | B | C | D | E | F | G | H | I | J | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Embedded files | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Metadata | 2 | 2 | 2 | 2 | 2 | 0 | 0 | 4 | 4 | 0 | 18 |
| Document settings | 2 | 2 | 2 | 2 | 2 | 4 | 2 | 2 | 14 | 4 | 36 |
| Structure elements | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 332 | 4 | 340 |
| Fonts | 0 | 380 | 0 | 0 | 0 | 14 | 0 | 24 | 32 | 0 | 450 |
| PDF syntax | 26 | 48 | 26 | 26 | 26 | 553 | 186 | 18 | 5791 | 86 | 6786 |
| Structure tree | 0 | 0 | 0 | 0 | 0 | 1048 | 0 | 0 | 9998 | 4 | 11050 |
| Role mapping | 0 | 0 | 0 | 0 | 0 | 1146 | 0 | 0 | 10752 | 196 | 12094 |
| Alternative descriptions | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 21608 | 198 | 21806 |
| Natural language | 0 | 0 | 0 | 0 | 0 | 24524 | 0 | 0 | 109872 | 36 | 134432 |
| Content | 916 | 10724 | 920 | 926 | 926 | 49768 | 0 | 4692 | 246230 | 138 | 315240 |
Phase 5: Suggest improvements. To ensure that PDF documents achieve an acceptable
degree of accessibility, we suggest the following: 1) apply the same criteria as on the
web, that is, provide alternative text only for images that are not decorative; 2) create
the PDF so that bookmarks are generated automatically, which requires a well-structured
source document; 3) tag tables correctly with the tags Table, TR, TH, and TD; 4) define
the links before tagging the document; and 5) include relevant information in headers and
footers consistently throughout the entire document.
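Suggestion 3 can be illustrated with a small structural check. The sketch below is not taken from the study or from PDF Accessibility Checker; it assumes a simplified model in which a structure element is a `(tag, children)` tuple, a table contains only rows (TR), and a row contains only header (TH) or data (TD) cells. Real tagged PDFs allow further grouping elements such as THead and TBody, which this sketch deliberately ignores. The names `ALLOWED_CHILDREN` and `table_tag_errors` are hypothetical.

```python
# Simplified nesting rules (an assumption for this sketch, not the full PDF model):
# a Table holds only TR rows, and a TR holds only TH or TD cells.
ALLOWED_CHILDREN = {
    "Table": {"TR"},
    "TR": {"TH", "TD"},
}


def table_tag_errors(node, path=""):
    """Walk a (tag, children) tree and report children that break the rules."""
    tag, children = node
    here = f"{path}/{tag}"
    allowed = ALLOWED_CHILDREN.get(tag)  # None means "no rule for this tag"
    errors = []
    for child in children:
        child_tag = child[0]
        if allowed is not None and child_tag not in allowed:
            errors.append(f"{here}: unexpected child {child_tag}")
        # Recurse so that nested problems are also reported.
        errors.extend(table_tag_errors(child, here))
    return errors


# A correctly tagged two-row table: one header row and one data row.
GOOD_TABLE = ("Table", [
    ("TR", [("TH", []), ("TH", [])]),
    ("TR", [("TD", []), ("TD", [])]),
])

# A badly tagged table: a cell directly under Table and a paragraph inside a row.
BAD_TABLE = ("Table", [
    ("TD", []),
    ("TR", [("P", [])]),
])

if __name__ == "__main__":
    print(table_tag_errors(GOOD_TABLE))
    print(table_tag_errors(BAD_TABLE))
```

A check of this kind only verifies nesting; whether a TH cell really is a header for its column still has to be confirmed by a human reviewer.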
4 Results and Discussion
In Fig. 2, we observe that the PDF documents are not compliant with PDF/UA; 60% of the
failures relate to Content, 29% to Natural language, and 5% to Alternative descriptions.
Natural language errors are among the most frequent; they occur when the language of a
document's content cannot be identified, which is why speech synthesizers and braille
devices cannot automatically switch to a new language. We also suggest considering the
requirements that make multimedia and image resources accessible to the largest number of
users and therefore recommend reviewing the study in [9]. Finally, we suggest considering
the application of heuristic methods [10] related to web accessibility and to the type of
disability of end users.
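The percentages quoted above can be reproduced from the Total column of Table 3. The following minimal sketch, with the hypothetical names `FAILED_TOTALS` and `failure_shares`, computes each parameter's share of all recorded failures; it assumes that the percentages are shares of the overall failure count, which is consistent with the published figures.

```python
# "Total" column of Table 3 (failed checks per parameter).
FAILED_TOTALS = {
    "Embedded files": 0,
    "Metadata": 34,
    "Document settings": 44,
    "Fonts": 62,
    "Structure elements": 336,
    "PDF syntax": 6147,
    "Structure tree": 10002,
    "Role mapping": 10948,
    "Alternative descriptions": 21806,
    "Natural language": 133690,
    "Content": 275966,
}


def failure_shares(totals):
    """Return each parameter's share of all failures, as a rounded percentage."""
    grand_total = sum(totals.values())
    return {name: round(100 * count / grand_total) for name, count in totals.items()}


if __name__ == "__main__":
    shares = failure_shares(FAILED_TOTALS)
    for name in ("Content", "Natural language", "Alternative descriptions"):
        print(f"{name}: {shares[name]}%")
```

The three parameters named in the text together account for roughly 94% of all failures, so fixing content tagging, language declaration, and alternative text addresses most of the reported barriers.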
In Fig. 3, we observe that the documents with the largest number of failures are those
with identifiers B, E, and I.
Fig. 3. Parameters of failed PDF documents.
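The ranking of documents by number of failures follows directly from the columns of Table 3. As a minimal sketch, with the hypothetical names `FAILED`, `failures_per_document`, and `worst_documents`, the code below sums each document's column and returns the identifiers with the most failures.

```python
DOCS = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]

# Rows of Table 3: failed checks per parameter, in the order of DOCS.
FAILED = {
    "Embedded files": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "Metadata": [4, 4, 4, 4, 4, 6, 0, 4, 4, 0],
    "Document settings": [4, 4, 4, 4, 4, 2, 2, 2, 14, 4],
    "Fonts": [0, 0, 0, 0, 0, 6, 0, 24, 32, 0],
    "Structure elements": [0, 0, 0, 0, 0, 0, 0, 0, 332, 4],
    "PDF syntax": [22, 0, 22, 22, 0, 0, 186, 18, 5791, 86],
    "Structure tree": [0, 0, 0, 0, 0, 0, 0, 0, 9998, 4],
    "Role mapping": [0, 0, 0, 0, 0, 0, 0, 0, 10752, 196],
    "Alternative descriptions": [0, 0, 0, 0, 0, 0, 0, 0, 21608, 198],
    "Natural language": [916, 10724, 920, 926, 10296, 0, 0, 0, 109872, 36],
    "Content": [918, 11368, 922, 928, 10770, 0, 0, 4692, 246230, 138],
}


def failures_per_document():
    """Sum every parameter's failures for each document (column sums of Table 3)."""
    return {
        doc: sum(row[index] for row in FAILED.values())
        for index, doc in enumerate(DOCS)
    }


def worst_documents(count):
    """Return the identifiers of the `count` documents with the most failures."""
    totals = failures_per_document()
    return sorted(totals, key=totals.get, reverse=True)[:count]


if __name__ == "__main__":
    print(failures_per_document())
    print(worst_documents(3))
```

Document I dominates the totals because it is the only long, tagged document in the sample, so the checker can inspect far more structure elements in it than in the untagged files.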
Figure 4 presents a summary of the documents analyzed with PDF
Accessibility Checker 3.0; the most common errors are related to Content and
Natural language.
Fig. 4. Detail of the documents analyzed.
5 Conclusions and Future Works
This study recommends creating accessible PDFs by applying the PDF techniques of WCAG 2.1.
To generate more inclusive documents, we propose using PDF Accessibility Checker 3.0,
version 3.0.7.0. The study can serve as a starting point for future work on producing
more accessible PDFs. In addition, we suggest conducting accessibility tests and
correcting errors in PDF documents before sharing them in digital repositories.
Furthermore, we suggest applying PDF accessibility tools in the design of architectural
plans, which would foster innovation in this area and improve access for a large number of
users with disabilities. Finally, we recommend that libraries improve access to digital
documents so that they can raise accessibility from an international communication
viewpoint by applying the criteria of WCAG 2.1.
Acknowledgments. The researchers thank Universidad de Las Américas - Ecuador, for
funding this study through projects UDLA FGE.PAV.19.11, and ARQ.AMG.1802.
References
1. World Wide Web Consortium: Web Content Accessibility Guidelines (WCAG) 2.1.
https://www.w3.org/TR/WCAG21/
2. Uebelbacher, A., Bianchetti, R., Riesch, M.: PDF accessibility checker (PAC 2): the
first tool to test PDF documents for PDF/UA compliance. In: International
Conference on Computers for Handicapped Persons, pp. 197–201. Springer, Cham
(2014)
3. Ahmetovic, D., Armano, T., Bernareggi, C., Berra, M., Capietto, A., Coriasco, S.,
Murru, N., Ruighi, A., Taranto, E.: Axessibility: a LaTeX package for mathematical
formulae accessibility in PDF documents. In: Proceedings of the 20th International
ACM SIGACCESS Conference on Computers and Accessibility, pp. 352–354.
Association for Computing Machinery, New York, NY, USA (2018)
4. Devine, H., Gonzalez, A., Hardy, M.: Making accessible PDF documents. In:
Proceedings of the 11th ACM Symposium on Document Engineering, pp. 275–276.
Association for Computing Machinery, New York (2011)
5. Acosta-Vargas, P., Luján-Mora, S., Acosta, T.: Accessibility of portable document
format in education repositories. In: ACM International Conference Proceeding
Series, pp. 239–242 (2017)
6. Acosta-Vargas, P., Luján-Mora, S., Acosta, T., Salvador, L.: Accesibilidad de
documentos PDF en repositorios educativos de Latinoamérica. In: Congreso
Internacional sobre Aplicación de Tecnologías de la Información y Comunicaciones
Avanzadas, pp. 239–246 (2017)
7. World Wide Web Consortium (W3C): Techniques for WCAG 2.1.
https://www.w3.org/WAI/WCAG21/Techniques/
8. ISO: Document management applications - Electronic document file format
enhancement for accessibility - Part 1: Use of ISO 32000-1 (PDF/UA-1).
https://www.iso.org/standard/54564.html
9. Acosta-Vargas, P., Esparza, W., Rybarczyk, Y., González, M., Villarreal, S., Jadán,
J., Guevara, C., Sanchez-Gordon, S., Calle-Jimenez, T., Baldeon, J.: Educational
resources accessible on the tele-rehabilitation platform. In: International Conference
on Applied Human Factors and Ergonomics, pp. 210–220. Springer (2018)
10. Acosta-Vargas, P., Salvador-Ullauri, L., Luján-Mora, S.: A heuristic method to
evaluate web accessibility for users with low vision. IEEE Access 7, 125634–125648
(2019). https://doi.org/10.1109/ACCESS.2019.2939068