Copyright © 2007 Inderscience Enterprises Ltd.
Interdependence between technical web accessibility
and usability: its influence on web quality models
Myriam Arrue*, Inmaculada Fajardo,
Juan Miguel López and Markel Vigo
Laboratory of Human-Computer Interaction for Special Needs
University of the Basque Country (UPV/EHU)
Manuel Lardizabal 1, E-20018 Donostia, Spain
E-mail: myriam@si.ehu.es
E-mail: acbfabri@si.ehu.es
E-mail: juanmi@si.ehu.es
E-mail: markel@si.ehu.es
*Corresponding author
Abstract: Quality assurance is one of the most significant issues web
engineering has to deal with. It is crucial to define adequate quality models in
order to determine the most significant properties for quality evaluation of
websites. Usability should be one of those properties as it bears on efficiency,
effectiveness and user satisfaction. Moreover, it has been demonstrated to be a
key property for users in recent European Commission’s benchmark studies on
quality and usage of public e-services. Web accessibility is also a significant
property that websites have to fulfil in order to ensure equal access for all
people in the Information Society. Therefore, both accessibility and usability
should be included in such quality models. This paper presents an empirical
study carried out with the aim of analysing the relationship between these two
properties. The results show that the provision of technical web accessibility
does not intrinsically guarantee web usability. Actually, other non-accessibility
metrics, such as the number of images, links and words on the home page, were
better predictors of users’ effectiveness, efficiency and satisfaction.
Keywords: accessibility; usability; quality models; standards; human-computer
interaction; empirical studies.
Reference to this paper should be made as follows: Arrue, M., Fajardo, I.,
López, J.M. and Vigo, M. (2007) ‘Interdependence between technical web
accessibility and usability: its influence on web quality models’, Int. J. Web
Engineering and Technology, Vol. 3, No. 3, pp.307–328.
Biographical notes: Myriam Arrue is an Engineer in Informatics (University
of the Basque Country, 1999). After graduating, she joined industry to
work on web application development. In 2001, she moved to the Laboratory
of Human-Computer Interaction for Special Needs (UPV-EHU) to work on the
IRIS European Project on tools for automatic web accessibility evaluation.
Currently, she is a Lecturer in the Computer Architecture and Technology
Department (UPV-EHU) and is working on her PhD on Web Accessibility
Design Methods and Techniques.
Inmaculada Fajardo is an Experimental Psychologist (University of
Granada-UGR, 1999). After obtaining her degree, she joined the Cognitive
Ergonomic Group to research diverse topics such as problem solving,
mental workload and hypertext interaction. In 2002, she moved to the
Laboratory of Human-Computer Interaction for Special Needs (UPV-EHU) to
work on the CogniWeb project on Cognitive Accessibility of Deaf People to
the web. She received her PhD in Experimental Psychology in 2005. Currently,
she is a Lecturer in the Experimental Psychology Department (UGR).
Juan Miguel López is an Engineer in Informatics (University of the Basque
Country, 2002). He joined the Laboratory of Human-Computer Interaction for
Special Needs (UPV-EHU) in 2003 and is collaborating in research projects
related to web accessibility and emotional interface evaluation. He is working
on his PhD focusing on the development of methodologies for evaluating
interfaces by user testing.
Markel Vigo is an Engineer in Informatics (University of the Basque Country,
2004). After his graduation, he joined the Laboratory of Human-Computer
Interaction for Special Needs (UPV-EHU) to collaborate in research projects on
web accessibility. He is currently working on his PhD on Methods and Tools
for Web Accessibility Measuring, Monitoring and Maintenance.
1 Introduction
Even though software engineering has been a productive research area over the last
decades, web engineering is a relatively new one. It covers the specific methods,
technologies and models required for web application development. Such specific
approaches are necessary because web applications differ from traditional software
applications in a wide range of aspects, such as the timeframe assigned for their
development and the characteristics of their end-users in terms of age, education and
web navigational experience (Mendes et al., 2006).
An important issue that web engineering must deal with is the quality assurance of
both the final product and the processes. Quality assurance is essential in the current
situation, where web applications are becoming the primary means of communication for
activities such as business, leisure and learning, causing the total number of existing
websites to grow at a dizzying rate. In fact, web designers have to design new websites
or update existing ones in very short periods of time, which negatively affects the quality
of the final product. Therefore, metrics, models and methods for quality assurance have
to be rigorously applied in the web application development process in order to
overcome this situation.
It is necessary to determine the properties that are important for evaluating the quality
level of websites, such as usability, performance and visibility. In addition, it is crucial
to specify the metrics and methods for determining the fulfilment of each of those
properties. As with software systems (Fenton and Pfleeger, 1997), all these parameters
should be specified in a quality model. The definition of adequate quality models makes
more accurate quality assurance processes possible.
Defining a quality model is a complex task since it requires specifying all the
measures and methods that are going to be applied to obtain the necessary data from web
applications. One general method has been proposed and extensively applied for this
purpose: the GQM (Goal, Question, Metric) approach by Basili and Weiss (1984).
This approach determines specific steps in order to define which data will be acquired
and which measurement methods will be used to obtain the necessary values for
evaluating the quality level of the systems. In recent years, specific models for web
applications quality evaluation have been proposed such as the 2QCV3Q model (Mich
et al., 2003) and WebQEM (Olsina and Rossi, 2002). The specific characteristics of
web applications and metrics for their measurement are taken into account in these
quality models. Moreover, several ISO standards dealing with web quality have
been developed. They cover different aspects of quality: quality of the software
product, quality of the software process and quality evaluation process. The ISO 9126
(ISO/IEC 9126-1, 2001) defines a quality model for software product quality, which
includes internal quality, external quality and quality in use. This standard defines six
software product quality characteristics: functionality, reliability, efficiency, usability,
maintainability and portability.
The previously described ISO standard states that the internal and external quality of a
product are not sufficient to assure quality in use. Quality in use has to be evaluated
dynamically in terms of the results of user interaction, since it refers to the extent to
which the application meets specific users’ needs in a specific context of use. Therefore,
quality in use is the final user’s view of the quality level, which is similar to the usability
definition in ISO 9241-11 (1998). Consequently, usability must be appropriately
measured, rated and assessed. This
attribute is especially relevant in the case of websites because they usually have more
information density than any other software product, so performing tasks can require
more effort.
According to Landauer (1995), 64% of the software applications maintenance costs
are due to usability problems. This situation highlights the importance of taking into
account usability issues in the development phases and the need for usability evaluation
methods. Moreover, recent European Commission benchmark studies on the quality and
usage of public e-services (Top of the Web, 2003; 2004) indicate that the most important
factor for users is ease of use.
Web accessibility is another significant issue web engineering has to handle. As
will be seen later, accessibility and usability are closely related, as they both deal
with satisfaction, effectiveness and efficiency. According to Thatcher et al. (2002),
accessibility can be understood as a subset of usability. In fact, the accessibility concept
is related to the absence of physical or cognitive barriers to using the functionalities
implemented in a website, such as navigation, information search, shopping, etc.
Therefore, both usability and accessibility have to be included in any quality model for
evaluating the quality level of web applications. Many automatic tools and methods have
been developed in order to evaluate accessibility of websites. Actually, several
accessibility errors can be automatically detected. The term ‘technical accessibility’
refers to the fulfilment of accessibility guidelines as they define the functional
requirements that a site has to accomplish in order to reach a determined accessibility
level. However, it is also recommended to perform evaluations with users in order to
achieve comprehensive accessibility evaluations of websites. Usability issues are more
difficult to detect, since they require expert analysis and user testing.
This paper presents an empirical study carried out with the aim of analysing the
relationship between two measures of web quality: usability and technical accessibility.
Our specific objective was to test whether there was a relationship between more
accessible websites (measured with the POUR metrics of Arrue et al., 2005) and more
usable websites for general users (in terms of target found, response time, disorientation
and satisfaction).
This paper is organised as follows: Sections 2 and 3 provide usability and
accessibility definitions and describe their assessment methods. Section 4 describes the
points of controversy around both concepts, and the last section presents the empirical
study conducted to clarify the relation between both variables. Finally, conclusions on
the results obtained in the empirical study are provided.
2 Web usability
A key aspect in the evaluation of all kind of web interfaces is the usability, which is
colloquially defined as the property that a definite system must have in order to make it
easy to use and learn. One of the most broadly accepted usability definition is the ISO
9241-11 (1998). It states that usability can measure to what degree a product can be used
by several users to achieve specific goals with effectiveness, efficiency and satisfaction in
particular environments. In this sense, traditional usability and web usability concepts
overlap in numerous ways. Preece et al. (1994) states that web usability comprises
learnability, throughput, flexibility and user’s attitude towards a given website.
Nielsen (1993) defines the usability engineering discipline as a software engineering
process that includes usability considerations in the software development process.
Applied to web environments, usability engineering techniques may help create more
usable websites by having developers focus on users’ needs. Engineering web usability
introduces some interesting problems and solutions of its own, such as structuring the
information space. Examples of discussions on web usability issues are available in
Sano (1996) and Shneiderman and Plaisant (2005).
Usability evaluation is the process by which systems and products are evaluated using
any methods available to the usability expert. Nielsen and Mack (1994) provide a review
of different usability evaluation methods. An overview of the most important methods
is listed below.
2.1 Inspection methods
Usability inspection methods involve usability inspectors examining the user interface.
Many inspection methods can be applied to early-stage prototypes, so they can be
performed early in the software life cycle. However, some methods also address issues
such as the overall usability of the final prototype. These methods serve as a source of
evaluative feedback on specific elements of the user interface.
Interdependence between technical web accessibility and usability 311
2.1.1 Heuristic evaluation
An evaluator checks the interface against sets of guidelines (i.e., heuristics).
According to Nielsen and Mack (1994), many usability problems can be identified using
this technique, and their significance can be assessed early in the design phase so that
such problems can be fixed.
2.1.2 Cognitive walkthrough
An expert evaluator uses a detailed procedure to simulate task execution and browse
all particular solution paths, examining each action and determining whether the
expected user’s goals and memory content would lead to choosing the correct option.
Cognitive walkthroughs can be performed at any stage of design using a prototype,
a conceptual design document or the final product. An example of a tool that implements
this technique for web environments is AutoCWW, developed by Kitajima et al. (2002).
It uses the Latent Semantic Analysis (LSA) technique (Deerwester et al., 1990) to
identify unusable elements within each web page, such as unfamiliar, confusable or
competing headings and links.
2.2 Testing methods
In these methods, tests are used to measure system performance according to
usability attributes, such as the previously mentioned heuristics. Typically, individual
users performing specific tasks with the system are observed and interaction data are
collected, for example, the number of errors made or the time required to fulfil the
given tasks.
2.2.1 Thinking aloud protocol
Thinking aloud tests consist of users interacting with the system while they continuously
vocalise their thoughts, feelings and opinions. By verbalising, test users enable
researchers to understand their points of view about the computer system, making it
easy to identify the users’ major misconceptions.
2.2.2 Performance measurement
This method focuses on measuring whether a usability goal is reached and how fast
it is reached. This is almost always done with groups of users performing pre-defined
sets of tasks while data related to errors and times are collected. According to
Dumas and Redish (1993), task scenarios can be useful in performance measurement
and take some of the artificiality out of the test. ISO 9241 in particular promotes
a usability evaluation approach based on the measured performance of pre-determined
usability metrics. An example of metrics and usability factors is shown in Table 1,
with a small computational sketch following the table. As stated by Nielsen (1993),
the drawbacks of this method are the amounts of time, economic resources and
laboratory expertise needed.
Table 1 Examples of measuring usability with ISO 9241, proposed by Avouris (2001)

Usability objective: Overall usability
Effectiveness measures: percentage of goals achieved; percentage of users successfully completing the task; average accuracy of completed tasks
Efficiency measures: time to complete a task; tasks completed per unit time; monetary cost of performing the task
Satisfaction measures: rating scale for satisfaction; frequency of discretionary use; frequency of complaints
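To make these measures concrete, the following minimal Python sketch computes two of the Table 1 measures from a hypothetical task log; the TaskRecord structure and its field names are our own illustrative assumptions, not part of ISO 9241.

from dataclasses import dataclass

@dataclass
class TaskRecord:
    user: str
    completed: bool   # did the user reach the target?
    seconds: float    # time spent on the task

def effectiveness(records):
    # Percentage of users successfully completing the task.
    return 100.0 * sum(r.completed for r in records) / len(records)

def efficiency(records):
    # Mean completion time, computed over successful attempts only.
    times = [r.seconds for r in records if r.completed]
    return sum(times) / len(times) if times else float("inf")

log = [TaskRecord("u1", True, 42.0), TaskRecord("u2", False, 60.0),
       TaskRecord("u3", True, 35.5)]
print(effectiveness(log))  # ~66.7: two of three users completed the task
print(efficiency(log))     # 38.75: mean time of the successful users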
2.2.3 Card sorting
This is a technique for discovering the latent structure in an unsorted list of statements
or ideas by exploring how people group items, in order to derive structures that
maximise the probability of users finding those items. Researchers write each statement
on small index cards or pieces of paper and ask a number of informants, working
on their own, to sort these cards into groups or clusters. The results of the individual
sorts are then combined and statistically analysed (a minimal sketch of this step follows
below). As one of the most challenging issues in website design is creating the
information architecture, this technique is particularly useful for defining website
structures. Examples of tools that implement the card sorting technique are WebCat1
and WebSort.2
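As an illustration of the combination step, the following Python sketch builds the pairwise co-occurrence counts that statistical analyses (e.g., hierarchical clustering) usually start from; the item names are invented for the example.

from itertools import combinations
from collections import Counter

def cooccurrence(sorts):
    # sorts: one list of groups per participant; each group is a set of items.
    # Counts how often each pair of items was placed in the same group.
    counts = Counter()
    for groups in sorts:
        for group in groups:
            for a, b in combinations(sorted(group), 2):
                counts[(a, b)] += 1
    return counts

sorts = [[{"news", "blog"}, {"cart", "checkout"}],
         [{"news", "blog", "cart"}, {"checkout"}]]
print(cooccurrence(sorts))  # ('blog', 'news') co-occurs twice; other pairs once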
2.3 Inquiry methods
Even though inquiry methods can be used to measure various usability attributes, their
most common usage relates to measurement of user satisfaction.
2.3.1 Questionnaires and interviews-based protocols
These techniques are useful for obtaining data by querying users about the system.
Users’ answers to the questions help evaluators determine which parts of the system
interface present difficulties for them. Many questionnaires serving various usability
evaluation objectives have been proposed, such as the ones by Shneiderman and
Plaisant (2005).
Several usability evaluation methods are commonly combined in order to obtain a
more comprehensive result in website usability analysis. A tool for the automation of
several of these methods is EWeb. Further information about this tool can be found in
López (2004).
3 Web accessibility
Designing accessible websites implies universal access to information sources for all
users, especially people with disabilities. The WWW has great potential to make life
easier for disabled people and make them less dependent on their relatives or friends,
since users can now perform tasks by themselves that they used to have difficulty
fulfilling (e.g., shopping, buying tickets, etc.). However, as most websites are not
accessible, these users are faced with design barriers that do not allow them to access
information.
Different initiatives have been taken in order to overcome this situation, including the
promulgation, in some countries, of laws against electronic exclusion, such as the
Americans with Disabilities Act (1990), Section 508 (1998), the eEurope Action Plan
(2005), etc. In general terms, these laws propose the adoption of web accessibility
guidelines when developing web content. As most web designers are not aware of the
specific needs of people with disabilities, these guidelines are helpful for developing
accessible web content. In this sense, the Web Accessibility Initiative (WAI)3 within the
World Wide Web Consortium (W3C)4 has developed many specifications, guidelines,
software and tools in order to achieve the design-for-all paradigm in the web
environment, actively promoting higher degrees of accessibility for people with
disabilities. Several components take part in web accessibility issues (see Figure 1).
Thus, different inclusive guidelines are proposed:
• Authoring Tool Accessibility Guidelines (ATAG)5 provide guidance for
software developers so that the content produced by authoring tools is
accessible. Authoring tools are software applications used by developers in order
to produce websites and web content. An example of an ATAG guideline:
‘generate standard markup’.
• User Agent Accessibility Guidelines (UAAG)6 suggest ways of developing
user agents accessible to all users. User agents are tools such as assistive
technologies, multimedia players and web browsers, which help users interact
with computers. An example of a UAAG guideline: ‘support input and output
independence’.
• Web Content Accessibility Guidelines (WCAG)7 are broadly accepted and used
recommendations for producing accessible web content. An example of a WCAG
guideline: ‘provide alternatives to auditory and visual content’.
Figure 1 The application context of web accessibility guidelines
These guidelines define specific testing techniques or checkpoints, which address
accessibility issues in a more concrete way. For example, WCAG 1.0 checkpoint 2.1
states: ‘ensure that all information conveyed with colour is also available without
colour’. Depending on the way a checkpoint impacts the accessibility of a web page,
each checkpoint has an assigned priority (1, 2 or 3, from more to less impact).
In addition, based on these priorities, three conformance levels are defined, as listed
below (a small computational sketch follows the list):
1 Conformance level A: all priority 1 checkpoints are satisfied.
2 Conformance level AA: all priority 1 and 2 checkpoints are satisfied.
3 Conformance level AAA: all priority 1, 2 and 3 checkpoints are satisfied.
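As a minimal sketch of this mapping, the following Python function derives the conformance level from the set of priorities that have at least one failed checkpoint; the function name and the input representation are our own.

def wcag_conformance(failed_priorities):
    # Map the priorities containing failed checkpoints to a WCAG 1.0
    # conformance level; None means not even level A is reached.
    if 1 in failed_priorities:
        return None
    if 2 in failed_priorities:
        return "A"    # all priority 1 checkpoints satisfied
    if 3 in failed_priorities:
        return "AA"   # priorities 1 and 2 satisfied
    return "AAA"      # every checkpoint satisfied

print(wcag_conformance({3}))     # 'AA'
print(wcag_conformance({2, 3}))  # 'A'
print(wcag_conformance(set()))   # 'AAA'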
Currently, new versions of these guidelines are being produced. The latest draft of the
second version of WCAG was released in April 2006 (http://www.w3.org/TR/WCAG20/)
and proposes a new guideline concept: it describes accessibility by stating the properties
an accessible website has to fulfil. As in the previous version of WCAG, each checkpoint
has one of three priority levels, and analogous conformance levels are defined.
According to this description, an accessible website should fulfil the following four
guidelines and the checkpoints they include:
1 Make content PERCEIVABLE for any user.
2 Ensure that interface elements in the content are OPERABLE by any user.
3 Make content and controls UNDERSTANDABLE to as many users as possible.
4 Use ROBUST web technologies that maximise the ability of the content to work
with current and future accessibility technologies and user agents.
These guidelines are necessary to ensure an accessible web environment. However,
their manual evaluation is an extremely hard task, as the volume of information in many
websites tends to be unmanageable. In this context, automatic accessibility evaluation
tools are useful for developing and verifying accessible content. These tools evaluate
web pages according to the previously mentioned guidelines and return reports that help
correct the accessibility errors. Despite their similarities, each tool has some features that
make it different from the others. Some of these automatic accessibility evaluation tools
are WebXACT,8 WAVE9 and EvalAccess (Abascal et al., 2004). Nevertheless, these
tools are not enough to ensure web accessibility, since several checkpoints cannot be
automatically tested; manual evaluations must therefore also be performed. In addition,
evaluations with disabled users should be carried out to find specific problems. These
evaluations should rely on accepted usability techniques, such as those proposed by
Nielsen and Mack (1994) and Rubin (1994). Moreover, it is necessary to use the
evaluation tools adequately during the web development life cycle, as different
approaches can be followed when developing accessible websites. There are two main
approaches for developing accessible content:

1 Turning a non-accessible website into an accessible one. This process requires a big
effort in human and material resources.

2 Designing a new website taking web accessibility into account in every life cycle
phase. This proactive approach has several advantages, since it does not imply a rise
in development costs.
Pemberton (2003b) has argued that accessible web design not only benefits the
disabled community, but also improves website usability, makes websites more visible
to web crawlers, eases their adaptation to other devices and optimises navigation for
narrowband users.
Web accessibility quantitative metrics
The previously mentioned WCAG conformance levels, the qualitative metrics proposed
by the WAI (A, AA, AAA), are not accurate enough for precisely measuring the
accessibility of a website. For instance, a website fulfilling only all priority 1 checkpoints
would obtain the same accessibility value as another website fulfilling all priority 1
checkpoints and almost all priority 2 checkpoints: both of them would get level A
conformance. This shows the limits of the current accessibility measurement when
performing a comparison or rating among a group of websites based on their
accessibility.

These criteria seem to be based on the assumption that if a web page fails to
accomplish one of the guidelines in a level, it is as inaccessible as if it had failed to
fulfil all of them. That is true for some users but, in general, it is essential to have not
only a reject/accept validation, but also a more accurate graduation of accessibility.
Therefore, the need for quantitative accessibility measurements, as suggested by Olsina
and Rossi (2002), is clear.
In this sense, Arrue et al. (2005) propose some quantitative accessibility metrics.
These metrics produce different measures, one for each attribute (Perceivable,
Operable, Understandable and Robust) described in the WCAG 2.0 guidelines, as well as
an average value. Moreover, these metrics are automatically calculated from the data
returned by EvalAccess (Abascal et al., 2004).
The requirements for the quantitative metrics are as follows:

• The result of the metric is limited to a specific range of values, from 0 to 100, so that
the results can be expressed in percentage terms. The closer a value is to 0, the less
accessible the page; the closer it is to 100, the more accessible it is.

• The priorities of the checkpoints are also taken into account. The WAI states that
priority 1 checkpoints have more influence on final accessibility than priority 2
checkpoints, and these have more influence than priority 3 checkpoints. This is
reflected in the metric by weighting values empirically, as can be seen in Table 2.

• Besides the number of errors committed, potential errors are also measured. For
instance, if we analyse a web page that contains five images without a text equivalent
and another one containing ten images, five of which have a text equivalent, the
second web page should obtain the better accessibility score, since the failure
percentages are 100% (five of five) and 50% (five of ten), respectively. However, we
have empirically observed that, for the checkpoints grouped by the POUR guidelines,
the ratio of errors over tested cases (E/T) tends to be very low: most values pile up
close to 0. It is thus difficult to discriminate among different pages, since they all get
similar accessibility values. Therefore, the interval containing the metric results for
the lowest ratios of errors (variable E) over tested cases (variable T) has to be spread.
The function with two intersecting straight lines, on the right of Figure 2, is an
approximation to the ideal hyperbola on the left: in this hyperbola, the closer the ratio
of errors to tested cases (E/T) is to 0, the better the pages are discriminated. The
advantage of this approximation is that the value of x' can be assigned empirically, in
order to easily control the accessibility value allocated to each E/T. This makes it
possible to increase or decrease the variability in any interval, depending on the
experimental results obtained, by modifying variables a and b.

According to the hyperbola approximation, if the E/T ratio is less than the intersection
point x', the accessibility is calculated using the S line; otherwise, the V line is used. The
x' value depends on variables a and b and on the number of tested cases.
Figure 2 The ideal hyperbola and its two-line approximation

Figure 3 gives the calculation of the intersection point and of the S and V lines in the
approximation to the hyperbola:

x' point calculation: x' = (100 − a) / ((100 × T) / b − a)

S line formula: A = −(100 / b) × E + 100

V line formula: A = −(a / T) × E + a

The three expressions are mutually consistent: x' is the value of the ratio E/T at which
the S and V lines intersect.
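To show how the two lines combine, here is a minimal Python sketch of the piecewise calculation as reconstructed above; the clamping to the 0-100 range and the default values of a and b are our own assumptions, since the paper tunes these parameters empirically.

def checkpoint_accessibility(E, T, a=20.0, b=0.2):
    # Accessibility score (0-100) for one group of checkpoints.
    # E: number of errors, T: number of tested cases.
    # a, b: empirically tuned parameters (the values here are hypothetical).
    if T == 0:
        return 100.0  # assumption: nothing tested, nothing failed
    x_prime = (100 - a) / ((100 * T) / b - a)  # ratio where S and V lines meet
    if E / T < x_prime:
        score = -(100 / b) * E + 100  # S line: steep segment near E/T = 0
    else:
        score = -(a / T) * E + a      # V line: gentle segment up to E = T
    return max(0.0, min(100.0, score))  # clamp into the metric's range

print(checkpoint_accessibility(E=0, T=50))  # 100.0
print(checkpoint_accessibility(E=5, T=50))  # 18.0, computed on the V line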
The following table describes the variables, constants and final values of the metric:
Table 2 Variables, constants and final values for metric calculation

Variables (with range):
E    Number of accessibility errors in each guideline (0–∞)
T    Number of tested cases in each guideline (0–∞)
a    Parameter for customising the hyperbola approximation (0–100)
b    Parameter for customising the hyperbola approximation (0–1)

Constants (with value):
N    Total number of checkpoints evaluable by EvalAccess (42)
Nx   Number of checkpoints in each guideline x ∈ {P, O, U, R}
NP   Error checkpoints in Perceivable (15)
NO   Error checkpoints in Operable (4)
NU   Error checkpoints in Understandable (9)
NR   Error checkpoints in Robust (14)

Weights:
k1   Priority 1 items (0.80)
k2   Priority 2 items (0.16)
k3   Priority 3 items (0.04)

Metrics (range 0–100):
Axy  Accessibility of guideline x ∈ {P, O, U, R} with priority y ∈ {1, 2, 3}
Ax   Accessibility of guideline x ∈ {P, O, U, R}
A    Total accessibility
The evaluation reports returned by EvalAccess provide the data necessary for metric
calculation, such as the number of times a checkpoint is tested (variable T), the number
of times each test fails to conform to the guideline definition (variable E) and the
checkpoint’s priority. All these parameters are grouped into 12 subgroups, classified by
their priority (three priorities) in WCAG 1.0 and by their membership in the four
WCAG 2.0 POUR guidelines, according to the mapping proposed by the WAI.10 The
quantitative accessibility metric thus takes into account the previously mentioned
requirements; while some of them are explicit in the algorithm, others show in the
results:
for x in each guideline {P, O, U, R} loop
    for y in each priority {1, 2, 3} loop
        x' = calculate_x'_point(a, b, T)
        if (E/T < x') then
            Axy = calculate_accessibility_value_with_S_line(b, E)
        else
            Axy = calculate_accessibility_value_with_V_line(a, E, T)
        end if
    end loop
    Ax = Σ (y = 1..3) ky × Axy    ← value for each POUR guideline
end loop
A = Σx (Nx × Ax) / N              ← average value for the page
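A runnable Python counterpart of this loop, reusing the checkpoint_accessibility sketch above, might look as follows; the input representation (a dictionary of (E, T) totals per guideline and priority) is our own assumption.

# Weights and checkpoint counts from Table 2.
K = {1: 0.80, 2: 0.16, 3: 0.04}
N_X = {"P": 15, "O": 4, "U": 9, "R": 14}
N = sum(N_X.values())  # 42 evaluable checkpoints

def page_accessibility(results, a=20.0, b=0.2):
    # results maps (guideline, priority) -> (E, T), e.g. {("P", 1): (3, 40), ...}.
    # Missing groups default to (0, 0), i.e., nothing tested.
    # Returns the per-guideline scores Ax and the page average A.
    A_x = {}
    for x in "POUR":
        A_x[x] = sum(K[y] * checkpoint_accessibility(*results.get((x, y), (0, 0)), a, b)
                     for y in (1, 2, 3))
    A = sum(N_X[x] * A_x[x] for x in "POUR") / N
    return A_x, A

scores, average = page_accessibility({("P", 1): (2, 30), ("O", 1): (0, 10)})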
These metrics proved to be correlated with an expert classification of Spanish
universities’ websites according to their accessibility level, as presented in Arrue et al.
(2005). Since EvalAccess includes a spider, the accessibility of a whole website can be
evaluated and an overall accessibility value can be defined for it: the spider evaluates all
the pages within a site, all the reports are gathered and the previously explained metric
is calculated for each page. However, not all the pages in a site are equally weighted.
The deeper a page is in the hierarchy of the site, the less impact it has on the final value.
This is illustrated by the following formula:
A_Site = Σ_i (A_i × e^(−i)) / Σ_i e^(−i)

where i is the depth index, starting from 0 for the home page. Thus, the accessibility of
the home page is weighted by 1.00, the first depth level by 0.37, the second depth level
by 0.14, the third one by 0.05, etc.
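The following short Python sketch applies this exponential depth weighting; treating each crawled page as a (depth, score) pair is our own simplification of the spider's output.

import math

def site_accessibility(pages):
    # pages: list of (depth, A) pairs, depth 0 being the home page.
    # Deeper pages receive exponentially smaller weights e^(-depth).
    num = sum(a * math.exp(-depth) for depth, a in pages)
    den = sum(math.exp(-depth) for depth, _ in pages)
    return num / den

# e.g., a home page scoring 80 and two first-level pages scoring 60 and 40
print(site_accessibility([(0, 80), (1, 60), (1, 40)]))  # ≈ 67.3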
4 Accessibility versus usability
Web professionals and researchers often mix web accessibility and usability terms and
use them as if they were of the same concept. This could happen owing to the
overlapping of the guidelines of both properties, e.g.: ‘use the clearest and simplest
language appropriate for a site’s content’. This principle can be fulfilled for ensuring
both accessibility and usability. Therefore, the boundary between them is not clear.
Contributing to the confusion, some standards such as ISO/TS 16071 (2003) define
accessibility in terms of usability: “accessibility is the usability of a product, service,
environment or facility by people with the widest range of capabilities”. In addition,
Thatcher et al. (2002) state that “web accessibility is a subset of usability although both
concepts tend to be dealt with separately”. Thus, a usable website should be accessible
but an accessible website does not have to be usable. In fact, while web accessibility aims
at making websites available to the broader spectrum of users, usability focuses
on efficiency, learnability and satisfaction of such users, as stated by Gulliksen et al.
(2004). On the other hand, Hoffman et al. (2005), contrary to an existing tendency stating
that accessibility benefits all users (i.e., Pemberton, 2003a), claim that sometimes
accessibility improves usability, while other times it has no impact and in some
other occasions it can even decrease general usability. Unfortunately, no empirical data
are attached.
However, the conceptual overlap between accessibility and usability is only
partially reflected in the methodology used for evaluating both properties. That is,
although they are defined in similar terms, they are evaluated differently: web usability
is usually evaluated by expert inspection or by user testing, while web accessibility is
evaluated by guideline review. As mentioned previously, the term ‘technical
accessibility’ refers to the fulfilment of accessibility guidelines and can be regarded as a
system property.
In addition to technical accessibility fulfilment, experts suggest evaluation by
users in order to perform a comprehensive accessibility evaluation of a website. This
can be understood either as the last phase of accessibility evaluation or as the usability
evaluation of an accessible site. Some studies along these lines have been carried out.
Theofanos and Redish (2003) propose usability guidelines for both blind users who use
screen readers and screen reader developers, concluding that the technical accessibility
defined by Section 508 is not enough to ensure full accessibility. According to Leporini
and Paternò (2004), merely fulfilling technical accessibility still leaves vision-impaired
users with usability problems, such as lack of context, information overload and
excessive sequencing in reading the information; these conclusions were reached after
empirical testing with blind and other vision-impaired users. On the other hand, Sullivan
and Matson (2000) found a different pattern of results, concluding that some web
accessibility metrics correlate weakly with web usability metrics. In their study, some
websites were evaluated using both automatic accessibility and usability evaluation
tools, and the obtained results were statistically correlated.
Therefore, owing to inconclusive empirical evidence, the impact of web accessibility
on usability is not very clear. For that reason, one of the main goals of this paper
is to clarify the relationship between technical accessibility and usability for users
without disabilities.
5 Empirical study
5.1 Objectives
Our generic goal was to determine whether diverse accessibility and usability metrics
correlate. Concretely, our hypothesis was that the more accessible a website was
(measured in terms of the POUR metrics of Arrue et al., 2005), the more usable it
would be for general users in terms of targets found, response time, disorientation
and satisfaction.
5.2 Participants
Twenty volunteers (14 women and 6 men) participated in the experiment; their average
age was 25 and their mother tongue was Spanish. Participants were Computer Science
students from the University of the Basque Country who received monetary
remuneration for their participation.
5.3 Material
Ten websites were selected (see Table 3). Since website content changes dynamically
and the experiments were to be performed on different days, the content of each
website was downloaded in order to ensure that all participants worked with the same
version of the websites. Since the users’ mother tongue was Spanish, we selected
websites in this language. It is also relevant to say that the websites were heterogeneous:
a general-information magazine and an online shop, as well as gastronomic, university,
corporate and institutional websites, among others.
5.4 Measures used
5.4.1 Accessibility measures
In order to measure the accessibility levels of the ten websites, we checked whether they
fulfilled the Web Content Accessibility Guidelines (WCAG). The WCAG guidelines
were selected because they are used as the standard in the European Community, where
the study was carried out. After that, with the aim of obtaining global and partial
accessibility scores for each website, we applied, respectively:

• Global accessibility metrics: As (for the entire website) and A (for the homepage)
(Arrue et al., 2005).

• Partial accessibility metrics: P, O, U and R (Arrue et al., 2005), which calculate
accessibility as a function of the Perceivability, Operability, Understandability and
Robustness categories proposed by the WAI.
5.4.2 Usability measures
The measures provided by ISO 9241-11 (effectiveness, efficiency and satisfaction) were
used in the experiment. Effectiveness was measured by means of the percentage of goals
achieved, that is, the percentage of targets that the users found in each website.
As far as efficiency is concerned, two measures were used. The first one was the
average time users invested in completing the tasks in each website. The second
efficiency measure was disorientation, defined as the participants’ deviation from the
optimum path.
Disorientation was measured with the Lostness formula proposed by Smith (1996):

L = √[(N/S − 1)² + (R/N − 1)²]

where:

N = number of different nodes visited
S = total number of nodes visited
R = minimum number of nodes needed to fulfil the target search

L increases as disorientation increases; for a perfect search task, L equals 0.
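A direct Python transcription of the formula, with example values that are our own:

import math

def lostness(N, S, R):
    # Smith's (1996) Lostness measure.
    # N: distinct nodes visited, S: total nodes visited,
    # R: minimum number of nodes needed to reach the target.
    return math.sqrt((N / S - 1) ** 2 + (R / N - 1) ** 2)

print(lostness(N=4, S=4, R=4))   # 0.0: the optimal path was followed exactly
print(lostness(N=8, S=12, R=4))  # ≈ 0.60: revisits and detours raise the score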
In order to measure satisfaction, we built a computerised, online version of the
After-Scenario Questionnaire (ASQ) (Lewis, 1995). The questionnaire was composed of
three items:
1 Overall, I am satisfied with the ease of completing the tasks in this scenario.
2 Overall, I am satisfied with the amount of time it took to complete the tasks in
this scenario.
3 Overall, I am satisfied with the support information (online help, messages,
documentation) when completing the scenario.
Participants were asked to answer this questionnaire for each website on a 7-point Likert
scale ranging from 1 (disagree) to 7 (agree).
5.5 Instruments
The experiment was performed using EWeb, a framework for designing and
implementing controlled experiments in web environments. This framework is composed
of three different modules. In the Session Design Module, an experimenter can design an
experiment by introducing the different study variables, as well as tasks and their
execution order, that users have to perform. It provides an XML-based specification file
describing this information as output. The Session Monitor and Guidance Module
executes the experimental session on the users’ local computers. The Session Analysis Module
performs the analysis of collected user data.
An XML-based specification file for the experimental design was created using the
Session Design Module in order to define user sessions. All different variables used in
this experiment were coded in this design. This specification file was later used by the
Session Monitor and Guidance Module, which, based on the information, distributed the
different tasks users had to perform. Users had to fill in demographic and satisfaction
questionnaires and perform web search tasks: the demographic questionnaire was
launched at the beginning of the session, and a satisfaction questionnaire was launched
for each website at the end of its web search tasks. User navigation was monitored using
a client proxy within this module. All the users’ interaction data were stored in a remote
database for later analysis.
Finally, the Session Analysis Module gathered information from all users’ sessions
stored in the database so that existing usability metrics could be applied for analysing
users’ navigation. Website homepage accessibility information and parameters, such
as number of words, images and links, were also provided to refine the analysis of
users’ interactions.
5.6 Procedure
Firstly, the participants read the general instructions of the experiment and filled in
the demographic questionnaire. The sequence of websites was randomly generated for
each user. Participants were asked to perform six web search tasks in each website,
which were also randomly selected. Concretely, two of the targets were reachable from
links located on the homepage, so they were in the first depth level; two other targets
were located in the second depth level and the remaining two in the third. Participants
had one minute to perform each task. After completing all the tasks in a site, participants
filled in the satisfaction questionnaire. The session lasted approximately one hour.
5.7 Results
To test our hypothesis, namely that the more accessible a website was (measured with
the POUR metrics of Arrue et al., 2005), the more usable it would be for general users in
terms of targets found, response time, disorientation and satisfaction, this section
presents descriptive and inferential data. The data from the Greenpeace site (GRE) were
removed from the statistical analysis because they were not correctly logged.
5.7.1 Descriptive analyses
Accessibility metrics
The first column in Table 3 shows the websites evaluated in the experiment, and the
following columns show the values of the different metrics for each website. As said
before, As represents the global accessibility of each site. The succeeding values
represent the average accessibility value (A) of each homepage and the values for the
respective WCAG 2.0 guidelines: P for Perceivable, O for Operable, U for
Understandable and R for Robust. The last row in Table 3 shows the mean accessibility
over all websites for each metric.
Table 3 Accessibility percentage of each website calculated with different global (As, A) and
categorical metrics (P, O, U, R)
Site As A P O U R
W3C 88 86 84 80 85 100
LOT 87 86 86 81 72 100
OPT 76 38 11 80 78 0
GIP 72 79 81 80 23 100
UGR 36 26 58 9 12 6
KAR 35 39 14 77 66 11
ADI 33 28 56 0 85 1
LAC 23 21 27 10 61 10
CON 15 14 10 10 50 10
Mean 52 46 47 47 59 38
Usability metrics
Means and standard deviations for effectiveness and efficiency measures can be seen in
Table 4. According to the effectiveness metric (percentage of targets found), the most
and least usable websites were OPT and ADI, respectively.
Table 4 Means and standard deviations (in parentheses) for effectiveness and efficiency metrics

Site   Targets found (%)   Response time (ms)   Lostness
W3C    40 (20)             21511 (9505)         0.3 (0.1)
LOT    60 (20)             13696 (6394)         0.2 (0.1)
OPT    91 (10)             15364 (3926)         0.2 (0.0)
GIP    70 (16)             15920 (7226)         0.2 (0.0)
UGR    54 (18)             22391 (5645)         0.2 (0.1)
KAR    46 (19)             19265 (6809)         0.4 (0.1)
ADI    43 (17)             15689 (9440)         0.3 (0.1)
LAC    56 (19)             16040 (6119)         0.2 (0.1)
CON    63 (26)             22006 (6128)         0.5 (0.1)
In order to obtain a global satisfaction score for each subject and website, we calculated
the mean of the three ASQ items (Table 5). OPT was the site with the most satisfied
participants, while W3C and ADI were the least satisfying ones.
Table 5 Means and standard deviations of users’ satisfaction for each website
Websites Means Standard deviation
OPT 5.92 0.73
GIP 4.68 1.14
LOT 4.22 1.33
LAC 4.10 1.36
CON 4.02 1.65
KAR 3.92 1.05
UGR 3.63 1.47
ADI 2.80 1.33
W3C 2.63 1.22
Valid N 20 20
However, as can be observed in Figure 4, the most and least accessible sites according
to the A measure were W3C and CON, respectively. In fact, W3C is the most accessible
but also the least satisfying site. These preliminary descriptive data indicate that
accessibility and usability metrics are not correlated, at least not in the way we had
predicted. Therefore, we performed some inferential analyses with the aim of testing the
relation between the two variables.
Figure 4 Websites’ quantitative accessibility (A) and percentage of targets found
5.7.2 Inferential analyses
Firstly, each site was ranked on each variable (accessibility and usability measures).
Then, we performed several Kendall’s concordance tests to analyse the agreement
between the accessibility and usability rankings of the websites. This test provides a
concordance coefficient (CC), which ranges between 0 and 1, where 0 means lack of
agreement and 1 means total agreement. Therefore, based on our hypothesis, if
accessibility and usability are correlated, the CC between both types of metrics should
be close to 1. Concordance was declared significant at p < 0.05.
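Since it is a short formula, Kendall's coefficient of concordance can be computed directly; the following Python sketch, which assumes untied integer ranks and uses a toy example rather than the experiment's data, illustrates the computation.

import numpy as np

def kendalls_w(rankings):
    # rankings: (m, n) array; m metrics each rank the same n websites (1..n).
    # Returns W in [0, 1]; 1 means the metrics agree completely.
    rankings = np.asarray(rankings, dtype=float)
    m, n = rankings.shape
    rank_sums = rankings.sum(axis=0)
    S = ((rank_sums - m * (n + 1) / 2) ** 2).sum()  # deviation from mean rank sum
    return 12 * S / (m ** 2 * (n ** 3 - n))

# Two metrics ranking three sites identically -> perfect concordance
print(kendalls_w([[1, 2, 3], [1, 2, 3]]))  # 1.0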
As can be seen in Table 6, the concordance between A and effectiveness (percentage
of targets found), efficiency (time and Lostness) and satisfaction (S1)11 was not
significant. That is, contrary to our prediction, accessibility and usability metrics are not
correlated and rank the websites differently. We also compared the accessibility
attributes (POUR) with the effectiveness measure (targets found). Although the
concordance was not significant in any case, we observed that the correlation between
effectiveness and O (Operability) was higher than with the rest of the attributes.
Therefore, based on our results, technical web accessibility is not a good usability
predictor. We then analysed the correlation between other measurable site variables and
the performance and satisfaction measures. These variables were the number of images,
the number of different words, the number of links and the proportion of links per word
on the homepage (see Table 7). The results showed that the number of images correlated
positively with disorientation (r = 0.73, p = 0.02); that is, the more images on the
homepage, the more disoriented the users became. On the other hand, the number of
links correlated positively with the time invested in finding a target (r = 0.70, p = 0.03).
This
means that the more links a homepage has, the slower the users were in performing web
search tasks. Finally, the percentage of links per word was positively correlated with the
users’ satisfaction (r = 0.63, p = 0.05).
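For the correlation analyses, a sketch of the computation using scipy is shown below; the input vectors are the rounded values from Tables 7 and 4, so the resulting r only approximates the paper's reported 0.73.

from scipy.stats import pearsonr

# Homepage image counts (Table 7) and mean Lostness (Table 4),
# one value per website, with GRE excluded as in the paper's analysis.
images = [4, 19, 26, 8, 27, 29, 29, 9, 46]
lostness_means = [0.3, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.5]

r, p = pearsonr(images, lostness_means)
print(f"r = {r:.2f}, p = {p:.3f}")  # positive r: more images, more disorientation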
Table 6 Concordance coefficients between pairs of metrics

Metric pair           Concordance coefficient   Average rank r   p
S1-S2-S3              0.9                        0.9             0.01
A-% targets found     0.4                       –0.2             0.6
A-Time                0.4                       –0.3             0.7
A-Lostness            0.5                       –0.1             0.5
A-S1                  0.4                       –0.2             0.6
P-% targets found     0.3                       –0.3             0.7
O-% targets found     0.7                        0.4             0.2
U-% targets found     0.3                       –0.4             0.8
R-% targets found     0.4                       –0.1             0.6
As-A                  0.9                        0.9             0.06
As-% targets found    0.5                        0.0             0.46
As-Time               0.4                       –0.2             0.59
As-Lostness           0.4                       –0.3             0.68
As-S1                 0.4                       –0.1             0.52
Table 7 Number of images, words and links, and percentage of links per word, on each website homepage (HP)

Site   HP images   HP words   HP links   Links/words in HP (%)
W3C    4           2628       167        6
LOT    19          359        58         16
OPT    26          187        56         30
GIP    8           394        63         16
UGR    27          387        87         23
KAR    29          358        73         20
ADI    29          206        32         16
LAC    9           486        34         7
CON    46          576        114        20
GRE    9           211        43         20
6 Discussion and conclusions
There is a generalised tendency to believe that “improving accessibility improves
usability for all users” (Theofanos and Redish, 2003). However, based on our results, we
can conclude that the technical accessibility and usability metrics used in this experiment
are not correlated. For example, being the most accessible website (W3C) did not
guarantee that users were more efficient, effective and satisfied. Therefore, based on
these empirical results, we suggest that this generalised assumption might not be precise.
As Leporini and Paternò (2004) suggest, the origin of the lack of concordance may be
that the solutions implemented to make websites accessible to all users create new and
qualitatively different problems; that is, accessibility solutions solve one problem while
creating other problems of a diverse nature.
An alternative explanation of the results obtained in this study is that the lack of
concordance between accessibility and usability metrics could be due to a lack of
internal validity of the accessibility metrics used (A, P, O, U and R). Even if these
metrics proved to be correlated with an expert classification of Spanish universities’
websites according to their accessibility level, as presented by Arrue et al. (2005), they
have not yet been validated empirically. We are currently carrying out an empirical
validation of these metrics in order to ensure that A, P, O, U and R are not evaluating a
different phenomenon. In any case, although we cannot yet ensure the validity of A, P,
O, U and R, the study presented here has served to compare the proposed set of metrics.
Surprisingly, the CC is close to 1 in the case of A versus As, which means that both
measures are equally valid, or invalid, for measuring accessibility. Moreover, this result
may be interpreted as support for the suggestion of Nielsen and Tahir (2000) that
assessing the homepage is enough to know the global accessibility or usability of a
website.
All things considered, we believe that web accessibility must be contemplated in
all quality models developed for evaluating website quality. The 2QCV3Q quality
model (Mich et al., 2003) includes accessibility as an attribute of the usability dimension,
which means that accessibility measures are used, jointly with other attribute
measures, to predict the usability of the product. The WebQEM quality model
(Olsina and Rossi, 2002) takes the web accessibility attribute into account as a measure
for predicting the efficiency of the product. Based on the results of our study, we believe
that accessibility should instead be inserted in quality models as a new dimension or
property of the product that must be measured in its own right, since it cannot always
predict the usability or efficiency of the product. In addition, the development of specific
quantitative metrics for web accessibility becomes an essential issue when websites need
to be evaluated or classified according to their quality, so our future work will focus on
improving the existing metrics in order to obtain empirically validated metrics that can
be easily integrated into website quality models.
References
Abascal, J., Arrue, M., Fajardo, I., Garay, N. and Tomás, J. (2004) ‘Use of guidelines to
automatically verify web accessibility’, International Journal on Universal Access in the
Information Society (UAIS), Springer Verlag, http://sipt07.si.ehu.es/evalaccess/index.html,
Vol. 3, No. 1, pp.71–79.
Americans with Disabilities Act (1990) http://www.usdoj.gov/crt/ada/adahom1.htm.
Arrue, M., Vigo, M. and Abascal, J. (2005) ‘Quantitative metrics for web accessibility evaluation’,
Proceedings of The ICWE 2005 Workshop on Web Metrics and Measurement, University of
Wollongong School of IT and Computer Science.
Avouris, N.M. (2001) ‘An introduction to software usability’, Proceedings of the 8th Panhellenic
Conference on Informatics, Vol. 2, pp.514–522.
Basili, V.R. and Weiss, D. (1984) ‘A methodology for collecting valid software engineering data’,
IEEE Transactions on Software Engineering, Vol. 10, No. 6, pp.728–738.
Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K. and Harshman, R.A. (1990) ‘Indexing
by latent semantic analysis’, Journal of the American Society of Information Science, Vol. 41,
No. 6, pp.391–407.
Dumas, J.S. and Redish, J.C. (1993) A Practical Guide to Usability Testing, Ablex Publishing, Norwood, NJ.
eEurope Action Plan (2005) An Information Society for All, http://europa.eu.int/information
_society/eeurope/2005/index_en.htm.
Fenton, N.E. and Pfleeger, S.L. (1997) Software Metrics: a Rigorous and Practical Approach,
2nd ed., PWS Publishing Company.
Gulliksen, J., Andersson, H. and Lundgren, P. (2004) ‘Accomplishing universal access through
system reachability – a management perspective’, International Journal Universal Access in
the Information Society (UAIS), Springer Verlag, Vol. 3, No. 1, pp.96–101.
Hoffman, D., Grivel, E. and Battle, L. (2005) ‘Designing software architectures to facilitate
accessible web applications’, IBM Systems Journal, Vol. 44, No. 3, pp.467–483.
ISO/IEC 9126-1 (2001) Software Engineering – Product Quality – Part I: Quality Model,
International Organization for Standardization.
ISO 9241-11 (1998) Ergonomic Requirements for Office Work with Visual Display Terminals
– Part II: Guidance on Usability, International Organization of Standardization.
ISO/TS 16071 (2003) Ergonomics of Human-System Interaction: Guidance on Accessibility for
Human Computer Interfaces, International Organization for Standardization.
Kitajima, M., Blackmon, M.H., Polson, P.G. and Lewis, C. (2002) ‘AutoCWW: Automated
Cognitive Walkthrough for the Web’, Human Interface Symposium.
Landauer, T.K. (1995) The Trouble with Computers: Usefulness, Usability and Productivity,
MIT Press.
Leporini, B. and Paternò, F. (2004) ‘Increasing usability when interacting through screen readers’,
International Journal Universal Access in the Information Society (UAIS), Springer Verlag,
Vol. 3, No. 1, pp.57–70.
Lewis, J.R. (1995) ‘IBM computer usability satisfaction questionnaires: psychometric evaluation
and instructions for use’, International Journal of Human-Computer Interaction, Vol. 7,
No. 1, pp.57–78.
López, J.M. (2004) ‘Development of a tool for the design and analysis of experiments in the web’,
Proceedings of the 5th Spanish Human Computer Interaction Conference, Interacción 2004.
Mendes, E., Mosley, N. and Counsell, S. (2006) ‘The need for web engineering: an introduction’,
Web Engineering, Springer, pp.1–27.
Mich, L., Franch, M. and Gaio, L. (2003) ‘Evaluating and designing the quality of web sites’, IEEE
MultiMedia, Vol. 10, No. 1, pp.34–43.
Nielsen, J. (1993) Usability Engineering, Academic Press.
Nielsen, J. and Mack, R. (1994) Usability Inspection Methods, John Wiley & Sons.
Nielsen, J. and Tahir, M. (2000) Homepage Usability: 50 Websites Deconstructed, New
Riders Publishing.
Olsina, L. and Rossi, G. (2002) ‘Measuring web application quality with WebQEM’, IEEE
Multimedia, Vol. 9, No. 4, pp.20–29.
Pemberton, S. (2003a) ‘Accessibility is for everyone’, ACM Interactions, Vol. 10, No. 6, pp.4–5.
Pemberton, S. (2003b) ‘The kiss of the spiderbot’, ACM Interactions, Vol. 10, No. 1, p.44.
Preece, J., Rogers, Y., Sharp, H., Benyon, D., Holland, S. and Carey, T. (1994) Human–Computer
Interaction, Addison-Wesley.
Rubin, J. (1994) Handbook of Usability Testing, John Wiley & Sons.
Sano, D. (1996) Designing Large-Scale Web Sites: A Visual Design Methodology, John Wiley
& Sons.
Section 508 (1998) The Road to Accessibility, http://www.section508.gov.
Shneiderman, B. and Plaisant, C. (2005) Designing the User Interface: Strategies for Effective
Human Computer Interaction, 4th ed., Addison-Wesley.
Smith, P.A. (1996) ‘Towards a practical measure of hypertext usability’, Interacting with
Computers, Elsevier Science, Vol. 8, No. 4, pp.365–381.
Sullivan, T. and Matson, R. (2000) ‘Barriers to use: usability of content accessibility on the web’s
most popular sites’, ACM Conference on Universal Usability 2000, pp.139–144.
Thatcher, J., Waddell, C.D., Henry, S.L., Swierenga, S., Urban, M.D., Burks, M., Regan, B. and
Bohman, P. (2002) Constructing Accessible Web sites, Glasshaus.
Theofanos, M.F. and Redish, J. (2003) ‘Bridging the gap: between accessibility and usability’,
ACM Interactions, Vol. 10, No. 6, pp.36–51.
Top of the Web (2003) Survey on Quality and Usage of Public e-Services,
http://www.topoftheWeb.net/docs/Final_report_2003_quality_and_usage.pdf.
Top of the Web (2004) User Satisfaction and Usage Survey of eGovernment Services,
http://europa.eu.int/information_society/activities/egovernment_research/doc/top_of_the_Web
_report_2004.pdf.
Notes
1 http://zing.ncsl.nist.gov/WebTools/WebCAT/overview.html
2 http://www.Websort.net
3 http://www.w3.org/WAI/
4 http://www.w3.org/
5 http://www.w3.org/WAI/intro/atag.php
6 http://www.w3.org/WAI/intro/uaag.php
7 http://www.w3.org/WAI/intro/wcag.php
8 http://Webxact.watchfire.com/
9 http://www.wave.Webaim.org
10 http://www.w3.org/TR/WCAG20/appendixD.html
11 Since the concordance among S1, S2 and S3 was significant (see Table 6), we used only one
of them (S1) to compare with the rest of accessibility and usability metrics.
Appendix URLs of the tested websites
Identifier URL
W3C http://www.w3c.es
LOT http://www.lotura.com
OPT http://www.optica-online.com
GIP http://www.gipuzkoa.net
UGR http://www.ugr.es
KAR http://www.karlosarguinano.com
ADI http://www.adicciones.com
LAC http://www.lacunza.es
CON http://www.consumer.es