Classifying costs and effects of poor Data Quality – examples
Sheffield Business School, Sheffield Hallam University
Sheffield, South Yorkshire
+44 7967 477336
Dublin City University
Dublin 9, Ireland
+353 1 700 8727
Sheffield Business School, Sheffield Hallam University
Sheffield, South Yorkshire
+44 114 2253157
This study has highlighted the importance of information, the costs and consequential effects
associated with poor quality data and the benefits and value that can be derived from
implementing data quality improvement initiatives. This paper also provides a taxonomy which
may be used to classify costs relating to both the consequences of low quality data and the
costs of improving and assuring on-going data quality. In addition there is a framework for
analysing the business value of data quality. Finally a data governance model is proposed
centring on three inter-related fundamental elements, namely People, Processes and Data,
where any attempt to improve the quality of data within any organisation must be focussed
around these three essential elements.
Keywords: Data Quality, IT Governance, Cost, Benefits, IT Management
A realisation has grown that organisations that are able to collect, analyse and act on data in
a strategic manner are in a position to gain a competitive advantage within their industries,
leading in some cases to domination in these areas (Davenport 2006). This form of
information management, known as ‘analytics’, stresses that successful organisations are
those that take action from their information to inform their strategic decision making
(Davenport 1998; Davenport, Harris, De Long and Jacobson 2001; Davenport 2006;
Davenport and Harris 2007; Davenport 2009), establishing along the way a ‘fact-based
culture’ (Harris 2005a; Harris 2005b; Harris 2007). If this ever expanding focus on ‘intelligent’
business intelligence and management information is so crucial to organisational strategy,
then the requirement to have quality data becomes paramount in manufacturing
planning (Gustavsson and Wanstrom 2009: 326) as well as in information systems.
Cost of poor Data Quality
Extensive literature has identified the high costs of low quality data, recognising that firms
may lose upwards of 10% of revenues due to poor operational data, together with other
serious consequential effects relating to tactical decision making and strategy generation.
In our previous work (Eppler and Helfert, 2004) we proposed a taxonomy for data quality
costs (Figure 1).
This taxonomy contains two major types of cost: (1) costs caused by low data quality and (2)
costs of improving or assuring data quality. The costs caused by low data quality are divided
into direct and indirect costs. Direct costs are those negative monetary effects that arise
directly from low data quality, namely the costs of verifying data because it is of questionable
credibility, the costs of re-entering data because it is wrong or incomplete, and the costs of
compensation for damages to others based on the incorrect data. Indirect costs, on the other
hand, are those negative monetary effects that arise, through intermediate effects, from low
quality data. Such indirect costs include the loss of a price premium because of a deteriorating
reputation, the costs incurred by sub-optimal decisions based on bad data, and the investment
costs that have to be written off because of low quality data. In terms of the costs of
improving data quality, we distinguish among prevention, detection, and repair costs (Kim and
Choi 2003). Prevention costs are those incurred to prevent any possible effects
caused by data quality problems, for example the costs of training data quality staff or of
developing data quality standards. Detection costs are the costs related to data quality analysis
and profiling. These costs are generated when analysing and reporting the source of the data
quality problems. Repair costs are the costs of data quality improvement
activities, such as repair planning costs and implementation costs.
Figure 1: Data Quality Cost Taxonomy (Eppler and Helfert, 2004)
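The taxonomy described above can be written down as a small lookup structure, which may help when tagging cost items in practice. The following Python sketch is illustrative only: the branch and leaf names follow the text, but the dictionary layout, the short item labels and the function names classify and total_by_subtype are assumptions introduced here, not part of Eppler and Helfert (2004).

```python
# Branches and leaf examples follow the taxonomy in the text; the layout,
# item labels and function names are illustrative assumptions.
TAXONOMY = {
    "costs caused by low data quality": {
        "direct": ["verification", "re-entry", "compensation"],
        "indirect": [
            "lost price premium",
            "sub-optimal decisions",
            "written-off investment",
        ],
    },
    "costs of improving or assuring data quality": {
        "prevention": ["training", "standards development"],
        "detection": ["analysis", "reporting"],
        "repair": ["repair planning", "repair implementation"],
    },
}


def classify(cost_item):
    """Return (major type, sub-type) for a known cost item, else None."""
    for major, subtypes in TAXONOMY.items():
        for subtype, items in subtypes.items():
            if cost_item in items:
                return major, subtype
    return None


def total_by_subtype(ledger):
    """Sum amounts per (major type, sub-type).

    ledger is a list of (cost_item, amount) pairs; an item that is not in
    the taxonomy raises ValueError rather than being silently dropped.
    """
    totals = {}
    for item, amount in ledger:
        key = classify(item)
        if key is None:
            raise ValueError(f"unclassified cost item: {item!r}")
        totals[key] = totals.get(key, 0) + amount
    return totals
```

Rejecting unclassified items, rather than ignoring them, is a deliberate choice in this sketch: it keeps the totals from silently understating costs that do not yet fit the taxonomy.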
Extensive literature has identified the high costs of low quality data and the cost of poor data
quality (COPDQ) (Redman 1995; English 1998; Redman 1998; Loshin 2001; Redman 2002;
Redman 2004; English 2009). Redman (2001: Table 8.1) identified that firms may lose
upwards of 10% of revenues due to poor operational data, together with other serious
consequential effects relating to tactical decision making and strategy generation. A report
from The Data Warehouse Institute in 2002 estimated that data quality problems cost
US businesses $600 billion a year (5% of the American GDP) in postage, printing and staff
overhead costs alone, whilst the majority of the senior managers in the companies affected
remained unaware (Eckerson 2002). A report published jointly by Dun and Bradstreet and the
Richard Ivey School of Business in 2006 forecast that critical data within at least 25% of the
Fortune 1000 companies would continue to be inaccurate and that “every business function
will have direct costs associated with poor data quality”, whilst a survey conducted by the
Economist Intelligence Unit in the same year on behalf of SAP and Intel reported that 72% of
the survey respondents said their data was sometimes inconsistent across departments and
that workers frequently made poor decisions because of inadequate data. More recently Larry
English outlined a catalogue of corporate disasters emanating from poor quality business
information amounting to ‘One and a Quarter Trillion Dollars’ (English 2009). During 2009 a
survey of 193 organisations sponsored by Pitney Bowes, 39% of which had revenues in
excess of US $1 billion, reported that a third of the respondents rated their data quality as
poor at best, whilst only 4% reported it as excellent (Information Difference 2009). A report
from Gartner, one of the world’s leading IT research organisations, stated that “Through 2011, 75%
of organisations will experience significantly reduced revenue growth potential and increased
costs due to the failure to introduce data quality assurance” (Fisher 2009).
Benefits and Value of Data Quality
The second element to be considered when evaluating data quality is its impact and value to
the business. The aspect of business value in relation to IS has been discussed in numerous
papers. For instance, Gustafsson et al. (2009) have presented a comprehensive model that
aims to explain the business impact with three generic elements: IT, organizational impact,
and business value. This model serves as a background model for data quality impact. Other
related frameworks have been presented in the literature aiming to refine this generic model
(Borek et al., 2011).
The model is supported by strong evidence that data quality has a considerable effect on
decision-making in organizations. This section will therefore focus on the data quality value in
decision-making. For instance, Keller and Staelin (1987) indicate that increasing information
quantity impairs decision effectiveness and, in contrast, increasing data quality improves
decision effectiveness. Jung and Olfman (2005) conducted a laboratory experiment to explore
the impact of representational data quality (which comprises the data quality dimensions
interpretability, ease of understanding, and concise and consistent representation) on
decision effectiveness, using two tasks with different levels of complexity.
Furthermore, Ge and Helfert (2008) show that the improvement of data quality in the intrinsic
category (e.g. accuracy) and the contextual category (e.g. completeness) can enhance
The benefits and improvements derived from the implementation of data quality initiatives are
often the inverse of the problems and issues and the consequential costs discussed above.
Eckerson (2002) cites tangible and intangible benefits, particularly within financial institutions.
For example, one medium-sized organisation generated annual cost savings in excess of
$130,000 from an initial outlay of $70,000 ($40,000 for software and $30,000 for data
cleansing). In another case a bank claimed cost reductions of $100,000 per year in
postage, printing and staff costs arising from wrongly delivered mail, by tackling incorrect
customer address issues (Eckerson 2002). In a further example a global leader in online
legal, business and news information estimates it saves $1 million from reduced mailing costs
after addressing issues around duplicate customer names and invalid addresses (Fisher
2009). The Data Warehouse Institute survey (Eckerson 2002) also identified that defective
data causes a litany of problems. These are summarised in Table 1.
Problem                                  Share of respondents
Extra time to reconcile data             87%
Loss of credibility in a system          81%
Extra costs (e.g. duplicate mailings)    72%
Customer dissatisfaction                 76%
Delay in systems deployment              64%
Lost revenue                             54%
Compliance issues                        38%
Table 1. Problems of defective data (Eckerson, 2002)
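Returning to the first cost-saving example cited from Eckerson (2002) above, the figures given ($70,000 outlay, annual savings in excess of $130,000) are enough for a rough payback calculation. The Python sketch below is a simple illustration rather than an analysis taken from the source: it assumes savings accrue evenly through the year, ignores discounting, and treats $130,000 as a lower bound, so the payback period is an upper bound. The function names are introduced here for illustration.

```python
def payback_months(outlay, annual_savings):
    """Months needed for cumulative savings to cover the initial outlay.

    Assumes savings accrue evenly over the year (an assumption, not a
    fact from the source) and applies no discounting.
    """
    return 12 * outlay / annual_savings


def first_year_roi(outlay, annual_savings):
    """Net first-year gain expressed as a fraction of the outlay."""
    return (annual_savings - outlay) / outlay


# Figures quoted in the text: $40,000 software + $30,000 data cleansing.
outlay = 40_000 + 30_000
annual_savings = 130_000  # "in excess of", so treated as a lower bound

print(round(payback_months(outlay, annual_savings), 1))  # 6.5
print(round(first_year_roi(outlay, annual_savings), 2))  # 0.86
```

On these assumptions the outlay is recovered in about six and a half months, with a first-year return of roughly 86% of the amount invested.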
In order to investigate the business value of data quality, we follow IS/IT business value
studies that show how IS/IT impacts on business processes and/or decision-making. A
business process can be defined as “a specific ordering of work activities across time and place,
with a beginning, an end, and clearly identified inputs and outputs: a structure for action”
(Davenport, 1993). Porter and Millar argue that activities that create value consist of a
physical and an information-processing component and each value activity uses information
(Porter & Millar, 1985). In their integrative model of IS/IT business value, Mooney et al. (1996)
propose a process framework for assessing the IS/IT business value. They present a typology
of processes that subdivides business processes into operational and management
processes and argue that IS/IT creates business value as it has automational, informational,
and transformational effects on the processes. Similarly, Melville et al. (2004) see business
processes and business process performance as the key steps that link IS/IT resources and
complementary organizational resources to organizational performance. Data can be seen as
an important organizational asset as well as resource. Its quality is directly related to business
value and organizational performance.
In addition to measuring the effect on business processes, organizational performance has
always been of consideration to IS/IT researchers and practitioners, resulting in a plethora of
performance related contributions. Earlier approaches focused, for example, on the economic
value of information systems (Van Wegen & De Hoog, 1996). These were subsequently
developed into frameworks for assessing the impact of IS/IT on businesses (Mooney, Gurbaxani &
Kraemer 1996; Melville, Kraemer & Gurbaxani, 2004). These IS/IT oriented frameworks have
resulted in an abundance of recommendations, frameworks and approaches for performance
measurement systems (Folan & Browne, 2005).
It has been recognized that there are two perspectives on value, objective and perceived
value, which result in different data quality measures, value measures and value perceptions for
particular stakeholders (Fehrenbacher & Helfert, 2012). To evaluate the value of data quality
and develop suitable indicators, we suggest combining the work on business processes and
decision quality with the work on performance indicators, developing a framework for
analyzing the business value of data quality. This is illustrated in Figure 2. The value propositions
of data quality are manifold. They range from generating direct business value by providing
information of high quality to reducing complexity, improving customer loyalty, improving
operational performance and reducing costs. Due to its critical importance in
enterprises, data quality affects many areas of organizational performance and may deliver
business value to several stakeholders simultaneously.
Figure 2: Framework for analyzing business value of data quality
This article has provided illustrations from the literature to highlight examples of the costs of
poor data quality and the consequential benefits of related improvement programmes. A further
example of the effects of such an initiative may be seen in a practical data quality
improvement programme allied to an academic study carried out in collaboration with an
industry partner. This organisation, a multi-business manufacturing enterprise operating
across sixty-three factories and offices within the United Kingdom, initiated a data quality
improvement programme in 2006, and over the subsequent five years the quality of its
data, as measured by a weighted KPI index, showed an overall improvement of 59%. In terms
of the cost taxonomy, whilst no detailed analysis was undertaken of the
financial effects of the underlying data quality problems, the improvement initiative was
undertaken by existing staff using existing resources, applying quality principles which
evolved during the process. These were essentially ‘sunk costs’, in that little
or no marginal cost was incurred as a direct consequence of the initiative.
Whilst it could be argued that such resources could have been applied in other areas of the
business, the overall effects upon the business meant that data was identified as a major
organisational resource and asset. During the period the overall operating results improved by
37%, with a 52% improvement in operational order efficiency across purchasing,
manufacturing and sales/despatches. In addition the processing of
supplier invoices and the successful resolution of customer invoice issues improved by 72% and
53% respectively. Whilst the links between the data quality initiative and the improved financial
position could be somewhat tenuous, it was widely acknowledged within the organisation that
the operational improvements were a direct result of the programme.
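The text does not say how the organisation's weighted KPI index was constructed. As a minimal sketch of one common approach, the Python example below computes an index as a weighted average of per-area accuracy scores and then the relative change between two periods. Every KPI name, weight and score here is hypothetical and chosen only to make the arithmetic easy to follow; none of them come from the case study.

```python
def weighted_index(scores, weights):
    """Weighted average of KPI scores; both dicts share the same keys."""
    total_weight = sum(weights.values())
    return sum(weights[k] * scores[k] for k in weights) / total_weight


def relative_improvement(before, after):
    """Fractional change from before to after (higher scores are better)."""
    return (after - before) / before


# Hypothetical weights and accuracy scores (0-100), for illustration only.
weights = {"supplier_invoices": 5, "customer_orders": 3, "stock_records": 2}
baseline = {"supplier_invoices": 60, "customer_orders": 50, "stock_records": 40}
later = {"supplier_invoices": 90, "customer_orders": 80, "stock_records": 70}

index_before = weighted_index(baseline, weights)  # 53.0
index_after = weighted_index(later, weights)      # 83.0
print(round(relative_improvement(index_before, index_after), 3))  # 0.566
```

The weights let an organisation give more influence to the data areas it considers most critical; how those weights are set is a management decision and is not addressed in this sketch.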
A similar study conducted more recently on a large quasi-public sector organisation has again
highlighted the costs of poor data quality. The organisation, formerly one of the largest
public sector organisations, has recently been privatised and has faced numerous problems
relating to data quality whilst providing its services. The study, conducted in the form of focus
groups, highlighted a number of key themes relating to data quality. The main themes
identified are as follows.
Firstly, in the discussion among a cross section of the workforce, it was noted that data and
information governance were of low priority. Employees’ awareness of data governance
issues and the associated responsibilities was low, and the communication channels
used to highlight and promote data quality issues were either non-existent or clogged.
Secondly, there was no formal mechanism or procedure for reporting data
problems. Employees who worked with the Master Data Systems were not aware of any
formal procedures or mechanisms through which they could report or correct faulty or incorrect
data. One attendee stated that “When I send payments to subxxxxx, I am not even sure that
the branch is still open, I may actually be sending payments to the wrong person or to a
wrong branch”. Thirdly, the organisation did not have a formal structure for data
stewardship or governance; data management was done on an ad-hoc basis by senior
managers, and specific roles and responsibilities relating to data quality management were
either absent or under-developed. Fourthly, a common theme related to the
use of local and informal controls to manage data. Examples included the use of local
spreadsheets, storing mission-critical data on local drives, and users writing their own macros to
automate some actions. Although these practices provide convenience and expedite
transactions locally, they can often lead to information security risks and compliance issues.
Lastly, it was noted that the middle-level managers were not aware of
ISO standards or best practices associated with information security management. They were
aware of the need to use the current best practices available within the
information security management domain, but the knowledge of how to obtain further relevant
information or how to implement an organisation-wide data management programme was lacking.
One of the positive aspects of the discussion was that the senior management were aware of
the data quality issues and the pressures of compliance. They are highly supportive of
improving the current practices and procedures, but the present organisational culture and
the remnants of the public sector heritage are making their task harder and less efficient. The
organisation is still developing the key metrics or parameters which can identify the cost of
poor data and poor data decisions.
The study concluded by providing a governance model for managing data that can help to
reduce the costs and effects of poor data quality. We illustrate some guiding principles around
the management of data focused on three fundamental elements: People, Processes and
Data. The inter-relationship between these three elements requires that any attempt to
improve the overall quality of data within any organisation must be centred on people, whether
data suppliers, processors or information customers; on the processes that receive, handle,
action and pass on data and information; and on the data itself, wherever it sits within the
data cycle of input, process and output.
Figure 3. Conceptual Framework
The conceptual framework depicted in Figure 3 above sets Data Quality firmly within the
overall context of Data Governance as part of an enterprise-wide data strategy, and acts as
a route map through the whole research.
The initial triple inter-linked framework, developed from an intensive review of the literature,
comprises the ‘Data’ elements of master data management, together with operational and
transactional data; ‘Process’ review and improvement initiatives running in tandem with the
necessary system housekeeping procedures; and the ‘People’ elements of
education and training, and personal development aligned with accessibility in the form of
Assistive Technology (hardware and software techniques developed in order to help visually
or physically disabled persons gain access to information technology within the working
environment). During the research for this study it became apparent that any enduring
improvement is predicated on making lasting changes to both processes and individuals’
behaviour, and that to bring this about there has to be cultural and organisational change, mainly
through the interaction of leadership and management at all levels. The framework also
identifies how the process of producing quality information derived from quality raw data has
parallels with a generic product manufacturing process. This useful analogy between a
production process and an information system also has strong roots in the literature (Strong,
Lee and Wang 1997: 104; Wang 1998: 59).
Data quality improvement is not just about fixing data or improving quality within a single
business application or process, but also about taking a more expansive and forward-looking
enterprise-wide approach. This must involve addressing cultural issues and initiating both short
and long term process and procedural improvements through a step-by-step, incremental
approach, whilst ensuring that the data conforms to appropriate specifications or
requirements. In this way any improvement initiative has an opportunity to be sustained. It has
to be appreciated that there cannot be a ‘one size fits all’ remedy for embedding organisational
improvements at all levels; rather, appropriate solutions must be identified to fit individual
situations and circumstances. One accepts that data quality problems are not created
intentionally by people, but arise more from the failure of the surrounding processes, whether these
are system related or individual related, involving a lack of education, training or personal
development, or simply a person being placed in a position for which they are not suited.
There is strong evidence to indicate that solutions exist to improve the quality of data,
emanating from both the academic fraternity and the commercial world. This research
therefore has not only a strong academic base but also major practical implications, which
lead to a further key theme: that of aligning robust theoretical and academic concepts with
the operating environment of a real life organisation, in order to implement sustainable data
quality improvements. Both Van de Ven and Johnson (2006) and Van de Ven (2007)
focussed on this relationship between theory and practice and how each discipline may
inform and thereby benefit the other within a single project. It is also recognised that research
in this specific area may have implications for other functional sectors where process
improvement programmes can be applied.
Summary and concluding Remarks
The study has addressed both the cost of poor data quality (COPDQ) and the
value of quality data and information, using examples from the literature together with two
practical business case studies with which the researchers are associated. From this
research it can be seen that real tangible benefits can be derived if organisations recognise
the vital importance of data as an enterprise-wide asset, apply forms of analysis and
measurement and then attempt to implement data quality improvement initiatives.
1. Borek A, Helfert M, Ge M, Parlikad AK. An information oriented framework for relating IS/IT
resources and business value. In: Proceedings of the International Conference on Enterprise
Information Systems. Beijing, China; 2011.
2. Davenport, T. H. (1998) Putting the Enterprise into the Enterprise System. Harvard Business
Review, July-August 1998: 121-131.
3. Davenport, T. H. (2006) Competing on Analytics. Harvard Business Review, January 2006: 99-
4. Davenport, T. H. and Harris, J. G. (2002) Elusive data is everywhere: understanding of position
and strategy comes before knowledge creation. Ivey Business Journal, 66(5): 30-31.
5. Davenport, T. H. and Harris, J. G. (2007) Competing on Analytics: The New Science of Winning.
Cambridge MA: Harvard Business School Press: 22
6. Davenport, T. H., Harris, J. G., De Long, D. W. and Jacobson, A. L. (2001) Data to Knowledge to
Results: Building an Analytic Capability. California Management Review, 43(2): 117-138.
7. Davenport, T.H., 1993. Process innovation: reengineering work through information technology,
Harvard Business Press.
8. Dun and Bradstreet and Richard Ivey School of Business. (2006). The Cost of Poor Data Quality:
1-13: Dun and Bradstreet.
9. Eckerson, W. (2002). Data Quality and the Bottom Line: Achieving Business Success through a
Commitment to High Quality Data: 1-33: The Data Warehouse Institute.
10. Economist Intelligence Unit. (2006). Business Intelligence- Putting Information to Work: 25.
11. English, L. P. (2009). Information Quality Applied. Indianapolis: Wiley Publications Inc: 802.
12. Eppler M, Helfert M (2004) A classification and analysis of data quality costs. 9th MIT
International Conference on Information Quality, November 5-6, 2004, Boston, USA
13. Fehrenbacher, D. and Helfert, M. (2012) "Contextual Factors Influencing Perceived Importance
and Trade-offs of Information Quality," Communications of the Association for Information
Systems: Vol. 30, Article 8.
14. Fisher, T. (2009). The Data Asset: How Smart Companies Govern Their Data for Business
Success. New Jersey: John Wiley & Sons: 220.
15. Folan, P. & Browne, J., 2005. A review of performance measurement: towards performance
management. Computers in Industry, 56(7), p.663-680.
16. Ge M. and Helfert M. (2008), Effects of information quality on inventory management.
International Journal of Information Quality, 2(2), pp. 176-191.
17. Gustafsson, P. et al., 2009. Quantifying IT impacts on organizational structure and business value
with Extended Influence Diagrams. The Practice of Enterprise Modeling, p.138–152.
18. Gustavsson, M. and Wanstrom, C. (2009) Assessing Information Quality in Manufacturing
Planning and Control Processes. International Journal of Quality & Reliability Management, 26(4):
19. Harris, J. G. (2005a) Insight-to-Action Loop: Theory to Practice: Accenture.
20. Harris, J. G. (2005b). The Insight-to-Action Loop: Transforming Information into Business
21. Harris, J. G. (2007) Forget the toys- It's the guy with the best data who wins: Accenture.
22. Hewlett-Packard. (2007). Managing Data as a Corporate Asset: Three Action Steps towards
Successful Data Governance: 1-8.
23. Informatica. (2008). Timely, trusted Data Unlocks the Door to Governance, Risk and Compliance:
15: Informatica Corporation.
24. Information Difference. (2009). The State of Data Quality Today: 33: The Information Difference
25. Jung, W. and Olfman, L. (2005), An experimental study of the effects of contextual data quality
and task complexity on decision performance, Information Reuse and Integration Conference, Las
Vegas, Nevada, USA.
26. Keller, K.L. and Staelin, R. (1987), Effects of quality and quantity of information on decision
effectiveness, Journal of Consumer Research, 14(2), pp. 200-213.
27. Kim W, Choi B (2003) Towards quantifying data quality costs. Journal of Object Technology,
28. Melville, N., Kraemer, K. & Gurbaxani, V., 2004. Review: Information technology and
organizational performance: An integrative model of IT business value. MIS quarterly, p.283–322.
29. Mooney, J.G., Gurbaxani, V. & Kraemer, K.L., 1996. A process oriented framework for assessing
the business value of information technology. ACM SIGMIS Database, 27(2), p.68–81.
30. Porter, M.E. & Millar, V., 1985. How information gives you competitive advantage. Harvard
Business Review, 63(4), p.149–160.
31. Van de Ven, A. and Johnson, P. E. (2006) Knowledge for Theory and Practice. Academy of
Management Review, 31(4): 802-821.
32. Van de Ven, A. H. (2007). Engaged Scholarship: A Guide for Organisational and Social Research.
Oxford: Oxford University Press: 330.
33. Van Wegen, B. & De Hoog, R. 1996. Measuring the economic value of information systems.
Journal of Information Technology, 11(3), p. 247-260.
The Data Capture Process
The major element of the data quality improvement initiative took the form of a series of
face-to-face meetings carried out at a number of the Company’s factories and business offices,
supplemented by conference calls where either sheer distance or time precluded a
physical meeting. The meetings were arranged in a number of ways: by contacting individual
sites, upon receipt of an invitation from a site, or as a number of visits organised via a
member of a business management team. As the process developed, the majority of the
further meetings were requested either by individual sites or by a site-owner business. A
generic agenda was developed to focus each event and identify the main points of
discussion, whilst still being flexible enough to tease out any other relevant issues. These
agenda points embraced:
• Discussion on the overall Corporate Data Quality Improvement Initiative with particular
regard to the Data Accuracy KPIs and the way in which these support the overall process
• Implications for the site and business
• The Site/Business KPIs
• Priority areas
• Short term actions
• Medium term approach
• Ensuring that everyone is aware of the implications of their actions and responsibilities
• Any further relevant points
This generic agenda also encapsulates the recurring themes around Processes and People,
together with the Cultural/Organisational themes relating to measurement and reporting,
communication, change management and short and long term priorities.
As stated, the initial plan when commencing this study was to cover around a dozen sites, but
it became apparent from the outset that there was a distinct appetite, across all
levels within the manufacturing businesses, for better quality data, as sites and businesses
rapidly requested a visit. This gathered momentum, which led to the decision to expand the
initiative to encompass as many sites as possible. This valuable additional access has not
only widened the improvement process within the organisation, but has also expanded the
researcher’s insight and understanding of this subject.
Every effort was made to ensure that this was not viewed as ‘a visit from Head Office’ or as
just another training session, but as a two-way information exchange. Notes were taken by
the researcher and were then written up in bullet point form, usually within twenty-four hours,
to a predetermined format and circulated to all attendees for their comments and feedback.
These outcomes were then analysed and the findings presented as lessons learnt and to be
learnt, in the form of key findings, short term guidelines, issues and on-going
suggestions for improvement.
The main thrust of the process took place between December 2008 and April 2009. In all,
forty-eight of the fifty-four factories and seven business operations and sales teams were
covered. Thirty-four separate locations were actually visited; a number of these events
comprised representatives from two or more factories in the form of regional cluster meetings,
to speed up the process, reduce travelling and share experiences. In addition three
conference call meetings were held where it was not possible to arrange for all the
participants to be together at the same time. Four of the factories also received a second visit
after a specific request.
The number of attendees at each meeting (excluding the researcher) varied between one and
six for the pure site factory visits and up to nine in the case of business
operations/sales meetings. There was also quite a breadth of job roles represented, from
sales, procurement, finance and administration as well as operations, and in total in excess of
130 people took part.
The format of the discussion evolved to encompass:
• Re-focussing the Data Accuracy KPIs within the perspective of the overall data quality
• Identifying and highlighting good and bad practice
• Identifying issues and problems
• Developing best practices within both the short and medium terms
• Determining how best this may be implemented
• Learning from the above to improve on-going practice (Action Research/Learning)
The discussions were captured and recorded by the researcher in bullet point form and
circulated around all attendees as soon as possible after each event. Feedback was
requested and any resultant comments were noted and added where applicable. In all thirty
seven separate events were recorded in this way. This process generated two hundred and
fourteen points of discussion,