About
59 Publications
49,232 Reads
2,512 Citations
Publications (59)
The first two phases of the statistical engineering process are to identify the problem, and to properly structure it. These steps relate to work that is often referred to elsewhere as framing of the problem. While these are obviously critical steps, we have found that problem-solving teams often “underwhelm” these phases, perhaps being over-anxiou...
At first blush, it appears that statistics and data quality should be perfect together. After all, statistical practitioners depend on high-quality data to conduct their analyses, and many data quality efforts seem well-suited to the application of statistical methods. Yet, the facts on the ground suggest otherwise: statistical applications, qualit...
It is especially poignant for data scientists who must “get out there” and learn all they can about everything surrounding the data they use. This chapter is about contextual background and soft data. Great data scientists know that the only way to acquire this smorgasbord of information is to go get it. They delve deeply into processes of data cre...
The difference between a good data scientist and a great one is like the difference between a lightning bug and lightning. Good data scientists work to discover hidden insights in vast quantities of often disparate and often poor‐quality data. Great data scientists think about things differently. They are not simply interested in finding new insigh...
Data scientists know that rigged decisions are antithetical to everything they stand for. Before decrying rigged decisions made by others, the authors recommend that data scientists first work to improve their own decision‐making. To develop a deeper appreciation for a scientific framework for handling bias in decision making, data scientists shoul...
The management introduction describes the company's vision, its short‐ and long‐term goals, what it accomplished in the past year, and where it is headed in the next few. A classical approach to evaluating a business or any organization is to list its strengths, weaknesses, opportunities, and threats (SWOT). In such a mapping, strengths and wea...
This chapter provides an industrial context for the advance of data science. This context is important because it illustrates an important role for chief analytics officers (CAOs) and senior managers, namely identifying macrotrends based on evidence from disparate sources and positioning their companies to take advantage of them. The chapter consi...
This chapter describes teaching colleagues some basics and provides a starter set of questions for decision‐makers. Decision‐makers evaluate data scientists and their work every day, even if only informally. The information quality (InfoQ) framework can be used to assist in the design of a data science project, as a midproject assessment, and as a...
This chapter is about data quality, which is a plague on data scientists and chief analytics officers (CAOs). It takes up to 80% of data scientists' time and is the problem they complain about most. Without delving too deeply into details, to be judged of high quality, data must meet three distinct criteria: it must be “right:” correct, properly la...
The authors want to make valid inferences and predictions from the data about bigger, more important areas of interest to decision‐makers. They recognize four distinct methods for doing so. The “laws of nature” refers to laws and models that allow one to extrapolate, under assumptions. “Statistical generalization” refers to making inference from a...
Maintenance is a big area for analytics. The real work of the chief analytics officer (CAO) involves acquiring enough data science talent and getting the right organizational structure in place so the overall team of data scientists (and the organization as a whole) is most effective in turning data into information, making better decisions, and bu...
The wide‐angle perspective of data science includes activities as diverse as building trust so people are asked to contribute to really important problems, clearly stating the problem, conducting the analyses, teaching, supporting decisions in practice, and so forth. This chapter focuses on one activity that is too often ignored, impact assessment....
The real work of chief analytics officers involves establishing a team suited for the organization's current level of maturity in the short term and leading efforts to move up the maturity ladder in the longer term. This chapter distinguishes between five maturity levels: firefighting, inspection, process view, quality by design, and learning and d...
This chapter presents an example that provides an object lesson in understanding the real problem. Data scientists simply must engage with “customers” in their languages and talk through the apparent problems to discover the real ones. They found that many people make this more difficult than it needs to be. The authors consider three points to thi...
Over the past several years, the term data‐driven has penetrated the business lexicon and appears to be here to stay. A “data‐driven” company is one that strives to make better decisions, by individuals and in decision‐making groups, up and down the organization chart, every day. This means making slightly better decisions today than yesterday and...
As data scientists, we face a tall order in getting decision‐makers to comprehend and believe data, our results, and their implications. We have to think through their background and present in ways that advance their understanding. At a minimum, we must make our plots and the accompanying explanations easy to understand. Successful oral presentat...
This chapter presents the life cycle of data analytics in the context of an organization aiming to profit from data science. The life‐cycle view is designed to help data scientists help decision‐makers. The chapter considers each step of the cycle in turn. The work of data science takes place in complex organizational settings, which can both promo...
Nothing improves data science like a demanding decision‐maker, one who is striving to become data‐driven, who wants to bring as much data and data science as possible to bear and constantly expects data scientists to deliver more. This chapter presents the first exercise of waist measurement with rope, which aims to help decision‐makers appreciate...
Educating senior management and helping guide overall data strategy is a tall order indeed. The space is a confused mess and the topic is very charged and political. There are always good reasons to delay, or simply avoid, the tough issues. There are lots of ways to profit from data and much that can go wrong. It leads us to conclude that every com...
High-quality data is critical for effective data science. As the use of data science has grown, so too have concerns that individuals’ rights to privacy will be violated. This has led to the development of data protection regulations around the globe and the use of sophisticated anonymization techniques to protect privacy. Such measures make it mor...
Published in the online version of the Harvard Business Review on October 3, 2019.
Information and data quality practitioners are in general agreement that social, cultural, and organizational factors are the most important in determining the success or failure of an organization’s data quality programs. This paper presents some of the first research undertaken to substantiate these anecdotal claims. The paper describes a survey...
Duplicates in a database are one of the prime causes of poor data quality and are at the same time among the most difficult data quality problems to alleviate. To detect and remove such duplicates, many commercial and academic products and methods have been developed. The evaluation of such systems is usually in need of pre-classified results. Such...
Information quality is generally defined in terms of fitness for use. Almost all agree that they prefer high-quality to low-quality information. And, while many organizations have made good progress, many find that setting up information quality programs and making improvements proves difficult. Further, most agree that the most critical difficulti...
This chapter provides a prospective look at the “big research issues” in data quality. It is based on 25 years' experience, most as a practitioner; early work with a terrific team of researchers and business people at Bell Labs and AT&T; constant reflection on the meanings and methods of quality, the strange and wondrous properties of data, the impo...
The typical organization is faced with a range of issues that prevent it from taking full advantage of its data resources. Among these issues are poor connection between strategy and data, low accuracy levels, inadequate knowledge of what data resources are available, and lack of management accountability. While one might hope that the Internet and...
Most companies underutilize their data assets, but sometimes they figure out how to leverage them to satisfy marketplace demands. The author outlines nine ways to create new value from your data.
In many respects, statistical foundations and techniques that worked so well to help improve the quality of manufactured goods have proved extensible to data and information. But data and information differ from manufactured products in some important ways, presenting both challenges and opportunities. For example, unlike manufactured products, dat...
No industry, company within any industry or any department within any company is immune to the effects of poor quality data. While most effects are barely observable, the cumulative impact of poor data quality is enormous. It is trite to observe that data is a critical asset in the information age. Data is the "facts and figures" associated with cu...
Almost every activity in which the enterprise engages, from the most mundane operation to the most far-reaching decision, requires data. Yet data are rarely managed well: few enterprises know what data they have; people cannot access or use data; and data quality is often low. Furthermore, individuals and business units often hoard data, leading to...
Poor data quality has far-reaching effects and consequences. The article aims to increase awareness by providing a summary of the impacts of poor data quality on a typical enterprise. These impacts include customer dissatisfaction, increased operational cost, less effective decision-making and a reduced ability to make and execute strategy. More su...
Summary: Errors in data can cost a company millions of dollars, alienate customers, and make implementing new strategies difficult or impossible. The author describes a process AT&T uses to recognize poor data and improve their quality. He proposes a three-step method for identifying data-quality problems, treating data as an asset, and applying qu...
Data quality is usually associated with the quality of data values. But even perfectly correct data values are of little use if they are based on a deficient data model. The purpose of this paper is to present and discuss a list of characteristics (dimensions) that are crucial for data model quality. We single out 14 quality dimensions, organized i...
The importance of data in large databases to the operation of telecommunications networks has grown considerably. For example, all provisioning, maintenance, and billing operations are critically dependent on data and many new network services are based on real-time access to data. This makes data quality a major issue for the industry. The purpose...
The rapid proliferation of computer-based information systems is increasing the importance of data quality to both system makers and users. However, there is neither an established framework nor common terminology for investigating data quality. There is not even agreement on what the term “data” means. We lay a foundation for the study of data qua...
The purpose of the paper is to present a new model of the data life-cycle. Such a model is needed to clarify activities involving data, from its creation through use, and to establish the relationships of these activities to one another. The proposed model features four principal data cycles: the acquisition cycle includes activities that create an...
Data are used in the delivery of many products and services, and so data quality is an important component of customers' perceptions of the quality of these products and services. The paper describes efforts initiated by AT&T to control and improve the quality of data it uses to operate its worldwide intelligent network, to conduct its day-to-day o...
To characterize the transmission performance of the public switched network, the Bell System conducted an intensive field measurement study of end-office-to-end-office connections from October 1982 to January 1983. A special multistage sampling plan and ASPEN, a flexible, automatic data acquisition system based on the UNIX™ operating system, were d...
Time sequence measurements of the elemental composition of aerosols, on an hourly to few-hourly basis, may be analyzed statistically for chemical associations which are characteristic of sources before modification of these associations during transport through the atmosphere. If correlations between elemental abundances are computed in the measured...