Harvard Data Science Review • Issue 5.1, Winter 2023
A Review of Data Valuation
Approaches and Building
and Scoring a Data
Valuation Model
Mike Fleckenstein1, Ali Obaidi1, Nektaria Tryfona2
1Data Environments and Engineering Department, The MITRE Corporation, McLean, Virginia,
United States of America,
2Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and
State University, Blacksburg, Virginia, United States of America
Published on: Jan 26, 2023
DOI: https://doi.org/10.1162/99608f92.c18db966
License: Creative Commons Attribution 4.0 International License (CC-BY 4.0)
ABSTRACT
Data valuation has received increasing attention over the past 20 years. The importance of data as an asset in
both the private and public sectors has systematically increased, and organizations are striving to treat it as
such. However, this remains a challenge, as data is an intangible asset. Today, there is no standard to measure
the value of data. Different approaches include market-based valuation, economic models, and applying
dimensions to data. The first part of this article (Data Valuation Framework) examines these approaches and
suggests a framework for grouping them. The second part of this article (Building and Scoring a Dimensional
Data Valuation Model) describes how we built and scored a data valuation model.
Keywords: data valuation, intangibles, data as an asset, data monetization, data valuation model, data
valuation framework
Media Summary
We often hear that data is becoming the new currency across our economy (e.g., Keller, 2020). It is a clear
indication that we, as a society, want a way to value data in concrete terms. We are not there yet.
Today, business gambles on the future value of data by acquiring competitors for huge amounts of money
based on things like “eyeballs.” Governments estimate future economic prosperity based on availability of
data. Lastly, many of us, either explicitly or implicitly, calculate the value of specific data sets in improving our
business outcomes. That these approaches exist is a sign that we are striving to apply currency to data.
However, these different approaches to data valuation highlight that we are still experimenting with a
repeatable approach to assessing data in terms of currency.
Over the past two or three decades, researchers have increasingly sought to define a repeatable approach to
data valuation. Our research builds a framework that shows how approaches to data valuation typically fall into
three categories. We examine these approaches in terms of their characteristics, differences, and
commonalities, and highlight their strengths and challenges. We also present real-world examples of each
approach. We then look more closely at one of these approaches and expand on historical attempts of its use to
value data. In so doing, we develop an easy-to-use, repeatable model to value data for two use cases.
We acknowledge that no single approach to data valuation exists today, and that different approaches—even a
combination of approaches—can be used, depending on the use case.
1. Introduction
As longtime practitioners and teachers of data management, we are struck by the many references to ‘data as
an asset.’ The implication is that data should be valued similarly to traditional assets. When a market exists,
discounted value of future utility can be measured in monetary terms. However, when no market exists, the
value of data must be calculated more creatively. For example:
Sometimes, when no market exists, the value of data can be stated in only relative terms rather than in monetary terms. For example, during the recent COVID-19 pandemic, many data sources were assembled and often available for free. Entities including hospitals, research organizations, governments, and academia had to decide which data sources served them best by examining a host of factors, including data volume and variety, data quality, and update frequency. These factors were not readily expressible in monetary terms. Instead, organizations had to value them in relative terms by implicitly or explicitly scoring their value.
Many organizations benefit from freely obtained data provided by people. What freely provided data is worth is subject to interpretation. In one recent large-scale survey, researchers estimated compensation required for individuals to forgo certain data-intensive applications, such as email, maps, and social media. They estimated, for example, that a typical U.S. user of Facebook might require $48 per month to forgo that data (Brynjolfsson et al., 2019). That is slightly more, but in the same ballpark as the roughly $27 per user Facebook’s revenue divided by the number of users might indicate.1
A 2011 market assessment of public sector data in the United Kingdom estimated its value at £1.8 billion (Deloitte, 2013). This includes direct value to sellers of public sector data, direct value to entities that interact with public data, and indirect value affecting supply chains.2 This estimate triples in value when factors such as data reuse and wider societal impacts are taken into account.
To better understand how value can be applied to data, we took a two-fold approach. First, we did an environmental scan of approaches that have already been tried. Second, we built a data valuation model based on a small amount of real-world data.
Our research found many studies and executions of estimating data value. No standard approach to data valuation exists, and perspectives vary considerably based on the use case. Based on our findings, we created a framework that grouped data valuation approaches into three models: market-based, economic, and dimensional.
We found that business typically estimates the value of data in terms of cost and revenue when buying and selling data or data-intensive businesses (market-based). Government approaches to data valuation center on estimating economic benefit as a result of making data available—for example, making government data such as census, transportation, and health data publicly available in hopes of stimulating predetermined economic growth (economic). A third approach leverages data dimensions (dimensional). This approach examines valuation points of a specific data set both inherent to data, like data quality (e.g., completeness, accuracy, timeliness), and contextual to value data (e.g., frequency of use, ownership). For example, organizations routinely make decisions on acquiring (or keeping) one data set over another similar data set based on these dimensions.
This research article consists of two parts:
The first part covers data value initiatives to date. We group initiatives into three models:
market-based models, which calculate data’s value in terms of cost and revenue
economic models, which estimate data’s value in terms of economic and public benefit
dimensional models, which estimate data value based on categories or dimensions
Our research shows that each model can be used in different circumstances and none of these approaches work
for every case. All models are speculative and subject to context external to the data. We also note that the
three models overlap with each other. For example, government policy and legal regulation (e.g., privacy)
affect all models. The right approach depends on a given use case. As a group, they can serve as the basis for a
data valuation framework, with each use case leveraging one or more models.
The second part describes how we built and scored a dimensional data valuation model. We developed a survey
containing about 30 questions. We used our dimensional model research as our baseline and built on prior
work. Our proposed data valuation model is based on questions in the dimensions of ownership, cost, utility, age, privacy,
data quality, and volume and variety.
The dimensional model is best suited to comparing one data set with another. Therefore, we focused on two
use cases: how to compare the value of two similar data sets and how to assess the value of adding a data set to
an existing data pool. We expanded on prior work by examining how stakeholders with different perspectives
might weight and value dimensions differently. For example, we examined how government, a research
organization, a hospital, and an academic institution might each weigh certain questions differently.
Our goal was to design an easy-to-use, customizable approach that helps organizations assess the value of
specific data sets for specific use cases using a small, consistent set of dimensions. Our scoring reflects the
relative value of data sets. It shows clear differences in some comparisons and more subtle differences in
others. We concluded that our model can be used effectively as a baseline for determining the value of a data
set in terms of a score and that the weighting of scores can vary significantly based on context and stakeholder
perspective.
2. Data Valuation Framework
2.1. Models for Data Valuation
Our environmental scan reviewed many examples of data valuation spanning from more than 40 years ago to
today. Through our research of prior approaches, we arrived at a data valuation framework that groups data
valuation approaches into three models. We define these as follows:
The market-based model values data based on income (e.g., selling data), cost (e.g., buying data), and/or stock value (e.g., value of data-intensive organizations). Organizations routinely buy and sell data and data-intensive companies.
The economic model values data in terms of its economic impact. This model is frequently used by governments to assess the value of publicizing data. For example, governments share weather data, which helps sustain an ecosystem of weather forecasting.
The dimensional model values data by assessing attributes inherent to a data set (e.g., data volume, variety, and quality) as well as the context in which data is used (e.g., how the data will be used and integrated with other data). For example, organizations inherently decide to acquire, keep, or prioritize one of several similar but different data sets. To date, this is an informal process.
This grouping allows data researchers, practitioners, and policymakers from industry and government to better approach data valuation. Table 1 summarizes our data valuation framework. It provides an overview of the types of approaches included in each model. Subsequent sections provide examples for each model, a detailed description of each model, and each model’s strengths and challenges. In addition, Appendix A provides a summary of each model’s strengths and challenges, and Appendix B provides a comparison of data assets with traditional assets.
Table 1. Data valuation framework.
Market-Based Model: Models that assess data value in terms of its income, cost, and/or market worth, including income and cost based approaches (buying and selling data; leveraging data to improve products or services; enhancing customer experience through associated data products or services; assessing the value of data breach or loss; licensing) and stock market based approaches (e.g., mergers, acquisitions, initial public offerings).
Economic Model: Models that assess data value in terms of its economic impact, including financial estimates of economic benefits, value of data for the public good, and policy and legal regulation impact.
Dimensional Model: Models that identify and prioritize categories (or dimensions), both data related and contextual, and then attempt to calculate or estimate data value, including explicit comparisons of datasets and formulas using dimensions to assess the value of data to business functions.
2.2. Model Overlap
While the three models are different and rooted in specific use cases, there is overlap among them. For
example, governmental policy and legal regulation (e.g., privacy) affect all models. Similarly, survey questions
can be constructed to accommodate any model. Finally, we saw that cost and utility (sometimes expressed in
financial terms) were used as a valuation method across all models. This overlap highlights the underlying
similarity of the three models as well as their unique focus. Figure 1 reflects differences and commonalities in
graphic terms.
Figure 1. Data Valuation Model Overlap.
2.3. Market-Based Model Examples
Market-based approaches to data valuation are an extension of physical asset valuation. Just like physical
assets, data can be valued based on its cost, its sale value, or its income potential (Internal Revenue Service
[IRS], 2020). In addition to these approaches to data valuation, companies are also using at least two different
forms of cost, besides purchase cost. The first is data valuation in terms of insurance cost—what would the
compromise or loss of data cost? The second is estimating the value of their competitors’ data and sometimes
costing a purchase. Below are some examples of market-based data valuation:
Buying and selling data
Acxiom, Equifax, and Dun & Bradstreet are companies that only buy and sell data. They aggregate that
data and enhance and repackage it for consumption. These data brokers value their data using cost and
whatever income it will fetch in the free market.
Email addresses for marketing are available for purchase on the open market. For example, in 2014, it was
possible to purchase four million emails for $75.95 (Nash, 2014).
Leveraging data to improve products or services
PricewaterhouseCoopers markets its services by showing that firms with greater effectiveness at gaining
a financial return on investment from data—data trust pacesetters—routinely explicitly value their data
using cost-benefit analysis (PWC, 2017).
Large retailers are selling their purchasing data to suppliers, which are eager to buy this data to improve
the time to market of their products (Najjar & Kettinger, 2013).3
Enhancing customer experience through associated data products or services
FedEx provided its customers with online package tracking, enhancing its package delivery service. Thus,
such data might be measured by the additional business it generates.
Companies frequently offer premium versions of software or free apps, including weather, fitness
tracking, and analytic platforms for purchase. Companies like Spotify and Netflix use customer data to
deliver enhanced streaming content and recommendations to their users.
Assessing the value of data breach or loss
With data breaches and data ransom increasing, companies routinely go through a data valuation exercise
to determine how much and what kind of insurance to buy for their information assets. Here, value of the
data is defined in terms of fines, loss of customers, and cost of preventing future breaches. Note that a
company may want to insure a discrete piece of data or intellectual property. The answer, generally, is that
it cannot be done because it is difficult to value data discretely (Najjar & Kettinger, 2013).
One observer noted that the TJX Co. breach disclosed in 2007, estimated to cost the company at least
$180 million on over 46 million records, worked out to more than $4 per customer record—prompting the
question of how a theoretical insurer coming into the company ahead of time to create such a policy
would have calculated the value (Todd, 2015).
Purchasing or selling data-intensive companies
There are many examples of companies buying other companies for their data, which largely determines
their worth in the marketplace. In one example in 2016, Microsoft Corp. acquired the online professional
network LinkedIn Corp. for $26.2 billion (Microsoft, 2022). Other examples include Google’s acquisition
of YouTube ($1.7 billion, 2006), Nest ($3.2 billion, 2014), and fitbit ($2.1 billion, 2019) or Facebook’s
acquisition of Instagram ($1 billion, 2012) and WhatsApp ($22 billion, 2014).
2.4. Economic Model Examples
We found two types of studies on economic models: ones that explicitly estimate the value of open data and ones that focus on how policy creates public data value. The following reflect some economic model examples:
The economic value of earth observation from space. An assessment of the value of geospatial data for the
Australian economy examines the impact of satellite data from weather monitoring; ocean health; and
activities like oil drilling, landscape monitoring, agriculture, water monitoring, natural disaster management,
and mining. Completed in 2015, the assessment projects a total of about $3 billion (Australian) in economic, social, and environmental benefits to the Australian public (Acil Allen Consulting, 2015).
Valuing the census. A report that quantifies the benefits to New Zealand from the use of census and population information estimates the value of census data in areas such as improved health funding, reductions in use of underutilized capital investments, ability to craft more precise policy, and overall benefit to government and private sector firms. The report concludes that, despite significant difficulties in developing a rigorous quantification, census data presents a $1 billion (New Zealand) benefit to its public (<5 million population) over 25 years (Bakker, 2013).
The California Consumer Privacy Act (2018). This regulation, effective January 2020, requires businesses, when offering certain services, to document a reasonable and good faith method for calculating the value of the consumer’s data.
Taxing data. New York City is working on legislation to create a data sales tax. The proposal’s authors outline a four-step approach, with Step 1 being to “quantify the amount of data generated by New Yorkers and commercialized for profit” (Adams & Gounardes, 2020). In a similar vein, California’s governor, Gavin Newsom, in 2019 tasked a team to research a “data dividend,” a tax paid to either consumers or the state for selling individuals’ data (Ulloa, 2019).
2.5. Dimensional Model Examples
In addition to market-based and economic models for data valuation, numerous studies attempted to quantify
additional categories—or ‘dimensions’—to value data. Such dimensions were based both on the data itself
(e.g., data quality) as well as on the context within which the data was used (e.g., timeliness of delivery). We
term this approach the dimensional model.
We found different approaches used to evaluate dimensional models, including the use of mathematical
formulas, survey questions, examinations of prior studies (sometimes with new ideas), and actual attempts at
categorizing data assets. An example of applying a mathematical formula is the calculation of business value of
information used by Doug Laney (2018, p. 253) in his book Infonomics:
Business Value of Information = ∑_{p=1}^{n(p)} Relevance ∗ Validity ∗ Completeness ∗ Timeliness,
where p = the number of business process functions.
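As an illustration only (not taken from the article or from Laney's book), a minimal sketch of one way to operationalize this kind of formula is to rate each business process function's relevance, validity, completeness, and timeliness on a 0–1 scale and sum the products; the function and ratings below are hypothetical.

```python
# Hypothetical sketch of a business-value-of-information style score: each
# business process function gets 0-1 ratings, and the products are summed.
def business_value_of_information(process_ratings):
    """process_ratings: one dict of 0-1 ratings per business process function."""
    return sum(
        r["relevance"] * r["validity"] * r["completeness"] * r["timeliness"]
        for r in process_ratings
    )

# Made-up ratings for three business process functions.
ratings = [
    {"relevance": 0.9, "validity": 0.8, "completeness": 0.7, "timeliness": 0.9},
    {"relevance": 0.5, "validity": 0.8, "completeness": 0.7, "timeliness": 0.6},
    {"relevance": 0.2, "validity": 0.8, "completeness": 0.7, "timeliness": 0.4},
]
print(business_value_of_information(ratings))  # ≈ 0.67 on these made-up ratings
```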
Other models used surveys that consisted of asking respondents to value specific data set characteristics, such
as age of data, accuracy, operational value, and replacement cost. We found numerous examinations of prior
studies, some of which explored the impact to data valuation of adding new dimensions. We also found studies
that examined the application of dimensions to real-world use cases. For example, one study categorized usage
as either log analysis, identifying data consumers, or number of views/downloads, depending on the use case
(Brennan et al., 2018).
Although we have witnessed that informal evaluations using the dimensional model are frequently done, we
found limited published real-world examples. In one example, Highways England, which manages data on
roads and related infrastructure, explored how much of its £115 billion in intangible assets was attributable to
data. It mapped key data assets to business functions and their financial value, modulated by an assessment of
each data asset’s potential market value, to show that the organization’s data was worth £60 billion (Laney,
2021).
We summarize prior approaches to the dimensional model in more detail in Section 3.3.
3. Model Detail
3.1. Market-Based Model
3.1.1. Market-Based Model Overview
The key feature of this model is that it uses income or cost to value data. Based on our findings, the market-
based model of data valuation is widely practiced. Policy development regarding this model is still evolving.
Currently, it is similar to permitted valuation techniques of other intangible assets, such as patents, copyrights,
or software. In fact, the IRS guidelines for valuing intangible assets list “technical data” as one type of
intangible asset (IRS, 2020). According to the IRS, the value of an intangible asset can be determined in the
same way as for tangible assets: using a cost basis, gauging the asset’s value in the marketplace, or basing the
asset’s value on revenue potential of the asset in question.
3.1.2. Market-Based Model Strengths
Market-based models allow for the monetary valuation of data based on what the market will pay, whether
valuation is rooted in anticipated income, how much a data-oriented company might fetch in a sale, or
speculation on what the loss of data is worth.
On the cost side, market-based models calculate the cost of a data breach or loss and the cost of insurance.
Similarly, companies estimate their cost of letting competitors into the market and sometimes decide to acquire
those competitors based on the projected value of their data.
Data may even lend itself to be bought and sold on an exchange. Examples of data marketplaces for both
personal and business data are starting to appear (see, e.g., Dilmegani, 2022). This is also happening for data in
the illegal market, such as credit card and Social Security numbers. It remains to be seen whether formal
exchanges for legally traded data are sustainable.
3.1.3. Market-Based Model Challenges
One challenge with the market-based model is that factors besides data, like talent acquisition, may play a role.4
Another challenge is the small marketplace of buyers and sellers of data, resulting in a limited ability to
compare one purchase with another (e.g., most do not share their prices). In addition, while there are data
brokers that collect and then sell information, their accountability regarding the quality of information is
sometimes in question, and these markets are not transparent (Federal Trade Commission, 2014).5
Harvard Data Science Review • Issu e 5.1, Winter 2023 A Review of Data Valuation Approaches an d B uildin g and Scoring a Data Valuation Model
11
The market-based model also does not consider the value of data created by consumers, while companies in
turn receive advertising revenue for this data. There is debate about the degree to which a tax placed on the
marginal use of individual data may benefit the public versus disincentivize organizations from creating a user
information marketplace (Bergemann & Bonatti, 2019).
Market-based data valuation is also impacted by restrictions on local markets. For example, a company may
incur costs to comply with a mandate to store individual data locally. Companies must consider the risk of local
piracy, favoritism toward local competitors, and censorship. These types of local restrictions ultimately factor
into the value of data.
3.2. Economic Model
3.2.1. Economic Model Overview
The economic model values data in terms of overall economic and public benefits. Economic benefit might
look at overall job gains, while public benefit might look at social benefits such as impact to privacy, health,
and infrastructure. In some cases, data valuation using the economic model is squarely counter to using the
market-based model. For example, much work goes into evidence-based health care based on big data, which
relies on broad data from many sources, including providers, payers (i.e., insurers), and individuals (e.g., health
apps; Harwich & Lasko-Skinner, 2018). An economic model might look at the overall value of such data for
the public, while industry might leverage a market-based model to reduce costs or increase revenue for its
sector only. We found many studies on valuing data for the public good (see, e.g., Open Data Watch, 2021).
These studies are often done on behalf of governments. They estimate the value of data to the economy for the
likes of geospatial data, census data, or public sector data in general.
3.2.2. Economic Model Strengths
The strength of the economic model lies in its focus on data valuation for the public good in two types of
studies: those that estimate the value of open data and those that suggest the use of policy to drive public data
value. The former project how open data can be leveraged by both government and the private sector to
produce economic benefit. The latter discuss tweaking policy to effect those same kinds of benefits. Economic
models are being actively used to determine the value of data. For example:
Generating value through aggregating data from many sources. An example of effective data aggregation is
the U.K. Hydrographic Office’s (UKHO’s) transition from paper to digital maps on surface water and
geospatial measures. This digital conversion allowed UKHO to aggregate mapping data, include other
diverse sources, and then apply analytics. Today, in addition to the Royal Navy and defense, 90% of ships
trading internationally use this data, which generates £150 million annually (HM Treasury, 2018).
Data produced by public institutions often spurs private innovation. Weather data and transportation data
generated by the government is routinely enhanced and provided back to society either for free or for a fee
(e.g., premium versions).
Some observers maintain that economic models can spur enhanced data access by espousing pro-competitive policies, making it harder for a small number of companies to hoard such data (Coyle et al., 2020).
Economic models are exploring the valuation of personal data by extracting a tax paid either directly to end-users or to the state for use of that data.
3.2.3. Economic Model Challenges
Economic models are based on calculations with limited scope and take a long time to verify. To some extent economic models have been executed, but unlike market-based models, economic models are slow to be put into practice. This is likely because their implications—good and bad—are significant, and there is no profit motive. Governments act carefully to avoid negative implications. Examples of challenges include:
Economic models can designate data ownership. Data ownership today is not well defined. As a result, de
facto ownership is common. Creating laws regarding data ownership, such as privacy laws that assign
ownership of personal data to individuals, significantly shifts data value in favor of the owner.
Overly restrictive laws or policies that might negatively affect the value of data, discouraging competition
and wide data reuse. Such laws may lead companies to hoard data, defeating the purpose of the economic
model. As an example, the potential negative economic impact of overly restrictive privacy laws may
outweigh their intended benefit (Jones & Tonetti, 2019).
Unlike with physical goods, the flow of data is not tracked. This might entail data flow between businesses
or countries, or free services delivered to end-users, like email, search results, or driving directions.
Consequently, the data value for activities like unpaid data creation, data reuse, and cross-border flow is
difficult to include in models (Organisation for Economic Co-operation and Development [OECD], 2019;
U.S. Department of Commerce, 2016).
Economic models reflect data valuation in terms of projected financial as well as social benefits. Valuing
social benefits based on data is particularly challenging.
The purpose of intellectual property law is to maintain the balance between innovation and public good. This
has been tried with data in limited ways. For example, the U.K. Copyright and Rights in Databases
Regulations of 1997 allow copyright of databases, including contents. Some studies suggest that strong
externalities, such as the benefit of aggregating data from many sources, make copyright-like protection less
appropriate for data (Duch-Brown et al., 2017).
3.3. Dimensional Model
3.3.1. Dimensional Model Overview
Numerous studies attempt to value data via dimensions. Such dimensions are based on both the data itself (e.g., data quality, age, format) as well as the context within which the data is used (e.g., time savings, level of ownership, delivery frequency).
An early study, by Niv Ahituv (1980), examined mathematical formulas to evaluate data systems, including in
terms of timeliness (response time and frequency), level of nondesired data, value of data aggregation, format
(medium, data organization, and data representation), and ranking of data importance. A subsequent study by
the same author investigated attributes of information valuation, including timeliness, content, format, and cost
(Ahituv, 1989).
An often-cited study to value data by Daniel Moody and Peter Walsh (1999) looked at different approaches to
information value based on accounting practices, namely, cost, market value, and present value of expected
revenue potential. The authors concluded these were the most effective valuation parameters. The authors also
examined communications theory, the attempt to measure the value of information based on the amount of
information communicated. This, they correctly concluded, leaves out the value of the content and is not a
useful approach to data valuation.
More recently, Gianluigi Viscusi and Carlo Batini (2017) compiled and documented various prior studies of dimensional data valuation. This compilation reflects the use of information quality (e.g.,
accuracy, timeliness, credibility) and information structure (e.g., abstraction, codification). It reiterates the
importance of utility (financial value) as a data valuation category. In addition, the study highlights information
diffusion (e.g., scarcity, sharing) and infrastructure (e.g., abstraction, embeddedness) as key data valuation
categories.
In 2018, Douglas Laney, an analyst and author with Gartner at the time, popularized the concept of
“Infonomics” in an effort to centralize discussion on valuing data as an asset. He discussed several models, at
least two of which (intrinsic value and business value of data) involve data dimensions.
Table 2 summarizes our research into dimensional data valuation models. It is notable that some categories,
like data cost, quality, and utility, are repeated across multiple studies, suggesting that they are particularly
valuable dimensions.
Table 2. Summary of prior approaches to data valuation using the dimensional model.
Study | Data Value Categories | Conclusion
Brennan et al. (2019) Operational value, replacement cost,
competitive advantage, regulatory risk,
timeliness, secondarily ease of
measurement, and data quality
Reinforces a hierarchy of data value
dimensions—that is, utility (including
operational impact), context (including
timeliness and competitive advantage),
usage and quality, cost (including
replacement costs), and the use of
manual survey-based methods as useful
for data valuation.
Brennan et al. (2018) Usage (log analysis), cost (creation,
maintenance), quality, intrinsic value, IT
operations (surveys, trouble ticket
analysis), contextual (e.g., access
frequency, purchase cost, volume,
appropriate data quality threshold,
relevance), and utility
Monitoring data value is a necessary
prerequisite to strategic data
management. It is possible to assess the
maturity of data value monitoring
processes. Usage and cost are easiest to
implement, and utility or operational
value is the most important for
organizations.
Laney (2018) Intrinsic value (validity, completeness,
scarcity, lifecycle), business value
(relevance, validity, completeness,
timeliness), performance value (relative
key performance indicator benefit when
leveraging information assets), cost
value, market value, and economic value
Models are imperfect and have greater
utility in combination than when
standing alone. Dimensions may be
modified based on organization needs.
Models provide an indicator of
‘information asset management’
maturity, which typically results in
increased value from data.
Fleckenstein and Fellows (2018) Cost, data type (quality), maturity of
data stewardship, data architecture, and
data lifecycle
Principles from physical asset valuation
may be used to extract relevant
dimensions related to data valuation.
Harwich and Lasko-Skinner (2018) Quality, format, ability to link data, type
of data, reason of data collection,
quantity, actionability, use of data,
market capitalization, and relative cost
of getting data elsewhere
Public authorities should develop a clear
national strategy that seeks to optimize
the value of data; help the public sector
when accessed for commercial purposes;
and ensure the value of data is optimized
between data owners, the public sector,
and industry.
Viscusi and Batini (2017) Information quality (accuracy,
accessibility, completeness, currency,
reliability, timeliness, usability,
credibility, believability, reputation,
trustworthiness), information structure
(abstraction, codification, derivation,
integration), information diffusion
(scarcity, sharing), information
infrastructure abstraction,
embeddedness, evolving (timeliness),
flexibility, openness, sharing,
standardization (codification), financial
value, pertinence, transaction costs
These metrics may be useful for
measuring information value. Data
valuation analysis produced was limited
due to the complex and multidisciplinary
nature of information value. Further
studies are recommended to clarify data
valuation categories.
Nagle and Sammon (2017) Business value (cost reduction, revenue
generation, risk mitigation), acquisition
(cost and legitimate need of data), level
of integration (existing vs. needed),
analytics effectiveness, delivery (data
quality and visual impact), and level of
data governance
A data value map can be used to gain a
shared understanding.
Heckman et al. (2015) Value-based, qualitative, and cost-based
parameters
This model is a rudimentary step toward
building data valuation.
Higson and Waltho (2010) Cost/cost reduction, return on
investment, risk mitigation, data security,
data quality, utility (number of users or
applications using), business satisfaction,
and results
Argues for an asset-centric, value-based
approach to the management of
information.
Sajko et al. (2006) Quantitative dimensions (value to
business, value for other businesses, cost
of reconstruction, value of data over
time) and qualitative dimensions
(information importance and age)
Information assessment is determined by
two components: dimensions of
information value and importance or
priority of the dimensions. The value of
information is contextual.
Moody and Walsh (1999) Amount of data Found the amount of data ineffective,
since it excludes value of content.
Concluded that acquisition cost, market
value, and potential revenue are the best
indicators.
Ahituv (1989) Timeliness, contents, format, and cost The multi-attribute approach incurs challenges, including variable identification, measurement, variable relationship between each attribute and data value, and trade-offs between variables.
Ahituv (1980) Timeliness (response time and frequency), level of nondesired data, value of data aggregation, format (medium, data organization, data representation), and ranking of data importance If problems of information systems measurement can be overcome, methods of evaluation exist.
3.3.2. Dimensional Model Strengths
The dimensional model incorporates data-specific and contextual attributes like data quality and stewardship,
which other models leave out. These attributes underline the effective use of data. They are, to a large extent,
the focus of data management and maturity models, such as the Capability Maturity Model Integration Data
Maturity Model (CMMI Institute, 2022), the Federal Data Maturity Model (Data Cabinet, 2018), and the Data
Management Association’s Data Management Body of Knowledge (DAMA International, 2020). Additional
strengths of the dimensional model include:
Data dimensions are useful for the relative comparison of similar data sets.
This model lends itself well to survey questions. It allows simple and straightforward evaluation of key data
dimensions by business users.
Data dimensions can extend the valuation approach of other models. For example, the aggregation of data—
viewed as a strength of the economic model—using high-quality data is more beneficial than similar
aggregation using lower quality data. Similarly, the buying and selling of data is highly dependent on factors
such as data accuracy and timeliness.
This model fosters a standard definition of data dimensions, which will lead to wider adoption. This
reinforces investment into data management, ultimately creating better, more consistent data.
Some dimensions feed into other dimensions. For example, timeliness, accuracy, lifecycle, and others are
likely factors of both cost and utility, two of the most universal and useful dimensions. Being able to break
down and compare cost and utility in these terms allows for concise appraisals of data valuation.
3.3.3. Dimensional Model Challenges
Key challenges of this model are listed below:
Value may vary considerably based on factors such as who uses the data and for what purpose it is used. For
example, fraud detection relies on near-real-time data at the cost of data quality. Alternatively, an analysis of
purchasing history using the same data demands higher data quality and can afford significant latency.
Similar data sets may be nonfungible. In some situations, a variety of data sets may contain similar but
slightly different information and thus may hold different values. Because of this nonsimilarity, data assets
cannot always be easily compared or substituted (Yousif, 2015).
Data value in this model is often measured via survey questions. Even if we can clearly define each
dimension and how it is measured, valuation is subject to interpretation. Different survey takers may
interpret the need for, say, data quality or stewardship differently.
This model is still evolving, and the surveys we found were small. Surveys are expensive to execute. The
goal is that we can, over time, leverage a much larger data set pool to standardize and streamline survey
questions sufficiently.
While it is possible to determine the relative value of two different data sets based on dimensions, translating
that value to monetary terms likely requires the secondary application of a market-based or economic model
to a given data set.
4. Building and Scoring a Dimensional Data Valuation Model
For the second part of our research, we focused on building a dimensional data valuation model that expands on prior models.6
We designed a survey of about 30 questions around an extended set of dimensions, both intrinsic to data (e.g., data quality) and contextual (e.g., data usage). For data, we leveraged three types of data sets: COVID-19 data, flight scheduling and navigation data, and voter data. We examined two use cases:
1. how the value of one data set compares to a similar data set (flight scheduling and navigation, voter data)
2. how a given data set adds value to existing data (COVID-19 data).
To define dimensions, we leveraged the research described in Sections 2 and 3 and expanded on that research. As a result, we created questions on cost, age, and ownership and added other questions around dimensions like privacy, licensing restrictions, and volume and variety.
We used our professional data management experience to apply a score to each question, weighted that score, and, in some cases, scored a data set from different perspectives. We explain details on scoring in Section 4.2. In the case of flight scheduling and navigation data, we vetted the results with the data set owners, as they were internal to our company.
4.1. Model Design
Our aim was to create a model that was simple, not too time-consuming, and usable by multiple stakeholders, including business-side analysts, engineers, and executives. The following were our steps:
We seeded our model based on dimensions uncovered in prior research, particularly those reflected repeatedly. We also relied on our experience in data management to confirm that certain dimensions, like usage, cost, and data quality, were useful to data valuation.
We created a set of survey questions around our dimensions, which we subsequently used to score the value of sample data sets.
We scored data sets to attain a value.
We expanded on prior research by adding new dimensions, adding weighted scoring, and scoring from different perspectives.
4.1.1. Survey Design
In our research, we saw limited design and execution of dimensional models. In one instance, a team asked 16
diverse participants to evaluate their own data sets based on a standard set of questions. They determined that
certain dimensions, like operational impact, replacement cost, and timeliness, were more significant
contributors to data valuation than others, such as competitive advantage or regulatory risk (Brennan et al.,
2019).
We felt that our model could serve as an initial
evaluation and point to areas (e.g., cost, usage, data quality) that the evaluator can explore in more detail, if
desired. We strove for simplicity and speed over detailed precision. While such a model may not provide all the
answers, it can indicate the relative value of a data set, highlight potential risks, and promote informed
decisions.
Our focus was on executing data valuation against two use cases. Our initial use case was aimed at comparing
the value of similar data sets. For this, we used two similar flight scheduling and navigation data sets as well as
two similar voter data sets. From experience, we knew this to be useful to any large organization that wants to
reduce the number of similar data sources or wants to replace an existing data set with a similar but better one
(e.g., less costly, more reliable, less maintenance).
We formulated our second use case by working with several internal projects. These projects needed to
evaluate adding new data to their existing data pool. For this, we used baseline COVID-19 data sets, to which
we added additional data. Thus, our second use case became a comparison of the value of existing data versus
that of existing data plus new data. The section on Data Sets discusses our data sets in more detail.
Through our research and data mapping, we had a good idea of which dimensions mattered. Our dimensions
expanded on prior work and evolved through repeated testing and interactions with stakeholders. In the end, we
found our best results by asking questions in the dimensions of ownership, cost, utility, age, privacy, data
quality, and volume and variety. Table 3 reflects our final set of dimensions.7
Table 3. Data valuation dimensions.
Ownership: Addresses outright data set ownership plus licensing restrictions and service agreements.
Cost: Addresses the cost of data set acquisition, maintenance, and replacement.
Usage: Addresses data set mission criticality, ability to integrate, usage scope, usage frequency, metadata, additional resources, expected increase in demand, and diminishing value.
Age: Addresses refresh rate and available history.
Privacy: Addresses whether the data set contains sensitive data such as Personally Identifiable Information (PII) and Protected Health Information, and meets privacy standards.
Data quality: Addresses completeness, accuracy, currency, consistency, duplication, trustworthiness, and timeliness.
Volume and variety: Addresses the number of records, scope of information for each record, and ability to answer needed questions.
Next, we started formulating questions, and a related set of answers, within each dimension. For answers, we
assigned incremental point values, giving a single point to the answer we deemed least valuable and adding an
additional point for each answer we felt provided more value. We used our experience as researchers and our
diverse backgrounds working for both government agencies and industry to formulate the questions.
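The article does not reproduce its survey instrument; purely as a hypothetical illustration of the incremental point values described above, one question in the ownership dimension might look like the following sketch (the wording and point values are ours, not the authors').

```python
# Hypothetical example of a single survey question with incrementally scored
# answers, in the spirit of the approach described above; the question wording
# and point values are illustrative, not taken from the authors' survey.
ownership_question = {
    "dimension": "Ownership",
    "question": "Which statement best describes your rights to this data set?",
    "answers": {
        "Licensed, with significant restrictions on use and sharing": 1,  # least valuable
        "Licensed, with limited restrictions": 2,
        "Owned outright, but tied to a third-party service agreement": 3,
        "Owned outright, with no restrictions": 4,  # most valuable
    },
}
```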
We then started to apply the questions and answers to our data sets and perspectives. During this process, we
looked for redundancies, clarity, and gaps in our questions. For redundancies, we ended up removing unwanted
or duplicate questions. For lack of clarity, we rephrased the questions and answers to make them easier to read
and understand. For gaps, we added missing questions. This process was repetitive in nature and led to a
refined set of questions and answers.
4.1.2. Data Sets
For data, we leveraged three types of data sets. Specifically, for COVID-19, we leveraged cases/death rates,
testing, and vaccination data sets; for flight scheduling and navigation, we leveraged similar vendor-compiled
data sets; and for voter data, we leveraged data from two states: Ohio and North Carolina. These data sets were
either openly available (COVID-19, Johns Hopkins University, 2021; voter data, U.S. Election Assistance
Commission, 2020), or, in the case of flight scheduling and navigation data, accessible to us.
For flight scheduling, we used two owned data sets that were purchased at different times. For the navigation
data sets, we used one data set that was provided for free and a similar data set that was purchased. COVID-19
data sets were publicly available from the Johns Hopkins University Coronavirus Resource Center (JHU), and
voter data was publicly available from the U.S. Election Assistance Commission. One motivator for using both
purchased and free data sets was that we could explicitly factor the cost into our data valuation comparisons.
This allowed us, for example, to validate whether a more expensive data set had more data or higher data
quality. We used COVID-19 and voter data because it is freely available, popular, abundant, of good quality,
well suited to different perspectives, and easy to augment with variety, and it could commingle well. This
approach allowed us to examine a variety of comparisons.
4.2. Scoring
To score data valuation, we used our own experience working with industry and government. We first assigned
a raw score to each question. We based this raw score on the point value. We assigned a point value of 1 to the
answer contributing least to a data set’s value and increased the point value by one for each subsequent answer, so that the highest score went to the answer contributing most to the data set’s value. Since some questions had more
answers than others, the possible number of points was not the same for all questions. To standardize this
process, we added a conversion factor so that questions with more answers were not automatically scored
higher than those with fewer answers. Finally, we added a weight factor between 1 and 5 to each score. This
served as an indicator of the importance of a question relative to all other questions.
In cases where there were different perspectives, we allowed for different weights by perspective. We arrived at
this design through trial and error, noticing that certain dimensions—or survey questions within a dimension—
may matter more for some organizations than for others.
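As a minimal sketch of the mechanics just described (raw points, a conversion factor that normalizes questions with different numbers of answers, and weights of 1 to 5 that can vary by perspective), the following illustration uses hypothetical questions, points, and weights rather than the authors' actual values.

```python
# Illustrative sketch of weighted, normalized scoring for one data set.
# The responses, maximum points, and weights below are hypothetical; the
# normalization step is one simple way to implement the conversion factor
# described above so that questions with more answers do not dominate.
def score_data_set(responses, weights):
    """responses: (points awarded, maximum points) per question;
    weights: importance weight (1-5) per question for one perspective."""
    total = 0.0
    for (points, max_points), weight in zip(responses, weights):
        normalized = points / max_points  # conversion factor: scale each answer to 0-1
        total += weight * normalized
    return total

# Hypothetical answers for one data set (e.g., ownership, cost, data quality).
responses = [(3, 4), (2, 5), (4, 4)]
hospital_weights = [2, 3, 5]    # a hospital might weight data quality highest
government_weights = [4, 3, 3]  # a government might weight ownership more heavily

print(score_data_set(responses, hospital_weights))    # ≈ 7.7
print(score_data_set(responses, government_weights))  # ≈ 7.2
```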
We looked at the value of COVID-19 data from the perspectives of government, a hospital, JHU, and a public
service research organization. For flight scheduling and navigation data, we examined a vendor, government,
and a public service research organization.
The tables below are sample snapshots of our scoring:8
Table 4 reflects the data quality dimension for the comparison of two similar data sets, in this case flight
scheduling data. We can clearly see that data set 2 has higher data quality than data set 1. It is noteworthy
that, separate from the data quality score, the cost, usage, age, and volume and variety scores for data set 2
are also higher.
Table 5 reflects a snapshot of our COVID-19 scoring in the volume and variety dimension. Here, we reflect
how adding testing and vaccination data to COVID-19 case and death rate data increases the valuation. It is
noteworthy that, separate from the data-quality score, the usage score is much higher for the combined data.
Additionally, cost and ownership are not factors since both data sets are public under the Creative Commons license.
Table 6 shows how different organizations value the COVID-19 data set differently. Here again we show a
snapshot of volume and variety for our COVID-19 evaluation but for four different perspectives:
government, a research organization, a hospital, and JHU. These perspectives are based on our own best
guesses.
Table 6. Example of different perspectives.
Note. JHU = Johns Hopkins University Coronavirus Resource Center.
4.3. Findings
Our scoring verified typical assumptions. For example:
When comparing two similar data sets, higher cost also showed higher data quality, more usage, more
history, and greater volume and variety. This is reflected in the comparison between the two flight
scheduling data sets.
For flight navigation, data set 1 was freely licensed while data set 2 was purchased. The comparison
shows data set 2, while more expensive to acquire, rates significantly higher in usage, including the
inclusion of metadata, ease of integration with other data sets, inclusion of additional resources, and
popularity. Data set 2 also scored higher in data quality and volume and variety.
When comparing data sets that add value to existing data, the combined data sets scored higher. This is
reflected in the COVID-19 data, where combined cases/deaths and testing/vaccination data has
significantly higher usage than just the cases/deaths data. However, ours was a simple case, adding a small
data set to another, relatively small data set. We anticipate that adding a small data set to a large data pool
may not always result in this outcome.
The data sets each had their own strengths and weaknesses. This sometimes evened out valuations. For
example:
For flight navigation, data set 2 was purchased, making it more valuable than the licensed data set 1.
However, data set 1 has fewer restrictions for sharing within the organization.
For COVID-19 data, adding testing/vaccination data to cases/deaths data yields significantly more variety
and volume. However, this added volume and variety increased the cost to maintain the data sets.
Context is important. For example:
For flight scheduling data, we scored three perspectives: vendor, government, and research organization.
One of our usage questions revolved around frequency of use, which is daily for the government and the
research organization but rare for the vendor. This implies a lower value for the vendor, which is
counterintuitive since the vendor stands to profit from the data set. Thus, the vendor might give this
question a low weight or no weight at all.
For privacy, we scored for Personally Identifiable Information (PII) and whether the data set met required
privacy compliance. In the case of voter data, both data sets contained PII, which we valued higher. Such
data is useful for a variety of analyses. However, meeting privacy compliance might require an
organization to mask PII data, in which case it may value masked data higher.
The ability or desire to answer new questions for COVID-19 data likely differs across stakeholders (e.g.,
government, research organization, hospital, and JHU). While we did not engage stakeholders from each
of these organizations, we assumed that COVID-19 data sets were more likely to be used for analytics by
the government and research organizations.9
Data sets are more valuable when accompanied by additional resources. For both the navigational and flight
scheduling data, the value of the data set increased when accompanied by a complete set of metadata and
other resources, such as code, data analysis, reports, or additional lookups. The same case applies to voter
data, where one of the data sets comes with full metadata that explains all fields.
Our team experimented a lot with applying different weights. In the end, we applied weights that we thought
were reasonable. We also concluded that weights are very context specific. For example, cost may matter
much more to a particular stakeholder or in a particular context. We realized that weights may also differ by
perspective. While our weights fell between 1 and 5, we encourage users to experiment with weights in ways
that work in their context. The survey acts as a blueprint for stakeholders to register their professional
opinion on the value of data sets.
There were instances we were not able to investigate. For example:
Our scoring reflected that some dimensions mattered more than others (e.g., usage, data quality, volume
and variety). However, our sampling of data sets was small and differed in key ways (e.g., cost,
ownership). We would need to score a much larger data set sample to say with confidence that certain
dimensions or questions matter more in all cases.
As part of our study, we briefly experimented with dependencies. For example, one might start with
asking whether an organization owns a given data set and then evaluate other alternatives, such as cost,
usage, or data quality. We found that documenting such dependencies quickly leads to many complex
threads without any evidence that beginning with one dimension/question before another one is more correct than another approach.
For a given data set, raw scores that are inherent to data (e.g., data quality, privacy, volume and variety)
remained the same across stakeholders with different perspectives. Only scores for dimensions that are
separate from the data (e.g., ownership, usage, cost) change. That said, the definition of data quality, for
example, is how well data is suited for intended use. This may render a given data set a better data-quality
fit for one organization than for another. Similar concepts could be examined for other dimensions, such
as privacy and volume and variety.
We realized that the value of a given data set may differ for stakeholders. This led us first to add different
perspectives and subsequently to include weights for each perspective. It also led us to unanswered
questions. For example, with COVID-19 data, JHU obtains data, wrangles it, and then makes it freely
available to others who do not have to wrangle the data in the same way. However, folding data wrangling
into acquisition cost proved difficult, since we value freely acquired data as more valuable in the cost
dimension. We tried to reverse valuation scoring here, giving the highest score to data that is costly to
acquire. While this solved the problem for the COVID-19 model with different perspectives, the approach
did not hold for other data set valuations. We were not able to solve this paradox of highly valuing free
data while also accounting for the value of sunk cost.
We were able to determine the relative value of two different data sets based on dimensions using a score-
based approach. Translating that value into monetary terms likely requires the secondary application of a
market-based or economic model to a given data set.
We anticipate that, given a sufficient database of survey responses, it will be possible to apply artificial
intelligence and machine learning to these surveys so that they can be more automatically completed. We
understand that this requires logging many additional use cases.
market-based models, which calculate data’s value in terms of cost and revenue/profit
economic models, which estimate data’s value in terms of economic and public benefit
dimensional models, which value based on data dimensions like data quality and ownership—both data-
specific and contextual
Harvard Data Science Review • Issu e 5.1, Winter 2023 A Review of Data Valuation Approaches an d B uildin g and Scoring a Data Valuation Model
27
For the second part of our research, we built a simple tool that we think can help organizations quickly and
proficiently assess the value of data sets for specific use cases using a small, consistent set of dimensions. We
focused on the dimensional model since this model allowed us to score the value of two similar data sets or the
value of adding a data set to an existing data pool. Based on our experience working with several internal
projects, these use cases reflect real-world needs. We evaluated these two use cases against four different
data sets, and we examined multiple perspectives against two of those data sets.
Our model shows that dimensions can be used effectively to compare two similar data sets or to evaluate the
addition of a data set to an existing data pool. Our model also demonstrates that context and perspective matter,
based on factors such as how the data set can and will be used. The dimensional model falls short of being able to
value data in monetary terms. This likely requires the additional application of a market-based or economic
model. Our model expands on previous dimensional models by suggesting a larger set of data valuation
dimensions and applying weighting and perspectives to scoring. The model will benefit from being applied
against additional data sets and use cases and then being subsequently evolved.
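As one purely illustrative way such a secondary application might look, the sketch below anchors a relative dimensional score to the known market price of a comparable data set. The function, prices, and scores are hypothetical assumptions, not part of our model.

```python
# A speculative sketch only: translating a relative dimensional score into a monetary
# estimate by anchoring it to a market-based reference price. This is not a method
# proposed in the article; all names and numbers are hypothetical.

def monetary_estimate(score: float, reference_score: float, reference_price: float) -> float:
    """Scale a known market price for a comparable data set by the ratio of dimensional scores."""
    if reference_score <= 0:
        raise ValueError("Reference score must be positive.")
    return reference_price * (score / reference_score)

# Hypothetical: a comparable licensed data set sells for $50,000 and scored 3.2 in the model,
# while the data set being valued scored 4.1 (yields roughly $64,000).
print(f"${monetary_estimate(score=4.1, reference_score=3.2, reference_price=50_000):,.0f}")
```

In practice, such an anchoring step would inherit all of the caveats of the market-based and economic models discussed in the first part of this article.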
The demand for data valuation is growing rapidly.10 We see our research as one step toward a data valuation
methodology that includes survey questions, feedback loops, and, eventually, a maturity model. Our intent is
to expand our work to more data sets, both to verify and to enhance our model. We would also like to
investigate the items noted above that we were not able to address in our current research, and to explore
perspectives more deeply by, for example, working more directly with the JHU team. We plan to make all
details of our current model, as well as future versions, publicly available for others to leverage, collaborate
on, and enhance.
Acknowledgments
We thank Dr. Nitin Naik and Dr. Kris Rosjford for useful insights and discussions. We thank the MITRE
Corporation Innovation Program (MIP) for funding this research.
Disclosure Statement
The views, opinions, and/or findings contained in this report are those of The MITRE Corporation and should
not be construed as an official government position, policy, or decision, unless designated by other
documentation.
Approved for Public Release. Distribution Unlimited. Public Release Case Number: 21-3464.
References
Acil Allen Consulting. (2015, December). The value of earth observations from space to Australia. Spatial
Information Systems Research Ltd. https://www.crcsi.com.au/assets/Program-2/The-Value-of-Earth-
Observations-from-Space-to-Australia-ACIL-Allen-FINAL-20151207.pdf
Adams, E., & Gounardes, A. (2020, June 1). A tax on data could fix New York’s budget. The Wall Street
Journal. https://www.wsj.com/articles/a-tax-on-data-could-fix-new-yorks-budget-11591053159
Ahituv, N. (1980, January). A systematic approach towards assessing the value of an information system. MIS
Quarterly, 4(4), 61–75. https://doi.org/10.2307/248961
Ahituv, N. (1989). Assessing the value of information: Problems and approaches. ICIS 1989 Proceedings, 45.
https://aisel.aisnet.org/cgi/viewcontent.cgi?article=1007&context=icis1989
Bakker, C. (2013, April). Valuing the census. Statistics New Zealand.
https://www.stats.govt.nz/assets/Research/Valuing-the-Census/valuing-the-census.pdf
Bergemann, D., & Bonatti, A. (2019, March). The economics of social data: An introduction (Cowles
Foundation Discussion Paper No. 2171). Yale University Cowles Foundation for Research in Economics.
https://cowles.yale.edu/sites/default/files/files/pub/d21/d2171.pdf
Brennan, R., Attard, J., & Helfert, M. (2018). Management of data value chains, a value monitoring capability
maturity model. In Proceedings of the 20th International Conference on Enterprise Information Systems: Vol.
2: ICEIS (pp. 573–584). SCITEPRESS. https://doi.org/10.5220/0006684805730584
Brennan, R., Attard, J., Petkov, P., Nagle, T., & Helfert, M. (2019). Exploring data value assessment: A survey
method and investigation of the perceived relative importance of data value dimensions. In Proceedings of the
21st International Conference on Enterprise Information Systems: Vol. 1: ICEIS (pp. 200–207). SCITEPRESS.
https://doi.org/10.5220/0007723402000207
Brynjolfsson, E., Collis, A., & Eggers, F. (2019, March 26). Using massive online choice experiments to
measure changes in well-being. PNAS, 116(15), 7250–7255. https://doi.org/10.1073/pnas.1815663116
California Consumer Privacy Act. (2018). https://leginfo.legislature.ca.gov/faces/codes_displayText.xhtml?division=3.&part=4.&lawCode=CIV&title=1.81.5
CMMI Institute. (2022). Data Management Maturity (DMM). https://stage.cmmiinstitute.com/dmm
Coyle, D., Diepeveen, S., Wdowin, J., Tennison, J., & Kay, L. (2020, February). The value of data. Bennett
Institute for Public Policy, University of Cambridge. https://www.bennettinstitute.cam.ac.uk/wp-
content/uploads/2020/12/Value_of_data_Policy_Implications_Report_26_Feb_ok4noWn.pdf
DAMA International. (2020, July). Body of knowledge. https://www.dama.org/content/body-knowledge
Data Cabinet. (2018, October). The federal government data maturity model. GSA.
https://my.usgs.gov/confluence/download/attachments/624464994/Federal%20Government%20Data%20Matur
ity%20Model.pdf?api=v2
Deloitte. (2013, May). Market assessment of public sector information. Department for Business Innovation &
Skills.
https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/198905/bis-
13-743-market-assessment-of-public-sector-information.pdf
Dilmegani, C. (2022, September 12). Data marketplaces: What, why, how, types, benefits, vendors. AI
Multiple. https://research.aimultiple.com/data-marketplace/
Duch-Brown, N., Martens, B., & Mueller-Langer, F. (2017). The economics of ownership, access and trade in
digital data (JRC Digital Economy Working Paper No. 2017-01). EU Commission. https://joint-research-
centre.ec.europa.eu/system/files/2017-03/jrc104756.pdf
Federal Trade Commission. (2014, May). Data brokers—A call for transparency and accountability.
https://www.ftc.gov/system/files/documents/reports/data-brokers-call-transparency-accountability-report-
federal-trade-commission-may-2014/140527databrokerreport.pdf
Fleckenstein, M., & Fellows, L. (2018). Modern Data Strategy. Springer.
https://link.springer.com/book/10.1007/978-3-319-68993-7
General Services Administration. (2021, October 14). Office of Shared Solutions and Performance
Improvement (OSSPI); Chief Data Officers Council (CDO); Request for Information on Behalf of the Federal
Chief Data Officers Council, 86 Fed. Reg. 57147, 57147–57149.
https://www.federalregister.gov/documents/2021/10/14/2021-22267/office-of-shared-solutions-and-
performance-improvement-osspi-chief-data-officers-council-cdo-request
Harwich, E., & Lasko-Skinner, R. (2018, December). Making NHS data work for everyone. Reform.
https://www.abhi.org.uk/media/2272/reform-nhsdata.pdf
Heckman, J. R., Boehmer, E. L., Peters, E. H., Davaloo, M., & Kurup, N. G. (2015). A pricing model for data
markets. In iConference 2015 Proceedings. Core. https://core.ac.uk/download/pdf/158298935.pdf
Hempel, J. (2017, March 14). Now we know why Microsoft bought LinkedIn. Wired.
https://www.wired.com/2017/03/now-we-know-why-microsoft-bought-linkedin/
Higson, C., & Waltho, D. (2010). Valuing information as an asset. EURIM.
http://faculty.london.edu/chigson/research/InformationAsset.pdf
HM Treasury. (2018, October 29). Getting smart about intellectual property and intangible assets.
https://www.gov.uk/government/publications/getting-smart-about-intellectual-property-and-intangible-assets
Internal Revenue Service. (2020, September 22). Internal revenue manual – 4.48.5 Intangible property
valuation guidelines. http://www.irs.gov/irm/part4/irm_04-048-005.html
Johns Hopkins University. (2021, October). Coronavirus Resource Center.
https://coronavirus.jhu.edu/about/how-to-use-our-data
Jones, C. I., & Tonetti, C. (2019, August). Nonrivalry and the economics of data (Working Paper No. 3716).
Stanford University. https://www.gsb.stanford.edu/faculty-research/working-papers/nonrivalry-economics-data
Keller, S. A., Shipp, S., Schroeder, A., & Korkmaz, G. (2020, February 21). Doing data science: A framework
and case study. Harvard Data Science Review. https://hdsr.mitpress.mit.edu/pub/hnptx6lq/release/10
Laney, D. (2018). Infonomics: How to monetize, manage, and measure information as an asset for competitive
advantage. Gartner Research.
Laney, D. (2021, February 1). Data valuation paves the road to the future for Highways England. Forbes.
https://www.forbes.com/sites/douglaslaney/2021/02/01/data-valuation-paves-the-road-to-the-future-for-
highways-england/?sh=88d6039612c0
Microsoft. (2022). Microsoft buys LinkedIn. https://news.microsoft.com/announcement/microsoft-buys-
linkedin/#:~:text=Microsoft's%20%2426.2%2Dbillion%20acquisition%20of,software%2C%20such%20as%20
Office%20365.&text=LinkedIn%20retained%20its%20distinct%20brand,to%20Microsoft%20CEO%20Satya
%20Nadella
Moody, D., & Walsh, P. (1999). Measuring the value of information – An asset valuation approach. ECIS.
https://www.semanticscholar.org/paper/Measuring-the-Value-Of-Information-An-Asset-Moody-
Walsh/bc8ee8f7e8509db17e85f8108d41ef3bed5f13cc
Nagle, T., & Sammon, D. (2017). The data value map: A framework for developing shared understanding on
data initiatives. In ECIS 2017: Proceedings of the 25th European Conference on Information Systems (pp.
1439–1452). ECIS. https://aisel.aisnet.org/ecis2017_rp/93
Najjar, M. S., & Kettinger, W. J. (2013). Data monetization: Lessons from a retailer’s journey. MIS Quarterly
Executive, 12(4), Article 4. https://aisel.aisnet.org/misqe/vol12/iss4/4
Nash, K. S. (2014, June 13). CIOs consider putting a price tag on data. CIO.
https://www.cio.com/article/291030/leadership-management-cios-consider-putting-a-price-tag-on-data.html
NEJM Catalyst. (2018, January 1). Healthcare big data and the promise of value-based care.
https://catalyst.nejm.org/doi/full/10.1056/CAT.18.0290
Office of Fair Trading. (2006, December). The commercial use of public information.
https://webarchive.nationalarchives.gov.uk/ukgwa/20140402164714/http://www.oft.gov.uk/OFTwork/publicatio
ns/publication-categories/reports/consumer-protection/oft861
Open Data Watch. (2021, October). Value of data inventory.
https://docs.google.com/spreadsheets/d/1QRNZUKIrwKxq7J6EEfA6fRLpjYUevaNDpXMbwqx_Ogw/edit#gi
d=37279104
Organisation for Economic Co-operation and Development. (2019). Measuring the digital transformation: A
roadmap for the future. OECD Publishing. https://doi.org/10.1787/9789264311992-en
Organisation for Economic Co-operation and Development, World Trade Organization, & International
Monetary Fund. (2020). Handbook on measuring digital trade (Version 1).
https://www.oecd.org/sdd/its/Handbook-on-Measuring-Digital-Trade-Version-1.pdf
Own Your Own Data Act, S. 806, 116th Congress. (2019). https://www.congress.gov/bill/116th-
congress/senate-bill/806
Price Waterhouse Coopers. (2019, April 4). Leading organizations don’t just have a data strategy, they have a
data trust strategy. https://www.pwc.com/gx/en/news-room/press-releases/2019/digital-trust-insights-data-
trust.html
Ritter, J., & Mayer, A. (2018). Regulating data as property: A new construct for moving forward. Duke Law &
Technology Review, 16(1), 220–277. https://scholarship.law.duke.edu/dltr/vol16/iss1/7/
Sajko, M., Rabuzin, K., & Bača, M. (2006). How to calculate information value for effective security risk
assessment. Journal of Information and Organizational Sciences, 30(2).
https://jios.foi.hr/index.php/jios/article/view/22
Shelton Leipzig, D. (2019). Transform – Data as a pre-tangible asset for a post-data world: The leader’s
playbook.
Short, J. E., & Todd, S. (2017, March 3). What’s your data worth? MIT Sloan Management Review.
https://sloanreview.mit.edu/article/whats-your-data-worth/
Steele, M. L. (2017, May). The great failure of the IPXI experiment. Cornell Law Review, 102(4), Article 5.
https://scholarship.law.cornell.edu/cgi/viewcontent.cgi?article=4730&context=clr
Taylor, L. (2016). The ethics of big data as a public good: Which public? Whose good? Philosophical
Transactions of the Royal Society A, 374(2083). https://doi.org/10.1098/rsta.2016.0126
Todd, S. (2015, August 11). Insurance and data value. Information Playground.
https://stevetodd.typepad.com/my_weblog/2015/08/insurance-and-data-value.html
U.K. Copyright and Rights in Databases Regulations of 1997. (1997). U.K. Statutory Instruments, No. 3032.
http://www.legislation.gov.uk/uksi/1997/3032/contents/made
Ulloa, J. (2019, May 5). Newsom wants companies collecting personal data to share the wealth with
Californians. The Los Angeles Times. https://www.latimes.com/politics/la-pol-ca-gavin-newsom-california-data-
dividend-20190505-story.html
U.S. Department of Commerce. (2016, September 30). Measuring the value of cross-border data flows.
https://www.commerce.gov/data-and-reports/reports/2016/09/measuring-value-cross-border-data-flows
U.S. Election Assistance Commission. (2020, October 29). Availability of state voter file and confidential
information. https://www.eac.gov/sites/default/files/voters/Available_Voter_File_Information.pdf
Viscusi, G., & Batini, C. (2017, March 28). Digital information asset evaluation: Characteristics and
dimensions (Working Paper). EPFL and University of Milano-Bicocca.
Yousif, M. (2015). The rise of data capital. IEEE Cloud Computing, 2(2), 4–4.
https://doi.org/10.1109/MCC.2015.39
Appendices
Appendix A. Summary of Model Strengths and Weaknesses
Table A1 summarizes the strengths and challenges of each model.
Table A1. Model strengths and challenges.

All models
Strengths: De facto ownership of data, as the often-allowable norm, incentivizes data markets (Duch-Brown et al., 2017).
Challenges: Standards to determine data value do not exist. Data value may be inadequately reflected due to the lack of intellectual property protection (OECD et al., 2020). Data value is highly context dependent; the same data set may be valued differently for different use cases. Data valuation is speculative. Legal ownership of data is not yet clearly defined (Duch-Brown et al., 2017; Own Your Own Data Act, 2019).

Market-based model (income based, cost based, and stock market based)
Strengths: Easy data valuation in terms of potential revenue or profit. Free services in exchange for data incentivize data markets. Free data collection has led to heavily used free services and products, indirectly attributing value to such data (Brynjolfsson et al., 2019).11,12 Data exchange markets could minimize transactional costs while maintaining competition (Steele, 2017). Regularly used for calculating the cost of security and insurance. Easy use of stock price as an indicator for valuing data at data-intensive firms.
Challenges: There are not sufficient buyers and sellers to ensure that transaction prices are closely related to economic value (Coyle et al., 2020). Lack of compensation for freely provided data; individual cost is not fully recognized (Bergemann & Bonatti, 2019).13 The marketplace is very small, and data valuation is limited to data-intensive firms.14 Inability or unwillingness to enter local markets due to legal restrictions (e.g., local storage, privacy, censorship, favoritism, piracy, hacking; U.S. Department of Commerce, 2016). May reflect other factors besides data (e.g., talent acquisition).

Economic model
Strengths: Has the ability to positively impact data value for the public sector through policies or laws (e.g., by fostering competition; Coyle et al., 2020). May contribute to direct or indirect public income through a data dividend or taxation (Adams & Gounardes, 2020; Shelton Leipzig, 2019). Provides societal benefits through externalities, such as open data (Taylor, 2016), broad data aggregation, and data privacy.
Challenges: The value of data is based on contingencies such as projected use of data and job increases.15 The value of data in activities like unpaid data creation, data reuse, and cross-border flows may be difficult to measure (U.S. Department of Commerce, 2016; OECD, 2019). Policies and laws can negatively affect the value of data by discouraging competition and wide data reuse (Jones & Tonetti, 2019). Policies vary from one location (e.g., country) to another.

Dimensional model
Strengths: Classification-based data valuation may be useful for relative data value comparisons within a given context when there are no pricing options. Can be used in combination with other models to enhance those models. May be able to apply a standard category hierarchy to aid data value determination (Brennan et al., 2018; Jones & Tonetti, 2019; Sajko et al., 2006).
Challenges: Classification-based data valuation can be complex and is highly context dependent (Short & Todd, 2017; Viscusi & Batini, 2017). Data value estimates based on survey questions can be inconsistent over time (Brennan et al., 2019).

Appendix B. Key Differences and Similarities Between Data and Traditional Assets
One of the things that makes data valuation particularly difficult is that data is, in some ways, different from physical assets. For example:
Data is nonrivalrous, as it can be consumed simultaneously by multiple parties. However, this must be seen in context, as others argue that data value can be diminished through broad consumption (Nash, 2014).
Data is an intermediate good. It reveals ways in which to derive value from other assets.
Data is freely generated and traded. Personal data that individuals provide to companies for free may include demographics, financial data, health data, activity data, consumption data, and more. The discussion of the pros and cons of corporate use of freely provided data, as well as of taxing this data (Adams & Gounardes, 2020), is evolving.
Data valuation is also similar to valuing physical assets in some ways. Below, we highlight some of these similarities:
Data ownership is an evolving concept. Some locations, like the European Union (EU; Duch-Brown et al., 2017)16 and the United Kingdom,17 have passed database copyright laws. Mostly, there have been only studies, calls for guidelines, and proposals on data ownership (Ritter & Mayer, 2018).
Data value is impacted by externalities. Data often gains value from being combined with other data. This has been shown, for example, with the improved ability to diagnose health problems.18
Data value is impacted by law. Through regulation, the law makes certain data, particularly personal and sensitive data, less accessible. This forces companies to treat personal and sensitive data in more costly ways and likely increases its value.
Data value is impacted by exclusivity. There is much debate about data ownership, particularly given the vast amounts of data gathered freely from individuals by a handful of very large companies (e.g., Google, Facebook, Apple, Amazon). While these companies offer valuable services and products, sometimes for free, they also create barriers to entry, raising questions about anticompetitive practices.
©2023 The MITRE Corporation. All rights reserved.
Footnotes
1. Calculated based on $70 billion (2019 revenue) divided by 2.6 billion (number of monthly users) =
$26.92. ↩
2. Calculations were based on an adapted bottom-up methodology outlined by the Office of Fair Trading
(2006). ↩
3. Information asymmetries occur when two parties have different information about the same thing. The party
with additional inside information can leverage or take advantage of it. ↩
4. See, for example, speculation on Reid Hoffman’s value to Microsoft as part of its LinkedIn acquisition in
Hempel (2017). ↩
5. Examples of data brokers include Acxiom, Corelogic, Datalogix, eBureau, ID Analytics, Intelius,
PeekYou, Rapleaf, and Recorded Future. ↩
6. For the second part of our research, we were joined by Dr. Nektaria Tryfona. ↩
7. For the full set of survey questions, please contact the authors. ↩
8. The full scoring is available from the authors upon request. ↩
9. We assumed that hospitals were less focused on analytics and more on health care. In the case of JHU, we
assumed that its primary function is to gather COVID-19 data and leave it to others to perform analytics. ↩
10. For example, the Federal Chief Data Officers Council recently released a Request for Information in
which it dedicated a section to data “Value and Maturity.” See General Services Administration (2021). ↩
11. Note that Brynjolfsson et al. (2019) explore individuals’ remuneration to forgo a social media platform
such as Facebook. ↩
12. Research indicates that free data-based products provide extensive value to consumers, which can be
estimated from the quality-adjusted prices of devices (phones and computers) and the intensity of their data usage.
↩
13. Note that Bergemann and Bonatti (2019) frame the individual cost of social data gleaned from many
individuals primarily as a drawback: organizations receive advertising revenue without passing the full value of
that data back to consumers. They acknowledge that a tax, similar to a carbon tax, placed on the marginal use of
individual data may disincentivize organizations from creating a user information marketplace, and it is
unclear whether this would be a benefit or a detriment to consumers. ↩
14. There is a lack of transparency because data firms do not provide a cost-based map of their products.
They have set prices, and the market determines if those prices are right. ↩
15. Economic models typically project the value of data made available for public consumption. It is
unclear to what extent the current value of data can be determined using economic models. ↩
16. As of 2017, partial and limited ownership rights to data were defined in the EU Database Directive
(1996) and the General Data Protection Regulation (2016), combined with some provisions in the Trade
Secrets Protection Directive (2016) and in general contract law. ↩
17. See U.K. Copyright and Rights in Databases Regulations of 1997, which allow copyright of databases. ↩
18. See, for example, various approaches to harnessing big data, including from personal health devices, in
NEJM Catalyst (2018). ↩