Detecting price and search discrimination on the Internet
Jakub Mikians (Universitat Politecnica de Catalunya), László Gyarmati, Vijay Erramilli, Nikolaos Laoutaris (Telefonica Research)

Abstract
Price discrimination, setting the price of a given product for
each customer individually according to his valuation for
it, can benefit from extensive information collected online
on the customers and thus contribute to the profitability of
e-commerce services. Another way to discriminate among
customers with different willingness to pay is to steer them
towards different sets of products when they search within
a product category (i.e., search discrimination). Our main
contribution in this paper is to empirically demonstrate the
existence of signs of both price and search discrimination
on the Internet, and to uncover the information vectors used
to facilitate them. Supported by our findings, we outline the
design of a large-scale, distributed watchdog system that al-
lows users to detect discriminatory practices.
Categories and Subject Descriptors
I2 [Information Systems]: World Wide Web
General Terms
Economics, Privacy, Search, E-Commerce, Price Discrimination, Search Discrimination
1. Introduction

The predominant economic model behind most Inter-
net services is to offer the service for free, attract users,
collect information about and monitor these users, and
monetize this information. The collection of personal in-
formation is done using increasingly sophisticated mech-
anisms [12] and this has attracted the attention of pri-
vacy advocates, regulators, and the mainstream media.
A natural question to ask is: what is done with all the
collected information? The popular answer is that the information is increasingly being used to drive targeted advertising.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. HotNets '12, October 29–30, 2012, Seattle, WA, USA. Copyright 2012 ACM 978-1-4503-1776-4/10/12 ...$10.00.
Another hypothesis put forward for the wide-scale
collection of information, and the related “erosion of
privacy” is to facilitate price discrimination [14]. Price
discrimination1is defined as the ability to price a prod-
uct on a per customer basis, mostly using personal at-
tributes of the customer. The collected information can
be used to estimate the price a customer is willing to
pay. Thus, it can have a huge impact on the e-commerce
business, whose estimated market size is $961B [9]. The
question we deal with in this paper is, “does price dis-
crimination, facilitated by personal information, exist
on the Internet?”. In addition to price discrimination,
users can also be subjected to search discrimination,
when users with a particular profile are steered towards
appropriately priced products.
Detecting price or search discrimination online is not
trivial. First, we need to decide which information vec-
tors are relevant and can cause or trigger discrimination,
if it exists. We look into three distinct vectors: techno-
logical differences, geographical location, and personal
information (Sec. 3). For system-based differences, the question is whether the underlying system used to query for prices makes a difference. For location, we check
whether the price for exactly the same product, sold by
the same online site at the same time, differs based on
the location of the originating query. And for personal
information, we are interested in whether there is a difference in prices shown to users who have certain traits (affluent vs.
budget conscious). Second, we need to be able to finely
control the information that is exposed while searching
for price or search discrimination, to claim causality.
In order to uncover price/search discrimination while
addressing these concerns, we develop a comprehensive
methodology and build a distributed measurement sys-
tem based on the methodology.
Using our distributed infrastructure, we collect data
from multiple vantage points over a period of 20 days
(early July 2012), on a set of 200 online vendors. Our
¹Price discrimination is an established term in the economics
literature and we use it as such. It does not imply any opin-
ions of the authors regarding price setting policies of any
third parties.
main results are:
• We find no evidence of price/search discrimination for system-based differences; i.e., different OS/browser combinations do not seem to impact prices.
• We find price differences based on the geographical location of the customer, primarily on digital products (e-books and video games), of up to 166%. In addition, we also see price differences for products on a popular office-supplies vendor site when the queries originate from different locations within the same state (MA, USA). However, we cannot claim with certainty that these differences are due to price discrimination, since digital rights costs or competition could offer alternative interpretations.
• When we use trained personas that possess certain attributes (affluent, budget conscious), we find evidence of search discrimination. For some products, the prices shown were up to 4 times higher for affluent than for budget-conscious customers. We also observe this on a popular online hotels/tickets site.
• We find signs of price discrimination when we consider the origin URL of the user. For some product categories, when a user visits a vendor site via a discount aggregator site, the prices can be 23% lower compared to visiting the same vendor site directly.
2. Price and Search Discrimination

Price Discrimination. Price discrimination is the
practice of pricing the same product differently to
different buyers, depending on the maximum price
(reservation price) that each respective buyer is willing
to pay. For example, Alice and Bob want to buy the
same type of computer monitor and visit the same
e-commerce site at approximately the same time. Alice
receives $179 as the price while Bob gets $199. The
seller offers different prices to them by profiling them
(see Sec. 3.4 for details) and realizing that Alice has
already visited many electronics websites and therefore
might be more price sensitive than Bob.
From an economics point of view, price discrimination
is the optimal method of pricing and increases social
welfare [19, 3, 13]. Despite its theoretical merits, buy-
ers generally dislike paying different prices than their
peers for the same product/service. From a legal point
of view, the Robinson-Patman Act prohibits price dis-
crimination in the US under certain circumstances [2]
but the possibility remains largely open in the currently unregulated cross-border electronic retail market on the Internet. Recently, a new congressional bill aims to make
price discrimination on the Internet transparent to end
users [16].
Historically, price discrimination has been practiced
in myriad industries such as the US railways in the 19th
century, flight tickets, personal computers and printers,
and college fees [14]. Besides these examples, some mi-
nor instances of price discrimination have emerged in
the last decade on the Internet as well, e.g., Amazon
showed different prices to customers [17], and more re-
cently, Orbitz displayed search results in different or-
ders to some groups of customers [18]. We emphasize
that price discrimination and price dispersion² are dif-
ferent concepts. Price dispersion occurs when the same
product has different prices across different stores for
reasons other than the intrinsic value of the product,
e.g., because one store wants to reduce its stock or has
had a better deal with the manufacturer.
Search Discrimination. Another way to extract more
revenue from buyers with a higher willingness to pay is
to return more expensive products when they search
within a product category. Search discrimination is dif-
ferent from price discrimination because instead of op-
erating on one product, it operates on multiple prod-
ucts trying to steer buyers towards an appropriate price
range. Ranking of search results greatly impacts the
result eventually chosen by the user; users seldom go
beyond the first page of results [11]. Hence the search
provider, whether a generic search engine or the search facility of an e-commerce site, is in a position to enable such discrimination. For example, Alice and Bob are searching for
a hotel in Redmond during the same days and for the
same type of room. Their searches are launched at ap-
proximately the same time. A booking site offers Al-
ice three hotels with prices $180, $200, and $220, while
Bob receives quotes from a slightly different set of hotels
with prices $160, $180, and $200. This can happen if the
site has access to historic data that indicates that Al-
ice tends to stay in more expensive hotels, or by other
means such as system information [18]. While search
personalization is not entirely new³, in this paper we draw attention to its economic ramifications, and in particular study whether the information vectors that cause price discrimination also play a role in search discrimination.

3. Methodology

Information leading to discrimination. In order to
detect discrimination—price or search—we first need
to fix the different axes along which the discrimination
can take place. We consider three distinct sources of information:
• Technological/system-based differences: Does the combination of OS and/or browser lead to being offered different prices?
• Geographic location: Does the location of the originating query, for the same product and from the same vendor/site, play a role? Note that we are not interested in the same product sold via local affiliates; for instance, Amazon has sites in multiple countries, often selling the same products.
• Personal information: Does personal information, collected and inferred via behavioral tracking methods, impact prices? For instance, does an 'affluent' user see higher prices for the same product than a 'budget-conscious' user?
³With new implications being discovered; see, for instance, the Filter Bubble concept [6].
Requirements of the system. Based on the defini-
tion of price and search discrimination, as well as the
axes along which we seek to uncover discrimination, we
set the following requirements for our methodology:
• Sanitary and controlled system: In order to attribute causality, we need clean, sanitary, and controlled systems. We should be able to test one of the axes described above while keeping the others fixed. For all our measurements, we keep time fixed, i.e., we request all price quotations at nearly the same time.
• Distributed system: In order to have indicative results, we need a distributed system where we can collect measurements from multiple vantage points.
• Automated: To scale the study in terms of customers and vendors, we need to automate the process.
The test that we employ while searching for price dis-
crimination is to select a website, an associated product,
and then study whether the website returns dynamic
prices based on who the potential buyer is. In all the
experiments, we compare the results (price or search)
retrieved simultaneously to exclude the impact of time
from the analyses, i.e., all measurements for a single
product happen within a small time window.
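This test can be made concrete with a small sketch. It assumes quotes have already been collected as (product, client, timestamp, price) tuples; the window size and field names are our own illustrative choices, not part of the paper's implementation. A product is flagged only when two clients saw different prices inside the same small time window, so that price changes over time are not mistaken for discrimination.

```python
# Sketch of the core discrimination test on pre-collected quotes.
from collections import defaultdict

WINDOW = 60  # seconds; quotes farther apart in time are never compared


def flag_discrimination(quotes):
    """quotes: list of (product, client, timestamp, price) tuples.
    Returns the set of products whose price differed across clients
    inside a single time window."""
    by_product = defaultdict(list)
    for product, client, ts, price in quotes:
        by_product[product].append((ts, client, price))
    flagged = set()
    for product, obs in by_product.items():
        obs.sort()  # chronological order
        for i, (ts_i, _, price_i) in enumerate(obs):
            for ts_j, _, price_j in obs[i + 1:]:
                if ts_j - ts_i > WINDOW:
                    break  # outside the window: time, not the client, may explain it
                if price_i != price_j:
                    flagged.add(product)
    return flagged
```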
3.1 Generic measurement framework
We have developed a measurement framework that
uses three components: browsers, a measurement server, and a proxy server.⁴ The browsers run on
separate clean local machines, with the possibility to
run over different OSes. To access the pages, we use
a JavaScript (JS) application that loads the pages in
separate IFrames. We use browsers and JS to ensure we
can browse sites that need full features (as opposed to
issuing wget’s) and to ensure cross-browser compliance.
The measurement server controls the JS robot.
Role of the Proxy. We used a proxy for three rea-
sons: (i) We are interested in extracting prices embed-
ded in the pages. Unfortunately JS cannot access and
store the content of the opened pages due to its internal
Same Origin Policy. Hence we configured the browsers
to use the proxy server. The proxy then monitored and
stored all the traffic going through it. (ii) Some of the destination sites did not open in an iframe because they set X-Frame-Options in the HTTP response headers. The proxy modified the headers on the fly so the option was removed before the page reached the browser. (iii) The proxies allowed us to add additional privacy features, e.g., setting the Do Not Track option
in HTTP headers. In order to mimic behavior of users
for sites that need interaction, we used iMacro [10].
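The header rewriting can be summarized in a minimal sketch. The real system is a modified Privoxy [1], not this Python code; the two functions below only illustrate the transformations described above, operating on plain header dictionaries.

```python
# Sketch of the proxy's header rewriting (the real proxy is a modified Privoxy).

def rewrite_response_headers(headers):
    """Drop X-Frame-Options so vendor pages load inside the measurement iframes."""
    return {k: v for k, v in headers.items()
            if k.lower() != "x-frame-options"}


def rewrite_request_headers(headers):
    """Strip Referer (hiding the measurement server) and assert Do Not Track."""
    out = {k: v for k, v in headers.items() if k.lower() != "referer"}
    out["DNT"] = "1"
    return out
```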
Ensuring a Sanitary Environment. We made an ef-
fort to prevent any permanent data from being stored
in the browser, and thus allowing tracking of the user.
The proxy layer allowed us to remove the “Referer” field in the HTTP header that would point to the measurement server, and to block pixel bugs [1].

⁴We modified Privoxy [1].

Figure 1: Presence of third-party resources on the sites used for training personas.

All the browsers
were configured to block 3rd party cookies, commonly
used for tracking, and we also dealt with flash cookies.
Additionally, after each measurement round we deleted
the files that might have stored the browsers’ state. This
restrictive configuration was used for both the system-
and the location-based studies.
3.2 System-based measurement specifics
We compared prices of various products accessed
from different browsers running on different OSes, from
a single geographical location (Barcelona, Spain). We
used three systems: Windows 7 Professional, Ubuntu
Linux 12.04 and Mac OS X 10.7 Lion with browsers:
Firefox 14.0, Google Chrome 20.0 (for all the systems),
Safari 5.1 (for OS X) and Internet Explorer 9.0
(Windows). Since we have fixed time and location and
prevented identity information leakage, we can attribute any price differences to the employed system.
3.3 Location measurement specifics
To investigate the impact of a customer’s geograph-
ical location on the prices she receives, we deployed
several proxy servers at different Planetlab nodes. We
chose 6 distinct sites: two in the US (east and west
coast), Germany, Spain, Korea, and Brazil. For this ex-
periment, we used 6 separate, identical virtual machines
with Windows 7 and Firefox. With this configuration,
the only information that distinguished the browsers ex-
ternally was their IP. We assume that the IP address is
enough to identify the geographical location of the orig-
inating query and is enough for price discrimination to
take place. We fixed the time at which we conducted our measurements across sites, synchronizing the machines using NTP.
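The "fire all vantage points at once" step can be sketched with a barrier: one worker per location-specific proxy blocks until every worker is ready, so no request is sent early. The proxy labels and the `fetch_price` callable are illustrative stand-ins for the real browser/proxy machinery.

```python
# Sketch: synchronize price fetches across vantage points with a barrier.
import threading

PROXIES = ["us-east", "us-west", "de", "es", "kr", "br"]  # illustrative labels


def collect_simultaneously(fetch_price, proxies=PROXIES):
    """Run fetch_price(proxy) once per vantage point, all released together."""
    barrier = threading.Barrier(len(proxies))
    results = {}

    def worker(proxy):
        barrier.wait()  # release all vantage points at the same instant
        results[proxy] = fetch_price(proxy)

    threads = [threading.Thread(target=worker, args=(p,)) for p in proxies]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```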
3.4 Personal info measurement specifics
In order to uncover discrimination based on personal
information, we follow two methods that differ in the
amount of information that they employ. In the first we
train “personas” that conform to two extreme customer
segments: affluent customers and budget-conscious customers. The two profiles are quite distinct. The budget-conscious customer visits price-aggregation and discount sites. The affluent customer
visits sites selling high-end luxury products. The cus-
tomers might be tracked by third party aggregators
(e.g., DoubleClick) that have presence on many sites
around the web and can chain such visits to construct
a profile of the user.
We train personas as follows. We obtain the
generic traits followed by an affluent consumer and
a budget conscious consumer from [4]. An affluent
consumer is more likely to visit “Retail–Jewelry/Luxury
Goods/Accessories” sites as well as “Automotive re-
sources” and “Community Personals” sites than the
average user. For each of these categories, we use Alexa and Google to select the top 100 popular sites, and configure a freshly installed system to visit
these sites, and to train the profile. In order to mimic
a real human, we train only between 9AM-12PM and
use an exponential distribution (mean: 2 min) between
requests. We do the same to train the “budget con-
scious” consumer by using the relevant sites. We train
both profiles for 7 days, and we permit tracking and
disable all blocking. Note that we can train multiple
personas resembling different segments—this is left for
future work. We show the distribution of third party
trackers on the sites we used for the training in Fig. 1.
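The training schedule above can be sketched as follows: visits are drawn from a category's site list, inter-request gaps are exponential with a 2-minute mean, and anything falling outside the 9AM–12PM window is deferred to the next morning. The site names and seed are illustrative, not the actual training lists.

```python
# Sketch of the persona-training visit schedule (illustrative sites and seed).
import random

MEAN_GAP = 120.0                    # seconds between requests (mean: 2 min)
DAY = 24 * 3600
START, END = 9 * 3600, 12 * 3600    # 9AM-12PM training window


def training_schedule(sites, n_visits, seed=0):
    """Return (absolute_time_in_seconds, site) pairs for one training run."""
    rng = random.Random(seed)
    t = START
    plan = []
    for _ in range(n_visits):
        t += rng.expovariate(1.0 / MEAN_GAP)      # exponential inter-request gap
        if t % DAY >= END or t % DAY < START:     # outside window: wait for 9AM
            t = (t // DAY + 1) * DAY + START
        plan.append((t, rng.choice(sites)))
    return plan
```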
The second method that we use to test for discrim-
ination based on personal information uses the “Ref-
erer” header that reveals where a request came from.
Therefore, if a user arrives from a discount site or a luxury site, the e-commerce site where she lands knows about it and can use it as an indication of her willingness to pay.
We fix one location—Los Angeles, USA—and fix one
system—Windows 7 with Firefox—to run the personal
information related measurements.
Assumptions: For the three sources of price discrim-
ination we are studying, we assume that the information
vectors we use are sufficient in isolation for price discrimination to kick in. In reality, a composition of dif-
ferent vectors may be needed for price discrimination.
For instance, personas and a specific type of system
configuration may be needed together for price discrim-
ination. Composing different vectors and then testing
for discrimination is left for future work.
3.5 Analyzed Products
To determine the types of products to focus on, we
selected the product categories from Alexa. In total, we
examined 35 product categories (e.g., “clothing”) and
we chose 200 distinct vendors. From the identified e-commerce sites, we selected 3 concrete
products with their unique URLs (e.g., specific piece of
clothing). For each vendor, we selected low/mid/high
price products. In the case of hotels, we selected three different dates (low/mid/high season) at multiple locations.
The 200 vendors we chose may appear to be a small set.
However, we limit ourselves to 200 to first understand
issues with scaling. In addition, these 200 vendors also
account for the vast majority of user traffic, as they include several of the largest online vendors. We intend to increase these 200 vendors to 1000+ ven-
dors to also cover long-tail sites. In the end we had a
total of 600 products; we provide more details on them
in the Appendix.
4. Results

4.1 System-based differences
We collected extensive measurements on 600 different
products. We used the 8 distinct system–browser setups
to examine the potential price differences. We ran the
measurements for four days, and collected over 20,000
distinct measurement points in total. In addition, we
queried Google and Bing to examine if the search re-
sults differ based on the systems. For this, we used 26
different phrases related to the products we analyze.
The measurements did not reveal any price differences
between the end systems. Regarding search discrimina-
tion, we did not find differences that were significant.
4.2 Geographic location
Next, we looked into the impact of geographic
location from where the user accesses an e-commerce
site. We issued queries through the proxies described
in Sec. 3.3 on the same set of products/sites as before.
In total, we accessed each product 10 times. The mea-
surement results do not indicate significant differences,
neither in prices nor in search results, for the majority
of the products. However, the prices shown by three particular websites appeared to depend strongly on the users' location: Amazon and Steam returned location-dependent prices for digital products (e-books and computer games, respectively), as did an office-supplies retailer for office products.
In the case of Amazon, we observed price differences
only for Kindle e-books. We queried the prices of books
listed on the top 100 list of Amazon from six locations.⁵ Only 27 out of these 100 books were available for purchase in their original English version from Amazon.com (the US site) to customers coming from all the 6
locations we were testing. We illustrate the price differ-
ences of these products in Fig. 2, where we plot the ratio
of the products’ prices using the prices in New York,
USA as reference. In the majority of cases, the price
difference is at least 21%; however, in extreme cases it
can be as high as 166%.
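The normalization behind Fig. 2 is a simple ratio against the reference location: a ratio of 2.66 corresponds to the 166% extreme reported above. A minimal sketch, with made-up prices:

```python
# Sketch of Fig. 2's normalization: prices as ratios of the reference price.

def price_ratios(prices_by_location, reference="us-ny"):
    """prices_by_location: {location: price} for one product.
    Returns each location's price divided by the reference price."""
    ref = prices_by_location[reference]
    return {loc: price / ref for loc, price in prices_by_location.items()}
```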
For the Steam site, we examined more than 300 addi-
tional products. We compared the prices of the products
where their prices were displayed in the same currency
to avoid the bias of currency exchange. We observed
price differences for 20% of the products in the case of Spain and Germany (figure not shown). Moreover, 3.5% of the products had different prices in the case of the US, Brazil, and Korea.

Next we analyzed the impact of location on a finer
scale, i.e., within the US only. We used 67 Planetlab
nodes in the US acting as proxy servers. We accessed 10 random products from the office-supplies retailer using the proxies.
⁵For both websites, results for US/LA and US/NY overlap and are not shown.

Figure 2: Price differences at Amazon based on the customer's geographic location, using the prices in New York, USA as reference. For each of the considered products there exist at least two locations with different prices.

Figure 3: Price differences at the office-supplies retailer. The dot sizes mark the mean price surplus for the locations, from 0% (small dots) up to 3.9% (large dots).

Four products showed different prices when accessed from different locations. In those cases, there were two distinct prices for the same product. We did not observe
a significant correlation between the prices and popula-
tion per state/city, population density per state, income
per state, or tax rates per state.
We extended the study of the office-supplies retailer by taking measurements within the same state (MA) to exclude inter-state tax differences. We selected 29 random products and 200 random ZIP codes.⁶ Again, for 15 products the price varied up to 11% above the base price⁷ between the locations.
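The per-location surplus underlying these numbers follows footnote 7: the base price of a product is the smallest price observed for it anywhere, and each location's surplus is its excess over that base. A minimal sketch, with illustrative (not measured) prices:

```python
# Sketch of the per-product price-surplus computation (base = min observed).

def price_surplus(prices_by_zip):
    """prices_by_zip: {zip_code: price} for one product.
    Returns {zip_code: fractional surplus over the base price}."""
    base = min(prices_by_zip.values())
    return {z: (p - base) / base for z, p in prices_by_zip.items()}
```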
Fig. 3 shows the price differences geographically. The
values on the map show a mean price surplus calculated
for a particular location over all the products. The map
shows that the outskirts are shown higher prices than
the large cities.
Discussion: Our system ensures that the only bit of
information that is exposed is the IP address, hence
the location. We see differences in prices for some dig-
ital goods as well as office supplies. We cannot claim
to have discovered price discrimination since the dif-
ferences might be attributed to other reasons such as
intellectual property issues or increased competition be-
tween retailers or logistics. Further investigation is re-
quired on this issue.
⁶When accessing the office-supplies site from outside of the US, the service asks for the customer's ZIP code, giving results equivalent to those for a query coming from that location.
⁷The base price is the smallest observed price for a product.

Figure 4: Prices (mean/min/max) shown by Google to the different personas. The median number of products in each category per persona is 12.

Figure 5: Mean prices (with std. deviations) of top-10 results from Cheaptickets returned to the affluent and budget personas. The mean difference is 15%, and can be as high as 50%.

4.3 Personal information
Trained personas. We used the previously trained
personas (Sec. 3.4) to examine the discrepancies of
products based on the browsing behavior. We also used
a clean profile as a baseline. We did not observe price
discrimination in our results; however, we observed
different search results on two sites. First, we examined 12 search queries in Google, three times for each
profile. For half of the queries, the results included
several suggested products, together with the prices.
There is a noticeable difference in the prices of these
products as we show in Fig. 4. For instance, the mean
price in the case of “headphones” was 4 times higher for the
affluent persona than for the budget one. Second, we
examined the top-10 hotel offers on Cheaptickets. We
searched for hotels in 8 different cities on 8 different
dates. The search engine of Cheaptickets returned
offers with higher prices for the affluent profile (Fig. 5).
Originating web page. Our hypothesis for studying
the origin is that the site that a customer uses to reach
a product site can provide valuable information for pric-
ing purposes. For example, if the customer comes from
a discount site, she will be more likely to be price sen-
sitive than someone coming from a luxury site or a
portal. Hence, we focus on price aggregator sites that
provide a platform for vendors of various products and
also provide discounts to users. We looked into several aggregator sites, but we present results for only one large site. We used a clean profile, with blocking enabled but first-party cookies enabled.

Figure 6: Price difference at an online office-supplies retailer, with and without redirection from a price aggregator.

We examined 25 different categories of products available on the aggregator. We found two online vendors (one of them shoplet.com) who returned different prices based on the originating web page of the cus-
tomers. Both retailers specialize in office equipment. For one of them, users get higher prices if they access a product directly via the retailer's website than when the price aggregator redirects them to the store. In the latter case, the aggregator redirects the user to an intermediate site that sets a cookie, and from this point on the user is shown lower prices. We quantify the price differences with and without the redirection in Fig. 6. The mean difference between the prices is 23%.
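The origin-URL experiment reduces to fetching the same product twice, once directly and once via the aggregator's redirection, and comparing the quotes. In this sketch, `fetch` is a stand-in for the real browser, and `toy_vendor` is a made-up pricing function that merely mimics the roughly 23%-lower-after-redirection pattern observed; it is not the vendor's actual logic.

```python
# Sketch of the origin-URL test: direct price vs. price after aggregator redirect.

def origin_price_gap(fetch, product):
    """Relative discount seen when arriving via the aggregator."""
    direct = fetch(product, via_aggregator=False)
    referred = fetch(product, via_aggregator=True)
    return (direct - referred) / direct


def toy_vendor(product, via_aggregator):
    """Made-up vendor mimicking the observed ~23% post-redirect discount."""
    list_price = {"stapler": 26.0}[product]
    return list_price * (0.77 if via_aggregator else 1.0)
```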
Discussion: We noticed signs of search based dis-
crimination in case of trained personas. We stress that
while we have not yet found price discrimination for
trained personas, we did observe signs of discrimina-
tion via origin URL. We note that the entities who col-
lect large amounts of information across the web (aggre-
gators like Doubleclick)—and hence can create a more
accurate representation of the user—do not actively en-
gage in e-commerce. On the flip side, large vendors do
not track users across the web. Thus, the entities who
could utilize information of users for pricing are decou-
pled from those who collect such information. The redi-
rection mechanism, that uses one bit of information, can
be used effectively to narrow this information gap.
5. Related Work

The idea of building large distributed systems to understand the effect of personal information on the services users receive has been explored before, for various reasons [8, 6].
Guha et al. [8] focused on the impact of user charac-
teristics on display advertisements. Our framework is
similar; however, we focus on the differences of product
prices instead of displayed ads. Our work is closely tied
to online privacy, both in terms of usage of privacy pre-
serving tools in our methodology, as well as implications
of (loss of) privacy over price discrimination. For the
former, we use the findings of Krishnamurthy et al. [12]
to block known forms of tracking, on our proxy as well
as the browser. Besides cookies, other techniques can
also uniquely identify users with high probability such
as the properties of the browsers [5] and the browsing
history [15]; hence we take steps to counter such identification.

6. Conclusions

Our measurements suggest that both price and search
discrimination might be taking place in today’s Inter-
net. In our ongoing efforts we are scaling by orders of
magnitude both the number of sites and the product
categories that we examine. Our preliminary results also
point to a natural extension of our distributed system:
co-opt and retrofit it as a watchdog system that helps
users check whether they are being discriminated against.
Acknowledgments

We thank our shepherd Michael Walfish for helpful comments, as well as the anonymous reviewers of HotNets.
References

[1] Privoxy.
[2] The Robinson-Patman Act, Pub. L. No. 74-692, 49 Stat. 1526.
[3] A. Acquisti and H.R. Varian. Conditioning Prices on Purchase
History. Marketing Science, 24(3), 2005.
[4] AudienceScience.
[5] P. Eckersley. How unique is your web browser? In Privacy
Enhancing Technologies, LNCS 6205, pages 1–18. 2010.
[6] Georgia Tech Information Security Center. Filter Bubble
project, 2012.
[7] Google. AdWords Keyword Tool.
[8] S. Guha, B. Cheng, and P. Francis. Challenges in measuring
online advertising systems. ACM IMC ’10.
[9] IMRG. B2C Global e-Commerce Overview 2012.
[10] iOpus. iMacros for Firefox.
[11] T. Joachims, L. Granka, B. Pan, H. Hembrooke, and G. Gay.
Accurately interpreting clickthrough data as implicit feedback.
SIGIR ’05.
[12] B. Krishnamurthy, D. Malandrino, and C.E. Wills. Measuring
privacy loss and the impact of privacy protection in web
browsing. ACM SOUPS ’07.
[13] R.P. McAfee. Price Discrimination. In Issues in Competition
Law and Policy, vol. 1. 2008.
[14] A. Odlyzko. Privacy, economics, and price discrimination on the
internet. ICEC ’03.
[15] L. Olejnik, C. Castelluccia, and A. Janc. Why Johnny Can’t
Browse in Peace: On the Uniqueness of Web Browsing History
Patterns. HotPETs ’12.
[16] Susan Davis. H.R. 6508: To direct the Federal Trade
Commission to promulgate rules requiring an Internet merchant
to disclose the use of a price-altering computer program, and
for other purposes.
[17] The New York Times. Amazon's Prime Suspect.
[18] The Wall Street Journal. On Orbitz, Mac Users Steered to
Pricier Hotels.
[19] H.R. Varian. Price Discrimination and Social Welfare. The
American Economic Review, 75(4):870–875, 1985.
Appendix

Examples of sites visited, with the number of products per site in parentheses: airlines: (3), (3), (3), (3), (3); digital cameras: amazon.com (3), (3), (3), (3); hotels/travel: (3), (3), hotels.com (3), (10+), (3), (3), (3).
... Several researchers have tried to find evidence of price discrimination in online markets. They have also tried to identify the factors on which price discrimination is based (e.g., demographic variables, location or click history) [14][15][16][17][18][19]. ...
Full-text available
This paper builds a theoretical framework to model individualization in online markets. In a market with consumers of varying levels of demand, a seller offers multiple product bundles and prices. Relative to brick-and-mortar stores, an online seller can use pricing algorithms that can observe a buyer’s online behavior and infer a buyer’s type. I build a generalized model of price discrimination with Bayesian learning where a seller offers different bundles of the product that are sized and priced contingent on the posterior probability that the consumer is of a given type. Bayesian learning allows the seller to individualize product menus over time as new information becomes available. I explain how this strategy differs from first- or second-degree price discrimination models and how Bayesian learning over time affects equilibrium values and welfare.
... Access to data generates information asymmetries that open up opportunities for price discrimination, steered consumption and unfair competition in sectors other than knowledge generation (White House, 2015, Ursu, 2018, Mikians et al. 2012, Shiller, 2014, Chen et al. 2015, Möhlmann and Zalmanson, 2017, Uber, 2018, Ezrachi and Stucke, 2016. Discrimination can go beyond prices and lead to unfair treatment and discrimination in general (Isaac, 2017, Wong, 2017. ...
Big Data (BD) and Artificial Intelligence (AI) play a fundamental role in today’s economy that traditional economic models fail to capture. This paper presents a theoretical conceptualisation of the data economy and derives implications for digital governance and data policies. It defines a hypothetical data-intensive economy where data are the main input of AI and in which the amount of knowledge generated is below the socially desired amount. Intervention could consist of favouring the creation of additional knowledge via data sharing. We show that the framework suggested describes many features of today’s data-intensive economy and provides a tool to assist academic, policy and governance discussions. Our conclusions support data sharing as a way of increasing knowledge production on societal challenges and dilemmas of data capitalism and transparency in AI.
... For example, they offer personalized, digital coupons or discounts (Skrovan 2017; Reimers & Shiller 2019; Rossi 1996; Shiller 2020). Firms also use personalized rank-sorting algorithms, which promote more expensive items to price-insensitive consumers (Mikians 2012; Hannak 2014, p. 305). Insurance companies also engage in cost-based (or risk-based) price discrimination. ...
Machine learning algorithms are increasingly able to predict what goods and services particular people will buy, and at what price. It is possible to imagine a situation in which relatively uniform, or coarsely set, prices and product characteristics are replaced by far more in the way of individualization. Companies might, for example, offer people shirts and shoes that are particularly suited to their situations, that fit with their particular tastes, and that have prices that fit their personal valuations. In many cases, the use of algorithms promises to increase efficiency and to promote social welfare; it might also promote fair distribution. But when consumers suffer from an absence of information or from behavioral biases, algorithms can cause serious harm. Companies might, for example, exploit such biases in order to lead people to purchase products that have little or no value for them or to pay too much for products that do have value for them. Algorithmic harm, understood as the exploitation of an absence of information or of behavioral biases, can disproportionately affect members of identifiable groups, including women and people of color. Since algorithms exacerbate the harm caused to imperfectly informed and imperfectly rational consumers, their increasing use provides fresh support for existing efforts to reduce information and rationality deficits, especially through optimally designed disclosure mandates. In addition, there is a more particular need for algorithm-centered policy responses. Specifically, algorithmic transparency—transparency about the nature, uses, and consequences of algorithms—is both crucial and challenging; novel methods designed to open the algorithmic “black box” and “interpret” the algorithm’s decision-making process should play a key role. 
In appropriate cases, regulators should also police the design and implementation of algorithms, with a particular emphasis on the exploitation of an absence of information or of behavioral biases.
... These applications collect and use a tremendous amount of personal information from customers to enhance user experiences. However, the collected data is often misused for undisclosed actions, such as targeted advertising [81], price discrimination [60], or gender discrimination [31], which inevitably raise privacy concerns among the customers. Privacy policies are one of the most common methods to inform customers how their personal information is collected, stored, and used [28,47,70,87]. ...
Software applications have become an omnipresent part of modern society. The consequent privacy policies of these applications play a significant role in informing customers how their personal information is collected, stored, and used. However, customers rarely read and often fail to understand privacy policies because of the "Privacy Policy Reading Phobia" (PPRP). To tackle this emerging challenge, we propose the first framework that can automatically generate privacy nutrition labels from privacy policies. Based on our ground truth applications about the Data Safety Report from the Google Play app store, our framework achieves a 0.75 F1-score on generating first-party data collection practices and an average of 0.93 F1-score on general security practices. We also analyse the inconsistencies between ground truth and curated privacy nutrition labels on the market, and our framework can detect 90.1% under-claim issues. Our framework demonstrates decent generalizability across different privacy nutrition label formats, such as Google's Data Safety Report and Apple's App Privacy Details.
... Although we generally regard those tracking technologies as benign techniques providing user-tailored information, websites are able to exploit them: web tracking technologies can raise privacy concerns, as previous research efforts have shown [51,52,57]. Users may voluntarily provide personal information on the web, such as by filling out web forms, or such information may be indirectly collected without their knowledge through IP header analysis, HTTP requests, search engine query analysis, JavaScript, etc. [22]. ...
A hidden shadow world that tracks Internet users' behaviors lies behind everyday websites. Web tracking analyzes online behaviors based on collected data and provides content that may interest the user. Web tracking collects vast amounts of data for various purposes, from sensitive personal information to trivial information such as settings and preferred information types, including the user's IP address, device, and search history. Although web tracking is largely a legitimate technology, there is a steady increase in illegal user tracking, data leakage, and illegal sale of data. Therefore, the need for techniques capable of detecting and preventing web trackers is becoming more important. This paper introduces a general description of web tracking technology, related research, and a website measurement tool that can identify web-based tracking. It also introduces techniques for preventing web tracking and discusses future research directions.
AI auditing is a rapidly growing field of research and practice. This review article, which doubles as an editorial to Digital Society’s topical collection on ‘Auditing of AI’, provides an overview of previous work in the field. Three key points emerge from the review. First, contemporary attempts to audit AI systems have much to learn from how audits have historically been structured and conducted in areas like financial accounting, safety engineering and the social sciences. Second, both policymakers and technology providers have an interest in promoting auditing as an AI governance mechanism. Academic researchers can thus fill an important role by studying the feasibility and effectiveness of different AI auditing procedures. Third, AI auditing is an inherently multidisciplinary undertaking, to which substantial contributions have been made by computer scientists and engineers as well as social scientists, philosophers, legal scholars and industry practitioners. Reflecting this diversity of perspectives, different approaches to AI auditing have different affordances and constraints. Specifically, a distinction can be made between technology-oriented audits, which focus on the properties and capabilities of AI systems, and process-oriented audits, which focus on technology providers’ governance structures and quality management systems. The next step in the evolution of auditing as an AI governance mechanism, this article concludes, should be the interlinking of these available—and complementary—approaches into structured and holistic procedures to audit not only how AI systems are designed and used but also how they impact users, societies and the natural environment in applied settings over time.
Integrating artificial intelligence (AI) has transformed living standards. However, AI’s efforts are being thwarted by concerns about the rise of biases and unfairness. This problem strongly calls for a strategy for tackling potential biases. This article thoroughly evaluates existing knowledge to enhance fairness management, which will serve as a foundation for creating a unified framework to address any bias and its subsequent mitigation method throughout the AI development pipeline. We map the software development life cycle (SDLC), machine learning life cycle (MLLC) and cross industry standard process for data mining (CRISP-DM) together to have a general understanding of how phases in these development processes are related to each other. The map should benefit researchers from multiple technical backgrounds. Biases are categorised into three distinct classes: pre-existing, technical and emergent bias; subsequently, three mitigation strategies: conceptual, empirical and technical; along with fairness management approaches: fairness sampling, learning and certification. The recommended practices for debiasing and overcoming the challenges encountered further set directions for successfully establishing a unified framework.
Transparency has long been held up as the solution to the societal harms caused by digital platforms’ use of algorithms. However, what transparency means, how to create meaningful transparency, and what behaviors can be altered through transparency are all ambiguous legal and policy questions. This paper argues for beginning with clarifying the desired outcome (the “why”) before focusing on transparency processes and tactics (the “how”). Moving beyond analyses of the ways algorithms impact human lives, this research articulates an approach that tests and implements the right set of transparency tactics aligned to specific predefined behavioral outcomes we want to see on digital platforms. To elaborate on this approach, three specific desirable behavioral outcomes are highlighted, to which potential transparency tactics are then mapped. No single set of transparency tactics can solve all the harms possible from digital platforms, making such an outcomes-focused transparency tactic selection approach the best suited to the constantly-evolving nature of algorithms, digital platforms, and our societies.
The rapid advance in information technology now makes it feasible for sellers to condition their price offers on consumers’ prior purchase behavior. In this paper we examine when it is profitable to engage in this form of price discrimination when consumers can adopt strategies to protect their privacy. Our baseline model involves rational consumers with constant valuations for the goods being sold and a monopoly merchant who can commit to a pricing policy. Applying results from the prior literature, we show that although it is feasible to price so as to distinguish high-value and low-value consumers, the merchant will never find it optimal to do so. We then consider various generalizations of this model, such as allowing the seller to offer enhanced services to previous customers, making the merchant unable to commit to a pricing policy, and allowing competition in the marketplace. In these cases we show that sellers will, in general, find it profitable to condition prices on purchase history.
Conference Paper
This paper examines the reliability of implicit feedback generated from clickthrough data in WWW search. Analyzing the users' decision process using eyetracking and comparing implicit feedback against manual relevance judgments, we conclude that clicks are informative but biased. While this makes the interpretation of clicks as absolute relevance judgments difficult, we show that relative preferences derived from clicks are reasonably accurate on average.
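One standard way to turn such clickthrough logs into relative preferences is a "click > skip above" heuristic: a clicked result is preferred over every higher-ranked result the user skipped. The sketch below is a generic illustration of that idea under assumed document names, not the paper's exact procedure.

```python
def click_skip_above(ranking, clicked):
    """Derive pairwise preferences from a click log: each clicked result
    is preferred over every non-clicked result ranked above it."""
    clicked = set(clicked)
    prefs = []
    for i, doc in enumerate(ranking):
        if doc in clicked:
            # every higher-ranked, skipped document is judged worse than `doc`
            prefs.extend((doc, above) for above in ranking[:i] if above not in clicked)
    return prefs

# For a ranking a, b, c, d with a single click on c,
# we infer the relative judgments c > a and c > b.
prefs = click_skip_above(["a", "b", "c", "d"], ["c"])
```

Such pairwise preferences sidestep the position bias of raw clicks, which is why relative judgments derived this way are more reliable than treating clicks as absolute relevance labels.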
We present the results of the first large-scale study of the uniqueness of Web browsing histories, gathered from a total of 368,284 Internet users who visited a history detection demonstration website. Our results show that for a majority of users (69%), the browsing history is unique and that users for whom we could detect at least 4 visited websites were uniquely identified by their histories in 97% of cases. We observe a significant rate of stability in browser history fingerprints: for repeat visitors, 38% of fingerprints are identical over time, and differing ones were correlated with original history contents, indicating static browsing preferences (for history subvectors of size 50). We report a striking result that it is enough to test for a small number of pages in order to both enumerate users' interests and perform an efficient and unique behavioral fingerprint; we show that testing 50 web pages is enough to fingerprint 42% of users in our database, increasing to 70% with 500 web pages. Finally, we show that indirect history data, such as information about categories of visited websites can also be effective in fingerprinting users, and that similar fingerprinting can be performed by common script providers such as Google or Facebook.
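The history-subvector fingerprinting described above can be approximated by hashing a bit vector of which probe pages a user has visited. The probe names below are placeholders; the study's actual probe lists and encoding details may differ.

```python
import hashlib

def history_fingerprint(probe_pages, visited):
    """Fingerprint a user by which pages of a fixed probe list they have
    visited: a bit vector over the probes, hashed into a compact ID."""
    visited = set(visited)
    bits = "".join("1" if page in visited else "0" for page in probe_pages)
    return hashlib.sha256(bits.encode()).hexdigest()[:16]

# A stand-in probe list of 50 pages; the study probed real popular sites.
probes = [f"site{i}.example" for i in range(50)]
fp_a = history_fingerprint(probes, ["site3.example", "site7.example"])
fp_b = history_fingerprint(probes, ["site3.example", "site8.example"])
```

Because the fingerprint depends only on the set of visited probes, it is stable across visits as long as the user's browsing preferences stay static, which matches the repeat-visitor stability reported above.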
Conference Paper
We investigate the degree to which modern web browsers are subject to “device fingerprinting” via the version and configuration information that they will transmit to websites upon request. We implemented one possible fingerprinting algorithm, and collected these fingerprints from a large sample of browsers that visited our test site. We observe that the distribution of our fingerprint contains at least 18.1 bits of entropy, meaning that if we pick a browser at random, at best we expect that only one in 286,777 other browsers will share its fingerprint. Among browsers that support Flash or Java, the situation is worse, with the average browser carrying at least 18.8 bits of identifying information. 94.2% of browsers with Flash or Java were unique in our sample. By observing returning visitors, we estimate how rapidly browser fingerprints might change over time. In our sample, fingerprints changed quite rapidly, but even a simple heuristic was usually able to guess when a fingerprint was an “upgraded” version of a previously observed browser’s fingerprint, with 99.1% of guesses correct and a false positive rate of only 0.86%. We discuss what privacy threat browser fingerprinting poses in practice, and what countermeasures may be appropriate to prevent it. There is a tradeoff between protection against fingerprintability and certain kinds of debuggability, which in current browsers is weighted heavily against privacy. Paradoxically, anti-fingerprinting privacy technologies can be self-defeating if they are not used by a sufficient number of people; we show that some privacy measures currently fall victim to this paradox, but others do not.
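The "one in 286,777" figure follows from surprisal arithmetic: a fingerprint carrying b bits of identifying information is expected to match about one in 2^b other browsers. A quick check of the figures quoted above:

```python
import math

# A fingerprint carrying b bits of identifying information implies that a
# randomly chosen browser shares it with roughly one in 2**b others.
def one_in(bits):
    return 2 ** bits

matches = one_in(18.1)            # roughly 281,000 browsers per match
bits_quoted = math.log2(286777)   # the "one in 286,777" figure above
```

The small gap between 2^18.1 (about 281,000) and 286,777 simply reflects rounding: 286,777 corresponds to roughly 18.13 bits.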
Conference Paper
Online advertising supports many Internet services, such as search, email, and social networks. At the same time, there are widespread concerns about the privacy loss associated with user targeting. Yet, very little is publicly known about how ad networks operate, especially with regard to how they use user information to target users. This paper takes a first principled look at measurement methodologies for ad networks. It proposes new metrics that are robust to the high levels of noise inherent in ad distribution, identifies measurement pitfalls and artifacts, and provides mitigation strategies. It also presents an analysis of how three different classes of advertising (search, contextual, and social networks) use user profile information today.
Conference Paper
The rapid erosion of privacy poses numerous puzzles. Why is it occurring, and why do people care about it? This paper proposes an explanation for many of these puzzles in terms of the increasing importance of price discrimination. Privacy appears to be declining largely in order to facilitate differential pricing, which offers greater social and economic gains than auctions or shopping agents. The thesis of this paper is that what really motivates commercial organizations (even though they often do not realize it clearly themselves) is the growing incentive to price discriminate, coupled with the increasing ability to price discriminate. It is the same incentive that has led to the airline yield management system, with a complex and constantly changing array of prices. It is also the same incentive that led railroads to invent a variety of price and quality differentiation schemes in the 19th century. Privacy intrusions serve to provide the information that allows sellers to determine buyers' willingness to pay. They also allow monitoring of usage, to ensure that arbitrage is not used to bypass discriminatory pricing. Economically, price discrimination is usually regarded as desirable, since it often increases the efficiency of the economy. That is why it is frequently promoted by governments, either through explicit mandates or through indirect means. On the other hand, price discrimination often arouses strong opposition from the public. There is no easy resolution to the conflict between sellers' incentives to price discriminate and buyers' resistance to such measures. The continuing tension between these two factors will have important consequences for the nature of the economy. It will also determine which technologies will be adopted widely. Governments will likely play an increasing role in controlling pricing, although their roles will continue to be ambiguous.
Sellers are likely to rely to an even greater extent on techniques such as bundling that will allow them to extract more consumer surplus and also to conceal the extent of price discrimination. Micropayments and auctions are likely to play a smaller role than is often expected. In general, because of strong conflicting pressures, privacy is likely to prove an intractable problem that will be prominent on the public agenda for the foreseeable future.
Conference Paper
Various bits of information about users accessing Web sites, some of which are private, have been gathered since the inception of the Web. Increasingly the gathering, aggregation, and processing has been outsourced to third parties. The goal of this work is to examine the effectiveness of specific techniques to limit this diffusion of private information to third parties. We also examine the impact of these privacy protection techniques on the usability and quality of the Web pages returned. Using objective measures for privacy protection and page quality, we examine their tradeoffs for different privacy protection techniques applied to a collection of popular Web sites as well as a focused set of sites with significant privacy concerns. We study privacy protection both at a browser and at a proxy.