The environmental impact of encrypting the web by default.
DRAFT PRE-PRINT
Allen ONeill
allen.oneill@theideashelf.com
23/Dec/2020
Introduction
Encrypting websites and web traffic by default is great for security, but the hidden downside
is that in doing so we are negatively impacting the environment.
In recent years, there has been a concerted effort by major internet stakeholders to
encourage and steer websites worldwide to implement encryption by default for all web
traffic [1,2,3]. While the intent seems laudable, research shows that little attention has been
paid to the environmental impact of the energy-overhead of encryption. The research in this
discussion paper has been carried out to examine the impact of encryption-by-default
through an environmental lens and to a lesser degree, the related economic effect.
The findings of this research are:
1. as generally implemented, security by default generates an unnecessary overhead in
aggregate for web traffic that can be damaging to the environment
2. by extension, this overhead is potentially incurring millions of dollars in extra costs in
running both cloud datacentres and related internet infrastructure worldwide, as well as
an aggregate cost on end-user devices.
The recommendations of this research are:
1. that sufficient mechanisms already exist to ensure security and privacy without
incurring extra compute, storage, and network transmission costs.
2. that relevant stakeholders review the current policy taking particular note of the
impact of any engineering decisions on the environment and related economic
impact.
3. further work is required to identify concrete solutions that can both satisfy and
balance the needs of all stakeholders towards a positive sum outcome.
Background
With the continuous growth of cyber-crime [4,5] and attacks becoming increasingly
sophisticated, security by default seems sensible. In the past, implementing secure
protocols imposed a significant performance penalty on servers. An early study in 1998 [8]
demonstrated a response-time increase of 22% when using encryption, but subsequent
research has demonstrated numerous improvements over the years, with response-time
increases dropping from 12-15% [9] down to the now widely quoted ‘less than 2% network
traffic overhead’ [11]. Thanks to consistent advances in technology and the commoditisation
of hardware prices, the influential W3C Technical Architecture Group (TAG) now considers
that the performance impact of encryption is ‘minor -- often, imperceptible’ [12].
Although the performance impact of encryption is no longer a barrier, the research in this
paper and others [10] clearly demonstrates that security does not come for free: there is a
negative cost, both fiscal and environmental, that needs to be addressed.
“We see that the loss of caching could cost providers an extra 2 TB of
upstream data per day and could mean increases in energy consumption
upwards of 30% for end users in certain cases” (Naylor et al., 2014) [10]
Research objective and scope
The objective of the project was to carry out preliminary research to investigate the
environmental impact of ‘security by default’ and make general recommendations for
further research if necessary.
The scope of the research was restricted to a sample of the top 500 websites globally,
examining how these sites implemented data security for browser network communication.
The research was interested in a number of factors, including the timing overhead incurred
for secure communication, the use of data file caching using content delivery networks
(CDNs) [15], types of web framework technologies being used, and the data mime-type [16]
being transmitted.
Research methodology
The research involved gathering data over a period of one day from the majority of the top
500 websites. The websites were selected from lists publicly available on the site
‘similarweb.com’ across a number of categories. Data was gathered for each site using an
automated browser
and capturing the network traffic. The data was then analysed to extract relevant metrics
which were used to assist in informing the research. In addition to the active experiments,
desk research was also carried out into areas related to the study, including SSL, TLS [17,18]
and HTTP/2 [19]. The programming language Python v3 was used for all data acquisition and
analytic purposes.
Experimental design
Target selection
The starting point for the research was to select a majority sample from the top 500 sites on
the web according to ‘similarweb.com’ as the ranking authority. The categories chosen were
from the top level url: https://www.similarweb.com/category/.
Individual chosen categories:
- News and Media
- Arts and Entertainment
- Business and Consumer Services
- Community and Society
- E-commerce and Shopping
- Food and drink
- Health
- Hobbies and Leisure
- Lifestyle
- Sports
- Travel and Tourism
HTTP Archive Format
In order to acquire data in a consistent and repeatable manner, the experiment harnessed
the ‘HTTP Archive Format’ (HAR) data format. HAR is a data structure/file format
specification that defines an archival format for HTTP network transactions. Although HAR
was examined as a candidate for standardisation by the W3C Web Performance Working
Group, it was never formally published and remains in draft form [21]. Despite this, the HAR
format is widely used in modern browsers to enable developers to export network traffic for
examination and analysis [22,23,24].
Proxy intercept
A local proxy system [25] was used to act as an intercept for network traffic. This was used to
capture the network traffic in HAR format and was re-initialised before each website to
ensure that local caching of data did not influence the metrics of subsequently loaded sites.
Automated browser
The experiment used the Selenium [26] automated browser, and website loading was carried
out using the generic Firefox profile. The browser was configured to use the proxy intercept
for all network traffic. By default, each site was loaded using an HTTP request to determine
whether a redirect to HTTPS would be triggered. The outcome of this is shown in the results
section.
Core metrics
For each website analysed, the core metrics captured in relation to each relevant resource
loaded by the browser were as follows:
- website URL
- resource URL
- protocol/schema
- file/mime type
- file category
- file size
- network timings
- file name
- file hash
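As a sketch, the per-resource metrics above can be lifted from a captured HAR file along these lines (field names follow the HAR draft [21]; the `file_category` derivation and the choice of hash are illustrative assumptions, not the exact code used in the experiment):

```python
import hashlib
from urllib.parse import urlparse

def extract_metrics(har, website_url):
    """Extract the core per-resource metrics from a HAR capture."""
    rows = []
    for entry in har["log"]["entries"]:
        url = entry["request"]["url"]
        content = entry["response"]["content"]
        mime = content.get("mimeType", "")
        body = content.get("text") or ""
        rows.append({
            "website_url": website_url,
            "resource_url": url,
            "protocol": urlparse(url).scheme,
            "mime_type": mime,
            "file_category": mime.split("/")[0],   # e.g. 'font', 'image', 'text'
            "file_size": content.get("size", 0),
            "ssl_timing_ms": entry["timings"].get("ssl", -1),
            "file_name": urlparse(url).path.rsplit("/", 1)[-1],
            "file_hash": hashlib.sha256(body.encode()).hexdigest(),
        })
    return rows

# One synthetic HAR entry for illustration:
har = {"log": {"entries": [{
    "request": {"url": "https://cdn.example.com/fonts/roboto.woff2"},
    "response": {"content": {"mimeType": "font/woff2", "size": 28372, "text": ""}},
    "timings": {"ssl": 83},
}]}}
rows = extract_metrics(har, "https://example.com")
print(rows[0]["file_name"], rows[0]["ssl_timing_ms"])  # roboto.woff2 83
```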
Each website was also queried to determine if the following http protocol commands were
being utilised:
- head
- last-modified
- if-modified-since
- cache-control
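A minimal illustration of how such a query might be evaluated from captured response headers (the function name and the boolean summary are assumptions for illustration; note that ‘if-modified-since’ is a request header, so support for it is inferred from the server returning ‘Last-Modified’):

```python
def cache_capabilities(response_headers):
    """Summarise which cache-related mechanisms a response advertises.

    'if-modified-since' is a request header, so support for it is
    inferred from the server returning 'Last-Modified'.
    """
    headers = {k.lower() for k in response_headers}
    return {
        "last-modified": "last-modified" in headers,
        "cache-control": "cache-control" in headers,
        "supports-if-modified-since": "last-modified" in headers,
    }

# Headers as they might appear in a captured HAR response:
caps = cache_capabilities({
    "Cache-Control": "public, max-age=31536000",
    "Content-Type": "font/woff2",
})
print(caps)
```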
Analysis
In analysing the data gathered, the experiment seeks to view the results from a number of
different perspectives:
- quantify the overhead imposed by SSL
- analyse the use of content delivery networks
- identify resources that are underused as shared resources
Discussion
Following completion of the experiment, we present a discussion and analysis of the research
and the experiment itself, and also examine other issues surrounding the research objective to
identify (a) any potential underlying issues and (b) opportunities that may exist.
Experiment results
Top level metrics
The experiment ran for a number of hours and generated 83k unique rows, amounting to over
2.1 GB of data. This output represented a collection of all resources loaded by the
automated browser when it loaded a domain website, paused while the page loaded all
default resources, and then completed. The experiment was run a number of times and on
average from a pre-compiled list of 500 sites queried, 484 responded to the browser
request before a timeout was triggered.
Table 1 – Top level metrics

  Metric                     Value
  Unique sites visited       484
  Protocol count – http      1,210
  Protocol count – https     82,129
  Average SSL timing (ms)    166.28
The results in table 1 show that the vast majority of sites, when called with an insecure HTTP
request, forced a redirect of the browser to a secure HTTPS version of the page requested.
The average overhead associated with SSL was 166.28 milliseconds.
Frameworks
One part of the analysis in the experiment was to try to determine if the website in question
was using any of the common web programming and development frameworks, such as
ReactJS, Angular, JQuery, and Bootstrap, among others. A sample collection of the names of
31 such framework libraries was compared to the file resources used by the websites
analysed, and 25 of these were found to be in common usage.
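The matching approach can be sketched as follows (the subset of library names shown is illustrative; the experiment's full list of 31 names is not reproduced here):

```python
# Illustrative subset of the 31 framework/library names checked;
# the full list used in the experiment is not reproduced here.
FRAMEWORKS = ["react", "redux", "jquery", "bootstrap", "moment",
              "lodash", "font-awesome", "normalize", "knockout"]

def detect_frameworks(resource_urls):
    """Return the framework names appearing in any resource file name."""
    found = set()
    for url in resource_urls:
        file_name = url.rsplit("/", 1)[-1].lower()
        found.update(name for name in FRAMEWORKS if name in file_name)
    return found

print(sorted(detect_frameworks([
    "https://example.com/static/jquery-3.5.1.min.js",
    "https://cdn.example.com/css/bootstrap.min.css",
    "https://example.com/app/main.js",
])))  # ['bootstrap', 'jquery']
```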
Figure 1 – Framework SSL average timing
Among the frameworks identified, it is interesting to observe the size of the response
received versus the SSL timing. This potentially indicates a topic for further investigation in
relation to optimising libraries for both compression and SSL speed.
Table 2 – Framework SSL timing vs File size

  Library         SSL time (ms)   Response body size   Size/ms
  underscore      17              8,891                523
  redux           18              3,479                193
  tooltip         19              402                  21
  three           21              1,608                77
  chart           29              4,186                144
  react           70              110,329              1,576
  moment          79              19,847               253
  lodash          80              25,232               317
  babel           99              59,611               600
  font-awesome    108             9,543                88
  animate         128             5,316                42
  normalize       140             753                  5
  socket          162             16,822               104
  bootstrap       168             69,334               413
  jquery          169             28,675               170
  knockout        186             11,280               61
  impress         501             5,016                10
Content delivery networks
In order to determine the number of sites utilising content delivery networks, each page-url
processed was examined for the presence of the common sub-domain or partial string ‘cdn’.
While the results showed a high volume of links matching this criterion (20k out of a total of
83k, representing 24%), which would indicate a high level of shared cache usage using CDNs,
the reality is quite different.
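The heuristic amounts to a simple substring test, sketched below; as discussed, it matches private CDNs just as readily as shared public ones:

```python
def looks_like_cdn(resource_url):
    """Crude heuristic: does the URL contain the token 'cdn'?

    Note: this matches private CDNs (e.g. cdn.site.com) just as
    readily as shared public ones.
    """
    return "cdn" in resource_url.lower()

urls = [
    "https://cdn.example.com/js/app.js",        # private CDN sub-domain
    "https://example.com/cdn-assets/logo.png",  # 'cdn' in the path
    "https://example.com/js/app.js",            # no match
]
print(sum(looks_like_cdn(u) for u in urls), "of", len(urls))  # 2 of 3
```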
Instead of using shared CDNs, there seems to be a trend towards hosting private CDNs. For
example, numerous sites listing content on cdn.site.com can be traced back using open
services [27], through one or a series of redirects, to either a proxy host server under the
domain’s control or, more commonly, to a specific node hosted by AWS CloudFront or
Akamai, both providers of CDN and other edge services such as protection from denial of
service attacks. This use of CDNs, however, seems to be specific to the individual sites, with
site-specific and unique hash-like names allocated to the data content being transferred.
It is unclear from this research if providers like AWS or Akamai analyse content files and
optimise hosting and delivery to minimise both cost to themselves and the customer or
indeed the impact of extra resource usage on the environment. This is another large area
for further investigation.
Mime types
The analysis of mime-types [16] is of particular interest to this study. The reason for this is
that web-fonts remain reasonably static from a content change point of view over time and,
unlike open-source code frameworks such as Twitter Bootstrap, generally do not tend to
become customised or extended to a specific website’s requirements. Web-fonts therefore
represent data that is globally common across a large number of websites.
Table 3 – Font mime-types

  Mime-type                       Count   Avg. response body size
  application/font-sfnt           4       22,791
  application/font-woff           133     48,166
  application/font-woff2          209     44,685
  application/vnd.ms-fontobject   8       41,044
  application/x-font-otf          8       64,339
  application/x-font-ttf          25      102,091
  application/x-font-woff         31      42,676
  application/x-font-woff2        2       19,818
  font/opentype                   4       212,339
  font/otf                        4       37,949
  font/ttf                        13      43,533
  font/woff                       69      35,427
  font/woff2                      1,084   28,372
  font/x-woff                     17      43,867
As web-fonts are such a ubiquitous part of the makeup of the modern website, and globally
common, it stands to reason that we should expect to see these resources shared. The
effect of this would be to save storage, transmission, SSL and other computational
overheads in every aspect of their delivery from the location pointed at by the website,
down to the client browser. Unfortunately, this is not the case. Analysis identified only 45 of
the sites examined using a content delivery network to distribute fonts, and over 300 sites
serving the fonts directly from non-shared sources.
Figure 2 – websites hosting web fonts on shared CDNs
Experiment observations
During the experiment, a number of observations were made on issues arising that need to
be considered both when interpreting the experimental results, and also to serve as
guidelines for further work should it be carried out.
(1) Complications arose when loading some websites. This occurred for different reasons,
but the most obvious was where a site suspended initial page loading pending user
interaction with, for example, a ‘cookie authorisation’ request. In this limited experiment
this was addressed manually; however, this would need to be automated should the
experiment be repeated on a larger scale.
(2) Although a local proxy was used, the data was all retrieved from a UK-based IP address;
therefore some sites may show/send different resources than if being served to a local
geographic IP, thus changing the overall resource loading signature.
(3) The experiments were carried out using a Firefox browser automated using the
Selenium python library. It was unclear after the experiment if the browser was
restricted to using the HTTP/1 protocol or was HTTP/2 capable. We expect in further
experiments that this should be addressed, and the experiment run multiple times using
different browser types and versions to compare results.
SSL Overhead
The main objective of the research was to investigate whether security by default generates
an unnecessary environmental overhead. For the purposes of this report, we will take an
initial naïve view and suppose either that it is unnecessary to encrypt fonts, website
frameworks or media (images/video) by default, or that they represent resources that
should be shared by default on a globally accessible CDN. The main metric utilised for this is
the ‘ssl’ timing metric reported in the captured HAR file, defined as the ‘time required for
SSL/TLS negotiation’ [21] and given in milliseconds.
Table 4 gives a list of the average SSL timing recorded for unique mime-types that contain
the word-token ‘font’. At this level, it is easy to state that the overhead is negligible, and the
impact on performance minor [12].
Table 4 – Font types by average SSL timing

  Font mime type                  SSL timing (ms)
  application/font-sfnt           25
  application/font-woff           113
  application/font-woff2          93
  application/font-woff2 #1       18
  application/font-woff #2        112
  application/vnd.ms-fontobject   17
  application/x-font-otf          107
  application/x-font-ttf          104
  application/x-font-woff         159
  font/opentype                   12
  font/truetype                   249
  font/ttf                        74
  font/woff                       105
  font/woff2                      83
  font/x-woff                     111
If we consider the aggregate timing overhead of all of the font resources for all websites
visited according to Table 5, a different picture is painted: an overhead of over 60 seconds
(69,018 ms in total) was incurred processing the SSL requirements of transferring fonts.
Table 5 – Font types with aggregate SSL timing

  Mime-type                       Aggregate SSL timing (ms)
  application/font-sfnt           25
  application/font-woff           6,193
  application/font-woff2          8,122
  application/font-woff2; #1      219
  application/font-woff; #2       336
  application/vnd.ms-fontobject   17
  application/x-font-otf          321
  application/x-font-ttf          1,455
  application/x-font-woff         1,911
  font/opentype                   37
  font/truetype                   498
  font/ttf                        669
  font/woff                       2,935
  font/woff2                      45,614
  font/x-woff                     666
  Sum of SSL timing               69,018
Examining the same metric on identified frameworks and media, the aggregate SSL timing
overhead is 46,319 ms for frameworks and 1,218,888 ms for media, giving a combined
1,255,207 ms, i.e. 1,255 seconds or 20.92 minutes.
The significance of the overhead of SSL can be illustrated by examining the total number of
visits to a site per month multiplied by the count of pages per visit. We will then calculate a
total overhead using the combined overhead of the SSL timing of media + frameworks +
fonts generated during the experiment (figure 4 illustrates a sample calculation).
Figure 4 – Total SSL overhead per site per month
While it is absolutely clear that this is a very naïve assumption and there are many other
factors, complexities and nuances that contribute to an exact metric, it is at least a baseline
from which to start. Table 6 illustrates a sample of ten of the websites analysed during the
experiment, with data for site visits taken from similarweb.com for the month of October
2020 (figure 5).
Table 6 – SSL timing overhead for ten sites

  Website              Combined SSL   Visits per   Pages per   Combined SSL overhead per
                       timing (ms)    month (mn)   visit       month (mn of minutes)
  rakuten.co.jp        80,322         562.81       7.76        5,847
  sport.es             70,234         44.43        2.67        139
  manganelo.com        73,814         120.11       9.99        1,476
  bild.de              70,825         214.84       3.03        768
  premierleague.com    70,067         92.50        4.16        449
  bbc.com              70,892         484.02       2.02        1,155
  tripadvisor.co.uk    69,763         25.52        5.41        161
  sina.com.cn          151,242        181.91       3.37        1,545
  sozcu.com.tr         69,887         142.94       4.59        764
  dpreview.com         72,072         7.97         4.65        45
Figure 5 – Visitor metrics for sample site for October 2020.
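The Figure 4 style calculation can be reproduced directly; as a sketch, the hypothetical helper below checks two rows of Table 6, and expressing the result in millions of minutes reproduces the tabulated overhead values:

```python
def monthly_ssl_overhead(ssl_ms, visits_mn, pages_per_visit):
    """Combined SSL overhead per month, in millions of minutes.

    ssl_ms          -- combined SSL timing per page load (milliseconds)
    visits_mn       -- visits per month, in millions
    pages_per_visit -- average pages loaded per visit
    """
    seconds_per_page = ssl_ms / 1000
    total_million_seconds = seconds_per_page * visits_mn * pages_per_visit
    return total_million_seconds / 60

# Checking two rows of Table 6:
print(round(monthly_ssl_overhead(80322, 562.81, 7.76)))  # 5847 (rakuten.co.jp)
print(round(monthly_ssl_overhead(70234, 44.43, 2.67)))   # 139  (sport.es)
```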
Once a baseline has been established for the overhead of SSL timing on the sample sites, we
can then take an aggregate measure of these sites to posit the potential impact of SSL from
both an environmental and fiscal point of view.
Table 7 – Average of sample overhead

  Combined SSL   Visits per   Pages per   Combined SSL overhead per
  timing (ms)    month (mn)   visit       month (mn of milliseconds)
  79,911.80      207.68       4.77        79,078,641
79 million million milliseconds is difficult to picture, so table 8 shows how this figure looks
when converted into different time representations.
Table 8 – combined SSL overhead for average sample site as different time representations

  Time period    Value                Calculation
  Milliseconds   79,078,641,349,018   –
  Seconds        79,078,641,349       ÷ 1000
  Minutes        1,317,977,356        ÷ 60
  Hours          21,966,289           ÷ 60
  Days           915,262              ÷ 24
  Years          2,508                ÷ 365
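The conversion chain in Table 8 is straightforward to verify:

```python
MS_TOTAL = 79_078_641_349_018  # combined SSL overhead per month (ms), from Table 8

seconds = MS_TOTAL / 1000
minutes = seconds / 60
hours = minutes / 60
days = hours / 24
years = days / 365

print(f"{round(days):,} days, or about {round(years):,} years")
```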
Having established a baseline SSL timing overhead for transmitting frameworks and media
for our average sample site per month, we should investigate how this might be viewed
through the lens of both energy use and monetary cost.
PRE-PRINT NOTICE
NB: It is important to point out again that this initial research is very much exploratory and
high-level, that the calculations utilised are by their nature naïve, and that in carrying out
further research it is strongly advised that a more detailed approach be taken.
References
1 - https://security.googleblog.com/2018/02/a-secure-web-is-here-to-stay.html
2 - https://www.bbc.com/news/technology-44937782
3 - https://www.w3.org/2001/tag/doc/web-https
4 - https://nationalcrimeagency.gov.uk/what-we-do/crime-threats/cyber-crime
5 - https://www.wsj.com/articles/capital-one-hack-hits-the-reputation-of-a-tech-savvy-
bank-11564565402
6 - https://gs.statcounter.com/browser-market-share
7 - https://whynohttps.com/
8 - Goldberg, A., Buff, R. and Schmitt, A., 1998. A comparison of HTTP and HTTPS
performance. Computer Measurement Group, CMG98, 8.
9 - Kleppe, H., 2011. Performance impact of deploying HTTPS. Technical Report, Universiteit
van Amsterdam. In: Askey, D. and Arlitsch, K., 2015. Heeding the signals: applying Web best
practices when Google recommends. Journal of Library Administration, 55(1), pp.49-59.
10 - Naylor, D., Finamore, A., Leontiadis, I., Grunenberger, Y., Mellia, M., Munafò, M.,
Papagiannaki, K. and Steenkiste, P., 2014, December. The cost of the “s” in https. In
Proceedings of the 10th ACM International on Conference on emerging Networking
Experiments and Technologies (pp. 133-140).
11 - https://istlsfastyet.com/
12 - https://www.w3.org/2001/tag/doc/web-https
13 - https://www.w3.org/TR/page-visibility/.
14 - https://www.w3.org/webperf/
15 - https://www.cloudflare.com/learning/cdn/what-is-a-cdn/
16 - https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types
17 - https://www.cloudflare.com/learning/ssl/what-is-ssl/
18 - https://www.cloudflare.com/learning/ssl/transport-layer-security-tls/
19 - https://httpwg.org/specs/rfc7540.html
20 - https://httpwg.org/specs/rfc7541.html
21 - https://w3c.github.io/web-performance/specs/HAR/Overview.html
22 - https://developer.chrome.com/extensions/devtools_network
23 - https://docs.microsoft.com/en-us/microsoft-edge/devtools-guide-
chromium/network/reference
24 - https://developer.mozilla.org/en-US/docs/Mozilla/Add-
ons/WebExtensions/API/devtools.network/getHAR
25 - https://github.com/lightbody/browsermob-proxy
26 - https://pypi.org/project/selenium/
27 - https://mxtoolbox.com/DnsLookup.aspx
28 - Esposito, C., Castiglione, A., Martini, B. and Choo, K.K.R., 2016. Cloud manufacturing:
security, privacy, and forensic concerns. IEEE Cloud Computing, 3(4), pp.16-22.
29 - Tuna, G., Kogias, D.G., Gungor, V.C., Gezer, C., Taşkın, E. and Ayday, E., 2017. A survey
on information security threats and solutions for Machine to Machine (M2M)
communications. Journal of Parallel and Distributed Computing, 109, pp.142-154.
30 - Deogirikar, J. and Vidhate, A., 2017, February. Security attacks in IoT: A survey. In 2017
International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud)(I-SMAC) (pp.
32-37). IEEE.
31 - https://www.wired.com/story/biggest-cybersecurity-crises-2019-so-far/
32 - Solove, D.J., 2007. I've got nothing to hide and other misunderstandings of privacy. San
Diego L. Rev., 44, p.745.
33 - Solove, D.J., 2011. Nothing to hide: The false tradeoff between privacy and security.
Yale University Press.
34 - https://www.ieee.org/about/corporate/governance/p7-8.html
35 - https://aws.amazon.com/ec2/pricing/on-demand
36 - https://www.energuide.be/en/questions-answers/how-much-power-does-a-computer-
use-and-how-much-co2-does-that-represent/54
37 - https://www.it.northwestern.edu/hardware/eco/stats.html
38 - https://www.epa.gov/energy/greenhouse-gas-equivalencies-calculator
39 - http://www.httpvshttps.com/
40 - http://tools.ietf.org/html/rfc2616#section-8.1.4
41 - https://tools.ietf.org/html/rfc7230
42 - https://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html
43 - https://www.tunetheweb.com/blog/http-versus-https-versus-http2
44 – Pollard, B (2019). ‘HTTP/2 in Action’. ISBN 9781617295164. Manning Publications.
45 - https://https.tunetheweb.com/performance-test-360/
46 - https://httpwg.github.io/
47 - https://http2.github.io/faq/#does-http2-require-encryption
48 - https://queue.acm.org/detail.cfm?id=2716278
Bibliography relating to TLS and Energy consumption:
Miranda, P., Siekkinen, M. and Waris, H., 2011, June. TLS and energy consumption on a mobile
device: A measurement study. In 2011 IEEE Symposium on Computers and Communications
(ISCC) (pp. 983-989). IEEE.
Urien, P., 2015, October. Innovative TLS/DTLS security modules for IoT applications: Concepts and
experiments. In International Internet of Things Summit (pp. 3-15). Springer, Cham.
Gerez, A.H., Kamaraj, K., Nofal, R., Liu, Y. and Dezfouli, B., 2018, October. Energy and processing
demand analysis of TLS protocol in internet of things applications. In 2018 IEEE International
Workshop on Signal Processing Systems (SiPS) (pp. 312-317). IEEE.
Fischer, T., Linka, H., Rademacher, M., Jonas, K. and Loebenberger, D., 2019. Analyzing power
consumption of TLS ciphers on an ESP32. crypto day matters 30.
de Hoz, J.D., Saldana, J., Fernández-Navajas, J., Ruiz-Mas, J., Rodríguez, R.G. and Luna, F.D.J.M.,
2018, June. SSH as an Alternative to TLS in IoT Environments using HTTP. In 2018 Global Internet of
Things Summit (GIoTS) (pp. 1-6). IEEE.
Meyer, C., Somorovsky, J., Weiss, E., Schwenk, J., Schinzel, S. and Tews, E., 2014. Revisiting
SSL/TLS implementations: New bleichenbacher side channels and attacks. In 23rd {USENIX}
Security Symposium ({USENIX} Security 14) (pp. 733-748).
Khan, S., Ioannou, N., Xekalakis, P. and Cintra, M., 2011, December. Increasing the energy efficiency
of tls systems using intermediate checkpointing. In 2011 18th International Conference on High
Performance Computing (pp. 1-10). IEEE.