John Allspaw

Lund University | LU · Centre for Risk Analysis and Management (LUCRAM)

MSc, Human Factors and Systems Safety

About

Publications: 16
Reads: 3,515
Citations: 79
Introduction
Turned a background in mechanical engineering research (crashworthiness) into a software systems operations career. I built the backing infrastructures at Salon, InfoWorld, Friendster, and Flickr before leading engineering as CTO at Etsy. I'm the author of The Art of Capacity Planning and Web Operations published by O'Reilly. Fueled by a master's degree (MSc) in Human Factors and Systems Safety at Lund University, I am a co-founder of Adaptive Capacity Labs.

Publications (16)
Article
This panel discussion will examine the societal awareness of cognitive engineering today. Cognitive engineering celebrated its 30th anniversary in 2018 at the HFES annual meeting. Still, some would say that cognitive engineering is not as well-known as it should be, and that it is applied in an ad hoc manner in the many high-stakes, high-risk tech...
Article
Full-text available
It's time to appreciate the human side of Internet-facing software systems.
Article
Understanding, supporting, and sustaining the capabilities above the line of representation require all stakeholders to be able to continuously update and revise their models of how the system is messy and yet usually manages to work. This kind of openness to continually reexamine how the system really works requires expanding the efforts to learn...
Preprint
Full-text available
A set of 5 short articles on human performance and business-critical software infrastructure, including: 1. It’s time to revise our appreciation of the human side of Internet-facing software systems. 2. Above the Line, Below the Line. 3. Cognitive Work of Hypothesis Exploration during Anomaly Response. 4. Managing the Hidden Costs of Coordination. 5...
Chapter
The modern “system” is a constantly changing melange of hardware and software embedded in a variable world. Together, the hyperdistribution, fluctuant composition, constantly varying workload, and continuous modification of modern technology assemblies comprise a unique challenge to those who design, maintain, diagnose, and repair them. We are inv...
Preprint
Full-text available
A description of what makes studying cognitive work in the SRE community critically important.
Article
Full-text available
Online software is a fast-growing field that many industries, including aviation, depend on. It is a complex domain that crosses geographic and geopolitical boundaries and depends on multidisciplinary collaboration. For a fairly new industry, it has been innovative in introducing a collaborative form of learning from incidents, often called ‘blamel...
Article
Three IT managers from different domains present their views on the challenges of tackling technical debt.
Thesis
Full-text available
The increasing complexity of software applications and architectures in Internet services challenges the reasoning of operators tasked with diagnosing and resolving outages and degradations as they arise. Although a growing body of literature focuses on how failures can be prevented through more robust and fault-tolerant design of these systems, a d...
Article
Full-text available
It is very nearly the holiday shopping season and something is very wrong at a data center handling transactions for one of the largest online retail operations in the country. Some systems have failed, and no one knows why. Stress levels are off the charts while teams of engineers work around the clock for three days trying to recover. The good ne...
Article
When we build Web infrastructures at Etsy, we aim to make them resilient. This means designing them carefully so they can sustain their (increasingly critical) operations in the face of failure. Thankfully, there have been a couple of decades and reams of paper spent on researching how fault tolerance and graceful degradation can be brought to comp...
Article
A discussion with Jesse Robbins, Kripa Krishnan, John Allspaw, and Tom Limoncelli.
Article
Full-text available
When we build Web infrastructures at Etsy, we aim to make them resilient. This means designing them carefully so that they can sustain their (increasingly critical) operations in the face of failure. Thankfully, there have been a couple of decades and reams of paper spent on researching how fault tolerance and graceful degradation can be brought to...
Article
Full-text available
Making the case for resilience testing
