Predicting future AI failures
from historic examples
Roman V. Yampolskiy
Abstract
Purpose The purpose of this paper is to explain to readers how intelligent systems can fail and how
artificial intelligence (AI) safety is different from cybersecurity. The goal of cybersecurity is to reduce the
number of successful attacks on the system; the goal of AI Safety is to make sure zero attacks succeed in
bypassing the safety mechanisms. Unfortunately, such a level of performance is unachievable. Every
security system will eventually fail; there is no such thing as a 100 per cent secure system.
Design/methodology/approach AI Safety can be improved based on ideas developed by
cybersecurity experts. For narrow AI Safety, failures are at the same, moderate level of criticality as in
cybersecurity; however, for general AI, failures have a fundamentally different impact. A single failure of a
superintelligent system may cause a catastrophic event without a chance for recovery.
Findings In this paper, the authors present and analyze reported failures of artificially intelligent
systems and extrapolate our analysis to future AIs. The authors suggest that both the frequency and the
seriousness of future AI failures will steadily increase.
Originality/value This is a first attempt to assemble a public data set of AI failures and is extremely
valuable to AI Safety researchers.
Keywords Cybersecurity, Failures
Paper type Research paper
1. Introduction
About 10,000 scientists[1] around the world work on different aspects of creating intelligent
machines, with the main goal of making such machines as capable as possible. With
amazing progress made in the field of artificial intelligence (AI) over the past decade, it is
more important than ever to make sure that the technology we are developing has a
beneficial impact on humanity. With the appearance of robotic financial advisors, self-
driving cars and personal digital assistants come many unresolved problems. We have
already experienced market crashes caused by intelligent trading software[2], accidents
caused by self-driving cars[3] and embarrassment from chat-bots[4], which turned racist
and engaged in hate speech. We predict that both the frequency and seriousness of such
events will steadily increase as AIs become more capable. The failures of today’s narrow
domain AIs are just a warning: once we develop artificial general intelligence (AGI) capable
of cross-domain performance, hurt feelings will be the least of our concerns.
In a recent publication, Yampolskiy proposed a Taxonomy of Pathways to Dangerous AI
(Yampolskiy, 2016b), which was motivated as follows: “In order to properly handle a
potentially dangerous artificially intelligent system it is important to understand how the
system came to be in such a state. In popular culture (science fiction movies/books) AIs/
Robots became self-aware and as a result rebel against humanity and decide to destroy it.
While it is one possible scenario, it is probably the least likely path to appearance of
dangerous AI.” Yampolskiy suggested that much more likely reasons include deliberate
actions of not-so-ethical people (“on purpose”) (Brundage et al., 2018), side effects of poor
Roman V. Yampolskiy is
based at JB Speed School
of Engineering, University
of Louisville, Louisville,
Kentucky, USA.
Received 10 April 2018
Revised 4 August 2018
25 September 2018
Accepted 18 October 2018
DOI 10.1108/FS-04-2018-0034 © Emerald Publishing Limited, ISSN 1463-6689
design (“engineering mistakes”) and finally miscellaneous cases related to the impact of the
surroundings of the system (“environment”). Because purposeful design of dangerous AI is
just as likely to include all other types of safety problems and will probably have the direst
consequences, the most dangerous type of AI and the one most difficult to defend against
is an AI made malevolent on purpose.
A follow-up paper (Pistono and Yampolskiy, 2016) explored how a Malevolent AI could be
constructed and why it is important to study and understand malicious intelligent software.
The authors observe that “cybersecurity research involves publishing papers about
malicious exploits as much as publishing information on how to design tools to protect
cyber-infrastructure. It is this information exchange between hackers and security experts
that results in a well-balanced cyber-ecosystem.” In the domain of AI Safety Engineering,
hundreds of papers (Sotala and Yampolskiy, 2015) have been published on different
proposals geared at the creation of a safe machine; yet nothing else has been published on
how to design a malevolent machine. “The availability of such information would be of great
value particularly to computer scientists, mathematicians, and others who have an interest
in making safe AI, and who are attempting to avoid the spontaneous emergence or the
deliberate creation of a dangerous AI, which can negatively affect human activities and in
the worst case cause the complete obliteration of the human species (Pistono and
Yampolskiy, 2016),” as described in many works, for example Superintelligence by Bostrom
(Bostrom, 2014). The paper implied that, if an AI Safety mechanism is not designed to resist
attacks by malevolent human actors, it cannot be considered a functional safety mechanism
(Pistono and Yampolskiy, 2016)!
2. AI failures
Those who cannot learn from history are doomed to repeat it. Unfortunately, very few papers have been published on the failures and errors made in the development of intelligent systems (Rychtyckyj and Turski, 2008). The importance of learning from "What Went Wrong and Why" has been recognized by the AI community (Abecker et al., 2006; Shapiro and Goker, 2008). Such research includes the study of how, why and when failures happen (Abecker et al., 2006; Shapiro and Goker, 2008) and how to improve future AI systems based on such information (Marling and Chelberg, 2008; Shalev-Shwartz et al., 2017).
The millennia-long history of humanity contains millions of examples of attempts to develop technological and logistical solutions to increase safety and security; yet not a single example exists that has not eventually failed. Signatures have been faked, locks have been picked, supermax prisons have had escapes, guarded leaders have been assassinated, bank vaults have been cleaned out, laws have been bypassed, fraud has been committed against our voting process, police officers have been bribed, judges have been blackmailed, forgeries have been falsely authenticated, money has been counterfeited, passwords have been brute-forced, networks have been penetrated, computers have been hacked, biometric systems have been spoofed, credit cards have been cloned, cryptocurrencies have been double spent, airplanes have been hijacked, Completely Automated Public Turing tests to tell Computers and Humans Apart (CAPTCHAs) have been cracked, cryptographic protocols have been broken, and even academic peer review has been bypassed with tragic consequences.
Accidents, including deadly ones, caused by software or industrial robots can be traced to
the early days of such technology[5], but they are not a direct consequence of particulars of
intelligence available in such systems. AI failures, on the other hand, are directly related to
the mistakes produced by the intelligence such systems are designed to exhibit. We can
broadly classify such failures into mistakes made during the learning phase and mistakes made during the performance phase. The system can fail to learn what its human designers want it to learn
and instead learn a different, but correlated function (Amodei et al., 2016). A frequently
cited example is a computer vision system which was supposed to classify pictures of tanks
but instead learned to distinguish backgrounds of such images (Yudkowsky, 2008). Other
examples[6] include problems caused by poorly designed utility functions rewarding only
partially desirable behaviors of agents, such as riding a bicycle in circles around the target
(Randløv and Alstrøm, 1998), pausing a game to avoid losing (Murphy, 2013), or repeatedly
touching a soccer ball to get credit for possession (Ng et al., 1999). During the performance
phase, the system may succumb to any of a number of causes (Pistono and Yampolskiy, 2016; Scharre, 2016; Yampolskiy, 2016b), all leading to an AI failure.
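To make the reward-misspecification failure mode concrete, the following minimal sketch (our own illustration; the toy environment, the one-point-per-touch reward and both policies are hypothetical and not taken from the cited works) shows how an agent that maximizes a poorly designed proxy reward never accomplishes the designer's actual goal, much like the ball-touching example above.

# Reward hacking in a toy setting: the designer wants the ball delivered to the
# goal, but the reward gives one point per "touch"; a return-maximizing policy
# therefore touches the ball forever instead of delivering it.
def episode(policy, steps=20):
    pos, proxy_reward, delivered = 0, 0, False   # ball position on a line [0..5]
    for _ in range(steps):
        action = policy(pos)                     # "touch" or "push"
        proxy_reward += 1                        # proxy: one point per contact
        if action == "push":
            pos += 1
            if pos >= 5:
                delivered = True                 # the outcome actually wanted
                break
    return proxy_reward, delivered

intended = lambda pos: "push"   # always move the ball toward the goal
hacked = lambda pos: "touch"    # exploit the proxy and never deliver

print(episode(intended))   # (5, True): low proxy reward, goal achieved
print(episode(hacked))     # (20, False): high proxy reward, goal never reached

A learning agent optimizing the proxy return would converge on the second policy, which is exactly the pattern seen in the bicycle-riding, game-pausing and ball-possession examples.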
Media reports are full of examples of AI failure, but most of these examples can be
attributed to other causes on closer examination, such as bugs in code or mistakes in
design. The list below is curated to only mention failures of intended intelligence, not
general software faults. Additionally, the examples below include only the first occurrence of
a particular failure, but the same problems are frequently observed again in later years; for
example, self-driving cars have been reported to have multiple deadly accidents. Under the
label of failure, we include any occasion/instance where the performance of an AI does not reach an acceptable level. Finally, the list does not include AI failures because of hacking or other intentional causes. Still, the timeline of AI failures shows an exponential trend, while gaps in the record implicitly indicate historical events such as the "AI Winter." In its most extreme interpretation, any software with as much as an "if statement" can be considered a form of narrow artificial intelligence (NAI), and all of its bugs are thus examples of AI failure[7]:
1958 Advice software deduced inconsistent sentences using logical programming
(Hewitt, 1958).
1959 AI designed to be a General Problem Solver failed to solve real-world
problems[8].
1977 Story writing software with limited common sense produced “wrong” stories
(Meehan, 1977).
1982 Software designed to make discoveries, discovered how to cheat instead[9].
1983 Nuclear attack early warning system falsely claimed that an attack was taking place[10].
1984 The National Resident Match program was biased in placement of married
couples (Friedman and Nissenbaum, 1996).
1988 Admissions software discriminated against women and minorities (Lowry and
Macpherson, 1988).
1994 Agents learned to “walk” quickly by becoming taller and falling over (Sims, 1994).
2005 Personal assistant AI rescheduled a meeting 50 times, each time by 5 min
(Tambe, 2008).
2006 Insider threat detection system classified normal activities as outliers (Liu et al.,
2006).
2006 Investment advising software lost money when deployed to real trading
(Gunderson and Gunderson, 2006).
2010 Complex AI stock trading software caused a trillion dollar flash crash[11].
2011 E-Assistant told to “call me an ambulance” began to refer to the user as
Ambulance[12].
2013 Object recognition neural networks saw phantom objects in particular noise
images (Szegedy et al., 2013).
2013 Google software engaged in name-based discrimination in online ad delivery
(Sweeney, 2013).
2014 Search engine autocomplete made bigoted associations about groups of users
(Diakopoulos, 2013).
2014 Smart fire alarm failed to sound alarm during fire[13].
2015 Automated e-mail reply generator created inappropriate responses[14].
2015 A robot for grabbing auto parts grabbed and killed a man[15].
2015 Image tagging software classified black people as gorillas[16].
2015 Medical expert AI classified patients with asthma as lower risk (Caruana et al.,
2015).
2015 Adult content filtering software failed to remove inappropriate content[17].
2015 Amazon’s Echo responded to commands from TV voices[18].
2016 LinkedIn's name lookup suggested male names in place of female ones[19].
2016 AI designed to predict recidivism acted racist[20].
2016 AI agent exploited reward signal to win without completing the game course[21].
2016 Passport picture checking system flagged Asian user as having closed eyes[22].
2016 Game non-player characters designed unauthorized superweapons[23].
2016 AI judged a beauty contest and rated dark-skinned contestants lower[24].
2016 Smart contract permitted syphoning of funds from the decentralized autonomous
organization[25].
2016 Patrol robot collided with a child[26].
2016 World champion-level Go playing AI lost a game[27].
2016 Self-driving car had a deadly accident[28].
2016 AI designed to converse with users on Twitter became verbally abusive[29].
2016 Google image search returned racists results[30].
2016 Artificial applicant failed to pass university entrance exam[31].
2016 Predictive policing system disproportionately targeted minority neighborhoods[32].
2016 Text subject classifier failed to learn relevant features for topic assignment
(Ribeiro et al., 2016).
2017 Alexa played adult content instead of song for kids[33].
2017 Cellphone case designing AI used inappropriate images[34].
2017 Pattern recognition software failed to recognize certain types of inputs[35].
2017 Debt recovery system miscalculated amounts owed[36].
2017 Russian language chatbot shared pro-Stalinist, pro-abuse and pro-suicide
views[37].
2017 Translation AI learned to stereotype careers to specific genders (Caliskan et al.,
2017).
2017 Face beautifying AI made black people look white[38].
2017 Google’s sentiment analyzer became homophobic and anti-Semitic[39].
2017 Fish recognition program learned to recognize boat IDs instead[40].
2017 Alexa turned on loud music at night without being prompted to do so[41].
2017 AI for writing Christmas carols produced nonsense[42].
2017 Apple’s face recognition system failed to distinguish Asian users[43].
2017 Facebook’s translation software changed Yampolskiy to Polanski, see Figure 1.
2018 Google Assistant created bizarre merged photo[44].
2018 Robot store assistant was not helpful with responses like “cheese is in the fridges”[45].
Spam filters block important e-mails, GPS provides faulty directions, machine translation
corrupts meaning of phrases, autocorrect replaces a desired word with a wrong one,
biometric systems misrecognize people and transcription software fails to capture what is
being said; overall, it is harder to find examples of AIs that never fail. Depending on what we
consider for inclusion as examples of problems with intelligent software, the list of examples
could be grown almost indefinitely.
Analyzing the list of narrow AI failures, from the inception of the field to modern day
systems, we can arrive at a simple generalization: an AI designed to do X will eventually fail
to do X. Although it may seem trivial, it is a powerful generalization tool, which can be used
to predict future failures of NAIs. For example, looking at cutting-edge current and future
AIs we can predict that:
Software for generating jokes will occasionally fail to make them funny.
Sex robots will fail to deliver an orgasm or to stop at the right time.
Sarcasm detection software will confuse sarcastic and sincere statements.
Video description software will misunderstand movie plots.
Software generated virtual worlds may not be compelling.
AI doctors will misdiagnose some patients in a way a human doctor would not.
Employee screening software will be systematically biased and thus hire low
performers.
Others have given the following examples of possible accidents with A(G)I/
superintelligence:
Housekeeping robot cooks family pet for dinner[46].
A mathematician AGI converts all matter into computing elements to solve problems[47].
An AGI running simulations of humanity creates conscious beings who suffer (Armstrong et al., 2012).
Figure 1 While translating from Polish to English, Facebook's software changed "Roman Yampolskiy" to "Roman Polanski" because of the statistically higher frequency of the latter name in sample texts
Paperclip manufacturing AGI fails to stop and converts universe into raw materials
(Bostrom, 2003).
A scientist AGI performs experiments with significant negative impact on biosphere
(Taylor et al., 2016).
Drug design AGI develops time-delayed poison to kill everyone and so defeat
cancer[48].
Future superintelligence optimizes away all consciousness[49].
AGI kills humanity and converts universe into materials for improved handwriting[50].
AGI designed to maximize human happiness tiles universe with tiny smiley faces
(Yudkowsky, 2011).
AGI instructed to maximize pleasure consigns humanity to a dopamine drip (Marcus,
2012).
Superintelligence may rewire human brains to increase their perceived satisfaction
(Yudkowsky, 2011).
Denning and Denning made some similar error extrapolations in their humorous paper on "artificial stupidity," which illustrates a similar trend (Denning and Denning, 2004):
“Soon the automated Drug Enforcement Administration started closing down
pharmaceutical companies saying they were dealing drugs. The automated Federal Trade
Commission closed down the Hormel Meat Company, saying it was purveying spam. The
automated Department of Justice shipped Microsoft 500,000 pinstriped pants and jackets,
saying it was filing suits. The automated Army replaced all its troops with a single robot,
saying it had achieved the Army of One. The automated Navy, in a cost-saving move,
placed its largest-ever order for submarines with Subway Sandwiches. The Federal
Communications Commission issued an order for all communications to be wireless,
causing thousands of AT&T installer robots to pull cables from overhead poles and
underground conduits. The automated TSA flew its own explosives on jetliners, citing data
that the probability of two bombs on an airplane is exceedingly small”.
AGI can be seen as a superset of all NAIs and so will exhibit a superset of failures as well as
more complicated failures resulting from the combination of failures of individual NAIs and
new super-failures, possibly resulting in an existential threat to humanity or at least an AGI
takeover. In other words, AGIs can make mistakes that influence everything. Overall, we predict that AI failures and premeditated Malevolent AI incidents will increase in frequency and severity in proportion to AIs' capability.
2.1 Preventing AI failures
AI failures have a number of causes; the most common ones currently observed involve some type of algorithmic bias, poor performance or basic malfunction. Future AI failures are likely to be more severe, including purposeful manipulation/deception (Chessen, 2017) or even human death (most likely from misapplication of militarized AI/autonomous weapons/killer robots (Krishnan, 2009)). At the very end of the severity scale, we see existential-risk scenarios resulting in the extermination of humankind and suffering-risk scenarios (Gloor, 2016) resulting in large-scale torture of humanity, both types of risk coming from supercapable artificially intelligent systems.
Reviewing examples of AI accidents, we can notice patterns of failure, which can be
attributed to the following causes:
biased data, including cultural differences;
deploying underperforming system;
non-representative training data;
discrepancy between training and testing data (see the sketch following this list);
rule overgeneralization or application of population statistics to individuals;
inability to handle noise or statistical outliers;
not testing for rare or extreme conditions;
not realizing that an alternative solution method can produce the same results, but with side effects;
letting users control data or learning process;
no security mechanism to prevent adversarial meddling;
no cultural competence/common sense;
limited access to information/sensors;
mistake in design and inadequate testing;
limited ability for language disambiguation; and
inability to adapt to changes in the environment.
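As a minimal illustration of the "discrepancy between training and testing data" entry above, the following sketch (our own synthetic example; the one-feature threshold model, the Gaussian data and the amount of drift are assumptions rather than material from the paper) fits a trivial classifier on one distribution and shows its accuracy collapsing once the deployment data drift.

# A threshold classifier fit on training data fails after the feature distribution
# shifts at deployment time (synthetic data, for illustration only).
import random

random.seed(0)

def sample(n, shift=0.0):
    # Ground truth depends on a latent score; the observed feature drifts by `shift`.
    data = []
    for _ in range(n):
        score = random.gauss(1.0, 1.0)
        label = 1 if score > 1.0 else 0
        data.append((score + shift, label))
    return data

def accuracy(data, threshold):
    return sum((x > threshold) == bool(y) for x, y in data) / len(data)

def fit_threshold(train):
    # Brute force: pick the observed value that maximizes training accuracy.
    return max((x for x, _ in train), key=lambda t: accuracy(train, t))

train = sample(300)
threshold = fit_threshold(train)
print("training accuracy:", round(accuracy(train, threshold), 2))                     # near 1.0
print("post-shift accuracy:", round(accuracy(sample(300, shift=1.5), threshold), 2))  # far lower

The same mechanism is at work whenever a system trained in one environment is deployed in a subtly different one.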
With bias being a common current cause of failure, it is helpful to analyze particular types of
algorithmic bias. Friedman and Nissenbaum (1996) proposed the following framework for analyzing bias in computer systems. They subdivide causes of bias into three categories: preexisting bias, technical bias and emergent bias:
Preexisting bias reflects bias in society and social institutions, practices and attitudes.
The system simply preserves the existing state of the world and automates the application of bias as it currently exists.
Technical bias appears because of hardware or software limitations of the system itself.
Emergent bias emerges after the system is deployed because of changing societal
standards.
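In practice, the preexisting-bias category can often be surfaced before deployment with very simple checks. The sketch below (our own illustrative test; the group labels, the synthetic decision log and the 0.8 ratio, borrowed from the informal "four-fifths" heuristic, are all assumptions and not content of the paper) compares a system's positive-decision rates across groups and raises an alert when they diverge.

# Flag a decision system whose acceptance rate for one group falls below 80 per cent
# of the rate for another group (illustrative heuristic and data only).
def selection_rates(decisions):
    # decisions: iterable of (group, accepted) pairs -> acceptance rate per group
    totals, accepted = {}, {}
    for group, ok in decisions:
        totals[group] = totals.get(group, 0) + 1
        accepted[group] = accepted.get(group, 0) + int(ok)
    return {g: accepted[g] / totals[g] for g in totals}

def disparity_alert(decisions, ratio_threshold=0.8):
    rates = selection_rates(decisions)
    return min(rates.values()) / max(rates.values()) < ratio_threshold, rates

# Hypothetical log of automated screening decisions replayed from historical data.
log = ([("group_a", True)] * 60 + [("group_a", False)] * 40
       + [("group_b", True)] * 30 + [("group_b", False)] * 70)
alert, rates = disparity_alert(log)
print(rates)   # {'group_a': 0.6, 'group_b': 0.3}
print(alert)   # True: the automated rule has reproduced the bias present in its data

Such a check does not fix the bias, but it turns failures like the admissions, advertising and recidivism examples listed earlier into measurable, testable properties.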
Many of the observed AI failures are similar to mishaps experienced by little children. This is
particularly true for artificial neural networks, which are at the cutting edge of machine
learning (ML). One can say that children are untrained neural networks deployed on real
data and observing them can teach us a lot about predicting and preventing issues with
ML. A number of research groups (Amodei et al., 2016; Taylor et al., 2016) have investigated ML-related topics that have corresponding equivalents in the behavior of developing humans; here we summarize their work and map it onto similar situations with children (Table I).
A majority of the research aimed at preventing such issues is currently happening under the label of "AI Safety."
3. AI Safety
In 2010, we coined the phrase “Artificial Intelligence Safety Engineering” and its shorthand
notation “AI Safety” to give a name to a new direction of research we were advocating. We
formally presented our ideas on AI Safety at a peer-reviewed conference in 2011
(Yampolskiy, 2011a, b), with subsequent publications on the topic in 2012 (Yampolskiy and Fox, 2012), 2013 (Muehlhauser and Yampolskiy, 2013; Yampolskiy, 2013a), 2014 (Majot and Yampolskiy, 2014), 2015 (Yampolskiy, 2015), 2016 (Pistono and Yampolskiy, 2016; Yampolskiy, 2016b), 2017 (Yampolskiy, 2017) and 2018 (Brundage et al., 2018; Ramamoorthy and Yampolskiy, 2017). It is possible that someone used the phrase
informally before, but to the best of our knowledge, we were the first to use it[51] in a peer-reviewed publication and to popularize it. Before that, the most common names for the field of machine control were "Machine Ethics" (Moor, 2006) or "Friendly AI" (Yudkowsky,
field of machine control were “Machine Ethics” (Moor, 2006) or “Friendly AI” (Yudkowsky,
2001). Today the term “AI Safety” appears to be the accepted[52],[53],[54],[55],[56],[57],
[58],[59],[60],[61],[62],[63] name for the field used by a majority of top researchers (Amodei
et al., 2016). The field itself is becoming mainstream despite being regarded as either
science fiction or pseudoscience in its early days.
Our legal system is behind our technological abilities and the field of AI Safety is in its
infancy. The problem of controlling intelligent machines is just now being recognized[64] as
a serious concern, and many researchers are still skeptical about its very premise. Worse
yet, only about 100 people around the world are fully engaged in working on addressing the
current limitations in our understanding and abilities in this domain. Only about a dozen[65]
of those have formal training in computer science, cybersecurity, cryptography, decision
theory, ML, formal verification, computer forensics, steganography, ethics, mathematics,
network security, psychology and other relevant fields. It is not hard to see that the problem
of making a safe and capable machine is much greater than the problem of making just a
capable machine. Yet only about 1 per cent of researchers are currently engaged in AI
Safety work with available funding levels below even that mark. As a relatively young and
underfunded field of study, AI Safety can benefit from adopting methods and ideas from
more established fields of science. Attempts have been made to introduce techniques
which were first developed by cybersecurity experts to secure software systems to this new
domain of securing intelligent machines (Armstrong and Yampolskiy, 2016; Babcock et al., 2016a, 2016b; Yampolskiy, 2012a, 2012b). Other fields, which could serve as a source of
important techniques, would include software engineering and software verification.
During software development, iterative testing and debugging are of fundamental importance for producing reliable and safe code. Although it is assumed that all complicated software will have some bugs, with the many advanced techniques available in the toolkit of software engineers, most serious errors can be detected and fixed, resulting in a product suitable
for its intended purposes. Certainly, a lot of modular development and testing techniques
used by the software industry can be used during development of intelligent agents, but
methods for testing a completed software package are unlikely to be transferable in the
same way. Alpha and beta testing, which work by releasing almost-finished software to
advanced users for reporting problems encountered in realistic situations, would not be a
good idea in the domain of testing/debugging superintelligent software. Similarly, simply running the software to see how it performs is not a feasible approach with a superintelligent agent.
Table I ML concepts and corresponding notions in childhood development
Concept in ML | Equivalent in child development
Negative side effects | Child makes a mess
Reward hacking | Child finds candy jar
Scalable oversight | Babysitting should not require a team of ten
Safe exploration | No fingers in the outlet
Robustness to distributional shift | Use "inside voice" in the classroom
Inductive ambiguity identification | Is an ant a cat or a dog?
Robust human imitation | Daughter shaves like daddy
Informed oversight | Child hands in homework
Generalizable environmental goals | Ignore that mirage
Conservative concepts | That dog has no tail
Impact measures | Keep a low profile
Mild optimization | Do not be a perfectionist
Averting instrumental incentives | Be an altruist
4. Cybersecurity vs. AI Safety
Bruce Schneier has said, “If you think technology can solve your security problems then you
don’t understand the problems and you don’t understand the technology”[66]. Salman
Rushdie made a more general statement: “There is no such thing as perfect security, only
varying levels of insecurity"[67]. We propose what we call the Fundamental Thesis of Security: every security system will eventually fail; there is no such thing as a 100 per cent secure system. If your security system has not failed, just wait longer.
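The Fundamental Thesis of Security admits a simple probabilistic reading (our own back-of-the-envelope formalization, not a formula from this paper): if each period of operation carries even a tiny, independent probability p of a successful attack, the chance of at least one failure approaches certainty the longer the system stays in service:

\[ P_{\mathrm{fail}}(n) = 1 - (1 - p)^{n}, \qquad \lim_{n \to \infty} P_{\mathrm{fail}}(n) = 1 \quad \text{for any fixed } p > 0. \]

For example, under the assumed independence idealization, a per-day failure probability of only 0.001 gives a roughly 97 per cent chance of at least one failure over ten years; waiting longer only pushes the value closer to 1.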
In theoretical computer science, a common way of isolating the essence of a difficult
problem is via the method of reduction to another, sometimes better analyzed, problem
(Karp, 1972; Yampolskiy, 2013c; Yampolskiy, 2012a, 2012b). If such a reduction exists and is computationally efficient (Yampolskiy, 2013b), it implies that solving the better analyzed problem would also provide a working solution to the problem we are currently dealing with. The more general problem of AGI Safety must contain a solution to the narrower problem of making sure a particular human is safe for other humans. We call this the Safe Human Problem[68]. Formally, such a
reduction can be done via a restricted Turing test in the domain of safety in a manner
identical to how AI-Completeness of a problem could be established (Yampolskiy, 2013c; Yampolskiy, 2011a, b). Such formalism is beyond the scope of this paper, so we simply
point out that in both cases, we have at least a human-level intelligent agent capable of
influencing its environment, and we would like to make sure that the agent is safe and
controllable. Although in practice, changing the design of a human via DNA manipulation is
not as simple as changing the source code of an AI, theoretically it is just as possible.
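The reduction claim can be written schematically (our own informal notation; the paper itself does not introduce these symbols). Writing SHP for the Safe Human Problem and AGIS for the AGI Safety problem:

\[ \mathrm{SHP} \le \mathrm{AGIS} \;\Longrightarrow\; \bigl(\mathrm{AGIS}\ \text{solvable} \Rightarrow \mathrm{SHP}\ \text{solvable}\bigr) \;\Longrightarrow\; \bigl(\mathrm{SHP}\ \text{unsolved} \Rightarrow \mathrm{AGIS}\ \text{unsolved}\bigr). \]

Read contrapositively, the difficulty of guaranteeing safe humans, discussed next, is evidence about the difficulty of guaranteeing safe AGI.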
It is observed that humans are not completely safe to themselves and others. Despite millennia of attempts to develop safe humans via culture, education, laws, ethics,
punishment, reward, religion, relationships, family, oaths, love and even eugenics, success
is not within reach. Humans kill and commit suicide, lie and betray, steal and cheat,
possibly in proportion to how much they can get away with. Truly powerful dictators will
enslave, commit genocide, break every law and violate every human right. It is famously
stated that a human without a sin cannot be found. The best we can hope for is to reduce
such unsafe tendencies to levels that our society can survive. Even with advanced genetic
engineering (Yampolskiy, 2016a), the best we can hope for is some additional reduction in
how unsafe humans are. As long as we permit a person to have choices (free will), they can
be bribed, they will deceive, they will prioritize their interests above those they are
instructed to serve and they will remain fundamentally unsafe. Despite being trivial
examples of a solution to the value learning problem (VLP) (Dewey, 2011; Soares and Fallenstein, 2014; Sotala, 2016), human beings are anything but safe, bringing into question
our current hope that solving VLP will get us to Safe AI. This is important. To quote Bruce
Schneier, “Only amateurs attack machines; professionals target people.” Consequently, we
see AI Safety research as, at least partially, an adversarial field similar to cryptography or
security[69].
If a cybersecurity system fails, the damage is unpleasant but tolerable in most cases:
someone loses money, someone loses privacy or maybe somebody loses their life. For
narrow AIs, safety failures are at the same level of importance as in general cybersecurity,
but for AGI it is fundamentally different. A single failure of a superintelligent system may
cause an existential risk event. If an AGI Safety mechanism fails, everyone may lose
everything, and all biological life in the universe is potentially destroyed. With cybersecurity
systems, you will get another chance to get it right or at least do better. With an AGI Safety system, you only have one chance to succeed, so learning from failure is not an option.
Worse, a typical security system is likely to fail to a certain degree, e.g. perhaps, only a
small amount of data will be compromised. With an AGI Safety system, failure or success is
a binary option: either you have a safe and controlled superintelligence or you do not. The
goal of cybersecurity is to reduce the number of successful attacks on the system; the goal
of AI Safety is to make sure zero attacks by superintelligent AI succeed in bypassing the
safety mechanisms. For that reason, the ability to distinguish NAI projects from potential AGI
projects (Baum, 2017) is an open problem of fundamental importance in the AI safety field.
The problems are many. We have no way to monitor, visualize or analyze the performance
of superintelligent agents. More trivially, we do not even know what to expect after such software starts running. Should we see immediate changes to our environment? Should we
see nothing? What is the timescale on which we should be able to detect something? Will it
be too quick to notice or are we too slow to realize something is happening (Yudkowsky and
Hanson, 2008)? Will the impact be locally observable or impact distant parts of the world?
How does one perform standard testing? On what data sets? What constitutes an “Edge
Case” for general intelligence? The questions are many, but the answers currently do not
exist. Additional complications will come from the interaction between intelligent software
and safety mechanisms designed to keep AI Safe and secure. We will also have to
somehow test all the AI Safety mechanisms currently in development. While AI remains at human levels, some testing can be done with a human agent playing the role of the artificial
agent (Yudkowsky, 2002). At levels beyond human capacity, adversarial testing does not
seem to be realizable with today’s technology. More significantly, only one test run would
ever be possible.
5. Conclusions
The history of robotics and artificial intelligence in many ways is also the history of
humanity’s attempts to control such technologies. From the Golem of Prague to the military
robots of modernity, the debate continues as to what degree of independence such entities
should have and how to make sure that they do not turn on us, their inventors. Numerous
recent advancements in all aspects of research, development and deployment of intelligent
systems are well publicized but safety and security issues related to AI are rarely
addressed. It is our hope that this paper will allow us to better understand how AI systems
can fail and what we can expect from such systems in the future, allowing us to better
prepare an appropriate response.
Notes
1. https://intelligence.org/2014/01/28/how-big-is-ai/
2. https://en.wikipedia.org/wiki/2010_Flash_Crash
3. https://electrek.co/2016/05/26/tesla-model-s-crash-autopilot-video/
4. https://en.wikipedia.org/wiki/Tay_(bot)
5. https://en.wikipedia.org/wiki/Kenji_Urada
6. http://lesswrong.com/lw/lvh/examples_of_ais_behaving_badly/
7. https://en.wikipedia.org/wiki/List_of_software_bugs
8. https://en.wikipedia.org/wiki/General_Problem_Solver
9. http://aliciapatterson.org/stories/eurisko-computer-mind-its-own
10. https://en.wikipedia.org/wiki/1983_Soviet_nuclear_false_alarm_incident
11. http://gawker.com/this-program-that-judges-use-to-predict-future-crimes-s-1778151070
12. www.technologyreview.com/s/601897/tougher-turing-test-exposes-chatbots-stupidity/
13. www.forbes.com/sites/aarontilley/2014/04/03/googles-nest-stops-selling-its-smart-smoke-alarm-
for-now
14. https://gmail.googleblog.com/2015/11/computer-respond-to-this-email.html
15. http://time.com/3944181/robot-kills-man-volkswagen-plant/
16. www.huffingtonpost.com/2015/07/02/google-black-people-goril_n_7717008.html
17. http://blogs.wsj.com/digits/2015/05/19/googles-youtube-kids-app-criticized-for-inappropriate-
content/
18. https://motherboard.vice.com/en_us/article/53dz8x/people-are-complaining-that-amazon-echo-is-
responding-to-ads-on-tv
19. www.seattletimes.com/business/microsoft/how-linkedins-search-engine-may-reflect-a-bias
20. www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
21. https://openai.com/blog/faulty-reward-functions
22. www.telegraph.co.uk/technology/2016/12/07/robot-passport-checker-rejects-asian-mans-photo-
having-eyes
23. www.kotaku.co.uk/2016/06/03/elites-ai-created-super-weapons-and-started-hunting-players-
skynet-is-here
24. www.theguardian.com/technology/2016/sep/08/artificial-intelligence-beauty-contest-doesnt-like-
black-people
25. https://en.wikipedia.org/wiki/The_DAO_(organization)
26. www.latimes.com/local/lanow/la-me-ln-crimefighting-robot-hurts-child-bay-area-20160713-snap-
story.html
27. www.engadget.com/2016/03/13/google-alphago-loses-to-human-in-one-match/
28. www.theguardian.com/technology/2016/jul/01/tesla-driver-killed-autopilot-self-driving-car-harry-
potter
29. www.theverge.com/2016/3/24/11297050/tay-microsoft-chatbot-racist
30. https://splinternews.com/black-teenagers-vs-white-teenagers-why-googles-algori-1793857436
31. www.japantimes.co.jp/news/2016/11/15/national/ai-robot-fails-get-university-tokyo
32. www.themarshallproject.org/2016/02/03/policing-the-future
33. www.entrepreneur.com/video/287281
34. www.boredpanda.com/funny-amazon-ai-designed-phone-cases-fail
35. www.bbc.com/future/story/20170410-how-to-fool-artificial-intelligence
36. www.abc.net.au/news/2017-04-10/centrelink-debt-recovery-system-lacks-transparency-
ombudsman/8430184
37. https://techcrunch.com/2017/10/24/another-ai-chatbot-shown-spouting-offensive-views
38. www.gizmodo.co.uk/2017/04/faceapp-blames-ai-for-whitening-up-black-people
39. https://motherboard.vice.com/en_us/article/j5jmj8/google-artificial-intelligence-bias
40. https://medium.com/@gidishperber/what-ive-learned-from-kaggle-s-fisheries-competition-
92342f9ca779
41. http://mashable.com/2017/11/08/amazon-alexa-rave-party-germany
42. http://mashable.com/2017/12/22/ai-tried-to-write-christmas-carols
43. www.mirror.co.uk/tech/apple-accused-racism-after-face-11735152
44. https://qz.com/1188170/google-photos-tried-to-fix-this-ski-photo
45. www.iflscience.com/technology/store-hires-robot-to-help-out-customers-robot-gets-fired-for-
scaring-customers-away
46. www.theguardian.com/sustainable-business/2015/jun/23/the-ethics-of-ai-how-to-stop-your-robot-
cooking-your-cat
47. https://intelligence.org/2014/11/18/misconceptions-edge-orgs-conversation-myth-ai
48. https://80000hours.org/problem-profiles/positively-shaping-artificial-intelligence
49. http://slatestarcodex.com/2014/07/13/growing-children-for-bostroms-disneyland
50. https://waitbutwhy.com/2015/01/artificial-intelligence-revolution-2.html
51. The term "Safe AI" was used as early as 1995 (Rodd, 1995).
52. www.cmu.edu/safartint/
53. https://selfawaresystems.com/2015/07/11/formal-methods-for-ai-safety/
54. https://intelligence.org/2014/08/04/groundwork-ai-safety-engineering/
55. http://spectrum.ieee.org/tech-talk/robotics/artificial-intelligence/new-ai-safety-projects-get-
funding-from-elon-musk
56. http://globalprioritiesproject.org/2015/08/quantifyingaisafety/
57. http://futureoflife.org/2015/10/12/ai-safety-conference-in-puerto-rico/
58. http://rationality.org/waiss/
59. http://gizmodo.com/satya-nadella-has-come-up-with-his-own-ai-safety-rules-1782802269
60. https://80000hours.org/career-reviews/artificial-intelligence-risk-research/
61. https://openai.com/blog/concrete-ai-safety-problems/
62. http://lesswrong.com/lw/n4l/safety_engineering_target_selection_and_alignment/
63. www.waise2018.com/
64. www.whitehouse.gov/blog/2016/05/03/preparing-future-artificial-intelligence
65. http://acritch.com/fhi-positions/
66. www.brainyquote.com/quotes/bruce_schneier_182286
67. www.brainyquote.com/quotes/salman_rushdie_580407
68. Similarly a Safe Animal Problem may be of interest (can a Pitbull be guaranteed safe?).
69. The last thing we want is to be in an adversarial situation with a superintelligence, but unfortunately
we may not have a choice in the matter. It seems that long-term AI Safety cannot succeed but
neither does it have the luxury of a partial fail.
References
Abecker, A., Alami, R., Baral, C., Bickmore, T., Durfee, E., Fong, T. and Lebiere, C. (2006), “AAAI 2006
spring symposium reports”, AI Magazine, Vol. 27 No. 3, p. 107.
Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J. and Mané, D. (2016), "Concrete problems in AI safety", arXiv preprint arXiv:1606.06565.
Armstrong, S. and Yampolskiy, R.V. (2016), “Security solutions for intelligent and complex systems”,
Security Solutions for Hyperconnectivity and the Internet of Things, IGI Global, Hershey, PA, pp. 37-88.
Armstrong, S., Sandberg, A. and Bostrom, N. (2012), “Thinking inside the box: controlling and using an
oracle AI", Minds and Machines, Vol. 22 No. 4, pp. 299-324.
Babcock, J., Kramar, J. and Yampolskiy, R. (2016a), “The AGI containment problem”, Paper presented at
the The Ninth Conference on Artificial General Intelligence (AGI2015).
Babcock, J., Kramar, J. and Yampolskiy, R. (2016b), “The AGI containment problem”, arXiv preprint
arXiv:1604.00545.
Baum, S. (2017), “A survey of artificial general intelligence projects for ethics, risk, and policy”, Global
Catastrophic Risk Institute Working Paper 17-1.
Bostrom, N. (2003), “Ethical issues in advanced artificial intelligence”, Science Fiction and Philosophy:
From Time Travel to Superintelligence, pp. 277-284.
Bostrom, N. (2014), Superintelligence: Paths, Dangers, Strategies, Oxford University Press, New York,
NY.
Brundage, M., Avin, S., Clark, J., Toner, H., Eckersley, P., Garfinkel, B. and Filar, B. (2018), "The malicious
use of artificial intelligence: forecasting, prevention, and mitigation”, arXiv preprint arXiv:1802.07228.
Caliskan, A., Bryson, J.J. and Narayanan, A. (2017), “Semantics derived automatically from language
corpora contain human-like biases”, Science, Vol. 356 No. 6334, pp. 183-186.
Caruana, R., Lou, Y., Gehrke, J., Koch, P., Sturm, M. and Elhadad, N. (2015), “Intelligible models for
healthcare: predicting pneumonia risk and hospital 30-day readmission”, Paper presented at the
Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining.
Chessen, M. (2017), The MADCOM Future, Atlantic Council, available at: www.atlanticcouncil.org/
publications/reports/the-madcom-future
Denning, D.E. and Denning, P.J. (2004), "Artificial stupidity", Communications of the ACM, Vol. 47 No. 5, p. 112.
Dewey, D. (2011), “Learning what to value”, Artificial General Intelligence, pp. 309-314.
Diakopoulos, N. (2013), “Algorithmic defamation: the case of the shameless autocomplete”, Blog post,
available at: www.nickdiakopoulos.com/2013/08/06/algorithmic-defamation-the-case-of-the-shameless-
autocomplete/
Friedman, B. and Nissenbaum, H. (1996), “Bias in computer systems”, ACM Transactions on Information
Systems (TOIS), Vol. 14 No. 3, pp. 330-347.
Gloor, L. (2016), "Suffering-focused AI safety: why 'fail-safe' measures might be our top intervention".
Gunderson, J. and Gunderson, L. (2006), “And then the phone rang”, Paper presented at the AAAI
Spring Symposium: What Went Wrong and Why: Lessons from AI Research and Applications.
Hewitt, C. (1958), “Development of logic programming: what went wrong, what was done about it, and
what it might mean for the future”.
Karp, R.M. (1972), “Reducibility among combinatorial problems”, In Miller, R. E. and Thatcher, J. W.
(Eds), Complexity of Computer Computations, Plenum, New York, NY, pp. 85-103.
Krishnan, A. (2009), Killer Robots: Legality and Ethicality of Autonomous Weapons, Ashgate Publishing,
Farnham.
Liu, A., Martin, C.E., Hetherington, T. and Matzner, S. (2006), “AI lessons learned from experiments in
insider threat detection”, Paper presented at the AAAI Spring Symposium: What Went Wrong and Why:
Lessons from AI Research and Applications.
Lowry, S. and Macpherson, G. (1988), "A blot on the profession", British Medical Journal (Clinical Research ed.), Vol. 296 No. 6623, p. 657.
Majot, A.M. and Yampolskiy, R.V. (2014), “AI safety engineering through introduction of self-reference
into felicific calculus via artificial pain and pleasure”, Paper presented at the IEEE International
Symposium on Ethics in Science, Technology and Engineering, Chicago, IL.
Marcus, G. (2012), "Moral machines", The New Yorker, New York, NY, p. 24.
Marling, C. and Chelberg, D. (2008), “RoboCup for the mechanically, athletically and culturally
challenged”.
Meehan, J.R. (1977), "TALE-SPIN, an interactive program that writes stories", Paper presented at the IJCAI.
Moor, J.H. (2006), “The nature, importance, and difficulty of machine ethics”, IEEE Intelligent Systems,
Vol. 21 No. 4, pp. 18-21.
Muehlhauser, L. and Yampolskiy, R. (2013), "Roman Yampolskiy on AI safety engineering", Paper
presented at the Machine Intelligence Research Institute, available at: http://intelligence.org/2013/07/15/
roman-interview/
Murphy, T. (2013), “The first level of Super Mario Bros. is easy with lexicographic orderings and time
travel”, The Association for Computational Heresy (SIGBOVIK), pp. 112-133.
Ng, A.Y., Harada, D. and Russell, S. (1999), “Policy invariance under reward transformations: theory and
application to reward shaping”, Paper presented at the ICML.
Pistono, F. and Yampolskiy, R.V. (2016), “Unethical research: how to create a malevolent artificial
intelligence”, arXiv preprint arXiv:1605.02817.
Pistono, F. and Yampolskiy, R.V. (2016), “Unethical research: how to create a malevolent artificial
intelligence”, Paper presented at the 25th International Joint Conference on Artificial Intelligence (IJCAI-16),
Ethics for Artificial IntelligenceWorkshop (AI-Ethics-2016), New York, NY.
Ramamoorthy, A. and Yampolskiy, R. (2017), “Beyond mad?: The race for artificial general intelligence”,
ITU Journal: ICT Discoveries, No. 1, available at: www.itu.int/en/journal/001/Pages/09.aspx
Randløv, J. and Alstrøm, P. (1998), “Learning to drive a bicycle using reinforcement learning and
shaping”, Paper presented at the ICML.
Ribeiro, M.T., Singh, S. and Guestrin, C. (2016), "Why should I trust you?: Explaining the predictions of
any classifier”, Paper presented at the Proceedings of the 22nd ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining.
Rodd, M. (1995), "Safe AI – is this possible?", Engineering Applications of Artificial Intelligence, Vol. 8
No. 3, pp. 243-250.
Rychtyckyj, N. and Turski, A. (2008), “Reasons for success (and failure) in the development and
deployment of AI systems", Paper presented at the AAAI 2008 workshop on What Went Wrong and Why.
Scharre, P. (2016), “Autonomous weapons and operational risk”, Paper presented at the Center for a New
American Society, Washington DC.
Shalev-Shwartz, S., Shamir, O. and Shammah, S. (2017), “Failures of gradient-based deep learning”,
Paper presented at the International Conference on Machine Learning.
Shapiro, D. and Goker, M.H. (2008), “Advancing AI research and applications by learning from what went
wrong and why”, AI Magazine, Vol. 29 No. 2, pp. 9-10.
Sims, K. (1994), “Evolving virtual creatures”, Paper presented at the Proceedings of the 21st annual
conference on Computer graphics and interactive techniques.
Soares, N. and Fallenstein, B. (2014), Aligning superintelligence with human interests: a technical
research agenda, Machine Intelligence Research Institute (MIRI) Technical Report 8.
Sotala, K. (2016), “Defining human values for value learners”, Paper presented at the 2nd International
Workshop on AI, Ethics and Society, AAAI-2016.
Sotala, K. and Yampolskiy, R.V. (2015), “Responses to catastrophic AGI risk: a survey”, Physica Scripta,
Vol. 90 No. 1.
Sweeney, L. (2013), "Discrimination in online ad delivery", Queue, Vol. 11 No. 3, p. 10.
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I. and Fergus, R. (2013),
"Intriguing properties of neural networks", arXiv preprint arXiv:1312.6199.
Tambe, M. (2008), “Electric elves: what went wrong and why”, AI Magazine, Vol. 29 No. 2, p. 23.
Taylor, J., Yudkowsky, E., LaVictoire, P. and Critch, A. (2016), Alignment for Advanced Machine Learning Systems, Machine Intelligence Research Institute, Berkeley, CA.
Yampolskiy, R.V. (2011a), “AI-Complete CAPTCHAs as zero knowledge proofs of access to an artificially
intelligent system”, ISRN Artificial Intelligence, 271878.
Yampolskiy, R.V. (2011b), “Artificial intelligence safety engineering: why machine ethics is a wrong
approach”, Paper presented at the Philosophy and Theory of Artificial Intelligence (PT-AI2011),
Thessaloniki, Greece.
Yampolskiy, R.V. (2012a), "Leakproofing the singularity: artificial intelligence confinement problem", Journal of Consciousness Studies, Vol. 19 Nos 1/2, pp. 1-2.
Yampolskiy, R.V. (2012b), "AI-Complete, AI-Hard, or AI-Easy – classification of problems in AI", The 23rd Midwest Artificial Intelligence and Cognitive Science Conference, Cincinnati, OH, USA.
Yampolskiy, R.V. (2013a), “Artificial intelligence safety engineering: why machine ethics is a wrong
approach”, Philosophy and Theory of Artificial Intelligence, Springer, Berlin Heidelberg,
pp. 389-396.
Yampolskiy, R.V. (2013b), “Efficiency theory: a unifying theory for information, computation and
intelligence", Journal of Discrete Mathematical Sciences & Cryptography, Vol. 16 Nos 4/5, pp. 259-277.
Yampolskiy, R.V. (2013c), “Turing test as a defining feature of AI-Completeness”, in Yang, X.-S. (Ed.),
Artificial Intelligence, Evolutionary Computing and Metaheuristics, Springer, Berlin Heidelberg, Vol. 427,
pp. 3-17.
Yampolskiy, R.V. (2015), Artificial Superintelligence: A Futuristic Approach, Chapman and Hall/CRC,
Boca Raton, FL.
Yampolskiy, R.V. (2016a), “On the origin of samples: attribution of output to a particular algorithm”, arXiv
preprint arXiv:1608.06172.
Yampolskiy, R.V. (2016b), “Taxonomy of pathways to dangerous artificial intelligence”, Paper presented
at the Workshops at the Thirtieth AAAI Conference on Artificial Intelligence.
Yampolskiy, R.V. (2017), “What are the ultimate limits to computational techniques: verifier theory and
unverifiability”, Physica Scripta, Vol. 92 No. 9, p. 093001.
Yampolskiy, R.V. and Fox, J. (2012), “Safety engineering for artificial general intelligence”, Topoi. Special
Issue on Machine Ethics & the Ethics of Building Intelligent Machines.
Yudkowsky, E. (2001), “Creating friendly AI 1.0: the analysis and design of benevolent goal
architectures”, Singularity Institute for Artificial Intelligence, San Francisco, CA, 15 June.
Yudkowsky, E. (2002), The AI-Box Experiment, available at: http://yudkowsky.net/singularity/aibox
Yudkowsky, E. (2008), “Artificial intelligence as a positive and negative factor in global risk”, Global
Catastrophic Risks, Vol. 1, p. 303.
Yudkowsky, E. (2011), “Complex value systems in friendly AI”, Artificial General Intelligence,
pp. 388-393.
Yudkowsky, E. and Hanson, R. (2008), “The Hanson-Yudkowsky AI-foom debate”, Paper presented at
the MIRI Technical Report, available at: http://intelligence.org/files/AIFoomDebate.pdf
Corresponding author
Roman V. Yampolskiy can be contacted at: roman.yampolskiy@louisville.edu
For instructions on how to order reprints of this article, please visit our website:
www.emeraldgrouppublishing.com/licensing/reprints.htm
Or contact us for further details: permissions@emeraldinsight.com
... The unprecedented progress in artificial intelligence (AI) [1][2][3][4][5][6], over the last decade, came alongside multiple AI failures [7,8] and cases of dual use [9] causing a realization [10] that it is not sufficient to create highly capable machines, but that it is even more important to make sure that intelligent machines are beneficial [11] for humanity. This led to the birth of the new sub-field of research commonly known as AI safety and security [12] with hundreds of papers and books published annually on the different aspects of the problem [13][14][15][16][17][18][19][20][21][22][23][24][25][26][27][28][29][30][31]. ...
... Yampolskiy reviews empirical evidence for dozens of historical AI failures [7,8] and states: "We predict that both the frequency and seriousness of such events will steadily increase as AIs become more capable. The failures of today's narrow domain AIs are just a warning: once we develop artificial general intelligence (AGI) capable of cross-domain performance, hurt feelings will be the least of our concerns." ...
... The failures of today's narrow domain AIs are just a warning: once we develop artificial general intelligence (AGI) capable of cross-domain performance, hurt feelings will be the least of our concerns." [7]. More generally he says: "We propose what we call the Fundamental Thesis of Security -Every security system will eventually fail; there is no such thing as a 100 per cent secure system. ...
Article
Full-text available
The invention of artificial general intelligence is predicted to cause a shift in the trajectory of human civilization. In order to reap the benefits and avoid the pitfalls of such a powerful technology it is important to be able to control it. However, the possibility of controlling artificial general intelligence and its more advanced version, superintelligence, has not been formally established. In this paper, we present arguments as well as supporting evidence from multiple domains indicating that advanced AI cannot be fully controlled. The consequences of uncontrollability of AI are discussed with respect to the future of humanity and research on AI, and AI safety and security.
... Unfortunately, the development of AI systems is mainly driven by a "technology-centered design" approach (e.g., Shneiderman, 2020aShneiderman, , 2020bXu, 2019a;Zheng et al., 2017). Many AI professionals are primarily dedicated to studying algorithms, rather than providing useful AI systems to meet user needs, resulting in the failure of many AI systems (Hoffman et al., 2016;Lazer et al., 2014;Lieberman, 2009;Yampolskiy, 2019). Specifically, the AI Incident Database has collected more than 1000 AI related accidents (McGregor, 2021), such as an autonomous car killing a pedestrian, a trading algorithm causing a market "flash crash" where billions of dollars transfer between parties, and a facial recognition system causing an innocent person to be arrested. ...
... Human-machine teaming also has a "double-edged sword" effect. For example, on the one hand, AI technologies (e.g., deep machine learning, big data of collective domain knowledge) can help human decision-making operations be more effective way under some operating scenarios, than individual operators using non-AI systems; on the other hand, if the human-centered approach is not followed in the development of AI systems, there is no guarantee that humans have the final decision-making authority of the systems in unexpected scenarios, and the potential unexpected and indeterministic outcome of the systems may cause ethical and safety failures (McGregor, 2021;Yampolskiy, 2019). Thus, AI technology has brought in new challenges and opportunities for HCI design. ...
... However, the behavioral outcome of AI systems could be non-deterministic and unexpected. Researchers are raising the alarm about the unintended consequences, which can produce negative societal effects (e.g., McGregor, 2021;Yampolskiy, 2019). Machine behavior in AI systems also has a special ecological form (Rahwan et al., 2019). ...
Article
Full-text available
While AI has benefited humans, it may also harm humans if not appropriately developed. The priority of current HCI work should focus on transiting from conventional human interaction with non-AI computing systems to interaction with AI systems. We conducted a high-level literature review and a holistic analysis of current work in developing AI systems from an HCI perspective. Our review and analysis highlight the new changes introduced by AI technology and the new challenges that HCI professionals face when applying the human-centered AI (HCAI) approach in the development of AI systems. We also identified seven main issues in human interaction with AI systems, which HCI professionals did not encounter when developing non-AI computing systems. To further enable the implementation of the HCAI approach, we identified new HCI opportunities tied to specific HCAI-driven design goals to guide HCI professionals addressing these new issues. Finally, our assessment of current HCI methods shows the limitations of these methods in support of developing HCAI systems. We propose the alternative methods that can help overcome these limitations and effectively help HCI professionals apply the HCAI approach to the development of AI systems. We also offer strategic recommendation for HCI professionals to effectively influence the development of AI systems with the HCAI approach, eventually developing HCAI systems.
... With current AI technologies, harm done by AIs is limited to the power that we put directly in their control. As stated in Reference [1], "For Narrow AIs, safety failures are at the same level of importance as in general cybersecurity, but, for AGI, it is fundamentally different." Despite AGI (artificial general intelligence) still being well out of reach, the nature of AI catastrophes has already changed over the past two decades. ...
... Large collections of AI failures and systems to categorize them have been created before [1,20]. In Reference [20], the classification schema details failures by problem source (such as design flaws, misuse, equipment malfunction, etc.), consequences (physical, mental, emotional, financial, social, or cultural), scale of consequences (individual, corporation, or community), and agency (accidental, negligent, innocuous, or malicious). ...
... Virtual machines, often used as an additional layer of security, are also susceptible to a wide range of exploits [38]. This illustrates a more general concern: an AI acting outside of the output space that it was designed to work within, as seen in many of the AI failures in Reference [1]. ...
Article
Full-text available
As AI technologies increase in capability and ubiquity, AI accidents are becoming more common. Based on normal accident theory, high reliability theory, and open systems theory, we create a framework for understanding the risks associated with AI applications. This framework is designed to direct attention to pertinent system properties without requiring unwieldy amounts of accuracy. In addition, we also use AI safety principles to quantify the unique risks of increased intelligence and human-like qualities in AI. Together, these two fields give a more complete picture of the risks of contemporary AI. By focusing on system properties near accidents instead of seeking a root cause of accidents, we identify where attention should be paid to safety for current generation AI systems.
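The classification schema from Reference [20], cited in the excerpts above, maps naturally onto a small data structure. Below is a minimal Python sketch of such a taxonomy; the enum and class names are illustrative choices, not taken from either source:

    from dataclasses import dataclass
    from enum import Enum

    # Illustrative taxonomy loosely following the four dimensions described
    # above: problem source, consequence type, scale, and agency.
    class Source(Enum):
        DESIGN_FLAW = "design flaw"
        MISUSE = "misuse"
        EQUIPMENT_MALFUNCTION = "equipment malfunction"

    class Consequence(Enum):
        PHYSICAL = "physical"
        MENTAL = "mental"
        EMOTIONAL = "emotional"
        FINANCIAL = "financial"
        SOCIAL = "social"
        CULTURAL = "cultural"

    class Scale(Enum):
        INDIVIDUAL = "individual"
        CORPORATION = "corporation"
        COMMUNITY = "community"

    class Agency(Enum):
        ACCIDENTAL = "accidental"
        NEGLIGENT = "negligent"
        INNOCUOUS = "innocuous"
        MALICIOUS = "malicious"

    @dataclass
    class FailureRecord:
        """One AI incident tagged along the four dimensions."""
        description: str
        source: Source
        consequence: Consequence
        scale: Scale
        agency: Agency

    # Example: tagging a hypothetical trading-algorithm "flash crash" incident.
    flash_crash = FailureRecord(
        description="Trading algorithm triggers a market flash crash",
        source=Source.DESIGN_FLAW,
        consequence=Consequence.FINANCIAL,
        scale=Scale.COMMUNITY,
        agency=Agency.ACCIDENTAL,
    )
    print(flash_crash)

Tagging incidents in this way makes it straightforward to filter or count failures along any of the four dimensions when analyzing a larger collection of accidents.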
... It becomes an important problem to understand and forecast both the negative and positive impacts of AI. Literature such as [203,209,412] has collected and studied representative AI misuse cases and summarized high-stakes patterns that AI failures tend to follow [412], such as vulnerability to attacks, underperformance on noisy inputs, and biased predictions. Understanding these patterns naturally yields a number of specific requirements for AI trustworthiness, such as robustness, generalization, and fairness. ...
Preprint
Full-text available
Fast-developing artificial intelligence (AI) technology has enabled various applied systems deployed in the real world, impacting people's everyday lives. However, many current AI systems have been found vulnerable to imperceptible attacks, biased against underrepresented groups, lacking in user privacy protection, etc., which not only degrades user experience but also erodes society's trust in all AI systems. In this review, we strive to provide AI practitioners with a comprehensive guide towards building trustworthy AI systems. We first introduce the theoretical framework of important aspects of AI trustworthiness, including robustness, generalization, explainability, transparency, reproducibility, fairness, privacy preservation, alignment with human values, and accountability. We then survey leading approaches to these aspects in industry. To unify the current fragmented approaches towards trustworthy AI, we propose a systematic approach that considers the entire lifecycle of AI systems, ranging from data acquisition to model development, to deployment, and finally to continuous monitoring and governance. In this framework, we offer concrete action items to practitioners and societal stakeholders (e.g., researchers and regulators) to improve AI trustworthiness. Finally, we identify key opportunities and challenges in the future development of trustworthy AI systems, where we identify the need for a paradigm shift towards comprehensive trustworthy AI systems.
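As a concrete illustration of one of the requirements derived from the failure patterns cited above (robustness to noisy inputs), here is a small Python sketch. It assumes a classifier with the common predict interface and is a toy diagnostic, not a method from the works cited:

    import numpy as np

    def accuracy(model, X, y):
        """Fraction of correctly classified examples."""
        return float(np.mean(model.predict(X) == y))

    def noise_robustness_gap(model, X, y, sigma=0.1, seed=0):
        """Accuracy drop when inputs are perturbed with Gaussian noise.

        A large gap suggests the model underperforms on noisy inputs,
        one of the failure patterns summarized in the excerpt above.
        """
        rng = np.random.default_rng(seed)
        X_noisy = X + rng.normal(scale=sigma, size=X.shape)
        return accuracy(model, X, y) - accuracy(model, X_noisy, y)

    # Usage (assuming `clf`, `X_test`, `y_test` come from an existing pipeline):
    #   gap = noise_robustness_gap(clf, X_test, y_test, sigma=0.05)
    #   print(f"Accuracy drop under noise: {gap:.3f}")

Similar simple probes can be written for the other patterns, for example comparing error rates across demographic subgroups as a first check on biased predictions.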
... Organizations that use effective methods are often more highly trusted (Gill et al., 2005; Mayer et al., 1995). Across the various AI tools currently in use, some work well (e.g., Gibney, 2016; Levy, 2009; Liao, 2020), whereas others fail quite completely (Knight, 2016; Yampolskiy, 2019). The relationship between performance and trust (i.e., in a brand or an organization) (Loureiro et al., 2018) is well established within the trust literature and often refers to an agent's ability to undertake and complete vital tasks (Mayer et al., 1995). ...
Article
Full-text available
The use of artificial intelligence (AI) in hiring entails vast ethical challenges. As such, studying this phenomenon through an ethical lens helps us better understand whether and how AI matters in hiring. In this paper, we examine whether ethical perceptions of using AI in the hiring process influence individuals' trust in the organizations that use it. Building on the organizational trust model and the unified theory of acceptance and use of technology, we explore whether ethical perceptions are shaped by individual differences in performance expectancy and social influence and how they, in turn, impact organizational trust. We collected primary data from over 300 individuals who were either active job seekers or who had recent hiring experience to capture perceptions across the full range of hiring methods. Our findings indicate that performance expectancy, but not social influence, impacts the ethical perceptions of AI in hiring, which in turn influence organizational trust. Additional analyses indicate that these findings vary depending on the type of hiring methods AI is used for, as well as on whether participants are job seekers or individuals with hiring experience. Our study offers theoretical and practical implications for ethics in HRM and informs policy implementation about when and how to use AI in hiring methods, especially as it pertains to acting ethically and trustworthily.
... Model checking is a well-established way to assure a safety-critical CPS, such as a self-driving car, by searching for violations of a safety property in the reachable states of a system model. Recent advances in perception based on machine learning (ML) have challenged model checking with unpredictable, difficult-to-model behaviors [3,12,30]. The uncertainties of ML-based perception and the environment where it is deployed call for probabilistic model checking (PMC) [13,20,28], which computes a probability that a property holds (e.g., the chance to avoid a collision). ...
Preprint
Full-text available
Autonomous systems with machine learning-based perception can exhibit unpredictable behaviors that are difficult to quantify, let alone verify. Such behaviors are convenient to capture in probabilistic models, but probabilistic model checking of such models is difficult to scale, largely due to the non-determinism added to models as a prerequisite for provable conservatism. Statistical model checking (SMC) has been proposed to address the scalability issue. However, it requires large amounts of data to account for the aforementioned non-determinism, which in turn limits its scalability. This work introduces a general technique for reducing non-determinism based on assumptions of "monotonic safety", which define a partial order between system states in terms of their probabilities of being safe. We exploit these assumptions to remove non-determinism from controller/plant models to drastically speed up probabilistic model checking and statistical model checking while providing provably conservative estimates as long as safety is indeed monotonic. Our experiments demonstrate model-checking speed-ups of an order of magnitude while maintaining acceptable accuracy, and they require much less data for accurate estimates when running SMC, even when monotonic safety does not perfectly hold and provable conservatism is not achieved.
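To make the statistical-model-checking idea concrete, the following toy Python sketch estimates the probability that a safety property holds by Monte Carlo simulation over a simple, assumed braking model (not the authors' benchmark). Under a monotonic-safety assumption, a checker could resolve any non-deterministic choice conservatively by always taking the state lower in the safety order, e.g., the shorter initial distance:

    import random

    def simulate_episode(rng, steps=50):
        """Toy stochastic plant: a vehicle approaches an obstacle 100 m away.

        Perception noise occasionally delays braking; the episode is safe
        if the vehicle stops before reaching the obstacle.
        """
        distance, speed = 100.0, 10.0
        for _ in range(steps):
            detected = rng.random() > 0.1          # noisy ML perception (90% detection)
            if detected and distance < 40.0:
                speed = max(0.0, speed - 2.0)      # brake
            distance -= speed
            if distance <= 0.0:
                return False                       # collision: unsafe
            if speed == 0.0:
                return True                        # stopped safely
        return True

    def estimate_safety_probability(n_samples=10000, seed=0):
        """Statistical model checking: Monte Carlo estimate of P(safe)."""
        rng = random.Random(seed)
        safe = sum(simulate_episode(rng) for _ in range(n_samples))
        return safe / n_samples

    print(f"Estimated P(safe) ~= {estimate_safety_probability():.3f}")

In this toy model the estimated safety probability is monotone in the initial distance to the obstacle, which is exactly the kind of partial order over states that a monotonic-safety assumption exploits to prune non-deterministic branches.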
... The current dissemination and popularization of the term "artificial intelligence" (AI) [3] has moved the debate beyond the axiomatic logic of the earliest texts on the term towards an ontological discussion. This gives us a "digital philosophy" that goes beyond a formalization of the thought process [4]. ...
Conference Paper
Full-text available
The amount of research and publications on the term "intelligence" has increased significantly across different research areas, and there is a clear lack of precision and convergence among the related concepts applied to systems and products in the construction industry. This work aims to develop an inductive qualitative analysis of the different characterizations of the term "intelligence" in computer science and of the term as applied to architecture and building technology. A comparison by logical argumentation was carried out. As a result, we present conceptual criteria that allow buildings to be qualified as intelligent.
... The number of accidents involving autonomous vehicles or other AI systems is steadily increasing (Steimers and Bömer 2019). Sadly, the first deadly accident of a self-driving car already became reality in 2016 (Yampolskiy 2019). The technology of autonomous vehicles is not yet mature, for instance in terms of pedestrian recognition (Turchin and Denkenberger 2020). ...
Conference Paper
Full-text available
In the upcoming years, huge benefits are expected from Artificial Intelligence (AI). However, there are also risks involved in the technology, such as accidents of autonomous vehicles or discrimination by AI-based recruitment systems. This study aims to investigate public perception of these risks, focusing on realistic risks of Narrow AI, i.e., the type of AI that is already in productive use today. Based on perceived risk theory, several risk scenarios are examined using data from an exploratory survey. This research shows that AI is perceived positively overall. The participants, however, evaluate AI critically when confronted with specific risk scenarios. Furthermore, a strong positive relationship between knowledge about AI and perceived risk could be shown. This study contributes to knowledge by advancing our understanding of consumers' awareness and evaluation of these risks and has important implications for product development, marketing and society.
Article
Artificial intelligence (AI) failures are increasingly common as more and more companies race to implement AI solutions. The implementation of AI and its inevitable malfunctions are an unprecedented type of crisis for corporate communication professionals. This study reviews (1) 23 instances of AI failures, (2) subsequent corporate communication, and (3) resultant media coverage to investigate the various strategies employed to deal with AI failures. We also identify if these strategies lead to positive or negative responses and/or mitigation of the crisis. Results show that several response strategies included in extant crisis response frameworks can be effective in dealing with AI crises, whereas other strategies tend to be unsuccessful. Our analysis also points to the emergence of a crisis communication strategy that takes advantage of the uncertainty surrounding the accountability of AI to mitigate the crisis.
Article
Full-text available
This report surveys the landscape of potential security threats from malicious uses of AI, and proposes ways to better forecast, prevent, and mitigate these threats. After analyzing the ways in which AI may influence the threat landscape in the digital, physical, and political domains, we make four high-level recommendations for AI researchers and other stakeholders. We also suggest several promising areas for further research that could expand the portfolio of defenses, or make attacks less effective or harder to execute. Finally, we discuss, but do not conclusively resolve, the long-term equilibrium of attackers and defenders.
Article
Full-text available
Despite significant developments in proof theory, surprisingly little attention has been devoted to the concept of proof verifiers. In particular, the mathematical community may be interested in studying different types of proof verifiers (people, programs, oracles, communities, superintelligences) as mathematical objects. Such an effort could reveal their properties, their powers and limitations (particularly in human mathematicians), minimum and maximum complexity, as well as self-verification and self-reference issues. We propose an initial classification system for verifiers and provide some rudimentary analysis of solved and open problems in this important domain. Our main contribution is a formal introduction of the notion of unverifiability, for which the paper could serve as a general citation in domains of theorem proving, as well as software and AI verification.
Article
Full-text available
Machine learning is a means to derive artificial intelligence by discovering patterns in existing data. Here we show that applying machine learning to ordinary human language results in human-like semantic biases. We replicate a spectrum of known biases, as measured by the Implicit Association Test, using a widely used, purely statistical machine-learning model trained on a standard corpus of text from the Web. Our results indicate that text corpora contain recoverable and accurate imprints of our historic biases, whether morally neutral as towards insects or flowers, problematic as towards race or gender, or even simply veridical, reflecting the status quo distribution of gender with respect to careers or first names. Our methods hold promise for identifying and addressing sources of bias in culture, including technology.
Article
Full-text available
Artificial intelligence and machine learning are in a period of astounding growth. However, there are concerns that these technologies may be used, either with or without intention, to perpetuate the prejudice and unfairness that unfortunately characterizes many human institutions. Here we show for the first time that human-like semantic biases result from the application of standard machine learning to ordinary language---the same sort of language humans are exposed to every day. We replicate a spectrum of standard human biases as exposed by the Implicit Association Test and other well-known psychological studies. We replicate these using a widely used, purely statistical machine-learning model---namely, the GloVe word embedding---trained on a corpus of text from the Web. Our results indicate that language itself contains recoverable and accurate imprints of our historic biases, whether these are morally neutral as towards insects or flowers, problematic as towards race or gender, or even simply veridical, reflecting the status quo for the distribution of gender with respect to careers or first names. These regularities are captured by machine learning along with the rest of semantics. In addition to our empirical findings concerning language, we also contribute new methods for evaluating bias in text, the Word Embedding Association Test (WEAT) and the Word Embedding Factual Association Test (WEFAT). Our results have implications not only for AI and machine learning, but also for the fields of psychology, sociology, and human ethics, since they raise the possibility that mere exposure to everyday language can account for the biases we replicate here.
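Since both abstracts above rely on the Word Embedding Association Test (WEAT), a minimal Python sketch of its effect-size statistic may be useful. The toy vectors below are placeholders rather than GloVe embeddings; the statistic itself follows the standard published definition (differential association of two target word sets with two attribute word sets, measured by cosine similarity):

    import numpy as np

    def cosine(u, v):
        """Cosine similarity between two word vectors."""
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def association(w, A, B):
        """s(w, A, B): mean similarity of w to attribute set A minus to set B."""
        return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

    def weat_effect_size(X, Y, A, B):
        """WEAT effect size: standardized differential association of target
        sets X and Y with attribute sets A and B; values near +/-2 indicate
        a strong association."""
        sX = [association(x, A, B) for x in X]
        sY = [association(y, A, B) for y in Y]
        return (np.mean(sX) - np.mean(sY)) / np.std(sX + sY, ddof=1)

    # Toy 3-dimensional "embeddings" (placeholders, not real GloVe vectors).
    rng = np.random.default_rng(0)
    flowers = [rng.normal(loc=[1, 0, 0], scale=0.1) for _ in range(4)]     # targets X
    insects = [rng.normal(loc=[0, 1, 0], scale=0.1) for _ in range(4)]     # targets Y
    pleasant = [rng.normal(loc=[1, 0, 0], scale=0.1) for _ in range(4)]    # attributes A
    unpleasant = [rng.normal(loc=[0, 1, 0], scale=0.1) for _ in range(4)]  # attributes B

    print(f"WEAT effect size: {weat_effect_size(flowers, insects, pleasant, unpleasant):.2f}")

With real embeddings, the word lists would be the published target and attribute sets (e.g., flower and insect names versus pleasant and unpleasant words), and significance would be assessed with a permutation test over partitions of the targets.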
Article
Full-text available
With unprecedented advances in genetic engineering, we are starting to see progressively more original examples of synthetic life. As such organisms become more common, it is desirable to be able to distinguish between natural and artificial life forms. In this paper, we present this challenge as a generalized version of Darwin's original problem, which he so brilliantly addressed in On the Origin of Species. After formalizing the problem of determining the origin of samples, we demonstrate that the problem is in fact unsolvable in the general case, if the computational resources of the considered originator algorithms have not been limited and the priors for such algorithms are known to be equal. Our results should be of interest to astrobiologists and scientists interested in producing a more complete theory of life, as well as to AI-Safety researchers.
Chapter
This chapter surveys eight research areas organized around one question: As learning systems become increasingly intelligent and autonomous, what design principles can best ensure that their behavior is aligned with the interests of the operators? The chapter focuses on two major technical obstacles to AI alignment: the challenge of specifying the right kind of objective functions and the challenge of designing AI systems that avoid unintended consequences and undesirable behavior even in cases where the objective function does not line up perfectly with the intentions of the designers. The questions surveyed include the following: How can we train reinforcement learners to take actions that are more amenable to meaningful assessment by intelligent overseers? What kinds of objective functions incentivize a system to “not have an overly large impact” or “not have many side effects”? The chapter discusses these questions, related work, and potential directions for future research, with the goal of highlighting relevant research topics in machine learning that appear tractable today.
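One family of proposals for the "limited impact" objectives discussed above penalizes deviation from a baseline state. The Python sketch below is a deliberately simplified illustration of that idea; the gridworld, the penalty weight lam, and the cell-difference measure are illustrative assumptions, not the chapter's specific formulation:

    import numpy as np

    def impact_penalized_reward(task_reward, state, baseline_state, lam=0.5):
        """Shaped reward R'(s) = R(s) - lam * d(s, s_baseline).

        Here d counts how many grid cells differ from the baseline, a crude
        stand-in for "side effects"; real proposals use more careful measures
        (e.g., reachability of states or attainable utility).
        """
        side_effects = np.sum(state != baseline_state)
        return task_reward - lam * side_effects

    # Toy 3x3 gridworld: the agent knocked an object out of place en route to the goal.
    baseline = np.array([[0, 0, 0],
                         [0, 1, 0],   # 1 marks an object in its original cell
                         [0, 0, 0]])
    after = np.array([[0, 0, 0],
                      [0, 0, 0],      # object displaced...
                      [0, 1, 0]])     # ...to a new cell (two cells differ)

    print(impact_penalized_reward(task_reward=10.0, state=after, baseline_state=baseline))
    # -> 9.0: the +10 task reward is reduced by 0.5 * 2 changed cells.

The choice of baseline and distance measure is exactly where such proposals differ, and a poorly chosen penalty can itself incentivize undesirable behavior, which is part of the research question the chapter raises.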
Conference Paper
This paper presents a simple, generic method for automating the play of Nintendo Entertainment System games.
Chapter
In this chapter, we discuss a host of technical problems that we think AI scientists could work on to ensure that the creation of smarter-than-human machine intelligence has a positive impact. Although such systems may be decades away, it is prudent to begin research early: the technical challenges involved in safety and reliability work appear formidable, and uniquely consequential. Our technical agenda discusses three broad categories of research where we think foundational research today could make it easier in the future to develop superintelligent systems that are reliably aligned with human interests: (1) highly reliable agent designs: how to ensure that we build the right system; (2) error tolerance: how to ensure that the inevitable flaws are manageable and correctable; and (3) value specification: how to ensure that the system is pursuing the right sorts of objectives. Since little is known about the design or implementation details of such systems, the research described in this chapter focuses on formal agent foundations for AI alignment research, that is, on developing the basic conceptual tools and theory that are most likely to be useful for engineering robustly beneficial systems in the future.
Chapter
Superintelligent systems are likely to present serious safety issues, since such entities would have great power to control the future according to their possibly misaligned goals or motivation systems. Oracle AIs (OAIs), confined AIs that can only answer questions and do not act in the world, represent one particular solution to this problem. However, even Oracles are not particularly safe: humans are still vulnerable to traps, social engineering, or simply becoming dependent on the OAI. But OAIs are still strictly safer than general AIs, and there are many extra layers of precautions we can add on top of these. This paper begins with the definition of the OAI Confinement Problem. After an analysis of existing solutions and their shortcomings, a protocol is proposed aimed at making a more secure confinement environment which might delay negative effects from a potentially unfriendly superintelligence while allowing for future research and development of superintelligent systems.