Is Artificial Intelligence Ready for Standardization?
Thomas Zielke
Düsseldorf University of Applied Sciences, Germany
Abstract. Many standards development organizations worldwide work
on norms for Artificial Intelligence (AI) technologies and AI related pro-
cesses. At the same time, many governments and companies massively
invest in research on AI. It may be asked if AI research has already
produced mature technologies and if this field is ready for standardiza-
tion. This article looks at today’s situation of AI in the context of needs
for standardization. The International Organization for Standardization
(ISO) runs a standardization project on AI since 2018. We give an up-
to-date overview of the status of this work. While a fully comprehensive
survey is not the objective, we describe a number of important aspects
of the standardization work in AI. In addition, concrete examples for
possible items of AI standards are described and discussed. From a sci-
entific point of view, there are many open research questions that make
AI standardization appear to be premature. However, our analysis shows
that there is a sound basis for starting to work on AI standardization as
being undertaken by ISO and other organizations.
Keywords: Artificial Intelligence · Standardization · ISO SC 42 · Trustworthiness · AI Robustness · Machine Learning
1 Introduction
The research field Artificial Intelligence (AI) can be traced back to the 1950s.
The famous article ”Computing Machinery and Intelligence” by Alan Tur-
ing [57], the Dartmouth Summer Research Project of 1956 [35], and the invention
of the Perceptron [40] are distinctive examples of inspirations for a new research
field that today seems omnipresent. Only in recent years can it be claimed that
AI as a technology fulfills many expectations that had already been stated more
than 60 years ago. However, the term artificial intelligence is still being contro-
versially debated and to date no common understanding exists as to methods
and technologies that make a system or a software solution intelligent [53].
AI as a research field does not have a history of steady progress. The field
regularly experienced ”AI winters”: stages when technology, business, and the
media get out of their warm and comfortable bubble, cool down, temper their
sci-fi speculations and unreasonable hypes, and come to terms with what AI can
or cannot really do as a technology [20]. Currently we observe the opposite of an AI winter
and this time, even the automotive industry seems to be convinced of the tech-
nological and commercial potentials of AI [26]. However, existing standards for
the regulation of functional safety, in particular ISO 26262, are not compatible
with typical AI methods, e.g. methods for machine learning [42]. This is no sur-
prise as a common characteristic of AI systems is dealing with uncertain data in
a context that bears uncertainty, while the results may be associated with some
degree of uncertainty too [23].
Despite its long history, its impressive progress in recent years, and many
existing real-world applications, AI still is an emerging technology. The gen-
eral economic benefits from international standards are well investigated [10].
In emerging technologies, the benefits of standardization are less obvious and
there is a risk of hindering innovation by inflexible and/or quickly outdated
standards. There are partially conflicting interests of the stakeholders. Startups
want to fully exploit their technological head start, rapidly create new prod-
ucts, and gain market shares. Established companies need new standards for
investment decisions, as best practice guidelines for their development depart-
ments, and as an orientation for hesitant customers. Researchers have their own
culture of defining the state of the art and sometimes regard early industry stan-
dards on their subject of research as a restriction of freedom. Policymakers ask
for respective technical standards when regulations become an issue. For the
standards development organizations, like ISO and DIN (German Institute for
Standardization) for example, initiatives for new standards are a core business
in accordance with their mission, also securing their financing.
In contrast to the scientific discourse, standardization seeks consensus. This
is actually a strong reason why the work on standards has a positive impact
on emerging technologies. Standards establish common vocabularies and agreed
definitions of terms. Standards also contribute to a more effective dissemination
of innovation and they increase the confidence of investors and customers [38].
2 Objectives and Context of this Research
This article looks at current work on the creation of international standards for
AI. In 2018, the International Organization for Standardization (ISO) and the
International Electrotechnical Commission (IEC) started a project on AI stan-
dardization by founding the subcommittee ISO/IEC JTC 1 / SC 42 Artificial
intelligence 1. The author is a founding member of the interdisciplinary DIN
Working Committee ”Artificial Intelligence” [15] which represents Germany in
the ISO/IEC JTC 1 / SC 42 . He is also an active member of several SC42 work-
ing groups.
Although the foundation of the SC 42 and many associated national commit-
tees seems to indicate that AI is ready for standardization, it can be argued that
past attempts at AI standardization were unsuccessful and that AI technology
still lacks the level of trust needed for widely agreed standards [30]. What is
different today, compared to the situation twenty-five years ago when the first
efforts to create ISO standards for AI were made (see e.g. [44])? It may also
be asked if a technology is ready for standardization at a stage of development
where massive investments in research are needed and actually being announced
by many governments. The European Union alone wants to spend €20 billion
per year by the end of 2020 [17].
Besides analyzing the situation with respect to questions like the ones given
above, this article provides firsthand information on the current international
activities on AI standardization. It is not the intention to provide a compre-
hensive overview of the standardization work of the SC 42 , nor would this be
possible within the scope of a conference paper. However, the general goals of
the main working groups are briefly described. In addition, concrete examples
are given for topics that are likely to be covered by future standards in AI.
One objective of this survey on AI standardization work is to prepare the
ground for answering the question in the title ”Is Artificial Intelligence Ready for
Standardization?”. There may be several different valid answers to this question
depending on the expectations for the standardization outcomes. Therefore, this
article also investigates some crucial technical issues that differentiate AI
standardization from other standardization efforts.
AI receives more public and political attention than most other technologies
because it is expected to have an impact on everyone’s life in the long run. Floridi
et al. [21] put it this way: AI is not another utility that needs to be regulated once
it is mature. It is a powerful force, a new form of smart agency, which is already
reshaping our lives, our interactions, and our environments.
This has consequences for the standardization work in AI. Even more than in
other areas of information and communication technology, the compatibility of
technology and the values of a democratic society has to be taken into account
[31], at least from a European perspective. This article focusses on the technical
aspects related to AI standardization. The reader should be aware that ethical
and societal concerns are an important part of the SC 42 work too.
3 ISO/IEC JTC 1 / SC 42 Artificial Intelligence
In November 2017, the Technical Management Board (TMB) of ISO decided that
the Joint Technical Committee ”Information Technology” (JTC 1) should found
a subcommittee (SC) on Artificial Intelligence. The inaugural plenary meeting
of the new SC 42 took place in Beijing, China, in April 2018. The scope of work
of SC 42 is ”Standardization in the area of Artificial Intelligence”, specifically:
– Serve as the focus and proponent for JTC 1’s standardization program on
Artificial Intelligence
– Provide guidance to JTC 1, IEC, and ISO committees developing Artificial
Intelligence applications
Originally JTC 1 recommended that SC 42 should cover the main topics foun-
dational standards, computational methods, trustworthiness, and societal con-
cerns. The structure of the SC 42 as of March 2020 is shown in Fig. 1. The main
[Fig. 1 diagram: working groups WG 1 to WG 5 (foundational standards; big data; trustworthiness; use cases and applications; computational approaches and computational characteristics of AI systems); a joint working group SC 42/40 on governance implications of AI; advisory groups on an AI management systems standard and on AI systems engineering; ad hoc working groups on dissemination and outreach, liaison with SC 38, and intelligent systems engineering.]
Fig. 1. Structure of the SC 42 as of March 2020. The illustration shows the main
working groups (WG). There are also a joint working group (JWG) with SC 40 ”IT
Service Management and IT Governance”, two advisory groups (AG), and three ad hoc
working groups (AHG). SC 38 is on ”Cloud Computing and Distributed Platforms”.
working groups are on foundational standards (WG 1), trustworthiness (WG 3),
use cases and applications (WG 4), computational approaches and computational
characteristics of AI systems (WG 5), and big data (WG 2), which used to be
covered by a separate working group under JTC 1 . Societal concerns has become
a subtopic of WG 3.
3.1 Foundational Standards
A basic objective of standardization is the definition of common terms. When
looking at terms relating to AI, the term artificial intelligence itself is a primary
subject of discussion. The Merriam-Webster dictionary offers these definitions:
1) a branch of computer science dealing with the simulation of intelligent
behavior in computers; 2) the capability of a machine to imitate intelligent
human behavior. From a technical point of view there are two problems with
definitions like these. Firstly, they do not explain what ”intelligent” is.
Secondly, they refer to capabilities of humans that are neither defined nor
objectively measurable.
A useful reflection on definitions of AI can be found in [53]. WG 1 attempts to
find a workable definition by consensus. Although the concrete wording of the AI
definition may not be highly crucial for the quality of the future SC 42 standards,
there is a definite need for an AI definition in industry.
Fig. 2. Relative monthly numbers of US patent applications that contain the term
”intelligent”, ”intelligence”, or ”artificial intelligence” respectively. The data analysed
cover all applications since 2001. The coloured curves show moving averages taken over
periods of 6 months. The grey curves show the monthly raw values.
In recent years a steep increase in the usage of the term artificial intelligence
can be observed, in the media, in research work, in marketing material, and in
industrial publications. As an example, we looked at the US patent applications
since 2001. All text is available from the US patent office. We counted all patent
applications that mention the terms ”intelligent”, ”intelligence”, or ”artificial
intelligence” respectively at least once. The graphs in Fig. 2 show the respective
monthly percentages of all patent applications in the period between March 2001
and March 2020. There is a remarkable exponential increase in recent years.
More than the wording of an AI definition, the description of the concepts,
methods, and best practices for AI are important. The chapter titles in the 1982
textbook ”Principles of Artificial Intelligence” by Nils J. Nilsson [36] contain the
following terms: production systems, search strategies, predicate calculus, reso-
lution refutation, plan-generation systems, structured object representation. As
far as these topics are still regarded as AI, they belong to a category called Sym-
bolic AI [52]. In symbolic AI, goals, beliefs, knowledge, and so on, as well as their
interrelationships, are all formalized as symbolic structures. The report ”Arti-
ficial Intelligence Concepts and Terminology” by WG 1 [45], mentions symbolic
AI briefly. It is referred to as ”classical AI”. Forty years ago, classical AI was AI
mainstream, focussing on very different methods and techniques than today’s
AI which is dominated by machine learning. Modern AI predominantly is Con-
nectionist AI, a term coined in the 1980s [18]. In the technical literature of the
last two decades it has not been used much anymore. Therefore Flasinski [19] is
probably right in categorizing connectionist AI under Computational Intelligence
(CI). He states the following common features of CI methods:
– numeric information is basic in a knowledge representation,
– knowledge processing is based mainly on numeric computation,
– usually knowledge is not represented in an explicit way.
Cognitive Computing (CC) is another relevant term here. It is sometimes used
interchangeably with AI and CI. CC provides an interesting example of possible
conflicts when defining a terminology in standardization. The term has been
taken over by IBM as an umbrella term for the marketing of all
their products and services that somehow use AI technologies [60]. Originally,
CC was meant as a notion for engineering complex intelligent systems that in-
corporate many features and technologies of AI [41]. The definition given by [45]
covers technologies that use natural language processing and machine learning
to enable people and machines to interact more naturally to extend and magnify
human expertise and cognition.
Machine learning (ML) and related topics are the current main focus of the
SC 42 standardization work, specifically of the work of WG1 on foundational
standards. We deal with ML in the following subsection. The overview report
[45] on foundational standards is structured by the following topics: functional
view of AI systems, applications of AI, AI ecosystems, AI concepts, and AI
systems categories.
Machine Learning (ML) Early AI was knowledge-based [27]. Today’s AI is
data-driven [61]. ML is the discipline that provides models and methods for the
transformation of data into task-specific knowledge. Most of the success of AI
in recent years can be attributed to ML. Face recognition is an example for a
prominent AI task that has a long research history. The best recognition rates on
a popular benchmark test went up from ca. 70% when using methods without ML
to more than 97% when deep learning was applied [59]. The report ”Framework
for Artificial Intelligence (AI) Systems Using Machine Learning (ML)” by WG 1
[46] intends to establish a framework for describing a generic AI system using ML
technology. The framework describes the system components and their functions
in the AI ecosystem. Under this scope, the report deals with the ML terminology,
subsystems, approaches and training data, pipeline, and the ML process.
As an example of the work on the definition and classification of ML ap-
proaches, concepts, and methods, we briefly look at the taxonomy for ML meth-
ods. Fig. 3 shows four different groups of ML methods. The main categories
are supervised and unsupervised, i.e. methods that need labelled data for train-
ing and methods that work with unlabelled data [6]. There are also hybrid or
semi-supervised methods [54]. Also not shown in the figure is reinforcement learn-
ing, a third major category of ML methods. In reinforcement learning there are
model-free and model-based methods, where model refers to a possible predictive
model of an unknown environment that a learning agent interacts with. Learn-
ing works through trial-and-error interactions [32] or by maximizing a numerical
Fig. 3. Taxonomy of ML methods inspired by [34] and matched with many other ML
resources. This may not be the taxonomy that WG1 eventually adopts. The categories
supervised and unsupervised are generally accepted. There is also the category of semi-
supervised methods. Not all of the established methods could be listed in the diagram.
reward signal in doing so [55]. The field of ML research has produced many more
methods than could be listed in the figure. The selection shown reflects the pop-
ularity and the distinctiveness of the respective methods. Fig. 3 also classifies the
methods according to their respective suitability for specific tasks: classification,
regression, clustering, dimension reduction. One may argue that anomaly detec-
tion is an important additional task category. However, it may also come under
classification or clustering. For all categories in Fig. 3 there are methods based
on artificial neural networks (ANN). ANNs, in particular deep neural networks
(DNNs), provide a generic architecture for the data-driven approach to AI.
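For illustration, the top-level split of the taxonomy — learning with labels versus without — can be sketched in a few lines of NumPy: a perceptron performs supervised classification using the labels, while k-means clusters the very same data without them. The toy data and the initialization are illustrative assumptions, not part of the WG 1 material.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two well-separated 2-D blobs; y holds the class labels -1 / +1.
X = np.vstack([rng.normal(-2.0, 0.5, (20, 2)), rng.normal(2.0, 0.5, (20, 2))])
y = np.array([-1.0] * 20 + [1.0] * 20)

# --- Supervised (classification): the perceptron rule uses the labels.
w, b = np.zeros(2), 0.0
for _ in range(20):
    for xi, yi in zip(X, y):
        if yi * (xi @ w + b) <= 0:       # misclassified -> update
            w, b = w + yi * xi, b + yi
acc = float(np.mean(np.sign(X @ w + b) == y))

# --- Unsupervised (clustering): k-means ignores y entirely.
centers = np.stack([X[0], X[-1]])        # simple deterministic init for the sketch
for _ in range(10):
    assign = ((X[:, None, :] - centers) ** 2).sum(-1).argmin(1)
    centers = np.stack([X[assign == k].mean(0) for k in range(2)])

print(acc)
```

On this separable toy data the perceptron reaches perfect training accuracy and k-means recovers the two blobs — without ever seeing a label.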
3.2 Working Groups 2 – 5
Big Data A few years ago, JTC 1 established a program of work on “big data”
through its working group WG 9. This work has been transferred to SC 42 and
assigned to WG 2. Due to the history of big data within JTC 1 , WG 2 is the
only working group of SC 42 that has already published ISO standards, e.g. [4].
Trustworthiness WG 3 on trustworthiness has the following main tasks: a) in-
vestigate approaches to establish trust in AI systems through transparency, ver-
ifiability, explainability, controllability b) investigate engineering pitfalls and as-
sess typical associated threats and risks to AI systems with their mitigation
techniques and methods c) investigate approaches to achieve AI systems’ ro-
bustness, resiliency, reliability, accuracy, safety, security, privacy.
Trustworthiness may be defined as the degree to which a user or other stake-
holder has confidence that a product or system will behave as intended. From
a perspective that not only considers technical aspects, trustworthiness can be
described following [50]:
– Ability is the capability of the AI system to perform a specific task (robustness,
safety, reliability, etc.).
– Integrity may be viewed as the assurance that information will not be ma-
nipulated in a malicious way by the AI system (completeness, accuracy,
certainty, consistency, etc.).
– Benevolence is the extent to which the AI system is believed to do good, or
in other terms, to what extent the “Do No Harm” principle is respected.
The first publications of WG 3 are the reports ”Overview of trustworthiness
in artificial intelligence” [50], ”Assessment of the robustness of neural networks”
[47], ”Bias in AI systems and AI aided decision making” [48], and ”Overview
of ethical and societal concerns” [49]. Robustness is a topic of particular high
concern. Section 4 deals with that in more detail.
Use Cases and Applications WG 4 has the following main tasks: a) identify
different AI application domains and the different context of their use b) de-
scribe applications and use cases using the terminology and concepts defined in
ISO/IEC AWI 22989 and ISO/IEC AWI 23053 and extend the terms as neces-
sary c) collect and identify societal concerns related to the collected use cases.
The first publication of WG 4 is the report ”Use cases and applications” [51].
Computational Approaches and Computational Characteristics of AI
Systems The initial task of WG 5 has been to develop a technical report with
the title ”Overview of computational approaches for AI systems”. Its scope is the
state of the art of computational approaches for AI systems, describing: a) main
computational characteristics of AI systems b) main algorithms and approaches
used in AI systems, referencing use cases contained in the report of WG 4 [51].
4 Robustness of AI
The integration of AI components into products and industrial processes is cur-
rently limited to industries that do not have requirements for rigorous software
verification. Software verification is an essential part of many industrial pro-
cesses. The objective is to ensure both safety and performance of the software
in all parts of the system. In some domains, the software verification process is
also an important part of system certification, e.g. ISO 26262 in the automotive
industry [3]. While many methods exist for validating non-AI systems, they are
mostly not directly applicable to AI systems, and to neural networks in particular.
The problem is widely referred to as robustness of AI. Robustness is used as
a general term for describing properties that are required for the acceptance of
new high-stakes AI applications [14]. Many recent publications on this problem
deal with so-called adversarial examples that cause malfunctions of a deep neural
network model although the respective input patterns are very similar to valid
data [7]. In practice, robustness has to be defined in a goal-oriented way in the
context of the respective application domain. Typical examples of robustness goals in
machine learning applications are:
– Adherence to certain thresholds on a set of statistical metrics that need to
hold on the validation data.
– Invariance of the functional performance w.r.t. certain types of data
perturbations [28].
– Invariance of the functional performance w.r.t. systematic variations in the
input data, e.g. measurement drifts [22] or operating conditions [16].
– Stability of training outcomes under small variations of the training data
and with different training runs under stochastic influences [8].
– Consistency of the model output for similar input data (resistance to
adversarial examples) [7].
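The invariance goals above translate directly into empirical tests: perturb the validation inputs and check that the performance drop stays within a tolerance. The following sketch applies such a check to a toy linear classifier; the model, the data, and the two-point tolerance are illustrative assumptions, not prescriptions from any standard.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "model under test": a fixed linear classifier on 2-D inputs.
w, b = np.array([1.0, 1.0]), 0.0
predict = lambda X: np.sign(X @ w + b)

# Validation set: two well-separated classes with labels -1 / +1.
X = np.vstack([rng.normal(-3.0, 0.5, (100, 2)), rng.normal(3.0, 0.5, (100, 2))])
y = np.array([-1.0] * 100 + [1.0] * 100)

def accuracy_under_noise(sigma, trials=20):
    """Mean accuracy over repeated Gaussian perturbations of scale sigma."""
    return float(np.mean([np.mean(predict(X + rng.normal(0.0, sigma, X.shape)) == y)
                          for _ in range(trials)]))

clean_acc = float(np.mean(predict(X) == y))
noisy_acc = accuracy_under_noise(sigma=0.5)

# Example robustness goal: accuracy must not drop by more than 0.02.
meets_goal = (clean_acc - noisy_acc) <= 0.02
print(clean_acc, noisy_acc, meets_goal)
```

The same pattern extends to the other goals in the list, e.g. substituting systematic drifts for random noise, or repeating the training itself under resampled data to probe stability.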
Traditional machine learning models with few parameters (shallow models)
are better suited to meet robustness goals than complex (deep) models. But
DNNs are heavily contributing to the success of AI. Deep models can be effec-
tively trained while yielding superior generalization [9]. The complexity of deep
models poses a risk in terms of robustness. Standardization is a way to manage
that risk and to enable industry to use deep models without compromising on
safety or other aspects related to robustness. The report ”Assessment of the Ro-
bustness of Neural Networks” by WG 3 [47] suggests that the state-of-the-art in
statistical and empirical methods for the assessment of robustness is sufficient for
the development of standards. There are also formal methods for the assessment
of robustness which potentially are most suitable for safety critical systems.
Neural network architectures, in particular DNN, represent a specific chal-
lenge as they are both hard to explain and sometimes have unexpected behavior
due to their nonlinear nature and the large number of model parameters. For this
reason, formal methods for the verification of neural networks do not play a significant
role in practice yet, as stated by Guidotti [25]: In spite of the extensive research
done on NNs verification the state-of-the-art methods and tools are still far from
being able to successfully verify the corresponding state-of-the-art NNs. Because
of the potential importance of formal methods for robustness verification, WG 3
will work on this topic with a dedicated project in the future.
4.1 Example for an Empirical Approach to Testing: Field Trials
Many aspects have to be studied for the establishment of trust in AI systems, but
the number of feasible approaches for analyzing a black box system’s behavior
and performance are limited. AI systems typically consist of software to a large
extent. Much of it is not a black box and therefore software testing standards can
be applied. ISO 29119 [2] describes the primary goals of software tests: Provide
information about the quality of the test item and any residual risk in relation to
how much the test item has been tested; to find defects in the test item prior to
its release for use; and to mitigate the risks to the stakeholders of poor product
quality. These goals are very difficult to achieve for all parts of a typical AI
system, and only recently has the testing of AI systems itself become a subject
of research [37]. While AI is being considered as a tool for Software Process
Improvement (SPI), e.g. as a support tool for software test management [39],
new approaches will have to be developed for testing AI software itself, and
software testers need new tools and practices for validating that an AI software
complies with the requirements.
Defects and poor product quality are concerns when testing AI systems as
much as with conventional systems. However, the failure of an AI system in
a functional test may not be related to a ”software bug” or an erroneous design;
AI systems showing occasional malfunctions may still be regarded as useful for
their intended purpose; and the efficacy of an AI system may not be measurable
by conventional approaches to software testing. Another fundamental difference
between many AI systems and conventional systems is that the latter are de-
veloped, produced, and quality-controlled to strictly meet certain specifications.
AI systems, in contrast, may reveal their degree of efficacy only during deployment,
as is the case with systems like Amazon’s Alexa and Apple’s HomePod,
for example. This often applies to AI systems that operate in interaction with,
or in dependence on, natural environments and humans.
How to deal with the uncertainty of a product’s efficacy and the risks of its
deployment are subjects of many regulations in the medical domain. Medical
AI systems have to comply with DIN/EN/ISO 14155 [1]. They have to undergo
”clinical investigations”, a procedure that resembles ”clinical trials” [43]. For
non-medical AI systems, field trials have long been a recognized means of
comparing and proving the performance of solutions. Some prominent examples
are: facial recognition trials [11], tests of decision support systems for agricultural
applications [12], practice for testing driverless cars [58], and tests of speech and
voice recognition systems [33]. Field trials for AI systems greatly vary w.r.t.
methodology, number of users or use samples involved, status of the responsible
organization/persons, and documentation of the results.
A good practice guideline for field trials is a concrete example of a possible
international standard in AI. In analogy to clinical investigations of medical
devices, a standard on field trials could specify general requirements intended to
– protect the rights, safety and well-being of human participants,
– ensure the scientific conduct of the field trial and the credibility of the
investigation results,
– define the responsibilities of the sponsor and principal investigator, and
– assist sponsors, investigators, regulatory authorities and other bodies
involved in the conformity assessment of AI systems.
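The statistical side of such a conformity assessment could, for instance, rest on an exact binomial test: the field trial is passed if the observed number of failures would be improbably small under the maximum tolerated failure rate. The acceptance rule and all numbers below are hypothetical illustrations, not requirements from any existing standard.

```python
from math import comb

def binom_cdf(n, k, p):
    """P[X <= k] for X ~ Binomial(n, p) -- exact, no approximation."""
    return sum(comb(n, i) * p**i * (1.0 - p)**(n - i) for i in range(k + 1))

def field_trial_accepts(n_cases, n_failures, max_fail_rate, alpha=0.05):
    """Accept if a failure rate >= max_fail_rate can be rejected at level
    alpha, i.e. if observing this few failures would be unlikely under
    that failure rate."""
    return binom_cdf(n_cases, n_failures, max_fail_rate) <= alpha

# 2 failures in 500 trial cases, tolerated failure rate 2 %: accept.
print(field_trial_accepts(500, 2, 0.02))   # True
# 5 failures in only 100 cases cannot demonstrate a rate below 2 %: reject.
print(field_trial_accepts(100, 5, 0.02))   # False
```

A guideline of this kind would also have to fix the trial size in advance, since the same failure count carries very different evidential weight for different numbers of cases, as the two calls show.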
5 Discussion
The results of this work can be structured according to the four basic functions
of technology standards described by Tassey [56]: quality/reliability, information,
compatibility/interoperability, and variety reduction.
The current work on AI standards mainly addresses quality/reliability
and information. The WG 1 report [45] shows that there is a rich set of terms
and information. The WG 1 report [45] shows that there is a rich set of terms
and definitions that are specific for AI technologies and applications. Section
3.1 describes examples for definitions and taxonomy. The WG 3 report [47] is
a promising basis for the development of standards for measuring the quality
and reliability of AI systems. WG3 also has a project on AI risk management,
aiming at a standard that could pave the way for certification processes. Section
4 describes concrete examples for dealing with quality and reliability in AI.
In terms of the first two basic functions of technology standards in Tassey’s
list, it is justified to say that AI is ready for standardization.
Compatibility/interoperability has not been much of a focus of AI
standardization yet. Progress in this area is mainly driven by the open source
community. The ONNX initiative (Open Neural Network Exchange), for ex-
ample, tries to create an open standard for machine learning interoperability.
NNEF (Neural Network Exchange Format) is another example. However, these
exchange formats do not yet address features such as scalable and incremental
updates and compression. There is a standard under development on compres-
sion of neural networks for multimedia content description and analysis [5]. The
responsible SC 29 has a liaison with SC 42 for a joint project on neural network compression.
The vitality of AI as a field of research indicates that the fourth function of
technology standards, namely variety reduction, may not be a realistic goal
in the foreseeable future. However, it is important for industry that developers
get some guidance on the choice of models, methods, and algorithms in AI.
As has been shown in this article, certain concepts, definitions, methods, and
procedures of AI are ready for standardization. For several essential topics, the
scientific basis is not sufficiently solid yet. The following examples for research
needs can be given:
– Formal methods for the verification of deep neural networks or for the
assessment of their robustness [29].
– Architectures and training methods for robust solutions based on deep neural
networks [8].
– Methods and tools for generating comprehensible explanations for AI-based
decision processes [24].
For many experts, the work on standards for AI is not only about the four
functions or objectives discussed above. AI technologies have the potential of
reshaping our lives, our interactions, and our environments [21]. There is the
expectation that international AI standards also address ethical and societal
issues. The way this can be done is limited by the nature of international techni-
cal standards: Any bias toward value-sets that are specific for certain cultures or
countries has to be avoided. However, there is an official ISO document named
”Guidance on social responsibility” [49] which is not intended to be interpreted
as an “international standard”, “guideline” or “recommendation”. SC 42 WG 3
is going to publish a document on ”ethical and societal concerns” [49].
6 Conclusion
This article describes results and related background information from research
into the current state of the international standardization of artificial intelligence.
It is an up-to-date overview of the current work at the International
Organization for Standardization (ISO) on the development of standards for AI.
As examples, several important topics of AI standardization are elaborated on
in detail, e.g., the definition of terms, the taxonomy of machine learning, and
the assessment of the robustness of AI systems. Observing an exponential increase
in the usage of the term artificial intelligence in the patent literature, it can
be concluded that the market for technical solutions based on AI is no longer
a niche. Consequently, technical standards for AI are needed. Given the long
development time needed for ISO standards, it seems acceptable that certain
important topics, e.g. assessment of trustworthiness of AI, still have a shallow
scientific basis. The motivating question for this article, “Is Artificial Intelligence
Ready for Standardization?”, may best be answered by “AI standardization is
ready for takeoff”.
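The patent-literature observation behind this conclusion rests on a simple kind of analysis: count occurrences of a term per year in bulk full-text data and check whether the counts grow roughly exponentially, i.e., log-linearly. The sketch below shows the principle on invented toy data; the document list, counts, and growth factor are illustrative only and do not reproduce the actual analysis of the US patent office bulk data mentioned in the acknowledgements.

```python
import numpy as np

# Illustrative stand-in for bulk patent text: (year, abstract) pairs.
# A real analysis would stream the full-text archives instead.
documents = [
    (2010, "a method for engine control"),
    (2012, "an artificial intelligence based scheduler"),
    (2014, "artificial intelligence for image recognition"),
    (2014, "a neural device"),
    (2016, "artificial intelligence model training"),
    (2016, "artificial intelligence inference accelerator"),
    (2018, "artificial intelligence assisted diagnosis"),
    (2018, "artificial intelligence chip design"),
    (2018, "an artificial intelligence pipeline"),
    (2018, "artificial intelligence quality control"),
]

def term_counts(docs, term="artificial intelligence"):
    """Count occurrences of the term per year, case-insensitively."""
    counts = {}
    for year, text in docs:
        counts[year] = counts.get(year, 0) + text.lower().count(term)
    return dict(sorted(counts.items()))

counts = term_counts(documents)
years = np.array(list(counts))
hits = np.array(list(counts.values()), dtype=float)

# Log-linear fit: if growth is exponential, log(count) is linear in year.
slope, _ = np.polyfit(years[hits > 0], np.log(hits[hits > 0]), 1)
print(counts, f"estimated annual growth factor: {np.exp(slope):.2f}")
```

A growth factor above 1.0 from such a fit is the kind of evidence that the term's usage is expanding rather than stagnating.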
Acknowledgements

We would like to thank Dominic Kalkbrenner and Jens Lippel for programming
the data analytics on the bulk data from the US patent office. Dr. Andreas Riel
provided valuable feedback and discussions on the structure of the paper and
the relevance for the automotive industry.
References

1. ISO 14155:2011, clinical investigation of medical devices for human subjects - good
clinical practice (2011)
2. ISO/IEC/IEEE 29119-1:2013, software and systems engineering - software testing
- part 1:concepts and definitions (2013)
3. ISO 26262-1:2018, road vehicles - functional safety - part 1: Vocabulary (2018)
4. ISO/IEC 20546:2019, information technology - big data - overview and vocabulary
5. ISO/IEC WD 15938-17, multimedia content description interface - part 17: Com-
pression of neural networks for multimedia content description and analysis (2020)
6. Ang, J.C., Mirzal, A., Haron, H., Hamed, H.N.A.: Supervised, unsupervised, and
semi-supervised feature selection: A review on gene selection. IEEE/ACM Trans-
actions on Computational Biology and Bioinformatics 13(5), 971–989 (sep 2016)
7. Bastani, O., Ioannou, Y., Lampropoulos, L., Vytiniotis, D., Nori, A.V., Criminisi,
A.: Measuring neural net robustness with constraints. In: Proc. 30th Intern. Conf.
on Neural Inform. Process. Systems. pp. 2621 – 2629. Curran Associates Inc. (2016)
8. Becker, M., Lippel, J., Stuhlsatz, A., Zielke, T.: Robust dimensionality reduction
for data visualization with deep neural networks. Graphical Models 108, 101060
(mar 2020).
9. Belkin, M., Hsu, D., Ma, S., Mandal, S.: Reconciling modern machine-learning
practice and the classical bias–variance trade-off. Proceedings of the National
Academy of Sciences 116(32), 15849–15854 (jul 2019)
10. Blind, K., Jungmittag, A., Mangelsdorf, A.: The economic benefits of standardis-
ation. An update of the study carried out by DIN in 2000. DIN Berlin (01 2012)
11. BSI: An investigation into the performance of facial recognition systems relative
to their planned use in photo identification documents - BioP I. Tech. rep.,
Bundesamt für Sicherheit in der Informationstechnik (BSI), Bundeskriminalamt
(BKA), secunet AG (Apr 2004)
12. Burke, J., Dunne, B.: Field testing of six decision support systems for scheduling
fungicide applications to control Mycosphaerella graminicola on winter wheat crops
in Ireland. The Journal of Agricultural Science 146(04) (jan 2008)
13. Cunningham, S., Gambo, J., Lawless, A., Moore, D., Yilmaz, M., Clarke, P.M.,
O’Connor, R.V.: Software testing: A changing career. In: Communications in Com-
puter and Information Science, pp. 731–742. Springer International Publishing (2019)
14. Dietterich, T.G.: Steps toward robust artificial intelligence. AI Magazine 38(3),
3–24 (2017)
15. DIN: Interdisciplinary DIN working committee “artificial intelligence” (2018)
16. Duthon, P., Bernardin, F., Chausse, F., Colomb, M.: Benchmark for the robustness
of image features in rainy conditions. Machine Vision and Applications 29(5), 915–
927 (jun 2018).
17. EU: Funding for AI, research-area/industrial-research-and-innovation/key-enabling-technologies/artificial-intelligence-ai_en
18. Feldman, J.A., Ballard, D.H.: Connectionist models and their properties. Cognitive
Science 6(3), 205–254 (1982)
19. Flasinski, M.: Introduction to Artificial Intelligence. Springer (2016)
20. Floridi, L.: AI and its new winter: from myths to realities. Philosophy & Technology
33(1), 1–3 (feb 2020).
21. Floridi, L., Cowls, J., Beltrametti, M., Chatila, R., Chazerand, P., Dignum, V.,
Luetge, C., Madelin, R., Pagallo, U., Rossi, F., Schafer, B., Valcke, P., Vayena,
E.: AI4people—an ethical framework for a good AI society: Opportunities, risks,
principles, and recommendations. Minds and Machines 28(4), 689–707 (nov 2018)
22. Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., Bouchachia, A.: A survey on
concept drift adaptation. ACM Computing Surveys 46(4), 1–37 (mar 2014)
23. Ghahramani, Z.: Probabilistic machine learning and artificial intelligence. Nature
521(7553), 452–459 (may 2015).
24. Goebel, R., Chander, A., Holzinger, K., Lecue, F., Akata, Z., Stumpf, S., Kieseberg,
P., Holzinger, A.: Explainable AI: The new 42? In: Lecture Notes in Computer
Science, pp. 295–303. Springer International Publishing (2018)
25. Guidotti, D.: Enhancing neural networks through formal verification. In: Alviano,
M., Greco, G., Maratea, M., Scarcello, F. (eds.) Discussion and Doctoral Con-
sortium papers of AI*IA 2019 - 18th Intern. Conf. of the Italian Association for
Artificial Intelligence, Rende, Italy, November 19-22, 2019. CEUR Workshop
Proceedings, vol. 2495, pp. 107–112 (2019)
26. Hatani, F.: Artificial intelligence in Japan: Policy, prospects, and obstacles in the
automotive industry (2020)
27. Hayes-Roth, F., Jacobstein, N.: The state of knowledge-based systems. Communi-
cations of the ACM 37(3), 26–39 (mar 1994)
28. Hendrycks, D., Dietterich, T.G.: Benchmarking neural network robustness to com-
mon corruptions and perturbations. In: 7th International Conference on Learning
Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net (2019)
29. Huang, X., Kroening, D., Ruan, W., Sharp, J., Sun, Y., Thamo, E., Wu, M., Yi,
X.: A survey of safety and trustworthiness of deep neural networks. arXiv preprint
arXiv:1812.08342 (2018)
30. Hurlburt, G.: How much to trust artificial intelligence? IT Professional 19(4), 7–11 (2017)
31. Iversen, E.J., Vedel, T., Werle, R.: Standardization and the democratic design
of information and communication technology. Knowledge, Technology & Policy
17(2), 104–126 (jun 2004).
32. Kaelbling, L.P., Littman, M.L., Moore, A.W.: Reinforcement learning: A survey.
Journal of Artificial Intelligence Research 4, 237–285 (may 1996)
33. Lamel, L., Gauvain, J., Bennacef, S., Devillers, L., Foukia, S., Gangolf, J., Ros-
set, S.: Field trials of a telephone service for rail travel information. In: Proc. of
IVTTA ’96. Workshop on Interactive Voice Technology for Telecommunications
Applications. pp. 111–116. IEEE (1996)
34. Louridas, P., Ebert, C.: Machine learning. IEEE Software 33(5), 110–115 (2016)
35. Moor, J.: The Dartmouth College artificial intelligence conference: The next fifty
years. AI Magazine 27(4), 87–87 (2006)
36. Nilsson, N.J.: Principles of Artificial Intelligence. Symbolic computation, Springer-
Verlag Berlin Heidelberg New York (1982)
37. Numan, G.: Testing artificial intelligence. In: Goericke, S. (ed.) The Future of
Software Quality Assurance, pp. 123–136. Springer International Publishing (2020)
38. O’Sullivan, E., Br´evignon-Dodin, L.: Role of standardisation in support of emerging
technologies. Tech. rep., Instit. for Manufacturing, Univ. of Cambridge (Jun 2012)
39. Poth, A., Beck, Q., Riel, A.: Artificial intelligence helps making quality assurance
processes leaner. In: Communications in Computer and Information Science, pp.
722–730. Springer International Publishing (2019)
40. Rosenblatt, F.: The perceptron: A probabilistic model for information storage and
organization in the brain. Psychological Review pp. 65–386 (1958)
41. Rozenblit, J.W.: Cognitive computing: Principles, architectures, and applications.
In: Proc. 19th European Conf. on Modelling and Simulation (ECMS) (2005)
42. Salay, R., Queiroz, R., Czarnecki, K.: An analysis of ISO 26262: Using machine
learning safely in automotive software. CoRR abs/1709.02435 (2017)
Is Artificial Intelligence Ready for Standardization? 15
43. Santos, I.C., Gazelle, G.S., Rocha, L.A., Tavares, J.M.R.: Medical device speci-
ficities: opportunities for a dedicated product development methodology. Expert
Review of Medical Devices 9(3), 299–311 (may 2012)
44. SC1: ISO/IEC 2382-31:1997(en) information technology - vocabulary - part 31:
Artificial intelligence - machine learning (1997)
45. SC42 WG1: Artificial intelligence concepts and terminology. Tech. Rep. CD 22989,
ISO/IEC JTC 1/SC 42 Artificial Intelligence (2019)
46. SC42 WG1: Framework for artificial intelligence (AI) systems using machine learn-
ing (ML). Tech. Rep. CD 23053, ISO/IEC JTC 1/SC 42 Artificial Intelligence
47. SC42 WG3: Assessment of the robustness of neural networks - part 1: Overview.
Tech. Rep. CD TR 24029-1, ISO/IEC JTC 1/SC 42 Artificial Intelligence (2019)
48. SC42 WG3: Bias in AI systems and AI aided decision making. Tech. Rep. AWI TR
24027, ISO/IEC JTC 1/SC 42 Artificial Intelligence (2020)
49. SC42 WG3: Overview of ethical and societal concerns. Tech. Rep. AWI TR 24368,
ISO/IEC JTC 1/SC 42 Artificial Intelligence (2020)
50. SC42 WG3: Overview of trustworthiness in artificial intelligence. Tech. Rep. PRF
TR 24028, ISO/IEC JTC 1/SC 42 Artificial Intelligence (2020)
51. SC42 WG4: Use cases and applications. Tech. Rep. CD TR 24030, ISO/IEC JTC
1/SC 42 Artificial Intelligence (2019)
52. Smolensky, P.: Connectionist AI, symbolic AI, and the brain. Artif. Intell. Rev. 1(2),
95–109 (1987)
53. Stone, P., Brooks, R., Brynjolfsson, E., Calo, R., Etzioni, O., Hager, G., Hirschberg,
J., Kalyanakrishnan, S., Kamar, E., Kraus, S., et al.: Artificial intelligence and life
in 2030. Tech. rep., Stanford University (Sep 2016)
54. Stuhlsatz, A., Lippel, J., Zielke, T.: Feature extraction with deep neural networks
by a generalized discriminant analysis. IEEE Transactions on Neural Networks and
Learning Systems 23(4), 596–608 (apr 2012)
55. Sutton, R.S., Barto, A.G.: Reinforcement Learning, An Introduction. The MIT
Press, 2nd edn. (2018)
56. Tassey, G.: Standardization in technology-based markets. Research Policy 29(4-5),
587–602 (apr 2000).
57. Turing, A.M.: Computing machinery and intelligence. Mind LIX(236), 433–460
(oct 1950).
58. UK-Government: The pathway to driverless cars: A code of practice for
testing. Tech. rep., Department for Transport, Great Minster House, 33
Horseferry Road, London (2015)
59. Wang, M., Deng, W.: Deep face recognition: A survey. CoRR abs/1804.06655 (2018)
60. Weiss, H.: IBM verwettet seine Zukunft auf Cognitive Computing [IBM bets its future on cognitive computing]. Computerwoche (Oct 2015), ibm-verwettet-seine-zukunft-auf-cognitive-computing,3218187
61. Yu, B., Kumbier, K.: Artificial intelligence and statistics. Frontiers of Information
Technology & Electronic Engineering 19(1), 6–9 (jan 2018)