From Active Learning to Dedicated Collaborative Interactive Learning

Please quote as: Calma, A.; Leimeister, J. M.; Lukowicz, P.; Oeste-Reiß, S.;
Reitmaier, T.; Schmidt, A.; Sick, B.; Stumme, G. & Zweig, K. A. (2016): From Active
Learning to Dedicated Collaborative Interactive Learning. In: 4th International
Workshop on Self-Optimisation in Autonomic and Organic Computing Systems
(SAOS), Berlin.
Adrian Calma, Jan Marco Leimeister, Paul Lukowicz, Sarah Oeste-Reiß, Tobias Reitmaier,
Albrecht Schmidt, Bernhard Sick, Gerd Stumme, and Katharina Anna Zweig
Intelligent Embedded Systems, University of Kassel, Germany, Email: {adrian.calma|tobias.reitmaier|bsick}@uni-kassel.de
Information Systems, University of Kassel, Germany, Email: {leimeister|oeste-reiss}@uni-kassel.de
Embedded Intelligence, DFKI, Kaiserslautern, Germany, Email: paul.lukowicz@dfki.de
Human-Computer Interaction, University of Stuttgart, Germany, Email: albrecht.schmidt@vis.uni-stuttgart.de
Knowledge and Data Engineering, University of Kassel, Germany, Email: stumme@cs.uni-kassel.de
University of Kaiserslautern, Germany, Email: zweig@cs.uni-kl.de
Abstract—Active learning (AL) is a machine learning paradigm where
an active learner has to train a model (e.g., a classifier) which is in
principle trained in a supervised way. AL has to be done by means of
a data set where a low fraction of samples (also termed data points or
observations) are labeled. To obtain labels for the unlabeled samples,
the active learner has to ask an oracle (e.g., a human expert) for labels.
In most cases, the goal is to maximize some metric assessing the
task performance (e.g., the classification accuracy) and to minimize the
number of queries at the same time. In this article, we first briefly discuss
the state-of-the-art in the field of AL. Then, we propose the concept of
dedicated collaborative interactive learning (D-CIL) and describe some
research challenges. With D-CIL, we will overcome many of the harsh
limitations of current AL. In particular, we envision scenarios where the
expert may be wrong for various reasons. There also might be several or
even many experts with different expertise who collaborate, the experts
may label not only samples but also supply knowledge at a higher level
such as rules, and we consider that the labeling costs depend on many
conditions. Moreover, human experts may even profit by improving their
own knowledge when they get feedback from the active learner.
1 INTRODUCTION
Machine learning is based on sample data. Sometimes, these
data are labeled and, thus, models to solve a certain problem
(e.g., a classification or regression problem) can be built
using targets assigned to input data of the model. In other
cases, data are unlabeled (e.g., for clustering problems)
or only partially labeled. Correspondingly, we distinguish
the areas of supervised, unsupervised, and semi-supervised
learning. In many application areas (e.g., industrial quality
monitoring processes, intrusion detection in computer networks,
speech recognition, or drug discovery) it is rather easy to collect
unlabeled data, but quite difficult, time-consuming, or expensive
to gather the corresponding targets. That is, labeling is in
principle possible, but the costs may be enormous.
This article focuses on a substantial advancement of
active learning (AL), a machine learning paradigm which is
related to semi-supervised learning.
AL starts with an initially unlabeled or very sparsely
labeled set of samples and iteratively increases the la-
beled fraction of the training data set by “asking the right
questions”. These questions are answered by humans (e.g.,
experts in an application domain), by simulation systems,
by means of real experiments, etc., often modeled by an
abstract “oracle”. Basically, the “idealized” goal of AL is to
obtain a model (e.g., a classifier or a regression model) with
(almost) the performance of a model trained with a fully
labeled data set at (almost) the cost of an unlabeled data set.
In the following, the framework consisting of a knowledge
model with machine learning techniques, pools of unlabeled
and (when available) labeled data, and a unit that selects
unlabeled samples for queries and controls the training of
the model will be referred to as active learner.
Often, the following assumptions are made in AL:
• The labeling process starts with an initially labeled set of
  samples and assumes well-defined learning tasks (e.g., the number
  of classes is given in advance).
• The oracle labels single samples or sets of samples (called
  queries depending on the AL type, see Section 2) presented by an
  active learner.
• The oracle is omniscient and omnipresent, i.e., it always delivers
  the correct answers and it is always available.
• The labeling costs for all samples are identical.
These assumptions impose severe limitations for many
applications. For this reason, the following key challenges
regarding an extension of AL can be identified:
Challenge 1: An expert may be (more or less) wrong
for various reasons, e.g., depending on her/his experience
in the application domain (we still assume we have no
malicious or deceptive experts that cheat or attack the active
learner).
Challenge 2: There might be several or even many ex-
perts with different expertise (e.g., different degree or kind
of experience) who may collaborate to provide the active
learner with labels.
Challenge 3: The experts may label not only samples but
also other kinds of queries to provide knowledge at a higher
level (e.g., by assigning a conclusion to a presented premise
of a rule).
Challenge 4: The labeling costs depend on many con-
ditions, e.g., whether samples or rules are labeled, on the
location of samples in the input space of a model (i.e., mak-
ing labeling more or less difficult), the degree of expertise of
a human, etc.
Challenge 5: The experts want to benefit from the active
learner by receiving feedback in order to improve their own
knowledge.
Challenge 6: The learning task may require a “lifelong”
learning of the system (e.g., if the process or environment
from which the measured data originate is time-variant).
Moreover, there may be several tasks that have to be
fulfilled at the same time (e.g., movies that are assessed
regarding several criteria) and different kinds of information
sources (e.g., human experts and simulation systems).
The above challenges 1 to 6 will be discussed in more
detail in this article.
We envision dedicated collaborative interactive learning (D-CIL)
approaches where the above limitations no longer hold.
That is, we will develop future AL processes that are
• interactive in the sense that there is an information flow not
  only from humans to the active learner but also vice versa and not
  only in the form of labels but in various, more complex ways,
• collaborative in the sense that various experts collaborate to
  support the active learner with information, and
• dedicated in the sense that the learning process is clearly
  defined (e.g., as in an industrial quality monitoring process),
  the group of human experts is rather small, and they collaborate
  over a longer period of time.
As an example for a D-CIL application, consider an
industrial quality monitoring problem we addressed some
years ago [1]: At a last stage of a silicon wafer fabrication
process the wafers have to be checked for possible defects
by means of visual inspection. Anomalies such as abrasions,
cracks, scratches, or dust particles must be identified in
images of wafers taken under different lighting conditions
in order to sort out unusable wafers. Conspicuous regions
on a wafer can rather easily be detected using appropriate
image processing techniques. The classification of these
regions, however, is rather difficult. Human experts often
fail, they disagree, or their assessment criteria vary over
time, depending on parameters such as fatigue, motivation,
experience, etc. that may not be known in detail. How
can such a classification process be automated using, e.g.,
features computed from the images such as the length-width
ratio describing a conspicuous region? It is cheap to obtain
a large amount of images of conspicuous regions, but time-
consuming and error-prone to get the corresponding labels.
The solution could be a D-CIL approach as sketched above.
The field of AL has attracted the interest of many companies,
such as Microsoft, IBM, Siemens, AT&T, Mitsubishi, or Yahoo.
Publications of those companies show that AL can be successfully
applied to problems such as text classification [2], detecting and
filtering abusive user-generated content on the Web [3], sentiment
analysis of texts [4], speech recognition [5], [6], image
classification [7], drug design [8], [9], detection of plant
diseases [10], malware detection [11], or recommender systems
[12], [13].
Altogether, we can be sure that there will also be an
increasing interest in AL and, as many limitations of AL
are abolished, in D-CIL, too. We even believe that many
problems arising in the field of Big Data may be solved
relying on D-CIL approaches. D-CIL techniques may also
advance more technical fields such as the field of self-
organizing and adaptive systems by increasing their degree
of autonomy in learning tasks. D-CIL may even be seen
as a first step towards CIL in open-ended environments,
an approach we call opportunistic collaborative interactive
learning (O-CIL). There, many technical devices (e.g., open,
heterogeneous, dynamic systems such as mobile devices,
e.g., smartphones) will interact in the sense sketched above
by actively collecting information from other devices, from
humans, or from the Internet, for instance. Although we
focus on D-CIL in this article, we will briefly outline O-CIL
in Section 5.
In the remainder of this article, we first present some
foundations of AL in Section 2 and define D-CIL in Sec-
tion 3. In Section 4 we investigate the above challenges in
more detail and briefly discuss possible solutions. Finally,
Section 5 concludes the article by taking a look at possible
application fields and at O-CIL.
2 OVERVIEW OF ACTIVE LEARNING FOUNDATIONS
The motivation of AL is that obtaining plenty of unlabeled
data is often quite cheap, while acquiring labels is a task
with high costs (monetary or temporal). AL is based on the
hypothesis that a process of (iteratively) asking an oracle for
labels and refining the current model can be realized in a
way such that
• the performance of the resulting model is comparable to the
  performance of a model trained on a fully labeled data set and
• the overall labeling costs to obtain the final model are much
  lower (typically simply measured by the number of labels).
Actually, to address the previous requirements it is possible
to build an active learner that is based on a complementary
pair of model (e.g., a classifier) and selection strategy. With
a selection strategy, the active learner decides whether a
sample is informative and asks the oracle for labels. Here,
informative means that the active learner expects a (high)
performance gain if this sample is labeled (similarly, a set of
samples can also be called informative).
Basically, various kinds of models can be used for AL,
but the selection strategy should always be defined de-
pending on the model type (e.g., whether support vector
machines, neural networks, probabilistic classifiers, or deci-
sion trees are chosen to solve a classification problem). AL
can be used for classification problems (e.g., [14], [15], [16],
[17]), to modify the results of clustering (e.g., [18]), to solve
regression problems (e.g., [19], [20], [21], [22]), or for feature
selection (e.g., [8], [9]).
In the field of active learning (AL), membership query
learning (MQL) [23], stream-based active learning (SAL) [24],
and pool-based active learning (PAL) [25] are the most impor-
tant paradigms (see Figure 1a).
In an MQL scenario, the active learner may query labels
for any sample in the input space, including samples gen-
erated by the active learner itself. Lang and Baum [26], for
example, describe an MQL scenario with human oracles to
classify written digits. The queries generated by the active
learner turned out to be some mixtures of digits, therefore
being too difficult for a human to provide reliable answers.
An alternative to MQL is SAL, which assumes that
obtaining unlabeled samples generates low or no costs.
Therefore, a sample is drawn from the data source and the
active learner decides whether or not to request label infor-
mation. In SAL the source data is scanned sequentially and
a decision is made for each sample individually. Typically,
SAL selects only one sample in each learning cycle.
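The per-sample decision in SAL can be sketched as follows. This is a minimal illustration, not a method taken from the cited work: the entropy threshold, the function names, and the toy logistic model `toy_predict_proba` are assumptions made for this example, and the retraining that would normally follow each query is omitted.

```python
import math


def entropy(probs):
    """Shannon entropy (in bits) of a discrete class distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def stream_based_queries(stream, predict_proba, threshold=0.9):
    """Scan the stream once and decide for each sample individually.

    Returns the indices of the samples for which a label would be requested.
    The threshold on the predictive entropy is a hypothetical choice.
    """
    queried = []
    for index, x in enumerate(stream):
        if entropy(predict_proba(x)) >= threshold:
            queried.append(index)  # uncertain enough: ask the oracle
    return queried


def toy_predict_proba(x):
    """Hypothetical 1-D two-class model with a decision boundary at x = 0."""
    p1 = 1.0 / (1.0 + math.exp(-4.0 * x))
    return [1.0 - p1, p1]


stream = [-2.0, -0.05, 0.0, 0.1, 3.0]
print(stream_based_queries(stream, toy_predict_proba))
```

Only the samples close to the assumed boundary exceed the threshold; samples far from it are passed over without a query.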
For many practical problems a large set of unlabeled
samples may be gathered inexpensively and this set is
available at the very beginning of the AL process. This
motivates the PAL scenario. The learning cycle of PAL is
depicted in Figure 1b. Typically, PAL starts with a large
pool of unlabeled and a small set of labeled samples. On
the basis of the labeled samples the knowledge model (e.g.,
a classifier) is trained. Then, based on a selection strategy,
which considers the “knowledge” of the active learner, a
query set of unlabeled samples is determined and presented
to the oracle (e.g., a human domain expert), who provides
the label information. The set of labeled samples is updated
with the newly labeled samples and the learner updates
its knowledge. The learning cycle is repeated until a given
stopping condition is met.
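The PAL learning cycle described above can be sketched in a few lines. The nearest-centroid classifier, the margin-based selection strategy, and the fixed query budget used as stopping condition are assumptions chosen for brevity; the text does not prescribe any of them. The sketch also assumes that the initial labeled set contains at least one sample of every class.

```python
def train_centroids(labeled):
    """Train a nearest-centroid classifier: one mean vector per class."""
    sums, counts = {}, {}
    for x, y in labeled:
        acc = sums.setdefault(y, [0.0] * len(x))
        for d, value in enumerate(x):
            acc[d] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def predict(centroids, x):
    """Return the class whose centroid is closest to x."""
    return min(centroids, key=lambda y: sq_dist(centroids[y], x))


def margin(centroids, x):
    """Gap between the two smallest squared distances; small means uncertain."""
    dists = sorted(sq_dist(c, x) for c in centroids.values())
    return dists[1] - dists[0]


def pool_based_al(pool, labeled, oracle, budget, query_size=1):
    """Run the PAL cycle until the label budget or the pool is exhausted."""
    pool, labeled = list(pool), list(labeled)
    centroids = train_centroids(labeled)            # initial knowledge model
    while budget > 0 and pool:                      # stopping condition
        pool.sort(key=lambda x: margin(centroids, x))
        k = min(query_size, budget, len(pool))
        query_set, pool = pool[:k], pool[k:]        # strategy selects query set
        labeled.extend((x, oracle(x)) for x in query_set)  # oracle labels it
        budget -= k
        centroids = train_centroids(labeled)        # learner updates knowledge
    return centroids, labeled


seed = [((-1.0,), 0), ((1.0,), 1)]
pool = [(-0.9,), (-0.1,), (0.2,), (0.95,)]
model, labeled = pool_based_al(pool, seed, lambda x: int(x[0] > 0), budget=2)
print(labeled[2:])
```

With this toy data the two samples nearest the current decision boundary are queried, which corresponds to the exploitation behavior of a selection strategy.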
In the remainder of this article we focus on PAL for
classification problems. This is only done to simplify the
discussion of the challenges. Basically, MQL and SAL suffer
from the same limitations and they will benefit from D-CIL.
Also, many of the solution ideas for classification problems
may be transferred to other kinds of problems such as
regression.
A selection strategy for PAL has to fulfill several tasks,
two of which shall be given as an example: At an early stage
of the AL process, samples have to be chosen in all regions
of the input space covered by data (exploration phase). At a
late stage of the AL process, a fine-tuning of the decision
boundary of the classifier has to be realized by choosing
samples close to the (current) decision boundary (exploitation
phase). Thus, “asking the right question” (i.e., choosing
samples for a query) is a multi-faceted problem and various
selection strategies have been proposed and investigated.
We want to emphasize that a successful selection strategy
has to consider structure in the (un-)labeled data.
Typically, very limiting assumptions (cf. Section 1) are
made concerning the oracle and the labeling costs (omni-
scient, omnipresent oracle that labels samples on a fixed cost
basis). Moreover, some other aspects of real-world problems
are often more or less neglected by current research:
• In real-world applications, AL has often to start “from scratch”,
  i.e., with no labels at all. This requires sophisticated selection
  strategies with different behaviors at different phases of the AL
  process.
• Parameters of the active learner (including parameters of training
  algorithms for the classifier and the selection strategy) cannot
  be found by trial-and-error. AL only allows for “one shot”.
[Figure 1: (a) Main AL paradigms: membership query learning (the active learner generates a query), stream-based active learning (the active learner decides to query or not for each sample drawn from the data source), and pool-based active learning (the active learner selects the best query from a large pool of samples). (b) Learning cycle of standard PAL: the selection strategy selects a query set S_i from the pool, the oracle labels it, and the labeled set and the knowledge model are updated before cycle i + 1.]
Fig. 1. Overview of main AL scenarios with focus on PAL.
There are a number of articles that assess the state-of-the-art
in AL:
• A general introduction to AL, including a discussion of AL
  scenarios and an overview of query strategies, is provided in [27].
• A detailed overview of relevant PAL techniques is part of [14].
  In addition to single-view/single-learner methods, alternative
  approaches are outlined: multi-view/single-learner,
  single-view/multi-learner, and multi-view/multi-learner.
• For certain problem areas it makes sense to use AL in combination
  with semi-supervised learning (SSL). AL techniques that integrate
  SSL techniques are presented in [15].
• Work that uses AL in combination with support vector machines
  (SVM) for solving classification problems is summarized in [28],
  [29].
3 CHARACTERIZATION OF DEDICATED COLLABORATIVE INTERACTIVE LEARNING
In this section we describe our vision of future AL that we
call dedicated collaborative interactive learning (D-CIL).
To overcome the unrealistic limitations made by con-
ventional AL techniques (cf. Sections 1 and 2) we have to
integrate multiple “uncertain oracles” (e.g., human domain
experts) into the AL process (see Figure 2). That is, these
oracles possibly make errors due to various causes, e.g., they
are differently experienced in coping with the learning task,
or their work quality depends on daily condition, moti-
vation, etc. Therefore, D-CIL explicitly models information
uncertainty, i.e. uncertainty regarding samples, labels, or
parametrization of models. This uncertainty is then taken
into account when either (1) the expertise of the human
domain experts has to be identified or (2) their knowledge
is required to provide labels.
[Figure 2: Several uncertain oracles collaborate and label the query sets that the selection strategy of the active learner draws from the pool; the labeled set and the classifier are updated in each cycle, and the active learner returns feedback to the oracles.]
Fig. 2. Learning cycle of D-CIL with multiple collaborating uncertain
oracles (human experts, simulation systems, etc.).
In real-world applications D-CIL has to start “from
scratch”, e.g., without any label information. We assume
that, during the AL process more and more (uncertain)
labels become available, but no “ground truth”. Therefore,
the collaboration of various human experts will be essential
for the success of D-CIL. Experts not only collaborate with
the active learner. Other kinds of collaboration could be
the mutual support of two or more experts, according to
the idea of pair programming, in order to achieve higher
accuracies in solving the learning task. These collaboration
processes are indicated by the dotted, blue arrows in Fig-
ure 2.
D-CIL integrates multiple experts into the AL process
and models their expertise explicitly. Consequently, D-CIL
needs more sophisticated selection strategies than those
used in conventional AL. In D-CIL, the selection strategy has
not only to choose the most informative samples (from the
pool of unlabeled samples) considering the current knowl-
edge of the model (here, a classifier) in order to build a query
set in each AL cycle, but also to decide which experts shall
be queried depending on their expertise. That is, we have
an exploration/exploitation problem again. In addition, in
D-CIL we query not only samples but also knowledge at
higher abstraction levels (e.g. premises of rules), such that
more sophisticated cost schemes are needed, too.
Particularly, D-CIL differs from conventional AL in the
fact that D-CIL gives targeted feedback (cf. the dashed, red
arrows in Figure 2) which will improve the experts’ level
of expertise. To emphasize this difference, we use the term
interactive learning. Therefore, the main goal of D-CIL is to
maximize the accuracy of the actively trained classifier and
to maximize the benefit of the human domain experts with
minimal costs.
We assume that the humans who collaborate to solve a learning
task actually have knowledge about that specific learning task
(e.g., a certain industrial problem). We use the term “expert” to
emphasize this fact. These experts are assumed to be motivated to
collaborate over a longer time period, i.e., they are regarded as
being dedicated to the learning task. The learning task itself may
be time-variant, i.e., it may change its characteristics over time.
Examples are new classes that have to be detected or classes that
are not relevant any more (resulting in novel or obsolete clusters
of samples in the input space of a classifier). So, we need online
learning techniques to solve such problems.
Altogether, D-CIL targets a specific class of applications
where we may assume the following: We have rather small,
quite homogeneous groups of
experts that collaborate over a longer period of time to solve
a specific application problem. Though we act on these
assumptions for D-CIL, we will abandon them for O-CIL
which will be sketched in Section 5.
4 CHALLENGES FOR FUTURE D-CIL RESEARCH
In the field of D-CIL, we will answer many questions, most of
which are caused by the harsh limitations of AL sketched in
Section 1. In the following, we examine the six key challenges
(see Section 1).
4.1 Challenge 1: Uncertain Oracles
In a first step, we address the obvious fact that oracles are not
always right. In principle, labels are subject to uncertainty.
Here, the meaning of the term uncertainty is adopted from [30].
That is, “uncertain” is a generic term to address aspects such as
“unlikely”, “doubtful”, “implausible”, “unreliable”, “imprecise”,
“inconsistent”, or “vague”.
In real-world applications, labels may come from various sources,
often but not always humans. Therefore, a new problem arises: The
labels are subject to uncertainty for different reasons. For
example, the performance of human annotators depends on many
factors, e.g., expertise/experience, concentration/distraction,
boredom/disinterest, fatigue level, etc. Furthermore, some samples
are difficult for both experts and machines to label (e.g., samples
near the decision boundary of a classifier). Results of real
experiments or simulations may be influenced, too: There may be
stochasticity which is inherent to a certain process, sensor noise,
transmission errors, etc., just to mention a few. Thus, we face
many questions: How can we make use of uncertain oracles
(annotators that can be erroneous)? How do we decide whether an
already queried sample has to be labeled again? How do we deal with
noisy experts whose quality varies over time (e.g., they gather
experience with the task, they get fatigued)? How does remuneration
influence the labeling quality of a noisy expert (e.g., if they are
paid better, are they more accurate)? How can we decide whether the
expert is erroneous or an observed process itself is
nondeterministic?
As a starting point, we may assume that the “expertise
of an expert” (i.e., the degree of uncertainty of an oracle)
is time-invariant and global in the sense that it does not
depend on certain classes, certain regions of the input space
of the model to be learned (e.g., a classifier), etc. Then, we
may ask experts for, e.g.,
• one class label with a degree of confidence,
• membership probabilities for each class (with or without
  confidence labels),
• lower bounds for membership probabilities (cf. [31]),
• a difficulty estimate for a data object that is labeled, or
• relative difficulty estimates for two data objects (“easier” or
  “more difficult” to label).
Then, we have to define appropriate ways to model that
uncertainty (e.g., second-order distributions over parame-
ters of class distributions in a probabilistic framework) and
to consider it in selection strategies (e.g., with additional
criteria) and for the training of a classifier (e.g., with gradual
labels).
4.2 Challenge 2: Multiple Uncertain Oracles
In a second step, we address situations where several, indi-
vidually uncertain oracles (e.g., several human experts with
different degree of expertise) contribute their knowledge.
Thus, the learning process will now rely on the collective
intelligence of a group of oracles. We see this step as a first
important step towards true collaboration between human
experts to support such a learning process.
In various applications, different, uncertain oracles may
contribute labels (cf. Figure 2). These experts may not only
have different degrees of expertise. They also may have
more or less expertise for different parts of the problem
that has to be solved, e.g., for different classes that have
to be recognized, for different regions of the input space,
for different dimensions of the input space (attributes), etc.
Also, experts collaborate with others, which stimulates a
learning from others and results in a knowledge gain for
the expert. Now, we face many new questions: What are
appropriate mechanisms to identify the expertise of the
human expert? Which are the criteria for identifying the
“optimal” human expert? Which experts should collaborate
with each other in a labeling process in order to constitute
a high-performance group? How can exploration (identify-
ing expertise) and exploitation phases (using the experts’
knowledge) be interwoven? How can we merge uncertain
information obtained from several experts? How can this
process be designed in order to be independent of time
and place (e.g., for experts who are only available on a part-time
basis)?
As a starting point, we may initially assume that the
“expertise of an expert” is known. We may use generative,
probabilistic models, for example, to describe the individual
knowledge of experts and the “global” knowledge of the
active learner (cf. [14], [15]). Uncertainty may again be cap-
tured with second-order approaches. New selection strate-
gies must then not only choose samples, but also oracles. If
the expertise of an expert is not known, it must be revealed
either by asking for difficulty or confidence estimates or by
comparing it to the knowledge of others (e.g., by asking
an expert who has to be assessed some questions with
already known answers). In order to explore solutions to
challenge 2, we may not only rely on real experiments with
humans (cf. the field of crowdsourcing, for instance). In
addition, we may also be confronted with the problem of
simulation: We have to simulate several uncertain oracles
with the different characteristics mentioned above.
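For simulation studies of the kind mentioned above, uncertain oracles can be mimicked with a simple noise model. The sketch below assumes a time-invariant, global accuracy per oracle, uniformly distributed errors, independent oracles, and a uniform class prior; under these assumptions, a vote weighted by log-odds is a standard way to merge labels. It is one possible starting point, not the approach of [14] or [15], and all names are hypothetical.

```python
import math
import random


def make_noisy_oracle(true_label, accuracy, classes, rng):
    """Simulated expert: correct with probability `accuracy`, otherwise a
    uniformly chosen wrong class (time-invariant, global expertise)."""
    def oracle(x):
        y = true_label(x)
        if rng.random() < accuracy:
            return y
        return rng.choice([c for c in classes if c != y])
    return oracle


def estimate_accuracy(oracle, gold):
    """Estimate expertise from control questions with already known answers."""
    hits = sum(1 for x, y in gold if oracle(x) == y)
    return hits / len(gold)


def merge_labels(votes, accuracies, classes):
    """Merge one vote per oracle; each vote is weighted by
    log(acc * (K - 1) / (1 - acc)), where K >= 2 is the number of classes."""
    k = len(classes)
    scores = {c: 0.0 for c in classes}
    for y, acc in zip(votes, accuracies):
        acc = min(max(acc, 0.01), 0.99)   # avoid infinite weights
        scores[y] += math.log(acc * (k - 1) / (1.0 - acc))
    return max(scores, key=scores.get)


rng = random.Random(0)
classes = ["ok", "defect"]
truth = lambda x: "defect" if x > 0.5 else "ok"
experts = [make_noisy_oracle(truth, a, classes, rng) for a in (0.95, 0.6, 0.6)]
votes = [expert(0.7) for expert in experts]
print(votes, merge_labels(votes, [0.95, 0.6, 0.6], classes))
```

With these weights a single highly reliable expert can outvote two weak ones, which a plain majority vote would not allow.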
4.3 Challenge 3: Alternative Query Types
If we have to explore the knowledge of oracles as sketched
above, the costs of AL increase substantially. On the other
hand, we might ask oracles such as human experts for more
abstract knowledge with the goal of reducing the number of
queries this way.
In many applications, active learners could ask for more
“valuable” knowledge. Examples are conclusions that a human expert
gives for a presented rule premise, or correlations between
different features or between features and classes that an expert
provides in order to identify important or redundant features.
Questions that arise in this context are: Which questions can be
asked? How can we provide (i.e., visualize, for instance) the
required information to the expert? How can we combine different
kinds of expert statements, e.g., about samples, rules, relations
between features, etc.? How can we use this information to
initialize the models that are trained or to restrict the model
capabilities in an appropriate way (e.g., if features are known not
to be correlated)?
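[Figure 3: Density model with three components in the (x1, x2) plane, both axes divided into “low” and “high”; component centers are marked by crosses (+), and a histogram p(x3 | i) of the categorical dimension x3 is shown next to each component i = 1, 2, 3.]
Fig. 3. Asking for conclusions of rule premises.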
As a starting point, we could investigate the case of
annotating rule premises with conclusions. To stay in a
probabilistic framework we could obtain user-readable rule
premises by marginalization of density functions from a
generative process model. Figure 3 gives an example for
a density model consisting of three components in a
three-dimensional input space. The first two dimensions x1 and
x2 are continuous and, thus, modeled by bivariate Gaussians
whose centers are described by larger crosses (+). The
ellipses are level curves (surfaces of constant density) with
shapes defined by the covariance matrices of the Gaussians.
Here, due to the diagonality of the covariance matrices these
ellipses are axes-oriented and their projection onto the axes
is also shown. The third dimension x3 is categorical with
categories A (red), B (green), and C (blue). The distributions
of the third dimension x3 are illustrated by the histograms
next to every component. Here, only categories with a
probability strictly greater than the average are considered
in rules in order to simplify the resulting rules. We assume
that the components modeling sets of circles (green) and
crosses (red) are already labeled, resulting in two rules for
these components:
if x1 is low and x2 is high and x3 is A or B
then class = red,
if x1 is high and x2 is high and x3 is C
then class = green.
Now, the active learner presents the following rule premise
and asks for a conclusion in form of a class assignment:
x1 is high and x2 is low and x3 is B.
This information could then be used to (re-)train a classifier,
e.g., in a transductive learning step.
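How such a premise could be read off a single mixture component is sketched below. Only the rule "keep categories whose probability is strictly greater than the average" comes from the text; mapping a Gaussian center to the terms "low" and "high" by comparing it with the midpoint of the value range is an assumption made for this illustration, as are all function and parameter names.

```python
def categorical_premise(probs):
    """Keep the categories whose probability is strictly greater than the
    average probability over all categories."""
    average = sum(probs.values()) / len(probs)
    return sorted(c for c, p in probs.items() if p > average)


def continuous_term(center, low, high):
    """Hypothetical linguistic term for a Gaussian center: 'low' or 'high'
    relative to the midpoint of the value range [low, high]."""
    return "low" if center < (low + high) / 2.0 else "high"


def rule_premise(center, ranges, probs):
    """Build a readable premise for one component.

    center: centers of the continuous dimensions (x1, x2, ...)
    ranges: (min, max) of each continuous dimension
    probs:  category -> probability for the last, categorical dimension
    """
    parts = [
        f"x{d + 1} is {continuous_term(c, lo, hi)}"
        for d, (c, (lo, hi)) in enumerate(zip(center, ranges))
    ]
    categories = categorical_premise(probs)
    if categories:  # omit the term if no category stands out
        parts.append(f"x{len(center) + 1} is " + " or ".join(categories))
    return " and ".join(parts)


unit = [(0.0, 1.0), (0.0, 1.0)]
print(rule_premise((0.9, 0.1), unit, {"A": 0.1, "B": 0.8, "C": 0.1}))
```

For the assumed component parameters this yields the premise queried in the example above; the expert's answer would then supply the missing conclusion.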
To investigate relations between features, i.e., between
input dimensions of a classifier, we may rely on statistical
measures, but also adopt ideas from the field of concept
exploration (cf. the field of attribute exploration in [32]).
4.4 Challenge 4: Complex Cost Schemes for Queries
In many real-world applications obtaining information may
be possible at different costs, e.g., some class information
is more expensive than others or the labeling costs depend
on the location of the sample in the input space. This
already applies to a “conventional” AL setting without the
many ideas discussed above. In a D-CIL setting, considering
complex cost schemes is even more important.
We must consider costs that depend on
1) samples with their classes: As mentioned above, la-
beling costs may depend on the class (e.g., some
kinds of error classes in an industrial production
process may be more difficult to detect than others)
or on the location of the sample in the input space
(e.g., samples close to the decision boundary require
higher temporal effort), for instance.
2) query types: It is obvious that different labeling costs
have to be foreseen for samples (with or without
certainty estimates) and for more complex queries
such as rule premises. The cost schemes have to be
even more detailed in a D-CIL setting with feedback
to the humans (e.g., with queries such as “Can you
confirm that ...?”).
3) oracles (experts): The costs of humans may depend on
their expertise, their temporal effort, their availability
(e.g., working hours may be modeled with finite
costs, all other times with infinite costs), etc.
In principle, all these costs may change over time, too. The
basic questions in this context are: How can a cost scheme be
defined, and which types of cost schemes exist? How should
compensation mechanisms for the differing expertise of
humans be designed? How can these compensation mechanisms
be implemented? How must the selection strategies
of an active learner be adapted?
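To make the three cost dimensions listed above more concrete, the following minimal sketch combines them in a single cost function. All names, effort values, and factors are hypothetical assumptions chosen for illustration; a real cost scheme would have to be elicited for the application at hand.

```python
import math
from dataclasses import dataclass


@dataclass
class Expert:
    name: str
    hourly_rate: float  # assumed cost units per hour, may reflect expertise
    available: bool     # e.g., True only within working hours


# Hypothetical effort (in hours) per query type.
EFFORT_BY_QUERY_TYPE = {
    "sample": 0.01,
    "sample_with_certainty": 0.02,
    "rule_premise": 0.10,
}


def query_cost(expert: Expert, query_type: str,
               class_factor: float = 1.0, boundary_factor: float = 1.0) -> float:
    """Cost of one query; infinite if the expert is not available."""
    if not expert.available:
        return math.inf
    # Class-dependent and location-dependent difficulty scale the effort.
    effort = EFFORT_BY_QUERY_TYPE[query_type] * class_factor * boundary_factor
    return expert.hourly_rate * effort


senior = Expert("senior", hourly_rate=100.0, available=True)
off_duty = Expert("off_duty", hourly_rate=60.0, available=False)
print(query_cost(senior, "rule_premise"))
print(query_cost(off_duty, "sample"))
```

A selection strategy could then trade off the expected usefulness of a query against such a cost; since the costs may change over time, the table and factors would have to be updated during learning.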
As a starting point, we suggest choosing the first point
from the list above and investigating solutions in a “classical”
AL setting. Then D-CIL requires solutions for the second
and third points, respectively. Mechanisms of crowdsourc-
ing will provide additional insights. On the one hand,
differentiated compensation mechanisms can be realized if
a task with defined costs can be outsourced to the crowd
[33]. On the other hand, the definition of the task requires
additional research in the field of crowdsourcing.
4.5 Challenge 5: True Collaboration of Human Experts
and Interaction with an Active Learner
In a next step we must pave the way for a true collaboration
of human experts in AL, which will essentially be based on
the capability of humans to learn and the ability of the active
learner to provide appropriate feedback to the humans to
enable them to learn themselves. Then, the new technique
actually deserves to be called D-CIL.
In many applications, experts would be interested in
getting feedback from an active learner, in improving their
own knowledge, and sharing their expertise with others. As
an important requirement, the active learner must be able
to give feedback to the humans and to ask for comments
on such feedback. Some possible kinds of interactions of an
active learner with humans are (cf. also [34]):
• The following rule appears to be very certain because ... !
• The following rule is in conflict with your knowledge because ... !
• Other experts are much less uncertain concerning the following rule than you are ... !
• Can you confirm the following rule ... ?
• Can you confirm that the following two features are not correlated ... ?
• Can you confirm that the following feature is very important ... ?
• Can you provide additional samples for the following input regions of the classifier ... ?
Some of the many new questions arising with this challenge
are: How can we deal with time-variant knowledge of
oracles? Which information should be provided and how
(e.g., with/without certainty estimates, restriction to “crisp”
rules or not)? How must we adapt the active learner and
the selection strategies? In particular, a compromise has to
be found between modeling capabilities on the one hand
and the abilities of humans to actually understand readable
rules on the other. How do human experts change their
behavior if they get feedback? How do human experts
cooperate when they use a D-CIL system? How can an
explicit “pooling” of experts in teams be realized? May
we suggest solutions to experts? How can we realize a
review mechanism for answers of experts? When and how
can human experts be recruited? How can we measure the
benefit of human experts or groups of experts?
As a starting point, we may stay within our proba-
bilistic framework, consider the individual knowledge of
humans (challenge 2) and present samples and rules (e.g.,
obtained by marginalization from density models to make
them human-readable as sketched above, challenge 3) with
fused statements (labels or conclusions) and certainty esti-
mates. Then, the time-variance of human knowledge must
be considered by extending the solutions from challenge 2.
Altogether, the collaboration activities between humans and
between humans and the active learner need to be designed
in a structured and re-usable way [35], [36]. Again, the eval-
uation of any new, proposed techniques will be a challenge
by itself.
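As a simple illustration of "fused statements with certainty estimates", the following sketch combines the answers of several experts by a certainty-weighted vote. This is an assumption made for illustration only and not the probabilistic fusion envisioned above; the function name and the example answers are hypothetical.

```python
from collections import defaultdict
from typing import List, Optional, Tuple


def fuse_labels(answers: List[Tuple[str, float]]) -> Tuple[Optional[str], float]:
    """Fuse (label, certainty) pairs, certainty in [0, 1], by a weighted vote.

    Returns the winning label and its share of the total certainty mass.
    """
    weights = defaultdict(float)
    for label, certainty in answers:
        weights[label] += certainty
    total = sum(weights.values())
    if total == 0:
        return None, 0.0  # no expert expressed any certainty
    best = max(weights, key=weights.get)
    return best, weights[best] / total


# Hypothetical answers of three experts to the same query.
label, certainty = fuse_labels([("red", 0.9), ("green", 0.6), ("red", 0.5)])
print(label, certainty)
```

The fused certainty could be presented to the experts together with the fused label, e.g., as part of the feedback messages listed above.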
4.6 Challenge 6: Online Learning for Time-Variant
Learning Tasks
Above, we have sketched D-CIL which takes place in a
time-variant environment in the sense that the knowledge
of experts improves over time. However, the observed and modeled
processes could be time-variant, too. That is, these
processes may change slightly (e.g., due to increased wear
of mechanical parts of an observed process), become ob-
solete, or new processes corresponding to known or to
new, previously unknown, classes may arise during the
application of the model. Then, a major challenge consists
in developing online D-CIL techniques that cope with such
effects. Altogether, we essentially have to
solve a learning task which changes over time, where
we only have partial knowledge, and where knowledge is
uncertain. That is, from the viewpoint of the humans and
the active learner “lifelong” learning may be needed.
Questions that come up when we address this challenge
are, for example: How can changes in the characteristics of
the processes underlying the observed data be detected?
How can new classes be considered online? How can we
efficiently and effectively integrate the human experts in the
process of detection and modeling?
As a starting point, we may adapt techniques from the
fields of anomaly or novelty detection, obsoleteness detec-
tion, detection of concept drift or shift, or online clustering.
However, these techniques are typically intended to work
in a fully autonomous way, but we may again integrate
the knowledge of human experts, for instance, to improve
these techniques. Also, we may take a look at some existing
multiple learner / multiple expert approaches, and adopt
ideas from the field of SAL.
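One very simple way to detect a change in the observed process is to compare the recent error rate of the model with a reference error rate. The sketch below is a deliberately basic stand-in for the drift detection techniques mentioned above, under the assumption that feedback on the correctness of some predictions is available (e.g., from experts); the class name, window size, and threshold are hypothetical.

```python
from collections import deque


class WindowDriftMonitor:
    """Flag a possible change when the recent error rate exceeds a reference rate."""

    def __init__(self, window: int = 50, threshold: float = 0.15):
        self.window = window
        self.threshold = threshold
        self.reference = []                 # first `window` outcomes, kept fixed
        self.recent = deque(maxlen=window)  # sliding window of the latest outcomes

    def update(self, is_error: bool) -> bool:
        """Add one outcome; return True if a change is suspected."""
        if len(self.reference) < self.window:
            self.reference.append(is_error)
            return False
        self.recent.append(is_error)
        if len(self.recent) < self.window:
            return False
        reference_rate = sum(self.reference) / self.window
        recent_rate = sum(self.recent) / self.window
        return recent_rate - reference_rate > self.threshold


monitor = WindowDriftMonitor(window=4, threshold=0.25)
outcomes = [False] * 4 + [True] * 4  # correct predictions first, then errors
flags = [monitor.update(outcome) for outcome in outcomes]
print(flags)
```

When the monitor raises a flag, the active learner could ask the experts whether the process has changed or whether a new class has appeared, instead of adapting in a fully autonomous way.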
4.7 Further Challenges
Two additional challenges must be addressed as well:
Stopping Criterion: Currently, the stopping criterion in
real-world applications is based on economic factors, e.g.,
the learner queries samples as long as the budget allows.
The challenge consists in knowing when to stop querying
for labels. One possibility may be to determine the point
at which the cost of querying more labels exceeds the
costs of misclassification. Another possibility is to determine
when the learner is at least as good as the group of
annotators. For such a “self-stopping criterion”, the active
learner must be able to assess its own performance.
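The first possibility can be sketched as a comparison between the cost of the next query and the misclassification costs it is expected to save. The interpretation in terms of an expected saving, as well as all names and numbers, are assumptions for illustration; in practice, the error rates would have to be estimated by the active learner itself.

```python
def should_stop(query_cost: float,
                error_rate_now: float,
                error_rate_expected: float,
                misclassification_cost: float,
                n_future_predictions: int) -> bool:
    """Stop if the next query costs more than the misclassification costs it saves."""
    expected_saving = ((error_rate_now - error_rate_expected)
                       * misclassification_cost * n_future_predictions)
    return query_cost > expected_saving


# Hypothetical numbers: one query costs 1 unit, an error costs 5 units,
# and the model will be applied to 1000 future samples.
continue_case = should_stop(1.0, 0.10, 0.099, 5.0, 1000)   # saving of about 5 units
stop_case = should_stop(1.0, 0.10, 0.0999, 5.0, 1000)      # saving of about 0.5 units
print(continue_case, stop_case)
```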
Performance Assessment: In AL, the performance of
an active learner must be assessed by means of several
criteria to capture effectiveness and efficiency of AL. For
this purpose, we may use a ranked performance measure, a
data utilization measure, the area under the learning curve,
and a class distribution measure (see, e.g., [14], [15]). D-
CIL requires additional measures, e.g., to assess the various
learning costs or to evaluate the learning progress of human
experts.
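Of the measures named above, the area under the learning curve is the easiest to sketch. The following minimal example uses the trapezoidal rule and normalizes by the range of the number of labels, which is our own assumption (other normalizations are possible); the function name and the example curve are hypothetical.

```python
from typing import Sequence


def area_under_learning_curve(n_labels: Sequence[float],
                              accuracies: Sequence[float]) -> float:
    """Normalized area under a learning curve (trapezoidal rule).

    n_labels must be strictly increasing and have the same length as accuracies.
    """
    if len(n_labels) != len(accuracies) or len(n_labels) < 2:
        raise ValueError("need at least two matching points")
    area = 0.0
    for i in range(1, len(n_labels)):
        width = n_labels[i] - n_labels[i - 1]
        area += width * (accuracies[i] + accuracies[i - 1]) / 2.0
    return area / (n_labels[-1] - n_labels[0])


# Hypothetical learning curve: accuracy after 10, 20, and 30 labeled samples.
aulc = area_under_learning_curve([10, 20, 30], [0.6, 0.8, 0.9])
print(aulc)
```

A learner that reaches a high accuracy with few labels obtains a larger value, so the measure captures both effectiveness and efficiency in a single number.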
Apart from these challenges we still face the already dis-
cussed requirements such as “parameter-free” active learn-
ing techniques or self-adaptation of selection strategies to
different phases of the active learning process.
5 SUMMARY AND OUTLOOK
In this article, we have sketched our vision of D-CIL which
will be elaborated in more detail in the near future. In this
novel field, we would like to concentrate on developing
models that take information uncertainty into consideration,
identifying the annotators’ level of expertise, making use of
different levels of expertise and fusing possibly contradict-
ing knowledge, labeling abstract knowledge, and improving
the expertise of the experts. In the envisioned D-CIL sce-
narios, human domain experts should benefit from sharing
their knowledge in the group. They should receive feedback
which will improve their own level of expertise. We assume
that in a D-CIL scenario the number of humans involved
is rather low (e.g., they are specialists for certain industrial
problems), they are more or less motivated, and they contribute
their knowledge over the long term. In principle, many
applications may benefit from D-CIL, for example, product
quality control (e.g., deflectometry, classification of errors
on silicon wafers or mirrors, analysis of sewing or garments
in clothing industry), fault detection in technical and other
systems (e.g., analysis of fault memory entries in control
units of cars, analysis of different kinds of errors in cyber-physical
systems), planning of product development
processes (e.g., in drug design), or fraud detection and
surveillance (e.g., credit card fraud, detection of tax evasion,
intrusion detection, or video surveillance).
Fig. 4. Idea of Opportunistic CIL (O-CIL). [Figure: devices that collaborate; humans that collaborate and assist devices in their collaboration; the Internet (WWW) as an additional knowledge source.]
In our future world, technical systems have to evolve
over time. Not all knowledge about any situation the system
will face at run-time will be available at design-time. That
is, the system has to detect fundamental changes in its
environment and react accordingly. This requires that “never-ending”
or “lifelong” learning mechanisms be implemented
in such systems. Amongst other mechanisms
(e.g., context- or self-awareness), these learning mechanisms
will include appropriate active learning techniques. These
future technical systems may be mobile devices, for exam-
ple, that actively collect data and other kinds of information
from other devices, humans (who are often non-experts
in a field), or the Internet (e.g., from social networks),
cf. Figure 4. These active learning processes comprise large
(e.g., thousands), open (participants may leave or others
may enter), and heterogeneous (e.g., different types of de-
vices, kinds of knowledge, etc.) groups of “participants”.
The data that are labeled may include video, audio, text, or
image data for instance. Also new kinds of human-computer
interaction may come into play [37]. Each active learner
built into such a future system has to make the best of the
available information, i.e., it has to act in an “opportunistic”
way (cf. [38]). This requires an extension of D-CIL to O-
CIL (opportunistic collaborative interactive learning), i.e., AL
in open-ended environments, and also new techniques to
model and analyze AL in such groups (cf., e.g. [39]).
REFERENCES
[1] M. Bauer, O. Buchtala, T. Horeis, R. Kern, B. Sick, and R. Wagner,
“Technical data mining with evolutionary radial basis function
classifiers,” Applied Soft Computing, vol. 9, no. 2, pp. 765–774, 2009.
[2] U. Paquet, J. V. Gael, D. Stern, G. Kasneci, R. Herbrich, and
T. Graepel, “Vuvuzelas & active learning for online classification,”
in Workshop on Computational Social Science and the Wisdom of
Crowds, Whistler, BC, 2010, pp. 1–5.
[3] W. Chu, M. Zinkevich, L. Li, A. Thomas, and B. L. Tseng, “Un-
biased online active learning in data streams,” in Int. Conf. on
Knowledge Discovery and Data Mining, San Diego, CA, 2011, pp.
195–203.
[4] P. Melville and V. Sindhwani, “Active dual supervision: Reducing
the cost of annotating examples and features,” in Workshop on
Active Learning for Natural Language Processing, Boulder, CO, 2009,
pp. 49–57.
[5] D. Hakkani-Tür, G. Riccardi, and G. Tur, “An active approach
to spoken language processing,” ACM Transactions on Speech and
Language Processing, vol. 3, no. 3, pp. 1–31, 2006.
[6] G. Tur, R. E. Schapire, and D. Hakkani-Tür, “Active learning for
spoken language understanding,” in Int. Conf. on Acoustics, Speech,
and Signal Processing, Hong Kong, China, 2003, pp. 276–279.
[7] A. J. Joshi, F. Porikli, and N. P. Papanikolopoulos, “Scalable active
learning for multi-class image classification,” IEEE Transactions on
Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2259–
2273, 2012.
[8] R. F. Murphy, “An active role for machine learning in drug
development,” Nature Chemical Biology, vol. 7, pp. 327–330, 2011.
[9] J. D. Kangas, A. W. Naik, and R. F. Murphy, “Efficient discovery of
responses of proteins to compounds using active learning,” BMC
Bioinformatics, vol. 15, no. 143, pp. 1–11, 2014.
[10] P. Schmitter, J. Behmann, J. Steinrücken, A.-K. Mahlein,
E.-C. Oerke, and L. Plümer, “Aktives Lernen zur Detektion
von Pflanzenkrankheiten in hyperspektralen Bildern,” in
Wissenschaftlich-Technische Jahrestagung der DGPF, Köln, Germany,
2015, pp. 398–406.
[11] N. Nissim, R. Moskovitch, L. Rokach, and Y. Elovici, “Novel active
learning methods for enhanced PC malware detection in Windows
OS,” Expert Systems with Applications, vol. 41, no. 13, pp. 5843–5857,
2014.
[12] B. Lamche, U. Trottmann, and W. Wörndl, “Active learning strategies
for exploratory mobile recommender systems,” in Workshop
on Context-Awareness in Retrieval and Recommendation, Amsterdam,
Netherlands, 2014, pp. 10–17.
[13] H. Yu, “SVM selective sampling for ranking with application
to data retrieval,” in Int. Conf. on Knowledge Discovery and Data
Mining, Chicago, IL, 2005, pp. 354–363.
[14] T. Reitmaier and B. Sick, “Let us know your decision: Pool-based
active training of a generative classifier with the selection strategy
4DS,” Information Sciences, vol. 230, pp. 106–131, 2013.
[15] T. Reitmaier, A. Calma, and B. Sick, “Transductive active learning –
a new semi-supervised learning approach based on iteratively
refined generative models to capture structure in data,” Informa-
tion Sciences, vol. 239, pp. 275–298, 2014.
[16] C. Constantinopoulos and A. C. Likas, “An incremental training
method for the probabilistic RBF network,” IEEE Transactions on
Neural Networks, vol. 17, no. 4, pp. 966–974, 2006.
[17] Y. Zhang, H. Yang, S. Prasad, E. Pasolli, J. Jung, and M. Crawford,
“Ensemble multiple kernel active learning for classification of
multisource remote sensing data,” Selected Topics in Applied Earth
Observations and Remote Sensing, vol. 8, no. 2, pp. 845–858, 2015.
[18] R. Marcacini, G. Correa, and S. Rezende, “An active learning
approach to frequent itemset-based text clustering,” in Int. Conf. on
Pattern Recognition, Tsukuba, Japan, 2012, pp. 3529–3532.
[19] W. Cai, Y. Zhang, and J. Zhou, “Maximizing expected model
change for active learning in regression,” in IEEE Int. Conf. on Data
Mining, Dallas, TX, 2013, pp. 51–60.
[20] E. Pasolli and F. Melgani, “Gaussian process regression within an
active learning scheme,” in IEEE Int. Geoscience and Remote Sensing
Symposium, Vancouver, BC, 2011, pp. 3574–3577.
[21] B. Demir and L. Bruzzone, “A multiple criteria active learning method for
support vector regression,” Pattern Recognition, vol. 47, no. 7, pp.
2558–2567, 2014.
[22] F. Douak, F. Melgani, and N. Benoudjit, “Kernel ridge regression
with active learning for wind speed prediction,” Applied Energy,
vol. 103, pp. 328–340, 2013.
[23] D. Angluin, “Queries and concept learning,” Machine Learning,
vol. 2, no. 4, pp. 319–342, 1988.
[24] L. Atlas, D. Cohn, R. Ladner, M. A. El-Sharkawi, and R. J. Marks,
II, “Training connectionist networks with queries and selective
sampling,” in Advances in Neural Information Processing Systems 2,
Denver, CO, 1990, pp. 566–573.
[25] D. Lewis and W. A. Gale, “A sequential algorithm for training
text classifiers,” in ACM Conf. on Research and Development in
Information Retrieval, Dublin, Ireland, 1994, pp. 3–12.
[26] K. Lang and E. Baum, “Query learning can work poorly when a
human oracle is used,” in IEEE Int. Joint Conf. on Neural Networks,
Los Alamitos, CA, 1992, pp. 335–340.
[27] B. Settles, “Active learning literature survey,” University of Wis-
consin, Department of Computer Science, Computer Sciences
Technical Report 1648, 2009.
[28] J. Jun and I. Horace, Active Learning with SVM. Hershey, PA: IGI
Global, 2008, vol. 3, ch. 1, pp. 1–7.
[29] J. Kremer, K. S. Pedersen, and C. Igel, “Active learning with
support vector machines,” Wiley Interdisciplinary Reviews. Data
Mining and Knowledge Discovery, vol. 4, no. 4, pp. 313–326, 2014.
[30] A. Motro and P. Smets, Eds., Uncertainty Management in Information
Systems: From Needs to Solutions. Springer US, 1997.
[31] D. Andrade and B. Sick, “Lower bound Bayesian networks –
efficient inference of lower bounds on probability distributions,”
in Conf. on Uncertainty in Artificial Intelligence, Montreal, QC, 2009,
pp. 10–18.
[32] G. Stumme, “Attribute exploration with background implications
and exceptions,” in Annual Conf. of the Gesellschaft für Klassifikation.
Springer-Verlag, Heidelberg-Berlin, 1996, pp. 457–469.
[33] S. Zogaj, N. Leicht, B. Ivo, U. Bretschneider, and J. M. Leimeis-
ter, “Towards successful crowdsourcing projects: Evaluating the
implementation of governance mechanisms,” in Int. Conf. on Infor-
mation Systems, Fort Worth, TX, 2015.
[34] T. Horeis and B. Sick, “Collaborative knowledge discovery & data
mining: From knowledge to experience,” in IEEE Symposium on
Computational Intelligence and Data Mining, Honolulu, HI, 2007, pp.
421–428.
[35] J. Leimeister, Collaboration Engineering: IT-gestützte Zusammenarbeitsprozesse
systematisch entwickeln und durchführen. Berlin Heidelberg:
Springer Gabler, 2014.
[36] S. Oeste-Reiß, M. Söllner, and J. M. Leimeister, “Development
of a peer-creation-process to leverage the power of collaborative
knowledge transfer,” in Hawaii Int. Conf. on System Sciences, Kauai,
HI, 2016, not yet published.
[37] A. Schmidt, “Following or leading?: the HCI community and new
interaction technologies,” interactions, no. 22, pp. 74–77, 2015.
[38] D. Roggen, G. Tröster, P. Lukowicz, A. Ferscha, J. del R. Millán,
and R. Chavarriaga, “Opportunistic human activity and context
recognition,” Computer, vol. 46, no. 2, pp. 36–45, 2013.
[39] M. Kaufmann and K. Zweig, “Modeling and designing real–world
networks,” in Algorithmics of Large and Complex Networks, J. Lerner,
D. Wagner, and K. Zweig, Eds. Springer Berlin Heidelberg, 2009,
vol. 5515, pp. 359–379.
... Therefore, we will concentrate on describing collaborative learning in the field of AL: collaborative in the sense that different entities (e.g. domain experts, users, or simulation systems) collaborate to assist the learning system with information [12]. Regarding the broad definition, the following question may arise: What does "two or more" and "together" mean? ...
... They solve the task "together", in the sense that there are different ways of interactions between the subjects that assist the learning system and between the subjects and the learning system itself. Moreover, the "learning task" may a be a precisely defined one (e.g. in case of dedicated collaborative interactive learning [12]) or a more general, opportunistic, lifelong learning task (e.g. in case of opportunistic collaborative interactive learning [5]). ...
... Dedicated collaborative interactive learning [12] provides solutions based on the first assumptions, whereas opportunistic collaborative active learning [5] for the latter presented assumptions. ...
... Various concepts have been proposed to overcome the limitations above. These include collaborative interactive learning [36,37] and proactive learning [38,39]. We summarize their main differences to traditional AL through the three following aspects: ...
... The error-proneness of annotators poses a major challenge in real-world AL [36]. In this section, we discuss the typical factors influencing the performance of error-prone annotators. ...
Article
Full-text available
Pool-based active learning (AL) aims to optimize the annotation process (i.e., labeling) as the acquisition of annotations is often time-consuming and therefore expensive. For this purpose, an AL strategy queries annotations intelligently from annotators to train a high-performance classification model at a low annotation cost. Traditional AL strategies operate in an idealized framework. They assume a single, omniscient annotator who never gets tired and charges uniformly regardless of query difficulty. However, in real-world applications, we often face human annotators, e.g., crowd or in-house workers, who make annotation mistakes and can be reluctant to respond if tired or faced with complex queries. Recently, a wide range of novel AL strategies has been proposed to address these issues. They differ in at least one of the following three central aspects from traditional AL: (1) They explicitly consider (multiple) human annotators whose performances can be affected by various factors, such as missing expertise. (2) They generalize the interaction with human annotators by considering different query and annotation types, such as asking an annotator for feedback on an inferred classification rule. (3) They take more complex cost schemes regarding annotations and misclassifications into account. This survey provides an overview of these AL strategies and refers to them as real-world AL. Therefore, we introduce a general real-world AL strategy as part of a learning cycle and use its elements, e.g., the query and annotator selection algorithm, to categorize about 60 real-world AL strategies. Finally, we outline possible directions for future research in the field of AL.
... Novel systems that enable developers to engage domain experts to share their domain-specific knowledge can increase the effectiveness and efficiency of chatbot development (Harms et al. 2019;Huang et al. 2018;Rieser and Lemon 2011). A promising approach to increase the engagement of domain experts is to design interactive development systems that enable the flow of information not only from the user to the system but also vice versa (Amershi et al. 2014;Calma et al. 2016). The interactive features offered by these systems have psychological correlates and empirical evidence has shown that they can shift the attention and cognitive processing capacities towards the system (Sundar et al. 2015). ...
... As a starting point for our design, we reviewed mechanisms that have shown to be successful antecedents for engagement. In this context, interactive systems that enable information flows not only from the user to the system but also vice versa have shown to have a positive impact on engagement (Amershi et al. 2014;Calma et al. 2016). To conceptualize interactivity, we followed the interactivity effects model (Sundar 2012;Sundar et al. 2015) that distinguishes interactivity in the fundamental core components of communication (i.e., source, modality, and message). ...
Conference Paper
Full-text available
Domain experts with their special knowledge and understanding of a specific field are critical in the development of chatbots. However, engaging domain experts in the chatbot development is time-consuming and cumbersome because developers lack adequate systems. We address this problem by proposing three design principles for interactive chatbot development systems grounded in the interactivity effects model. We instantiate the proposed design and evaluate the resulting artifact in an online experiment. The results of the online experiment (N=70) show that the proposed design significantly increases subjective and objective engagement and that perceived interactivity mediates these effects. Our study contributes with prescriptive knowledge for designing interactive systems that increase engagement and with a novel artifact in the form of an interactive chatbot development system.
... Furthermore, the knowledge provided by each expert is subject to uncertainty. For example, the performance of human annotators depends on factors such as level of expertise, experience, and concentration/distraction [47]. Additionally, experts may have more or less knowledge for different parts of the problem that has to be solved (e.g., different classes, different features). ...
... In addition to type of an expert, one other major problem in health monitoring is the subjectivity of the expert's decisions. For example, the performance of human annotators depends on factors such as level of expertise, experience, and concentration/distraction [47]. ...
Article
Full-text available
Mobile health monitoring plays a central role in the future of cyber physical systems (CPS) for healthcare applications. Such monitoring systems need to process user data accurately. Unlike in other human-centered CPS, in healthcare CPS, the user functions in multiple roles all at the same time: as an operator, an actuator, the physical environment and, most importantly, the target that needs to be monitored in the process. Therefore, mobile health CPS devices face highly dynamic settings generally, and accuracy of the machine learning models the devices employ may drop dramatically every time a change in setting happens. Novel learning architecture that specifically address challenges associated with dynamic environments are therefore needed. Using active learning and transfer learning as organizing principles, we propose a collaborative multiple-expert architecture and accompanying algorithms for the design of machine learning models that autonomously adapt to a new configuration, context, or user need. Specifically, our architecture and its constituent algorithms are designed to manage heterogeneous knowledge sources or experts with varying levels of confidence and type while minimizing adaptation cost. Additionally, our framework incorporates a mechanism for collaboration among experts to enrich their knowledge, which in turn decreases both cost and uncertainty of data labeling in future steps. We evaluate the efficacy of the architecture using two publicly available human activity datasets. We attain activity recognition accuracy of over 85 % (for the first dataset) and 92 % (for the second dataset) by labeling only 15 % of unlabeled data.
... As the concentration of annotators may fluctuate or annotators may learn during the annotation process, taking time-varying APs into account is another potential avenue for future research (Donmez et al., 2010). Finally, there are already crowdsourcing approaches (Chang et al., 2017) and concepts (Calma et al., 2016) that support collaboration between annotators. Accordingly, developing techniques that consider or even recommend such collaborations is of practical value (Fang et al., 2012). ...
Preprint
Solving complex classification tasks using deep neural networks typically requires large amounts of annotated data. However, corresponding class labels are noisy when provided by error-prone annotators, e.g., crowd workers. Training standard deep neural networks leads to subpar performances in such multi-annotator supervised learning settings. We address this issue by presenting a probabilistic training framework named multi-annotator deep learning (MaDL). A ground truth and an annotator performance model are jointly trained in an end-to-end learning approach. The ground truth model learns to predict instances' true class labels, while the annotator performance model infers probabilistic estimates of annotators' performances. A modular network architecture enables us to make varying assumptions regarding annotators' performances, e.g., an optional class or instance dependency. Further, we learn annotator embeddings to estimate annotators' densities within a latent space as proxies of their potentially correlated annotations. Together with a weighted loss function, we improve the learning from correlated annotation patterns. In a comprehensive evaluation, we examine three research questions about multi-annotator supervised learning. Our findings indicate MaDL's state-of-the-art performance and robustness against many correlated, spamming annotators.
... Zur Realisierung von hybrider Intelligenz müssen zukünftige soziotechnische Systeme wie Social Machines so gestaltet werden, dass ihre technischen Systeme über 3 Haupteigenschaften verfügen, die sich zusammengefasst als kollaborativ-interaktiv lernend (engl. Collaborative Interactive Learning, CIL) bezeichnen lassen [3,37] [5,8]. Auch Agency (Handlungsfähigkeit) muss in Vernetzung und hybride gedacht werden [21]. ...
Article
Full-text available
Zusammenfassung Social Machines sind ein Paradigma für die Gestaltung soziotechnischer Systeme, die unter Verwendung von Web- und Plattformlösungen das Potenzial digitaler Technologien mit der Eigenlogik sozialer Interaktion, Organisation und Strukturbildung auf neue Weise zusammenführen. Im Folgenden diskutieren wir das Paradigma der Social Machine aus den Perspektiven der Informatik, der Wirtschaftsinformatik, der Soziologie und des Rechts, um Orientierungspunkte für seine Gestaltung zu identifizieren. Der Begriff ist in der Literatur jedoch bisher nicht abschließend definiert sondern nur durch Beispiele illustriert. In diesem Artikel stellen wir zunächst die folgende Definition zur Diskussion: Social Machines sind soziotechnische Systeme, in denen die Prozesse sozialer Interaktion hybrid zwischen menschlichen und maschinellen Akteuren ablaufen und teilweise algorithmisiert sind. Im Anschluss beleuchten wir drei aktuelle, sich gegenseitig bedingende Entwicklungen von Social Machines: die immer stärkere Verschmelzung von Sozialität und Maschine, die Vermessung von Nutzeraktivitäten als Grundstoff gesellschaftlichen Zusammenhalts und die zunehmende Algorithmisierung gesellschaftlicher Prozesse. Abschließend diskutieren wir, dass eine teilhabeorientierte, demokratischen Werten folgende Gestaltung von Social Machines die Perspektiven der Nutzungsakzeptanz, der gesellschaftlichen Akzeptabilität und der nachhaltigen Wirtschaftlichkeit adressieren und umsetzen muss.
... -Active Learning with Multiple Annotators [Zhang et al., 2015, Calma et al., 2016: We use Multiple annotators that label the data and assume that they are not omniscient but will label benevolently. The cost between each annotator may vary. ...
Preprint
Machine learning applications often need large amounts of training data to perform well. Whereas unlabeled data can be easily gathered, the labeling process is difficult, time-consuming, or expensive in most applications. Active learning can help solve this problem by querying labels for those data points that will improve the performance the most. Thereby, the goal is that the learning algorithm performs sufficiently well with fewer labels. We provide a library called scikit-activeml that covers the most relevant query strategies and implements tools to work with partially labeled data. It is programmed in Python and builds on top of scikit-learn.
Preprint
Pool-based active learning (AL) aims to optimize the annotation process (i.e., labeling) as the acquisition of annotations is often time-consuming and therefore expensive. For this purpose, an AL strategy queries annotations intelligently from annotators to train a high-performance classification model at a low annotation cost. Traditional AL strategies operate in an idealized framework. They assume a single, omniscient annotator who never gets tired and charges uniformly regardless of query difficulty. However, in real-world applications, we often face human annotators, e.g., crowd or in-house workers, who make annotation mistakes and can be reluctant to respond if tired or faced with complex queries. Recently, a wide range of novel AL strategies has been proposed to address these issues. They differ in at least one of the following three central aspects from traditional AL: (1) They explicitly consider (multiple) human annotators whose performances can be affected by various factors, such as missing expertise. (2) They generalize the interaction with human annotators by considering different query and annotation types, such as asking an annotator for feedback on an inferred classification rule. (3) They take more complex cost schemes regarding annotations and misclassifications into account. This survey provides an overview of these AL strategies and refers to them as real-world AL. Therefore, we introduce a general real-world AL strategy as part of a learning cycle and use its elements, e.g., the query and annotator selection algorithm, to categorize about 60 real-world AL strategies. Finally, we outline possible directions for future research in the field of AL.
Conference Paper
Science, technology, and commerce increasingly recognise the importance of machine learning approaches for data-intensive, evidence-based decision making. This is accompanied by increasing numbers of machine learning applications and volumes of data. Nevertheless, the capacities of processing systems, human supervisors, and domain experts remain limited in real-world applications. Furthermore, many applications require fast reaction to new situations, which means that first predictive models need to be available even if little data is yet available. Therefore, approaches are needed that optimise the whole learning process, including the interaction with human supervisors, processing systems, and data of various kinds and at different timings: techniques for estimating the impact of additional resources (e.g. data) on the learning progress; techniques for the active selection of the information processed or queried; techniques for reusing knowledge across time, domains, or tasks, by identifying similarities and adaptation to changes between them; techniques for making use of different types of information, such as labeled or unlabeled data, constraints or domain knowledge. Such techniques are studied for example in the fields of adaptive, active, semi-supervised, and transfer learning. However, this is mostly done in separate lines of research, while combinations thereof in interactive and adaptive machine learning systems that are capable of operating under various constraints, and thereby address the immanent real-world challenges of volume, velocity and variability of data and data mining systems, are rarely reported. Therefore, this workshop and tutorial aim to bring together researchers and practitioners from these different areas, and to stimulate research in interactive and adaptive machine learning systems as a whole.
It continues a successful series of events at ECML PKDD 2017 in Skopje (Workshop & Tutorial), IJCNN 2018 in Rio (Tutorial), and ECML PKDD 2018 in Dublin (Workshop). The workshop aims at discussing techniques and approaches for optimising the whole learning process, including the interaction with human supervisors, processing systems, and includes adaptive, active, semi-supervised, and transfer learning techniques, and combinations thereof in interactive and adaptive machine learning systems. All in all, we accepted six regular papers (9 papers submitted) and three short papers (4 submitted) to be published in these workshop proceedings. The authors discuss approaches, identify challenges and gaps between active learning research and meaningful applications, as well as define new application-relevant research directions. We thank the authors for their submissions and the program committee for their hard work. September 2019 - Adrian Calma, Andreas Holzinger, Daniel Kottke, Georg Krempl, Vincent Lemaire
Conference Paper
Active learning is well-motivated in many supervised learning tasks where unlabeled data may be abundant but labeled examples are expensive to obtain. The goal of active learning is to maximize the performance of a learning model using as few labeled training data as possible, thereby minimizing the cost of data annotation. So far, there is still very limited work on active learning for regression. In this paper, we propose a new active learning framework for regression called Expected Model Change Maximization (EMCM), which aims to choose the examples that lead to the largest change to the current model. The model change is measured as the difference between the current model parameters and the updated parameters after training with the enlarged training set. Inspired by the Stochastic Gradient Descent (SGD) update rule, the change is estimated as the gradient of the loss with respect to a candidate example for active learning. Under this framework, we derive novel active learning algorithms for both linear regression and nonlinear regression to select the most informative examples. Extensive experimental results on benchmark data sets from the UCI machine learning repository demonstrate that the proposed algorithms are highly effective in choosing the most informative examples and robust to various types of data distributions.
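The gradient-based selection idea can be sketched for the simplest case, a univariate linear model y = w*x + b with squared loss. This is a hedged illustration, not the authors' implementation: the abstract does not say how the unknown label of a candidate is handled, so the sketch assumes it is approximated by the predictions of a small bootstrap ensemble. The function names `fit_line` and `emcm_scores` and the parameter `n_boot` are hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = w*x + b on 1-D inputs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:  # degenerate resample: fall back to a constant model
        return 0.0, my
    w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return w, my - w * mx


def emcm_scores(xs, ys, pool, n_boot=10, seed=0):
    """Estimated model change for each candidate in `pool` (higher = more informative)."""
    rng = random.Random(seed)
    w, b = fit_line(xs, ys)  # current model, trained on all labeled data
    n = len(xs)
    # Assumption: bootstrap models stand in for the unknown label of a candidate.
    members = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        members.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    scores = []
    for x in pool:
        pred = w * x + b
        # Gradient of 0.5 * (f(x) - y)^2 w.r.t. (w, b) is (f(x) - y) * (x, 1),
        # so its norm is |f(x) - y| * sqrt(x^2 + 1).
        feat_norm = math.sqrt(x * x + 1.0)
        change = sum(abs(pred - (wk * x + bk)) * feat_norm for wk, bk in members)
        scores.append(change / n_boot)
    return scores
```

The candidate with the largest score would be queried next, for example via `pool[max(range(len(pool)), key=scores.__getitem__)]`. Candidates far from the labeled data tend to score highly here because both the ensemble disagreement and the feature norm grow with distance.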
Conference Paper
Opportunistic activity and context recognition systems draw from the characteristic to use sensing devices that just happen to be available rather than pre-defining a fixed sensor infrastructure at design time. Opportunistic sensing offers the possibility to obtain data from sensors that just happen to be available in the area surrounding the user. This enables users or applications to state recognition goals, saying what has to be sensed for, at runtime to the system. The available sensing devices that can contribute to the recognition goal are configured to an ensemble, which is the best set of sensors to recognize the goal. This paper describes the OPPORTUNITY Framework and shows its functionality with respect to four application cases (goal querying and sensor configuration, sensor appears/disappears, sensor learns from other sensor and sensor self trust) to show the dynamic nature of an opportunistic system as the available sensing infrastructure is not fixed and changes during runtime.
Chapter
Jiang, Jun; IP Horace H. S. With the increasing demand for multimedia information retrieval, such as image and video retrieval from the Web, there is a need to find ways to train a classifier when the training dataset consists of a small number of labelled data and a large number of unlabeled ones. Traditional supervised or unsupervised learning methods are not suited to solving such problems, particularly when the problem is associated with data in a high-dimensional space. In recent years, many methods have been proposed that can be broadly divided into two groups: semi-supervised and active learning (AL). Because the Support Vector Machine (SVM) has been recognized as an efficient tool for dealing with high-dimensionality problems, a number of researchers have proposed algorithms for Active Learning with SVM (ALSVM) since the turn of the century. Considering their rapid development, we review, in this chapter, the state of the art of ALSVM for solving classification problems.
Conference Paper
When faced with the task of building machine learning or NLP models, it is often worthwhile to turn to active learning to obtain human annotations at minimal costs. Traditional active learning schemes query a human for labels of intelligently chosen examples. However, human effort can also be expended in collecting alternative forms of annotations. For example, one may attempt to learn a text classifier by labeling class-indicating words, instead of, or in addition to, documents. Learning from two different kinds of supervision brings a new, unexplored dimension to the problem of active learning. In this paper, we demonstrate the value of such active dual supervision in the context of sentiment analysis. We show how interleaving queries for both documents and words significantly reduces human effort -- more than what is possible through traditional one-dimensional active learning, or by passive combinations of supervisory inputs.
Article
Pool-based active learning is a paradigm where users (e.g., domain experts) are iteratively asked to label initially unlabeled data, e.g., to train a classifier from these data. An appropriate selection strategy has to choose unlabeled data for such user queries in an efficient and effective way (in principle, high classification performance at low labeling costs). In our transductive active learning approach we provide a completely labeled data pool (samples are either labeled by the experts or in a semi-supervised way) in each active learning cycle. Thereby, a key aspect is to explore and exploit information about structure in data. Structure in data can be detected and modeled by means of clustering algorithms or probabilistic, generative modeling techniques, for instance. Usually, this is done at the beginning of the active learning process when the data are still unlabeled. In our approach we show how a probabilistic generative model, initially parametrized with unlabeled data, can iteratively be refined and improved as more and more labels become available during the active learning process. In each cycle of the active learning process we use this generative model to label all samples not labeled by an expert so far in order to train the kind of classifier we want to train with the active learning process. Thus, this transductive learning process can be combined with any selection strategy and any kind of classifier. Here, we combine it with the 4DS selection strategy and the CMM probabilistic classifier described in previous work. For 20 publicly available benchmark data sets, we show that this new transductive learning process helps to improve pool-based active learning noticeably.
Article
Mobile recommender systems provide personalized recommendations to help deal with today's information overload. This paper presents a shopping recommender system developed for an exploratory mobile context combining previous research in Active Learning and critiquing. Integrating Active Learning methods in mobile recommender systems is a largely unexplored research area that allows for personalized information retrieval and recommendation. The system actively selects training points for user critique so as to learn the most about the user's preferences in the current context. The system uses a conversation-based Active Learning strategy, which involves the user in a cycle of updating displayed recommendations based on her/his critiques on features of those items until a satisfactory item is found and selected. Even if the customer uses the system for the first time, the system presents recommendations from the beginning, without requiring the user to insert a search query. Feature critiques are differentiated into positive ('like') and negative ('dislike') feedback, enabling the system to decide whether to further refine the selection by showing more similar items or to refocus and show a more diverse set of items. An Android application integrating the developed system was evaluated with a diverse set of real people. Results show that conversational Active Learning improves the user experience and that diversity-based information retrieval is preferred to similarity-based retrieval in a mobile exploratory context with regard to accuracy, effort, and the intention to return to the system.