Interactive Machine Learning
Jerry Alan Fails, Dan R. Olsen, Jr.
Computer Science Department
Brigham Young University
Provo, Utah 84602
{failsj, olsen}
ABSTRACT
Perceptual user interfaces (PUIs) are an important part of
ubiquitous computing. Creating such interfaces is difficult
because of the image and signal processing knowledge
required for creating classifiers. We propose an interactive
machine-learning (IML) model that allows users to train,
classify/view and correct the classifications. The concept
and implementation details of IML are discussed and
contrasted with classical machine learning models.
Evaluations of two algorithms are also presented. We also
briefly describe Image Processing with Crayons (Crayons),
which is a tool for creating new camera-based interfaces
using a simple painting metaphor. The Crayons tool
embodies our notions of interactive machine learning.
Categories: H.5.2, D.2.2
General Terms: Design, Experimentation
Keywords: Machine learning, perceptive user interfaces,
interaction, image processing, classification
INTRODUCTION
Perceptual user interfaces (PUIs) are establishing the need
for machine learning in interactive settings. PUIs like
VideoPlace [8], Light Widgets [3], and Light Table [15,16]
all use cameras as their perceptive medium. Other systems
use sensors other than cameras such as depth scanners and
infrared sensors [13,14,15]. All of these PUIs require
machine learning and computer vision techniques to create
some sort of a classifier. This classification component of
the UI often demands great effort and expense. Because
most developers have little knowledge of how to
implement recognition in their UIs, this becomes
problematic. Even those who do have this knowledge
would benefit if the classifier building expense were
lessened. We suggest the way to decrease this expense is
through the use of a visual image classifier generator,
which would allow developers to add intelligence to
interfaces without forcing additional programming.
Similar to how Visual Basic allows simple and fast
development, this tool would allow for fast integration of
recognition or perception into a UI. Implementation of
such a tool, however, poses many problems. First and
foremost is the problem of rapidly creating a satisfactory
classifier. The simple solution is to use behind-the-scenes
machine learning and image processing.
Machine learning allows the automatic creation of
classifiers; however, the classical models are generally slow
to train and not interactive. The classical machine-learning (CML)
model is summarized in Figure 1. Prior to the training of
the classifier, features need to be selected. Training is then
performed “off-line” so that classification can be done
quickly and efficiently. In this model classification is
optimized at the expense of longer training time.
Generally, the classifier will run quickly so it can be used
in real time. The assumption is that training will be
performed only once and need not be interactive. Many
machine-learning algorithms are very sensitive to feature
selection and suffer greatly if there are very many features.
Figure 1 – Classical machine learning model
With CML, it is infeasible to create an interactive tool to
create classifiers. CML requires the user to choose the
features and wait an extended amount of time for the
algorithm to train. The selection of features is very
problematic for most interface designers. If one is
designing an interactive technique involving laser spot
tracking, most designers understand that the spot is
generally red. They are not prepared to deal with how to
sort out this spot from red clothing, camera noise or a
variety of other problems. There are well-known image
processing features for handling these problems, but very
few interface designers would know how to carefully select
them in a way that the machine learning algorithms could use.
The current approach requires too much technical
knowledge on the part of the interface designer. What we
would like to do is replace the classical machine-learning
model with the interactive model shown in Figure 2. This
interactive training allows the classifier to be coached
along until the desired results are met. In this model the
designer is correcting and teaching the classifier and the
classifier must perform the appropriate feature selection.
Figure 2 – Interactive machine learning (IML) model
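The train-feedback-correct cycle of Figure 2 can be sketched in code. This is a minimal illustration under assumed names (Example, Classifier, and the placeholder majority-class learner are all hypothetical, not the Crayons implementation):

```java
import java.util.List;

public class ImlLoop {
    // A labeled training example: a feature vector plus a class label.
    public static class Example {
        public final double[] features;
        public final int label;
        public Example(double[] features, int label) {
            this.features = features;
            this.label = label;
        }
    }

    // Minimal classifier interface; Crayons would plug a decision tree in here.
    public interface Classifier {
        int classify(double[] features);
    }

    // One iteration of the loop: train on everything labeled so far, then
    // return the classifier so the UI can paint feedback for correction.
    public static Classifier iterate(List<Example> labeled) {
        // Placeholder learner: a majority-class classifier stands in for the
        // real decision-tree training step.
        int ones = 0;
        for (Example e : labeled) {
            if (e.label == 1) ones++;
        }
        final int majority = (2 * ones >= labeled.size()) ? 1 : 0;
        return features -> majority;
    }
}
```

The designer keeps calling iterate with a growing labeled set until the painted feedback looks right, mirroring the loop in Figure 2.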
The pre-selection of features can be eliminated and
transferred to the learning part of the IML if the learning
algorithm used performs feature selection. This means that
a large repository of features is initially calculated and fed
to the learning algorithm so it can learn the best features for
the classification problem at hand. The idea is to feed a
very large number of features into the classifier training
and let the classifier do the filtering rather than the human.
The human designer then is focused on rapidly creating
training data that will correct the errors of the classifier.
In classical machine learning, algorithms are evaluated on
their inductive power. That is, how well the algorithm will
perform on new data based on the extrapolations made on
the training data. Good inductive power requires careful
analysis and a great deal of computing time. This time is
frequently exponential in the number of features to be
considered. We believe that using the IML model a simple
visual tool can be designed to build classifiers quickly. We
hypothesize that when using the IML, having a very fast
training algorithm is more important than strong induction.
In place of careful analysis of many feature combinations
we provide much more human input to correct errors as
they appear. This allows the interactive cycle to be iterated
quickly so it can be done more frequently.
The remainder of the paper proceeds as follows. The next section
briefly discusses the visual tool we created using the IML
model, called Image Processing with Crayons (Crayons).
This is done to show one application of the IML model’s
power and versatility. Following the explanation of
Crayons, we explore the details of the IML model by
examining its distinction from CML, the problems it must
overcome, and its implementation details. Finally, we
present results from tests comparing two of the
implemented machine-learning algorithms, from which we
draw preliminary conclusions about IML as it relates to
Crayons.
Crayons is a system we created that uses IML to create
image classifiers. Crayons is intended to aid UI designers
who do not have detailed knowledge of image processing
and machine learning. It is also intended to accelerate the
efforts of more knowledgeable programmers.
There are two primary goals for the Crayons tool: 1) to
allow the user to create an image/pixel classifier quickly,
and 2) to allow the user to focus on the classification
problem rather than image processing or algorithms.
Crayons is successful if it takes minutes rather than weeks
or months to create an effective classifier. For simplicity's
sake, we will refer to this as the UI principle of fast and
focused. This principle refers to enabling the designer to
quickly accomplish his/her task while remaining focused
solely on that task.
Figure 3 shows the Crayons design process. Images are
input into the Crayons system, which can then export the
generated classifier. It is assumed the user has already
taken digital pictures and saved them as files to import into
the system, or that a camera is attached to the machine
running Crayons so it can capture images directly.
Exporting the classifier is equally trivial, since our
implementation is written in Java. The classifier object is
simply serialized and output to a file using the standard
Java mechanisms.
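Since the classifier object is simply serialized using the standard Java mechanisms, export and re-import might look like the following sketch (the TrainedClassifier class is an illustrative stand-in, not a class from Crayons):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ClassifierExport {
    // Stand-in classifier; any Serializable classifier exports the same way.
    public static class TrainedClassifier implements Serializable {
        private static final long serialVersionUID = 1L;
        public final double threshold;
        public TrainedClassifier(double threshold) { this.threshold = threshold; }
        public boolean classify(double feature) { return feature >= threshold; }
    }

    // Export: serialize the classifier object to a file.
    public static void export(TrainedClassifier c, File f) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(f))) {
            out.writeObject(c);
        }
    }

    // Re-import: deserialize the classifier for use in another application.
    public static TrainedClassifier load(File f)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(f))) {
            return (TrainedClassifier) in.readObject();
        }
    }
}
```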
Figure 3 – Classifier Design Process
An overview of the internal architecture of Crayons is
shown in Figure 4. Crayons receives images upon which
the user does some manual classification, a classifier is
created, then feedback is displayed. The user can then
refine the classifier by adding more manual classification
or, if the classifier is satisfactory, the user can export the
classifier. The internal loop shown in Figure 4 directly
correlates to the aforementioned train, feedback, correct
cycle of the IML (see Figure 2). To accomplish the fast
and focused UI principle, this loop must be easy and quick
to cycle through. To be interactive the training part of the
loop must take less than five seconds and generally much
faster. The cycle can be broken down into two
components: the UI and the Classifier. The UI component
needs to be simple so the user can remain focused on the
classification problem at hand. The classifier creation
needs to be fast and efficient so the user gets feedback as
quickly as possible, so they are not distracted from the
classification problem.
Figure 4 – The classification design loop
Although IML and the machine-learning component of
Crayons are the primary focus of this paper, it is worth
noting that Crayons has profited from work done by Viola
and Jones [19] and Jaimes and Chang [5,6,7]. A brief
example of how Crayons is used is also illustrative. The
sequence of images in Figure 5 shows
the process of creating a classifier using Crayons.
Figure 5 – Crayons interaction process
Figure 5 illustrates how the user initially paints very little
data, views the feedback provided by the resulting
classifier, corrects by painting additional class pixels and
then iterates through the cycle. As seen in the first image
pair in Figure 5, only a little data can generate a classifier
that roughly learns skin and background. The classifier,
however, over-generalizes in favor of background;
therefore, in the second image pair you can see skin has
been painted where the classifier previously did poorly at
classifying skin. The resulting classifier shown on the right
of the second image pair shows the new classifier
classifying most of the skin on the hand, but also
classifying some of the background as skin. The classifier
is corrected again, and the resulting classifier is shown as
the third image pair in the sequence. Thus, in only a few
iterations, a skin classifier is created.
The simplicity of the example above shows the power that
Crayons has due to the effectiveness of the IML model.
The key issue in the creation of such a tool lies in quickly
generating effective classifiers so the interactive design
loop can be utilized.
For the IML model to function, the classifier must be
generated quickly and be able to generalize well. As such
we will first discuss the distinctions between IML and
CML, followed by the problems IML must overcome
because of its interactive setting, and lastly its
implementation details including specific algorithms.
Classical machine learning generally makes the following
assumptions:
- there are relatively few carefully chosen features,
- there is limited training data,
- the classifier must amplify that limited training data into
excellent performance on new data,
- time to train the classifier is relatively unimportant as
long as it does not take too many days.
None of these assumptions hold in our interactive situation.
Our UI designers have no idea what features will be
appropriate. In fact, we are trying to insulate them from
knowing such things. In our current Crayons prototype
there are more than 150 features per pixel. To reach the
breadth of application that we desire for Crayons we
project over 1,000 features will be necessary. The
additional features will handle texture, shape and motion
over time. For any given problem somewhere between
three and fifteen of those features will actually be used, but
the classifier algorithm must automatically make this
selection. The classifier we choose must therefore be able
to accommodate such a large number of features, and/or
select only the best features.
In Crayons, when a designer begins to paint classes on an
image a very large number of training examples is quickly
generated. With 77K pixels per image and 20 images one
can rapidly generate over a million training examples. In
practice, the number stays in the 100K examples range
because designers only paint pixels that they need to
correct rather than all pixels in the image. What this
means, however, is that designers can generate a huge
amount of training data very quickly. CML generally
focuses on the ability of a classifier to predict correct
behavior on new data. In IML, however, if the classifier’s
predictions for new data are wrong, the designer can
rapidly make those corrections. By rapid feedback and
correction the classifier is quickly (in a matter of minutes)
focused onto the desired behavior. The goal of the
classifier is not to predict the designer’s intent in new
situations but to rapidly reflect intent as expressed in
concrete examples.
Because additional training examples can be added so
readily, IML’s bias differs greatly from that of CML.
Because it extrapolates a little data to create a classifier that
will be frequently used in the future, CML is very
concerned about overfit. Overfit is where the trained
classifier adheres too closely to the training data rather than
deducing general principles. Cross-validation and other
measures are generally taken to minimize overfit. These
measures add substantially to the training time for CML
algorithms. IML’s bias is to include the human in the loop
by facilitating rapid correction of mistakes. Overfit can
easily occur, but it is also readily perceived by the designer
and instantly corrected by the addition of new training data
in exactly the areas that are most problematic. This is
shown clearly in Figure 5 where a designer rapidly
provides new data in the edges of the hand where the
generalization failed.
Our interactive classification loop requires that the
classifier training be very fast. To be effective, the
classifier must be generated from the training examples in
under five seconds. If the classifier takes minutes or hours,
the process of ‘train-feedback-correct’ is no longer
interactive, and much less effective as a design tool.
Training on 100,000 examples with 150 features each in
less than five seconds is a serious challenge for most CML
algorithms.
Lastly, for this tool to be viable the final classifier will need
to be able to classify 320 x 240 images in less than a fourth
of a second. If the resulting classifier is much slower than
this it becomes impossible to use it to track interactive
behavior in a meaningful way.
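These two bounds imply a concrete per-pixel budget: a 320 x 240 image has 76,800 pixels, so classifying it in a quarter second leaves roughly 3.3 microseconds per pixel. A small helper (illustrative, not from Crayons) makes the arithmetic explicit:

```java
public class PixelBudget {
    // Time budget per pixel, in microseconds, for classifying a w-by-h
    // image within totalMs milliseconds.
    public static double microsPerPixel(int w, int h, double totalMs) {
        return totalMs * 1000.0 / (w * h);
    }
}
```

For the paper's numbers, microsPerPixel(320, 240, 250) is about 3.26 microseconds, which is why the resulting classifier must be shallow and cheap to evaluate.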
IML Implementation
Throughout our discussion thus far, many requirements for
the machine-learning algorithm in IML have been made.
The machine-learning algorithm must:
- learn/train very quickly,
- accommodate hundreds to thousands of features,
- perform feature selection,
- allow for tens to hundreds of thousands of training
examples.
These requirements put firm bounds on what kind of a
learning algorithm can be used in IML. They invoke the
fundamental question of which machine-learning algorithm
fits all of these criteria. We discuss several options and the
reason why they are not viable before we settle on our
algorithm of choice: decision trees (DT).
Neural Networks [12] are a powerful and often used
machine-learning algorithm. They can provably
approximate any function in two layers. Their strength lies
in their abilities to intelligently integrate a variety of
features. Neural networks also produce relatively small
and efficient classifiers; however, they are not feasible in
IML. The number of features used in systems like Crayons,
along with the number of hidden nodes required to produce
the kinds of classifications that are necessary, completely
overpowers this algorithm. Even more debilitating is the
training time for neural networks. The time this algorithm
takes to converge is far too long for interactive use. For 150
features this can take hours or days.
The nearest-neighbor algorithm [1] is easy to train but not
very effective. Besides not being able to discriminate
amongst features, nearest-neighbor has serious problems in
high dimensional feature spaces of the kind needed in IML
and Crayons. Nearest-neighbor generally has a
classification time that is linear in the number of training
examples which also makes it unacceptably slow.
There are yet other algorithms such as boosting that do well
with feature selection, which is a desirable characteristic.
While boosting has shown itself to be very effective on
tasks such as face tracking [18], its lengthy training time is
prohibitive for interactive use in Crayons.
There are many more machine-learning algorithms;
however, this discussion suffices to preface our decision to
use decision trees. All the algorithms
discussed above suffer from the curse of dimensionality.
When many features are used (100s to 1000s), their
creation and execution times dramatically increase. In
addition, the number of training examples required to
adequately cover such high dimensional feature spaces
would far exceed what designers can produce. With just
one decision per feature the size of the example set must
approach 2^100, which is completely unacceptable. We need
a classifier that rapidly discards features and focuses on the
1-10 features that characterize a particular problem.
Decision trees [10] have many appealing properties that
coincide with the requirements of IML. First and foremost
is that the DT algorithm is fundamentally a process of
feature selection. The algorithm operates by examining
each feature and selecting a decision point for dividing the
range of that feature. It then computes the “impurity” of
the result of dividing the training examples at that decision
point. One can think of impurity as measuring the amount
of confusion in a given set. A set of examples that all
belong to one class would be pure (zero impurity). There
are a variety of possible impurity measures [2]. The
feature whose partition yields the least impurity is the one
chosen, the set is divided and the algorithm applied
recursively to the divided subsets. Features that do not
provide discrimination between classes are quickly
discarded. The simplicity of DTs also provides many
implementation advantages in terms of speed and space of
the resulting classifier.
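The split-selection process described above can be sketched as follows. Gini impurity is used here as one of the possible impurity measures [2]; the paper does not say which measure Crayons uses, and all names are illustrative (binary labels assumed):

```java
public class ImpuritySplit {
    // Gini impurity of a set of binary labels: 1 - p0^2 - p1^2.
    // Zero means the set is pure (all examples in one class).
    public static double gini(int count0, int count1) {
        int n = count0 + count1;
        if (n == 0) return 0.0;
        double p0 = (double) count0 / n;
        double p1 = (double) count1 / n;
        return 1.0 - p0 * p0 - p1 * p1;
    }

    // Weighted impurity of dividing the examples on feature f at threshold t.
    // The feature/threshold pair with the lowest result is the one chosen.
    public static double splitImpurity(double[][] x, int[] y, int f, double t) {
        int l0 = 0, l1 = 0, r0 = 0, r1 = 0;
        for (int i = 0; i < x.length; i++) {
            if (x[i][f] < t) { if (y[i] == 0) l0++; else l1++; }
            else             { if (y[i] == 0) r0++; else r1++; }
        }
        int n = x.length, nl = l0 + l1, nr = r0 + r1;
        return (nl * gini(l0, l1) + nr * gini(r0, r1)) / n;
    }
}
```

A feature that cleanly separates the classes yields zero impurity at some threshold; features that provide no discrimination never win and are effectively discarded, which is the feature-selection behavior the text describes.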
Quinlan’s original DT algorithm [10] worked only on
features that were discrete (a small number of choices).
Our image features do not have that property. Most of our
features are continuous real values. Many extensions of
the original DT algorithm, ID3, have been made to allow
use of real-valued data [4,11]. All of these algorithms
either discretize the data or select a threshold T for a given
feature F, dividing the training examples into two sets
where F < T and F >= T. The trick is for each feature to
select a value T that gives the lowest impurity (best
classification improvement). The selection of T from a
large number of features and a large number of training
examples is very slow to do correctly.
We have implemented two algorithms, which employ
different division techniques. These two algorithms also
represent the two approaches of longer training time with
better generalization vs. shorter training time with poorer
generalization. The first strategy slightly reduces
interactivity and relies more on learning performance. The
second relies on speed and interactivity. The two strategies
are Center Weighted (CW) and Mean Split (MS).
Our first DT attempt was to order all of the training
examples for each feature and step through all of the
examples calculating the impurity as if the division was
between each of the examples. This yielded a minimum
impurity split, however, this generally provided a best split
close to the beginning or end of the list of examples, still
leaving a large number of examples in one of the divisions.
Divisions of this nature yield deeper and more unbalanced
trees, which correlate to slower classification times. To
improve this algorithm, we developed Center Weighted
(CW), which does the same as above, except that it more
heavily weights central splits (more equal divisions). By
insuring that the split threshold is generally in the middle of
the feature range, the resulting tree tends to be more
balanced and the sizes of the training sets to be examined at
each level of the tree drops exponentially.
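A sketch of the Center Weighted idea for one feature: scan every between-example division in sorted order, score each by impurity, and add a penalty that grows with imbalance so central splits are favored. The paper does not give the exact weighting, so the penalty term below is an assumption for illustration:

```java
import java.util.Arrays;

public class CenterWeighted {
    // Gini impurity for binary class counts.
    static double gini(int c0, int c1) {
        int n = c0 + c1;
        if (n == 0) return 0.0;
        double p0 = (double) c0 / n;
        return 1.0 - p0 * p0 - (1.0 - p0) * (1.0 - p0);
    }

    // Choose a threshold for one feature by scanning all between-example
    // splits, weighting central (more balanced) divisions more favorably.
    public static double chooseThreshold(double[] values, int[] labels) {
        int n = values.length;
        Integer[] idx = new Integer[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        Arrays.sort(idx, (a, b) -> Double.compare(values[a], values[b]));

        // Start with every example on the right side of the split.
        int l0 = 0, l1 = 0, r0 = 0, r1 = 0;
        for (int y : labels) { if (y == 0) r0++; else r1++; }

        double bestScore = Double.POSITIVE_INFINITY;
        double bestT = values[idx[0]];
        for (int i = 0; i < n - 1; i++) {
            // Move one example from the right side to the left side.
            if (labels[idx[i]] == 0) { l0++; r0--; } else { l1++; r1--; }
            int nl = i + 1, nr = n - nl;
            double impurity = (nl * gini(l0, l1) + nr * gini(r0, r1)) / n;
            // Illustrative center weighting: penalize unbalanced splits.
            double score = impurity + 0.1 * Math.abs(nl - nr) / n;
            if (score < bestScore) {
                bestScore = score;
                bestT = (values[idx[i]] + values[idx[i + 1]]) / 2.0;
            }
        }
        return bestT;
    }
}
```

The sort at the top is the O(N log N) per-feature cost discussed next.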
CW DTs do, however, suffer from an initial sort of all
training examples for each feature, resulting in an O(f * N
log N) cost up front, where f is the number of features and
N the number of training examples. Since in IML we
assume that both f and N are large, this cost can be
extremely high.
Because of the extreme initial cost of sorting all N training
examples f times, we have extended Center Weighted with
CWSS. The ‘SS’ stands for sub-sampled. Since the
iteration through training examples is purely to find a good
split, we can sample the examples to find a statistically
sound split. For example, say N is 100,000: if we sample
1,000 of the original N, sort those, and calculate the best
split, then our initial sort is 100 times faster. It is obvious
that a better threshold could be computed using all of the
training data, but this is mitigated by the fact that those data
items will still be considered in lower levels of the tree.
When a split decision is made, all of the training examples
are split, not just the sub-sample. The sub-sampling means
that each node’s split decision is never greater than
O(f*1000*5), but that eventually all training data will be
considered.
Quinlan used a sampling technique called “windowing”.
Windowing initially used a small sample of training
examples and increased the number of training examples
used to create the DT, until all of the original examples
were classified correctly [11]. Our technique, although
similar, differs in that the number of samples is fixed. At
each node in the DT a new sample of fixed size is drawn,
allowing misclassified examples in a higher level of the DT
to be considered at a lower level.
The use of sub-sampling in CWSS produced very slight
differences in classification accuracy as compared to CW,
but reduced training time by a factor of at least two (for
training sets with N ≥ 5,000). This factor, however, will
continue to grow as N increases. (For N = 40,000 CWSS is
approximately 5 times faster than CW; 8 times for N = 80,000.)
The CW and CWSS algorithms spend considerable
computing resources in trying to choose a threshold value
for each feature. The Mean Split (MS) algorithm spends
very little time on such decisions and relies on large
amounts of training data to correct decisions at lower levels
of the tree. The MS algorithm uses T=mean(F) as the
threshold for dividing each feature F and compares the
impurities of the divisions of all features. This is very
efficient and produces relatively shallow decision trees by
generally dividing the training set in half at each decision
point. Mean split, however, does not ensure that the
division will necessarily divide the examples at points that
are meaningful to correct classification. Successive splits
at lower levels of the tree will eventually correctly classify
the training data, but may not generalize as well.
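By contrast, the Mean Split threshold is trivial to compute, which is exactly where its speed comes from (illustrative sketch):

```java
public class MeanSplit {
    // Mean Split uses T = mean(F) as the division threshold for a feature F,
    // spending no time searching for a better split point.
    public static double threshold(double[] featureValues) {
        double sum = 0.0;
        for (double v : featureValues) sum += v;
        return sum / featureValues.length;
    }
}
```

A single pass over the values replaces the sort-and-scan of the center-weighted approach; the cost is that the mean need not fall anywhere meaningful for classification, as the text notes.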
The resulting MS decision trees are not as good as those
produced by more careful means such as CW or CWSS.
However, we hypothesized, that the speedup in
classification would improve interactivity and thus reduce
the time for designers to train a classifier. We believe
designers make up for the lower quality of the decision tree
with the ability to correct more rapidly. The key is in
optimizing designer judgment rather than classifier
predictions. MSSS is a sub-sampled version of MS in the
same manner as CWSS. In MSSS, since we just evaluate
the impurity at the mean, and since the mean is a simple
statistical value, the resulting divisions are generally
identical to those of straight MS.
As a parenthetical note, another important bottleneck that is
common to all of the classifiers is the necessity to calculate
all features initially to create the classifier. We made the
assumption in IML that all features are pre-calculated and
that the learning part will find the distinguishing features.
Although this can be optimized so it is faster, all
algorithms suffer from this bottleneck.
There are many differences between the performances of
each of the algorithms. The most important is that the CW
algorithms train slower than the MS algorithms, but tend to
create better classifiers. Other differences are of note
though. For example, the sub-sampled versions, CWSS
and MSSS, generally allowed the classifiers to be
generated faster. More specifically, CWSS was usually
twice as fast as CW, as was MSSS compared to MS.
Because of the gains in speed and lack of loss of
classification power, only CWSS and MSSS will be used
for comparisons. The critical comparison is to see which
algorithm allows the user to create a satisfactory classifier
the fastest. User tests comparing these algorithms are
outlined and presented in the next section.
User tests were conducted to evaluate the differences
between CWSS and MSSS. When creating a new
perceptual interface it is not classification time that is the
real issue. The important issue is designer time. As stated
before, classification creation time for CWSS is longer than
MSSS, but the center-weighted algorithms tend to
generalize better than the mean split algorithms. The
CWSS generally takes 1-10 seconds to train on training
sets of 10,000-60,000 examples, while MSSS is
approximately twice as fast on the same training sets.
These differences are important; as our hypothesis was that
faster classifier creation times can overcome poorer
inductive strength and thus reduce overall designer time.
To test the difference between CWSS and MSSS we used
three key measurements: wall clock time to create the
classifier, number of classify/correct iterations, and
structure of the resulting tree (depth and number of nodes).
The latter of these three corresponds to the amount of time
the classifier takes to classify an image in actual usage.
In order to test the amount of time a designer takes to
create a good classifier, we need a standard to define “good
classifier”. A “gold standard” was created for four
different classification problems: skin-detection, paper card
tracking, robot car tracking and laser tracking. These gold
standards were created by carefully classifying pixels until,
in human judgment, the best possible classification was
being performed on the test images for each problem. The
resulting classifier was then saved as a standard.
Ten total test subjects were used and divided into two
groups. The first five did each task using the CWSS
followed by the MSSS and the remaining five MSSS
followed by CWSS. The users were given each of the
problems in turn and asked to build a classifier. Each time
the subject requested a classifier to be built that classifier’s
performance was measured against the performance of the
standard classifier for that task. When the subject’s
classifier agreed with the standard on more than 97.5% of
the pixels, the test was declared complete.
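The completion criterion amounts to a per-pixel agreement fraction between the subject's classifier output and the gold standard; a sketch with illustrative names:

```java
public class Agreement {
    // Fraction of pixels on which two classifications agree; the user test
    // was declared complete once this fraction exceeded 0.975.
    public static double fraction(int[] subjectLabels, int[] standardLabels) {
        int same = 0;
        for (int i = 0; i < subjectLabels.length; i++) {
            if (subjectLabels[i] == standardLabels[i]) same++;
        }
        return (double) same / subjectLabels.length;
    }
}
```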
Table 1 shows the average times and iterations for the first
group; Table 2, for the second group.

Problem      CWSS Time  CWSS Iter.  MSSS Time  MSSS Iter.
Skin         03:06      4.4         10:35      12.6
Paper Cards  02:29      4.2         02:23      5.0
Robot Car    00:50      1.6         01:00      1.6
Laser        00:46      1.2         00:52      1.4
Table 1 – CWSS followed by MSSS
Problem      MSSS Time  MSSS Iter.  CWSS Time  CWSS Iter.
Skin         10:26      11.4        03:51      3.6
Paper Cards  04:02      5.0         02:37      2.6
Robot Car    01:48      1.2         01:37      1.2
Laser        01:29      1.0         01:16      1.0
Table 2 – MSSS followed by CWSS
The laser tracker is a relatively simple classifier because of
the uniqueness of bright red spots [9]. The robot car was
contrasted with a uniform colored carpet and was similarly
straightforward. Identifying colored paper cards against a
cluttered background was more difficult because of the
diversity of the background. The skin tracker is the hardest
because of the diversity of skin color, camera over-
saturation problems and cluttered background [20].
As can be seen in tables 1 and 2, MSSS takes substantially
more designer effort on the hard problems than CWSS. All
subjects specifically stated that CWSS was “faster” than
MSSS especially in the Skin case. (Some did not notice a
difference between the two algorithms while working on
the other problems.) We did not test any of the slower
algorithms such as neural nets or nearest-neighbor.
Interactively these are so poor that the results are self-
evident. We also did not test the full CW algorithm. Its
classification times tend into minutes and clearly could not
compete with the times shown in tables 1 and 2. It is clear
from our evaluations that a classification algorithm must
get under the 10-20 second barrier in producing a new
classification, but that once under that barrier, the
designer’s time begins to dominate. Once the designer’s
time begins to dominate the total time, then the classifier
with better generalization wins.
We also mentioned the importance of the tree structure as it
relates to the classification time of an image. Table 3
shows the average tree structures (tree depth and number of
nodes) as well as the average classification time (ACT) in
milliseconds over the set of test images.
Problem  CWSS Depth  CWSS Nodes  CWSS ACT  MSSS Depth  MSSS Nodes  MSSS ACT
Skin     16.20       577         243       25.60       12530       375
Cards    15.10       1661        201       16.20       2389        329
Car      13.60       1689        235       15.70       2859        317
Laser    13.00       4860        110       8.20        513         171
Table 3 – Tree structures and average classify time (ACT) in milliseconds
As seen in Table 3, depth, number of nodes, and ACT were
all lower in CWSS than in MSSS. This was predicted, as
CWSS provides better divisions between the training
examples.
While testing we observed that those who used MSSS
(which is fast but less accurate) first ended up using more
training data, even when they later used CWSS, which
usually generalizes better and needs less data. Those who
used CWSS first were pleased with its interactivity and
became very frustrated when they used MSSS, even though
it could cycle faster through the interactive
loop. In actuality, because of the poor generalization of the
mean split algorithm, even though the classifier generation
time for MSSS was quicker than CWSS, the users felt it
necessary to paint more using the MSSS, so the overall
time increased using MSSS.
CONCLUSION
When using machine learning in an interactive design
setting, feature selection must be automatic rather than
manual and classifier training-time must be relatively fast.
Decision Trees using a sub-sampling technique to improve
training times are very effective for both of these purposes.
Once interactive speeds are achieved, however, the quality
of the classifier’s generalization becomes important. Tools
like Crayons demonstrate that machine learning can form
an appropriate basis for the design tools needed to create
new perceptual user interfaces.
REFERENCES
1. Cover, T., and Hart, P. “Nearest Neighbor Pattern
Classification.” IEEE Transactions on Information
Theory, 13, (1967) 21-27.
2. Duda, R. O., Hart, P. E., and Stork, D. G., Pattern
Classification. (2001).
3. Fails, J.A., Olsen, D.R. “LightWidgets: Interacting in
Everyday Spaces.” Proceedings of IUI ’02 (San
Francisco CA, January 2002).
4. Fayyad, U.M. and Irani, K. B. “On the Handling of
Continuous-valued Attributes in Decision Tree
Generation.” Machine Learning, 8, 87-102,(1992).
5. Jaimes, A. and Chang, S.-F. “A Conceptual Framework
for Indexing Visual Information at Multiple Levels.”
IS&T/SPIE Internet Imaging 2000, (San Jose CA,
January 2000).
6. Jaimes, A. and Chang, S.-F. “Automatic Selection of
Visual Features and Classifier.” Storage and Retrieval
for Image and Video Databases VIII, IS&T/SPIE (San
Jose CA, January 2000).
7. Jaimes, A. and Chang, S.-F. “Integrating Multiple
Classifiers in Visual Object Detectors Learned from
User Input.” Invited paper, session on Image and Video
Databases, 4th Asian Conference on Computer Vision
(ACCV 2000), Taipei, Taiwan, January 8-11, 2000.
8. Krueger, M. W., Gionfriddo, T., and Hinrichsen, K.,
“VIDEOPLACE -- an artificial reality”. Human Factors
in Computing Systems, CHI '85 Conference
Proceedings, ACM Press, 1985, 35-40.
9. Olsen, D.R., Nielsen, T. “Laser Pointer Interaction.”
Proceedings of CHI ’01 (Seattle WA, March 2001).
10. Quinlan, J. R. “Induction of Decision Trees.” Machine
Learning, 1(1), 81-106, (1986).
11. Quinlan, J. R. “C4.5: Programs for machine learning.”
Morgan Kaufmann, San Mateo, CA, 1993.
12. Rumelhart, D., Widrow, B., and Lehr, M. “The Basic
Ideas in Neural Networks.” Communications of the
ACM, 37(3), (1994), pp 87-92.
13. Schmidt, A. “Implicit Human Computer Interaction
Through Context.” Personal Technologies, Vol 4(2),
June 2000.
14. Starner, T., Auxier, J. and Ashbrook, D. “The Gesture
Pendant: A Self-illuminating, Wearable, Infrared
Computer Vision System for Home Automation Control
and Medical Monitoring.” International Symposium on
Wearable Computing (Atlanta GA, October 2000).
15. Triggs, B. “Model-based Sonar Localisation for Mobile
Robots.” Intelligent Robotic Systems ’93, Zakopane,
Poland, 1993.
16. Underkoffler, J. and Ishii H. “Illuminating Light: An
Optical Design Tool with a Luminous-Tangible
Interface.” Proceedings of CHI ’98 (Los Angeles CA,
April 1998).
17. Underkoffler, J., Ullmer, B. and Ishii, H. “Emancipated
Pixels: Real-World Graphics in the Luminous Room.”
Proceedings of SIGGRAPH ’99 (Los Angeles CA,
1999), ACM Press, 385-392.
18. Vailaya, A., Zhong, Y., and Jain, A. K. “A hierarchical
system for efficient image retrieval.” In Proc. Int. Conf.
on Patt. Recog. (August 1996).
19. Viola, P. and Jones, M. “Robust real-time object
detection.” Technical Report 2001/01, Compaq CRL,
February 2001.
20. Yang, M.H. and Ahuja, N. “Gaussian Mixture Model
for Human Skin Color and Its Application in Image and
Video Databases.” Proceedings of SPIE ’99 (San Jose
CA, Jan 1999), 458-466.