Learning to classify human object sketches
Mathias Eitz
TU Berlin
James Hays
Brown University
Figure 1: Left: sample sketches from two categories; right: averages of 30 sketches per category (categories shown include bulldozer, owl, lightbulb, potted plant, mug, pineapple, wine bottle, cloud, telephone, flying bird, wristwatch, and camel).
Abstract
We present ongoing work on object category recognition from binary human outline sketches. We first define a novel set of 187 “sketchable” object categories by extracting the labels of the most frequent objects in the LabelMe dataset. In a large-scale experiment, we then gather a dataset of over 5,500 human sketches, evenly distributed over all categories. We show that by training multi-class support vector machines on this dataset, we can classify novel sketches with high accuracy. We demonstrate this in an interactive sketching application that progressively updates its category prediction as users add more strokes to a sketch.
1 Introduction
Sketching is a common means of visual communication, often used for conveying rough visual ideas as in architectural drawings, design studies, comics, or movie storyboards. There exists significant prior research on retrieving images or 3D models based on sketches. The assumption in all of these works is that, in some well-engineered feature space, sketched objects will resemble their real-world counterparts. But this fundamental assumption is often violated – most humans are not faithful artists. Instead, people use shared, iconic representations of objects (e.g. stick figures), or they make dramatic simplifications or exaggerations. Because the relationship between sketched and real objects is so abstract, recognizing sketched objects requires learning from a training database of real sketches. Because people represent the same object with differing degrees of realism and distinct drawing styles (see Fig. 1, left), we believe that a successful approach can only be based on a dataset that provides a sufficiently dense sampling of that space, i.e. we need a large training dataset of sketches. Although both the shape and proportions of a sketched object may be far from those of the corresponding real object, and sketches are at the same time an impoverished visual representation, humans are amazingly accurate at interpreting such sketches. In this paper, we demonstrate our ongoing work on teaching computers to classify sketched objects just as humans do effortlessly.
2 Dataset & Classification
To classify sketches we address four main tasks: 1) defining a set of object categories – ideally those that represent the most common objects in our world; 2) creating a dataset of sketches with diverse samples for each category; 3) defining low-level features for representing the sketches; and finally 4) training classifiers from our dataset such that we can accurately recognize novel sketches.

We have defined a list of common object categories by computing the 1,000 most frequent objects in the LabelMe dataset and collapsing semantically similar categories. This resulted in 187 object categories, mainly containing common objects such as airplane, house, cup, and horse. In a large-scale user experiment performed on Amazon Mechanical Turk, we asked humans to sketch such objects given only the category name. We instructed them to a) draw sketches that would be “clearly recognizable to other humans” as belonging to a given category, b) use outlines only, and c) avoid context around the actual object. Currently, the dataset contains 30 sketches in each category for a total of about 5,500 sketches.

For learning on this dataset we perform two main steps: we employ a bag-of-features approach [Sivic and Zisserman 2003] and use SIFT-like descriptors to represent sketches as histograms of visual words [Eitz et al. 2011]. We train one-vs-all classifiers using support vector machines with RBF kernels. We find the best model by performing grid search over the parameter space of the SVM and use 5-fold cross-validation to avoid overfitting. The best-performing model achieves an accuracy of about 37% – a very reasonable result considering that chance lies at about 0.54%. We additionally demonstrate the subjectively very good performance of the resulting model in an interactive application that progressively visualizes classification results as the user adds strokes to a sketch (please see the accompanying video).

e-mail: m.eitz@tu-berlin.de, hays@cs.brown.edu
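The learning pipeline described above – a visual vocabulary over SIFT-like local descriptors, histogram-of-visual-words representations, and an RBF-kernel SVM tuned by grid search with 5-fold cross-validation – can be sketched as follows. This is an illustrative toy version, not the authors' implementation: the random 128-d descriptors, vocabulary size, and parameter grid are stand-in assumptions, and real descriptors would be extracted from rendered sketch images.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy stand-ins: each "sketch" yields a set of 128-d local descriptors
# (fabricated here; the paper uses SIFT-like descriptors on real sketches).
n_sketches, n_categories, vocab_size = 60, 3, 32
descriptors_per_sketch = [rng.normal(size=(50, 128)) for _ in range(n_sketches)]
labels = rng.integers(0, n_categories, size=n_sketches)

# 1) Bag-of-features: build a visual vocabulary by clustering all descriptors.
codebook = KMeans(n_clusters=vocab_size, n_init=4, random_state=0)
codebook.fit(np.vstack(descriptors_per_sketch))

# 2) Represent each sketch as a normalized histogram of visual-word counts.
def to_histogram(desc):
    words = codebook.predict(desc)
    hist = np.bincount(words, minlength=vocab_size).astype(float)
    return hist / hist.sum()

X = np.array([to_histogram(d) for d in descriptors_per_sketch])

# 3) Multi-class SVM with RBF kernel; grid search over (C, gamma) with
#    5-fold cross-validation for model selection, mirroring the paper.
grid = GridSearchCV(
    SVC(kernel="rbf", decision_function_shape="ovr"),
    param_grid={"C": [1, 10, 100], "gamma": [0.1, 1, 10]},
    cv=5,
)
grid.fit(X, labels)
print(grid.best_params_)

# Chance level for k balanced categories is 1/k; with the paper's 187
# categories that is about 0.54%, against the reported ~37% accuracy.
```

On real data, the codebook and the SVM parameter grid would be considerably larger, and accuracy would be estimated on held-out sketches rather than the toy labels used here.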
3 Conclusion
We have demonstrated that – given a large dataset of sketches – reasonable classification rates can be achieved, limited primarily (we believe) by the bag-of-features representation, which does not encode any spatial information. Clearly, constructing better features and extending and analyzing the dataset are promising areas for future work. Finally, the large dataset in itself (which we plan to provide as a free resource), as well as the semantic sketch classification, will be highly beneficial for applications such as sketch-based image and 3D model retrieval.
References
EITZ, M., HILDEBRAND, K., BOUBEKEUR, T., AND ALEXA, M. 2011. Sketch-based image retrieval: benchmark and bag-of-features descriptors. IEEE Trans. Vis. Comp. Graph. Preprints.
SIVIC, J., AND ZISSERMAN, A. 2003. Video Google: A Text Retrieval Approach to Object Matching in Videos. In IEEE ICCV.