The Elements of Fashion Style
Kristen Vaccaro, Sunaya Shivakumar, Ziqiao Ding, Karrie Karahalios, and Ranjitha Kumar
Department of Computer Science
University of Illinois at Urbana-Champaign
{kvaccaro,sshivak2, zding5,kkarahal, ranjitha}@illinois.edu
USER INPUT: “I need an outfit for a beach wedding that I’m going to early
this summer. I’m so excited -- it’s going to be warm and exotic and
tropical... I want my outfit to look effortless, breezy, flowy, like I’m
floating over the sand! Oh, and obviously no white! For a tropical spot,
I think my outfit should be bright and colorful.”
STYLE DOCUMENT: beach, wedding, summer, tropical, exotic, effortless,
breezy, glowing, radiant, floating, flowy, warm, bright, colorful
Figure 1: This paper presents a data-driven fashion model that learns
correspondences between high-level styles (like “beach,” “flowy,” and
“wedding”) and low-level design elements such as color, material, and
silhouette. The model powers a number of fashion applications, such as an
automated personal stylist that recommends fashion outfits (right) based
on natural language specifications (left).
ABSTRACT
The outfits people wear contain latent fashion concepts cap-
turing styles, seasons, events, and environments. Fashion the-
orists have proposed that these concepts are shaped by de-
sign elements such as color, material, and silhouette. While
a dress may be “bohemian” because of its pattern, material,
trim, or some combination thereof, it is not always clear how
low-level elements translate to high-level styles. In this pa-
per, we use polylingual topic modeling to learn latent fashion
concepts jointly in two languages capturing these elements
and styles. This latent topic formation enables translation
between languages through topic space, exposing the ele-
ments of fashion style. The model is trained on a set of more
than half a million outfits collected from Polyvore, a popular
fashion-based social network. We present novel, data-driven
fashion applications that allow users to express their desires in
natural language just as they would to a real stylist, and pro-
duce tailored item recommendations for their fashion needs.
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than the
author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from permissions@acm.org.
UIST 2016, October 16 - 19, 2016, Tokyo, Japan
© 2016 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ISBN 978-1-4503-4189-9/16/10. . . $15.00
DOI: http://dx.doi.org/10.1145/2984511.2984573
Author Keywords
Fashion, elements, styles, polylingual topic modeling
ACM Classification Keywords
H.5.2 User Interfaces; H.2.8 Database Applications
INTRODUCTION
Outfits contain latent fashion concepts, capturing styles, sea-
sons, events, and environments. Fashion theorists have pro-
posed that important design elements — color, material, sil-
houette, and trim — shape these concepts [24]. A long-
standing question in fashion theory is how low-level fashion
elements map to high-level styles. One of the first theorists to
study fashion as a language, Roland Barthes, highlighted the
difficulty of translating between elements and styles [2]:
If I read that a square-necked, white silk sweater is very
smart, it is impossible for me to say – without again hav-
ing to revert to intuition – which of these four features
(sweater, silk, white, square neck) act as signifiers for
the concept smart: is it only one feature which carries
the meaning, or conversely do non-signifying elements
come together and suddenly create meaning as soon as
they are combined?
To address this fundamental question, we present a model that
learns the relation between fashion design elements and fash-
ion styles. This model adapts a natural language processing
technique — polylingual topic modeling — to learn latent
fashion concepts jointly in two languages: a style language
used to describe outfits, and an element language used to la-
bel clothing items. This model answers Barthes’ question:
identifying the elements that determine styles. It also powers
automated personal stylist systems that can identify people’s
styles from an outfit they assemble, or recommend items for
a desired style (Figure 1).
We train a polylingual topic model (PLTM) on a dataset of
over half a million outfits collected from a popular fashion-based
social network, Polyvore. Polyvore outfits have both
free-text outfit descriptions written by their creator and item
labels (e.g., color, material, pattern, designer) extracted by
Polyvore. These two streams of data form a pair of parallel
documents (style and element, respectively) for each outfit,
which comprise the training inputs for the model.
Each topic in the trained PLTM corresponds to a pair of dis-
tributions over the two vocabularies, capturing the correspon-
dence between style and element words (Figure 2). For exam-
ple, the model learns that fashion elements such as “black,”
“leather,” and “jacket” are often signifiers for styles such as
“biker,” and “motorcycle.”
We validate the model using a set of crowdsourced, percep-
tual tasks: for example, asking users to select the set of words
in the element language that is the best match for a set in the
style language. These tasks demonstrate that the learned top-
ics mirror human perception: the topics are semantically co-
herent and translation between elements and styles is mean-
ingful to users.
This paper motivates the choice of model, describes both the
Polyvore outfit dataset as well as the training and evaluation
of the PLTM, and illustrates several resultant fashion applica-
tions. Using the PLTM, we can explain why a clothing item
fits a certain style: we know whether it is the collar, color,
material, or item itself that makes a sweater “smart.” We can
build data-driven style quizzes, predicting style preferences
from a user’s outfit. We even describe an automated personal
stylist which can provide outfit recommendations for a de-
sired style expressed in natural language. Polylingual topic
modeling can deepen our understanding of fashion theory
and support a rich new set of interactive fashion tools.
MOTIVATION
As more fashion data has become available online, re-
searchers have built data-driven fashion systems that process
large-scale data to characterize styles [6, 11, 12, 20, 23],
recommend outfits [21, 22, 27, 32], and capture changing
trends [8, 10, 29]. Several projects have used deep learn-
ing to automatically infer structure from low-level (typically)
vision-based features [15, 28]. While these models can pre-
dict whether items match or whether an outfit is “hipster,”
they cannot explain why. For many applications, models
predicated on human-interpretable features are more useful
than models that merely predict outcomes [9]. For example,
when a user looks for a “party” outfit, an explanation like
“this item was recommended because it is a black miniskirt”
helps her understand the suggestion and provide feedback to
the system.
STYLE: summer, vintage, beach, american, relaxed, retro, unisex
ELEMENT: short, denim, highwaisted, shirt, top, cutoff, form, distressed

STYLE: biker, motorcycle, vintage, summer, college, varsity, military
ELEMENT: jacket, black, leather, shirt, zip, denim, sleeve, faux

STYLE: prom, occasion, special, party, holiday, bridesmaid
ELEMENT: dress, shoe, cocktail, evening, mini, heel, costume

STYLE: party, summer, night, sexy, vintage, fitting, botanical
ELEMENT: dress, mini, sleeveless, cocktail, skater, flare, out, lace, floral

Figure 2: Via polylingual topic modeling, we infer distributions over
latent fashion topics in outfits that capture the elements of fashion
style. Fashion elements like “jacket, black, leather” signify the “biker,
motorcycle” style. Conversely, fashion styles like “prom, special
occasion” label groups of elements such as “cocktail, mini, dress.”
This paper presents a fashion model that maps low-level el-
ements to high-level styles, adapting polylingual topic mod-
eling to learn correspondences between them [18]. Both sets
of features (elements and styles) are human interpretable, and
the translational capability of PLTMs can power applications
that indicate how design features are tied to user outcomes,
identifying people’s styles from the elements in their outfits
and recommending clothing items from high-level style de-
scriptions.
In addition to their translational capabilities, PLTMs offer a
number of other advantages. Unlike systems built on discrim-
inative models [21, 27, 32], PLTMs support a dynamic set of
styles that grows with the dataset and need not be specified
a priori. Moreover, topic modeling represents documents as
distributions over concepts, allowing styles to coexist within
outfits rather than labeling them with individual styles [6, 11,
12, 20]. Finally, the model smooths distributions so that sys-
tems can support low frequency queries. Even though there
are no “wedding” outfits explicitly labeled “punk rock” in
our dataset, we can still suggest appropriate attire for such an
event by identifying high probability fashion elements associ-
ated with “wedding” (e.g., “white,” “lace”) and “punk rock”
(e.g., “leather”, “studded”), and searching for clothing items
which contain them.
To build a PLTM for fashion, we require data that contains
both style and element information. Researchers have studied
many sources of fashion data, from independent fashion items [3]
to objects with rough co-occurrence information [15] to entire
outfits captured in photographs [10, 11, 31]. Each source has
its own strengths, but most require parsing, annotation, or the
use of proxy labels. We take advantage of Polyvore’s coordinated
outfit data, where each outfit is described in both a low-level
element language and a high-level style one.

Figure 3: Polyvore outfits (left) are described at two levels: high-level
style descriptions (e.g., “#valentinesday”) and specific descriptions of
the items’ design elements (e.g., “red cardigan,” “lightweight shirt”).
For each outfit, we process these two streams of data into a pair of
parallel documents (right).
FASHION DATA
Polyvore is a fashion-based social network with over 20 mil-
lion users [25]. On Polyvore, users create collections of fash-
ion items which they collage together. Such collages are com-
mon in fashion: mood boards are frequently used “to com-
municate the themes, concepts, colors and fabrics that will
be used” in a collection [24]. True mood boards are rarely
“wearable” in a real sense, but on Polyvore collages typically
form a cohesive outfit.
Polyvore outfits are described at two levels: specific descrip-
tions of the items’ design elements (e.g., “black,” “leather,”
“crop top”) and high-level style descriptions, often of the out-
fit as a whole (e.g., “punk”). We leverage these two streams of
data to construct a pair of parallel documents for each outfit,
which become the training inputs for the PLTM.
Polyvore Outfit Data
Each Polyvore outfit record contains an image of the outfit, a
title, a text description, and a list of the items it comprises
(Figure 3). Titles and text descriptions are provided by users
and often capture abstract, high-level fashion concepts: the
use of the outfit; its appropriate environment, season, or even
mood. In addition, each outfit item has its own image and
element labels provided by Polyvore (Figure 3, bottom left).
These labels are typically low-level descriptions of the item’s
design elements, such as silhouette, color, pattern, material,
trim, and designer.
We collected text and image data for 590,234 outfits using a
snowball sampling approach [7] from Polyvore’s front page,
sampling sporadically over several months between 2013 and
2015. Our collection includes more than three million unique
fashion items, with an average of 10 items per outfit. We
collected label data for 675,699 of those items, resulting in a
repository of just over 4 million item labels.
Representing Outfits in Two Languages
With the outfit and item data collected, we create two vocab-
ularies to process outfits into parallel style and element docu-
ments. The style vocabulary is created by extracting terms
from the repository’s text data relating to style, event, oc-
casion, environment, weather, etc. Most of these words are
drawn from the text produced by Polyvore users since they
annotate outfits using high-level descriptors; however, we
also include Polyvore item labels that describe styles (e.g.,
“retro” sunglasses). We manually process the 10,000 most
frequent words from the title and description text to identify
words that should be added to the style vocabulary, keeping
hashtags such as “summerstyle” and discarding common En-
glish words that are irrelevant to fashion.
The element vocabulary is drawn from the repository’s set of
Polyvore item labels. We learn frequent bigrams, trigrams
and quadgrams such as “Oscar de la Renta” or “high heels”
via pointwise mutual information [14]. The element vocab-
ulary comprises these terms and any remaining unigram la-
bels not added to the style vocabulary. After processing the
repository’s text, the style vocabulary has 3106 terms and the
element vocabulary 7231.
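Phrase mining of this kind can be sketched with a simple PMI scorer. The following is a hedged illustration, not the paper’s exact pipeline; the `find_bigrams` helper and its thresholds are assumptions:

```python
import math
from collections import Counter

def find_bigrams(token_lists, min_count=50, pmi_threshold=5.0):
    """Score adjacent label pairs by pointwise mutual information,
    PMI(x, y) = ln[ p(x, y) / (p(x) p(y)) ], keeping frequent,
    strongly associated pairs such as "high heels"."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in token_lists:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values()) or 1
    phrases = []
    for (x, y), c in bigrams.items():
        if c < min_count:
            continue  # ignore rare pairs, whose PMI is unreliable
        pmi = math.log((c / n_bi) /
                       ((unigrams[x] / n_uni) * (unigrams[y] / n_uni)))
        if pmi > pmi_threshold:
            phrases.append((f"{x} {y}", pmi))
    return sorted(phrases, key=lambda p: -p[1])
```

Longer n-grams like “Oscar de la Renta” can be found by re-running the scorer over label streams in which accepted bigrams have been merged into single tokens.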
Using these vocabularies, we process each outfit’s text data
into a pair of parallel documents: one containing words from
the style vocabulary, and a second containing words from the
element vocabulary (Figure 3, right). Both documents de-
scribe the same set of items in two different languages: an
outfit might be “goth” in the style language, but the words
used to describe it in the element language might be “black,”
“velvet,” and “Kambriel.” These parallel documents become
the training input for the PLTM.
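A minimal sketch of this preprocessing step, assuming tokenized text and set-valued vocabularies (the `to_parallel_docs` name is illustrative):

```python
def to_parallel_docs(description_tokens, item_label_tokens,
                     style_vocab, element_vocab):
    """Split one outfit's text into a (style, element) document pair.
    Style words may come from either stream (e.g., a "retro" item
    label); element words come from the item labels. Everything
    outside both vocabularies is dropped."""
    style_doc = [w for w in description_tokens + item_label_tokens
                 if w in style_vocab]
    element_doc = [w for w in item_label_tokens if w in element_vocab]
    return style_doc, element_doc

style_doc, element_doc = to_parallel_docs(
    ["goth", "look", "for", "tonight"],
    ["black", "velvet", "retro", "sunglasses"],
    style_vocab={"goth", "retro"},
    element_vocab={"black", "velvet", "sunglasses"})
# style_doc   == ["goth", "retro"]
# element_doc == ["black", "velvet", "sunglasses"]
```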
FASHION TOPICS
To capture the correspondence between the fashion styles and
elements exposed by the Polyvore dataset, we adapt polylin-
gual topic modeling. Polylingual topic modeling is a gener-
alization of LDA topic modeling that accounts for multiple
vocabularies describing the same set of latent concepts [18].
A PLTM learns from polylingual document “tuples,” where
each tuple is a set of equivalent documents, each written in a
different language. The core assumptions of PLTMs are that
all documents in a tuple have the same distribution over top-
ics and each topic is produced from a set of distributions over
words, one distribution per language.
We train a PLTM to learn latent fashion topics jointly over the
style and element vocabularies. The training input consists of
the repository of Polyvore outfits, where each outfit is repre-
sented by a pair of documents, one per language. The key in-
sight motivating this work is that these documents represent
the same distribution over fashion concepts, expressed with
STYLE: christmas, winter, fall, away, night, school
ELEMENT: sweater, coat, black, long, leather, wool

STYLE: prom, party, special, occasion, sexy, summer
ELEMENT: dress, shoe, mini, cocktail, sleeveless, lace

STYLE: beach, summer, band, swimming, bathing, sexy
ELEMENT: hat, swimsuit, top, black, beanie, bikini

STYLE: military, combat, army, cowgirl, cowboy, western
ELEMENT: boot, booty, black, ankle, lace, up

Top words for 4 topics with n=25

Figure 4: Four topics from a 25-topic PLTM represented by high
probability words from both the style and element languages. Topics
convey a diverse set of fashion concepts: seasons (winter/fall), events
(prom), environments and activities (beach/swimming), and styles
(military/western).
different vocabularies. Below we briefly outline the model,
referring the reader to Mimno et al. [18] for additional de-
tails.
Generative Process
An outfit’s set of fashion concepts is generated by drawing a
single topic distribution from an asymmetric Dirichlet prior
θ ∼ Dir(α),

where α is a model parameter capturing both the base measure
and concentration parameter.

For every word in both the style language S and element
language E, a topic assignment is drawn

z^S ∼ P(z^S | θ) = ∏_n θ_{z^S_n}   and   z^E ∼ P(z^E | θ) = ∏_n θ_{z^E_n}.

To create the outfit’s document in each language, words are
drawn successively using the language’s topic parameters

w^S ∼ P(w^S | z^S, Φ^S) = ∏_n φ^S_{w^S_n | z^S_n}   and
w^E ∼ P(w^E | z^E, Φ^E) = ∏_n φ^E_{w^E_n | z^E_n},

where the set of language-specific topics (Φ^S or Φ^E) is
drawn from a language-specific symmetric Dirichlet
distribution with concentration parameters β^S and β^E,
respectively.
Inference
We fit PLTMs to the outfit document tuples using MALLET’s
Gibbs sampling implementation for polylingual topic model
learning [16]. To learn the hyperparameters α and β, we use
MALLET’s built-in optimization setting. Each PLTM learns
a distribution over style words for each topic (ΦS), a distri-
bution over element words for each topic (ΦE), and a dis-
tribution over fashion topics for each outfit in the training
set. Since choosing the optimal number of topics is a central
problem in topic modeling applications, we train a variety of
PLTMs with varying numbers of topics and conduct a series
of perceptual tests to select the most suitable one. Figure 4
illustrates topics drawn from a model trained with 25 topics,
expressing each topic in terms of high probability words in
both the style and element languages.
Translation
PLTMs were not originally intended to support direct trans-
lation between languages [18]. However, in domains where
word order is unimportant, given a document in one language,
PLTMs can be used to produce an equivalent document in a
different language by identifying high probability words. For
example, given a document W^E in the element language, we
can infer the topic distribution θ for that document under the
trained model. Since the topic distribution for a document
will be the same in the style language, we can produce an
equivalent outfit in the style language, W^S = θ · Φ^S, by
identifying high probability words in that language.
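With a fitted model in hand, this translation is a single matrix-vector product. A sketch assuming NumPy arrays for θ and the target-language Φ:

```python
import numpy as np

def translate(theta, phi_target, target_vocab, top_n=8):
    """Translate a document via topic space: word probabilities in the
    target language are W = theta . Phi, and the translation is the
    set of highest-probability words."""
    word_probs = theta @ phi_target            # (K,) @ (K, V) -> (V,)
    top = np.argsort(word_probs)[::-1][:top_n]
    return [target_vocab[i] for i in top]

# A toy element->style translation with two topics and three style words.
phi_style = np.array([[0.7, 0.2, 0.1],     # topic 0: beachy styles
                      [0.1, 0.2, 0.7]])    # topic 1: formal styles
theta = np.array([0.9, 0.1])               # inferred from an element doc
print(translate(theta, phi_style, ["beach", "summer", "prom"], top_n=2))
# -> ['beach', 'summer']
```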
UNDERSTANDING FASHION TOPICS
To evaluate the trained PLTMs, we ran a set of crowdsourced
experiments. These perceptual tests validate the suitability
of the trained PLTMs for translation-based applications in a
controlled setting.
Topic Coherence
To measure topic coherence in each language, we adapted
Chang et al.’s intruder detection task [5]. The task requires
users to choose an “intruder” word that has low probability
from a set of the most-likely words for a topic. The extent to
which users are able to identify the intruder is representative of
the coherence of the topic.
We performed a grid search with PLTMs trained with be-
tween 10 and 800 topics. For each trained model, we sam-
pled up to 100 topics and found the 5 most probable words
for each. An intruder topic was chosen at random, and an
intruder word sampled from it. Mechanical Turk workers were
shown the six words and asked to choose the one that did not
belong (Figure 5).

Figure 5: To measure topic coherence, Mechanical Turk workers were
asked to detect the “intruder” word from a list of six words, where five
of them were likely in a topic and one was not. “Adidas” is the intruder
in this swimwear topic.

Figure 6: Results of the intruder detection experiments: users
successfully identified intruders in both the element and style
languages compared to a baseline of random selection (dotted line). Peak
performance for the style-based tasks occurs at a lower topic number
than for the element-based tasks.
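One way to assemble such a question, sketched under the simplifying assumption that the intruder can be any top word from another topic (an approximation of Chang et al.’s protocol, which prefers words that are also low probability in the target topic):

```python
import random

def make_intruder_task(top_words, topic_id, rng=None):
    """Build one word-intrusion question: five likely words from the
    target topic plus one word drawn from a different topic's top
    words, shuffled together."""
    rng = rng or random.Random()
    likely = top_words[topic_id][:5]
    other = rng.choice([t for t in range(len(top_words)) if t != topic_id])
    intruder = next(w for w in top_words[other] if w not in likely)
    question = likely + [intruder]
    rng.shuffle(question)
    return question, intruder
```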
Figure 6 shows the results of this task. Users were able to
identify intruder words in the element and style languages
with peak median accuracies of 66% and 50%, respectively,
significantly above the baseline of random selection at 16%.
The coherence peak for the element language occurred be-
tween 35 and 50 topics; the peak for the style language oc-
curred between 15 and 35.
In both tasks, accuracy was highest for a relatively small num-
ber of topics. However, there is a tradeoff between semantic
coherence and fashion nuance. With fewer topics, the model
clusters fashion concepts with similar looking words and high
semantic coherence: “summer,” “summerstyle,” “summer-
outfit,” “summerfashion.” As the number of topics increases,
topics are split into finer-grained concepts, and the semantic
coherence within each topic falls off more quickly. Figure 7
illustrates this phenomenon, where the last topic shown in
Figure 4 has split into two (“western” and “military”).
STYLE: cowgirl, cowboy, western, vintage, rain, riding, winter
ELEMENT: boot, ankle, short, bootie, booty, brown, suede

STYLE: combat, military, army, seriously, florida, pretending
ELEMENT: boot, lace, up, black, combat, booty, laced, shirt

Top words for 2 topics with n=100

Figure 7: While the intruder detection results suggest using a small
number of topics, there is a tradeoff between semantic coherence and
fashion nuance. Although the semantic coherence within each topic falls
off more quickly, a model trained with 100 topics exhibits finer-grained
buckets — separate cowboy and military topics — than a 25-topic model.
Figure 8: To measure translational topic coherence, Mechanical Turk
workers were shown five likely words from a topic in one language and
asked to choose a row of words in the other language that was the best
match.
Translation
We also measured translational topic coherence through per-
ceptual tasks. Users were shown the top five words from a
topic in one language and asked to select the row of words
that best matched it in the other language (Figure 8). One
row of words was drawn from the same topic as the prompt,
while the other three were drawn at random from other top-
ics. Users were shown groups of words (rather than single
words) to provide a better sense of the topic as a whole [5].
We restricted this test to models with between 15 and 100
topics, since the word intrusion results showed highest topic
coherence in that range.
Figure 9 shows the results from this task. Performance was
similar in both translation directions, with a peak median
agreement with the model of 60% with prompts in the style
language, and a peak median agreement of 66% with prompts
in the element language, where the baseline of random selec-
tion is 25%. Accuracy is again highest for a relatively small
number (25–35) of topics.
Figure 9: Results of the translation experiments: performance was similar in both directions, with users successfully translating between element and
style terms compared to a baseline of random selection (dotted line). Accuracy is highest for a relatively small number (25–35) of topics.
APPLICATIONS
We describe three translation-based fashion applications
powered by the trained PLTMs, illustrating how human-
interpretable features can lead to a richer understanding of
fashion style. We show how analyzing the topics learned by
the model can answer Barthes’ question. In addition, translating
an outfit from an element document to a style one powers a style
quiz, and translating from a style document to an element one
supports an automated personal stylist system.
Answering Barthes’ Question
To answer Barthes’ question, we can directly analyze the
learned topics (style concepts) to understand which features
(words in the element language) act as signifiers. For some
topics, the probability mass is concentrated in one fashion el-
ement; for others, the distributions are spread across several
features. By computing the entropy of the word distributions
in the element language,
H(Φ^E) = − Σ_{i=1}^{n} P(w_i) ln P(w_i),
we can measure which topics are characterized by one (low
entropy) or several (high entropy) fashion elements.
Figure 10 (top) shows three topics that have low entropy: a
single word determines each style. The next three topics have
high entropy, with many equally-important features coming
together to create the style. For a “prom” style, “dress” alone
signifies; for a “winter” style, many signifiers (“leather”,
“long”, “black”, “wool”, “sweater”) come together.
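This entropy measure is a few lines of code; a sketch over a topic’s element-word probabilities (the example distributions are illustrative, not the trained model’s):

```python
import math

def topic_entropy(word_probs):
    """H(Phi^E) = -sum_i P(w_i) ln P(w_i): low entropy means one or two
    elements carry the style; high entropy means many act together."""
    return -sum(p * math.log(p) for p in word_probs if p > 0)

# "prom"-like topic: mass concentrated on "dress"
print(topic_entropy([0.96, 0.01, 0.01, 0.01, 0.01]))  # ~0.22 (low)
# "winter"-like topic: mass spread over leather/long/black/wool/sweater
print(topic_entropy([0.2, 0.2, 0.2, 0.2, 0.2]))       # = ln 5, ~1.61 (high)
```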
Style Quiz
Fashion magazines often feature “style quizzes” that help
readers identify their style by answering sets of questions like
“you are most comfortable in: (a) long, flowing dresses; (b)
cable-knit sweaters; (c) a bikini” or selecting outfit images
they prefer. While these quizzes are fun, the style advice they
provide has limited scope and utility.
LOW ENTROPY TOPICS (word distribution in the element language dominated
by a single word): “boot,” “dress,” and “jeans,” with top style words
such as “military, combat, army, cowgirl,” “prom, party, special,
occasion,” and “faded, rock, summer, vintage.”

HIGH ENTROPY TOPICS (word distribution spread across many elements):
“hat, swimsuit, beanie, bikini, swimwear” for “beach, summer, swimming,
bathing”; “headband, hair-accessory, flower, backpack, floral, wrap,
crown” for “boho, bohemian, vintage”; “leather, long, coat, wool, neck,
sleeve, black” for “christmas, winter, fall, away.”

Figure 10: To answer Barthes’ question, we analyze each topic — style
concept — to understand which features — words in the element language —
act as signifiers. Some style concepts are determined by one or two
elements (low entropy); for others, several elements come together to
define the style (high entropy).
Example element documents and their top style words:

Element document: “triangle bathing suit swimsuit swim one-piece white
slimming leather wedge platform ankle-strap peep-toe sandal red knot
silk head-wrap headband polka-dot dolce&gabbana cat-eye round sunglasses
white”. Top style words: beach, summer, swimming, bathing, sexy, retro,
getaway, fishing.

Element document: “t-shirt purple shirt cap sexy balconette mesh strappy
lingerie short pleated skirt man bag pink louis vuitton purse white shoe
leather t-strap platform pump pointed-toe high-heel”. Top style words:
party, sexy, wedding, night, special, occasion, realreal, season.

Element document: “urban outfitters summer tops cotton shirts wrap skirt
high low navy tie-dye purple summer billabong beach bag hippie retro
bagpack print day pack boho jewelry bohemian rope bracelet leather
cord”. Top style words: boho, bohemian, summer, vintage, holiday, party,
wet, sexy.

Figure 11: A style quiz that infers a user’s style from an outfit. We
extract labels for all the items in the outfit, infer a topic
distribution for the element document, and return high probability style
words to the user. We measure the confidence of the style predictions as
the inverse of the topic distribution’s entropy.
Applications built on our model can help users understand
their personal style preferences using an open-ended inter-
action that provides a rich set of styles — and a confidence
measure from the model of those style labels — as a result.
Users capture their style by creating an outfit they like (Fig-
ure 11, left); the set of words for the items in the outfit forms a
document in the element language. We can then infer a topic
distribution for this document and find the highest-probability
words in the style language. We measure confidence for these
style labels by computing the inverse of the topic distribu-
tion’s entropy.
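Given an inferred θ, the quiz pipeline reduces to a translation plus an inverse-entropy confidence. A sketch assuming NumPy inputs; the `style_quiz` helper and its toy parameters are illustrative:

```python
import math
import numpy as np

def style_quiz(theta, phi_style, style_vocab, top_n=5, eps=1e-12):
    """Return likely style words for an outfit's inferred topic
    distribution theta, plus a confidence score computed as the
    inverse of theta's entropy."""
    probs = theta @ phi_style
    top = [style_vocab[i] for i in np.argsort(probs)[::-1][:top_n]]
    entropy = -sum(p * math.log(p) for p in theta if p > eps)
    return top, 1.0 / (entropy + eps)

phi = np.array([[0.8, 0.1, 0.1],   # topic 0: beachy styles
                [0.1, 0.1, 0.8]])  # topic 1: goth styles
vocab = ["beach", "summer", "goth"]
focused, c1 = style_quiz(np.array([0.99, 0.01]), phi, vocab, top_n=1)
mixed, c2 = style_quiz(np.array([0.5, 0.5]), phi, vocab, top_n=1)
# A one-topic outfit gets a higher confidence than a half-and-half mix.
```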
When an outfit draws from several topics at once, there is no
single dominating style. High entropy outfits sometimes ap-
pear to be a confusing mix of items; other times users seem
to intentionally mix two completely disparate styles (e.g., ro-
mantic dresses with distressed jean jackets). Indeed, the user
who created the lowest confidence outfit in the repository la-
beled it “half trash half angel,” evidently having exactly such
a juxtaposition in mind!
Automated Personal Stylist
While personal stylist systems can provide useful advice on
constructing new outfits or updating a user’s wardrobe, ex-
isting recommendation and feedback systems typically have
limited sets of styles [11, 21, 32] or must connect users to hu-
man workers [26, 4, 19]. The learned PLTM allows users
to describe their fashion needs in natural language — just
as they would to a personal stylist — and see suggestions a
stylist might recommend.
We introduce a system that asks users to describe an event,
environment, occasion, or location for which they need an
outfit in free text. From this text description, the system ex-
tracts any words contained in the style vocabulary to produce
a new style document. Then, it infers a topic distribution
for this new document and produces a set of high-probability
words in the element language that fit that document. The top
25 such words are then taken as candidate labels and compared
to each of the 675,699 labeled items in the database.
The system measures the goodness of fit of each item using
the intersection-over-union (IOU) of the two sets of labels:

IOU(l_i, l_j) = |l_i ∩ l_j| / |l_i ∪ l_j|.
The system ranks the items by IOU, groups the results by
most frequent label, and presents the resultant groups to the
user (Figure 12).
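The ranking step is straightforward set arithmetic. A sketch in which the item database is a dict of label sets (the names and example items are illustrative):

```python
def iou(a, b):
    """Intersection-over-union of two label sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def rank_items(candidate_labels, item_db, top_k=10):
    """Rank database items by IOU with the candidate element labels
    produced by translating the user's style document."""
    scored = sorted(item_db.items(),
                    key=lambda kv: iou(candidate_labels, kv[1]),
                    reverse=True)
    return scored[:top_k]

items = {"lace gown":   {"white", "lace", "dress", "long"},
         "moto jacket": {"black", "leather", "jacket", "zip"},
         "slip dress":  {"white", "silk", "dress"}}
best = rank_items({"white", "lace", "dress"}, items, top_k=1)
# The lace gown ranks first, with IOU = 3/4.
```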
CONCLUSIONS AND FUTURE WORK
This paper presents a model that learns correspondences be-
tween fashion design elements and styles by training polylin-
gual topic models on outfit data collected from Polyvore. Sys-
tems built on this model can bridge the semantic gap in fash-
ion: exposing the signifiers for different styles, characterizing
users’ styles from outfits they create, and suggesting clothing
items and accessories for different needs.
One promising opportunity to extend the presented model is
to leverage streams of data beyond textual descriptors, in-
cluding vision-based, social, and temporal features. Train-
ing a joint model that uses computer vision to incorporate
both visual and textual information could well lead to a more
nuanced understanding of fashion style. Similarly, mining
Polyvore’s social network structure (e.g., likes, views, com-
ments) could enhance the model with information about the
popularity of fashion styles and elements [30, 13], or how
fashion trends form and evolve through time [10, 8, 29].
While the translation-based experiments described in the paper validate the suitability of PLTMs for fashion applications in a controlled setting, we are eager to perform more meaningful user testing “in the wild.” Deploying the tools described in the paper at scale and monitoring how they are used would allow us to build more personalized and context-aware models of fashion preference. The semantics of fashion change by location, culture, and individual: the “decora” style might not make sense outside of Japan; “western” outfits might only be worn in the United States; individuals may not agree on what constitutes “workwear.” Better understanding how different users interact with our tools is a necessary first step towards making them truly useful, and enabling them to dynamically adapt to different people and contexts.
The framework presented in this paper is not limited to fashion. Design artifacts in many domains contain latent concepts
that can be expressed with sets of human-interpretable features capturing different levels of granularity [1, 17]. This model also offers attractive capabilities: it can infer latent concepts of a design, translate between different feature representations, and even generate new artifacts. In the future, we hope that this framework can power new applications in domains like graphic design, 3D modeling, and architecture.

Figure 12: A personal stylist interface that recommends fashion items based on natural language input. We extract the style tokens from a user’s description of an outfit, infer a topic distribution over the style document, and compute a list of high-probability words in the element language. Users are shown items ranked by intersection-over-union over the top element words. (The figure shows four example queries: an office/workwear request, a New York Fashion Week streetstyle request, a yoga retreat, and a holiday party.)
ACKNOWLEDGMENTS
We thank P. Daphne Tsatsoulis for her early contributions to
this work, and David Mimno for his helpful discussions of the
PLTM.
REFERENCES
1. Adar, E., Dontcheva, M., and Laput, G.
CommandSpace: modeling the relationships between
tasks, descriptions and features. In Proc. UIST (2014).
2. Barthes, R. The language of fashion. A&C Black, 2013.
3. Berg, T., Berg, A., and Shih, J. Automatic attribute
discovery and characterization from noisy web data. In
Proc. ECCV (2010).
4. Burton, M. A., Brady, E., Brewer, R., Neylan, C.,
Bigham, J. P., and Hurst, A. Crowdsourcing subjective
fashion advice using VizWiz: challenges and
opportunities. In Proc. ACCESS (2012).
5. Chang, J., Boyd-Graber, J., Wang, C., Gerrish, S., and
Blei, D. M. Reading tea leaves: how humans interpret
topic models. In Proc. NIPS (2009).
6. Di, W., Wah, C., Bhardwaj, A., Piramuthu, R., and
Sundaresan, N. Style Finder: fine-grained clothing style
detection and retrieval. In Proc. CVPR (2013).
7. Goodman, L. A. Snowball sampling. The Annals of
Mathematical Statistics (1961).
8. He, R., and McAuley, J. Ups and downs: modeling the
visual evolution of fashion trends with one-class
collaborative filtering. In Proc. WWW (2015).
9. Herlocker, J. L., Konstan, J. A., and Riedl, J. Explaining
collaborative filtering recommendations. In Proc. CSCW
(2000).
10. Hidayati, S. C., Hua, K.-L., Cheng, W.-H., and Sun,
S.-W. What are the fashion trends in New York? In
Proc. MM (2014).
11. Kiapour, M., Yamaguchi, K., Berg, A., and Berg, T.
Hipster wars: discovering elements of fashion styles. In
Proc. ECCV (2014).
12. Kwak, I. S., Murillo, A. C., Belhumeur, P., Belongie, S.,
and Kriegman, D. From bikers to surfers: visual
recognition of urban tribes. In Proc. BMVC (2013).
13. Lin, Y., Xu, H., Zhou, Y., and Lee, W.-C. Styles in the
fashion social network: an analysis on Lookbook.nu. In
Social Computing, Behavioral-Cultural Modeling, and
Prediction. Springer International Publishing, 2015.
14. Manning, C. D., and Schütze, H. Foundations of
statistical natural language processing. MIT Press,
1999.
15. McAuley, J., Targett, C., Shi, Q., and van den Hengel, A.
Image-based recommendations on styles and substitutes.
In Proc. SIGIR (2015).
16. McCallum, A. K. MALLET: a machine learning for
language toolkit. http://mallet.cs.umass.edu, 2002.
17. Michailidou, E., Harper, S., and Bechhofer, S. Visual
complexity and aesthetic perception of web pages. In
Proc. SIGDOC (2008).
18. Mimno, D., Wallach, H. M., Naradowsky, J., Smith,
D. A., and McCallum, A. Polylingual topic models. In
Proc. EMNLP (2009).
19. Morris, M. R., Inkpen, K., and Venolia, G. Remote
shopping advice: enhancing in-store shopping with
social technologies. In Proc. CSCW (2014).
20. Murillo, A. C., Kwak, I. S., Bourdev, L., Kriegman, D.,
and Belongie, S. Urban tribes: analyzing group photos
from a social perspective. In Proc. CVPRW (2012).
21. Shen, E., Lieberman, H., and Lam, F. What am I gonna
wear?: scenario-oriented recommendation. In Proc. IUI
(2007).
22. Simo-Serra, E., and Ishikawa, H. Fashion style in 128
floats: joint ranking and classification using weak data
for feature extraction. In Proc. CVPR (2016).
23. Song, Z., Wang, M., Hua, X.-S., and Yan, S. Predicting
occupation via human clothing and contexts. In Proc.
ICCV (2011).
24. Sorger, R., and Udale, J. The fundamentals of fashion
design. AVA Publishing, 2006.
25. Tam, D. Social commerce site Polyvore reaches 20M
users. http://www.cnet.com/news/social-commerce-site-polyvore-reaches-20m-users/,
2012.
26. Tsujita, H., Tsukada, K., Kambara, K., and Siio, I.
Complete fashion coordinator: a support system for
capturing and selecting daily clothes with social
networks. In Proc. AVI (2010).
27. Vartak, M., and Madden, S. CHIC: a combination-based
recommendation system. In Proc. SIGMOD (2013).
28. Veit, A., Kovacs, B., Bell, S., McAuley, J., Bala, K., and
Belongie, S. Learning visual clothing style with
heterogeneous dyadic co-occurrences. In Proc. ICCV
(2015).
29. Vittayakorn, S., Yamaguchi, K., Berg, A., and Berg, T.
Runway to realway: visual analysis of fashion. In Proc.
WACV (2015).
30. Yamaguchi, K., Berg, T. L., and Ortiz, L. E. Chic or
social: visual popularity analysis in online fashion
networks. In Proc. MM (2014).
31. Yamaguchi, K., Kiapour, M. H., and Berg, T. Paper doll
parsing: retrieving similar styles to parse clothing items.
In Proc. ICCV (2013).
32. Yu, L.-F., Yeung, S.-K., Terzopoulos, D., and Chan, T. F.
DressUp! outfit synthesis through automatic
optimization. In Proc. SIGGRAPH Asia (2012).