The Elements of Fashion Style



The outfits people wear contain latent fashion concepts capturing styles, seasons, events, and environments. Fashion theorists have proposed that these concepts are shaped by design elements such as color, material, and silhouette. A dress may be "bohemian" because of its pattern, material, trim, or some combination of them: it is not always clear how low-level elements translate to high-level styles. In this paper, we use polylingual topic modeling to learn latent fashion concepts jointly in two languages capturing these elements and styles. Using this latent topic formation we can translate between these two languages through topic space, exposing the elements of fashion style. We train the polylingual topic model (PLTM) on a set of more than half a million outfits collected from Polyvore, a popular fashion-based social network. We present novel, data-driven fashion applications that allow users to express their needs in natural language just as they would to a real stylist and produce tailored item recommendations for these style needs.
Kristen Vaccaro, Sunaya Shivakumar, Ziqiao Ding, Karrie Karahalios, and Ranjitha Kumar
Department of Computer Science
University of Illinois at Urbana-Champaign
{kvaccaro,sshivak2, zding5,kkarahal, ranjitha}
I need an outfit for a beach wedding
that I'm going to early this summer.
I'm so excited -- it's going to be warm
and exotic and tropical... I want my
outfit to look effortless, breezy,
flowy, like I’m floating over the sand!
Oh, and obviously no white! For a
tropical spot, I think my outfit should
be bright and colorful.
Figure 1: This paper presents a data-driven fashion model that learns correspondences between high-level styles (like “beach,” “flowy,” and “wedding”)
and low-level design elements such as color, material, and silhouette. The model powers a number of fashion applications, such as an automated personal
stylist that recommends fashion outfits (right) based on natural language specifications (left).
The outfits people wear contain latent fashion concepts cap-
turing styles, seasons, events, and environments. Fashion the-
orists have proposed that these concepts are shaped by de-
sign elements such as color, material, and silhouette. While
a dress may be “bohemian” because of its pattern, material,
trim, or some combination thereof, it is not always clear how
low-level elements translate to high-level styles. In this pa-
per, we use polylingual topic modeling to learn latent fashion
concepts jointly in two languages capturing these elements
and styles. This latent topic formation enables translation
between languages through topic space, exposing the ele-
ments of fashion style. The model is trained on a set of more
than half a million outfits collected from Polyvore, a popular
fashion-based social network. We present novel, data-driven
fashion applications that allow users to express their desires in
natural language just as they would to a real stylist, and pro-
duce tailored item recommendations for their fashion needs.
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than the
author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from
UIST 2016, October 16 - 19, 2016, Tokyo, Japan
© 2016 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ISBN 978-1-4503-4189-9/16/10. . . $15.00
Author Keywords
Fashion, elements, styles, polylingual topic modeling
ACM Classification Keywords
H.5.2 User Interfaces; H.2.8 Database Applications
Outfits contain latent fashion concepts, capturing styles, seasons,
events, and environments. Fashion theorists have proposed that
important design elements (color, material, silhouette, and trim)
shape these concepts [24]. A long-standing question in fashion
theory is how low-level fashion
elements map to high-level styles. One of the first theorists to
study fashion as a language, Roland Barthes, highlighted the
difficulty of translating between elements and styles [2]:
If I read that a square-necked, white silk sweater is very
smart, it is impossible for me to say without again hav-
ing to revert to intuition which of these four features
(sweater, silk, white, square neck) act as signifiers for
the concept smart: is it only one feature which carries
the meaning, or conversely do non-signifying elements
come together and suddenly create meaning as soon as
they are combined?
To address this fundamental question, we present a model that
learns the relation between fashion design elements and fashion
styles. This model adapts a natural language processing
technique, polylingual topic modeling, to learn latent
fashion concepts jointly in two languages: a style language
used to describe outfits, and an element language used to la-
bel clothing items. This model answers Barthes’ question:
identifying the elements that determine styles. It also powers
automated personal stylist systems that can identify people’s
styles from an outfit they assemble, or recommend items for
a desired style (Figure 1).
We train a polylingual topic model (PLTM) on a dataset of
over half a million outfits collected from a popular fashion-
based social network, Polyvore. Polyvore outfits have both
free-text outfit descriptions written by their creator and item
labels (e.g., color, material, pattern, designer) extracted by
Polyvore. These two streams of data form a pair of parallel
documents (style and element, respectively) for each outfit,
which comprise the training inputs for the model.
Each topic in the trained PLTM corresponds to a pair of dis-
tributions over the two vocabularies, capturing the correspon-
dence between style and element words (Figure 2). For exam-
ple, the model learns that fashion elements such as “black,”
“leather,” and “jacket” are often signifiers for styles such as
“biker” and “motorcycle.”
We validate the model using a set of crowdsourced, percep-
tual tasks: for example, asking users to select the set of words
in the element language that is the best match for a set in the
style language. These tasks demonstrate that the learned top-
ics mirror human perception: the topics are semantically co-
herent and translation between elements and styles is mean-
ingful to users.
This paper motivates the choice of model, describes both the
Polyvore outfit dataset as well as the training and evaluation
of the PLTM, and illustrates several resultant fashion applica-
tions. Using the PLTM, we can explain why a clothing item
fits a certain style: we know whether it is the collar, color,
material, or item itself that makes a sweater “smart.” We can
build data-driven style quizzes, predicting style preferences
from a user’s outfit. We even describe an automated personal
stylist which can provide outfit recommendations for a de-
sired style expressed in natural language. Polylingual topic
modeling can help us better our understanding of fashion the-
ory, and support a rich new set of interactive fashion tools.
As more fashion data has become available online, re-
searchers have built data-driven fashion systems that process
large-scale data to characterize styles [6, 11, 12, 20, 23],
recommend outfits [21, 22, 27, 32], and capture changing
trends [8, 10, 29]. Several projects have used deep learn-
ing to automatically infer structure from low-level (typically)
vision-based features [15, 28]. While these models can predict
whether items match or whether an outfit is “hipster,”
they cannot explain why. For many applications, models
predicated on human-interpretable features are more useful
than models that merely predict outcomes [9]. For example,
when a user looks for a “party” outfit, an explanation like
“this item was recommended because it is a black miniskirt”
helps her understand the suggestion and provide feedback to
the system.
STYLE: summer, vintage, beach, american, relaxed, retro, unisex
ELEMENT: short, denim, highwaisted, shirt, top, cutoff, form, distressed
STYLE: biker, motorcycle, vintage, summer, college, varsity, military
ELEMENT: jacket, black, leather, shirt, zip, denim, sleeve, faux
STYLE: prom, occasion, special, party, holiday, bridesmaid
ELEMENT: dress, shoe, cocktail, evening, mini, heel, costume
STYLE: party, summer, night, sexy, vintage, fitting, botanical
ELEMENT: dress, mini, sleeveless, cocktail, skater, flare, out, lace, floral
Figure 2: Via polylingual topic modeling, we infer distributions over latent
fashion topics in outfits that capture the elements of fashion style. Fashion
elements like “jacket, black, leather” signify the “biker, motorcycle” style.
Conversely, fashion styles like “prom, special occasion” label groups of
elements such as “cocktail, mini, dress.”
This paper presents a fashion model that maps low-level el-
ements to high-level styles, adapting polylingual topic mod-
eling to learn correspondences between them [18]. Both sets
of features (elements and styles) are human interpretable, and
the translational capability of PLTMs can power applications
that indicate how design features are tied to user outcomes,
identifying people’s styles from the elements in their outfits
and recommending clothing items from high-level style descriptions.
In addition to their translational capabilities, PLTMs offer a
number of other advantages. Unlike systems built on discrim-
inative models [21, 27, 32], PLTMs support a dynamic set of
styles that grows with the dataset and need not be specified
a priori. Moreover, topic modeling represents documents as
distributions over concepts, allowing styles to coexist within
outfits rather than labeling them with individual styles [6, 11,
12, 20]. Finally, the model smooths distributions so that sys-
tems can support low frequency queries. Even though there
are no “wedding” outfits explicitly labeled “punk rock” in
our dataset, we can still suggest appropriate attire for such an
event by identifying high probability fashion elements associ-
ated with “wedding” (e.g., “white,” “lace”) and “punk rock”
(e.g., “leather”, “studded”), and searching for clothing items
which contain them.
To build a PLTM for fashion, we require data that contains
both style and element information. Researchers have studied
many sources of fashion data, from independent fashion
items [3] to objects with rough co-occurrence information [15]
to entire outfits captured in photographs [10, 11, 31]. Each
source has its own strengths, but most require parsing,
annotation, or the use of proxy labels. We take advantage
of Polyvore’s coordinated outfit data, where each outfit is
described in both a low-level element language and a high-level
style one.
Outfit description: Happy Valentine’s Day! Have a nice time with your boyfriends, and don’t forget about people who are alone (like me). The next few days will be in tones of romance, couples, blush colors. Have a nice weekend! Send warm hugs and love.
Item labels: Red cardigan, Long sleeve tops, Mango tops, Short sleeve shirts, White t shirt, Lightweight shirt, Mango shirt, Stack heel shoes, Oxford shoes, Retro sunglasses, Heart sunglasses, Hippie glasses
STYLE: happy, love, hugs, blush, valentines, boyfriends, warm, couples, romance, alone, weekend, retro, hippie
ELEMENT: red, short, sleeve, shirts, white, tshirt, mango, oxford, lightweight, stack, tops, heel, shoes, sunglasses, heart, cardigan, long
Figure 3: Polyvore outfits (left) are described at two levels: high-level
style descriptions (e.g., “#valentinesday”) and specific descriptions of the
items’ design elements (e.g., “red cardigan,” “lightweight shirt”). For each
outfit, we process these two streams of data into a pair of parallel documents
(right).
Polyvore is a fashion-based social network with over 20 mil-
lion users [25]. On Polyvore, users create collections of fash-
ion items which they collage together. Such collages are com-
mon in fashion: mood boards are frequently used “to com-
municate the themes, concepts, colors and fabrics that will
be used” in a collection [24]. True mood boards are rarely
“wearable” in a real sense, but on Polyvore collages typically
form a cohesive outfit.
Polyvore outfits are described at two levels: specific descriptions
of the items’ design elements (e.g., “black,” “leather,”
“crop top”) and high-level style descriptions, often of the outfit
as a whole (e.g., “punk”). We leverage these two streams of
data to construct a pair of parallel documents for each outfit,
which become the training inputs for the PLTM.
Polyvore Outfit Data
Polyvore outfit datasets contain an image of the outfit, a ti-
tle, text description, and a list of items the outfit comprises
(Figure 3). Titles and text descriptions are provided by users
and often capture abstract, high-level fashion concepts: the
use of the outfit; its appropriate environment, season, or even
mood. In addition, each outfit item has its own image and
element labels provided by Polyvore (Figure 3, bottom left).
These labels are typically low-level descriptions of the item’s
design elements, such as silhouette, color, pattern, material,
trim, and designer.
We collected text and image data for 590,234 outfits using a
snowball sampling approach [7] from Polyvore’s front page,
sampling sporadically over several months between 2013 and
2015. Our collection includes more than three million unique
fashion items, with an average of 10 items per outfit. We
collected label data for 675,699 of those items, resulting in a
repository of just over 4 million item labels.
Representing Outfits in Two Languages
With the outfit and item data collected, we create two vocab-
ularies to process outfits into parallel style and element docu-
ments. The style vocabulary is created by extracting terms
from the repository’s text data relating to style, event, oc-
casion, environment, weather, etc. Most of these words are
drawn from the text produced by Polyvore users since they
annotate outfits using high-level descriptors; however, we
also include Polyvore item labels that describe styles (e.g.,
“retro” sunglasses). We manually process the 10,000 most
frequent words from the title and description text to identify
words that should be added to the style vocabulary, keeping
hashtags such as “summerstyle” and discarding common En-
glish words that are irrelevant to fashion.
The element vocabulary is drawn from the repository’s set of
Polyvore item labels. We learn frequent bigrams, trigrams
and quadgrams such as “Oscar de la Renta” or “high heels”
via pointwise mutual information [14]. The element vocab-
ulary comprises these terms and any remaining unigram la-
bels not added to the style vocabulary. After processing the
repository’s text, the style vocabulary has 3106 terms and the
element vocabulary 7231.
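As a rough illustration of the multi-word term discovery described above, the following sketch scores adjacent word pairs by pointwise mutual information. It is a simplified sketch under stated assumptions, not the pipeline used in the paper: it handles bigrams only (the paper also learns trigrams and quadgrams), and the function name `pmi_bigrams`, the `min_count` threshold, and the toy labels are hypothetical.

```python
import math
from collections import Counter

def pmi_bigrams(label_lists, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI)."""
    unigrams, bigrams = Counter(), Counter()
    for words in label_lists:
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (a, b), count in bigrams.items():
        if count < min_count:
            continue  # assumed filter: rare pairs give unreliable PMI estimates
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        scores[(a, b)] = math.log(p_ab / (p_a * p_b))
    return scores

# Toy item labels, already tokenized (made up for illustration).
labels = [
    ["high", "heels"], ["high", "heels"], ["high", "heels"],
    ["black", "dress"], ["black", "boots"], ["red", "dress"],
]
scores = pmi_bigrams(labels)
print(scores)
```

Pairs whose PMI exceeds a chosen threshold would be merged into single vocabulary terms such as “high heels”; the threshold itself is a tuning choice that the paper does not specify.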
Using these vocabularies, we process each outfit’s text data
into a pair of parallel documents: one containing words from
the style vocabulary, and a second containing words from the
element vocabulary (Figure 3, right). Both documents describe
the same set of items in two different languages: an
outfit might be “goth” in the style language, but the words
used to describe it in the element language might be “black,”
“velvet,” and “Kambriel.” These parallel documents become
the training input for the PLTM.
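The document construction described above can be sketched as a simple vocabulary filter. This is a minimal sketch under assumptions: it pools the user-written text and the item labels before filtering, matches unigrams only (the element vocabulary in the paper also contains multi-word terms), and the function names and toy vocabularies are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and keep alphanumeric runs as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def to_parallel_documents(description, item_labels, style_vocab, element_vocab):
    """Split one outfit's text into a (style, element) document pair."""
    tokens = tokenize(description)
    for label in item_labels:
        tokens.extend(tokenize(label))
    style_doc = [t for t in tokens if t in style_vocab]
    element_doc = [t for t in tokens if t in element_vocab]
    return style_doc, element_doc

# Toy vocabularies (made up for illustration).
style_vocab = {"romance", "couples", "blush", "retro"}
element_vocab = {"red", "cardigan", "sunglasses"}

style_doc, element_doc = to_parallel_documents(
    "Tones of romance, couples, blush colors",
    ["Red cardigan", "Retro sunglasses"],
    style_vocab,
    element_vocab,
)
print(style_doc)
print(element_doc)
```

Note that “retro” comes from an item label but lands in the style document, mirroring the paper's observation that some Polyvore labels describe styles rather than elements.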
To capture the correspondence between the fashion styles and
elements exposed by the Polyvore dataset, we adapt polylin-
gual topic modeling. Polylingual topic modeling is a gener-
alization of LDA topic modeling that accounts for multiple
vocabularies describing the same set of latent concepts [18].
A PLTM learns from polylingual document “tuples,” where
each tuple is a set of equivalent documents, each written in a
different language. The core assumptions of PLTMs are that
all documents in a tuple have the same distribution over top-
ics and each topic is produced from a set of distributions over
words, one distribution per language.
We train a PLTM to learn latent fashion topics jointly over the
style and element vocabularies. The training input consists of
the repository of Polyvore outfits, where each outfit is repre-
sented by a pair of documents, one per language. The key in-
sight motivating this work is that these documents represent
the same distribution over fashion concepts, expressed with
STYLE: christmas, winter, fall, away, night, school
ELEMENT: sweater, coat, black, long, leather, wool
STYLE: prom, party, special, occasion, sexy, summer
ELEMENT: dress, shoe, mini, cocktail, sleeveless, lace
STYLE: beach, summer, band, swimming, bathing, sexy
ELEMENT: hat, swimsuit, top, black, beanie, bikini
STYLE: military, combat, army, cowgirl, cowboy, western
ELEMENT: boot, booty, black, ankle, lace, up
Top words for 4 topics with n=25
Figure 4: Four topics from a 25-topic PLTM represented by high proba-
bility words from both the style and element languages. Topics convey a
diverse set of fashion concepts: seasons (winter/fall), events (prom), envi-
ronments and activities (beach/swimming), and styles (military/western).
different vocabularies. Below we briefly outline the model,
referring the reader to Mimno et al. [18] for additional details.
Generative Process
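An outfit’s set of fashion concepts is generated by drawing a
single topic distribution from an asymmetric Dirichlet prior
\[ \theta \sim \mathrm{Dir}(\theta, \alpha), \]
where $\alpha$ is a model parameter capturing both the base measure
and concentration parameter.
For every word in both the style language $S$ and element language
$E$, a topic assignment is drawn
\[ z^S \sim P(z^S \mid \theta) = \prod_n \theta_{z^S_n} \quad \text{and} \quad z^E \sim P(z^E \mid \theta) = \prod_n \theta_{z^E_n}. \]
To create the outfit’s document in each language, words are
drawn successively using the language’s topic parameters
\[ w^S \sim P(w^S \mid z^S, \Phi^S) = \prod_n \phi^S_{w^S_n \mid z^S_n} \quad \text{and} \quad w^E \sim P(w^E \mid z^E, \Phi^E) = \prod_n \phi^E_{w^E_n \mid z^E_n}, \]
where the set of language-specific topics ($\Phi^S$ or $\Phi^E$) is
drawn from a language-specific symmetric Dirichlet distribution
with concentration parameter $\beta^S$ and $\beta^E$, respectively.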
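To make the generative story concrete, the sketch below samples one pair of documents with NumPy. It is an illustrative sketch only, not the MALLET implementation used for training: the function name `generate_outfit`, the toy sizes, and the prior values are assumptions, and words are represented by integer ids.

```python
import numpy as np

def generate_outfit(alpha, phi_style, phi_element, n_style, n_element, rng):
    """Sample one (style, element) document pair from the generative story."""
    theta = rng.dirichlet(alpha)  # one topic distribution shared by both documents
    docs = {}
    for name, phi, n_words in (("style", phi_style, n_style),
                               ("element", phi_element, n_element)):
        # Draw a topic assignment for every word position in this language.
        z = rng.choice(len(alpha), size=n_words, p=theta)
        # Draw each word id from the language-specific distribution of its topic.
        docs[name] = [int(rng.choice(phi.shape[1], p=phi[k])) for k in z]
    return theta, docs

rng = np.random.default_rng(0)
alpha = np.array([0.5, 0.3, 0.2])                     # toy asymmetric prior, 3 topics
phi_style = rng.dirichlet(np.full(5, 0.5), size=3)    # 3 topics x 5 style words
phi_element = rng.dirichlet(np.full(7, 0.5), size=3)  # 3 topics x 7 element words

theta, docs = generate_outfit(alpha, phi_style, phi_element, 4, 6, rng)
print(theta)
print(docs)
```

The key structural point is that `theta` is drawn once per outfit and reused for both languages, while each language keeps its own topic-word matrix.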
We fit PLTMs to the outfit document tuples using MALLET’s
Gibbs sampling implementation for polylingual topic model
learning [16]. To learn hyperparameters $\alpha$ and $\beta$, we use
MALLET’s built-in optimization setting. Each PLTM learns
a distribution over style words for each topic ($\Phi^S$), a distribution
over element words for each topic ($\Phi^E$), and a distribution
over fashion topics for each outfit in the training
set. Since choosing the optimal number of topics is a central
problem in topic modeling applications, we train a variety of
PLTMs with varying numbers of topics and conduct a series
of perceptual tests to select the most suitable one. Figure 4
illustrates topics drawn from a model trained with 25 topics,
expressing each topic in terms of high probability words in
both the style and element languages.
PLTMs were not originally intended to support direct trans-
lation between languages [18]. However, in domains where
word order is unimportant, given a document in one language,
PLTMs can be used to produce an equivalent document in a
different language by identifying high probability words. For
example, given a document $W^E$ in the element language, we
can infer the topic distribution $\theta$ for that document under the
trained model. Since the topic distribution for a document
will be the same in the style language, we can produce an equivalent
outfit in the style language, $W^S = \theta \cdot \Phi^S$, by identifying
high probability words in that language.
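The translation step can be sketched as a matrix product followed by a top-k selection. The sketch assumes that the topic distribution for the source document has already been inferred by the trained model; the function name `translate`, the toy vocabulary, and the probabilities are made up for illustration.

```python
import numpy as np

def translate(theta, phi_target, vocab_target, top_n=5):
    """Return the most probable target-language words for a topic distribution."""
    word_probs = theta @ phi_target            # mixture of topic-word distributions
    top = np.argsort(word_probs)[::-1][:top_n]  # indices of the largest probabilities
    return [vocab_target[i] for i in top]

# Toy model with 2 topics and 4 style words.
style_vocab = ["biker", "motorcycle", "prom", "party"]
phi_style = np.array([
    [0.50, 0.40, 0.05, 0.05],   # topic 0: biker-like
    [0.05, 0.05, 0.50, 0.40],   # topic 1: prom-like
])
theta = np.array([0.9, 0.1])    # assumed to be inferred from an element document

print(translate(theta, phi_style, style_vocab, top_n=2))
```

Because the result is a bag of high probability words rather than an ordered sentence, this form of translation is only appropriate where word order is unimportant, as noted above.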
To evaluate the trained PLTMs, we ran a set of crowdsourced
experiments. These perceptual tests validate the suitability
of the trained PLTMs for translation-based applications in a
controlled setting.
Topic Coherence
To measure topic coherence in each language, we adapted
Chang et al.’s intruder detection task [5]. The task requires
users to choose an “intruder” word that has low probability
from a set of the most-likely words for a topic. The extent to
which users are able to identify the intruder is representative of
the coherence of the topic.
We performed a grid search with PLTMs trained with between
10 and 800 topics. For each trained model, we sampled
up to 100 topics and found the 5 most probable words
for each. An intruder topic was chosen at random, and an
intruder word sampled from it. Mechanical Turk workers were
shown the six words and asked to choose the one that did not
belong (Figure 5).
Figure 5: To measure topic coherence, Mechanical Turk workers were
asked to detect the “intruder” word from a list of six words, where five of
them were likely in a topic and one was not. “Adidas” is the intruder in
this swimwear topic.
Figure 6: Results of the intruder detection experiments: users successfully identified intruders in both the element and style languages compared to a
baseline of random selection (dotted line). Peak performance for the style-based tasks occurs at a lower topic number than for the element-based tasks.
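The construction of a single intruder-detection question can be sketched as follows. This is a hedged sketch rather than the exact procedure used in the study: it picks the most probable word of the intruder topic that is not already among the prompt words, whereas the paper only states that the intruder word is sampled from a randomly chosen topic. The function name and toy distributions are hypothetical.

```python
import random
import numpy as np

def build_intruder_task(phi, vocab, topic, intruder_topic, rng, n_top=5):
    """Build one six-word question: five likely topic words plus one intruder."""
    top_words = [vocab[i] for i in np.argsort(phi[topic])[::-1][:n_top]]
    # Assumed selection rule: the intruder topic's most probable word
    # that does not already appear among the prompt words.
    ranked = np.argsort(phi[intruder_topic])[::-1]
    intruder = next(vocab[i] for i in ranked if vocab[i] not in top_words)
    options = top_words + [intruder]
    rng.shuffle(options)  # hide the intruder's position from the worker
    return options, intruder

vocab = ["swimsuit", "bikini", "beach", "hat", "top", "adidas", "sneaker", "track"]
phi = np.array([
    [0.30, 0.25, 0.20, 0.10, 0.10, 0.02, 0.02, 0.01],  # toy swimwear topic
    [0.01, 0.01, 0.02, 0.03, 0.03, 0.40, 0.30, 0.20],  # toy sportswear topic
])

options, intruder = build_intruder_task(phi, vocab, 0, 1, random.Random(0))
print(options, intruder)
```

With six options per question, a worker guessing uniformly at random would be correct one time in six, which is the random-selection baseline the results are compared against.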
Figure 6 shows the results of this task. Users were able to
identify intruder words in the element and style languages
with peak median accuracies of 66% and 50%, respectively,
significantly above the baseline of random selection at 16%.
The coherence peak for the element language occurred be-
tween 35 and 50 topics; the peak for the style language oc-
curred between 15 and 35.
In both tasks, accuracy was highest for a relatively small number
of topics. However, there is a tradeoff between semantic
coherence and fashion nuance. With fewer topics, the model
clusters fashion concepts with similar looking words and high
semantic coherence: “summer,” “summerstyle,” “summeroutfit,”
“summerfashion.” As the number of topics increases,
topics are split into finer-grained concepts, and the semantic
coherence within each topic falls off more quickly. Figure 7
illustrates this phenomenon, where the last topic shown in
Figure 4 has split into two (“western” and “military”).
STYLE: cowgirl, cowboy, western, vintage, rain, riding, winter
ELEMENT: boot, ankle, short, bootie, booty, brown, suede
STYLE: combat, military, army, seriously, florida, pretending
ELEMENT: boot, lace, up, black, combat, booty, laced, shirt
Top words for 2 topics with n=100
Figure 7: While the intruder detection results suggest using a small
number of topics, there is a tradeoff between semantic coherence and
fashion nuance. Although the semantic coherence within each topic falls
off more quickly, a model trained with 100 topics exhibits finer-grained
buckets (separate cowboy and military topics) than a 25-topic model.
Figure 8: To measure translational topic coherence, Mechanical Turk
workers were shown five likely words from a topic in one language and
asked to choose a row of words in the other language that was the best
match.
We also measured translational topic coherence through per-
ceptual tasks. Users were shown the top five words from a
topic in one language and asked to select the row of words
that best matched it in the other language (Figure 8). One
row of words was drawn from the same topic as the prompt,
while the other three were drawn at random from other top-
ics. Users were shown groups of words (rather than single
words) to provide a better sense of the topic as a whole [5].
We restricted this test to models with between 15 and 100
topics, since the word intrusion results showed highest topic
coherence in that range.
Figure 9 shows the results from this task. Performance was
similar in both translation directions, with a peak median
agreement with the model of 60% with prompts in the style
language, and a peak median agreement of 66% with prompts
in the element language, where the baseline of random selec-
tion is 25%. Accuracy is again highest for a relatively small
number (25–35) of topics.
Figure 9: Results of the translation experiments: performance was similar in both directions, with users successfully translating between element and
style terms compared to a baseline of random selection (dotted line). Accuracy is highest for a relatively small number (25–35) of topics.
We describe three translation-based fashion applications
powered by the trained PLTMs, illustrating how human-
interpretable features can lead to a richer understanding of
fashion style. We show how analyzing the topics learned by
the model can answer Barthes’ question. In addition, translat-
ing an outfit from an element document to a style one powers
a style quiz and translating from a style document to an ele-
ment one supports an automated personal stylist system.
Answering Barthes’ Question
To answer Barthes’ question, we can directly analyze the
learned topics (style concepts) to understand which features
(words in the element language) act as signifiers. For some
topics, the probability mass is concentrated in one fashion el-
ement; for others, the distributions are spread across several
features. By computing the entropy of the word distributions
in the element language,
\[ H = -\sum_i P(w_i) \ln P(w_i), \]
we can measure which topics are characterized by one (low
entropy) or several (high entropy) fashion elements.
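The entropy computation above can be sketched directly on a topic-word matrix. The function name `topic_entropy` and the two toy topics are assumptions for illustration; zero-probability words are treated as contributing nothing to the sum.

```python
import numpy as np

def topic_entropy(phi):
    """Entropy (in nats) of each row of a topic-word probability matrix."""
    phi = np.asarray(phi, dtype=float)
    # Replace zeros by 1 inside the log: ln(1) = 0, so those words add nothing.
    safe = np.where(phi > 0, phi, 1.0)
    return -(phi * np.log(safe)).sum(axis=1)

phi_element = [
    [0.97, 0.01, 0.01, 0.01],  # one dominant element, e.g. "dress"
    [0.25, 0.25, 0.25, 0.25],  # several equally important elements
]
entropies = topic_entropy(phi_element)
print(entropies)
```

Sorting topics by this value separates styles signified by a single element from styles that emerge only when several elements come together.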
Figure 10 (top) shows three topics that have low entropy: a
single word determines each style. The next three topics have
high entropy, with many equally-important features coming
together to create the style. For a “prom” style, “dress” alone
signifies; for a “winter” style, many signifiers (“leather”,
“long”, “black”, “wool”, “sweater”) come together.
Style Quiz
Fashion magazines often feature “style quizzes” that help
readers identify their style by answering sets of questions like
“you are most comfortable in: (a) long, flowing dresses; (b)
cable-knit sweaters; (c) a bikini” or selecting outfit images
they prefer. While these quizzes are fun, the style advice they
provide has limited scope and utility.
Figure 10: To answer Barthes’ question, we analyze each topic (style
concept) to understand which features (words in the element language)
act as signifiers. Some style concepts are determined by one
or two elements (low entropy); for others, several elements come together
to define the style (high entropy).
Figure 11: A style quiz that infers a user’s style from an outfit. We extract
labels for all the items in the outfit, infer a topic distribution for the ele-
ment document, and return high probability style words to the user. We
measure the confidence of the style predictions as the inverse of the topic
distribution’s entropy.
Applications built on our model can help users understand
their personal style preferences through an open-ended interaction
that returns a rich set of styles along with the model’s confidence
in those style labels.
Users capture their style by creating an outfit they like (Fig-
ure 11, left); the set of words for the items in the outfit forms a
document in the element language. We can then infer a topic
distribution for this document and find the highest-probability
words in the style language. We measure confidence for these
style labels by computing the inverse of the topic distribu-
tion’s entropy.
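The confidence measure can be sketched as the reciprocal of the entropy of an outfit's inferred topic distribution. The small `eps` floor that avoids division by zero for a single-topic outfit is an assumption of this sketch, as are the function name and the example distributions.

```python
import math

def style_confidence(theta, eps=1e-12):
    """Inverse entropy of a topic distribution; larger means a clearer style."""
    entropy = -sum(p * math.log(p) for p in theta if p > 0)
    return 1.0 / max(entropy, eps)  # eps is an assumed guard for zero entropy

focused = [0.90, 0.05, 0.05]   # one dominating style
mixed = [1 / 3, 1 / 3, 1 / 3]  # no single dominating style

print(style_confidence(focused), style_confidence(mixed))
```

An outfit that mixes disparate styles spreads its probability mass over several topics, so its entropy is high and its confidence is low.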
When an outfit draws from several topics at once, there is no
single dominating style. High entropy outfits sometimes ap-
pear to be a confusing mix of items; other times users seem
to intentionally mix two completely disparate styles (e.g., ro-
mantic dresses with distressed jean jackets). Indeed, the user
who created the lowest confidence outfit in the repository la-
beled it “half trash half angel,” evidently having exactly such
a juxtaposition in mind!
Automated Personal Stylist
While personal stylist systems can provide useful advice on
constructing new outfits or updating a user’s wardrobe, ex-
isting recommendation and feedback systems typically have
limited sets of styles [11, 21, 32] or must connect users to human
workers [4, 19, 26]. The learned PLTM allows users
to describe their fashion needs in natural language, just
as they would to a personal stylist, and see suggestions a
stylist might recommend.
We introduce a system that asks users to describe an event,
environment, occasion, or location for which they need an
outfit in free text. From this text description, the system ex-
tracts any words contained in the style vocabulary to produce
a new style document. Then, it infers a topic distribution
for this new document and produces a set of high-probability
words in the element language that fit that document. The top
25 such words are then taken as candidate labels, and com-
pared to each of the 675,669 labeled items in the database.
The system measures the goodness of fit of each item using
intersection-over-union (IOU) of the two sets of labels
\[ \mathrm{IOU}(l_i, l_j) = \frac{|l_i \cap l_j|}{|l_i \cup l_j|}. \]
The system ranks the items by IOU, groups the results by
most frequent label, and presents the resultant groups to the
user (Figure 12).
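The ranking step can be sketched with set operations. The sketch covers only the IOU scoring and sorting; the grouping of results by most frequent label is omitted, and the function names, item ids, and labels are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two label collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_items(candidate_labels, items):
    """Sort items by IOU with the candidate labels, best match first."""
    scored = [(iou(candidate_labels, labels), item_id)
              for item_id, labels in items.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by item id
    return scored

candidate_labels = {"dress", "mini", "lace", "cocktail"}
items = {
    "item_a": {"dress", "mini", "lace"},
    "item_b": {"boot", "leather"},
    "item_c": {"dress", "cocktail", "lace", "mini", "black"},
}
ranking = rank_items(candidate_labels, items)
print(ranking)
```

IOU penalizes items that carry many labels outside the candidate set, so a heavily labeled item does not outrank a closely matching one simply by covering more words.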
This paper presents a model that learns correspondences between
fashion design elements and styles by training polylingual
topic models on outfit data collected from Polyvore. Systems
built on this model can bridge the semantic gap in fashion:
exposing the signifiers for different styles, characterizing
users’ styles from outfits they create, and suggesting clothing
items and accessories for different needs.
One promising opportunity to extend the presented model is
to leverage streams of data beyond textual descriptors, in-
cluding vision-based, social, and temporal features. Train-
ing a joint model that uses computer vision to incorporate
both visual and textual information could well lead to a more
nuanced understanding of fashion style. Similarly, mining
Polyvore’s social network structure (e.g., likes, views, com-
ments) could enhance the model with information about the
popularity of fashion styles and elements [13, 30], or how
fashion trends form and evolve through time [8, 10, 29].
While the translation-based experiments described in the pa-
per validate the suitability of PLTMs for fashion applications
in a controlled setting, we are eager to perform more mean-
ingful user testing “in the wild.” Deploying the tools de-
scribed in the paper at scale and monitoring how they are
used would allow us to build more personalized and context-
aware models of fashion preference. The semantics of fash-
ion change by location, culture, and individual: the “decora”
style might not make sense outside of Japan; “western” outfits
might only be worn in the United States; individuals may not
agree on what constitutes “workwear.” Better understanding
how different users interact with our tools is a necessary first
step towards making them truly useful, and enabling them to
dynamically adapt to different people and contexts.
The framework presented in this paper is not limited to fashion.
Design artifacts in many domains contain latent concepts
that can be expressed with sets of human-interpretable features
capturing different levels of granularity [1, 17]. This
model also offers attractive capabilities: it can infer latent
concepts of a design, translate between different feature
representations, and even generate new artifacts. In the future,
we hope that this framework can power new applications in
domains like graphic design, 3D modeling, and architecture.

[Figure 12, example user inputs]
“I’m looking for officewear. I want it to convey that I’m serious, professional, powerful. I like workwear that’s modern, with clean lines, and even a bit edgy. And I’d like something a bit masculine. If I could wear menswear to the office, I probably would!”
“I’m in town for New York Fashion Week and I’d like to find something flashy, maybe a little funky, to wear to the shows. You know everyone’s out, watching the different groups, the runway-to-street crowd, the blogger-style crowd… Me, I’m more of a streetstyle, streetchic person. Just edgy enough, you know?”
“I need some clothes for a yoga retreat I’m doing next month. We’ll be up in the mountains in Colorado, enjoying the calming natural beauty. It is so beautiful up there in nature… and we’ll be running, doing yoga all day, sweating and finding zen...”
“I’d like to get some suggestions for a dressy, sparkly, special-occasion outfit: there’s a holiday party coming up that I’m going to. It’s a cold winter and I’m sure there will be rain or snow, but I’d still like to dress up in something stylish and chic.”
Figure 12: A personal stylist interface that recommends fashion items based on natural language input. We extract the style tokens from a user’s
description of an outfit, infer a topic distribution over the style document, and compute a list of high-probability words in the element language. Users are
shown items ranked by intersection-over-union over the top element words.
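The last two steps in the Figure 12 caption (computing high-probability element words from a topic distribution, then ranking items by intersection-over-union) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the two toy topics, the item tags, and the choice to score an element word by the topic-weighted sum of its per-topic probabilities. Topic inference itself (done with a PLTM in the paper) is not shown; `theta` is simply given.

```python
from typing import Dict, List, Sequence, Set, Tuple


def translate_to_elements(theta: Sequence[float],
                          element_topics: Sequence[Dict[str, float]],
                          n_words: int = 4) -> List[str]:
    """Return the n_words element words with the highest mixture probability.

    theta[k] is the (already inferred) weight of topic k for the style
    document; element_topics[k][w] is the probability of element word w
    under topic k. Both are hypothetical inputs for this sketch.
    """
    scores: Dict[str, float] = {}
    for weight, topic in zip(theta, element_topics):
        for word, prob in topic.items():
            scores[word] = scores.get(word, 0.0) + weight * prob
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:n_words]


def iou(a: Set[str], b: Set[str]) -> float:
    """Intersection-over-union of two word sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_items(top_words: Sequence[str],
               items: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Rank items by IoU between their element words and the top words."""
    query = set(top_words)
    scored = [(name, iou(query, words)) for name, words in items.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Toy data, invented for illustration only.
theta = [0.8, 0.2]
element_topics = [
    {"maxi": 0.4, "chiffon": 0.3, "floral": 0.2, "sandals": 0.1},
    {"leather": 0.5, "black": 0.3, "jacket": 0.2},
]
items = {
    "floral chiffon maxi dress": {"floral", "chiffon", "maxi", "dress"},
    "black leather jacket": {"black", "leather", "jacket"},
    "strappy sandals": {"strappy", "sandals"},
}

top_words = translate_to_elements(theta, element_topics)
for name, score in rank_items(top_words, items):
    print(f"{score:.2f}  {name}")
```

Using sets ignores word frequency within an item description, which matches the set-based definition of intersection-over-union; a weighted variant would be a different design choice.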
We thank P. Daphne Tsatsoulis for her early contributions to
this work, and David Mimno for his helpful discussions of the
1. Adar, E., Dontcheva, M., and Laput, G.
CommandSpace: modeling the relationships between
tasks, descriptions and features. In Proc. UIST (2014).
2. Barthes, R. The language of fashion. A&C Black, 2013.
3. Berg, T., Berg, A., and Shih, J. Automatic attribute
discovery and characterization from noisy web data. In
Proc. ECCV (2010).
4. Burton, M. A., Brady, E., Brewer, R., Neylan, C.,
Bigham, J. P., and Hurst, A. Crowdsourcing subjective
fashion advice using VizWiz: challenges and
opportunities. In Proc. ACCESS (2012).
5. Chang, J., Boyd-Graber, J., Wang, C., Gerrish, S., and
Blei, D. M. Reading tea leaves: how humans interpret
topic models. In Proc. NIPS (2009).
6. Di, W., Wah, C., Bhardwaj, A., Piramuthu, R., and
Sundaresan, N. Style Finder: fine-grained clothing style
detection and retrieval. In Proc. CVPR (2013).
7. Goodman, L. A. Snowball sampling. The Annals of
Mathematical Statistics (1961).
8. He, R., and McAuley, J. Ups and downs: modeling the
visual evolution of fashion trends with one-class
collaborative filtering. In Proc. WWW (2015).
9. Herlocker, J. L., Konstan, J. A., and Riedl, J. Explaining
collaborative filtering recommendations. In Proc. CSCW
10. Hidayati, S. C., Hua, K.-L., Cheng, W.-H., and Sun,
S.-W. What are the fashion trends in New York? In
Proc. MM (2014).
11. Kiapour, M., Yamaguchi, K., Berg, A., and Berg, T.
Hipster wars: discovering elements of fashion styles. In
Proc. ECCV (2014).
12. Kwak, I. S., Murillo, A. C., Belhumeur, P., Belongie, S.,
and Kriegman, D. From bikers to surfers: visual
recognition of urban tribes. In Proc. BMVC (2013).
13. Lin, Y., Xu, H., Zhou, Y., and Lee, W.-C. Styles in the
fashion social network: an analysis on In
Social Computing, Behavioral-Cultural Modeling, and
Prediction. Springer International Publishing, 2015.
14. Manning, C. D., and Schütze, H. Foundations of
statistical natural language processing. MIT Press,
15. McAuley, J., Targett, C., Shi, Q., and van den Hengel, A.
Image-based recommendations on styles and substitutes.
In Proc. SIGIR (2015).
16. McCallum, A. K. MALLET: a machine learning for
language toolkit, 2002.
17. Michailidou, E., Harper, S., and Bechhofer, S. Visual
complexity and aesthetic perception of web pages. In
Proc. SIGDOC (2008).
18. Mimno, D., Wallach, H. M., Naradowsky, J., Smith,
D. A., and McCallum, A. Polylingual topic models. In
Proc. EMNLP (2009).
19. Morris, M. R., Inkpen, K., and Venolia, G. Remote
shopping advice: enhancing in-store shopping with
social technologies. In Proc. CSCW (2014).
20. Murillo, A. C., Kwak, I. S., Bourdev, L., Kriegman, D.,
and Belongie, S. Urban tribes: analyzing group photos
from a social perspective. In Proc. CVPRW (2012).
21. Shen, E., Lieberman, H., and Lam, F. What am I gonna
wear?: scenario-oriented recommendation. In Proc. IUI
22. Simo-Serra, E., and Ishikawa, H. Fashion style in 128
floats: joint ranking and classification using weak data
for feature extraction. In Proc. CVPR (2016).
23. Song, Z., Wang, M., Hua, X.-S., and Yan, S. Predicting
occupation via human clothing and contexts. In Proc.
ICCV (2011).
24. Sorger, R., and Udale, J. The fundamentals of fashion
design. AVA Publishing, 2006.
25. Tam, D. Social commerce site Polyvore reaches 20M
social-commerce- site-polyvore- reaches-20m- users/,
26. Tsujita, H., Tsukada, K., Kambara, K., and Siio, I.
Complete fashion coordinator: a support system for
capturing and selecting daily clothes with social
networks. In Proc. AVI (2010).
27. Vartak, M., and Madden, S. CHIC: a combination-based
recommendation system. In Proc. SIGMOD (2013).
28. Veit, A., Kovacs, B., Bell, S., McAuley, J., Bala, K., and
Belongie, S. Learning visual clothing style with
heterogeneous dyadic co-occurrences. In Proc. ICCV
29. Vittayakorn, S., Yamaguchi, K., Berg, A., and Berg, T.
Runway to realway: visual analysis of fashion. In Proc.
WACV (2015).
30. Yamaguchi, K., Berg, T. L., and Ortiz, L. E. Chic or
social: visual popularity analysis in online fashion
networks. In Proc. MM (2014).
31. Yamaguchi, K., Kiapour, M. H., and Berg, T. Paper doll
parsing: retrieving similar styles to parse clothing items.
In Proc. ICCV (2013).
32. Yu, L.-F., Yeung, S.-K., Terzopoulos, D., and Chan, T. F.
DressUp! outfit synthesis through automatic
optimization. In Proc. SIGGRAPH Asia (2012).