Enhancing Digital Literacy by
Multi-modal Data Mining of the Digital Lifespan
University of Surrey
Newcastle upon Tyne, UK
Kathryn M. Orzech
University of Dundee
University of Nottingham
ABSTRACT
Social media pervades modern digital society, yet despite its
regular use the general level of digital literacy and aware-
ness of online representations of self remain low. We re-
port progress on the RCUK DE funded ‘Charting the Digital
Lifespan’ project, which promotes digital literacy of social
media through design of novel technological interventions
that raise awareness of the digital personhood. We describe
progress towards automatic social media data-mining capa-
ble of categorising photographs and associated comments
into high-level semantic concept groups elicited from digital
anthropological study. We describe how this pilot system
has been incorporated into live trials of technologies that
visualize and encourage reﬂection upon digital personhood.
Categories and Subject Descriptors
D.4 [Human-centered computing]: Social Media
1. INTRODUCTION
Over two billion individuals world-wide, and 57% of the UK population, maintain digital reflections of their analogue lives through the lens of social media. The paths
of these digital and physical lives run in parallel, converg-
ing and diverging as they mediate personhood. The RCUK
Digital Economy Theme funded Charting the Digital Lifes-
pan (CDL) project is embarking upon the second half of a
two year interdisciplinary study exploring this space. A key
goal for CDL is to enhance the digital literacy of individu-
als by facilitating reﬂection upon their digital personhood.
Frequently individuals are unaware of the permanence of
their digital footprints online, which are often contributed
in an ad-hoc incremental manner without regard to the holis-
tic representations being constructed of their digital selves.
Repercussions can occur when these digital representations
are revealed to, or interact with, others in unexpected ways.
1.1 Methodology and Context
CDL studies drivers and use patterns of social media at
three life stages: emerging adults; new parents; and re-
tirees. An experience-centred design (ECD)  methodol-
ogy is adopted, in which technology probes  — hardware
or software devices — are introduced into individuals’ lives
enabling novel forms of reﬂection upon their digital person-
hood. The probes provoke individual reﬂection by present-
ing a visualisation of the digital personhood, instantiated
through automatic analysis (‘data mining’) of online social
media presence. The design of the technology probe itself is
informed by both semi-structured interviews conducted with
individuals also at that life-stage (digital anthropology ),
and exploratory workshops centred around design ﬁctions
that encourage critical speculation about future possibilities
involving digital personhood . The needs of the technol-
ogy probe drive technical innovation in social media data
mining, which forms the focus of this paper.
One concrete example of the CDL methodology was the ‘Ad-
mixed Portrait’ ; a visualisation aimed at promoting re-
ﬂection on online identity in the new parent user-group. For
this technology probe, a visual amalgam was generated by
data mining face portrait images recently posted by a par-
ticipant to their Facebook timeline (Fig. 1). New parents
were able to reﬂect upon how they are portrayed online,
through the nature and content of their posts (e.g. a trend
towards posts of their new child) and posts made by others.
This intervention was designed around new parent use cases
identiﬁed through anthropological study.
In our current instantiation with the emerging adult group,
we are encouraging reﬂection on the kinds of material pre-
viously shared on their Facebook newsfeed. An exploratory
user study was used to determine a set of nine representative
‘concept groups’, e. g. sport, friends and family (Sec. 3.1).
Computer Vision (CV) and Machine Learning (ML) algo-
rithms have been developed to automatically classify partic-
ipants’ social media posts into one or more of these groups.
Due to the challenges of automatic classification, only posts of photographs (visuals) with associated free-text comments were considered. Visualizations indicating the frequency
with which posts were made into each group were then shown
to participants to encourage reﬂection on the kinds of ac-
tivity that they were depicting through their digital per-
sonhood. We elaborate upon the CV/ML algorithms em-
ployed in performing the post classiﬁcation, and the results
obtained, in the remainder of this paper.
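As a concrete illustration, the frequency counts behind such a visualization can be sketched as follows. The concept-group labels follow Sec. 3.1; the classified posts are hypothetical examples, not project data:

```python
from collections import Counter

# The nine concept groups from Sec. 3.1; the classified posts below are
# hypothetical examples standing in for real classifier output.
CONCEPT_GROUPS = [
    "Art", "Attitude & Beliefs", "Family & Pets", "Food", "Friends",
    "Travel", "Celebrations", "Personal style", "Sports",
]

classified_posts = [
    ["Friends", "Celebrations"],   # a post may carry more than one group
    ["Food"],
    ["Friends"],
    ["Sports"],
    ["Family & Pets", "Friends"],
]

def group_frequencies(posts):
    """Count how often each concept group appears across classified posts."""
    counts = Counter(label for labels in posts for label in labels)
    # Keep zero-count groups so the visualization always shows all nine.
    return {g: counts.get(g, 0) for g in CONCEPT_GROUPS}

freqs = group_frequencies(classified_posts)
```

These per-group counts are what the participant-facing visualization renders.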
2. RELATED WORK
The identiﬁcation of semantic concepts within images (im-
age classiﬁcation) is a fundamental CV/ML challenge. Mod-
ern approaches tackle the problem using a three-stage pipeline.
First, a set of features are extracted from the image en-
coding texture and visual structure in a high dimensional
space. Commonly features are gradient domain (e. g. SIFT
, HOG ). Second, the feature space is simpliﬁed whilst
maintaining its ability to discriminate content. This is typi-
cally performed by clustering using k-means  or Gaussian
mixtures  to form a visual dictionary, and expressing all
features with respect to that dictionary. Third, the image
descriptors for each concept are used to train a classiﬁer us-
ing a set of training examples. More recently the pipeline
has been adapted to learn the classiﬁer and appropriate ﬁrst-
stage features simultaneously; so called ‘deep learning’ .
These frameworks have classically been applied to lab-scale datasets: a few tens of videos, or thousands of images, with
success rates approaching 70% for object recognition e. g.
PASCAL VOC . More recently large datasets such as
ImageNet , containing many hundreds of categories each
with tens of exemplars have been explored. In our setup
we face the complementary problem of only a few categories
but many thousands of diverse exemplars (high intra-class
variance) and sparse text data that previous, domain-constrained social classification work does not accommodate.
3. SOCIAL MEDIA CLASSIFICATION
We adopt a standard supervised classiﬁcation approach in
which a set of social media posts (SMPs) are marked up
manually by tagging any of the nine concepts present. A
fraction of this data is used to train our proposed classi-
ﬁer, and the remainder used to test the system to evaluate
accuracy. The train-test split is randomised and repeated
several times in a cross-validation framework to report a mean average precision (MAP) value for accuracy.
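A minimal sketch of this evaluation protocol, assuming a per-concept average precision computed over test SMPs ranked by classifier score (the scores, labels and split below are illustrative):

```python
import random
import statistics

def average_precision(scores, labels):
    """AP for one concept: rank test SMPs by classifier score (descending)
    and average the precision measured at each true-positive position."""
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    hits, precisions = 0, []
    for rank, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            hits += 1
            precisions.append(hits / rank)
    return statistics.mean(precisions) if precisions else 0.0

def random_split(smps, train_frac=0.5, seed=0):
    """One randomised train/test split; repeating over several seeds and
    averaging the per-concept APs yields the cross-validated MAP."""
    data = list(smps)
    random.Random(seed).shuffle(data)
    cut = int(len(data) * train_frac)
    return data[:cut], data[cut:]

# A ranking that places both true positives first scores AP = 1.0;
# swapping one negative above a positive lowers it to (1/1 + 2/3) / 2.
perfect = average_precision([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
mixed = average_precision([0.9, 0.8, 0.2, 0.1], [1, 0, 1, 0])
```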
3.1 Datasets and Concept Groups
Two datasets were harvested from Facebook using a bespoke
web crawler. The ﬁrst indexed private Facebook proﬁles of
∼20 participants in the emerging adult group. The second indexed publicly available profiles linked from emerging adults. Approximately 47k posts were harvested, and
∼6k usable records containing both photos and English text
comments were retained for the combined dataset. Manual
annotation was performed to establish ground-truth. For
public data, the CDL team annotated the data. For pri-
vate data, annotation was crowd-sourced from participants
contributing their data.
The nine concept groups for annotation and subsequent clas-
siﬁcation were scoped initially by CDL staﬀ, and subse-
quently reﬁned during 15 semi-structured interviews with
emerging adults conducted by the project anthropologist.
Participants volunteered category names, and these were
grouped after all interviews with emerging adults were com-
plete, aligning these with the preliminary concept groups
when appropriate. The resulting concept groups were: Art;
Attitude & Beliefs; Family & Pets; Food; Friends; Travel;
Celebrations; Personal style and self-imagery (e. g. selﬁe);
Sports. To collect private Facebook data and have it marked
up, the project anthropologist enlisted ﬁrst-year art students
to submit their Facebook image data and classify images donated by their peers to explore image categories. This activity took place in the context of a design project aimed at
creating a novel way to share digital photos. The students
had the option to opt-out of the data collection and classiﬁ-
cation, but most (22) participated, allowing a Facebook app
designed within the project to access their photos and asso-
ciated comments. A web interface presented random photos
to participants, who chose appropriate category labels.
3.2 Text mining
For each SMP a set of nouns, verbs and adjectives were
extracted from text comments provided by both the au-
thor and social media contributors. Topic discovery was
performed using LDA based on keyword co-occurrence ,
using the set of keywords present for each annotated concept
group. The result was a set of 10 representative keywords for
each concept group, determined by SMP content. A binary
feature space was defined over the resulting 9 × 10 dimensional set of keywords, with each dimension indicating the
presence of a word in SMP comments. Non-linear support
vector machines were trained in a one-vs-all pattern using
SMPs from the training class. A mean average precision (MAP) of 32.0% was obtained (Fig. 2, green).
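The binary feature construction can be sketched as follows. The keyword lists here are hypothetical stand-ins for the LDA-derived keywords (three groups of three words rather than nine groups of ten); in the full system a non-linear SVM per concept would then be trained on these vectors:

```python
# Illustrative keyword lists: the paper derives 10 LDA keywords per
# concept group; these groups and words are hypothetical examples.
CONCEPT_KEYWORDS = {
    "Food": ["dinner", "cake", "recipe"],
    "Sports": ["match", "goal", "training"],
    "Travel": ["airport", "beach", "holiday"],
}

# Flatten into one ordered vocabulary (9 x 10 = 90 dimensions in the paper).
VOCAB = [w for words in CONCEPT_KEYWORDS.values() for w in words]

def binary_features(comment_text):
    """One binary dimension per keyword: 1 if the keyword appears anywhere
    in the SMP's comment text, 0 otherwise."""
    tokens = set(comment_text.lower().split())
    return [int(w in tokens) for w in VOCAB]

vec = binary_features("Amazing cake at dinner tonight")
```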
3.3 Visual mining
Dense color-SIFT features were extracted from each pho-
tograph in the SMP training set, and 10% of these were
clustered using k-means to quantize the feature space into k = 2000 codewords. Histograms of codeword occurrence
were built for each training SMP yielding a Bag of Visual
Words  representation for the media, which were used
to train a non-linear support vector machine in a one-vs-all
pattern. A MAP of 25.3% was obtained (Fig. 2, blue).
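A minimal sketch of the quantization step, using 2-D toy descriptors in place of 128-D dense colour-SIFT features and k = 3 in place of k = 2000:

```python
import numpy as np

def kmeans(features, k, iters=10, seed=0):
    """Minimal Lloyd's k-means building a visual dictionary of k codewords.
    `features` is an (n, d) array of local descriptors."""
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(features[:, None] - centroids[None], axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            members = features[assign == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids

def bovw_histogram(image_features, centroids):
    """Quantise an image's descriptors against the dictionary and return a
    normalised codeword-occurrence histogram (the Bag of Visual Words)."""
    d = np.linalg.norm(image_features[:, None] - centroids[None], axis=2)
    assign = d.argmin(axis=1)
    hist = np.bincount(assign, minlength=len(centroids)).astype(float)
    return hist / hist.sum()

# Toy demo: 2-D points drawn around 3 cluster centres stand in for the
# dense colour-SIFT descriptors sampled from training photographs.
rng = np.random.default_rng(1)
feats = np.concatenate(
    [rng.normal(c, 0.1, (50, 2)) for c in ((0, 0), (5, 5), (0, 5))])
codebook = kmeans(feats, k=3)
hist = bovw_histogram(feats, codebook)
```

Each image's histogram then serves as its fixed-length descriptor for SVM training.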
3.4 Multimodal fusion
An iterative fusion method adapted from Zhang et al.’s Co-
Trade  was used to integrate both the trained Text and
Visual SVM classiﬁers to produce a combined classiﬁer in-
formed by both modalities. Training SMPs were partitioned
further, into a Training and Validation set on a 50:50 basis.
Validation set SMPs were labelled using the resulting classi-
ﬁers. A hyper-graph was built for each modality connecting
each training and validation SMP (node) with its K closest neighbours (in our experiments K = 5), as measured by Euclidean distance in the respective modality. Each node was
evaluated for label coherency with its immediate neighbours,
and ranked. An initially empty set of ‘coherent’ training
data was incrementally built by steadily adding the most
coherent nodes from the uniﬁed training and validation set,
until classiﬁcation accuracy over validation data reached a
maximum. This resulted in a single, overall classifier with performance comparable to (a fraction of a percentage point greater than) the better of the two uni-modal classifiers. This reflected the complementary performance of those independent classifiers (Fig. 2).
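The per-node label-coherency score that drives this ranking can be sketched as follows; a toy 2-D embedding stands in for the real modality feature spaces, and K = 2 is used here rather than K = 5:

```python
import numpy as np

def knn_indices(features, k):
    """Indices of each node's k nearest neighbours (Euclidean), self excluded."""
    d = np.linalg.norm(features[:, None] - features[None], axis=2)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

def label_coherency(features, labels, k):
    """Per-node coherency: fraction of a node's k nearest neighbours that
    carry the same (possibly classifier-predicted) label."""
    nbrs = knn_indices(features, k)
    labels = np.asarray(labels)
    return (labels[nbrs] == labels[:, None]).mean(axis=1)

# Toy demo: two tight clusters with one mislabelled point; the mislabelled
# node scores lowest and would enter the 'coherent' training set last.
feats = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels = [0, 0, 1, 1, 1, 1]   # node 2 is mislabelled
coh = label_coherency(feats, labels, k=2)
```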
4. DISCUSSION AND CONCLUSIONS
Social media is highly diverse, complicating eﬀorts to au-
tomatically categorise it. A system has been outlined to
attempt this through multi-modal fusion of visual and text
classiﬁers, yielding performance of ∼30% MAP on unstruc-
tured raw social media in the wild. Whilst encouraging, the
system does not yet generalise to accommodate unseen data
well – e. g. new keywords (‘Whiskey’ would not be regarded
as similar to the word ‘drink’, in the celebrations concept).
We are augmenting our system to incorporate WordNet as
a semantic distance measure in our text classiﬁer to enable
such inferences. Visual classiﬁcation is also a challenge, due
to noise in the training set and diﬃculty of obtaining a suf-
ﬁciently large quantity of training mark-up to counter the
high visual diversity. We are exploring domain adaptation
from web-based sources, and user-speciﬁc annotation (e. g.
in the form of user supplied ‘corrections’ to categorisations)
to improve accuracy. Our classification is applied within visualizations that facilitate reflection on the digital lifespan. As such, 'perfect' classification is not needed; rather, a 'mostly correct' result is acceptable for visualising the general trends in SMPs that tend to be the most effective form of data presentation within the technology probes we continue to explore within CDL.
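The WordNet augmentation mentioned above can be illustrated with a toy hypernym graph in place of the real WordNet taxonomy (in practice this might use NLTK's WordNet interface); the words and links below are illustrative only:

```python
# Toy hypernym graph standing in for WordNet: a path-based semantic
# distance lets 'whiskey' be matched to a keyword such as 'drink' even
# though the two surface forms never co-occur in training comments.
HYPERNYMS = {
    "whiskey": "liquor",
    "liquor": "drink",
    "wine": "drink",
    "drink": "substance",
}

def hypernym_chain(word):
    """Walk a word up through its hypernyms to the root."""
    chain = [word]
    while chain[-1] in HYPERNYMS:
        chain.append(HYPERNYMS[chain[-1]])
    return chain

def semantic_distance(a, b):
    """Path length through the lowest common hypernym; None if unrelated."""
    ca, cb = hypernym_chain(a), hypernym_chain(b)
    common = [w for w in ca if w in cb]
    if not common:
        return None
    lch = common[0]
    return ca.index(lch) + cb.index(lch)

d1 = semantic_distance("whiskey", "drink")   # 2 steps via 'liquor'
d2 = semantic_distance("wine", "whiskey")    # 3 steps via 'drink'
```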
Figure 1: The Ad-mixed Portrait, used to promote
reﬂection amongst new parents on their digital rep-
resentations of self on social media (Facebook) .
Figure 2: Result of independent modalities used to
classify SMPs in the wild: Text (green) and Visual
(blue) yielded independent accuracies of 32.0% and
25.3% MAP respectively, and exhibited complemen-
tary performance across the categories. This indicates value in pursuing a multi-modal SMP classifier.
ACKNOWLEDGMENTS
The Charting the Digital Lifespan (CDL) project is funded
by RCUK (EP/L00383X/1). The third author is addition-
ally supported by The Leverhulme Trust (ECF-2012-642).
REFERENCES
Internet access - households and individuals 2012. Technical report, Office of National Statistics, London, UK, February 2013.
 J. Bleecker. Design ﬁction: A short essay on design,
science, fact and fiction. Near Future Laboratory, 2009.
D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 2003.
 K. Chatﬁeld, K. Simonyan, A. Vedaldi, and
A. Zisserman. Return of the devil in the details:
Delving deep into convolutional nets. In Proceedings of
the British Machine Vision Conference (BMVC), 2014.
 N. Dalal and B. Triggs. Histograms of oriented
gradients for human detection. In Proceedings Intl.
Conf. on Computer Vision and Pattern Recognition
(CVPR), pages 886–893, 2005.
 T. Deselaers and V. Ferrari. Visual and semantic
similarity in imagenet. In Proceedings of the 2011
IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), pages 1777–1784, 2011.
H. A. Horst and D. Miller. Digital Anthropology. Berg, 2012.
 H. Hutchinson, W. Mackay, B. Westerlund, A. Druin,
C. Plaisant, M. Beaudouin-Lafon, S. Conversy,
H. Evans, H. Hansen, N. Roussel, and B. Eiderback.
Technology probes: inspiring design for and with
families. In Proceedings of ACM CHI, pages 17–24, 2003.
D. Lowe. Object recognition from local scale-invariant
features. In Proceedings Intl. Conf. on Computer
Vision (ICCV), pages 1150–1157, 1999.
 J. Sanchez, F. Perronnin, T. Mensink, and J. Verbeek.
Image classiﬁcation with the ﬁsher vector: Theory and
practice. Technical Report 8209, INRIA, Project
LEAR, Grenoble, France, May 2013.
 J. Sivic and A. Zisserman. Video Google: Eﬃcient
visual search of videos. In J. Ponce, M. Hebert,
C. Schmid, and A. Zisserman, editors, Toward
Category-Level Object Recognition, volume 4170 of
LNCS, pages 127–144. Springer, 2006.
 D. Trujillo-Pisanty, A. Durrant, S. Martindale,
S. James, and J. Collomosse. Admixed portrait:
Reﬂections on being online as a new parent. In
Proceedings of ACM Designing Interactive Systems
(DIS), pages 503–512. ACM Press, 2014.
 P. Wright and J. McCarthy. Experience-Centred
Design: Designers, Users and Communities in
Dialogue. Morgan & Claypool, 2010.
 M. Zhang and Z. Zhou. Cotrade: Conﬁdent
co-training with data editing. IEEE Transactions on
Systems, Man, and Cybernetics, Part B: Cybernetics,
41(6), December 2011.