Attentional Neural Mechanisms for Social Recommendations in
Educational Platforms
Italo Zoppis1, Sara Manzoni1, Giancarlo Mauri1, Ricardo Anibal Matamoros Aragon1, Luca Marconi2 and Francesco Epifania2
1Department of Computer Science, University of Milano Bicocca, Milano, Italy
2Social Things srl., Milano, Italy
Keywords: Social Networks, WhoTeach, Social Recommendations, Graph Attention Networks.
Abstract: Recent studies in the context of machine learning have shown the effectiveness of deep attentional mechanisms for identifying important communities and relationships within a given input network. These studies can be effectively applied in those contexts where capturing specific dependencies, while discarding useless content, is essential to make decisions and provide accurate inference. This is the case, for example, of current recommender systems that exploit social information as a clever source of recommendations and/or explanations. In this paper we extend the social engine of our educational platform “WhoTeach” to leverage social information for educational services. In particular, we report our work in progress on providing “WhoTeach” with an attention-based recommender system oriented to the design of programmes and courses for new teachers.
1 INTRODUCTION
Modern Recommender Systems (RS) use social information to identify the interests of a target user and provide reliable suggestions (Bonhard and Sasse, 2006). These “social” recommendations extend traditional approaches (e.g., collaborative filtering) by obtaining compelling information from complex networks of users and items (Zhou et al., 2012) to identify personal interests and implicitly forecast preferences on available items (Gupta et al., 2013; Schafer et al., 2007).
In this context, explainable AI is aimed at providing intuitive explanations for the suggestions and recommendations given by the algorithm (Zhang and Chen, 2018). Basically, this research addresses the problem of why certain recommendations are suggested by the applied models. The results of this research are progressively shortening the distance between social networks and recommender systems, changing the way people interact and the content they can share (Zhou et al., 2012). At the same time, different attempts in the current deep learning literature try to extend
deep techniques to deal with social data, recommendations and explanations. Initial work in this context used recursive networks to process structured data such as directed acyclic graphs (Frasconi et al., 1998; Sperduti and Starita, 1997). More recently, Graph Neural Networks (GNNs) and other machine learning techniques (Dondi et al., 2019; Dondi et al., 2016; Zoppis et al., 2019b) have been introduced as a generalization of recursive networks capable of handling more general classes of graphs (Gori et al., 2005; Scarselli et al., 2008). In this regard, emerging research on deep architectures focuses on how to bring out the relevant parts of a network to perform a given task (Veličković et al., 2017). Technically, this approach is known as the “attentional mechanism” for graphs, or “Graph Attention Networks” (GATs). First introduced in the deep learning community in order to access important parts of the data (Bahdanau et al., 2014), the attention mechanism has recently been successful in addressing a range of objectives (Lee et al., 2018). For example, it is worth citing (Chen et al., 2018), where explainable sequential recommendations are extracted by means of memory networks. Another interesting approach comes from capsule networks (Li et al., 2019), namely neural networks empowered with capsule structures to manage hierarchies.
In this article, building on this research, we extend the social engine of our educational platform (Zoppis et al., 2019a), called “WhoTeach”, with a graph attentional mechanism aimed at providing social recommendations for the design of new didactic programmes and courses. In more detail, we describe the “WhoTeach” platform and its social engine in Sec. 2. In Sec. 3 we consider the main theoretical aspects used in our framework. We introduce the attention-based architecture applied in this work in Sec. 4. In Sec. 5 we report the numerical experiments on a public dataset. Finally, we conclude the paper in Sec. 6 by describing future directions of this research.
2 WhoTeach
WhoTeach is an innovative e-learning system, aimed
at promoting the development of customized learn-
ing and training paths by aggregating and dissem-
inating knowledge created and updated by experts.
WhoTeach is conceived as a Social Intelligent Learn-
ing Management System (SILMS) and it is structured
around three main components:
1. The Recommender System, which helps experts and teachers to quickly assemble high-quality contents into courses: thanks to an intelligent analysis of the available material, it suggests to teachers the best resources, in any format, to include. Moreover, it provides students, employees and workers with intelligent suggestions. This way, they can find customized courses in real time, taking into consideration their personal profile, objectives and ambitions, along with rules and criteria defined by the organization they belong to.
2. The ”Knowledge co-creation Social Platform”,
which is a technological infrastructure based on
an integrated and highly interactive social net-
work. This component allows learners to interact
and cooperate in the learning path, exchanging in-
formation and mutual advice about the contents.
That amplifies their learning effort and motivates
them to get the best from it. Moreover, it provides
experts and teachers with feedback from learners
about clarity and comprehension of content and
with high-quality comments from domain experts
and other teachers (like a peer review). Conse-
quently, it enables the creation and the exchange
of knowledge inside organizations at no additional
cost. All is done in an informal, unstructured, in-
teractive and pervasive way.
3. The content repository, where contents from any course or training material, either proprietary or open, like the ones promoted by the European Union (for example EMMA or the Open Education portal), can be uploaded. Thanks to the use of metadata, the system can build a new, original and high-quality course, because WT can identify the right information needed and obtain it from any course (internal and/or external) already available. This mechanism enriches the content offering without additional cost, increasing the efficacy of the recommender system, which works on a much wider range of data coming from different information sources.
In the current complex and dynamic labour market, every organization needs to deploy a continuous, never-ending learning and training process in order to keep its workforce and stakeholders always updated, ready to leverage innovative technologies and new working paradigms, and proactive towards market changes. For the same reasons, training institutions need to provide constantly updated classes and materials so that students can acquire the skills and competences required by the market.
The challenge for all parties involved in the learning and training process is huge. Employees, workers and students (learners) feel the need to cope with these changes and want to quickly access the right, up-to-date and high-quality learning material. They would like to be guided through a specific learning path customized for their personal background, ambitions and objectives. They also require mutual interaction with their peers, teachers and subject matter experts, in order to exchange knowledge and experience as well as to give feedback on the different contents available.
Subject matter experts and teachers, on the other hand, always need to know which learning materials and contents are most requested by users and where to invest their time. They also seek high-level comparison and advice from their peers, such as other teachers, domain experts and even learners, in order to create learning material with the required quality standards that matches the learners' needs. To save time and be quick in updating or producing new courses, they would like to easily reuse still-valid contents and to be supported in selecting the right course structure.
All organizations and training institutions need to provide training to their learners in a continuous, pervasive way with constantly updated contents. The cost of training is already significant and will grow steadily if the accumulated knowledge of all learners is not leveraged and still-valid contents are not reused for new training material or courses. In addition, it becomes mandatory to easily integrate proprietary contents with free and publicly available training contents.
In this scenario, WhoTeach is thus a solution conceived for demanding users aiming to teach or to learn in highly dynamic and complex disciplinary environments. Specifically, WhoTeach is the result of the exploitation of the European project NETT, which was focused on the conception and design of a social learning solution for promoting and stimulating the diffusion of entrepreneurship knowledge and mindset across European countries. In order to continuously help teachers to conceive, organize and create effective and appropriate courses, WhoTeach provides them with suggestions through the recommender system. Thanks to its learning algorithms, the system identifies relations between the features of the didactic resources that may be relevant, according to the inquirer's needs. In this way, teachers and experts are supported with intelligent and consistent suggestions that reduce the high number of available resources. The system's intelligence progressively guides the user in the composition of the course, picking resources in any available format and quickly assembling them, to avoid frustration and waste of time. Moreover, the solution recommends to learners the best training that fits their background and their goals in terms of skill improvement and career/curriculum development in the organization they belong to, providing them with the most updated and customized content and training material. By means of the several interaction possibilities available in the social platform, learners can exchange their experience and feedback with experts and other users. Teachers and experts will then receive suggestions on how to improve their contents based on user comments, and on which contents require updating.
The learning algorithms allow WhoTeach to dynamically choose, organize and update contents, given the needs, the objectives and the feedback provided by users. Thanks to the use of metadata, the contents are homogeneously identified through a vector of parameters that represents them. Thereafter, the feedback of other users (learners or experts) allows each composition of vectors to be associated with a score. As a consequence, a dynamic decision-tree procedure leads to the satisfactory completion of the course, represented by specific branches of the decision trees. The learning material is organized into different knowledge areas related to disciplinary macro-areas. Contents are divided into resources, modules and courses. In particular, the resources can vary in their structure (wiki, discussion forum, eBook, etc.) and format (Word, PDF, etc.).
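Purely as an illustration of this idea, and not as part of the actual WhoTeach implementation, the following minimal Python sketch shows how a didactic resource could be represented by a metadata parameter vector and associated with an aggregated feedback score; all class names, fields and values are our own assumptions.

```python
from dataclasses import dataclass, field
from statistics import mean

@dataclass
class Resource:
    """Hypothetical representation of a didactic resource."""
    resource_id: str
    metadata: list[float]                                 # homogeneous vector of metadata parameters
    feedbacks: list[int] = field(default_factory=list)    # scores from learners/experts (1-5)

    def add_feedback(self, score: int) -> None:
        # Collect a user score for this composition of metadata parameters.
        self.feedbacks.append(score)

    @property
    def aggregated_score(self) -> float:
        # Average feedback associated with this metadata vector (0 if none yet).
        return mean(self.feedbacks) if self.feedbacks else 0.0

# Example usage
wiki = Resource("res-001", metadata=[0.2, 1.0, 0.0, 0.7])
wiki.add_feedback(4)
wiki.add_feedback(5)
print(wiki.aggregated_score)  # 4.5
```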
In order to strongly empower the social learning processes, especially in relatively new or highly dynamic learning contexts, the second macro-component of the platform structure is the so-called ”Knowledge co-creation Social Platform”, based on a social network providing multiple possibilities for interaction, material exchange and community creation, fostered around each disciplinary sector. Especially oriented to students, the platform helps them evolve their personal learning experience into a more collaborative and engaging one. The platform has standard social network tools (like blogs, chat, forums and messaging), plus some advanced features, like the following ones:
Definition of Communities. Communities can be created around each disciplinary sector, in particular for teachers, but even for interdisciplinary areas, connecting other different communities. Thematic communities can be freely created by teachers or experts. All communities are moderated by the master of the knowledge area. In this way, disciplinary aggregations lead to communities of practice and to knowledge exchange and improvement.
Sharing of Didactic Materials. Users can share all their didactic material, according to their needs. In particular, there is the possibility to share non-official material, without waiting for experts' or masters' approval. In this way, additional feedback can be acquired, thus making knowledge management and evolution faster.
Informal Communication among Users. Private or public feedback progressively helps teachers or experts in improving their materials. Feedback is stimulated within the knowledge areas but also across different areas or disciplines, so as to get insights for knowledge evolution and disciplinary innovation.
User Profiles. Teachers and students have the possibility to create and edit their own profile: there is a wide range of possibilities to share personal experience, academic background, interests, competences and personal or professional ambitions. Organizations can also interact directly, in order to stimulate talent selection and competence matching.
Groups and Forums. They offer several interactive features to stimulate cooperation among users. Besides directly sharing materials, teachers and students can create or share different kinds of content in a dynamic way (e.g. tests, links, videos, exercises, etc.).
Therefore, this kind of social platform is aimed at
stimulating various types of collaborations among
users. That gives rise to rich, efficient and fruitful
communities of practice to favor course design activ-
ities and allow peer-to-peer learning methodologies.
From a technical perspective, the system is based on a PHP shell piloting and empowering the customization of the Moodle platform, which serves as the base for a Content Management System, while Mahara is used to build the nested social platform. Mahara is a fully-featured web application for building an electronic portfolio. A user can create journals, upload files, embed social media resources from the web and collaborate with other users in groups. In Mahara, users can control which items and what information other users see within their portfolio. The Moodle system was chosen because of its high diffusion within basically any kind of training institution and because of the presence of a wide development community. The platform has then been integrated with social network features coming from Mahara, in order to introduce the meta-services previously described.
3 MAIN CONCEPTS AND
DEFINITIONS
Graphs (denoted by $G = (V, E)$) are theoretical objects widely applied to model the complex sets of relationships that typically characterize current networks. They consist of a set of “entities”, $V$ (vertices or nodes), and relationships between them, i.e., edges, $E$. In this paper, we use weighted graphs (graphs whose edges $\{i, j\} \in E$ have assigned weights $label(i, j)$), with an associated graph adjacency matrix, $A$, to indicate whether two vertices are connected by some edge, i.e., $A_{i,j} = label(i, j)$ if $\{v_i, v_j\} \in E$. Moreover, given a vertex $v \in V$, we indicate with $N(v) = \{j : \{v, j\} \in E\}$ the neighborhood of the vertex $v$. We will also indicate with $G[A]$ the graph induced by $A$. In order to summarize the relationships between vertices and capture the relevant information in a graph, embedding (i.e., the transformation of objects into lower dimensional spaces) is typically applied (Goyal and Ferrara, 2018). This approach allows the use of a rich set of analytical methods, offering deep neural networks the capability of providing different levels of representation. Embeddings can be defined at different levels: for example, at node level, at graph level, or even through different mathematical strategies. Typically, the embedding is realized by fitting the (deep) network's parameters using standard gradient-based optimization. The following definitions can be useful (Lee et al., 2018).
Definition 3.1. Given a graph $G = (V, E)$, with $V$ the set of vertices and $E$ the set of edges, the objective of node embedding is to learn a function $f: V \rightarrow \mathbb{R}^k$ such that each vertex $i \in V$ is mapped to a $k$-dimensional vector, $\vec{h}$.

Definition 3.2. Given a set of graphs, $\mathcal{G}$, the objective of graph embedding is to learn a function $f: \mathcal{G} \rightarrow \mathbb{R}^k$ that maps an input graph $G \in \mathcal{G}$ to a low dimensional embedding vector, $\vec{h}$.
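As a minimal illustration of Definition 3.1 (not the embedding scheme actually used in WhoTeach), the following PyTorch sketch maps each vertex index to a learnable $k$-dimensional vector; the toy graph size and dimension are our own assumptions.

```python
import torch
import torch.nn as nn

class NodeEmbedding(nn.Module):
    """Learnable mapping f: V -> R^k, one k-dimensional vector per vertex."""
    def __init__(self, num_vertices: int, k: int):
        super().__init__()
        self.table = nn.Embedding(num_vertices, k)  # rows are the vectors h_i

    def forward(self, vertex_ids: torch.Tensor) -> torch.Tensor:
        return self.table(vertex_ids)

# Example: a toy graph with 5 vertices embedded in R^3
f = NodeEmbedding(num_vertices=5, k=3)
h = f(torch.tensor([0, 4]))   # embeddings of vertices 0 and 4, shape (2, 3)
print(h.shape)
```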
4 A GRAPH ATTENTION MECHANISM FOR RECOMMENDER SYSTEMS
The principle according to which attention mechanisms play their role is to select the most relevant information among those available for the computation of the neural response. In other words, “attention” is essentially a way to non-uniformly weight the contributions of the input, or parts of it, in order to optimize the learning process for some specific task. There are many ways to obtain this result. In this paper we consider the case of node embedding–based attention, as proposed in (Veličković et al., 2017).
Let us consider a user/item relationship matrix $A$ and the corresponding weighted graph $G[A] = (V, E)$, whose edges are labeled with the scores attributed by users, $U \subseteq V$, to resources, $R \subseteq V$, and collected within $A$. Given a pair of vertices $(u, r)$, $u \in U$, $r \in R$, the induced graph representation of $A$ has an edge between user $u$ and resource $r$ in case $u$ applied $r$ and evaluated such a resource with the score $label(u, r)$. In this way, we have $label(u, r) = A_{u,r}$.
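A minimal sketch, on hypothetical data, of how such an induced weighted bipartite graph can be derived from a rating matrix $A$ is shown below (an illustration only, not the WhoTeach code): an edge $(u, r)$ exists whenever $A_{u,r}$ is non-zero, and its label is the score itself.

```python
import numpy as np

# Hypothetical user/item score matrix A: rows = users, columns = resources,
# entries in {0} ∪ {1,...,5}; 0 means "no rating", hence no edge.
A = np.array([[5, 0, 3],
              [0, 4, 0],
              [2, 0, 1]])

edges = {}  # (u, r) -> label(u, r) = A[u, r]
for u, r in zip(*np.nonzero(A)):
    edges[(int(u), int(r))] = int(A[u, r])

# Neighborhood of user u = rated resources; neighborhood of resource r = its raters.
N_user = {u: [r for (uu, r) in edges if uu == u] for u in range(A.shape[0])}
N_item = {r: [u for (u, rr) in edges if rr == r] for r in range(A.shape[1])}

print(edges)   # {(0, 0): 5, (0, 2): 3, (1, 1): 4, (2, 0): 2, (2, 2): 1}
print(N_user)  # {0: [0, 2], 1: [1], 2: [0, 2]}
```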
With the above notation, we conveniently adapt the
definition of “attention” reported in (Lee et al., 2018)
as follows.
Definition 4.1. Let $A$ be a user/item relationship matrix, and $G[A] = (U \cup R, E)$ its induced weighted graph, with vertices given by the union of the users, $U$, and the items, $R$, respectively. Given $(u, r)$, $u \in U$, $r \in R$, an attentional mechanism for $G$ is a function $a: \mathbb{R}^n \times \mathbb{R}^n \rightarrow \mathbb{R}$ which computes coefficients $e^{(l)}_{u,r} = a\big(\vec{h}^{(l)}_u, \vec{h}^{(l)}_r\big)$ across the pairs of vertices $(u, r)$, based on their feature representations $\vec{h}^{(l)}_u$, $\vec{h}^{(l)}_r$ at level $l$.

The coefficients $e^{(l)}_{u,r}$ are parameters which act by weighting the relevance of vertex $r$'s features to (user) $u$. Following (Veličković et al., 2017), we define $a$ as a feed-forward neural network with a learnable vector of parameters (i.e., weights) $\vec{a}$, updated together with the other network parameters according to standard optimization, and with a nonlinear LeakyReLU activation function. In this way, we have

$$e^{(l)}_{u,r} = \mathrm{LeakyReLU}\!\left(\vec{a}^{(l)T}\left[W^{(l)}\vec{h}^{(l)}_u \,\big\|\, W^{(l)}\vec{h}^{(l)}_r\right]\right), \qquad (1)$$
where $W$ is a learnable parameter matrix and $W^{(l)}\vec{h}^{(l)}_u \,\|\, W^{(l)}\vec{h}^{(l)}_r$ is the concatenation of the embedded representations of the vertices $u$, $r$.
The coefficients $e^{(l)}_{u,r}$ can be normalized over all elements in the neighborhood of $u$. For instance, we can apply the softmax function to obtain the following expression:

$$\alpha^{(l)}_{u,r} = \frac{\exp\big(e^{(l)}_{u,r}\big)}{\sum_{k \in N(u)} \exp\big(e^{(l)}_{u,k}\big)}.$$

At this point, the role of the (coefficients) $\alpha^{(l)}_{u,r}$ becomes fundamental for our extension. Let us consider the induced graph $G[A] = (U \cup R, E)$, and focus on some user (vertex) $u \in U$. When only the resources (items) around $u$ are considered, we can use the normalized (attention) coefficients $\alpha^{(l)}_{u,r}$ to compute a combination of the embedded resources $\vec{h}^{(l)}_r$ in $N(u)$ as follows:

$$\vec{h}^{(l+1)}_u = \sigma\left(\sum_{r \in N(u),\, r \in R} \alpha^{(l)}_{u,r} W^{(l)} \vec{h}^{(l)}_r\right) \qquad (2)$$

where $\sigma$ is a nonlinear vector-valued function (sigmoid). With this formulation, Eq. 2 provides the next-level embedding for user $u$, scaled by the attention coefficients which, in turn, can be interpreted as the relevance of the resources used by (user) $u$ (i.e., the resources in the neighborhood of $u$). Similarly to Eq. 2, the following quantity can be interpreted as the embedded representation of the resource $r$, scaled by the attention coefficients which weight the (representations of the) users who applied $r$ (i.e., the users in the neighborhood of $r$):

$$\vec{h}^{(l+1)}_r = \sigma\left(\sum_{u \in N(r),\, u \in U} \alpha^{(l)}_{u,r} W^{(l)} \vec{h}^{(l)}_u\right) \qquad (3)$$

In this way, the “GAT layer” returns, for each pair $(u, r) \in U \times R$, the embedded representation $\big(\vec{h}^{(l+1)}_u, \vec{h}^{(l+1)}_r\big)$. In our experiments we will consider only one level of embedding, i.e., $l = 1$.
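A minimal PyTorch sketch of Eqs. (1)-(3) for a bipartite user/resource graph is reported below. It is an illustration of the mechanism under simplifying assumptions (dense boolean adjacency mask, a single attention head, sigmoid aggregation as in Eqs. 2-3), not the implementation used for WhoTeach; all class, variable and parameter names are our own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BipartiteGATLayer(nn.Module):
    """One attention level over a user/resource bipartite graph (cf. Eqs. 1-3)."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)      # shared projection W^(l)
        self.a = nn.Parameter(torch.empty(2 * out_dim, 1))   # attention vector a^(l)
        nn.init.xavier_uniform_(self.a)
        self.leaky_relu = nn.LeakyReLU(0.2)

    def forward(self, h_u: torch.Tensor, h_r: torch.Tensor, adj: torch.Tensor):
        # h_u: (n_users, in_dim), h_r: (n_items, in_dim)
        # adj: (n_users, n_items) boolean mask, True where user u rated resource r
        Wu, Wr = self.W(h_u), self.W(h_r)
        d = Wu.size(1)
        # Eq. 1: e_{u,r} = LeakyReLU(a^T [W h_u || W h_r]), computed for all pairs
        e = self.leaky_relu(Wu @ self.a[:d] + (Wr @ self.a[d:]).T)
        e = e.masked_fill(~adj, float("-inf"))               # keep only existing edges
        alpha_u = F.softmax(e, dim=1)                        # normalization over N(u)
        alpha_r = F.softmax(e, dim=0)                        # normalization over N(r)
        # Eqs. 2-3: attention-weighted aggregation of the neighbors' projections
        h_u_next = torch.sigmoid(alpha_u @ Wr)               # (n_users, out_dim)
        h_r_next = torch.sigmoid(alpha_r.T @ Wu)             # (n_items, out_dim)
        return h_u_next, h_r_next

# Toy usage: 3 users, 4 resources, 8-dimensional input features
layer = BipartiteGATLayer(in_dim=8, out_dim=4)
adj = torch.tensor([[1, 0, 1, 0],
                    [0, 1, 0, 0],
                    [1, 1, 0, 1]], dtype=torch.bool)
h_users, h_items = layer(torch.randn(3, 8), torch.randn(4, 8), adj)
```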
4.1 A Stacked Architecture for
Social-based Recommendations
The attentional mechanism described in this paper was applied as a “base” layer (Module A) for the stacked architecture reported in Fig. 1. Two outputs are provided: $\vec{h}^{(l+1)}_i$, i.e., the embedded representation of user $i$'s scores, and $\vec{h}^{(l+1)}_j$, the embedded representation of resource $j$'s scores, respectively. The following details summarize this layer.
Architecture: described in the previous paragraph.
Input: Given the user/item matrix, $A = (s_{i,j})$, which contains the score $s_{i,j}$ for each user $i$ and resource (item) $j$, a training set of examples $T = \{((\vec{u}_i, \vec{r}_j), s_{i,j}) : 1 \leq s_{i,j} \leq 5\}$ was obtained by composing the vector of scores $\vec{u}_i$ (provided by user $i$ for each available resource) and the vector $\vec{r}_j$ (scores attributed by all users to resource $j$).
Output: For each $i$ and $j$, the embedded user-$i$ scores $\vec{h}^{(l+1)}_i$ and item-$j$ scores $\vec{h}^{(l+1)}_j$.
The two embeddings, $\vec{h}^{(l+1)}_i$ and $\vec{h}^{(l+1)}_j$, are then passed and combined through feed-forward levels (FFL) in order to obtain, using a final sigmoid-based activation, the score predicted for the user/item (input) pair $(i, j)$. The whole model is trained with the MSE loss and the SGD (stochastic gradient descent) optimizer. In particular, the following general architecture (Fig. 1a) was stacked on top of the attention layer.
Stacked layer (Module B in Fig. 1a).
Input: $\vec{h}^{(l+1)}_i$, $\vec{h}^{(l+1)}_j$.
Output: For each $i$ and $j$, the predicted score, $\hat{s}_{i,j}$, for user $i$ when choosing (resource) $j$.
FFL, Sub-mod. n.1: $\vec{h}^{(l+1)}_i$ as input + dense layer with ReLU activation function + $h_i$ as embedded output representation of user $i$'s scores.
FFL, Sub-mod. n.2: $\vec{h}^{(l+1)}_j$ as input + dense layer with ReLU activation function + $h_j$ as embedded output representation of resource $j$'s scores.
Operator Layer: $\vec{h}^{(l+1)}_i$ and $\vec{h}^{(l+1)}_j$ are combined (through a specific operator) to obtain the vector $(\vec{h}^{(l+1)}_i, \vec{h}^{(l+1)}_j)$.
Dense Layer: this final layer uses an output sigmoid activation. The value $\hat{s}_{i,j}$ assumed by the sigmoid function is then interpreted as the output score for user $i$ and resource $j$.
Note that the architecture described above shows a
general structure, and can provide different models
according to the type of operation applied by the con-
sidered “operator layer”.
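The stacked module just described can be sketched as follows. This is a minimal illustration under our own assumptions (dimensions, layer sizes and the choice of exposing the concatenation and Hadamard operators are ours), not the actual WhoTeach code.

```python
import torch
import torch.nn as nn

class StackedModuleB(nn.Module):
    """Feed-forward module stacked on top of the GAT layer (cf. Fig. 1a), sketched."""
    def __init__(self, emb_dim: int, hidden_dim: int, operator: str = "concat"):
        super().__init__()
        self.ffl_user = nn.Sequential(nn.Linear(emb_dim, hidden_dim), nn.ReLU())  # Sub-mod. n.1
        self.ffl_item = nn.Sequential(nn.Linear(emb_dim, hidden_dim), nn.ReLU())  # Sub-mod. n.2
        self.operator = operator
        in_dense = 2 * hidden_dim if operator == "concat" else hidden_dim
        self.dense = nn.Linear(in_dense, 1)                                       # final dense layer

    def forward(self, h_user: torch.Tensor, h_item: torch.Tensor) -> torch.Tensor:
        hi, hj = self.ffl_user(h_user), self.ffl_item(h_item)
        if self.operator == "concat":          # concatenation operator
            z = torch.cat([hi, hj], dim=-1)
        elif self.operator == "hadamard":      # element-wise product operator
            z = hi * hj
        else:
            raise ValueError("unknown operator")
        return torch.sigmoid(self.dense(z))    # predicted score in (0, 1)

# Toy usage: embeddings coming from the attention (GAT) layer
module_b = StackedModuleB(emb_dim=4, hidden_dim=8, operator="concat")
s_hat = module_b(torch.randn(16, 4), torch.randn(16, 4))   # (16, 1) predicted scores
```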
5 NUMERICAL EXPERIMENTS
The numerical experiments use a homogeneous set of data whose characteristics combine well with the requirements of the WhoTeach platform. These data come from the “Goodbooks” dataset (https://www.kaggle.com/zygmunt/goodbooks-10k), a large collection reporting up to 10,000 books and 1,000,000 ratings assigned by 53,400 readers.
Figure 1: System Architecture. A feed-forward based network (Module B) is stacked on top of the GAT layer (Module A). The embeddings of user $i$'s scores, $\vec{h}^{(l+1)}_i$, and item $j$'s scores, $\vec{h}^{(l+1)}_j$, are passed and combined (“operator layer” in Mod. B) with further embeddings (FFL sub-modules) in order to provide, through a sigmoid-based activation (output of the “Dense layer”), the predicted final suggestion.
In particular, numerical ratings, ranging from “1” to “5”, are given by users (readers) for each resource, in this case different sorts of books. The numerical experiments are planned to evaluate the capability of the attention-based models to reduce the error (loss function) between the reported and the predicted preference scores. To provide robust estimations, we sub-sampled the data using cross-validation. The models described in this paper were implemented using the PyTorch library (https://pytorch.org/) and then executed, using different parameters for early stopping and learning rate, on COLAB (https://colab.research.google.com/). In this work in progress, the attention-based model with the concatenation operator in the stacked layer (see Fig. 1) was compared with the following alternative models; a minimal sketch of the corresponding operator-layer variants is given after the list. Performances were averaged over the number of folds (10-fold cross-validation).
1. Dot product model.
Input: Training set $T = \{((\vec{u}_i, \vec{r}_j), s_{i,j}) : 1 \leq s_{i,j} \leq 5\}$, as described previously for the attention-based architectures.
Output: for each $i$, $j$, the score $\hat{s}_{i,j}$ recommended for user $i$ and resource $j$.
Loss Function: MSE; Optimizer: SGD.
Architecture: similar to the stacked architecture (with no attention). A dot product operation is applied to “aggregate” the embedded representations of $\vec{u}_i$ and $\vec{r}_j$. No learnable parameters are considered.
2. Element-wise product model (Hadamard product model).
Similar to the previous case (the dot product model) but with an element-wise product operation (the Hadamard product between vectors) computed by the Operator Layer. The result of the element-wise operation is passed to the final dense layer.
3. Concatenation model.
Similar to the dot product model but with a concatenation operation computed by the Operator Layer. In this case, the embedded representations of $\vec{u}_i$ and $\vec{r}_j$ are concatenated into a new latent vector and finally passed to a dense layer.
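The three baseline operators can be summarized by the following minimal sketch (an illustration only, not the authors' experimental code); each takes the two embedded vectors produced by the FFL sub-modules and aggregates them before the final dense layer.

```python
import torch

def dot_operator(u_emb: torch.Tensor, r_emb: torch.Tensor) -> torch.Tensor:
    # Scalar aggregation: no learnable parameters in the operator itself.
    return (u_emb * r_emb).sum(dim=-1, keepdim=True)

def hadamard_operator(u_emb: torch.Tensor, r_emb: torch.Tensor) -> torch.Tensor:
    # Element-wise product, passed as a vector to the final dense layer.
    return u_emb * r_emb

def concat_operator(u_emb: torch.Tensor, r_emb: torch.Tensor) -> torch.Tensor:
    # Concatenation into a new latent vector for the final dense layer.
    return torch.cat([u_emb, r_emb], dim=-1)

u, r = torch.randn(2, 8), torch.randn(2, 8)
print(dot_operator(u, r).shape, hadamard_operator(u, r).shape, concat_operator(u, r).shape)
# torch.Size([2, 1]) torch.Size([2, 8]) torch.Size([2, 16])
```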
Preliminary results are reported in Tab. 1. A general tendency towards a lower MSE loss is observed when the attention layer with concatenation is applied as the base module of the considered stacked layer.

Table 1: MSE comparison. Attention is applied with the concatenation operator at the stacked layer; Hadamard uses the element-wise product between the two vectors at the operator layer.

Model | Attention | Concatenate | Multiply | Hadamard
MSE   | 0.0389    | 0.0439      | 0.0437   | 0.0436
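For reference, the evaluation protocol described above (MSE loss, SGD, 10-fold cross-validation, ratings rescaled to the unit interval to match the sigmoid output) can be sketched as follows; the model builder, data handling and hyper-parameters are placeholders and not the actual experimental code.

```python
import torch
import torch.nn as nn
from sklearn.model_selection import KFold

def run_cv(model_builder, pairs: torch.Tensor, scores: torch.Tensor,
           epochs: int = 20, lr: float = 0.01, folds: int = 10) -> float:
    """Average test MSE over k folds; `scores` are ratings in [1, 5]."""
    targets = (scores - 1.0) / 4.0                 # rescale to [0, 1] for the sigmoid output
    fold_mse = []
    for train_idx, test_idx in KFold(n_splits=folds, shuffle=True).split(pairs):
        model = model_builder()                    # fresh model per fold
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        loss_fn = nn.MSELoss()
        for _ in range(epochs):                    # early stopping omitted for brevity
            opt.zero_grad()
            pred = model(pairs[train_idx]).squeeze(-1)
            loss = loss_fn(pred, targets[train_idx])
            loss.backward()
            opt.step()
        with torch.no_grad():
            test_pred = model(pairs[test_idx]).squeeze(-1)
            fold_mse.append(loss_fn(test_pred, targets[test_idx]).item())
    return sum(fold_mse) / len(fold_mse)           # performance averaged over the folds
```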
6 CONCLUSIONS
Online social spaces are rich sources of information for modern recommender systems. However, the success of reliable recommendations is related both to the capability of a framework to capture the social content and to the availability of effective information in the considered online social space. The work reported in this paper has focused on a recent “mechanism” formulation for learning on graphs with “attention” (Veličković et al., 2017). In particular, the proposed architecture intends to benefit from exploiting (with “attention” weights) the task-relevant parts of the graph in order to provide reliable social recommendations. As recently reported (Seo et al., 2017; Lu et al., 2018), these mechanisms constitute challenging solutions to provide users with effective (social) justifications for the suggestions that modern recommendation systems are able to offer. Our research will now pursue this target by adding to the social engine of our “WhoTeach” platform textual suggestions based on the computed “attention weights”, as defined in Sec. 4. The numerical results obtained in the experiments are encouraging.
The proposed framework outperforms feed forward-based networks when the attention layer is applied. Our future research in this context will also consider additional architectures and data for further evaluations and more general conclusions.
REFERENCES
Bahdanau, D., Cho, K., and Bengio, Y. (2014). Neural ma-
chine translation by jointly learning to align and trans-
late. arXiv preprint arXiv:1409.0473.
Bonhard, P. and Sasse, M. A. (2006). ’knowing me, know-
ing you’—using profiles and social networking to im-
prove recommender systems. BT Technology Journal,
24(3):84–98.
Chen, X., Xu, H., Zhang, Y., Tang, J., Cao, Y., Qin, Z., and
Zha, H. (2018). Sequential recommendation with user
memory networks. In Proceedings of the Eleventh
ACM International Conference on Web Search and
Data Mining, WSDM ’18, pages 108–116, New York,
NY, USA. ACM.
Dondi, R., Mauri, G., and Zoppis, I. (2016). Clique editing
to support case versus control discrimination. In Intel-
ligent Decision Technologies, pages 27–36. Springer.
Dondi, R., Mauri, G., and Zoppis, I. (2019). On the
tractability of finding disjoint clubs in a network. The-
oretical Computer Science.
Frasconi, P., Gori, M., and Sperduti, A. (1998). A general
framework for adaptive processing of data structures.
IEEE transactions on Neural Networks, 9(5):768–
786.
Gori, M., Monfardini, G., and Scarselli, F. (2005). A new
model for learning in graph domains. In Proceedings.
2005 IEEE International Joint Conference on Neural
Networks, 2005., volume 2, pages 729–734. IEEE.
Goyal, P. and Ferrara, E. (2018). Graph embedding tech-
niques, applications, and performance: A survey.
Knowledge-Based Systems, 151:78–94.
Gupta, P., Goel, A., Lin, J., Sharma, A., Wang, D., and
Zadeh, R. (2013). Wtf: The who to follow service at
twitter. In Proceedings of the 22nd international con-
ference on World Wide Web, pages 505–514. ACM.
Lee, J. B., Rossi, R. A., Kim, S., Ahmed, N. K., and Koh, E.
(2018). Attention models in graphs: A survey. arXiv
preprint arXiv:1807.07984.
Li, C., Quan, C., Peng, L., Qi, Y., Deng, Y., and Wu, L.
(2019). A capsule network for recommendation and
explaining what you like and dislike.
Lu, Y., Dong, R., and Smyth, B. (2018). Coevolutionary
recommendation model: Mutual learning between rat-
ings and reviews. In Proceedings of the 2018 World
Wide Web Conference, pages 773–782.
Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M.,
and Monfardini, G. (2008). The graph neural net-
work model. IEEE Transactions on Neural Networks,
20(1):61–80.
Schafer, J. B., Frankowski, D., Herlocker, J., and Sen, S.
(2007). Collaborative filtering recommender systems.
In The adaptive web, pages 291–324. Springer.
Seo, S., Huang, J., Yang, H., and Liu, Y. (2017). Inter-
pretable convolutional neural networks with dual local
and global attention for review rating prediction. In
Proceedings of the eleventh ACM conference on rec-
ommender systems, pages 297–305.
Sperduti, A. and Starita, A. (1997). Supervised neural net-
works for the classification of structures. IEEE Trans-
actions on Neural Networks, 8(3):714–735.
Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., and Bengio, Y. (2017). Graph attention networks. arXiv preprint arXiv:1710.10903.
Zhang, Y. and Chen, X. (2018). Explainable recommenda-
tion: A survey and new perspectives.
Zhou, X., Xu, Y., Li, Y., Josang, A., and Cox, C. (2012).
The state-of-the-art in personalized recommender sys-
tems for social networking. Artificial Intelligence Re-
view, 37(2):119–132.
Zoppis, I., Dondi, R., Manzoni, S., Mauri, G., Marconi, L.,
and Epifania, F. (2019a). Optimized social explana-
tion for educational platforms. In Int. Conf. on Com-
puter Supported Education, volume 1, pages 85–91.
Zoppis, I., Manzoni, S., and Mauri, G. (2019b). A com-
putational model for promoting targeted communica-
tion and supplying social explainable recommenda-
tions. In 2019 IEEE 32nd International Symposium
on Computer-Based Medical Systems (CBMS), pages
429–434. IEEE.
We present a graph-based approach to support case vs control discrimination problems. The goal is to partition a given input graph in two sets, a clique and an independent set, such that there is no edge connecting a vertex of the clique with a vertex of the independent set. Following a parsimonious principle, we consider the problem that aims to modify the input graph into a most similar output graph that consists of a clique and an independent set (with no edge between the two sets). First, we present a theoretical result showing that the problem admits a polynomial-time approximation scheme. Then, motivated by the complexity of such an algorithm, we propose a genetic algorithm and we present an experimental analysis on simulated data.