Content uploaded by Vinicius Cardoso Garcia
Author content
All content in this area was uploaded by Vinicius Cardoso Garcia on Jul 20, 2015
Content may be subject to copyright.
A Cloud-based Recommendation Model
Ricardo B. Rodrigues
Informatics Center - Federal
University of Pernambuco
Recife, Pernambuco, Brazil
rbr@cin.ufpe.br
Frederico A. Durão
Institute of Mathematics -
Federal University of Bahia
Salvador, Bahia, Brazil
freddurao@dcc.ufba.br
Vinicius C. Garcia
Informatics Center - Federal
University of Pernambuco
Recife, Pernambuco, Brazil
vcg@cin.ufpe.br
Carlo M. R. Silva
Informatics Center - Federal
University of Pernambuco
Recife, Pernambuco, Brazil
cmrs@cin.ufpe.br
Rafael R. Souza
Informatics Center - Federal
University of Pernambuco
Recife, Pernambuco, Brazil
rafaelmarlin@gmail.com
Rodrigo E. Assad
Usto.re
Recife, Pernambuco, Brazil
rodrigo@usto.re
ABSTRACT
Recommendation systems aim to minimize information
overload by helping users search for desired information.
Faced with this scenario, we investigate cloud factors
that can positively influence the generation of recommendations.
We present a new, simple model based on cloud features,
combined with the content-based recommendation technique.
Its practical application in cloud data storage environments
makes the best use of cloud resources while meeting users' preferences.
Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Information
filtering.
Keywords
Recommendation System, Cloud-based, Cloud Storage.
1. INTRODUCTION
With the advent of cloud computing, cloud storage systems
have emerged that enable their users to store files in
the cloud. With the increasing use of these systems, the mass
of data stored in the cloud has become impossible for humans
to process. This conceals relevant information from users, who
fail to discover new content because they have no effective
means of filtering the data in search of relevant knowledge
that meets their expectations.
In this scenario, recommendation systems become an alternative
that helps users decide which files to choose and filter
relevant information from a multitude of data.
Recommender systems (RS) are software programs and
techniques that provide suggestions of items to users [5].
These systems are part of our lives: we receive
recommendations daily by email or on web pages, and many
stores and platforms, such as Amazon and Barnes & Noble,
provide recommendation services. There are two predominant
approaches to building RS: Collaborative Filtering (CF) and
Content-Based Filtering (CB). CF systems recommend items
liked by other users whose characteristics or tastes are
similar to those of the user, for example, as reflected in
their profiles on a social network. CB systems recommend
items similar to those in which the user expressed interest
in the past. To do so, the system analyzes the content
descriptions of the items rated by the user to build his
profile, which is then used to filter the remaining items
in the base [2] [6].
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee. Request permissions from Permissions@acm.org.
EATIS’14, April 02 - 04 2014, Valparaiso, Chile.
Copyright 2014 ACM 978-1-4503-2435-9/14/04 ...$15.00.
http://dx.doi.org/10.1145/2590651.2590673
Recommendation systems aim to reduce information
overload by filtering items based on user interests. Of the
several existing techniques for this task, the approach used
in this article is Content-Based Filtering (CB), which relies
on files in which the user has shown interest in the past [6].
This paper presents a recommendation model for cloud data
storage environments. Recommendations are generated using
the CB technique together with cloud factors. The purpose of
the proposed recommendation model is to recommend to the user
files that are similar to his preferences and that satisfy the
cloud factors, so that a recommended file is always available
and accessible in the cloud storage, the time spent downloading
it is proportionally reduced, and relevant content is filtered
from the vast amount of data available in the cloud.
This paper is organized as follows: section 2 presents re-
lated works; section 3 presents the proposed model; section
4 presents results; and section 5 presents the conclusions and
future work of this study.
2. RELATED WORKS
Few studies in the literature discuss issues concerning
recommendation systems in the cloud. In this section, we
introduce some of them, emphasizing their similarities to and
differences from the recommendation model proposed in this
research. Lee et al. (2010) [4] propose an RS that uses data
stored in the cloud to provide recommendations of files stored
in the cloud, a solution whose scope differs from that of this
work, which aims to use cloud factors to ensure the availability
of the recommended files to users. Lai et al. (2011) [3] present
the work closest to the purpose of this research. They propose
an RS of TV shows in the cloud with the aim of offering a
scalable system with a high rate of availability.
The model proposed in this research uses cloud factors to
generate recommendations, which guarantees the availability
of the recommended file, saves time spent downloading
recommended files, and ensures that the recommendations
meet the user’s preferences.
3. THE RECOMMENDATION MODEL
The recommendation process modeled in this paper combines
a recommendation technique with characteristics of the cloud.
The proposed recommendation model is composed of five factors
that come from the cloud and involve the metadata of the files
stored in the cloud. The factors were defined based on the
observation of cloud storage environments and of user priorities
in these environments. The factors are:
• Similarity
• Availability
• Download Rate
• File Size
• File Relevance
Below we detail each factor and its respective calculation:
Similarity Factor: This factor addresses the user's
preferences, as it calculates the similarity between the
content of a file for which the user has demonstrated a
preference and the content of the public files stored in the
cloud, which are the candidates for recommendation. The
similarity between the contents is obtained by the cosine
similarity technique, which returns a value between 0 (zero)
and 1 (one) for non-negative term-weight vectors [1]. Cosine
similarity is given by Equation 1:
$$S_t = \cos(\theta) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} \qquad (1)$$
To calculate the similarity between two vectors A and B, we
compute their scalar (dot) product A · B and the magnitudes
‖A‖ and ‖B‖ of the two vectors; the scalar product is then
divided by the product of the two magnitudes.
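As a minimal sketch of Equation 1, assuming each file's content is represented as a bag-of-words term-frequency vector (the paper does not specify the text representation; the whitespace tokenizer `term_vector` below is a hypothetical simplification), the cosine similarity can be computed as follows. Because term counts are non-negative, the result stays between 0 and 1.

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    """Hypothetical representation: lowercase whitespace tokens with raw counts."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """St = (A . B) / (||A|| * ||B||); returns 0.0 if either vector is empty."""
    # Missing keys in a Counter read as 0, so only shared terms contribute.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


preferred = term_vector("cloud storage recommendation")
candidate = term_vector("recommendation in cloud storage systems")
print(round(cosine_similarity(preferred, candidate), 3))  # 0.775
```

Here the two vectors share three terms, so the dot product is 3 and the magnitudes are √3 and √5, giving 3/√15 ≈ 0.775. A production system would more likely use TF-IDF weights, which the same function accepts unchanged.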
The files that are similar to the file representing the user’s
preferences are ranked according to their degree of similarity:
the higher the similarity score, the better the file is ranked.
For example, if the RS contains two files, “file A” and “file B”,
that are similar to the user's preferences, with similarity
scores of 0.8 and 0.5 respectively, “file A” will be
ranked better than “file B” in similarity factor.
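The ranking step in the example above amounts to sorting the candidates by descending similarity score. A short sketch using the example's scores (`rank_by_similarity` is a hypothetical helper name); it also drops zero-similarity files, since the model does not recommend a file whose similarity to the user's preferences is 0:

```python
def rank_by_similarity(scores: dict[str, float]) -> list[str]:
    """Return candidate file names ordered from most to least similar."""
    # Files with zero similarity are excluded: the model never recommends them.
    candidates = {name: s for name, s in scores.items() if s > 0}
    return sorted(candidates, key=candidates.get, reverse=True)


print(rank_by_similarity({"file A": 0.8, "file B": 0.5}))  # ['file A', 'file B']
```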
The similarity between the files to be recommended and
a file in which the user has demonstrated interest is essential
in this recommendation model, which aims to meet user
preferences when filtering relevant content from a large mass
of data.
Availability Factor: The availability factor refers to the
time during which a file is available to the user. In this model,
availability is measured in hours, as the number of hours that
the peer storing a file to be recommended is available on the
network. A file should only be recommended to the user if the
peer that stores it is online, making the file available for
download. The availability factor is calculated by Equation 2:
$$D_p = h \cdot \frac{1}{n} \qquad (2)$$
In calculating the availability factor, “h” is the number of
hours that the machine storing the file is available on the
network, and “n” is the maximum number of hours a machine can
be available on the network if it stays online all day, so “n”
is equal to 24 (hours). The number of hours of availability is
thus normalized to a value between 0 and 1. The following
example shows how the availability factor contributes to
generating a recommendation. Consider two similar files, A
and B: “file A” is stored on peer 1, which is available on the
network between 14 and 16 hours, totaling two hours of
availability; “file B” is stored on another peer that is
available on the network from 14 to 18 hours, totaling 4 hours
of availability. In this case, “file B” will be ranked better,
as it will be available on the network for longer than “file A”,
allowing the download to occur in a wider window of time. The
main objective is to reduce the risk of the user being unable to
perform the download, so that a recommended file is always
accessible to the user.
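A minimal sketch of Equation 2 applied to the example above, assuming each peer has a single daily online window given in whole hours on a 24-hour clock that does not cross midnight (a simplification; the paper does not say how availability windows are recorded). The function name `availability_factor` is hypothetical.

```python
def availability_factor(online_from: int, online_until: int, n: int = 24) -> float:
    """Dp = h * (1 / n), where h is the number of hours the peer is online.

    Assumes one daily window within a single day (illustrative simplification).
    """
    h = online_until - online_from
    if not 0 <= h <= n:
        raise ValueError("online window must lie within one day")
    return h * (1 / n)


file_a = availability_factor(14, 16)  # peer 1: 2 hours online
file_b = availability_factor(14, 18)  # other peer: 4 hours online
print(round(file_a, 4), round(file_b, 4))  # 0.0833 0.1667
```

Since "file B" is online twice as long, its factor is twice that of "file A", which is what places it higher on this factor.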
Download Rate Factor: The Download Rate Factor refers
to the rate available for downloading a file from the cloud.
The goal is to rank files that reduce the time spent downloading
above other files. This factor contributes to reducing the time
spent downloading a recommended file in conjunction with the
“File Size” factor (explained next). For example, if two files
are similar to the user's preferences, “file A” with a size of
10 gigabytes and “file B” with a size of 2 gigabytes, and the
download rate is the same for both, “file B” will be ranked
better than “file A”, since it offers the greater saving in
download time. The download rate can change the ranking of
recommendations depending on when the recommendation is
calculated, especially in environments where the download rate
oscillates. This factor ranges from 0 to 12 megabits per second
(Mbps), where 12 Mbps represents the overall average download
rate. It is calculated by Equation 3:
$$T_d = \mathit{ns} \cdot \frac{1}{n} \qquad (3)$$
In Equation 3, Td is the Download Rate Factor and ns is the
available download rate in Mbps. Multiplying ns by 1/n, where
n = 12 Mbps is the global average download rate, normalizes the
factor to a value between 0 and 1.
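A minimal sketch of Equation 3, with a second helper that estimates transfer time to show how rate and file size interact. Both function names are hypothetical, and capping rates above n at 1.0 is an assumption made here so the factor stays in [0, 1]; the paper does not say how links faster than the global average are handled.

```python
def download_rate_factor(rate_mbps: float, n: float = 12.0) -> float:
    """Td = ns * (1 / n): ns is the available rate in Mbps, n = 12 Mbps."""
    if rate_mbps < 0:
        raise ValueError("download rate cannot be negative")
    # Assumption: rates above n are capped so the factor stays within [0, 1].
    return min(rate_mbps * (1 / n), 1.0)


def download_time_seconds(size_gb: float, rate_mbps: float) -> float:
    """Rough transfer time, assuming 1 GB = 8,000 megabits (decimal units)."""
    return size_gb * 8_000 / rate_mbps


rate = 6.0  # hypothetical rate shared by both files
print(round(download_rate_factor(rate), 2))    # 0.5
print(round(download_time_seconds(10, rate)))  # 13333 (10 GB file)
print(round(download_time_seconds(2, rate)))   # 2667 (2 GB file)
```

At the same rate, the 2 GB file finishes about five times sooner than the 10 GB one, which is why the rate factor is only meaningful together with file size.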
File Size Factor: This factor corresponds to the size
of the candidate file and aims to help reduce the time spent
downloading a recommended file. As explained above in “Download
Rate Factor”, the File Size Factor is directly related to the
factor that measures the available download rate.
The ranking of recommendations changes according to the rate
available for download: if the download rate is low, smaller
files should be ranked better than larger ones; likewise, when
the download rate is high, larger files should be ranked better.
Here is an example of the ranking for this factor: “file A” is
similar to “file B”; “file A” has a size of 9 gigabytes and
“file B” a size of 2 gigabytes. Thus, when the download rate is
low, “file B” will be ranked better, since its smaller size offers
better conditions for completing the download. This factor is
calculated by Equation 4:
$$S = T \cdot \frac{1}{n} \qquad (4)$$
The “File Size Factor” is represented by S, and T is the size
of the file to be recommended. File size is measured in gigabytes
(GB) because most cloud storage systems express in gigabytes both
the maximum size of a file that can be saved in the cloud and the
space available to the user. The file size T is multiplied by 1/n,
where n is the maximum size of a file that can be stored in the
storage system, which normalizes the factor to a value between
0 and 1.
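A minimal sketch of Equation 4 for the 9 GB and 2 GB files in the example. The paper does not state the maximum file size n, so the 10 GB limit below is a hypothetical value, as is the function name `file_size_factor`.

```python
def file_size_factor(size_gb: float, max_size_gb: float) -> float:
    """S = T * (1 / n): T is the file size in GB, n the largest file size the
    storage system accepts, so S falls in [0, 1]."""
    if not 0 <= size_gb <= max_size_gb:
        raise ValueError("file size must be between 0 and the system maximum")
    return size_gb * (1 / max_size_gb)


# Hypothetical 10 GB upload limit; the paper gives no value for n.
print(round(file_size_factor(9, 10), 2), round(file_size_factor(2, 10), 2))  # 0.9 0.2
```

A larger S means a heavier penalty in the final score, so the 2 GB file is favored on this factor.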
File Relevancy Factor: This factor represents the social
importance of a file in the cloud, determined by the number
of downloads of that file. The higher the number of downloads
of a file, the greater its popularity on the network and the
better its position in the ranking. Here is an example of the
ranking for this factor: “file A” is similar to “file B”;
“file A” has had 10 downloads and “file B” 16 downloads. Thus,
“file B” will be ranked better than “file A” because it has a
greater number of downloads in the system. This factor is
calculated by Equation 5:
$$R = Q_d \cdot \frac{1}{n} \qquad (5)$$
The file relevancy is represented by R. Each download of a
particular file increments that file's download count Qd by 1.
This count ranges from 0 to n, where n is the largest number of
downloads performed on a single file in the network; the value
of n is obtained by observing the file download history of the
system. Qd is normalized by multiplying it by 1/n (that is,
dividing it by n), so the resulting value of this factor is
between 0 and 1.
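A minimal sketch of Equation 5, taking n as the largest download count in the observed history. The counts for "file A" and "file B" come from the example above; "file C" is a hypothetical most-downloaded file added so that n is not simply one of the two compared files.

```python
def relevance_factors(download_counts: dict[str, int]) -> dict[str, float]:
    """R = Qd * (1 / n) for every file, where Qd is the file's download count
    and n is the largest count observed in the download history."""
    n = max(download_counts.values(), default=0)
    if n == 0:
        # No downloads recorded yet: every file gets zero relevance.
        return {name: 0.0 for name in download_counts}
    return {name: qd * (1 / n) for name, qd in download_counts.items()}


factors = relevance_factors({"file A": 10, "file B": 16, "file C": 40})
print({name: round(r, 2) for name, r in factors.items()})
# {'file A': 0.25, 'file B': 0.4, 'file C': 1.0}
```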
3.1 Factor Weight
In a recommendation engine, the factors must be balanced
by weights to compose the recommendation score, resulting in a
ranking of the items to be recommended to the user. In the model
proposed in this study, the weight of each factor was defined
according to its relevance to the objective of the proposed
model. Table 1 presents the proposed weight of each factor.
Table 1: Factor Weights
Factor            Weight
Similarity        4
Availability      2
Download Rate     2
File Size         1
File Relevance    1
Table 1 presents the weight of each factor of the model used
in RecCloud. Below we detail how the weight of each factor was
determined.
• The Similarity factor has a weight of 4 and represents
40% of the recommendation score, to ensure that the content
of a file recommended to the user is similar to their
preferences. Another motivation for assigning 40% of the
recommendation score to similarity is to alleviate or solve
a major problem of the content-based recommendation technique,
cited by [8] [7] [6]: the suggestion of items that are always
very similar, which limits users in discovering new content.
Our recommendation model thus meets user preferences while
also recommending new content related to the content of those
preferences. A file whose similarity to the user's preferences
is equal to 0 should not be recommended.
• The Availability factor has a weight of 2, which
represents 20% of the recommendation score. It represents
the time during which the server that stores the file is
available on the network, allowing a recommended file to be
downloaded. This factor is extremely important for
recommendations based on cloud features, and represents one
of the main features and advantages of using cloud file storage
systems. A file can only be recommended to the user if it is
stored on an available server.
• The Download Rate factor has a weight of 2, which
represents 20% of the recommendation score. If a file has
a low download rate and is larger than other similar files,
its recommendation score will be lower and it will be ranked
below those similar files, since its download requires more
time and processing. Even so, a file with a low download rate
may appear at the top of the ranking if its size is
proportionally small. For a file in the cloud to be
recommended, this factor must be greater than 0, so that
the file can actually be downloaded.
• The File Size factor is assigned a weight of 1, representing
10% of the recommendation score. Being less critical, this
factor has less weight than the Similarity, Availability, and
Download Rate factors. Thus, a file whose size equals the
maximum accepted by the environment may still be recommended
if its download rate is proportionally high, ensuring good
download performance.
• The File Relevancy factor is assigned a weight of 1,
representing 10% of the recommendation score. Being less
critical, this factor has less weight than the Similarity,
Availability, and Download Rate factors. Thus, a file that is
not popular in the cloud can be recommended to the user, as can
a file newly added to the network, if it is well ranked on the
other factors of the model.
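The percentages in the list above follow from dividing each weight in Table 1 by the sum of all weights (10). A short check:

```python
WEIGHTS = {
    "Similarity": 4,
    "Availability": 2,
    "Download Rate": 2,
    "File Size": 1,
    "File Relevance": 1,
}

total = sum(WEIGHTS.values())  # 10
shares = {factor: w / total for factor, w in WEIGHTS.items()}
for factor, share in shares.items():
    print(f"{factor}: {share:.0%}")  # e.g. "Similarity: 40%"
```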
3.2 Calculations of Recommendations
In this section we present the calculation of the recommendation,
which consists of weighting the factors by their weights.
The calculation of the recommendations is represented by
the formula in Figure 1:
Figure 1: Recommendation Calculation.
The recommendation score is equal to the result of weighting
the factors by their respective weights. The RS multiplies the
Availability factor Dp by the Download Rate factor Td, subtracts
the File Size factor S from this product, and multiplies the
result by the sum of the Similarity factor St and the Relevancy
factor R. The result is then normalized between 0 (zero) and
1 (one), so the score
is always a value between 0 (zero) and 1 (one).
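Read literally, the description above corresponds to the following unnormalized expression. It is a transcription of the prose, not a reproduction of Figure 1, and it leaves the weights out because the description does not say exactly where they enter:
$$\text{score}_{\text{raw}} = \left(D_p \cdot T_d - S\right) \cdot \left(S_t + R\right)$$
where Dp, Td, S, St, and R are the factors of Equations 2, 3, 4, 1, and 5, respectively. The subtracted S lowers the first term for large files, and the sum St + R rewards files that are both similar and popular.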
The similarity factor is added to the relevancy factor so that
files whose content is similar to the user's preferred files,
and files that are most relevant in the cloud, move from the
lower to the top positions of the recommendation ranking. The
file size factor penalizes the availability and download rate
factors, aiming to provide the user with better conditions for
downloading the recommended file, reducing the time spent
downloading and favoring files with higher availability in the
cloud.
A file will only be recommended to the user if its
recommendation score is greater than 0 (zero). Files with a score
equal to or less than 0 (zero) are not recommended to the
user who requested the recommendations.
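Because the Figure 1 formula is not reproduced in the text, the following is only a hypothetical sketch of the scoring and filtering steps. It assumes that each factor is scaled by its Table 1 weight before being combined as (Dp · Td − S) · (St + R), and that normalization divides by the largest attainable raw value (20 when every factor lies in [0, 1]) after clamping negative values to 0. The class `Candidate`, the functions `score` and `recommend`, and all factor values in the usage example are illustrative, not taken from the paper.

```python
from dataclasses import dataclass

# Weights from Table 1.
W_SIM, W_AVAIL, W_RATE, W_SIZE, W_REL = 4, 2, 2, 1, 1

# Largest raw value when all factors lie in [0, 1]: (2*2 - 0) * (4 + 1) = 20.
MAX_RAW = (W_AVAIL * W_RATE) * (W_SIM + W_REL)


@dataclass
class Candidate:
    name: str
    similarity: float    # St, Equation 1
    availability: float  # Dp, Equation 2
    rate: float          # Td, Equation 3
    size: float          # S, Equation 4
    relevance: float     # R, Equation 5


def score(c: Candidate) -> float:
    """Hypothetical weighted reading of Figure 1, normalized to [0, 1]."""
    first = (W_AVAIL * c.availability) * (W_RATE * c.rate) - W_SIZE * c.size
    second = W_SIM * c.similarity + W_REL * c.relevance
    # Assumption: negative raw scores are clamped to 0 before normalizing.
    return max(first * second, 0.0) / MAX_RAW


def recommend(candidates: list[Candidate]) -> list[tuple[str, float]]:
    """Keep files with similarity > 0 and score > 0, best first."""
    results = []
    for c in candidates:
        if c.similarity <= 0:  # zero-similarity files are never recommended
            continue
        s = score(c)
        if s > 0:  # only positive scores are recommended
            results.append((c.name, s))
    return sorted(results, key=lambda item: item[1], reverse=True)


files = [
    Candidate("file A", 0.8, 0.75, 0.5, 0.2, 0.25),
    Candidate("file B", 0.5, 0.9, 0.8, 0.9, 0.4),
    Candidate("file C", 0.0, 0.9, 0.8, 0.1, 0.9),   # not similar: filtered out
    Candidate("file D", 0.7, 2 / 24, 0.3, 0.9, 0.5),  # size penalty dominates
]
print([(name, round(s, 3)) for name, s in recommend(files)])
# [('file B', 0.238), ('file A', 0.224)]
```

Under this reading, "file B" overtakes the more similar "file A" because its availability and download rate outweigh its larger size, while "file D" is dropped because its low availability cannot offset the size penalty.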
4. RESULTS
The experiment reported in this paper was performed
in a real cloud data storage environment. It provides partial
results of this research, generated by simulating users of the
system. The main goal of this experiment is to analyse the
relevance of the recommended files in relation to the
preferences elicited by the user.
The experiment evaluated the recommendations made by the
system. We used a database containing 100 public-domain
scientific articles stored in this cloud-based environment,
from which recommendations were requested for files with
distinct content. In total, 50 recommendations were evaluated,
each rated as Like or Dislike: a recommendation received Dislike
if it did not meet the evaluator's preferences and expectations,
and Like if it suited them. Figure 2 shows
the results of the evaluations.
Figure 2: Evaluation.
From the values shown in Figure 2, we infer that 85% of
the recommendations received positive reviews, that is, most
of the generated recommendations met the evaluator's
expectations. This result supports the generated recommendations
and the proposed recommendation model.
5. CONCLUSION AND FUTURE WORKS
This paper investigates the impact of factors derived from
the cloud on generating recommendations in a cloud storage
environment. We presented the mathematical model proposed in
this research, as well as the factors that form the proposed
“Cloud-based” model. The system was developed, and the initial
experiments were executed, in a real cloud data storage
environment.
As future work, we consider it important to redo and improve
the experiments presented in this article with real users of
the cloud environment, and to conduct new experiments comparing
the results obtained with this model against other models
available in the literature. In particular, we intend to propose
new cloud-based factors that may contribute to improving the
proposed model.
6. ACKNOWLEDGMENTS
This work was supported in part by the National Institute
of Science and Technology for Software Engineering (INES¹),
funded by Facepe and CNPq, processes 573964/2008-4 and
APQ-1037-1.03/08.
7. REFERENCES
[1] R. A. Baeza-Yates and B. Ribeiro-Neto. Modern
Information Retrieval. Addison-Wesley Longman
Publishing Co., Inc., Boston, MA, USA, 1999.
[2] Y. Blanco-Fernández, J. J. P. Arias, A. Gil-Solla, M. R.
Cabrer, and M. L. Nores. Providing entertainment by
content-based filtering and semantic reasoning in
intelligent recommender systems. IEEE Trans.
Consumer Electronics, 54(2):727–735, 2008.
[3] C.-F. Lai, J.-H. Chang, C.-C. Hu, Y.-M. Huang, and
H.-C. Chao. CPRS: A cloud-based program
recommendation system for digital TV platforms. Future
Gener. Comput. Syst., 27(6):823–835, June 2011.
[4] S. Lee, D. Lee, and S. Lee. Personalized DTV program
recommendation system under a cloud computing
environment. IEEE Trans. on Consum. Electron.,
56(2):1034–1042, May 2010.
[5] M. J. Pazzani and D. Billsus. Learning and revising
user profiles: The identification of interesting web sites.
Machine Learning, 27(3):313–331, 1997.
[6] F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor,
editors. Recommender Systems Handbook. Springer,
2011.
[7] H. Stormer, N. Werro, and D. Risch. Recommending
products with a fuzzy classification. CollECTeR
Europe, 2006.
[8] D. M. Vieira. Sobre a interdependência da
recomendação de conteúdo e do desempenho da rede
[On the interdependence of content recommendation and
network performance]. Master’s thesis, Universidade Federal
do Rio de Janeiro (UFRJ), 2013.
¹ http://www.ines.org.br