
Recommendation of Crowdsourcing Tasks Based on Word2vec Semantic Tags


Abstract

Crowdsourcing is a prime example of collective intelligence, and the key to completing a crowdsourcing task well is to allocate the appropriate task to the appropriate worker. At present, most crowdsourcing platforms let workers select tasks through task search, but they lack personalized task recommendation. This paper proposes a tag-semantic task recommendation model based on deep learning. The similarity of word vectors is computed, and a semantic tag similarity matrix database is established based on Word2vec. The task recommendation model is built on semantic tags to achieve personalized recommendation of crowdsourcing tasks. By computing the similarity of tags, the relevance between task and worker is obtained, which improves the robustness of task recommendation. Comparison experiments on the Tianpeng web dataset verify the effectiveness and applicability of the proposed model.
Research Article
Qingxian Pan,1,2 Hongbin Dong,1 Yingjie Wang,2 Zhipeng Cai,3 and Lizong Zhang4
1College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China
2School of Computer and Control Engineering, Yantai University, Yantai 264005, China
3Department of Computer Science, Georgia State University, Atlanta, GA 30303, USA
4School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
Correspondence should be addressed to Hongbin Dong;
Received 1 November 2018; Revised 18 February 2019; Accepted 3 March 2019; Published 24 March 2019
Guest Editor: Michele Nogueira
Copyright © 2019 Qingxian Pan et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. Introduction
Deep learning was proposed by Geoffrey Hinton et al. in 2006. This method simulates the neural network of the human brain to model and realize multiple levels of abstraction [, ]. In 2006, Jeff Howe, a reporter for the American magazine Wired, proposed the concept of crowdsourcing []. As a new kind of business model, crowdsourcing has attracted widespread attention in various fields and has become a new hot spot in computer research. A crowdsourcing system is made up of the task requester, the crowdsourcing platform, and the worker []. The process of crowdsourcing includes designing the task, publishing the task, selecting the task, sensing the task, submitting the solution, and integrating the solution. Among these, task selection is the key phase: the appropriate worker selects the appropriate task at the appropriate time [].

Popular crowdsourcing platforms rely on task search, in which workers find their preferred tasks by keyword []. However, with the rapid development of crowdsourcing, the problem of information overload is becoming more and more serious, and it is increasingly difficult for a worker to find a preferred crowdsourcing task. Recommender systems are an effective means of solving this problem and are used on many e-commerce platforms, such as Alibaba, Amazon, and Netflix []. But many problems in recommender systems remain unsolved, such as similarity calculation, low recommendation accuracy, data sparseness, and cold start. In brief, improving the accuracy and reliability of recommender systems has received growing attention from scholars.

However, there is little research on personalized task recommendation in crowdsourcing, and task selection relies on workers' hobbies and expertise. Few crowdsourcing platforms actively recommend tasks. This paper studies a crowdsourcing task recommendation model based on Word2vec semantic tags in order to achieve personalized recommendation of crowdsourcing tasks [].

The main contributions of this paper are the following three:
Wireless Communications and Mobile Computing
Volume 2019, Article ID 2121850, 10 pages
(1) Compute the similarity of word vectors and build the semantic tag similarity matrix database based on Word2vec deep learning.
(2) Study the task recommendation model based on semantic tags to achieve personalized recommendation of crowdsourcing tasks. This paper computes the tag similarity matrix.
(3) Conduct experiments on the Tianpeng web dataset. The experimental results show that the model is feasible and effective. The model can be used in other fields according to the different semantic tags.

The rest of this paper is organized as follows. Section 2 introduces the related works. Word2vec is discussed in Section 3. The task recommendation model and its realization method based on semantic tags are studied in Section 4. The comparison experiments, as well as the analysis of the experimental results, are introduced in Section 5. The conclusion is presented in Section 6.

Figure 1: The workflow of crowdsourcing (boxes shown: Task Design, Task Publishing, Receive Answer, Finishing Answer, Task Selection, Task Reception, Task Solution, Answer Submission).
2. Related Works
In order to discuss the related works for recommendation of
crowdsourcing, we, respectively, introduce the related works
of crowdsourcing and recommendations.
2.1. Crowdsourcing. In 2006, Jeff Howe first proposed the concept of crowdsourcing []: a company or an institution outsources tasks once performed by an employee to an unspecified public network in a free and voluntary manner. With the development of crowdsourcing technology, different concepts of crowdsourcing appeared. Chen et al. [] summarized different crowdsourcing definitions. Feng et al. [] defined crowdsourcing according to its basic features. According to this definition, crowdsourcing is a distributed problem-solving mechanism open to the Internet public, and it completes tasks that are difficult for a computer alone by integrating computers and the unknown public on the Internet.

Crowdsourcing has been successfully applied in language translation, image recognition, intelligent transportation, software development, entry interpretation, tourism photography, and other fields, and has become a prime embodiment of group wisdom [, ]. Crowdsourcing is made up of the task requester, the crowdsourcing platform, and the workers. The crowdsourcing workflow includes designing tasks by the task requester, publishing tasks, selecting tasks by workers, solving tasks, submitting answers, and arranging answers. The workflow of crowdsourcing is shown in Figure 1. Public participation is the basis of crowdsourcing, and the key to completing crowdsourcing tasks with high quality is to recommend appropriate tasks to appropriate workers at the appropriate time.
2.2. Recommender Systems. With the arrival of the big data era, the problem of information overload is more and more serious, and finding the most useful information is more and more difficult. Recommender systems are an effective means of solving these problems []. However, recommender systems have some inherent defects, such as low accuracy, data sparseness, cold start, the weaknesses of centralized systems, similarity calculation, and vulnerability to attack. In addition, many recommender systems are designed to sell more goods and seek maximum benefit rather than to recommend the best commodities to users. In brief, the credibility and accuracy of recommender systems need to be improved, which has attracted the attention of scholars. Yang et al. [] proposed a recommender system based on transfer learning. Chen et al. [] proposed a recommender system based on context. Tang et al. [] studied recommender systems based on crossing knowledge. Liu [] and Zhou et al. [] studied recommender systems for social recommendation. Combining Markov and social attributes of users, Wang et al. [] proposed a probability-based recommendation model to recommend items for users.
Figure 2: CBOW model and Skip-gram model. Panels: (a) CBOW model; (b) Skip-gram model. Each panel shows Input, Projection, and Output layers.
From the perspective of the crowdsourcing platform, and based on the task discovery model, the platform recommends related tasks according to the preferences of workers []. The main crowdsourcing platforms basically adopt task search and rarely adopt task recommendation []. Some task recommendation methods have been studied on the basis of traditional recommendation methods, including content-based recommendation, collaborative filtering, and hybrid recommendation algorithms. Ambati et al. [] proposed using the historical information of tasks and workers for task recommendation. Yuen et al. [] proposed a worker-task recommendation model that combines workers' historical information with their browsing history. Deng et al. [] studied the problem of maximizing task selection for spatiotemporal tasks.
3. Word2vec
In 2003, Bengio et al. [] proposed the multi-level Neural Network Language Model (NNLM). NNLM is used to compute the probability of the next word $w_t$ given its context, and the word vector is a byproduct of training. Word2vec is a deep-learning-based tool for computing the similarity of word vectors, released by Google in 2013 []. It converts each word into a word vector and computes similarity according to the cosine of the angle between vectors. Segmented text is the input, and the output word vectors can be used for many Natural Language Processing (NLP) tasks, such as clustering, finding synonyms, and part-of-speech analysis.
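The cosine comparison of word vectors described above can be sketched in a few lines of standard-library Python. This is only an illustration: the three-dimensional vectors and the tag words are made-up values, not output of the Word2vec tool, and returning 0.0 for a zero vector is a convention chosen here.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: treat a zero vector as unrelated to everything
    return dot / (norm_u * norm_v)

# Hypothetical word vectors, for illustration only.
vectors = {
    "logo": [0.9, 0.1, 0.0],
    "design": [0.8, 0.2, 0.1],
    "translation": [0.0, 0.1, 0.9],
}
sim_close = cosine_similarity(vectors["logo"], vectors["design"])
sim_far = cosine_similarity(vectors["logo"], vectors["translation"])
```

With these toy values, `sim_close` is larger than `sim_far`, which is the behaviour a tag similarity lookup relies on.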
Word2vec represents word vectors using distributed representation, which was proposed by Hinton in 1986 []. Its basic idea is to map each word, by training, to a real-valued vector of fixed dimension (the dimension is a hyperparameter of the model) and to judge the semantic similarity between words according to the distance between their vectors (such as cosine similarity or Euclidean distance). It uses a three-layer neural network: input layer, hidden layer, and output layer. Its core technique is Huffman coding according to word frequency, so that words of similar frequency activate basically the same content in the hidden layer. The higher the frequency of a word, the fewer hidden-layer nodes it activates, which effectively reduces the computational complexity.
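To make the frequency-based Huffman coding concrete, the following sketch builds Huffman codes from word counts with Python's `heapq`. The word counts are invented, and the choice of which branch gets bit 0 or 1 is an assumption of this sketch; the Word2vec tool may use a different bit convention.

```python
import heapq
from itertools import count

def huffman_codes(frequencies):
    """Return {word: bit string} for a {word: count} mapping."""
    if not frequencies:
        return {}
    if len(frequencies) == 1:
        return {word: "0" for word in frequencies}
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(freq, next(tie), {word: ""}) for word, freq in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, low = heapq.heappop(heap)    # least frequent subtree
        f2, _, high = heapq.heappop(heap)   # second least frequent subtree
        merged = {w: "0" + c for w, c in low.items()}
        merged.update({w: "1" + c for w, c in high.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]

# Hypothetical tag counts, for illustration only.
toy_counts = {"design": 45, "naming": 16, "logo": 13,
              "translation": 12, "video": 9, "app": 5}
codes = huffman_codes(toy_counts)
```

In this toy example the most frequent word receives a one-bit code and the two rarest words receive four-bit codes, so frequent words sit on shorter paths in the tree.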
Compared with Latent Semantic Indexing (LSI) and Latent Dirichlet Allocation (LDA), Word2vec uses the context of words, which makes the semantic information richer. Word2vec has two training models, CBOW (Continuous Bag-of-Words) and Skip-gram, which are shown in Figure 2. Both models include an input layer, a projection layer, and an output layer. The CBOW model predicts the current word according to its context, while the Skip-gram model predicts the context according to the current word.
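The difference between the two training models shows up in how training examples are formed from a sentence. The sketch below is illustrative only: the function names, the sample sentence, and the window size are assumptions, and no training is performed.

```python
def context_windows(tokens, window):
    """Yield (center_word, context_words) for every position in tokens."""
    for i, center in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        yield center, left + right

def cbow_examples(tokens, window):
    """CBOW: the context words are the input, the center word is the target."""
    return [(context, center)
            for center, context in context_windows(tokens, window) if context]

def skipgram_examples(tokens, window):
    """Skip-gram: the center word is the input, each context word is a target."""
    return [(center, ctx)
            for center, context in context_windows(tokens, window)
            for ctx in context]

sentence = ["design", "a", "company", "logo"]
cbow = cbow_examples(sentence, window=1)
skipgram = skipgram_examples(sentence, window=1)
```

Here `cbow` holds one (context list, target word) pair per position, while `skipgram` holds one (input word, target word) pair per context word.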
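In this paper, the objective optimization function of CBOW is expressed by

$$p(w \mid \mathrm{Context}(w)) = \prod_{j=2}^{l^{w}} p\left(d_{j}^{w} \mid x_{w}, \theta_{j-1}^{w}\right), \tag{1}$$

where $x_{w}$ means the word vector at the root node of the Huffman tree, $\mathrm{Context}(w)$ represents the context of word $w$, that is, the collection of peripheral words, $l^{w}$ represents the number of nodes on the path $p^{w}$, $d_{j}^{w} \in \{0, 1\}$ represents the Huffman code of the word $w$, and $\theta_{j-1}^{w} \in \mathbb{R}^{m}$ represents the vectors corresponding to the nonleaf nodes of the path $p^{w}$.

Therefore, the logistic regression probability $p(d_{j}^{w} \mid x_{w}, \theta_{j-1}^{w})$ of passing a node in the Huffman tree is shown by (2). The corresponding parameter $\sigma(x_{w}^{T} \theta_{j-1}^{w})$ is shown by (3).

$$p\left(d_{j}^{w} \mid x_{w}, \theta_{j-1}^{w}\right) = \begin{cases} \sigma\left(x_{w}^{T} \theta_{j-1}^{w}\right), & d_{j}^{w} = 0, \\ 1 - \sigma\left(x_{w}^{T} \theta_{j-1}^{w}\right), & d_{j}^{w} = 1. \end{cases} \tag{2}$$

$$\sigma\left(x_{w}^{T} \theta_{j-1}^{w}\right) = \frac{1}{1 + e^{-x_{w}^{T} \theta_{j-1}^{w}}} \tag{3}$$

In order to clearly represent the meaning of the logistic regression probability $p(d_{j}^{w} \mid x_{w}, \theta_{j-1}^{w})$, we combine (2) and (3) to obtain

$$p\left(d_{j}^{w} \mid x_{w}, \theta_{j-1}^{w}\right) = \left[\sigma\left(x_{w}^{T} \theta_{j-1}^{w}\right)\right]^{1 - d_{j}^{w}} \left[1 - \sigma\left(x_{w}^{T} \theta_{j-1}^{w}\right)\right]^{d_{j}^{w}}. \tag{4}$$

To avoid the value of $p(w \mid \mathrm{Context}(w))$ becoming too small, the log-likelihood function is used as the objective function; summing over the words $w$ of the corpus $C$, (1) can be converted into

$$\mathcal{L} = \sum_{w \in C} \log p(w \mid \mathrm{Context}(w)). \tag{5}$$

Combining (4) and (5), the objective function is shown by

$$\mathcal{L} = \sum_{w \in C} \sum_{j=2}^{l^{w}} \left\{ \left(1 - d_{j}^{w}\right) \log\left[\sigma\left(x_{w}^{T} \theta_{j-1}^{w}\right)\right] + d_{j}^{w} \log\left[1 - \sigma\left(x_{w}^{T} \theta_{j-1}^{w}\right)\right] \right\}. \tag{6}$$

Therefore, (6) is the objective function of CBOW in this paper. Word2vec uses stochastic gradient ascent to optimize the objective function of CBOW.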
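The per-word term of this log-likelihood can be evaluated directly, which may help when checking the derivation. The sketch below is a minimal illustration, not the Word2vec implementation: the function and argument names are hypothetical, vectors are plain Python lists, and the convention that code bit 0 contributes log σ and bit 1 contributes log(1 − σ) follows the description above.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def path_log_probability(x_w, thetas, code):
    """log p(w | Context(w)) along one Huffman path.

    x_w    : context vector at the root (list of floats)
    thetas : one parameter vector per non-leaf node on the path
    code   : Huffman code bits of the word, one bit per entry of thetas
    """
    if len(thetas) != len(code):
        raise ValueError("need exactly one code bit per non-leaf node")
    total = 0.0
    for theta, d in zip(thetas, code):
        s = sigmoid(sum(a * b for a, b in zip(x_w, theta)))
        # bit 0 contributes log(sigma); bit 1 contributes log(1 - sigma)
        total += (1 - d) * math.log(s) + d * math.log(1.0 - s)
    return total

# Hypothetical values: a zero context vector makes every sigma equal 0.5.
example = path_log_probability([0.0, 0.0], [[1.0, 2.0]] * 3, [0, 1, 0])
```

With a zero context vector each node contributes log 0.5, so `example` equals three times log 0.5; summing such terms over all words of a corpus gives the objective that gradient ascent maximizes.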
4. The Tasks Recommendation Model and
Realization Method Based on Semantic Tags
4.1. Basic Model Frame and Mathematical Computation Model. The core of the model is the tag similarity matrix. The model uses the tag similarity matrix to compute the similarity of workers and tasks, produces the worker-tag similarity matrix, and realizes task recommendation or worker recommendation by computation. The worker-tag matrix is obtained from the worker's work history, registration information, etc., and the task-tag matrix is obtained from the task description, task classification, etc.
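As an illustration of how such tag matrices might be assembled, the sketch below turns per-worker and per-task tag lists into binary matrices with one row per worker or task and one column per tag. The tag vocabulary and the tag lists are invented, and ignoring tags outside the vocabulary is an assumption of this sketch.

```python
def build_tag_matrix(entity_tags, vocabulary):
    """Binary matrix: row i is a worker or task, column j is a tag."""
    index = {tag: j for j, tag in enumerate(vocabulary)}
    matrix = []
    for tags in entity_tags:
        row = [0] * len(vocabulary)
        for tag in tags:
            if tag in index:  # assumption: unknown tags are ignored
                row[index[tag]] = 1
        matrix.append(row)
    return matrix

# Hypothetical tag vocabulary and tag lists, for illustration only.
vocabulary = ["logo", "naming", "translation"]
worker_tags = [["logo", "naming"], ["translation"]]
task_tags = [["logo"], ["translation", "naming"]]

W = build_tag_matrix(worker_tags, vocabulary)
T = build_tag_matrix(task_tags, vocabulary)
```

In practice the tag lists would come from the worker's history and registration data and from the task description and classification, as described above.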
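Define the tag similarity matrix $L \in \mathbb{R}^{m \times m}$,

$$L = \begin{pmatrix} l_{11} & \cdots & l_{1m} \\ \vdots & \ddots & \vdots \\ l_{m1} & \cdots & l_{mm} \end{pmatrix},$$

where $L$ is a symmetric matrix, that is, $l_{ij} = l_{ji}$; $l_{ij}$ represents the similarity of tag $i$ and tag $j$, $l_{ij} \in [0, 1]$, and its value is computed with the Word2vec tool. Define the worker-tag matrix $W \in \mathbb{R}^{n \times m}$,

$$W = \begin{pmatrix} w_{11} & \cdots & w_{1m} \\ \vdots & \ddots & \vdots \\ w_{n1} & \cdots & w_{nm} \end{pmatrix},$$

where $w_{ij} = 1$ if worker $i$ has tag $j$ and $w_{ij} = 0$ otherwise. We define the task-tag matrix $T \in \mathbb{R}^{p \times m}$,

$$T = \begin{pmatrix} t_{11} & \cdots & t_{1m} \\ \vdots & \ddots & \vdots \\ t_{p1} & \cdots & t_{pm} \end{pmatrix},$$

where $t_{ij} = 1$ if task $i$ has tag $j$ and $t_{ij} = 0$ otherwise.

Therefore, the worker-task similarity matrix $W L T^{T}$ is obtained by (7), where $W$ is the worker-tag matrix, $L$ is the tag similarity matrix, and $T^{T}$ is the transpose of the task-tag matrix. Through (7), the relationship between workers and tasks can be obtained.

$$W L T^{T} = \begin{pmatrix} w_{11} & \cdots & w_{1m} \\ \vdots & \ddots & \vdots \\ w_{n1} & \cdots & w_{nm} \end{pmatrix} \begin{pmatrix} l_{11} & \cdots & l_{1m} \\ \vdots & \ddots & \vdots \\ l_{m1} & \cdots & l_{mm} \end{pmatrix} \begin{pmatrix} t_{11} & \cdots & t_{1m} \\ \vdots & \ddots & \vdots \\ t_{p1} & \cdots & t_{pm} \end{pmatrix}^{T} \tag{7}$$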
4.2. Basic Flow. The main steps of the proposed recommendation model are as follows: (1) computing the word vectors based on Word2vec; (2) computing the similarity of word vectors; (3) generating the tag similarity matrix; (4) obtaining the worker-tag matrix and task-tag matrix; (5) computing the worker-task similarity matrix; (6) L2 standardization and normalization; (7) task and worker recommendation. Tag similarity matrix generation uses the Word2vec tool. Worker-task similarity computation uses the mathematical method introduced in the previous section. This section mainly introduces the standardization and normalization methods.

L2 standardization method: the L2 norm of a vector $x = (x_{1}, x_{2}, \ldots, x_{n})$ is defined as

$$\lVert x \rVert_{2} = \sqrt{x_{1}^{2} + x_{2}^{2} + \cdots + x_{n}^{2}}.$$

In order to normalize $x$ to the unit L2 norm, a mapping between $x$ and $x'$ is established so that the L2 norm of $x'$ is 1, and the proof is shown as follows:

$$1 = \lVert x' \rVert_{2} = \frac{\sqrt{x_{1}^{2} + x_{2}^{2} + \cdots + x_{n}^{2}}}{\lVert x \rVert_{2}} = \sqrt{\left(\frac{x_{1}}{\lVert x \rVert_{2}}\right)^{2} + \left(\frac{x_{2}}{\lVert x \rVert_{2}}\right)^{2} + \cdots + \left(\frac{x_{n}}{\lVert x \rVert_{2}}\right)^{2}}, \tag{8}$$

where the value of $x'_{i}$ is shown by

$$x'_{i} = \frac{x_{i}}{\lVert x \rVert_{2}}. \tag{9}$$

In order to make the data standardized and general, the L2-standardized data are then normalized so that they fall in the interval $[0, 1]$. The conversion formula is shown by (10), where $\min(x)$ is the minimum of $x$ and $\max(x)$ is the maximum of $x$.

$$x'_{i} = \frac{x_{i} - \min(x)}{\max(x) - \min(x)} \tag{10}$$
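The two scaling steps can be sketched as follows. This is a minimal illustration that treats one vector at a time; whether the paper scales each row or the whole worker-task matrix is not stated here, and returning zeros for a zero or constant vector is a choice made in this sketch.

```python
import math

def l2_standardize(x):
    """Scale a vector so that its L2 norm is 1."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        return [0.0 for _ in x]  # assumption: leave a zero vector as zeros
    return [v / norm for v in x]

def min_max_normalize(x):
    """Map the entries of a vector into the interval [0, 1]."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return [0.0 for _ in x]  # assumption: a constant vector maps to zeros
    return [(v - lo) / (hi - lo) for v in x]

# Hypothetical row of worker-task scores.
row = [3.0, 4.0, 0.0]
unit = l2_standardize(row)        # L2 norm of the result is 1
scaled = min_max_normalize(unit)  # smallest entry becomes 0, largest becomes 1
```

After both steps every score lies in [0, 1], which makes a single similarity threshold comparable across rows.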
Table 1: Word2vec parameter setting. Parameters listed: window, size, threads, binary, hs, cbow (yes), alpha, negative.
Table 2: Tag similarity matrix L of the simulation dataset (rows and columns are tags).
Table 3: Worker-tag matrix (rows are workers W, columns are tags L).
5. Experiment and Simulation
In this section, we conduct comparison experiments on a simulation dataset and a real dataset, respectively. The real dataset was crawled from the Tianpeng web site.
In the experiment, text is corpora training set, and
experimental environment is Intel Core (TM) i-U CPU
@.GHz dual-core, and GB memory.
5.1. The Experiments Conducted on Simulation Dataset. In this group of comparison experiments, the training parameters are shown in Table 1.

The tag similarity matrix after training is shown in Table 2. The elements of the matrix indicate the similarities between tags.

This group of experiments uses a fixed number of workers, tasks, and tags. The worker-tag matrix, shown in Table 3, is generated randomly. The elements in Table 3 represent the association between workers and tags. The task-tag matrix is shown in Table 4, and its elements represent the association between tasks and tags. After the worker-task matrix is computed, its standardized and normalized form is shown in Table 5. The elements in Table 5 are the similarities between workers and tasks.
L L L L L L L L L L L ...
T   ...
T  ...
T  ...
T   ...
T   ...
... ... ... ... ... ... ... ... ... ... ... ... ...
T : Worker-task similar matrix.
T T T T T T T ...
U . . . . . . . ...
U . . . . . . . ...
U . . . . . . . ...
U . . . . . . . ...
U . . . . . . . ...
... ... ... ... ... ... ... ... ...
Recall, precision, and F-measure are commonly used evaluation indexes []. The three indexes are computed by (11), (12), and (13), respectively. According to (11), (12), and (13), the F-measure is a comprehensive index that considers both recall and precision.

$$\mathrm{Recall} = \frac{\text{the quantity of related information retrieved}}{\text{the quantity of related information in system}} \tag{11}$$

$$\mathrm{Precision} = \frac{\text{the quantity of related information retrieved}}{\text{the quantity of all information retrieved}} \tag{12}$$

$$\text{F-measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{13}$$
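A small sketch may help when reproducing these indexes for a single task. It assumes the standard harmonic-mean form of the F-measure (with the factor 2), represents recommended and relevant workers as sets of hypothetical identifiers, and returns 0.0 whenever a denominator would be zero.

```python
def precision_recall_f1(recommended, relevant):
    """Return (precision, recall, F-measure) for one recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical identifiers: four workers recommended, three truly suitable.
p, r, f = precision_recall_f1(["w1", "w2", "w3", "w4"], ["w1", "w2", "w5"])
```

In this example two of the four recommended workers are suitable, so precision is 1/2, recall is 2/3, and the F-measure is 4/7.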
Three thresholds are compared, and the recall, precision, and F-measure of the tasks are obtained. The comparison results for recall, precision, and F-measure are shown in Figures 3, 4, and 5, respectively. In these experiments, the x-coordinate indicates the tasks of the task-tag matrix T, and the y-coordinates are the recall rate, precision rate, and F-measure rate, respectively. The experimental results show that one threshold performs better overall than the other two.

Figure 3: Recall of different thresholds (x-axis: tasks T1 to T50).

Table 6: Tag similarity matrix of the Tianpeng dataset (rows and columns are tags).

In addition, we compare the proposed method with the task search method. The experimental result is shown in Figure 6, where the x-coordinate indicates the tasks of the task-tag matrix T. The method used in this paper is better than the task search method, which proves its effectiveness. In addition, potential workers can be found by lowering the threshold, which can be used to analyze potential users.
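The effect of the threshold can be illustrated with a short sketch that selects, for each task, the workers whose normalized similarity reaches the threshold. The score matrix and both threshold values are invented for illustration and are not the settings used in the experiments.

```python
def recommend_workers(scores, threshold):
    """For each task (column), list the worker rows with score >= threshold."""
    n_tasks = len(scores[0]) if scores else 0
    return [[i for i, row in enumerate(scores) if row[j] >= threshold]
            for j in range(n_tasks)]

# Hypothetical normalized worker-task scores: 3 workers (rows), 2 tasks (columns).
scores = [[0.9, 0.2],
          [0.6, 0.7],
          [0.1, 0.4]]
strict = recommend_workers(scores, 0.65)
loose = recommend_workers(scores, 0.35)
```

Lowering the threshold from 0.65 to 0.35 adds worker 1 to task 0 and worker 2 to task 1, which corresponds to the additional potential workers mentioned above; the extra candidates usually raise recall at some cost in precision.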
5.2. The Experiments Conducted on Tianpeng Dataset. The data crawled from the Tianpeng web site form a corpus for training, and the resulting tag similarity matrix is shown in Table 6.

We select workers and tasks from the Tianpeng dataset as experimental objects. Using this dataset, we conduct comparison experiments to verify the effectiveness of the proposed model. In the comparison experiments, workers whose similarity reaches the chosen threshold are taken as recommended objects. The experimental results are compared with bipartite graph matching and the greedy algorithm in terms of recall, precision, and F-measure.

For the recall index, the comparison result is shown in Figure 7. The x-coordinate indicates the tasks of the task-tag matrix T, and the y-coordinate is the recall rate. The result shows that the proposed recommendation model has the best recall compared with the greedy algorithm and bipartite graph matching. In addition, the proposed recommendation model is more stable as T changes.

Figure 8 shows the experimental result for precision. Similarly, the x-coordinate indicates the tasks of the task-tag matrix T, and the y-coordinate is the precision rate. The average precision of the proposed recommendation is better than that of the other two algorithms. From Figure 8, it can be seen that the proposed recommendation has the best precision compared with the greedy algorithm and bipartite graph matching.

Figure 4: Precision of different thresholds (x-axis: tasks T1 to T50).
Figure 5: F-measure of different thresholds (x-axis: tasks T1 to T50).

According to the experimental result on F-measure, the proposed recommendation also has the best performance on F-measure. Because the F-measure is a comprehensive index that considers both recall and precision, we can infer that the proposed recommendation performs best compared with the greedy algorithm and the bipartite graph matching algorithm.

The comparison shows that the precision of the proposed method is sometimes higher and sometimes lower than that of bipartite graph matching and the greedy algorithm. Because the goal is to recommend a task to as many workers as possible who are able to complete it, including potential workers, the precision requirement can be relaxed in recommendation. It can be seen that the method proposed in this paper has practical significance and application value.
Figure 6: Comparison of experimental results (x-axis: tasks T1 to T50; legend: this paper, tasks research).
Figure 7: Recall of different methods (x-axis: tasks T1 to T20; legend: this paper, greedy algorithm, bipartite graph matching).
6. Conclusion
Crowdsourcing is a prime example of group wisdom. It has been applied in many fields as a new business model and, in recent years, has become a hot research topic in computer science. The key to the success of crowdsourcing is to recommend tasks to appropriate workers. This paper proposes a recommendation method based on the tag similarity matrix. The method uses Word2vec to generate the tag similarity matrix. The comparison experiments show that the method is effective, and it can be extended to other fields with different semantic tags.

Because the key to the success of crowdsourcing is the participation rate of workers, topics such as reputation mechanisms, preference evolution, and privacy protection of workers have become hot in crowdsourcing research. Future research will focus on improving the accuracy of recommender systems by combining them with reputation, preference evolution, and historical information.
Figure 8: Precision of different methods (x-axis: tasks T1 to T20; legend: this paper, greedy algorithm, bipartite graph matching).
Figure 9: F-measure of different methods (x-axis: tasks T1 to T20; legend: this paper, greedy algorithm, bipartite graph matching).
Data Availability
The [Tianpeng] dataset used to support the findings of this study is available from the corresponding author upon request.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
dation of China under Grants No. , No. , and No. , the China Postdoctoral Science Foundation under Grant No. M, the National Science Foundation (NSF) under Grants No. , No. , and No. , and the Natural Science Foundation of Sichuan Province under Grant No. HH.
References
[] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature.
[] Y. Wang, Z. Cai, G. Yin, Y. Gao, X. Tong, and G. Wu, "An incentive mechanism with privacy protection in mobile crowdsourcing systems," Computer Networks.
[] J. Howe, "The rise of crowdsourcing," Wired Magazine.
[] Z. Cai and X. Zheng, "A private and efficient mechanism for data uploading in smart cyber-physical systems," IEEE Transactions on Network Science and Engineering.
[] Y. Hu, Y. Wang, Y. Li, and X. Tong, "An incentive mechanism based on multi-attribute reverse auction in mobile crowdsourcing," Sensors.
[] ... incentive mechanisms for geographical position conflicting mobile crowdsensing systems," IEEE Transactions on Computational Social Systems.
[] R. Katarya and O. P. Verma, "Recent developments in affective recommender systems," Physica A: Statistical Mechanics and its Applications.
[] K. W. Church, "Emerging trends: Word2Vec," Natural Language Engineering.
[] X. Chen, P. N. Bennett, K. Collins-Thompson, and E. Horvitz, "Pairwise ranking aggregation in a crowdsourced setting," in Proceedings of the Sixth ACM International Conference ...
[] J. Feng, G. Li, and J. Feng, "A survey on crowdsourcing," Chinese Journal of Computers.
[] Z. Duan, W. Li, and Z. Cai, "Distributed auctions for task assignment and scheduling in mobile crowdsensing systems," in Proceedings of the 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), GA, USA, June 2017.
[] Y. Wang, Z. Cai, X. Tong, Y. Gao, and G. Yin, "Truthful incentive mechanism with location privacy-preserving for mobile crowdsourcing systems," Computer Networks.
[] Y. Wang, Y. Li, Z. Chi, and X. Tong, "The truthful evolution and incentive for large-scale mobile crowd sensing networks," IEEE ...
[] J. L. Cai, M. Yan, and Y. Li, "Using crowdsourced data in location-based social networks to explore influence maximization," in Proceedings of the 35th Annual IEEE International Conference on Computer Communications.
[] P. Resnick and H. R. Varian, "Recommender systems," Communications of the ACM.
[] W. Pan and Q. Yang, "Transfer learning in heterogeneous collaborative filtering domains," Artificial Intelligence.
[] G. Chen and L. Chen, "Recommendation based on contextual opinions," UMAP 2014, LNCS 8538.
[] L. Liu, J. Tang, J. Han, and S. Yang, "Learning influence from heterogeneous social networks," Data Mining and Knowledge Discovery.
[] J. Tang, X. Hu, and H. Liu, "Social recommendation: a review," Social Network Analysis and Mining.
[] L. Lü ..., "Recommender systems," Physics Reports.
[] Y. Wang, G. Yin, Z. Cai, Y. Dong, and H. Dong, "A trust-based probabilistic recommendation model for social networks," Journal ...
[] L. Zhang, Z. Cai, and X. Wang, "FakeMask: a novel privacy preserving approach for smartphones," IEEE Transactions on Network and Service Management.
[] V. Ambati, S. Vogel, and J. Carbonell, "Towards task recommendation in micro-task markets," in Proceedings of the 25th AAAI Workshop in Human Computation, CA, USA.
[] ... factorization in task recommendation in crowdsourcing systems," in Proceedings of the 19th International Conference on Neural Information Processing, Springer, Doha, Qatar.
[] D. Deng, C. Shahabi, and U. Demiryurek, "Maximizing the number of worker's self-selected tasks in spatial crowdsourcing," in Proceedings of the 21st ACM SIGSPATIAL International Conference, FL, USA, November.
[] J. Turian, L. Ratinov, and Y. Bengio, "Word representations: a simple and general method for semi-supervised learning," in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden, July.
[] Y. Yao, X. Li, X. Liu et al., "Sensing spatial distribution of urban land use by integrating points-of-interest and Google Word2Vec model," International Journal of Geographical Information Science.
[] R. Wang, H. Zhao, B.-L. Lu, M. Utiyama, and E. Sumita, "Bilingual continuous-space language model growing for statistical machine translation," IEEE Transactions on Audio, Speech and Language Processing.
[] L. Li, G. Liu, and Q. Liu, "Advancing iterative quantization hashing using isotropic prior," in Proceedings of the International Conference on Multimedia Modelling, Springer International Publishing.
... Word2Vec is a machine learning-based tool for calculating word vector similarity. It converts words into word vectors and calculates the cosine similarity of word vectors (Pan et al., 2019). Figure 2 illustrates the two training model types available in Word2Vec: CBOW (Continuous Bag of Words) and Skip-Gram. ...
... CBOW Model and Skip-Gram Architecture(Pan et al., 2019) ...
Full-text available
Before watching a movie, people usually read reviews written by movie critics or regular audiences to gain insights about the movie’s quality and discover recommended films. However, analyzing movie reviews can be challenging due to several reasons. Firstly, popular movies can receive hundreds of reviews, each comprising several paragraphs, making it time-consuming and effort-intensive to read them all. Secondly, different reviews may express varying opinions about the movie, making it difficult to draw definitive conclusions. To address these challenges, sentiment analysis using CNN and LSTM models, known for their effectiveness in classifying text in various datasets, was performed on the movie reviews. Additionally, techniques such as TF-IDF, Word2Vec, and data balancing with SMOTEN were applied to enhance the analysis. The CNN achieved an impressive sentiment analysis accuracy of 98.56%, while the LSTM achieved a close 98.53%. Moreover, both classifiers performed well in terms of the F1-score, with CNN obtaining 77.87% and LSTM achieving 78.92%. These results demonstrate the effectiveness of the sentiment analysis approach in extracting valuable insights from movie reviews and helping people make informed decisions about which movies to watch.
... To address this issue, numerous techniques that consider the semantic and contextual information of terms have been presented in the literature. From the literature, we identified a renowned word-embedding technique employed in several domains [1], [47], [48], [49], [50]. This technique has been used to represent document vocabularies. ...
... To evaluate our proposed technique, we utilized a well-known evaluation measure, accuracy. The reason for choosing this evaluation measure is its frequent usage in the literature [24], [48], [50]. ...
Many individuals, including researchers, professors, and students, encounter difficulties when searching for scholarly documents, papers, and journals within a specific domain. Consequently, scholars have begun to focus on the document classification problem, offering various methods to address this issue. Researchers have utilized diverse data sources, such as citations, metadata, content, and hybrids, in their approaches. Among these sources, the metadata-based approach stands out for research paper classification because metadata is available at no cost. Various scholars have employed different metadata parameters of research articles, including the title, abstract, keywords, and general terms, for research paper classification. In this study, we chose four metadata-based features (title, keyword, abstract, and general terms) from the CENTOS dataset, which was prepared by ACM. To represent these features numerically, we employed a semantic-based model called BERT instead of the commonly used count-based models. BERT generates a 768-dimensional vector for each record, which introduces significant time complexity during computation. Additionally, our proposed model optimizes the features using a genetic algorithm. Optimal feature selection plays a crucial role in this domain, enhancing the overall accuracy of the document classification system while reducing the time complexity associated with selecting the most relevant features from this high-dimensional space. For classification purposes, we employed GNB and SVM classifiers. The outcomes of our study revealed that the combination of title and keywords outperformed other combinations.
... Many researchers have tried to introduce deep learning into the field of crowdsourcing recommendation algorithms. For example, Pan et al. [18] proposed a deep learning-based tag semantic task recommendation model, which calculated the similarity of word vectors through Word2Vec software. The study established a semantic tag similarity matrix database, as a means to realize personalized recommendations for crowdsourcing tasks. ...
... For topic modeling and recommendation tasks, the semantic similarity of word vectors is employed to extract keywords. Word2Vec effectively expresses the relationship between job and worker, improving the system's overall performance (Pan et al. 2019b). An ontology-based word embedding is utilized to extract key geoscience terms and gets an F1-score of 40.7% (Qiu et al. 2019). ...
The selection of word embedding and deep learning models for better outcomes is vital. Word embeddings are an n-dimensional distributed representation of a text that attempts to capture the meanings of the words. Deep learning models utilize multiple computing layers to learn hierarchical representations of data. The word embedding technique represented by deep learning has received much attention. It is used in various natural language processing (NLP) applications, such as text classification, sentiment analysis, named entity recognition, topic modeling, etc. This paper reviews the representative methods of the most prominent word embedding and deep learning models. It presents an overview of recent research trends in NLP and a detailed understanding of how to use these models to achieve efficient results on text analytics tasks. The review summarizes, contrasts, and compares numerous word embedding and deep learning models and includes a list of prominent datasets, tools, APIs, and popular publications. A reference for selecting a suitable word embedding and deep learning approach is presented based on a comparative analysis of different techniques to perform text analytics tasks. This paper can serve as a quick reference for learning the basics, benefits, and challenges of various word representation approaches and deep learning models, with their application to text analytics and a future outlook on research. It can be concluded from the findings of this study that domain-specific word embedding and the long short term memory model can be employed to improve overall text analytics task performance.
... Image Processing [29]: 90% accuracy; Natural Language Processing Tasks [30]: more than 90% accuracy; Recommendation Tasks [31]: up to 95% accuracy; Biosciences [32]: more than 90% accuracy; Semantics Task [33]: more than 90% accuracy; Malware Detection Tasks [34]: up to 99% accuracy. Word embedding is among the most important and efficient techniques today for representing text as vectors without losing its semantics. Word2Vec can capture the context of a word, semantic and syntactic similarity, relation with other words, etc. Word2Vec was presented by Tomas Mikolov in 2013 at Google [35]. ...
With the increase in users of social media websites such as IMDb, a movie website, and the rise of publicly available data, opinion mining is more accessible than ever. In the research field of language understanding, categorization of movie reviews can be challenging because human language is complex, leading to scenarios where connotation words exist. Connotation words have a different meaning than their literal meanings. While representing a word, the context in which the word is used changes the semantics of words. In this research work, categorizing movie reviews with good F-Measure scores has been investigated with Word2Vec, and three different aspects of proposed features have been inspected. First, psychological features are extracted from reviews: positive emotion, negative emotion, anger, sadness, clout (confidence level) and dictionary words. Second, readability features are extracted; the Automated Readability Index (ARI), the Coleman Liau Index (CLI) and Word Count (WC) are calculated to measure the review’s understandability score, and their impact on review classification performance is measured. Lastly, linguistic features are also extracted from reviews: adjectives and adverbs. The Word2Vec model is trained on a collection of 50,000 reviews related to movies. A self-trained Word2Vec model is used for the contextualized embedding of words into vectors with 50, 100, 150 and 300 dimensions. The pretrained Word2Vec model converts words into vectors with 150 and 300 dimensions. Traditional and advanced machine-learning (ML) algorithms are applied and evaluated according to performance measures: accuracy, precision, recall and F-Measure. The results indicate that a Support Vector Machine (SVM) using self-trained Word2Vec achieved an 86% F-Measure, and that SVM using psychological, linguistic and readability features concatenated with Word2Vec features achieved an 87.93% F-Measure.
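One of the readability features named in this abstract, the Automated Readability Index, uses the standard formula ARI = 4.71 × (characters / words) + 0.5 × (words / sentences) − 21.43. The sketch below is an illustration only: the regular-expression tokenization and sentence splitting are simplifying assumptions, not the cited authors' preprocessing.

```python
import re


def automated_readability_index(text):
    """Compute ARI with a naive tokenizer (assumed, simplified preprocessing)."""
    # Sentences: split on runs of ., ! or ? and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words: runs of letters, digits, or apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    # Characters: letters and digits only, apostrophes excluded.
    characters = sum(len(re.sub(r"[^A-Za-z0-9]", "", w)) for w in words)
    return (4.71 * characters / len(words)
            + 0.5 * len(words) / len(sentences)
            - 21.43)


simple_score = automated_readability_index("The cat sat. The dog ran.")
harder_score = automated_readability_index(
    "Sentiment classification requires considerable linguistic preprocessing."
)
```

Short words and short sentences yield a low score, while longer words push the index up, so `harder_score` exceeds `simple_score`.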
... The efficiency of worker selection is improved through this method. Pan et al. [33] established a semantic tags similarity matrix database based on the Word2vec deep learning method. Through computing the similarity of tags, the correlation between task and worker and the similarity between workers are obtained, which achieves personalized task recommendations for workers. ...
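The idea summarized in this excerpt, a precomputed tag similarity matrix used to score how well a task matches a worker, can be sketched as follows. This is a hedged illustration rather than the method of Pan et al.: the toy vectors, the names `build_tag_similarity`, `relevance`, and `recommend`, and the aggregation rule (mean over task tags of the best-matching worker tag) are all assumptions made for the example.

```python
import math
from itertools import combinations


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_tag_similarity(tag_vectors):
    """Precompute a symmetric tag-to-tag similarity lookup."""
    sim = {(t, t): 1.0 for t in tag_vectors}
    for a, b in combinations(tag_vectors, 2):
        sim[(a, b)] = sim[(b, a)] = cosine(tag_vectors[a], tag_vectors[b])
    return sim


def relevance(task_tags, worker_tags, sim):
    """Assumed rule: average, over task tags, of the best worker-tag match."""
    if not task_tags or not worker_tags:
        return 0.0
    best = [max(sim[(t, w)] for w in worker_tags) for t in task_tags]
    return sum(best) / len(best)


def recommend(worker_tags, tasks, sim, top_k=2):
    """Rank tasks by relevance to the worker; ties broken by task name."""
    scored = [(relevance(tags, worker_tags, sim), name)
              for name, tags in tasks.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:top_k]]


# Hypothetical tag embeddings and tasks.
tag_vectors = {
    "logo": [0.9, 0.1, 0.0],
    "design": [0.8, 0.2, 0.1],
    "translation": [0.0, 0.1, 0.9],
    "english": [0.1, 0.2, 0.8],
}
tasks = {
    "brand-logo": ["logo", "design"],
    "translate-manual": ["translation", "english"],
}
sim = build_tag_similarity(tag_vectors)
ranked = recommend(["design"], tasks, sim, top_k=1)
```

Because similarities are looked up rather than recomputed, a worker tagged only with "design" can still be matched to a task tagged "logo", which is the kind of robustness to non-identical tags that the excerpt describes.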
With the development of the Internet of Things and the popularity of smart terminal devices, mobile crowdsourcing systems are receiving more and more attention. However, the information overload of crowdsourcing platforms makes workers face difficulties in task selection. This paper proposes a task recommendation model based on the prediction of workers’ mobile trajectories. A recurrent neural network is used to obtain the movement pattern of workers and predict the next destination. In addition, an attention mechanism is added to the task recommendation model in order to capture records that are similar to candidate tasks and to obtain task selection preferences. Finally, we conduct experiments on two real datasets, Foursquare and AMT (Amazon Mechanical Turk), to verify the effectiveness of the proposed recommendation model.
... As a result, they have overlooked the semantic and contextual information of keywords, potentially leading to the incorrect categorization of research publications. In this study, one of the most well-known techniques, word embedding, is used [16][17][18]. It can recognize the context of words in a document, such as semantic similarity, grammatical similarity, and relationships with other words. ...
Every year, around 28,100 journals publish 2.5 million research publications. Search engines, digital libraries, and citation indexes are used extensively to search these publications. When a user submits a query, it generates a large number of documents among which just a few are relevant. Due to inadequate indexing, the resultant documents are largely unstructured. Publicly known systems mostly index the research papers using keywords rather than using subject hierarchy. Numerous methods reported for performing single-label classification (SLC) or multi-label classification (MLC) are based on content and metadata features. Content-based techniques offer higher outcomes due to the extreme richness of features. But the drawback of content-based techniques is the unavailability of full text in most cases. The use of metadata-based parameters, such as title, keywords, and general terms, acts as an alternative to content. However, existing metadata-based techniques indicate low accuracy due to the use of traditional statistical measures to express textual properties in quantitative form, such as BOW, TF, and TFIDF. These measures may not establish the semantic context of the words. The existing MLC techniques require a specified threshold value to map articles into predetermined categories for which domain knowledge is necessary. The objective of this paper is to get over the limitations of SLC and MLC techniques. To capture the semantic and contextual information of words, the suggested approach leverages the Word2Vec paradigm for textual representation. The suggested model determines threshold values using rigorous data analysis, obviating the necessity for domain expertise. Experimentation is carried out on two datasets from the field of computer science (JUCS and ACM). In comparison to current state-of-the-art methodologies, the proposed model performed well. 
Experiments yielded average accuracy of 0.86 and 0.84 for JUCS and ACM for SLC, and 0.81 and 0.80 for JUCS and ACM for MLC. On both datasets, the proposed SLC model improved the accuracy up to 4%, while the proposed MLC model increased the accuracy up to 3%.
The existing plethora of document classification techniques exploits different data sources either from the content or metadata of research articles. Various journal publishers like Springer, Elsevier, IEEE, etc., do not provide open access to the content of research articles, whereas metadata is freely available there. Metadata like title, keyword, and abstract can serve as a better alternative to the content in various scenarios. In the current literature, researchers have assessed the role of some of the metadata individually. We believe that the collective contribution of metadata parameters can play a significant role in classifying research papers. This paper presents a comprehensive evaluation of the role of metadata, individually as well as in combinations to achieve the objective of research paper classification. Moreover, we have classified the research articles into ACM hierarchy root categories (e.g. general literature, hardware, software, etc.). In this comprehensive evaluation, we have assessed all the possible combinations of metadata features against different classifiers such as Random Forest, K Nearest Neighbor, and Decision Tree. The results of this research reveal that the title keywords combination outperforms other combinations with an F-measure score of 0.88.
Sequential recommendation selects and recommends next items for users by modeling their historical interaction sequences, where the chronological order of interactions plays an important role. Most sequential recommendation methods only pay attention to the order information among the interactions and ignore time-interval information, which limits their ability to capture dynamic user interests. Previous work also neglects diversity in order to improve recommendation accuracy. The model Temporal Self-Attention and Multi-Preference Learning (TSAMPL) is proposed to improve sequential recommendation, which learns dynamic and general user interests separately. The proposed temporal gate self-attention network is introduced to learn dynamic user interests, which takes both contextual information and temporal dynamics into account. To model general user interests, we employ a multi-preference matrix to learn users’ multiple types of preferences for improving recommendation diversity. Finally, the interest fusion module combines dynamic user interests (accuracy) and general user interests (diversity) adaptively. The experiments in sequential recommendation confirm that our method is superior to all comparison methods. We also study the impact of each component in the model.
In order to avoid malicious competition and select high-quality crowd workers to improve the utility of the crowdsourcing system, this paper proposes an incentive mechanism based on the combination of reverse auction and multi-attribute auction in mobile crowdsourcing. The proposed online incentive mechanism includes two algorithms. One is the crowd worker selection algorithm based on multi-attribute reverse auction, which adopts a dynamic threshold to make an online decision on whether to accept a crowd worker according to its attributes. The other is the payment determination algorithm, which determines payment for a crowd worker based on its reputation and quality of sensing data; that is, a crowd worker can get payment equal to the bidding price before performing the task only if his reputation reaches the good-reputation threshold, otherwise he will get payment based on his data sensing quality. We prove that our proposed online incentive mechanism has the properties of computational efficiency, individual rationality, budget-balance, truthfulness and honesty. Through simulations, the efficiency of our proposed online incentive mechanism is verified, showing that it can improve the efficiency, adaptability and trust degree of the mobile crowdsourcing system.
With the rapid development of mobile devices, mobile crowdsourcing has become an important research focus. In large-scale mobile crowdsourcing, effective evolution prediction and incentive mechanisms are key to improving the efficiency of systems. The evolution model based on evolutionary game theory (EGT) is researched to predict the evolution trends of mobile crowdsourcing systems (MCSs) effectively. Based on the evolution trends, the reputation updating mechanism (RUMG) is proposed to address free-riding and false-reporting problems. An incentive mechanism with spatio-temporal privacy-preserving for mobile crowdsourcing is researched. In order to protect workers’ spatio-temporal privacy information effectively, a spatio-temporal privacy-preserving method based on k-anonymity (LKAC) is proposed. In addition, the effectiveness of the proposed RUMG and LKAC is verified through comparison experiments. The proposed mechanism also improves the security of the system and resolves the free-riding and false-reporting problems of mobile crowdsourcing.
To provide fine-grained access to different dimensions of the physical world, data uploading in smart cyber-physical systems faces novel challenges in both energy conservation and privacy preservation. It is always critical for participants to consume as little energy as possible for data uploading. However, simply pursuing energy efficiency may lead to extreme disclosure of private information, especially when the uploaded contents from participants are more informative than ever. In this paper, we propose a novel mechanism for data uploading in smart cyber-physical systems, which considers both energy conservation and privacy preservation. The mechanism preserves privacy by concealing abnormal behaviors of participants, while still achieving an energy-efficient scheme for data uploading by introducing an acceptable number of extra contents. Deriving an optimal uploading scheme is proved to be NP-hard. Accordingly, we propose a heuristic algorithm and analyze its effectiveness. The evaluation results on a real-world dataset demonstrate that the results obtained through our proposed algorithm are comparable with the optimal ones.
With the rapid development of mobile devices, mobile crowdsourcing has become an important research focus. In order to improve the efficiency and truthfulness of mobile crowdsourcing systems, this paper proposes a truthful incentive mechanism with location privacy-preserving for mobile crowdsourcing systems. The improved two-stage auction algorithm based on trust degree and privacy sensibility (TATP) is proposed. In addition, k-ε-differential privacy-preserving is proposed to prevent users’ location information from being leaked. Through comparison experiments, the effectiveness of the proposed incentive mechanism is verified. The proposed incentive mechanism with location privacy-preserving can inspire users to participate in sensing tasks and protect users’ location privacy effectively.
Sensor-embedded smartphones have become ubiquitous nowadays, further leveraging the popularity of mobile crowdsensing. A mobile crowdsensing platform gathers sensory data from smartphone users and makes payments to them in return. Due to the spatial correlation of sensory data in various applications, users close to each other in geographical positions usually provide similar sensory data, and it is quite an economic waste for a mobile sensing platform to buy duplicated sensory data with multiple payments to geographically close users. Unfortunately, the existing works do not take this matter into consideration. To prevent waste, our paper considers geographical position conflicting mobile crowdsensing systems in which any two users within a limited geographical distance cannot obtain payments simultaneously while participating in crowdsensing tasks. Two algorithms are proposed to select appropriate mobile crowdsensing participants and calculate the payments to them. Solid theoretical proofs are presented to demonstrate the beneficial properties of our proposed algorithms. The extensive experiment results based on real-world datasets indicate that our proposed algorithms are efficient while providing beneficial properties.
My last column ended with some comments about Kuhn and word2vec. Word2vec has racked up plenty of citations because it satisfies both of Kuhn's conditions for emerging trends: (1) a few initial (promising, if not convincing) successes that motivate early adopters (students) to do more, as well as (2) leaving plenty of room for early adopters to contribute and benefit by doing so. The fact that Google has so much to say on 'How does word2vec work' makes it clear that the definitive answer to that question has yet to be written. It also helps citation counts to distribute code and data to make it that much easier for the next generation to take advantage of the opportunities (and cite your work in the process).
Urban land use information plays an essential role in a wide variety of urban planning and environmental monitoring processes. During the past few decades, with the rapid technological development of remote sensing (RS), geographic information systems (GIS) and geospatial big data, numerous methods have been developed to identify urban land use at a fine scale. Points-of-interest (POIs) have been widely used to extract information pertaining to urban land use types and functional zones. However, it is difficult to quantify the relationship between spatial distributions of POIs and regional land use types due to a lack of reliable models. Previous methods may ignore abundant spatial features that can be extracted from POIs. In this study, we establish an innovative framework that detects urban land use distributions at the scale of traffic analysis zones (TAZs) by integrating Baidu POIs and a Word2Vec model. This framework was implemented using a Google open-source model of a deep-learning language in 2013. First, data for the Pearl River Delta (PRD) are transformed into a TAZ-POI corpus using a greedy algorithm by considering the spatial distributions of TAZs and inner POIs. Then, high-dimensional characteristic vectors of POIs and TAZs are extracted using the Word2Vec model. Finally, to validate the reliability of the POI/TAZ vectors, we implement a K-Means-based clustering model to analyze correlations between the POI/TAZ vectors and deploy TAZ vectors to identify urban land use types using a random forest algorithm (RFA) model. Compared with some state-of-the-art probabilistic topic models (PTMs), the proposed method can efficiently obtain the highest accuracy (OA = 0.8728, kappa = 0.8399). Moreover, the results can be used to help urban planners to monitor dynamic urban land use and evaluate the impact of urban planning schemes.