S. Kurbanoğlu et al. (Eds.): IMCW 2010, CCIS 96, pp. 145–153, 2010.
© Springer-Verlag Berlin Heidelberg 2010
Identification of User Patterns in Social Networks by
Data Mining Techniques: Facebook Case
A. Selman Bozkir¹, S. Güzin Mazman², and Ebru Akcapinar Sezer¹
¹ Hacettepe University, Department of Computer Engineering, Ankara, Turkey
selman@cs.hacettepe.edu.tr, ebru@hacettepe.edu.tr
² Hacettepe University, Department of Computer Education and Instructional Technologies, Ankara, Turkey
s.guzin@gmail.com
Abstract. Currently, social networks such as Facebook and Twitter are becoming increasingly popular because of the opportunities they offer. As of November 2009, Facebook was the most popular and best-known social network in the world, with over 316 million users. Turkey ranks third among all countries in number of Facebook users, with 14 million members, half of whom are younger than 25 (students). The success of Facebook and the rich opportunities offered by social media sites have led to the creation of new web-based applications for social networks and have opened up new frontiers. Discovering the usage patterns of social media sites may therefore be useful when making decisions about the design and implementation of such applications, as well as of educational tools. In this study, the factors affecting “Facebook usage time” and “Facebook access frequency” are revealed with various predictive data mining techniques, based on a questionnaire administered to 570 Facebook users. In addition, the associations among the students’ opinions on the educational contribution of Facebook are investigated with the association rules method.
Keywords: Social networks, decision trees, Facebook, association rules.
1 Introduction
In recent years, both the number of social networks and the number of people using them have increased rapidly. Social networks, also called social software or collaborative software, are a range of applications that augment group interactions and shared spaces for collaboration, social connections, and aggregate information exchanges in a web-based environment [1]. Similarly, [2] defined social networks as web-based services that allow individuals to (1) construct a public or semi-public profile within a bounded system, (2) articulate a list of other users with whom they share a connection, and (3) view and traverse their list of connections and those made by others within the system.
Social network sites (SNSs) such as MySpace, Facebook, Cyworld, Bebo and Twitter have attracted millions of users since their introduction, and the majority of these users have integrated such sites into their daily lives. Most social network users are young, and many of them are university students; these sites are therefore considered to play an active role in the daily life of the younger generation [3], [4]. On the other hand, it has been stated that social networks have a prominent educational context, and this prominence has prompted a growing number of educators to regard them as important sites for student learning, although they are not primarily intended as educational applications. It has also been suggested that social networks help users re-situate learning in an open-ended social context by providing opportunities to move beyond mere access to content (learning about) to the social application of knowledge in a constant process of re-orientation (learning as becoming) [5].
There have been various studies of social networks in the educational context, including their use as a tool or as an environment for courses [6], [7], their utility in the teaching and learning process [8], their value for communication and collaboration [9], and the themes of their educational usage (e.g. [10], [11]). However, to the best of our knowledge, no study in the literature has applied data mining to the analysis of social network usage.
As one of the most popular social networks, Facebook is the focus of the present study. Facebook is defined as “a social utility that helps people share information and communicate more efficiently with their friends, family and co-workers” (facebook.com). As of November 2009, with 316 million users, Facebook was the most popular and best-known social network in the world. Moreover, Turkey, with 14 million members, ranked third in number of Facebook users, and half of these members were younger than 25 years old [12].
Data mining is a process that uses a variety of data analysis tools to discover patterns and relations in data that may be used for prediction purposes. Supervised data mining techniques are used to model an output variable based on one or more input variables, and these models can be used to predict or forecast future cases [13].
The purpose of the present study is to discover usage patterns of Facebook users (namely, usage time and access frequency) with data mining techniques. In addition, an attempt is made to reveal the associations among the users’ opinions on the educational use of Facebook. The findings of this study may help enhance social network based application development and educational programs.
2 Data Mining
Data mining is the process of exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover useful patterns [13]. In other words, data mining is the complete process of revealing useful patterns and relationships in data with techniques drawn from artificial intelligence, machine learning and statistics, applied through advanced data analysis tools. Oracle BI, SPSS Clementine, SAS Enterprise Miner and Microsoft Analysis Services are well-known data mining tools on the market [14].
Data mining methods fall into two categories: predictive and descriptive. Predictive methods use a model trained on seen cases to make predictions on unseen cases, whereas descriptive methods aim to discover deep relationships, correlations and descriptive properties of the data.
In this study, both groups of methods are employed using SPSS Clementine 12. To predict the target variables, various decision tree algorithms (CART, CHAID and C5), artificial neural networks (ANN) and Support Vector Machine (SVM) classifiers are used. Furthermore, the variable importance feature of SPSS Clementine is used to discover the factors affecting “Facebook usage time” and “Facebook access frequency”. Likewise, the Apriori algorithm is employed to discover frequent opinions of students on the educational benefits of Facebook usage.
2.1 Methodology
As stated previously, various data mining techniques are employed in the analyses, and the prediction performances of all of them except one (association rule mining) are compared. This section therefore briefly outlines the methods used.
The decision tree method is probably the most popular classification method among data mining techniques because of its ease of use and its visual interpretability. A typical data mining task for a decision tree is classification, for example, identifying the credit risk of each customer [15]. The main idea of a decision tree is to split the data recursively into subsets so that each subset contains more or less homogeneous states of the dependent variable. At each split in the tree, the impact of every independent variable on the dependent variable is recalculated. When this recursive process stops and the tree reaches a stable state, the decision tree is formed [15]. At this stage, new cases can be classified by the decision tree; this stage is called tree deduction. C5, QUEST, CHAID [16] and CART [17] are well-known decision tree algorithms, and SPSS Clementine provides all of them in its package. The differences among these algorithms stem mainly from their technical capabilities and from the splitting criteria they employ. For instance, C5 and CHAID are designed to classify discrete-valued (categorical) target variables; C5 selects splits by the information gain ratio, whereas CHAID uses chi-square tests. CART, which typically uses the Gini index for classification, is designed for both classification and regression.
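The splitting criteria named above can be compared on a toy example. The sketch below is a minimal illustration, assuming a hypothetical binary attribute that splits eight respondents into two groups with usage-time labels "low" and "high"; it computes the size-weighted Gini impurity (as used by CART), the information gain ratio (as used by C5) and the Pearson chi-square statistic of the branch-by-class table (the quantity CHAID tests). It is not Clementine's implementation: real algorithms add stopping rules, pruning and, for CHAID, category merging and p-value adjustment.

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy (in bits) of the class distribution."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def weighted_gini(partitions):
    """Size-weighted Gini impurity of the child nodes (lower is better)."""
    n = sum(len(p) for p in partitions)
    return sum(len(p) / n * gini(p) for p in partitions)

def gain_ratio(partitions):
    """Information gain of the split divided by its split information."""
    parent = [y for part in partitions for y in part]
    n = len(parent)
    weights = [len(p) / n for p in partitions]
    gain = entropy(parent) - sum(w * entropy(p) for w, p in zip(weights, partitions))
    split_info = -sum(w * log2(w) for w in weights if w > 0)
    return gain / split_info if split_info > 0 else 0.0

def chi_square(partitions):
    """Pearson chi-square statistic of the branch-by-class contingency table.
    Assumes every partition is non-empty."""
    n = sum(len(p) for p in partitions)
    totals = Counter(y for p in partitions for y in p)
    stat = 0.0
    for part in partitions:
        counts = Counter(part)
        for cls, total in totals.items():
            expected = len(part) * total / n
            stat += (counts[cls] - expected) ** 2 / expected
    return stat

if __name__ == "__main__":
    # Hypothetical split of eight respondents by a binary attribute
    parts = [["low", "low", "low", "high"], ["high", "high", "low", "high"]]
    print("weighted Gini:", weighted_gini(parts))
    print("gain ratio:", gain_ratio(parts))
    print("chi-square:", chi_square(parts))
```

Each algorithm would evaluate such a score for every candidate attribute and choose the best one (lowest impurity, highest gain ratio, or most significant chi-square) before recursing into the child nodes.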
In the pattern recognition literature, the Support Vector Machine (SVM) is a state-of-the-art method with powerful discriminative capabilities in both linear and non-linear classification. In essence, an SVM separates two classes in pattern space by searching for an optimal hyperplane that maximizes the margin, i.e. the distance to the closest points of the two classes, which are termed support vectors [18]. Although the basic SVM is a binary classifier, it can be extended to multiclass prediction by combining several binary SVMs (for example, one-versus-one or one-versus-rest). Kernel functions, in turn, allow non-linear problems to be solved by implicitly mapping the data into higher-dimensional spaces.
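To illustrate the maximum-margin idea, the following sketch trains a linear SVM on a hypothetical, linearly separable two-dimensional toy dataset. It assumes a linear kernel and uses a Pegasos-style stochastic subgradient method on the regularised hinge loss, which is one standard way to fit a linear SVM and not necessarily the solver used by Clementine; the function names and parameter values (`lam`, `epochs`, `seed`) are illustrative assumptions.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def _augment(x):
    """Append a constant 1 so that the last weight acts as the bias term."""
    return list(x) + [1.0]

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style subgradient descent on lam/2*||w||^2 + mean hinge loss.
    X: list of feature vectors; y: labels in {-1, +1}. Returns the weight vector."""
    rng = random.Random(seed)
    data = [_augment(x) for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                 # decreasing step size
            margin = y[i] * dot(w, data[i])       # functional margin of point i
            w = [(1.0 - eta * lam) * wj for wj in w]  # regulariser shrinks w
            if margin < 1.0:                      # inside the margin: hinge subgradient
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w

def predict(w, x):
    """Side of the separating hyperplane on which x falls."""
    return 1 if dot(w, _augment(x)) >= 0 else -1

if __name__ == "__main__":
    X = [[2, 2], [3, 3], [2, 3], [3, 2], [-2, -2], [-3, -3], [-2, -3], [-3, -2]]
    y = [1, 1, 1, 1, -1, -1, -1, -1]
    w = train_linear_svm(X, y)
    print("weights:", w, "predictions:", [predict(w, x) for x in X])
```

A multiclass target such as usage time would wrap several such binary classifiers (e.g. one-versus-rest), and a kernel SVM would replace the dot product with a kernel function.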
An ANN is a system of interconnected processing nodes arranged in layers. A typical ANN has an input layer, a hidden layer and an output layer. Each node in the hidden layer combines the inputs from the input layer into a single output value, which is passed on to the output layer. Each connection between nodes is associated with a weight. The weights are determined in a training phase using training data, and the performance of the network is then tested on the remaining data, or holdout sample [19].
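The forward pass of the three-layer network described above can be sketched as follows. This is a minimal illustration assuming sigmoid hidden units and a softmax output layer with one output per class; the weight values are placeholders for numbers that a training phase would determine, and the function names are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic activation used by the hidden nodes."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Turn output-layer scores into class probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Input layer -> hidden layer -> output layer.
    w_hidden[j][i] weights the connection from input i to hidden node j;
    w_out[k][j] weights the connection from hidden node j to output k."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    scores = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(w_out, b_out)]
    return softmax(scores)

if __name__ == "__main__":
    # Three binary-coded answers, two hidden nodes, two output classes
    probs = forward([1.0, 0.0, 1.0],
                    [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]], [0.0, 0.1],
                    [[1.0, -1.0], [-0.5, 0.7]], [0.0, 0.0])
    print(probs)
```

Training (e.g. backpropagation) would adjust `w_hidden`, `b_hidden`, `w_out` and `b_out` so that the predicted class matches the training labels.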
Association rule mining is one of the most extensively studied descriptive data mining methods. Agrawal, Imielinski and Swami introduced the problem of mining association rules in 1993, and the Apriori algorithm that followed works in two phases: it first searches for frequent itemsets level by level, and then derives rules from their co-occurrence frequencies [20]. In the second stage of this study, the analyses are performed with the Apriori algorithm. In association rule mining, support, rule support, confidence and lift are the key measures for evaluating the usefulness of rules. In this study, lift and support values are considered.
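The two phases of Apriori and the evaluation measures can be sketched in a few lines. In this minimal illustration, each respondent is treated as a transaction whose items are hypothetical answer codes such as "classmates=8"; support here means rule support (the fraction of transactions containing both sides of the rule), confidence is P(consequent | antecedent), and lift is the confidence divided by the consequent's support. It is a plain-Python sketch, not Clementine's Apriori node, which may define its reported support column differently (e.g. as antecedent support).

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Phase 1: level-wise search for all itemsets with support >= min_support.
    Returns {frozenset(itemset): support}."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)

    def support(itemset):
        return sum(1 for t in sets if itemset <= t) / n

    level = {frozenset([item]) for t in sets for item in t}
    frequent = {}
    k = 1
    while level:
        # Keep only the candidates that reach the minimum support
        current = {c: support(c) for c in level}
        current = {c: s for c, s in current.items() if s >= min_support}
        frequent.update(current)
        k += 1
        # Join frequent (k-1)-itemsets into k-item candidates ...
        keys = list(current)
        candidates = {a | b for a in keys for b in keys if len(a | b) == k}
        # ... and prune any candidate that has an infrequent (k-1)-subset
        level = {c for c in candidates
                 if all(frozenset(s) in current for s in combinations(c, k - 1))}
    return frequent

def association_rules(frequent, min_confidence):
    """Phase 2: derive rules X -> Y from every frequent itemset.
    Returns (antecedent, consequent, confidence, rule_support, lift) tuples."""
    found = []
    for itemset, rule_support in frequent.items():
        for r in range(1, len(itemset)):
            for ante in map(frozenset, combinations(itemset, r)):
                cons = itemset - ante
                confidence = rule_support / frequent[ante]
                if confidence >= min_confidence:
                    lift = confidence / frequent[cons]
                    found.append((set(ante), set(cons), confidence, rule_support, lift))
    return found

if __name__ == "__main__":
    # Hypothetical answer codes of five respondents
    respondents = [
        {"classmates=8", "teacher=8"},
        {"classmates=8", "teacher=8", "materials=6"},
        {"classmates=8", "materials=6"},
        {"teacher=8", "materials=6"},
        {"classmates=8", "teacher=8"},
    ]
    for ante, cons, conf, supp, lift in association_rules(apriori(respondents, 0.4), 0.5):
        print(ante, "->", cons, f"conf={conf:.2f} support={supp:.2f} lift={lift:.2f}")
```

Because every subset of a frequent itemset is itself frequent, the supports needed for confidence and lift in phase 2 are always available from phase 1.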
Table 1. Variable names and available answers in the first part of the poll
Variable name | Type | Available answers and related distributions
Sex | Discrete | Male (50%) / Female (50%)
Age | Discrete | 18-25 (74.1%) / 26-35 (20.53%) / 36-40 (3.86%) / 41 and above (1.4%)
Frequency of access to Facebook | Discrete | Once a year (0.18%) / Once a month (2.98%) / Several times a week (25.26%) / Once a day (22.81%) / Several times a day (48.77%)
Facebook usage time | Discrete | Less than 15 mins. (32.28%) / Half an hour (39.82%) / 1 hour (14.39%) / 1-3 hours (8.6%) / More than 3 hours (4.74%)
Education level | Discrete | High School (5.96%) / Bachelor (70.35%) / Master (23.16%)
Membership in any group | Discrete | Yes (99.82%) / No (0.18%)
Membership in student groups | Discrete | Yes (86.49%) / No (13.51%)
Membership in common interest groups | Discrete | Yes (77.54%) / No (22.46%)
Membership in internet & tech groups | Discrete | Yes (27.02%) / No (72.98%)
Membership in organizations | Discrete | Yes (61.93%) / No (38.07%)
3 Data
Data were collected from 570 active Turkish Facebook users (students) through an online poll consisting of two sections. The first section collected the demographic characteristics of the users, their frequency of Facebook access, the length of time they spend on Facebook, and their memberships in Facebook groups. In the second section, participants rated 11 statements on a 10-point Likert scale ranging from 1 (strongly disagree) to 10 (strongly agree), such as “Facebook contributes to communication between classmates” and “It’s useful for assigning tasks in classes and homework assignments”. In this way, members’ views of the educational use of Facebook were sought.
The variable names of the first part and the available answers are given in Table 1. The initial dataset contained 583 respondents; during the data cleaning and transformation steps, 13 of them were removed because of insufficient information, leaving a final dataset of 570 people. In the dataset, male and female participants are equally represented, and more than 400 respondents are in the 18-25 age range. Furthermore, almost all students are at either the undergraduate or the graduate level.
4 Application of Data Mining
To discover the important factors that affect Facebook usage time and access frequency, the CART, CHAID, C5, artificial neural network and SVM algorithms built into SPSS Clementine 12 were applied to the dataset (see Fig. 1). The data were randomly partitioned into a training set (80%) and a test set (20%). Since all variables are discrete-valued, the results are reported as true and false classification rates.
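A minimal sketch of the evaluation protocol described above: a random 80/20 partition and the true/false classification rates reported in Table 2. The helper names and the fixed random seed are assumptions for illustration; Clementine performs the partitioning internally.

```python
import random

def train_test_split(records, train_fraction=0.8, seed=42):
    """Randomly partition records into a training set and a test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def classification_rates(actual, predicted):
    """Return (true %, false %) classification rates, as reported in Table 2."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    true_rate = 100.0 * correct / len(actual)
    return true_rate, 100.0 - true_rate

if __name__ == "__main__":
    train, test = train_test_split(range(570))
    print(len(train), "training records,", len(test), "test records")
    print(classification_rates(["a", "b", "a", "b"], ["a", "b", "b", "b"]))
```

Fixing the seed makes the partition reproducible; with 570 respondents the split yields 456 training and 114 test records.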
According to the results (see Table 2), SVM achieves the most accurate predictions for both target variables. The variable importance results of the SVM models are therefore considered the most reliable. As can be seen in Fig. 2, sex, education level, membership in any group and membership in common interest groups are the most important factors affecting Facebook usage time. Sex plays a crucial role in Facebook usage time, with an importance value of 68%. Likewise, age, membership in student groups and usage time are the most important factors affecting access frequency to Facebook, with age accounting for more than 80% of the importance.
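The paper does not describe how Clementine computes variable importance. As a swapped-in, model-agnostic illustration of the same idea, the sketch below uses permutation importance: each input column is shuffled in turn, the resulting drop in accuracy is recorded, and the drops are normalised to sum to 1 so that they read like the percentages quoted above. The function and the toy data are hypothetical.

```python
import random

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Relative importance of each input column: mean accuracy drop when the
    column is randomly shuffled, normalised so that the importances sum to 1."""
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(predict(r) == t for r, t in zip(rows, y)) / len(y)

    baseline = accuracy(X)
    drops = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Replace column j by its shuffled values, leaving the rest intact
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            total += baseline - accuracy(permuted)
        drops.append(max(total / n_repeats, 0.0))
    s = sum(drops)
    return [d / s if s > 0 else 0.0 for d in drops]

if __name__ == "__main__":
    # Toy data: the target depends only on column 0; column 1 is noise
    X = [[i % 2, i % 3] for i in range(30)]
    y = [row[0] for row in X]
    print(permutation_importance(lambda r: r[0], X, y))
```

A column the model ignores receives zero importance, while the column that fully determines the prediction receives all of it.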
Table 2. Applied algorithms and prediction results
Target variable – Applied algorithm | True classification | False classification
Facebook usage – SVM | 62.63% | 37.37%
Facebook usage – ANN | 47.72% | 52.28%
Facebook usage – C5 | 47.54% | 52.46%
Facebook usage – CART | 43.68% | 56.32%
Facebook usage – CHAID | 41.40% | 58.60%
Access frequency to Facebook – SVM | 69.65% | 30.35%
Access frequency to Facebook – C5 | 55.79% | 44.21%
Access frequency to Facebook – CART | 52.81% | 47.19%
Access frequency to Facebook – CHAID | 50.35% | 49.65%
Access frequency to Facebook – ANN | 48.77% | 51.23%
Fig. 1. Applying data mining methods in Clementine 12
Fig. 2. Variable importance values of two target variables
In the association rule mining part of the study, the associations among students’ opinions on Facebook and its educational benefits were investigated. To this end, the well-known Apriori algorithm was run with a minimum support of 5% and a minimum confidence of 15%.
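Table 3 lists a selection of interesting rules sorted by lift. Lift indicates the usefulness and attractiveness of a rule: it is the ratio of the rule’s confidence to the overall frequency of its consequent, so a lift above 1 means that the antecedent and consequent occur together more often than would be expected if they were independent. Rules with lift values higher than 1 should therefore be considered carefully for educational purposes.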
Table 3. A sample subset of discovered association rules
Antecedent | Consequent | Confidence | Support | Lift
“It contributes to the communication between teacher & student” = 7 | “It’s useful at accessing the rich learning resources” = 9 | 11.32% | 1.03% | 3.0
“Facebook contributes to communication between classmates” = 4 | “It’s useful for executing the group tasks” = 2 | 12.83% | 0.87% | 2.9
“Facebook contributes to communication between classmates” = 8 and “It contributes to communication between teacher & student” = 8 | “It contributes to transferring course materials and resources” = 6 | 20.69% | 1.05% | 2.7
“Facebook contributes to communication between classmates” = 6 | “It’s useful at providing rich multimedia contents in teaching” = 3 | 20.51% | 1.40% | 2.65
“It contributes to the communication between teacher & student” = 3 | “Facebook contributes to communication between classmates” = 6 | 17.02% | 1.40% | 2.48
“Facebook contributes to communication between classmates” = 7 and “It contributes to communication between teacher & student” = 5 | “It contributes to dissemination of announcements of lectures & classes” = 2 | 12.5% | 0.52% | 2.03
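Because lift is a rule's confidence divided by the overall frequency of its consequent, each row of Table 3 also implies how common its consequent answer is among all respondents. The sketch below assumes that standard definition, lift(X → Y) = P(Y | X) / P(Y), and simply inverts it for the (confidence, lift) pairs in the table; the results are approximate because the published figures are rounded.

```python
# (confidence %, lift) pairs copied from Table 3, in table order
TABLE3 = [(11.32, 3.0), (12.83, 2.9), (20.69, 2.7),
          (20.51, 2.65), (17.02, 2.48), (12.5, 2.03)]

def lift(confidence, consequent_support):
    """Lift of X -> Y: P(Y | X) / P(Y), both given on the same scale."""
    return confidence / consequent_support

def implied_consequent_support(confidence, lift_value):
    """Invert the lift definition: P(Y) = P(Y | X) / lift."""
    return confidence / lift_value

if __name__ == "__main__":
    for conf, lv in TABLE3:
        share = implied_consequent_support(conf, lv)
        print(f"confidence {conf:5.2f}%, lift {lv:4.2f} -> consequent in about {share:.2f}% of respondents")
```

For instance, the first rule's lift of 3.0 with 11.32% confidence implies that roughly 3.8% of respondents gave the consequent answer overall.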
5 Discussion and Conclusion
This study sought to discover the factors affecting the access frequency and usage time of Facebook with various decision tree algorithms, ANN and the state-of-the-art SVM algorithm. According to the results, SVM gives the most accurate results, presumably owing to the nature of the dataset at hand. The prediction performance could probably be improved with more training data. In addition, the associations among the students’ opinions were explored with the Apriori algorithm. The results indicate that students see Facebook as contributing more to communication between classmates than to communication between students and teachers. Moreover, the students who hold these views believe that Facebook is a good medium for accessing rich resources. More rules of this kind can be revealed with the Apriori algorithm, and the use of social network sites for educational ends can be reshaped in light of these rules.
Given the increasing use of social network sites, the importance of applications and approaches related to social networks is easy to see. Targeting specific age groups or a specific sex may strategically affect the success of the developed applications. In conclusion, data mining methods can be successfully applied to social network usage data.
References
1. Bartlett-Bragg, A.: Reflections on Pedagogy: Reframing Practice to Foster Informal Learning with Social Software (2006), http://www.dream.sdu.dk/uploads/files/Anne%20Bartlett-Bragg.pdf
2. boyd, D.M., Ellison, N.B.: Social Network Sites: Definition, History, and Scholarship. Journal of Computer-Mediated Communication 13, 210–230 (2007)
3. Lenhart, M.: Adults and Social Network Websites. Pew Internet & American Life Project Report (2009), http://www.pewinternet.org/pdfs/PIP_Adult_social_networking_data_memo_FINAL.pdf
4. Bumgarner, B.A.: You Have Been Poked: Exploring the Uses and Gratifications of Facebook Among Emerging Adults. First Monday, 22 (2007), http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/viewArticle/2026/1897
5. Mejias, U.: Nomad’s Guide to Learning and Social Software (2005), http://knowledgetree.flexiblelearning.net.au/edition07/download/la_mejias.pdf
6. English, R., Duncan-Howell, J.: Facebook Goes to College: Using Social Networking Tools to Support Students Undertaking Teaching Practicum. Journal of Online Learning and Teaching 4, 596–601 (2008)
7. Lockyer, L., Patterson, J.: Integrating Social Networking Technologies in Education: A Case Study of a Formal Learning Environment. In: Proceedings of 8th IEEE International Conference on Advanced Learning Technologies, Spain, pp. 529–533 (2008)
8. Ajjan, H., Hartshorne, R.: Investigating Faculty Decisions to Adopt Web 2.0 Technologies: Theory and Empirical Tests. The Internet and Higher Education 11, 71–80 (2008)
9. Saunders, S.: The Role of Social Networking Sites in Teacher Education Programs: A Qualitative Exploration. In: McFerrin, K., et al. (eds.) Proceedings of Society for Information Technology and Teacher Education International Conference, pp. 2223–2228. AACE, Chesapeake (2008)
10. Mazman, S.G., Usluel, Y.K.: Adoption Process of Social Network and Their Usage in Educational Context. Master Thesis. The Institute for Graduate Studies in Science and Engineering. Hacettepe University, Ankara (2009)
11. Selwyn, N.: Web 2.0 Applications as Alternative Environments for Informal Learning - A Critical Review. Alternative Learning Environments in Practice: Using ICT to Change Impact and Outcomes, OECD-KERIS Expert Meeting (2007)
12. Check Facebook (2009), http://www.checkfacebook.com
13. Berry, M., Linoff, G.: Mastering Data Mining: The Art and Science of Customer Relationship Management. John Wiley & Sons, Chichester (2000)
14. Bozkir, A.S., Gök, B., Sezer, E.: İnternetin Eğitimsel Amaçlar için Kullanımını Etkileyen Faktörlerin Veri Madenciliği Yöntemleriyle Tespiti. In: Bilimde Modern Yöntemler Sempozyumu, pp. 833–842. Eskişehir (2008)
15. Tang, Z., MacLennan, J.: Data Mining with SQL Server 2005. John Wiley & Sons, Indiana (2005)
16. Kass, G.V.: An Exploratory Technique for Investigating Large Quantities of Categorical Data. Applied Statistics 29, 119–127 (1980)
17. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Wadsworth International Group, Belmont (1984)
18. Liu, J., Wang, Z., Xiao, X.: A Hybrid SVM/DDBHMM Decision Fusion Modeling for Robust Continuous Digital Speech Recognition. Pattern Recognition Letters 28, 912–920 (2007)
19. Fuller, C.M., Piros, D.P., Wilson, R.L.: Decision Support for Determining Veracity via Linguistic-Based Cues. Decision Support Systems 46, 697–703 (2009)
20. Romero, C., Ventura, S.: Educational Data Mining: A survey from 1995 to 2005. Expert Systems with Applications 33, 135–146 (2007)
... Intelligent text analysis is the process of assessing a great deal of information using fully automated or partially automated means [19]. Analytical methods deal with information in different ways: generalization, grouping, classification, description of trends, and so on [20]. ...
... The work by Bozkır et al. [19] aimed to ensure that data mining, particularly the SVM algorithm, can detect factors that affect the frequency and duration of Facebook usage. The amount of data used for training is four times higher than for the test. ...
... Data mining plays and important role in the identification of important factors that are affecting Facebook usage in a positive and negative way. The authors in paper [8] analyzed the data collected from 570 active users of Facebook from Anakara. Algorithms such as Decision Tree, SVM, and Artificial Neural Network were applied to the dataset. ...
Article
Full-text available
Data mining is an emerging technique with its application in various areas such as health care, education, travel, social media, and banking. The data can be either labeled or unlabeled. When it comes to social media, the various platforms generate an infinite amount of data. This data can be of immense importance as a lot of hidden information can be discovered after data mining. In this paper, machine-learning algorithms such as Decision Trees, SVM, and Linear Regression and their variants are applied on the Facebook comment dataset, obtained from the UCI machine learning repository. The dataset has 40,949 instances and 54 attributes. The goal is to predict the number of comments a Facebook post will get based on various conditions. The results indicate that the Fine Gaussian SVM variation of SVM yielded the highest prediction accuracy. The evaluation was done on different parameters such as average testing accuracy (%), Root Mean Square Error (RMSE), R-Squared, Mean Square Error (MSE), Mean Absolute Error (MAE), prediction speed (Obs/sec), and training time (Machine cycle). It is concluded that SVM is an ideal choice to solve prediction problems associated with social media data.
... Data mining plays and important role in the identification of important factors that are affecting Facebook usage in a positive and negative way. The authors in paper [8] analyzed the data collected from 570 active users of Facebook from Ankara. Algorithms such as Decision Tree, SVM, and Artificial Neural Network were applied to the dataset. ...
Article
Full-text available
Data mining is an emerging technique with its application in various areas such as health care, education, travel, social media, and banking. The data can be either labeled or unlabeled. When it comes to social media, the various platforms generate an infinite amount of data. This data can be of immense importance as a lot of hidden information can be discovered after data mining. In this paper, machine-learning algorithms such as Decision Trees, SVM, and Linear Regression and their variants are applied on the Facebook comment dataset, obtained from the UCI machine learning repository. The dataset has 40,949 instances and 54 attributes. The goal is to predict the number of comments a Facebook post will get based on various conditions. The results indicate that the Fine Gaussian SVM variation of SVM yielded the highest prediction accuracy. The evaluation was done on different parameters such as average testing accuracy (%), Root Mean Square Error (RMSE), R-Squared, Mean Square Error (MSE), Mean Absolute Error (MAE), prediction speed (Obs/sec), and training time (Machine cycle). It is concluded that SVM is an ideal choice to solve prediction problems associated with social media data.
Article
Full-text available
University student networks are recognised to be linked with student performance. Yet, no literature review seems to address student networks together with student learning or achievement. This paper focuses on the quantitative studies conducted since 2000 that relate to this issue. This literature review highlights five research domains: (1) the links between student and peer performance within face‐to‐face networks; (2) the effects of networks’ components (e.g., centrality) on student learning and achievement within face‐to‐face networks; (3) the impacts of online social network use on student performance; (4) the effects of social presence and of interactions within e‐learning; and (5) the effects of e‐learning networks components. This literature review underlines inconsistent findings within each of these research domains. This paper leads to a discussion on the methodological issues that might explain these inconsistencies and to research questions still to be covered when studying student networks in relation with education outcomes. Context and implication Rationale for this study Academic success is not to be taken for granted. There is a need for understanding why some students succeed and others fail at university. Many studies focused on the relationships between achievement at university and student networks. To this day, no literature review exists regarding those links. Furthermore, in social networks analysis (SNA), scientists developed quantitative methodologies to describe networks. SNA techniques include, for instance, computing nodes’ centrality. There is a need to discuss how such quantitative methodologies are and could be implemented in education research. Why the new findings matter This knowledge about relationships between college students’ networks and learning, performance or academic achievement should help develop and implement policies that promote student success at university. 
Implications for educational researchers and policy makers Education researchers have to be cautious about some methodological challenges encountered when studying networks. Also, alternative centrality measures—that is, those less investigated or not investigated—might be valid candidates for representing the importance of students within networks. With respect to educational policy development: Mixing student abilities within classes appears to be a better option than grouping those abilities, but student prior performance has to be taken into account, as linked to the student type that would most benefit from heterogeneity of performance; Educational practitioners should ensure the sharing and dissemination of relevant and accurate resources and information in student (online) networks; We encourage education professionals to analyse how one might effectively integrate online social platforms into instruction, and to educate students about the adverse effects of misusing these platforms; We recommend education practitioners pay attention to students’ isolation feelings in e‐learning settings; In e‐learning settings, education professionals should ensure that less active students access learning materials and should encourage active participation of students in collaborative tasks.
Chapter
Fake news may have different meaning to different individuals. For the purpose of this paper, we will go by the definition of fake news as those reports that are bogus: The story itself is created, with no relation to realities, sources or statements. In this research on fake news detection through machine learning algorithms, we are implementing two feature selection approaches toward the problem: Bag of words model and TF-IDF vectorization model and are using four classifiers namely, logistic regression classifier, naive Bayes classifier, random forest classifier and passive aggressive classifier for classification purpose. This research is being conducted on two separate datasets, among which for bag of words model along with logistic regression classifier yields average F1 Score of 92.16% and for TF-IDF vectorization, logistic regression classifier yields average F1 Score of 93.47%. Also, passive aggressive classifier works well with high volume of data along with TF-IDF as can be seen by highest increase in F1 Score.
Article
Full-text available
This paper proposes a market segmentation method applied in the field of transportation behavior change using GPS trajectories and socio-demographic data collected from the advanced demand management system “GoEzy” designed by Metropia. User attributes are extracted using several statistical methods such as dynamic time warping, density-based spatial clustering of applications with noise (DBSCAN), and signal processing method to infer users’ sensitivity to incentives, temporal, and spatial travel patterns. Ten personas were generated by K-means clustering, representing different types of people with various travel patterns and sensitivity to incentives. The experiment was conducted on 24 new users to test if the persona could be used as a tool to predict their willingness to change. The results showed that after creating personas for new users and providing them with new incentives, their modified departure time pattern according to the new incentives matched expectations from analysis of the 10 personas.
Article
Full-text available
Availability of large amount of data in unprocessed form has increased the need of data mining. Various data mining techniques are available for this purpose but we need to choose one which is more accurate. Looking at the increasing interest of people in novel reading, data collected for prediction is based on novels. People were asked to rate different genre novel s. This study has predicted ratings of dystopian novel based on ratings given to other genre novels by readers using various data mining techniques and calculated their prediction accuracy.
Article
The websites of numerous Australian businesses invite us to enjoy a more French existence through the purchase of French products, lessons, travel and property, and through the opportunities for interaction they offer via social media. This paper proposes a case study of the forms this interaction takes on the Facebook page of one such vendor. Analysis of the multimodal data posted between 2012 and 2014 by the predominantly Australian participants reveals an intracultural community projecting itself into an idealized construction of French life, even during travel to France. Drawing on positioning theory, the paper details the ways in which the affordances of the social networking page facilitate the construction and ratification of a Francophile identity and expand the possibilities for its performance.
Article
This paper proposes that social software can enable informal learning environments through collective learning networks and the fundamental social interactions embedded in those learning processes. Situated in the adult learning organisational context, the challenge for educators is how to re-frame their pedagogical practices for the new technological developments and facilitate the design of online communication and information exchanges to empower the learners and create an enriched social learning landscape. The paper presents a pedagogical framework, developed from practice, which provides multi-linear pathways for facilitating informal learning processes and strategies that enable learners to overcome key issues that may inhibit the creation of informal learning environments. Examples from recent experiences will illustrate areas where educators need to be aware of both the inhibitors and their pedagogical strategies.
Article
Currently there is increasing interest in data mining and educational systems, making educational data mining a new, growing research community. This paper surveys the application of data mining to traditional educational systems, particularly web-based courses, well-known learning content management systems, and adaptive and intelligent web-based educational systems. Each of these systems has different data sources and objectives for knowledge discovery. After the available data have been preprocessed in each case, data mining techniques can be applied: statistics and visualization; clustering, classification and outlier detection; association rule mining and pattern mining; and text mining. Despite the plentiful work to date, much more specialized work is needed for educational data mining to become a mature area.
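Association rule mining, one of the techniques listed above, reduces to counting support and confidence. The following is a minimal brute-force sketch that enumerates every candidate itemset instead of applying Apriori pruning. The survey items are hypothetical.

```python
from itertools import combinations

def support(itemset, transactions):
    # Fraction of transactions containing every item of the itemset.
    s = set(itemset)
    return sum(s <= t for t in transactions) / len(transactions)

def association_rules(transactions, min_support=0.5, min_confidence=0.7, max_len=3):
    # Brute-force enumeration of rules lhs -> rhs (no Apriori pruning).
    items = sorted({i for t in transactions for i in t})
    rules = []
    for size in range(2, max_len + 1):
        for itemset in combinations(items, size):
            sup = support(itemset, transactions)
            if sup == 0 or sup < min_support:
                continue
            for r in range(1, size):
                for lhs in combinations(itemset, r):
                    rhs = tuple(i for i in itemset if i not in lhs)
                    conf = sup / support(lhs, transactions)  # estimate of P(rhs | lhs)
                    if conf >= min_confidence:
                        rules.append((lhs, rhs, sup, conf))
    return rules

# Hypothetical survey answers, one set per respondent.
answers = [
    {"daily_user", "fb_helps_coursework"},
    {"daily_user", "fb_helps_coursework", "joins_class_group"},
    {"daily_user", "joins_class_group"},
    {"fb_helps_coursework"},
]
for lhs, rhs, sup, conf in association_rules(answers, min_support=0.5, min_confidence=0.7):
    print(lhs, "->", rhs, f"support={sup:.2f} confidence={conf:.2f}")
```

Brute force is exponential in the number of distinct items. Apriori avoids this by discarding any candidate that has an infrequent subset.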
Article
The technique set out in the paper, CHAID, is an offshoot of AID (Automatic Interaction Detection) designed for a categorical dependent variable. Important modifications relative to standard AID include the following:

- built-in significance testing, with the consequence that the most significant predictor (rather than the most explanatory) is used;
- multi-way splits (in contrast to binary ones);
- a new type of predictor that is especially useful for handling missing information.
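The significance-testing step that distinguishes CHAID can be sketched as follows. Among the candidate predictors, choose the one whose contingency table with the categorical target has the smallest Pearson chi-square p-value. This is a simplified, hypothetical sketch. It omits CHAID's merging of predictor categories and its Bonferroni adjustment, and it computes p-values only for integer degrees of freedom via closed-form expressions. The tables are invented.

```python
import math

def chi_square_stat(table):
    # Pearson chi-square for an r x c table (rows: predictor categories, columns: target classes).
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n
            if exp > 0:
                stat += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(col_tot) - 1)
    return stat, df

def chi2_sf(x, df):
    # Upper-tail chi-square probability for integer df >= 1, using closed forms.
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series of normal-density terms.
    p = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for i in range(1, (df - 1) // 2 + 1):
        p += term
        term *= x / (2 * i + 1)
    return p

def most_significant_predictor(tables):
    # tables: {predictor_name: contingency table}; pick the smallest p-value.
    pvals = {name: chi2_sf(*chi_square_stat(t)) for name, t in tables.items()}
    return min(pvals, key=pvals.get), pvals

# Hypothetical contingency tables against a two-class target.
tables = {
    "age_group": [[30, 10], [12, 28]],
    "gender": [[20, 20], [22, 18]],
}
best, pvals = most_significant_predictor(tables)
print(best, pvals)
```

Ranking predictors by p-value rather than by the raw statistic matters when predictors have different numbers of categories, because their degrees of freedom then differ.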
Article
Enthusiastic educational commentators are casting the internet in a new light through the emergence of so-called 'Web 2.0' technologies, which place learners at the centre of online activities and facilitate supposedly new forms of creation, collaboration, and consumption. Proponents anticipate a host of new pedagogical challenges posed by a 'Facebook generation' of 'wiki kids,' whilst schools and colleges are delivering courses in 'Second Life' rather than real-life environments. An impassioned minority of educationalists has even heralded a 'Web 2.0 transformation of learning' with "potentially groundbreaking implications for the field of education" (Thomas 2008). Yet such enthusiasm has been tempered by a more sceptical reaction throughout other sectors of the educational and technology communities. Mindful of these debates, this presentation will briefly review the emerging research literature on Web 2.0-enhanced learning (specifically the Facebook and Second Life applications) and focus on the following issues:
Article
Deception detection is an essential skill in careers such as law enforcement and must be accomplished accurately. However, humans are not very competent at determining veracity without aid. This study examined automated text-based deception detection, which attempts to overcome the shortcomings of previous credibility assessment methods. A real-world, high-stakes sample of statements was collected and analyzed. Several different sets of linguistic-based cues were used as inputs for classification models. Overall accuracy rates of up to 74% were achieved, suggesting that automated deception detection systems can be an invaluable tool for those who must assess the credibility of text.
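Text-based deception detection of this kind starts by turning each statement into a vector of linguistic cues. The abstract does not list the cue sets used, so the cues below and their word lists are hypothetical examples: word count, first-person pronoun rate, negation rate and average word length. The resulting features would then be fed to a classification model and scored by accuracy.

```python
import re

# Hypothetical cue lexicons, chosen for illustration only.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}
NEGATIONS = {"no", "not", "never", "nothing", "none"}

def linguistic_cues(text):
    # Extract a few simple, hypothetical linguistic cues from one statement.
    words = re.findall(r"[a-z']+", text.lower())
    n = len(words) or 1  # avoid division by zero for empty statements
    return {
        "word_count": len(words),
        "first_person_rate": sum(w in FIRST_PERSON for w in words) / n,
        "negation_rate": sum(w in NEGATIONS for w in words) / n,
        "avg_word_length": sum(len(w) for w in words) / n,
    }

def accuracy(y_true, y_pred):
    # Fraction of statements whose predicted label (truthful/deceptive) is correct.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical statements.
statements = [
    "I never touched the money, not once.",
    "We drove to the store and bought groceries.",
]
for s in statements:
    print(linguistic_cues(s))
```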