Social Media #MOOC Mentions: Lessons for MOOC Research from Analysis of Twitter Data

Eamon Costello, National Institute for Digital Learning, Dublin City University
Binesh Nair, Business School, Dublin City University
Mark Brown, National Institute for Digital Learning, Dublin City University
Jingjing Zhang, Faculty of Education, Beijing Normal University
Mairéad Nic Giolla Mhichíl, National Institute for Digital Learning, Dublin City University
Enda Donlon, Institute of Education, Dublin City University
Theo Lynn, Business School, Dublin City University
There is a relative dearth of research into what is being said about MOOCs by users in social
media, particularly through analysis of large datasets. In this paper we contribute to addressing
this gap through an exploratory analysis of a Twitter dataset. We present an analysis of a dataset
of tweets that contain the hashtag #MOOC. A three-month sample of tweets from the global
Twitter stream was obtained using the GNIP API. Using techniques for analysis of large
microblogging datasets we conducted descriptive analysis and content analysis of the data. Our
findings suggest that the set of tweets containing the hashtag #MOOC has some strong
characteristics of an information network. Course providers and platforms are prominent in the
data but teachers and learners are also evident. We draw lessons for further research based on our
findings.
Keywords: HCI, MOOCs, Data Analytics, Twitter, Social Media, Big Data
Introduction
Although MOOC learners are known to be educated, digitally literate (Jordan, 2014) and socially networked
(McAuley et al., 2010), little research has been undertaken into how MOOCs are portrayed on social media
platforms such as Twitter. Studies to date are limited by relatively small datasets or by reliance on samples
of manually extracted tweets. One study of note in this area looked at users of the Sina Weibo platform, a
popular Chinese microblogging website (Zhang et al., 2015). This study screen-scraped 95,015 postings with
mentions of MOOC published by 62,074 users on Sina Weibo over a four-year period and analyzed the volume
of postings according to four time frames: year, month, day of the week, and time of day. Their work
outlined some trends and made an exploratory foray into this topic.
This paper contributes to research into MOOCs by a systematic extraction of a dataset from the global Twitter
stream (utilizing the Twitter GNIP API) and interrogating this data via descriptive and content analyses. Our
aim was to conduct exploratory analysis of the MOOC discourse on Twitter. We sought to determine, through
big data analysis, what conversations are being conducted in the MOOC arena by the range of potential actors
such as MOOC platform providers, traditional educational institutions providing MOOCs, MOOC teachers,
MOOC learners and MOOC researchers. Moreover, we sought to probe the use and meaning of the term MOOC
itself as negotiated by users of the term on public social media via its hashtag.
Data and Methods
Twitter data for the MOOC dataset was extracted from the GNIP API for the period September to December 2015
and augmented with additional data including Klout scores, a measure of social network influence developed
by Rao et al. (2015). The GNIP API produces very large volumes of data and we used cloud
computing, data extraction, storage and processing techniques to handle the data. The GNIP Stream API
produces more reliable data than more manual techniques such as screen scraping of the public Twitter REST
API (Driscoll & Walker, 2014), and also offers more data protection, such as excluding data from deleted
accounts. The hashtag ‘#MOOC’ was used as a keyword to extract the required data. In this we followed the
work of Zhang et al. (2015).
The GNIP Stream API provides a file containing data for each 10 minute interval of a specified period. Complex
analytics on the data were performed mainly in R (an open-source statistical tool). This study is in line with current
state-of-the-art frameworks (Chae, 2015; Lynn et al. 2015) for descriptive analytics and content analytics on
Twitter data. The methodology follows the approach for descriptive and content analytics outlined by Chae (2014)
and extended by Lynn et al. (2015).
Findings
The MOOC dataset had 32,309 tweets of which 17,910 were original tweets and 14,399 were retweets. Replies
constituted 8 percent (1,434) of the total number of the original tweets. The dataset had 4,980 unique hashtags.
As expected, #MOOC features in most of the original tweets (17,263). Other popular co-occurring hashtags included
#elearning (1,876), #edtech (1,134), #moocs (822), #highered (637), #coursera (631), and #education (594). The
average number of hashtags in original tweets was 2.68.
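The hashtag counts above can be reproduced in outline with a few lines of code. The following is a minimal Python sketch, not the authors' R pipeline: the function name `hashtag_stats`, the regular expression and the sample tweets are all assumptions made for illustration, and lower-casing hashtags before counting is a design choice that the paper does not state explicitly.

```python
import re
from collections import Counter

# Assumption: a hashtag is "#" followed by word characters.
HASHTAG = re.compile(r"#\w+")

def hashtag_stats(tweets):
    """Return (Counter of lower-cased hashtags, mean hashtags per tweet)."""
    counts = Counter()
    total = 0
    for text in tweets:
        tags = [tag.lower() for tag in HASHTAG.findall(text)]
        counts.update(tags)
        total += len(tags)
    mean = total / len(tweets) if tweets else 0.0
    return counts, mean

# Hypothetical sample tweets, not taken from the dataset.
sample = [
    "Free #MOOC on data science #elearning #edtech",
    "New #mooc starts today #elearning",
]
counts, mean = hashtag_stats(sample)
print(counts.most_common(3), mean)
```

Whether #MOOC and #mooc are merged changes the number of unique hashtags, so the case-folding choice should be reported alongside any such counts.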
There were 14,890 unique user screen names in the dataset. This indicates that each user on average sends 1.2
tweets, 0.9 retweets and 0.1 replies. The most active and visible users were identified. Activity was calculated
as per Chae (2015), i.e. the activity of a user is the sum of the number of tweets, retweets and replies which
the user has contributed to the network. The visibility of a user was determined by the user's number of
followers as of 31st December 2015. Figure 1 shows a line graph describing the relationship between active and
visible users. The figure shows that the most active users are not the most visible users and vice versa. For instance,
@MOOCs (the most active user) is not the most visible user. Similarly, @edX, the most visible user, is not among
the top 30 active users in this network.
Figure 1: Active Users Vs Visible Users in #MOOC Dataset
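To make the two measures concrete, the sketch below computes activity as the count of tweets, retweets and replies per user and compares the resulting ranking with a ranking by follower count. It is an illustrative Python sketch under stated assumptions: the record layout, the function names `activity_scores` and `rank`, and the users `user_a`, `user_b` and `user_c` with their follower counts are all hypothetical.

```python
from collections import Counter

def activity_scores(records):
    """Activity = tweets + retweets + replies contributed by each user.

    `records` is an iterable of (screen_name, kind) pairs, where kind is
    "tweet", "retweet" or "reply". Every record adds one to the user's score.
    """
    activity = Counter()
    for screen_name, _kind in records:
        activity[screen_name] += 1
    return activity

def rank(scores, top_n):
    """Return the top_n keys by descending score (ties broken by name)."""
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ordered[:top_n]]

# Hypothetical data: user_a posts most, user_c has most followers.
records = [
    ("user_a", "tweet"), ("user_a", "retweet"), ("user_a", "reply"),
    ("user_b", "tweet"), ("user_b", "retweet"),
    ("user_c", "tweet"),
]
followers = {"user_a": 1_200, "user_b": 300, "user_c": 250_000}

activity = activity_scores(records)
print(rank(activity, 1), rank(followers, 1))
```

In this toy example the most active account and the most visible account differ, which mirrors the pattern the text describes for @MOOCs and @edX.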
Content analytics is primarily concerned with uncovering the patterns hidden inside content. Word analysis,
hashtag analysis and sentiment analysis were performed in this category. Word analysis used the ‘tm’ library
in R to identify the words appearing most frequently in the tweets. The most popular words were unsurprising:
‘MOOC’ (occurring 21,199 times), ‘course’ (2,733), ‘learn’ (2,686), ‘online’ (2,442), ‘elearn’ (2,079), ‘free’
(1,911), ‘coursera’ (1,410) and ‘edxonline’ (1,332), among others. The ‘ngram’
library in R was used to identify the most frequently co-occurring words in the dataset. The most popular
co-occurring words included ‘mooc elearn’ (1,412 times), ‘online course’ (1,333), ‘edxonline mooc’ (869), ‘mooc
course’ (496), ‘free mooc’ (478) and ‘mooc onlinecourse’ (467). As noted above, the dataset had 4,980 unique
hashtags, the most popular being #mooc (17,263), #elearning (1,876), #edtech (1,134), #moocs (822), #highered
(637), #coursera (631) and #education (594).
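The word and co-occurrence counts described above follow a standard pattern: tokenise each tweet, then count single tokens and adjacent token pairs. The Python sketch below illustrates that pattern only. It is not the ‘tm’ or ‘ngram’ R code used in the study, it omits the stemming that terms such as ‘learn’ and ‘elearn’ suggest was applied, and the tokeniser, function names and sample tweets are assumptions.

```python
import re
from collections import Counter

# Assumption: tokens are runs of lower-case letters and digits.
TOKEN = re.compile(r"[a-z0-9]+")

def tokens(text):
    """Lower-case a tweet and split it into simple alphanumeric tokens."""
    return TOKEN.findall(text.lower())

def word_and_bigram_counts(tweets):
    """Count single words and adjacent word pairs (bigrams) across tweets."""
    words, bigrams = Counter(), Counter()
    for text in tweets:
        toks = tokens(text)
        words.update(toks)
        # zip pairs each token with its successor within the same tweet.
        bigrams.update(zip(toks, toks[1:]))
    return words, bigrams

# Hypothetical sample tweets, not taken from the dataset.
sample = [
    "Free MOOC online course",
    "New online course on edX",
    "free mooc today",
]
words, bigrams = word_and_bigram_counts(sample)
print(words.most_common(4))
print(bigrams.most_common(2))
```

A fuller pipeline would normally strip URLs and stop words before counting; this sketch leaves those steps out to keep the counting logic visible.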
Peak detection algorithms were used to identify events of significance in the dataset. In line with Healy et al.
(2015), the peak detection algorithms were those presented by Du et al. (2006), Palshikar (2009) and Lehmann et
al. (2012). Due to the relatively small number of true peaks and the low volume of tweets per peak, the topics were
identified manually. Table 1 summarises the topics identified from the true peaks within the dataset, together
with the originating tweet for each topic.
Table 1: Topics of True Peaks
Timestamp | Originating Tweet of the Topic
21st September 2015, 1900 | @dschatsky leads Deloitte’s #MOOC on #CognitiveTechnology and its growing importance for business: http://t.co/AxNIAePgDL
29th September 2015, 1400 | Data from @DU_Press' #MOOC paints a picture of the future applications of #3DPrinting. #DeloitteReview http://t.co/PiCnozQ0ed
14th October 2015, 1300 | @crownagents ISS #MOOC examines how govts improve citizen svcs, cabinet office coordination & more. Starts 10/21/15 http://t.co/LJDCKKXL90
16th October 2015, 1400 | @USGLC These leaders made government work in hard places. Learn how. #MOOC: http://t.co/iZveyixK4B http://t.co/7aP0zRShE3
19th October 2015, 1500 | @USGLC Still time to enroll! Princeton #MOOC on how leaders overcome #governance challenges. Starts 10/21/15. http://t.co/iZveyixK4B
23rd October 2015, 1600 | @USGLC Enroll today! ISS #Princeton #MOOC on writing “Science of Delivery” case studies. Starts 10/28. https://t.co/AZuu2qFylP
27th October 2015, 1300 | @EU_Commission Starts 10/28! Learn to write case studies on “Science of Delivery” in new free ISS #Princeton #MOOC. https://t.co/AZuu2qFylP
29th October 2015, 1400 | @USGLC Just started 10/28! Learn to write case studies on “Science of Delivery” in new free ISS #Princeton #MOOC. https://t.co/AZuu2qFylP
2nd November 2015, 1500 | @APH008 Please RT: 9 November start #MOOC #NUTR102x “Nutrition and Health: Micronutrients and Malnutrition” https://t.co/q0NBluOac9
13th November 2015, 0900 | @EatNutritious Please RT: Learn more about #nutrition and #health in 2nd part of our #MOOC @UniWageningen now: https://t.co/q0NBluOac9
8th December 2015, 1500 | RT NewsNeus More #Coursera #MOOC #Training TESOL Certificate, Part 1: Teach #English Now!
25th December 2015, 1800 | How to say Merry Christmas in 77 Languages. #Edtech #GBL #Langchat #MOOC #English https://t.co/5pBv47vjFP
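The idea behind the peak detection step can be illustrated with a deliberately simple rule: flag a time interval as a peak when its tweet count is a local maximum and lies well above the typical level of the series. The Python sketch below uses a mean-plus-k-standard-deviations threshold. This is a simplified stand-in chosen for illustration, not an implementation of the algorithms of Du et al. (2006), Palshikar (2009) or Lehmann et al. (2012); the function name `find_peaks`, the default `k=2.0` and the hourly counts are assumptions.

```python
from statistics import mean, pstdev

def find_peaks(counts, k=2.0):
    """Return indices of strict local maxima exceeding mean + k * std.

    `counts` is a list of tweet counts per time interval. The first and
    last intervals are never reported because they lack two neighbours.
    """
    if len(counts) < 3:
        return []
    threshold = mean(counts) + k * pstdev(counts)
    return [
        i
        for i in range(1, len(counts) - 1)
        if counts[i] > threshold
        and counts[i] > counts[i - 1]
        and counts[i] > counts[i + 1]
    ]

# Hypothetical hourly tweet counts with one obvious burst at index 4.
hourly = [10, 12, 11, 9, 80, 13, 10, 11, 12, 10]
print(find_peaks(hourly))
```

A flagged interval would then be inspected manually, as the authors did, to find the tweet that started the burst.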
Sentiment analysis is used to examine the overall orientation (positive or negative) and intensity (strong or weak)
of opinions in text (Pang & Lee, 2008). The ‘qdap’ library in R was used to perform sentiment analysis on this
dataset. The average sentiment was found to be 0.095, suggesting that the tweets are largely neutral. The standard
deviation of the sentiments across the tweets was found to be 0.202, indicating that the spread of sentiment
across the tweets was small. Further, a customized algorithm to analyse the distribution of tweets across different
sentiment scores was implemented in R. If a tweet has more positive words, it gets a higher positive sentiment
score. Conversely, if a tweet has more negative words, its sentiment score is more negative.
Words which belong to neither category count as ‘neutral’. A tweet having a greater
proportion of neutral words will have a neutral sentiment, that is, a sentiment score of 0. Figure 2 provides a
graphical representation of the distribution of sentiment in the tweets.
Figure 2: Sentiment Scores in the #MOOC Dataset
As Figure 2 shows, the MOOC dataset has a substantial proportion (55%) of neutral tweets.
Positive tweets make up 38% of the total and negative tweets the remaining 7%. Table
2 lists some exemplar tweets with strong sentiment.
Table 2: Tweets Showing Strong Sentiment
Exemplar Tweet | Sentiment
@thesiswhisperer I think the #MOOC is providing wonderful supportive pillow of trust & honesty- glad I'm taking part- thank u #survivephd15 | 6
STUNNING #mathed animations from .@robertghrist in his calculus #MOOC. Beautiful and effective. Kudos. http://t.co/GgdN2KHFZf | 4
Just discovered a great free #Social Innovation online course, on this cool learning platform - #iVersity #MOOC ~ http://t.co/kGPzmqONFq | 4
#ememitalia Teixeira: focusing on dropout as a problem to criticize #MOOC education is a conceptual mistake | -4
CloudComputingApplications - definitely the worst @coursera #MOOC I've ever taken. Irrelevant videos & useless tuts #unenrolled | -3
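The word-counting scheme described above for the sentiment distribution can be sketched as follows: score each tweet as the number of positive words minus the number of negative words, then label it by the sign of the score. This Python sketch is an assumption-laden illustration rather than the authors' customized R algorithm or the ‘qdap’ polarity function. The two tiny word lists are hypothetical, so the scores it produces will not match those in Table 2.

```python
import re

# Hypothetical miniature lexicons; a real analysis would use a full one.
POSITIVE = {"great", "wonderful", "beautiful", "effective", "cool"}
NEGATIVE = {"worst", "useless", "irrelevant", "mistake"}

def sentiment_score(text):
    """Positive-word count minus negative-word count for one tweet."""
    words = re.findall(r"[a-z']+", text.lower())
    positive = sum(word in POSITIVE for word in words)
    negative = sum(word in NEGATIVE for word in words)
    return positive - negative

def label(score):
    """Map a score to 'positive', 'negative' or 'neutral' (score of 0)."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def distribution(tweets):
    """Share of tweets falling under each sentiment label."""
    shares = {"positive": 0, "neutral": 0, "negative": 0}
    for text in tweets:
        shares[label(sentiment_score(text))] += 1
    total = len(tweets)
    return {key: value / total for key, value in shares.items()} if total else shares

# Hypothetical sample tweets, not taken from the dataset.
sample = [
    "A great and effective #MOOC",
    "Beautiful animations",
    "The worst course, useless videos",
    "New #MOOC starts Monday",
]
print([sentiment_score(t) for t in sample])
print(distribution(sample))
```

Under this scheme a tweet with equal numbers of positive and negative words also scores 0, so the neutral class mixes genuinely neutral tweets with balanced ones.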
Finally, URL analysis was performed to identify the most-mentioned URLs in the network.
URLs were widely used, with almost 60 percent of the tweets containing links. A
subset of the top URLs is shown in Table 3.
Table 3: Top 15 URLs in the MOOC dataset
URL | Tweets
https://www.edx.org/course/nutrition-health-part-2-micronutrients-wageningenx-nutr102x | 327
http://www.owensage.com/2/post/2015/04/how-i-lost-24-pounds-in-12-weeks-amidst-severe-personal-turmoilwithout-dieting-or-going-to-the-gym.html | 204
https://www.futurelearn.com/courses/climate-from-space | 168
http://www.europeanschoolnetacademy.eu/en/web/developing-digital-skills-in-your-classroom/course | 161
http://www.startup365.fr/entrepreneur-courses/ | 159
https://www.canvas.net/browse/salto/courses/erasmus-funding-opportunities-2 | 156
https://www.edx.org/course/making-government-work-hard-places-princetonx-mgwx#! | 143
https://www.edx.org/course/writing-case-studies-science-delivery-princetonx-casestudies101x | 140
http://blog.coursera.org/post/132434298847/introducing-coursera-for-apple-tv-bringing-online | 126
https://www.edx.org/xseries/data-science-analytics-context | 120
http://www.moocsurvey.org | 108
http://www.startup365.fr/the-1-small-business-guide-to-online-marketing/ | 103
http://Twitter.com/JimKim_WBG/status/661682878393266177/photo/1 | 96
https://www.youtube.com/watch?v=ahvuPvm-1YU | 96
http://www.europeanschoolnetacademy.eu/web/introducing-computing-in-your-classroom | 93
https://hbr.org/2015/09/whos-benefiting-from-moocs-and-why | 92
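The URL analysis reduces to two quantities: how often each link is mentioned and what share of tweets contain at least one link. A minimal Python sketch of that calculation follows. The regular expression, the function name `url_stats` and the `example.org` sample tweets are assumptions. The sketch counts links exactly as they appear in the text, whereas Table 3 lists expanded URLs, so resolving shortened t.co links would be an additional step not shown here.

```python
import re
from collections import Counter

# Assumption: a URL is "http://" or "https://" followed by non-space characters.
URL = re.compile(r"https?://\S+")

def url_stats(tweets):
    """Return (Counter of URLs, share of tweets containing at least one URL)."""
    counts = Counter()
    with_links = 0
    for text in tweets:
        urls = URL.findall(text)
        if urls:
            with_links += 1
        # Strip trailing punctuation that belongs to the sentence, not the link.
        counts.update(url.rstrip(".,)") for url in urls)
    share = with_links / len(tweets) if tweets else 0.0
    return counts, share

# Hypothetical sample tweets, not taken from the dataset.
sample = [
    "Join now https://example.org/course-a",
    "RT Join now https://example.org/course-a",
    "No link here",
    "See https://example.org/course-b.",
]
counts, share = url_stats(sample)
print(counts.most_common(2), share)
```

Because retweets repeat the original link, a heavily retweeted promotional tweet can push a single URL to the top of such a ranking.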
Discussion and Conclusion
Peak detection algorithms highlighted tweets of significance in the dataset, which largely revolved around the
promotion of several MOOCs. The course pages of several MOOCs from the peak detection also appear among
the top URLs. However, the URLs also indicate that the MOOC hashtag may sometimes be appropriated by, or
be susceptible to, spam, e.g. the prominence of weight-loss posts. URL 24 points to a book on
Amazon whose negative reviews include people who claim to have been duped into following a Twitter link
to the page.
The term MOOC may be a problematic one for use in defining networks of MOOC actors. The promotional
nature of many tweets suggests this may be more of an informational than a social network (Myers et al., 2014).
The findings of our Social Network Analysis (SNA), which are beyond the scope of this paper, confirmed this
interpretation. Moreover, it may be that the term MOOC has particular currency only within particular communities
such as the academic one. Some of the top tweets and URLs would appear to bear this out, such as a link to a
MOOC survey being conducted as part of an MSc thesis, an item of as much interest to MOOC researchers as to
students. It is unknown how prevalent the term “MOOC” is in popular discourse, and hence many
MOOC students may go undetected. This may limit the value of using the term MOOC to make inferences
about learners. Using other search constructs comprising course, platform, provider or some
combination of these might bring more learners into the dataset.
The sample of top tweets from the sentiment analysis does, however, appear to show interesting data from MOOC
learners. All but one of these five tweets are from what we may infer to be a MOOC learner or, in one case, a
prospective learner. The other tweet appears to be from a MOOC commentator/researcher. Of course,
researchers may also be MOOC students. Research has shown that MOOC learners have disproportionately high
levels of educational attainment (Jordan, 2014). This is borne out here in that one of the sample tweets from the
sentiment analysis is addressed to a well-known academic and relates to a MOOC about “surviving” PhDs. Our findings
suggest there may be value for researchers in using sentiment analysis to filter a Twitter dataset before performing
other types of analyses. For instance, a visual scan shows that peak tweets which are
informational (and promotional) are relatively lacking in sentiment or have only weak positive sentiment. This requires
further analysis.
This paper has outlined the techniques we used and the theoretical basis on which we adopted these approaches
in examining MOOCs in a Twitter dataset. We used descriptive and content analysis techniques to probe a
sample of tweets using the hashtag #MOOC. Our results perhaps pose more questions than they give definitive
answers, but we contribute by conducting exploratory analyses in an underexplored area, namely research on
MOOC actors using large Twitter datasets.
References
Abeywardena, I. S. (2014). Public opinion on OER and MOOC: A sentiment analysis of Twitter data.
Proceedings of the International Conference on Open and Flexible Education (ICOFE 2014), Hong Kong
SAR, China.
Chae, B. K. (2014). A complexity theory approach to IT-enabled services (IESs) and service innovation:
Business analytics as an illustration of IES. Decision Support Systems, 57, 1-10.
Chae, B. K. (2015). Insights from hashtag# supplychain and Twitter analytics: Considering Twitter and Twitter
data for supply chain practice and research. International Journal of Production Economics, 165, 247-259.
Driscoll, K., & Walker, S. (2014). Big data, big questions: Working within a black box: Transparency in the
collection and production of big Twitter data. International Journal of Communication, 8, 20.
Du, P., Kibbe, W. A., & Lin, S. M. (2006). Improved peak detection in mass spectrum by incorporating
continuous wavelet transform-based pattern matching. Bioinformatics, 22(17), 2059-2065.
Healy, P., Hunt, G., Kilroy, S., Lynn, T., Morrison, J. P., & Venkatagiri, S. (2015, November). Evaluation of
peak detection algorithms for social media event detection. In Semantic and Social Media Adaptation and
Personalization (SMAP), 2015 10th International Workshop on (pp. 1-9). IEEE.
Jordan, K. (2014). Initial trends in enrolment and completion of massive open online courses. The International
Review of Research in Open and Distributed Learning, 15(1).
Lehmann, J., Gonçalves, B., Ramasco, J. J., & Cattuto, C. (2012, April). Dynamical classes of collective
attention in Twitter. Proceedings of the 21st international conference on World Wide Web (pp. 251-260).
ACM.
Lynn, T., Healy, P., Kilroy, S., Hunt, G., van der Werff, L., Venkatagiri, S., & Morrison, J. (2015). Towards a
general research framework for social media research using big data. Proceedings of 2015 IEEE
International Professional Communication Conference (IPCC) (pp. 1-8). IEEE.
McAuley, A., Stewart, B., Siemens, G., & Cormier, D. (2010). The MOOC model for digital practice. In SSHRC
Knowledge Synthesis Grant on the Digital Economy. Retrieved from
http://www.edukwest.com/wp-content/uploads/2011/07/MOOC_Final.pdf [viewed 08 July 2016]
Myers, S. A., Sharma, A., Gupta, P., & Lin, J. (2014). Information network or social network?: the structure of
the Twitter follow graph. Proceedings of the 23rd International Conference on World Wide Web (pp. 493-
498). ACM.
Palshikar, G. (2009). Simple algorithms for peak detection in time-series. Proceedings of 1st International
Conference of Advanced Data Analysis, Business Analytics and Intelligence (pp. 1-13).
Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends in
Information Retrieval, 2(1-2), 1-135.
Rao, A., Spasojevic, N., Li, Z., & Dsouza, T. (2015). Klout score: Measuring influence across multiple social
networks. In Big Data (Big Data), 2015 IEEE International Conference on (pp. 2282-2289). IEEE.
Zhang, J., Perris, K., Zheng, Q., & Chen, L. (2015). Public Response to “the MOOC Movement” in China:
Examining the Time Series of Microblogging. The International Review of Research in Open and
Distributed Learning, 16(5).
Note: All published papers are refereed, having undergone a double-blind peer-review process.
The author(s) assign a Creative Commons by attribution licence enabling others
to distribute, remix, tweak, and build upon their work, even commercially, as
long as credit is given to the author(s) for the original creation.
Please cite as: Costello, E., Binesh, N., Brown, M., Zhang, J., Giolla-Mhichíl, M.N., Donlon, E. &
Lynn, T. (2016). Social Media #MOOC Mentions: Lessons for MOOC Research from Analysis of
Twitter Data. In S. Barker, S. Dawson, A. Pardo, & C. Colvin (Eds.), Show Me The Learning.
Proceedings ASCILITE 2016 Adelaide (pp. 157-162).
... There may be a great many insights we could unlock from mining digital data, but there are equally consequences that this activity could have that may be unforeseen, unintended, or at worst wilfully ignored. This became apparent to members of our research team during a research study of the hashtag #MOOC on Twitter (Costello et al., 2016;Costello et al., 2017). In our study we sought to analyze the discourse of MOOCs on Twitter by conceptualized Twitter users as actors in a form of networked public. ...
... Only 22 articles (71%) discussed how tweets were collected, and of those, not all were explicit with regard to the technique used. Where methods were stated, approaches included Crowdmap (Koutropoulos et al., 2014), Digital Methods Initiative Twitter Capture and Analysis Toolset (de Keijser & van der Vlist, 2014), GNIP API (Costello et al., 2016), gRSShopper (Fournier et al., 2014;Joksimović, Dowell, et al., 2015;Joksimović, Zouaq, et al., 2015;Skrypnyk et al., 2015), NodeXL (Bozkurt et al., 2016;Tu, 2014), TAGsExplorer (Bell et al., 2016), Twitonomy (Enriquez-Gibson, 2014a, 2014b), Twitter API Shen & Kuo, 2015) and web crawlers (Chen et al., 2016;García-Peñalvo et al., 2015;Zhang et al., 2015). ...
... Twenty-one studies (68%) outlined their methods for processing Twitter data with various techniques mentioned, including CohMetrix computational linguistic facility (Joksimović, Dowell, et al., 2015), Dedoose (Salmon et al., 2015), Gephi (Costello et al., 2017;de Keijser & van der Vlist, 2014;Tu, 2014;Yeager et al., 2013), Netlytic (Bell et al., 2016), NVivo (Bozkurt et al., 2016;Fournier et al., 2014;Liu et al., 2016), OpinionFinder (Shen & Kuo, 2015), PHP/MySQL scripts (Veletsianos, 2017), R Big Query (Costello et al., 2016, andJoksimović, Dowell, et al., 2015), Semantria3 (Abeywardena, 2014), spreadsheets , TagMe (Joksimović, Zouaq, et al., 2015), Twitonomy and Wordle (Enriquez-Gibson, 2014a, 2014b, and WEKA, SimpleKMeans and Weka ClassifierSubsetEval (Kravvaris et al., 2016). ...
Article
Full-text available
This study examined the ethical considerations researchers have made when investigating MOOC learners’ and teachers’ Twitter activity. In so doing, it sought to addresses the lack of an evidence-based understanding of the ethical implications of research into Twitter as a site of teaching and learning. Through an analysis of 31 studies we present a mapping of the ethical practices of researchers in this area. We identified potential ethical issues and concerns that have arisen. Our main contribution is to seek to challenge researchers to engage critically with ethical issues and hence develop their own understanding of ethically- appropriate approaches. To this end, we also reflected and reported on our own evolving practice.
... Other analyses of tweets (n = 22) used machine-learning approaches to analyse content. Machine learning approaches comprised those that detected themes via topic modelling and keyword mining Joksimović et al. (2015b) and/or sentiment towards particular topics i.e. sentiment analysis (Abeywardena 2014;Shen and Kuo 2015;Costello et al. 2016). Other machine-learning or computational approaches analysed the metadata of tweets, such as information on time periods of twitter activity, quantity of twitter activity and changes in twitter activity over time Zhang et al. (2015). ...
... Another small but interesting subcategory of the research studies consists of those that considered multiple MOOCs. A total of 15 studies were conducted on more than one MOOC, but several analyses were performed on over 100 MOOCs (Shen and Kuo 2015;Tu 2014;Zhang et al., 2015;Kravvaris et al. 2016;Veletsianos 2017;Costello et al. 2016;Costello et al. 2017). The variance of the dataset sizes in these studies has implications for the comparability of findings. ...
... 95,015 Costello et al. (2016) 32,309 Costello et al. (2017) 32,309 Bozkurt et al. (2016) 20,000 Knox (2014) 18,745 ...
Article
Full-text available
Although research on the use of Twitter in support of learning and teaching has become an established field of study the role of Twitter in the context of Massive Open Online Courses (MOOCs) has not yet been adequately considered and specifically in the literature. Accordingly, this paper addresses a number of gaps in the scholarly interface between Twitter and MOOCs by undertaking a comprehensive mapping of the current literature. In so doing the paper examines research design through: data collection and analysis techniques; scope and scale of existing studies; and theoretical approaches and underpinnings in the empirical research published between 2011 and 2017. Findings serve to demonstrate the diversity of this line of research, particularly in scale and scope of studies and in the approaches taken. By mapping the research using a systematic review methodology it is shown that there is a lack of qualitative data on how Twitter is used by learners and teachers in MOOCs. Moreover, a number of methodological gaps exist in published quantitative survey research at the interface between Twitter and MOOCs, including issues in the trustworthy reporting of results and full consideration of tweet and tweet meta-data collection. At the same time the paper highlights areas of methodological “best practice” in the research around these issues and in other important areas such as large-scale hashtag analyses of the use of Twitter in MOOCs. In reviewing the literature the findings aim to strengthen the methodological foundation of future work and help shape a stronger research agenda in this emerging area.
... This result aligned with a recent study on Chinese user satisfaction with selected online education platforms during the pandemic , but stood in contrast with past studies that examined the public view of online learning such as MOOCs. In a series of time-series studies (Zhou, 2020;Costello et al., 2016), negative opinions of MOOCs seemed to remain stable with only a small portion. Research has shown that remote learning can be as good or better than in-person learning for the students who chose it (Fitter et al., 2020). ...
Article
Due to the novel coronavirus disease (COVID-19) outbreak in China, a large number of Chinese students resorted to online learning resources. The increasingly widespread use of online education enables the investigation of public opinion about this large-scale untraditional mode of learning during this critical period. Sina Weibo microblogs (the Chinese equivalent of Twitter) related to online education were collected in three distinct phases: from July 01, 2019 to January 09, 2020 (pre-pandemic); from January 10, 2020 to April 30, 2020 (amid-pandemic); and from May 01, 2020 to Nov 30, 2020 (post-pandemic). The aim was to obtain broad insight into how online learning was viewed by the public in the Chinese educational landscape. Public opinion during these three periods was analysed and compared. The findings facilitated a better understanding of how the Chinese public perceived this online learning mode as it became the dominant channel for teaching and learning during critical periods.
... We conducted several analyses of the dataset using natural language and other text corpus analysis techniques customized for Twitter datasets. This analysis of frequent words, n-grams, hashtags, URLs and sentiment is reported elsewhere [10]. Network analytics, based on graph theory, represents a way to examine networks through statistics such as average degree, network density, diameter, average path length, etc. ...
Conference Paper
Full-text available
In this paper we present results of the initial phase of a project which sought to analyze the community who use the hashtag #MOOC in Twitter. We conceptualize this community as a form of networked public. In doing so we ask what the nature of this public is and whether it may be best conceived of as a social or informational network. In addition we seek to uncover who the stakeholders are who most influentially participate. We do this by using Social Network Analysis (SNA) to uncover the key hubs and influencers in the network. We use two approaches to deriving a network typology, one based on follows and one based on replies, and compare and contrast the results.
... We conducted several analyses of the dataset using natural language and other text corpus analysis techniques customized for Twitter datasets. This analysis of frequent words, n-grams, hashtags, URLs and sentiment is reported elsewhere [10]. ...
Article
The increasing and widespread usage of social media enables the investigation of public preference using the web as a device. Public sentiment as expressed in 44,319 microblogs related to massive open online courses (MOOCs) from January to December 2017 was examined on Sina Weibo (the Chinese equivalent of Twitter) to obtain broad insight into how MOOCs are viewed by the public in the Chinese educational landscape. Despite the unstable upward trend of public interest in MOOCs over the past 12 months, the public opinion on MOOCs was largely positive. Content and sentiment analyses were conducted to facilitate a better understanding of what is communicated on social media. A general model of public opinions of MOOCs in China has been developed based on the findings. Individuals were classified into a threefold typology based on the sources and purposes of how this recent form of distance education was perceived. Based on the seven themes, the public views towards MOOCs were differentiated among ‘promoters’, ‘commenters’ and ‘experiencers’. Implications of the findings were also discussed.
Chapter
“MOOC” (Massive Open Online Course) is a large-scale open online learning platform. The MOOC forum is the place where learners learn from one another. Taking the interactive text data of the course forum as the data base, word2vec is combined with a machine learning algorithm to build an emotion classifier, which is then used to judge learners’ emotional tendencies. This yields the changes in learning emotion that accompany learning exchanges in the MOOC environment, which can make up for the loss of emotional contact between learners and improve learners’ learning efficiency and learning quality.
Conference Paper
Full-text available
In this work, we present the Klout Score, an influence scoring system that assigns scores to 750 million users across 9 different social networks on a daily basis. We propose a hierarchical framework for generating an influence score for each user, by incorporating information for the user from multiple networks and communities. Over 3600 features that capture signals of influential interactions are aggregated across multiple dimensions for each user. The features are scalably generated by processing over 45 billion interactions from social networks every day, as well as by incorporating factors that indicate real world influence. Supervised models trained from labeled data determine the weights for features, and the final Klout Score is obtained by hierarchically combining communities and networks. We validate the correctness of the score by showing that users with higher scores are able to spread information more effectively in a network. Finally, we use several comparisons to other ranking systems to show that highly influential and recognizable users across different domains have high Klout scores.
Conference Paper
Full-text available
We evaluate the effectiveness of three peak detection algorithms when applied to a collection of social media datasets. Each dataset is composed of a year's worth of tweets relating to a topic. The datasets were converted to time series composed of hourly tweet volumes. The objective of the analysis was to identify abnormal surges of communication, which are taken to be representative of the occurrence of events relevant to the topic under consideration. The ground truth was established by manually tagging the time series in order to identify peaks apparent to a human operator. Candidate algorithms were then evaluated in terms of the precision, recall, and F1 scores obtained when their output was compared to the manually identified peaks. A general-purpose algorithm is found to perform reasonably well, but seasonality in social media data limits the effectiveness of applying simple algorithms without filtering.
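The evaluation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `peak_detection_scores`, the `tolerance` parameter, and the sample indices are all assumptions made for illustration. A detected peak is counted as correct if it falls within `tolerance` time steps of a manually tagged peak that has not already been matched.

```python
def peak_detection_scores(detected, ground_truth, tolerance=0):
    """Compare detected peak indices against manually tagged ones.

    A detected peak is a true positive if it lies within `tolerance`
    time steps of a ground-truth peak that is not yet matched.
    (The matching rule is an assumption, not taken from the paper.)
    """
    unmatched = sorted(ground_truth)
    true_positives = 0
    for d in sorted(detected):
        # Take the earliest unmatched ground-truth peak that is close enough.
        match = next((g for g in unmatched if abs(g - d) <= tolerance), None)
        if match is not None:
            unmatched.remove(match)
            true_positives += 1
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical hourly indices: four detected peaks, three tagged by hand.
detected_peaks = [10, 50, 75, 120]
tagged_peaks = [11, 50, 200]
precision, recall, f1 = peak_detection_scores(
    detected_peaks, tagged_peaks, tolerance=1
)
```

With these made-up inputs, two of the four detections match a tagged peak, so precision is 2/4 and recall is 2/3. A tolerance greater than zero is one possible way to avoid penalising a detector that fires an hour early or late.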
Conference Paper
Full-text available
The increasing adoption of cloud computing, social networking, mobile and big data technologies provides challenges and opportunities for both research and practice. Researchers face a deluge of data generated by social network platforms which is further exacerbated by the co-mingling of social network platforms and the emerging Internet of Everything. While the topicality of big data and social media increases, there is a lack of conceptual tools in the literature to help researchers, many of whom are from non-technical disciplines, approach, structure and codify knowledge from social media big data in diverse subject matter domains. Researchers do not have a general-purpose scaffold to make sense of the data and the complex web of relationships between entities, social networks, social platforms and other third party databases, systems and objects. This is further complicated when spatio-temporal data is introduced. Based on practical experience of working with social media datasets and existing literature, we propose a general research framework for social media research using big data. Such a framework assists researchers in placing their contributions in an overall context, focusing their research efforts and building the body of knowledge in a given discipline area using social media data in a consistent and coherent manner.
Article
Full-text available
Twitter seems to provide a ready source of data for researchers interested in public opinion and popular communication. Indeed, tweets are routinely integrated into the visual presentation of news and scholarly publishing in the form of summary statistics, tables, and charts provided by commercial analytics software. Without a clear description of how the underlying data were collected, stored, cleaned, and analyzed, however, readers cannot assess their validity. To illustrate the critical importance of evaluating the production of Twitter data, we offer a systematic comparison of two common sources of tweets: the publicly accessible Streaming API and the “fire hose” provided by Gnip PowerTrack. This study represents an important step toward higher standards for the reporting of social media research.
Conference Paper
Full-text available
The Open Educational Resources (OER) movement has gained significant momentum recently as a global effort culminating in the 2012 Paris OER declaration. However, the purist definition of OER has blurred since then morphing into Massive Open Online Courses (MOOC). Even though OER are a significant part of the MOOC movement, it might not be a defining one. However, this has not yet been fully verified with respect to the opinion of the general public who are the main stakeholders of both the movements. To answer this question, this paper attempts to explore the public opinion and perceptions regarding OER, MOOC and their complementary roles. A text mining approach is used to analyse raw Twitter data in the domains of OER and MOOC within a timespan of 12 months. Sentiment analysis is applied to the data to understand how public perceptions have changed during this time period. The major contribution of my paper is a chronological view of public opinion on OER and MOOC post Paris OER declaration.
Article
Full-text available
Identifying and analyzing peaks (or spikes) in a given time-series is important in many applications. Peaks indicate significant events such as sudden increase in price/volume, sharp rise in demand, bursts in data traffic etc. While it is easy to visually identify peaks in a small univariate time-series, there is a need to formalize the notion of a peak to avoid subjectivity and to devise algorithms to automatically detect peaks in any given time-series. The latter is important in applications such as data center monitoring where thousands of large time-series indicating CPU/memory utilization need to be analyzed in real-time. A data point in a time-series is a local peak if (a) it is a large and locally maximum value within a window, which is not necessarily large nor globally maximum in the entire time-series; and (b) it is isolated i.e., not too many points in the window have similar values. Not all local peaks are true peaks; a local peak is a true peak if it is a reasonably large value even in the global context. We offer different formalizations of the notion of a peak and propose corresponding algorithms to detect peaks in the given time-series. We experimentally compare the effectiveness of these algorithms.
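The two-stage idea in this abstract (a local peak that is also large in the global context) can be sketched as follows. This is a simplified illustration, not one of the paper's formalizations: the window half-width `k`, the threshold multiplier `h`, and the use of the global mean and standard deviation are all assumptions, and the strict-maximum test is only a crude stand-in for the isolation condition.

```python
from statistics import mean, pstdev


def find_peaks(series, k=3, h=1.5):
    """Return indices that are window maxima and also globally large.

    k: number of neighbours inspected on each side (assumed value).
    h: how many global standard deviations above the global mean a
       point must be to count as a true peak (assumed value).
    """
    if not series:
        return []
    global_mean = mean(series)
    global_std = pstdev(series)
    peaks = []
    for i, value in enumerate(series):
        # Neighbours on both sides, excluding the point itself.
        window = series[max(0, i - k):i] + series[i + 1:i + 1 + k]
        if not window:
            continue
        # Strictly larger than every neighbour, which also rejects plateaus.
        is_local_peak = value > max(window)
        # Large relative to the whole series, not just the window.
        is_globally_large = value > global_mean + h * global_std
        if is_local_peak and is_globally_large:
            peaks.append(i)
    return peaks


# Hypothetical hourly tweet volumes with one obvious surge at index 4.
hourly_volume = [5, 6, 5, 7, 40, 6, 5, 8, 6, 5, 7, 6]
peaks = find_peaks(hourly_volume)
```

In this made-up series the value 8 at index 7 is a local maximum within its window but is rejected by the global check, so only the surge at index 4 is reported.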
Article
Recently, businesses and research communities have paid a lot of attention to social media and big data. However, the field of supply chain management (SCM) has been relatively slow in studying social media and big data for research and practice. In these contexts, this research contributes to the SCM community by proposing a novel, analytical framework (Twitter Analytics) for analyzing supply chain tweets, highlighting the current use of Twitter in supply chain contexts, and further developing insights into the potential role of Twitter for supply chain practice and research. The proposed framework combines three methodologies – descriptive analytics (DA), content analytics (CA) integrating text mining and sentiment analysis, and network analytics (NA) relying on network visualization and metrics – for extracting intelligence from 22,399 #supplychain tweets. Some of the findings are: supply chain tweets are used by different groups of supply chain professionals and organizations (e.g., news services, IT companies, logistic providers, manufacturers) for information sharing, hiring professionals, and communicating with stakeholders, among others; diverse topics are being discussed, ranging from logistics and corporate social responsibility, to risk, manufacturing, SCM IT and even human rights; some tweets carry strong sentiments about companies' delivery services, sales performance, and environmental standards, and risk and disruption in supply chains. Based on these findings, this research presents insights into the use and potential role of Twitter for supply chain practices (e.g., professional networking, stakeholder engagement, demand shaping, new product/service development, supply chain risk management) and the implications for research. Finally, the limitations of the current study and suggestions for future research are presented.
Conference Paper
In this paper, we provide a characterization of the topological features of the Twitter follow graph, analyzing properties such as degree distributions, connected components, shortest path lengths, clustering coefficients, and degree assortativity. For each of these properties, we compare and contrast with available data from other social networks. These analyses provide a set of authoritative statistics that the community can reference. In addition, we use these data to investigate an often-posed question: Is Twitter a social network or an information network? The "follow" relationship in Twitter is primarily about information consumption, yet many follows are built on social ties. Not surprisingly, we find that the Twitter follow graph exhibits structural characteristics of both an information network and a social network. Going beyond descriptive characterizations, we hypothesize that from an individual user's perspective, Twitter starts off more like an information network, but evolves to behave more like a social network. We provide preliminary evidence that may serve as a formal model of how a hybrid network like Twitter evolves.
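Several of the graph statistics named in this listing (average degree, density, shortest path lengths, diameter) can be computed on a toy graph with a short sketch. It makes strong simplifying assumptions that do not hold for the real Twitter follow graph: the graph is treated as undirected, connected, free of self-loops, and small enough for an all-pairs breadth-first search. The function name `graph_statistics` and the sample accounts are hypothetical.

```python
from collections import deque


def graph_statistics(edges):
    """Basic statistics for a small, connected, undirected graph.

    Assumes at least two nodes and no self-loops. Follow edges are
    treated as undirected here purely to keep the sketch short.
    """
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    n = len(adjacency)
    m = sum(len(neighbours) for neighbours in adjacency.values()) // 2

    # Breadth-first search from every node gives all shortest path lengths.
    path_lengths = []
    for source in adjacency:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
        path_lengths.extend(
            d for target, d in distance.items() if target != source
        )

    return {
        "average_degree": 2 * m / n,
        "density": 2 * m / (n * (n - 1)),
        "diameter": max(path_lengths),
        "average_path_length": sum(path_lengths) / len(path_lengths),
    }


# Hypothetical accounts "a" to "d" and the ties between them.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("b", "d")]
stats = graph_statistics(edges)
```

For a directed follow graph, in-degree and out-degree would need to be tracked separately, and a graph of realistic size would call for sampling or a dedicated graph library rather than exhaustive search.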
Article
While firms view services as the main source of their revenue and competitive advantage, understanding of service and service innovation is limited. This lack of understanding is especially significant in IT-Enabled Services (IESs) and IES innovation. Much work is needed to understand the contemporary trend of integrating diverse material and social resources to address complex organizational and individual needs. This article proposes a novel framework for IES and IES innovation and develops propositions and implications for research and practice. This work draws upon the tenet of complexity theory and conceptualizes IES as complex adaptive systems (CAS), with such properties and behaviors as diverse adaptive elements, nonlinear interaction, self-organization, and adaptive learning, and IES innovation as a co-evolutionary process of variation, selection, and retention (VSR). The proposed framework is illustrated using business analytics (BA) as a new kind of decision support service (DSS) throughout this paper. Several propositions are developed. Finally, we present a discussion and implications.