Article

Using Convolutional Neural Network to Recognize Learning Images for Early Warning of At-Risk Students


Abstract

This study proposes two innovative approaches, 1-channel learning image recognition (1-CLIR) and 3-channel learning image recognition (3-CLIR), which convert students' course involvement into images for early-warning predictive analysis. Multiple experiments with 5,235 students and 576 absolute and 1,728 relative input variables were conducted to verify their effectiveness. The results indicate that, in the middle of the semester, both methods capture significantly more at-risk students (the highest average recall is 77.26%) than the following machine learning algorithms: Support Vector Machine (SVM), Random Forest (RF), and Deep Neural Network (DNN). In addition, the approaches allow identification of minor subtypes of at-risk students and provide visual insights for personalized interventions. Implications and future directions are also discussed in the article.
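The abstract does not say how input variables are laid out as images, so the following is only a minimal sketch under stated assumptions: 576 variables are treated as one 24 x 24 grid, 1,728 variables as three such grids (one per channel), and values are min-max scaled to 0..255 pixel intensities. The function names and the row-major layout are hypothetical, not the authors' method.

```python
from math import isqrt


def scale_to_pixels(values):
    """Min-max scale raw feature values to integer intensities in 0..255."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature vector carries no contrast; map it to black.
        return [0 for _ in values]
    return [round(255 * (v - lo) / (hi - lo)) for v in values]


def to_grid(values, side):
    """Reshape a flat list of side*side values into a row-major square grid."""
    if len(values) != side * side:
        raise ValueError("number of values must be a perfect square")
    return [values[r * side:(r + 1) * side] for r in range(side)]


def one_channel_image(features):
    """Assumed 1-channel layout: e.g. 576 variables -> one 24 x 24 grid."""
    side = isqrt(len(features))
    return to_grid(scale_to_pixels(features), side)


def three_channel_image(features):
    """Assumed 3-channel layout: e.g. 1,728 variables -> three 24 x 24 grids."""
    if len(features) % 3 != 0:
        raise ValueError("number of values must be divisible by 3")
    n = len(features) // 3
    # Each channel is scaled independently (an assumption of this sketch).
    return [one_channel_image(features[c * n:(c + 1) * n]) for c in range(3)]


gray = one_channel_image(list(range(576)))
rgb = three_channel_image(list(range(1728)))
print(len(gray), len(gray[0]), len(rgb))
```

An image built this way could then be passed to any CNN classifier; the choice of grid ordering determines which variables end up as neighboring pixels, so it would matter in practice.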


... Combined with the information system success theory, the influence of information quality and system quality on perceived usefulness and ease of use is proposed, and a new research model of influencing factors of mobile location service usage intention is constructed. Literature [19] researches and analyzes users' consumption patterns from the perspective of location data and completes an intelligent system that integrates event detection, event prediction, and estimation of the number of customers. Reference [20] studies the user check-in data of LBSNS, analyzes and predicts the user behavior from the aspects of space, time, and social interaction, and establishes a location prediction model by using the main factors that affect the user's mobile behavior obtained by the study. ...
Article
Full-text available
In order to improve the willingness of continuous use of mobile social network information services, this study combines user behavior perception to analyze the continuous use of mobile social network information services and proposes a data coverage optimization strategy based on service quality perception. Furthermore, this study measures participants’ regional preferences based on the duration of participants in the perceptual region and the number of historical perceptual tasks completed on the perceptual region. In addition, this study designs a perceptual data coverage optimization algorithm to optimize the perceptual data coverage and ensure the real-time validity of the perceptual data. Through algorithm research and systematic evaluation, it can be seen that the continuous use willingness system of mobile social network information service based on user behavior perception can basically meet the actual needs.
... The algorithm's advantages in accuracy and technology were verified by practical cases (Huang and Xiang, 2018). Yang et al. (2020) predicted high-risk students using the Convolutional Neural Network (CNN)'s learning image recognition function. The results showed that the two proposed methods could perform better than algorithms such as support vector machines, random forests, and deep neural networks. ...
Article
Full-text available
The purpose of this study is to minimize the substantial losses that public health emergencies cause to people's health and daily life and to the national economy. Tuberculosis data from a city, covering June 2017 to 2019, are collected. A Structural Equation Model (SEM) is constructed to determine the relationship between hidden and explicit variables through the selection of relevant indicators and parameter estimation. A prediction model based on an Artificial Neural Network (ANN) and a Convolutional Neural Network (CNN) is then constructed. The method's effectiveness is verified by comparing the prediction model's loss and accuracy in training and testing. In addition, 50 actual cases are tested, and the warning level is determined according to the T-value. The results show that, of the ANN, the CNN, and the hybrid ANN-CNN network, the hybrid network achieves the highest accuracy (95.1%, versus 89.1% and 90.1% for the other two). The hybrid network also predicts actual cases accurately. Therefore, the early warning method based on ANN in deep learning performs better in the early warning of public health emergencies, which is significant for improving early warning capabilities.
Article
Finding students at high risk of poor academic performance as early as possible plays an important role in improving education quality. To do so, most existing studies have used the traditional machine learning algorithms to predict students’ achievement based on their behavior data, from which behavior features are extracted manually thanks to expert experience and knowledge. However, owing to an increase in the varieties and overall volume of behavioral data, it has become more and more challenging to identify high-quality handcrafted features. In this paper, we propose an end-to-end deep learning model that automatically extracts features from students’ multi-source heterogeneous behavior data to predict academic performance. The key innovation of this model is that it uses long short-term memory networks to capture inherent time-series features for each type of behavior, and it takes two-dimensional convolutional networks to extract correlation features among different behaviors. We conducted experiments with four types of daily behavior data from students of the university in Beijing. The experimental results demonstrate that the proposed deep model method outperforms several machine learning algorithms.
Article
Full-text available
Machine learning is emerging nowadays as an important tool for decision support in many areas of research. In the field of education, both educational organizations and students are the target beneficiaries. It facilitates the educational sector in predicting the student’s outcome at the end of their course and for the students in deciding to choose a suitable course for them based on their performances in previous exams and other behavioral features. In this study, a systematic literature review is performed to extract the algorithms and the features that have been used in the prediction studies. Based on the search criteria, 2700 articles were initially considered. Using specified inclusion and exclusion criteria, quality scores were provided, and up to 56 articles were filtered for further analysis. The utmost care was taken in studying the features utilized, database used, algorithms implemented, and the future directions as recommended by researchers. The features were classified as demographic, academic, and behavioral features, and finally, only 34 articles with these features were finalized, whose details of study are provided. Based on the results obtained from the systematic review, we conclude that the machine learning techniques have the ability to predict the students’ performance based on specified features as categorized and can be used by students as well as academic institutions. A specific machine learning model identification for the purpose of student academic performance prediction would not be feasible, since each paper taken for review involves different datasets and does not include benchmark datasets. However, the application of the machine learning techniques in educational mining is still limited, and a greater number of studies should be carried out in order to obtain well-formed and generalizable results. We provide future guidelines to practitioners and researchers based on the results obtained in this work.
Article
In this digital age, there is an abundance of online educational materials in public and proprietary platforms. To allow effective retrieval of educational resources, it is a necessity to build keyword-based search engines over these collections. In modern Web search engines, high-quality rankings are obtained by applying machine learning techniques, known as learning to rank (LTR). In this article, our focus is on constructing machine-learned ranking models to be employed in a search engine in the education domain. Our contributions are threefold. First, we identify and analyze a rich set of features (including click-based and domain-specific ones) to be employed in educational search. LTR models trained on these features outperform various baselines based on ad-hoc retrieval functions and two neural models. As our second contribution, we utilize domain knowledge to build query-dependent ranking models specialized for certain courses or education levels. Our experiments reveal that query-dependent models outperform both the general ranking model and other baselines. Finally, given well-known importance of user clicks in LTR, our third contribution is for handling singleton queries without any click information. To this end, we propose a new strategy to “propagate” click information from the other, similar, queries to the singleton queries. The proposed click propagation approach yields a better ranking performance than the general ranking model and another baseline from the literature. Overall, these findings reveal that both the general and query-dependent ranking models, trained using LTR approaches, yield high effectiveness in educational search, which may ultimately lead to a better learning experience.
Article
This article gives an overview of the key branches of machine learning and then provides complete coverage of deep neural networks. It covers important concepts, testing methods, applications, and issues related to supervised learning (regression and classification), unsupervised learning (clustering and dimensionality reduction), active learning and semi-supervised learning (pre-training methods), and reinforcement learning. The coverage of deep neural networks includes recurrent neural networks (RNNs), word embedding, and related technologies. Among the issues discussed, the online platform of intelligent library technology and its relational tools remain a challenge, one that rests on the visualization of Digital Library (DL) related applications and on large-scale data analysis. To search the documents of an intelligent library's online platform based on digital library images, the article recommends designing a scheme that adds descriptions to the main images. First, it proposes a machine learning technique that produces a visual description appropriate to the representation of an image; the image is divided into regions based on the type of each area. Second, it proposes an image classification method for the freedom of interpretation spaces. This feature is obtained by combining selection and kernel-based classification.
Article
Full-text available
Background: Teaching online is a different experience from that of teaching in a face-to-face setting. Knowledge and skills developed for teaching face-to-face classes are not adequate preparation for teaching online. It is even more challenging to teach science, technology, engineering and math (STEM) courses completely online because these courses usually require more hands-on activities and live demonstrations. Although the demand for online STEM courses has never been higher, little has been done to develop effective instructional and online course design strategies for teaching STEM courses online. This paper reports the effectiveness of the instructional strategies adopted and the online course design features in a fully online statistics course from the students’ perspectives. The online statistics course was an introductory, quantitative research course that covered common statistical concepts and focused on the application of educational research concepts for graduate students in educational technology. In terms of the statistics concepts covered, the course was similar to an introductory statistics class for students majoring in science, technology, math and engineering (STEM). The participants were mostly K-20 (meaning from kindergarten to college) instructors who had knowledge of instructional strategies.
Results: Data collected from participants’ reflections and course evaluations revealed that a range of instructional strategies and course design features were effective and helped students learn statistics in an online environment. Specifically, case studies, video demonstrations, instructor’s notes, mini projects, and an online discussion forum were most effective. For online course design features, consistent structure, various resources and learning activities, and the application-focused course content were found to be effective.
Conclusions: The implications of this study include effective instructional strategies and online course design for application-oriented STEM courses such as physics and engineering. The study results can be used to guide online teaching and learning as well as online course design for instructors, course designers, and students in STEM fields.
Article
Full-text available
In recent years there has been a proliferation of massive open online courses (MOOCs), which provide unprecedented opportunities for lifelong learning. Registrants approach these courses with a variety of motivations for participation. Characterizing the different types of participation in MOOCs is fundamental in order to be able to better evaluate the phenomenon and to support MOOCs developers and instructors in devising courses which are adapted for different learners' needs. Thus, the purpose of this study was to characterize the different types of participant behavior in a MOOC. Using a data mining methodology, 21,889 participants of a MOOC were classified into clusters, based on their activity in the main learning resources of the course: video lectures, discussion forums, and assessments. Thereafter, the participants in each cluster were characterized in regard to demographics, course participation, and course achievement characteristics. Seven types of participant behavior were identified: Tasters (64.8%), Downloaders (8.5%), Disengagers (11.5%), Offline Engagers (3.6%), Online Engagers (7.4%), Moderately Social Engagers (3.7%), and Social Engagers (0.6%). A significant number of 1,020 participants were found to be engaged in the course, but did not achieve a certificate. The types are discussed according to the established research questions. The results provide further evidence regarding the utilization of the flexibility, which is offered in MOOCs, by the participants according to their needs. Furthermore, this study supports the claim that MOOCs' impact should not be evaluated solely based on certification rates but rather based on learning behaviors.
Article
Full-text available
Student retention and timely graduation are enduring challenges in higher education. With the rapidly expanding collection and availability of learning data and related analytics, student performance can be accurately monitored, and possibly predicted ahead of time, thus enabling early warning and degree planning ‘expert systems’ to provide disciplined decision support to counselors, advisors, and educators. Previous work in educational data mining has explored matrix factorization techniques for grade prediction, albeit without taking contextual information into account. Temporal information should be informative as it distinguishes between the different class offerings and indirectly captures student experience as well. To exploit temporal and/or other kinds of context, we develop three approaches under the framework of Collaborative Filtering (CF). Two of the proposed approaches build upon Coupled Matrix Factorization (CMF) with a shared latent matrix factor. The third utilizes tensor factorization to model grades and their context, without introducing a new mode per context dimension as is common in the CF literature. The latent factors obtained can be used to predict grades and context, if desired. We evaluate these approaches on grade data obtained from the University of Minnesota. Experimental results show that fairly good prediction is possible even with simple approaches, but very accurate prediction is hard. The more advanced approaches can increase prediction accuracy, but only up to a point for the particular dataset considered.
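For readers unfamiliar with the matrix factorization baseline that this abstract builds on, here is a minimal sketch of plain (non-coupled, context-free) matrix factorization for grade prediction, trained with stochastic gradient descent. It is not the authors' coupled or tensor model; the toy grades, hyperparameters, and function names are all hypothetical.

```python
import random


def predict(P, Q, student, course):
    """Predicted grade is the dot product of student and course latent factors."""
    return sum(p * q for p, q in zip(P[student], Q[course]))


def factorize(grades, n_students, n_courses, rank=2, lr=0.01, reg=0.02,
              epochs=1000, seed=0):
    """Fit latent factors to observed (student, course, grade) triples with SGD."""
    rng = random.Random(seed)
    P = [[rng.uniform(0, 1) for _ in range(rank)] for _ in range(n_students)]
    Q = [[rng.uniform(0, 1) for _ in range(rank)] for _ in range(n_courses)]
    for _ in range(epochs):
        for s, c, g in grades:
            err = g - predict(P, Q, s, c)
            for k in range(rank):
                p, q = P[s][k], Q[c][k]
                # Gradient step on squared error with L2 regularization.
                P[s][k] += lr * (err * q - reg * p)
                Q[c][k] += lr * (err * p - reg * q)
    return P, Q


def rmse(P, Q, grades):
    """Root-mean-square error over a list of observed grades."""
    total = sum((g - predict(P, Q, s, c)) ** 2 for s, c, g in grades)
    return (total / len(grades)) ** 0.5


# Hypothetical toy data: (student index, course index, grade on a 0-4 scale).
observed = [(0, 0, 4.0), (0, 1, 3.0), (1, 0, 3.5), (1, 2, 2.0),
            (2, 1, 3.0), (2, 2, 2.5), (3, 0, 2.0), (3, 2, 1.5)]
P, Q = factorize(observed, n_students=4, n_courses=3)
train_error = rmse(P, Q, observed)
unseen = predict(P, Q, 0, 2)  # a grade that was never observed
print(train_error, unseen)
```

The contextual approaches described in the abstract extend this idea by sharing a latent factor across several matrices or by moving to a tensor; this sketch only shows the starting point.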
Article
Full-text available
Studies have shown that articulatory information helps model speech variability and, consequently, improves speech recognition performance. But learning speaker-invariant articulatory models is challenging, as speaker-specific signatures in both the articulatory and acoustic space increase complexity of speech-to-articulatory mapping, which is already an ill-posed problem due to its inherent nonlinearity and non-unique nature. This work explores using deep neural networks (DNNs) and convolutional neural networks (CNNs) for mapping speech data into its corresponding articulatory space. Our speech-inversion results indicate that the CNN models perform better than their DNN counterparts. In addition, we use these inverse-models to generate articulatory information from speech for two separate speech recognition tasks: the WSJ1 and Aurora-4 continuous speech recognition tasks. This work proposes a hybrid convolutional neural network (HCNN), where two parallel layers are used to jointly model the acoustic and articulatory spaces, and the decisions from the parallel layers are fused at the output context-dependent (CD) state level. The acoustic model performs time-frequency convolution on filterbank-energy-level features, whereas the articulatory model performs time convolution on the articulatory features. The performance of the proposed architecture is compared to that of the CNN- and DNN-based systems using gammatone filterbank energies as acoustic features, and the results indicate that the HCNN-based model demonstrates lower word error rates compared to the CNN/DNN baseline systems.
Article
Full-text available
Language learning occurring in authentic contexts has been shown to be more effective. Virtual worlds provide simulated contexts that have the necessary elements of authentic contexts for language learning, and as a result, many studies have adopted virtual worlds as a useful platform for language learning. However, few studies so far have examined the relationship between learning outcomes and learning paths and strategies inside a virtual world. This study was designed to fill this research gap. In order to understand the impact of different learning strategies on learning outcomes in a virtual world, a visualization analytic method was developed to examine the recorded learner paths within a virtual world while learning occurred. In particular, the visualization analytic method adopted in this study was based on social network analysis. This study included 14 participants who were learners of Mandarin Chinese as a foreign language. The learning outcomes were based on their test scores from 7 learning sessions and the post-test. Through the visualization analysis, the current study revealed a link between the learning paths and strategies and learners' outcomes. The strategies include the "nearest strategy," the "focus strategy," and the "cluster strategy." The findings show that high-achieving and low-achieving students tend to use different strategies in learning new words. The visualization analytics thus effectively displays the learning strategies of vocabulary acquisition. Our methods could be applied to other second language learning studies, and the results can also provide insights into the construction of future virtual worlds for learning second languages.
Article
Full-text available
Automatic multimedia learning resources recommendation has become an increasingly relevant problem: it allows students to discover new learning resources that match their tastes, and enables the e-learning system to target the learning resources to the right students. In this paper, we propose a content-based recommendation algorithm based on convolutional neural network (CNN). The CNN can be used to predict the latent factors from the text information of the multimedia resources. To train the CNN, its input and output should first be solved. For its input, the language model is used. For its output, we propose the latent factor model, which is regularized by L1-norm. Furthermore, the split Bregman iteration method is introduced to solve the model. The major novelty of the proposed recommendation algorithm is that the text information is used directly to make the content-based recommendation without tagging. Experimental results on public databases in terms of quantitative assessment show significant improvements over conventional methods. In addition, the split Bregman iteration method which is introduced to solve the model can greatly improve the training efficiency.
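Split Bregman methods for L1-regularized problems typically alternate a smooth subproblem with a closed-form shrinkage (soft-thresholding) step. The sketch below shows only that shrinkage operator; it is a generic illustration, not the paper's full solver, and the function names are hypothetical.

```python
def shrink(x, threshold):
    """Soft-thresholding: move x toward zero by `threshold`, clamping at zero."""
    if x > threshold:
        return x - threshold
    if x < -threshold:
        return x + threshold
    return 0.0


def shrink_vector(values, threshold):
    """Apply soft-thresholding element-wise to a list of numbers."""
    return [shrink(v, threshold) for v in values]


sparse = shrink_vector([3.0, -3.0, 0.5, -0.2], 1.0)
print(sparse)
```

Entries whose magnitude is below the threshold become exactly zero, which is how the L1 penalty produces sparse latent factors.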
Article
Full-text available
Early Warning Systems (EWSs) aggregate multiple sources of data to provide timely information to stakeholders about students in need of academic support. There is an increasing need to incorporate relevant data about student behaviors into the algorithms underlying EWSs to improve predictors of students’ success or failure. Many EWSs currently incorporate counts of course resource use, although these measures provide no information about which resources students are using. We use seven years of data from seven core STEM courses at a large university to investigate the associations between students’ use of categorized course resources (e.g., lecture or exam preparation resources) and their final course grade. Using logistic regression, we find that students who use exam preparation resources to a greater degree than their peers are more likely to receive a final grade of B or higher. In contrast, students who use more lecture-related resources than their peers are less likely to receive a final grade of B or higher. We discuss the implications of our results for developers deciding how to incorporate categories of course resource usage data into EWSs, for academic advisors using this information with students, and for instructors deciding which resources to include on their LMS site.
Article
Full-text available
Using predictive modeling methods, it is possible to identify at-risk students early and inform both the instructors and the students. While some universities have started to use standards-based grading, which has educational advantages over common score-based grading, at-risk prediction models have not been adapted to reap the benefits of standards-based grading in courses that utilize this grading. In this paper, we compare predictive methods to identify at-risk students in a course that used standards-based grading. Only in-semester performance data that were available to the course instructors were used in the prediction methods. When identifying at-risk students, it is important to minimize false negative (i.e., type II) error while not increasing false positive (i.e., type I) error significantly. To increase the generalizability of the models and accuracy of the predictions, we used a feature selection method to reduce the number of variables used in each model. The Naive Bayes Classifier model and an Ensemble model using a sequence of models (i.e., Support Vector Machine, K-Nearest Neighbors, and Naive Bayes Classifier) had the best results among the seven tested modeling methods.
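The trade-off between false negatives and false positives described above can be made concrete with a small sketch. It counts the four confusion-matrix cells for a binary at-risk label (1 = at risk) and derives recall and the false positive rate; the labels below are made-up illustration data, not results from the paper.

```python
def confusion_counts(actual, predicted):
    """Return (tp, fp, fn, tn) for binary labels where 1 means at-risk."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if a and p:
            tp += 1      # at-risk student correctly flagged
        elif a:
            fn += 1      # at-risk student missed (type II error)
        elif p:
            fp += 1      # student flagged unnecessarily (type I error)
        else:
            tn += 1
    return tp, fp, fn, tn


def recall(tp, fn):
    """Share of truly at-risk students that the model caught."""
    return tp / (tp + fn) if tp + fn else 0.0


def false_positive_rate(fp, tn):
    """Share of not-at-risk students that the model flagged."""
    return fp / (fp + tn) if fp + tn else 0.0


# Hypothetical labels for eight students.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
tp, fp, fn, tn = confusion_counts(actual, predicted)
print(recall(tp, fn), false_positive_rate(fp, tn))  # 0.75 0.25
```

Minimizing type II error corresponds to pushing recall up, while the false positive rate tracks how many students would receive an unnecessary intervention.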
Article
Full-text available
It is important to study and analyse educational data especially students’ performance. Educational Data Mining (EDM) is the field of study concerned with mining educational data to find out interesting patterns and knowledge in educational organizations. This study is equally concerned with this subject, specifically, the students’ performance. This study explores multiple factors theoretically assumed to affect students’ performance in higher education, and finds a qualitative model which best classifies and predicts the students’ performance based on related personal and social factors.
Article
Full-text available
A Convolutional Neural Network (CNN) trained on a corpus of images consists of filters tuned to visual features relevant to the task at hand. Variations in the resolution of the images and in the size of the objects and patterns depicted, require the filters to both ignore task-irrelevant scale variations (for recognizing a face, the size of the face is irrelevant) and to respond to task-relevant features at a specific scale (given a scale, the shape and size of the nose are relevant). Previous work focused on developing scale-invariant filters in CNNs. This paper addresses the combined development of scale-invariant and scale-variant filters. We propose a multi-scale CNN method to encourage the development of both types of filters and evaluate it on a challenging image classification task involving task-relevant characteristics at multiple scales. The results show the multi-scale CNN to outperform single-scale CNNs. This leads to the conclusion that encouraging the combined development of scale-invariant and scale-variant filters in CNNs is beneficial to image recognition performance.
Article
Full-text available
Early prediction of school dropout is a serious problem in education, but it is not an easy issue to resolve. On the one hand, there are many factors that can influence student retention. On the other hand, the traditional classification approach used to solve this problem normally has to be implemented at the end of the course to gather maximum information in order to achieve the highest accuracy. In this paper, we propose a methodology and a specific classification algorithm to discover comprehensible prediction models of student dropout as soon as possible. We used data gathered from 419 high schools students in Mexico. We carried out several experiments to predict dropout at different steps of the course, to select the best indicators of dropout and to compare our proposed algorithm versus some classical and imbalanced well-known classification algorithms. Results show that our algorithm was capable of predicting student dropout within the first four-six weeks of the course and trustworthy enough to be used in an early warning system.
Conference Paper
Full-text available
Predicting the success or failure of a student in a course or program is a problem that has recently been addressed using data mining techniques. In this paper we evaluate some of the most popular classification and regression algorithms on this problem. We address two problems: prediction of approval/failure and prediction of grade. The former is tackled as a classification task while the latter as a regression task. Separate models are trained for each course. The experiments were carried out using administrative data from the University of Porto, concerning approximately 700 courses. The algorithms with the best results overall in classification were decision trees and SVM, while in regression they were SVM, Random Forest, and AdaBoost.R2. However, in the classification setting, the algorithms are finding useful patterns, while, in regression, the models obtained are not able to beat a simple baseline.
Article
As an emerging field of research, learning analytics (LA) offers practitioners and researchers information about educational data that is helpful for supporting decisions in management of teaching and learning. While often combined with educational data mining (EDM), crucial distinctions exist for LA that mandate a separate review. This study aims to conduct a systematic meta-review of LA for mining key information that could assist in describing new and helpful directions to this field of inquiry. Within 901 LA articles analyzed, eight reviews were identified and synthesised to identify and determine consistencies and gaps. Results show that LA is at the stage of early majority and has attracted great research efforts from other fields. The majority of LA publications were focused on proposing LA concepts or frameworks and conducting proof-of-concept analysis rather than conducting actual data analysis. Collecting small datasets for LA research is predominant, especially in K-12 field. Finally, four major LA research topics, including prediction of performance, decision support for teachers and learners, detection of behavioural patterns & learner modelling and dropout prediction, were identified and discussed deeply. The future research of LA is also outlined for purpose of better understanding and optimising learning as well as learning contexts.
Article
Performance prediction is a leading topic in learning analytics research due to its potential to impact all tiers of education. This study proposes a novel predictive modeling method to address the research gaps in existing performance prediction research. The gaps addressed include: the lack of existing research focus on performance prediction rather than identifying key performance factors; the lack of common predictors identified for both K-12 and higher education environments; and the misplaced focus on absolute engagement levels rather than relative engagement levels. Two datasets, one from higher education and the other from a K-12 online school with 13,368 students in more than 300 courses, were applied using the predictive modeling technique. The results showed the newly suggested approach had higher overall accuracy and sensitivity rates than the traditional approach. In addition, two generalizable predictors were identified from instruction-intensive and discussion-intensive courses.
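The abstract contrasts absolute and relative engagement levels without defining the latter. One plausible reading, shown here purely as an assumption, is to express each student's activity relative to classmates in the same course, for example as a z-score. The function name and the sample counts are hypothetical.

```python
from statistics import mean, pstdev


def relative_engagement(absolute_by_student):
    """Convert absolute activity counts into z-scores within one course."""
    values = list(absolute_by_student.values())
    center = mean(values)
    spread = pstdev(values)
    if spread == 0:
        # Everyone behaved identically, so nobody stands out.
        return {student: 0.0 for student in absolute_by_student}
    return {student: (value - center) / spread
            for student, value in absolute_by_student.items()}


# Hypothetical login counts for three students in one course.
rel = relative_engagement({"a": 10, "b": 20, "c": 30})
print(rel)
```

Under this reading, 20 logins means something different in a course where the average is 5 than in one where the average is 50, which is the kind of distinction absolute counts cannot express.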
Article
Massive Open Online Courses (MOOCs) have flourished in recent years, which is conducive to the global redistribution of high-quality educational resources. However, high dropout rates have seriously affected their development. Therefore, to improve completion rates, it is worth studying how to predict dropout in MOOCs effectively and intervene in advance. Traditional methods rely on manually extracted features, which makes it difficult to guarantee the final prediction quality. To solve this problem, this paper proposes an integrated framework with feature selection (FSPred) to predict dropout in MOOCs, which includes feature generation, feature selection, and dropout prediction. Specifically, FSPred applies a fine-grained, day-level feature generation method to generate features and then uses an ensemble feature selection method to select valid features and feed them into a logistic regression model for prediction. Extensive experiments on a public dataset show that FSPred achieves results comparable to other dropout prediction methods in terms of precision, recall, F1 score, and AUC score. Finally, based on an analysis of the finally selected features, suggestions for the construction of MOOCs are put forward.
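As an illustration of what day-level feature generation might look like, the sketch below turns a raw event log into one activity count per student per day. The log format and function name are assumptions made for this example; the abstract does not describe FSPred's actual features.

```python
from collections import Counter


def daily_features(events, n_days):
    """Build a per-day activity count vector for every student.

    `events` is an iterable of (student_id, day_index) pairs, one per logged
    action; days without activity get a count of zero.
    """
    counts = Counter(events)
    students = sorted({student for student, _ in counts})
    return {student: [counts[(student, day)] for day in range(n_days)]
            for student in students}


# Hypothetical click log: u1 is active on days 0 and 2, u2 only on day 1.
log = [("u1", 0), ("u1", 0), ("u1", 2), ("u2", 1)]
features = daily_features(log, n_days=3)
print(features)  # {'u1': [2, 0, 1], 'u2': [0, 1, 0]}
```

Vectors like these could then go through a feature selection step and into a logistic regression model, following the three stages the abstract lists.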
Article
The computing education research literature contains a wide variety of methods that can be used to identify students who are either at risk of failing their studies or who could benefit from additional challenges. Many of these are based on machine-learning models that learn to make predictions based on previously observed data. However, in educational contexts, differences between courses set huge challenges for the generalizability of these methods. For example, traditional machine-learning methods assume identical distribution in all data—in our terms, traditional machine-learning methods assume that all teaching contexts are alike. In practice, data collected from different courses can be very different as a variety of factors may change, including grading, materials, teaching approach, and the students. Transfer-learning methodologies have been created to address this challenge. They relax the strict assumption of identical distribution for training and test data. Some similarity between the contexts is still needed for efficient learning. In this work, we review the concept of transfer learning especially for the purpose of predicting the outcome of an introductory programming course and contrast the results with those from traditional machine-learning methods. The methods are evaluated using data collected in situ from two separate introductory programming courses. We empirically show that transfer-learning methods are able to improve the predictions, especially in cases with limited amount of training data, for example, when making early predictions for a new context. The difference in predictive power is, however, rather subtle, and traditional machine-learning models can be sufficiently accurate assuming the contexts are closely related and the features describing the student activity are carefully chosen to be insensitive to the fine differences.
Article
This study presents a model for the early identification of students who are likely to fail in an academic course. To enhance predictive accuracy, sentiment analysis is used to identify affective information from text-based self-evaluated comments written by students. Experimental results demonstrated that adding extracted sentiment information from student self-evaluations yields a significant improvement in early-stage prediction quality. The results also indicate the limited early-stage predictive value of structured data, such as homework completion, attendance, and exam grades, due to data sparseness at the beginning of the course. Thus, applying sentiment analysis to unstructured data (e.g., self-evaluation comments) can play an important role in improving the accuracy of early-stage predictions. The findings present educators with an opportunity to provide students with real-time feedback and support to help students become self-regulated learners. Using the exploratory results to improve teaching and learning initiatives is important for maintaining students' performance and the effectiveness of the learning process.
Lay Description
What is already known about this topic:
• Emotional state has an impact on academic performance.
• Early intervention is a key factor in preventing academic failure by at-risk students.
What this paper adds:
• Applies sentiment analysis to enhance predictive accuracy in the early stage.
• Builds a Chinese affective resource with valence ratings for each affective word, and uses it to automatically extract emotions from self-evaluation comments.
Implications for practice and/or policy:
• Points out the limited early-stage predictive ability of structured data and uses unstructured data to bridge the gap.
• Uses information visualization to build a self-regulated system from structured and unstructured data. For educators, it becomes easy to recognize students' emotions during class; for students, it becomes easy to understand in a timely manner how well they are performing in a class.
Conference Paper
Exponential growth in information has made it infeasible to find a relevant product manually in a short time, creating the need for an automated recommendation system that remembers users and recommends the most suitable items. Most approaches first find similarities among users or among items and then exploit these similarities to recommend products. These methods produce better results when demographic information about users and items is given to them. In this paper, we propose a deep neural network model that requires no information other than the rating triples. We created spurious user profiles and item characteristics by using separate learner weights at the bottom-most layer. The weights in the upper layers took this information, created by the weights at the bottom-most layer, to produce a real-valued rating. Our model produced an RMSE of 4.1824 on the Jester 4-million data set, which shows that our deep network is comparable to state-of-the-art models.
Conference Paper
We show how the novel use of a semantic representation based on Osgood’s semantic differential scales can lead to effective features in predicting short- and long-term learning in students using a vocabulary learning system. Previous studies in students’ intermediate knowledge states during vocabulary acquisition did not provide much information on which semantic knowledge students gained during word learning practice. Moreover, these studies relied on human ratings to evaluate the students’ responses. To solve this problem, we propose a semantic representation for words based on Osgood’s semantic decomposition of vocabulary [16]. To demonstrate our method can effectively represent students’ knowledge in vocabulary acquisition, we build models for predicting the student’s short-term vocabulary acquisition and long-term retention. We compare the effectiveness of our Osgood-based semantic representation to that provided by Word2Vec neural word embedding [13], and find that prediction models using features based on Osgood scale-based scores (OSG) perform better than the baseline and are comparable in accuracy to those using Word2Vec score-based models (W2V). By using more interpretable Osgood-based scales, our study results can help with better understanding of students’ ongoing learning states and designing personalized learning systems that can address an individual’s weak points in vocabulary acquisition.
Article
Heterogeneous face recognition (HFR) aims to match facial images acquired from different sensing modalities, with mission-critical applications in the forensics, security, and commercial sectors. However, HFR is a much more challenging problem than traditional face recognition because of the large intra-class variations of heterogeneous face images and the limited training samples of cross-modality face image pairs. This paper proposes a novel approach, namely Wasserstein CNN (convolutional neural networks, or WCNN for short), to learn invariant features between near-infrared and visual face images (i.e., NIR-VIS face recognition). The low-level layers of WCNN are trained with widely available face images in the visual spectrum. The high-level layer is divided into three parts: the NIR layer, the VIS layer, and the NIR-VIS shared layer. The first two layers aim to learn modality-specific features, and the NIR-VIS shared layer is designed to learn a modality-invariant feature subspace. The Wasserstein distance is introduced into the NIR-VIS shared layer to measure the dissimilarity between heterogeneous feature distributions. WCNN learning thus aims to minimize the Wasserstein distance between the NIR distribution and the VIS distribution for invariant deep feature representation of heterogeneous face images. To avoid over-fitting on small-scale heterogeneous face data, a correlation prior is introduced on the fully-connected layers of the WCNN network to reduce the parameter space. This prior is implemented by a low-rank constraint in an end-to-end network. The joint formulation leads to an alternating minimization for deep feature representation at the training stage and an efficient computation for heterogeneous data at the testing stage. Extensive experiments on three challenging NIR-VIS face recognition databases demonstrate the significant superiority of Wasserstein CNN over state-of-the-art methods.
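To give a feel for what a Wasserstein distance between two feature distributions measures, here is a minimal one-dimensional sketch. It uses the fact that, for two equal-sized empirical samples on the real line, the 1-Wasserstein distance equals the mean absolute difference of the sorted values. This is only an illustration: the abstract does not state which formulation WCNN applies to its high-dimensional deep features, and the function name is hypothetical.

```python
def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-sized 1-D empirical samples.

    Sorting pairs the i-th smallest value of one sample with the i-th
    smallest of the other, which is the optimal transport plan in 1-D.
    """
    if not xs or len(xs) != len(ys):
        raise ValueError("samples must be non-empty and of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)
```

A distance of zero means the two samples contain the same values; shifting every value of one sample by a constant shifts the distance by that constant.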
Article
The paper addresses and explains some of the key questions about the use of data mining in educational technology classroom research. Two examples of use of data mining techniques, namely, association rules mining and fuzzy representations are presented, from a study conducted in Europe and another in Australia. Both of these studies examine student learning, behaviors, and experiences within computer-supported classroom activities. In the first study, the technique of association rules mining was used to understand better how learners with different cognitive types interacted with a simulation to solve a problem. Association rules mining was found to be a useful method for obtaining reliable data about learners' use of the simulation and their performance with it. The study illustrates how data mining can be used to advance educational software evaluation practices in the field of educational technology. In the second study, the technique of fuzzy representations was employed to inductively explore questionnaire data. The study provides a good example of how educational technologists can use data mining for guiding and monitoring school-based technology integration efforts. Based on the outcomes, the implications of the study are discussed in terms of the need to develop educational data mining tools that can display results, information, explanations, comments, and recommendations in meaningful ways to non-expert users in data mining. Lastly, issues related to data privacy are addressed.
Article
Plant identification systems developed by computer vision researchers have helped botanists to recognize and identify unknown plant species more rapidly. Hitherto, numerous studies have focused on procedures or algorithms that maximize the use of leaf databases for plant predictive modeling, but this results in leaf features which are liable to change with different leaf data and feature extraction techniques. In this paper, we learn useful leaf features directly from the raw representations of input data using Convolutional Neural Networks (CNN), and gain intuition of the chosen features based on a Deconvolutional Network (DN) approach. We report somewhat unexpected results: (1) different orders of venation are the best representative features compared to those of outline shape, and (2) we observe multi-level representation in leaf data, demonstrating the hierarchical transformation of features from lower-level to higher-level abstraction, corresponding to species classes. We show that these findings fit with the hierarchical botanical definitions of leaf characters. Through these findings, we gained insights into the design of new hybrid feature extraction models which are able to further improve the discriminative power of plant classification systems. The source code and models are available at: https://github.com/cs-chan/Deep-Plant.
Article
Persistence in learning processes is perceived as a central value; therefore, dropouts from studies are a prime concern for educators. This study focuses on the quantitative analysis of data accumulated on 362 students in three academic course website log files in the disciplines of mathematics and statistics, in order to examine whether student activity on course websites may assist in providing early identification of learner dropout from specific courses or from degree track studies in general. It was found in this study that identifying the changes in student activity during the course period could help in detecting at-risk learners in real time, before they actually drop out from the course. Data examination on a monthly basis throughout the semester can enable educators and institutions to flag students that have been identified as having unusual behavior, deviating from the course average. It was found that a large percentage of students (66%) who had been marked as at-risk actually did not finish their courses and/or degree. The presented analysis allows instructors to observe website student usage data during a course, and to locate students who are not using the system as expected. Furthermore, it could enable university decision makers to see the information on a campus level for initiating intervention programs.
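A monthly check of the kind described above, flagging students whose activity deviates from the course average, might look like the following sketch. The abstract does not say how "unusual behavior" was defined, so the rule used here (activity more than k standard deviations below the mean) and the function name are assumptions.

```python
from statistics import mean, pstdev


def flag_at_risk(activity, k=1.0):
    """Return students whose monthly activity is far below the course average.

    `activity` maps student id -> number of logged actions in the month.
    The cut-off (mean - k * population standard deviation) is an assumed
    rule, not the one reported in the study.
    """
    values = list(activity.values())
    threshold = mean(values) - k * pstdev(values)
    return sorted(s for s, count in activity.items() if count < threshold)
```

Running this once per month on fresh log counts gives instructors a short list of students to contact before they drop out.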
Conference Paper
The collaborative learning processes of students in online learning environments can be supported by providing learning analytics-based visualisations that foster awareness and reflection about an individual's as well as the team's behaviour and their learning and collaboration processes. For this empirical study we implemented an activity widget into the online learning environment of a live five-months Master course and investigated the predictive power of the widget indicators towards the students' grades and compared the results to those from an exploratory study with data collected in previous runs of the same course where the widget had not been in use. Together with information gathered from a quantitative as well as a qualitative evaluation of the activity widget during the course, the findings of this current study show that there are indeed predictive relations between the widget indicators and the grades, especially those regarding responsiveness, and indicate that some of the observed differences in the last run could be attributed to the implemented activity widget.
Article
Data on high student failure rates in introductory programming courses have alarmed many educators, raising a number of important questions regarding prediction. In this paper, we present a comparative study on the effectiveness of educational data mining techniques for the early prediction of students likely to fail in introductory programming courses. Although several works have analyzed these techniques to identify students' academic failures, our study differs from existing ones as follows: (i) we investigate the effectiveness of such techniques in identifying students likely to fail at an early enough stage for action to be taken to reduce the failure rate; (ii) we analyse the impact of data preprocessing and algorithm fine-tuning tasks on the effectiveness of the mentioned techniques. In our study we evaluated the effectiveness of four prediction techniques on two different and independent data sources on introductory programming courses available from a Brazilian public university: one comes from distance education and the other from on-campus courses. The results showed that the techniques analyzed in our study are able to identify students likely to fail at an early stage, that the effectiveness of some of these techniques is improved after applying data preprocessing and/or algorithm fine-tuning, and that the support vector machine technique outperforms the others in a statistically significant way.
Article
Interactive simulations can facilitate inquiry learning. However, as in other Exploratory Learning Environments, students may not always learn effectively in these unstructured environments. Thus, providing adaptive support has great potential to help improve student learning with these rich activities. Providing adaptive support requires a student model that can both evaluate learning and inform relevant feedback. Building such a model for interactive simulations is especially challenging because the exploratory nature of the interaction makes it hard to know a priori which behaviors are conducive to learning. To address this problem, in this paper we leverage the student modeling framework proposed in (Kardan and Conati, 2011) to specifically address the challenge of modeling students in interactive simulations. The framework has already been successfully applied to build a student model and to give adaptive interventions for an interactive simulation for constraint satisfaction. We seek to investigate the generality of the framework by building student models for a more complex simulation on electric circuits called Circuit Construction Kit (CCK). We evaluate alternative representations of logged interaction data with CCK, capturing different amounts of granularity and feature engineering. We then apply the framework to group students based on their interaction behaviors, map these behaviors into learning outcomes, and leverage the resulting clusters to classify new learners. Data collected from 100 college students working with the CCK simulation indicate that the proposed framework is able to successfully classify students into groups of high and low learners and to identify patterns of productive behaviors that are common across representations and can inform real-time feedback. In addition to presenting these results, we discuss trade-offs between levels of granularity and feature engineering in the tested interaction representations in terms of their ability to evaluate learning, classify students, and inform feedback.
Article
In collaborative learning environments, students work together on assignments in virtual teams and depend on each other’s contribution to achieve their learning objectives. The online learning environment, however, may not only facilitate but also hamper group communication, coordination and collaboration. Group awareness widgets that visualise information about the different group members based on information collected from the individuals can foster awareness and reflection processes within the group. In this paper, we present a formative data study about the predictive power of several indicators of an awareness widget based on automatically logged user data from an online learning environment. In order to test whether the information visualised by the widget is in line with the study outcomes, we instantiated the widget indicators with data from four previous runs of the European Virtual Seminar on Sustainable Development (EVS). We analysed whether the tutor gradings in these previous years correlated with the students’ scores calculated for the widget indicators. Furthermore, we tested the predictive power of the widget indicators at various points in time with respect to the final grades of the students. The results of our analysis show that the grades and widget indicator scores are significantly and positively correlated, which provides a useful empirical basis for the development of guidelines for students and tutors on how to interpret the widget’s visualisations in live runs.
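The correlation analysis described above, relating widget indicator scores to tutor grades, can be sketched with a plain Pearson correlation coefficient. The abstract does not name the coefficient that was used, so Pearson's r is an assumption here, as are the function name and the idea of passing the scores as two parallel lists.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two lists of equal length with at least 2 items")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)
```

Computing this for indicator scores taken at several points in the course would show how early the indicators start to track the final grades.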
Article
In this letter, a new deep learning framework for spectral–spatial classification of hyperspectral images is presented. The proposed framework serves as an engine for merging the spatial and spectral features via suitable deep learning architectures: stacked autoencoders (SAEs) and deep convolutional neural networks (DCNNs) followed by a logistic regression (LR) classifier. In this framework, SAEs aim to obtain useful high-level features from one-dimensional features, which suits the dimension reduction of spectral features, while DCNNs can learn rich features from the training data automatically and have achieved state-of-the-art performance on many image classification databases. Though DCNNs have shown robustness to distortion, they only extract features of the same scale and hence are insufficient to tolerate large-scale variance of objects. As a result, spatial pyramid pooling (SPP) is introduced into hyperspectral image classification for the first time by pooling the spatial feature maps of the top convolutional layers into a fixed-length feature. Experimental results with widely used hyperspectral data indicate that classifiers built in this deep learning-based framework provide competitive performance.
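The key property of spatial pyramid pooling, turning a feature map of any size into a fixed-length vector, can be shown with a small sketch on a single 2-D map. The pyramid levels, the use of max pooling, and the function names are assumptions for illustration; the letter's actual configuration is not given in the abstract.

```python
def _ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def spatial_pyramid_pool(fmap, levels=(1, 2)):
    """Max-pool a 2-D feature map over an n x n grid for each pyramid level.

    `fmap` is a non-empty list of equal-length rows. The output always has
    sum(n * n for n in levels) values, whatever the size of `fmap`.
    """
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for n in levels:
        for i in range(n):
            r0, r1 = (i * h) // n, _ceil_div((i + 1) * h, n)
            for j in range(n):
                c0, c1 = (j * w) // n, _ceil_div((j + 1) * w, n)
                # Maximum over the (possibly overlapping) bin of cells
                pooled.append(max(fmap[r][c]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return pooled
```

With levels (1, 2), both a 4 x 4 map and a 3 x 5 map produce five values, which is what lets a fixed-size classifier follow convolutional layers fed with inputs of varying size.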
Article
In theoretical cognitive science, there is a tension between highly structured models whose parameters have a direct psychological interpretation and highly complex, general-purpose models whose parameters and representations are difficult to interpret. The former typically provide more insight into cognition but the latter often perform better. This tension has recently surfaced in the realm of educational data mining, where a deep learning approach to predicting students' performance as they work through a series of exercises---termed deep knowledge tracing or DKT---has demonstrated a stunning performance advantage over the mainstay of the field, Bayesian knowledge tracing or BKT. In this article, we attempt to understand the basis for DKT's advantage by considering the sources of statistical regularity in the data that DKT can leverage but which BKT cannot. We hypothesize four forms of regularity that BKT fails to exploit: recency effects, the contextualized trial sequence, inter-skill similarity, and individual variation in ability. We demonstrate that when BKT is extended to allow it more flexibility in modeling statistical regularities---using extensions previously proposed in the literature---BKT achieves a level of performance indistinguishable from that of DKT. We argue that while DKT is a powerful, useful, general-purpose framework for modeling student learning, its gains do not come from the discovery of novel representations---the fundamental advantage of deep learning. To answer the question posed in our title, knowledge tracing may be a domain that does not require `depth'; shallow models like BKT can perform just as well and offer us greater interpretability and explanatory power.
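For readers unfamiliar with Bayesian knowledge tracing, the sketch below shows the standard single-skill update: a Bayesian posterior on whether the skill is known given one observed answer, followed by a learning transition. This is the basic textbook form, not the extended BKT variants evaluated in the article, and the parameter names are chosen here for readability.

```python
def bkt_update(p_know, correct, p_slip, p_guess, p_transit):
    """Return P(skill known) after observing one answer.

    p_know    prior probability that the student knows the skill
    correct   True if the observed answer was correct
    p_slip    P(wrong answer | skill known)
    p_guess   P(right answer | skill not known)
    p_transit P(learning the skill at this practice opportunity)
    """
    if correct:
        known = p_know * (1 - p_slip)
        unknown = (1 - p_know) * p_guess
    else:
        known = p_know * p_slip
        unknown = (1 - p_know) * (1 - p_guess)
    posterior = known / (known + unknown)
    # The student may also learn the skill between opportunities
    return posterior + (1 - posterior) * p_transit
```

Each parameter has a direct psychological reading (slip, guess, learning rate), which is the interpretability advantage the abstract attributes to BKT.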
Article
An enduring issue in higher education is student retention to successful graduation. National statistics indicate that most higher education institutions have four-year degree completion rates around 50 percent, or just half of their student populations. While there are prediction models which illuminate what factors assist with college student success, interventions that support course selections on a semester-to-semester basis have yet to be deeply understood. To further this goal, we develop a system to predict students' grades in the courses they will enroll in during the next enrollment term by learning patterns from historical transcript data coupled with additional information about students, courses and the instructors teaching them. We explore a variety of classic and state-of-the-art techniques which have proven effective for recommendation tasks in the e-commerce domain. In our experiments, Factorization Machines (FM), Random Forests (RF), and the Personalized Multi-Linear Regression model achieve the lowest prediction error. Application of a novel feature selection technique is key to the predictive success and interpretability of the FM. By comparing feature importance across populations and across models, we uncover strong connections between instructor characteristics and student performance. We also discover key differences between transfer and non-transfer students. Ultimately we find that a hybrid FM-RF method can be used to accurately predict grades for both new and returning students taking both new and existing courses. Application of these techniques holds promise for student degree planning, instructor interventions, and personalized advising, all of which could improve retention and academic performance.
Conference Paper
In this paper we discuss the results of a study of students' academic performance in first-year general education courses. Using data from 566 students who received intensive academic advising as part of their enrollment in the institution's pre-major/general education program, we investigate individual student, organizational, and disciplinary factors that might predict a student's potential classification in an Early Warning System as well as factors that predict improvement and decline in their academic performance. Disciplinary course type (based on Biglan's [7] typology) was significantly related to a student's likelihood of entering below-average performance classifications. Students were most likely to enter a classification in fields like the natural sciences, mathematics, and engineering in comparison to humanities courses. We attribute these disparities in academic performance to disciplinary norms around teaching and assessment. In particular, the timing of assessments played a major role in students' ability to exit a classification. Implications for the design of Early Warning analytics systems as well as academic course planning in higher education are offered.
Article
Although asynchronous online discussion (AOD) is increasingly used as a main activity for blended learning, many students find it difficult to engage in discussions and report low achievement. Early prediction and timely intervention can help potential low achievers get back on track as early as possible. This study presented a data mining process to construct proxy variables that reflect theoretical and empirical evidence and measured the accuracy of a prediction model that incorporated all of the variables for validation. For the empirical study, data were obtained from 105 university students who were enrolled in two blended learning courses that used AOD as their main activity. The results indicated the high accuracy of the prediction model as well as the possibility of early detection and timely interventions. In addition, we examined participants' learning behaviors in the two courses using the proxy variables and provided suggestions for practice. The implications of this study for education data mining and learning analytics are discussed.
Conference Paper
As courses become bigger, move online, and are deployed to the general public at low cost (e.g. through Massive Open Online Courses, MOOCs), new methods of predicting student achievement are needed to support the learning process. This paper presents a novel method for converting educational log data into features suitable for building predictive models of student success. Unlike cognitive modelling or content analysis approaches, these models are built from interactions between learners and resources, an approach that requires no input from instructional or domain experts and can be applied across courses or learning environments.
Article
Given the rapid growth in online coursework within higher education, it is important to establish and validate quality standards for these courses. While many online learning quality rubrics do exist, thus far there has been little empirical evidence establishing a clear link between specific course design features and concrete, student-level course outcomes. In the current study, the authors develop an online course design assessment rubric that includes four areas, and explore the impact of each area on student end-of-semester performance in 23 online courses at two community colleges. The results indicate that the quality of interpersonal interaction within a course relates positively and significantly to student grades. Additional analyses based on course observation and interview data suggest that frequent and effective student–instructor interaction creates an online environment that encourages students to commit themselves to the course and perform at a stronger academic level.
Conference Paper
The pervasive collection of data has opened the possibility for educational institutions to use analytics methods to improve the quality of the student experience. However, the adoption of these methods faces multiple challenges particularly at the course level where instructors and students would derive the most benefit from the use of analytics and predictive models. The challenge lies in the knowledge gap between how the data is captured, processed and used to derive models of student behavior, and the subsequent interpretation and the decision to deploy pedagogical actions and interventions by instructors. Simply put, the provision of learning analytics alone has not necessarily led to changing teaching practices. In order to support pedagogical change and aid interpretation, this paper proposes a model that can enable instructors to readily identify subpopulations of students to provide specific support actions. The approach was applied to a first year course with a large number of students. The resulting model classifies students according to their predicted exam scores, based on indicators directly derived from the learning design.
Article
Recently, attempts have been made to collect millions of videos to train CNN models for action recognition in videos. However, curating such large-scale video datasets requires immense human labor, and training CNNs on millions of videos demands huge computational resources. In contrast, collecting action images from the Web is much easier and training on images requires much less computation. In addition, labeled web images tend to contain discriminative action poses, which highlight discriminative portions of a video's temporal progression. We explore the question of whether we can utilize web action images to train better CNN models for action recognition in videos. We collect 23.8K manually filtered images from the Web that depict the 101 actions in the UCF101 action video dataset. We show that by utilizing web action images along with videos in training, significant performance boosts of CNN models can be achieved. We then investigate the scalability of the process by leveraging crawled web images (unfiltered) for UCF101 and ActivityNet. We replace 16.2M video frames by 393K unfiltered images and get comparable performance.
Article
This study sought to identify significant behavioral indicators of learning using learning management system (LMS) data regarding online course achievement. Because self-regulated learning is critical to success in online learning, measures reflecting self-regulated learning were included to examine the relationship between LMS data measures and course achievement. Data were collected from 530 college students who took an online course. The results demonstrated that students' regular study, late submissions of assignments, number of sessions (the frequency of course logins), and proof of reading the course information packets significantly predicted their course achievement. These findings verify the importance of self-regulated learning and reveal the advantages of using measures related to meaningful learning behaviors rather than simple frequency measures. Furthermore, the measures collected in the middle of the course significantly predicted course achievement, and the findings support the potential for early prediction using learning performance data. Several implications of these findings are discussed.
Article
The purpose of this study is to identify at-risk online students earlier, more often, and with greater accuracy using time-series clustering. The case study showed that the proposed approach could generate models with higher accuracy and feasibility than traditional frequency aggregation approaches. The best-performing model can start to capture at-risk students from week 10. In addition, the four phases detected in students' learning process reveal a holiday effect and illustrate at-risk students' behaviors before and after a long holiday break. The findings also enable online instructors to develop corresponding instructional interventions via course design or student-teacher communications.
Article
In recent years, as information technology has become more prevalent, learning management systems have arisen around e-learning and web-based platforms. As a result, huge quantities of data about students’ learning process have been recorded and stored. Teachers can apply data-mining techniques to mine students’ learning performance. One such technique is association rule mining, which can find correlations between student characteristics and performance. For instance, a rule (Attendance = Middle) ∧ (Gender = Male) → (Semester = Low) indicates that the semester grade of students is at the Low level if their gender is Male and attendance rate is Middle, where Low and Middle are predetermined linguistic terms given by teachers. Teachers can rely on such rules to formulate their teaching strategies. However, these rules may be varied in different semesters because of the change of student characteristics or teaching method of teachers. The above rule is used to describe student behavior during the last semester, yet, within this semester, the rule changes to (Attendance = Low) ∧ (Gender = Female) → (Semester = Low). Without updating this knowledge, teachers might adopt inappropriate teaching strategies for students who are learning in different ways across different semesters. In this study, we propose a new change mining model to discover the change in student learning performance and characteristics on the basis of association rules. We conducted experiments with real-life datasets to evaluate the effectiveness of the proposed model.
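The rule in the example above can be checked against data by computing its support and confidence, and a change between semesters shows up as a shift in those two numbers. The sketch below covers only that measurement step; the change mining model itself is not described in the abstract, and the record format and function name are assumptions.

```python
def rule_stats(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    `records` is a non-empty list of dicts such as
    {"Attendance": "Middle", "Gender": "Male", "Semester": "Low"};
    `antecedent` and `consequent` are dicts of attribute -> required value.
    """
    def matches(record, conditions):
        return all(record.get(k) == v for k, v in conditions.items())

    with_antecedent = [r for r in records if matches(r, antecedent)]
    with_both = [r for r in with_antecedent if matches(r, consequent)]
    support = len(with_both) / len(records)
    confidence = len(with_both) / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence
```

Calling `rule_stats` with the same rule on last semester's and this semester's records, then comparing the results, is one simple way to notice that a rule teachers rely on no longer holds.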
Article
Deep Neural Network (DNN) acoustic models have yielded many state-of-the-art results in Automatic Speech Recognition (ASR) tasks. More recently, Recurrent Neural Network (RNN) models have been shown to outperform their DNN counterparts. However, state-of-the-art DNN and RNN models tend to be impractical to deploy on embedded systems with limited computational capacity. Traditionally, the approach for embedded platforms is either to train a small DNN directly or to train a small DNN that learns the output distribution of a large DNN. In this paper, we utilize a state-of-the-art RNN to transfer knowledge to a small DNN. We use the RNN model to generate soft alignments and minimize the Kullback-Leibler divergence against the small DNN. The small DNN trained on the soft RNN alignments achieved a 3.93 WER on the Wall Street Journal (WSJ) eval92 task, compared to a baseline of 4.54 WER, a relative improvement of more than 13%.
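Two quantities in this abstract are easy to make concrete: the Kullback-Leibler divergence used as the training objective, and the relative improvement implied by the two WER figures. The sketch below assumes the divergence is taken from the teacher (RNN) distribution to the student (DNN) distribution, which the abstract does not state explicitly; the function names are illustrative.

```python
from math import log


def kl_divergence(teacher, student):
    """KL(teacher || student) for two discrete distributions over the same classes.

    Terms where the teacher probability is zero contribute nothing.
    Assumes student probabilities are positive wherever the teacher's are.
    """
    return sum(p * log(p / q) for p, q in zip(teacher, student) if p > 0)


def relative_improvement(baseline, new):
    """Fractional reduction of an error rate relative to the baseline."""
    return (baseline - new) / baseline
```

With the figures quoted in the abstract, `relative_improvement(4.54, 3.93)` is (4.54 - 3.93) / 4.54, which is about 0.134 and matches the reported "more than 13%".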
Article
An efficient and effective algorithm that exploits informative features online for visual tracking is presented. First, a high-dimensional multi-scale spatio-colour image feature vector is developed, which takes into account both appearance and spatial layout information; secondly, this feature vector is randomly projected onto a low-dimensional feature space, where its projections preserve the intrinsic information of the high-dimensional feature vector but effectively avoid the curse of dimensionality; and finally, an online feature selection technique for designing an adaptive appearance model is proposed, which selects the most informative features from the projections by maximising entropy energy. Experiments on extensive challenging sequences demonstrate the superiority of the proposed method over some state-of-the-art algorithms.