Figure 3 - uploaded by Wengran Wang
Comparing BoW, n-Gram, and pq-Gram approaches, our results show that these approaches alone achieve relatively similar predictive outcomes, and that F1 scores are lower when the prevalence of positive samples is relatively small.

Source publication
Article
Full-text available
Using machine learning to classify student code has many applications in computer science education, such as auto-grading, identifying struggling students from their code, and propagating feedback to address particular misconceptions. However, a fundamental challenge of using machine learning for code classification is how to represent program code...
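As a rough illustration of two of the compared representations (a sketch, not the paper's implementation: the token names are hypothetical block-style tokens, and the paper computes its n-grams and pq-grams over abstract syntax trees rather than a flat token stream):

    from collections import Counter

    # Hypothetical block-style tokens from a student program.
    tokens = ["whenKeyPressed", "changeXBy", "ifThen", "touching", "stopAll"]

    def bag_of_words(tokens):
        # Unordered token counts: order information is discarded.
        return Counter(tokens)

    def ngrams(tokens, n=2):
        # Counts of n consecutive tokens: local order is preserved.
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    print(bag_of_words(tokens))
    print(ngrams(tokens, n=2))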

Contexts in source publication

Context 1
... therefore use F1 scores to tune hyperparameters. Figure 3 shows how the feature sets perform across the five target game behaviors. We present the F1 score of the prediction for each target behavior, with its precision (P) and recall (R) shown in brackets. ...
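For reference, a minimal sketch of the metric being reported (the labels below are hypothetical and serve only to show how an F1 score with its precision/recall pair is produced):

    from sklearn.metrics import precision_recall_fscore_support

    # Hypothetical ground-truth and predicted labels, for illustration only.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

    p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="binary")
    print(f"F1 = {f1:.2f} (P = {p:.2f}, R = {r:.2f})")  # F1 = 0.75 (P = 0.75, R = 0.75)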
Context 2
... suggests that there may be some advantage to more expressive feature representations for identifying more complex program properties. Figure 3 also shows that the F1 scores of all approaches decrease as the prevalence of positive samples decreases (i.e., with more class imbalance). For example, on the y-axis of Figure 3, we have marked each label with the prevalence of its positive samples. As the prevalence of positive samples decreases from 197/413 (48%) in KeyboardMove to 25/413 (6%) in CollisionStopGame, the feature extraction methods perform increasingly worse. ...
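A small worked example (the 80% recall and 5% false-positive rate are assumptions for illustration, not figures from the paper) shows why this happens: holding classifier quality fixed, precision and hence F1 fall as positive samples become rarer, using the prevalences quoted above.

    # Hypothetical fixed classifier quality: 80% recall, 5% false-positive rate.
    def f1_at_prevalence(n_pos, n_neg, recall=0.80, fpr=0.05):
        tp = recall * n_pos                # true positives found
        fp = fpr * n_neg                   # negatives misclassified as positive
        precision = tp / (tp + fp)
        return 2 * precision * recall / (precision + recall)

    for label, n_pos in [("KeyboardMove", 197), ("CollisionStopGame", 25)]:
        f1 = f1_at_prevalence(n_pos, 413 - n_pos)
        print(f"{label}: prevalence = {n_pos/413:.0%}, F1 = {f1:.2f}")

Under these assumptions, F1 drops from about 0.86 at 48% prevalence to about 0.62 at 6% prevalence, purely because false positives begin to swamp the rare true positives.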

Similar publications

Article
Full-text available
With modern requirements, there is an increasing tendency to consider multiple objectives/criteria simultaneously in many Software Engineering (SE) scenarios. Such a multi-objective optimization scenario comes with an important issue - how to evaluate the outcome of optimization algorithms, which typically is a set of incomparable solutions (i.e...
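A minimal sketch of the kind of outcome such an evaluation must handle (objective values are hypothetical, and minimization of both objectives is assumed): the result of a multi-objective run is a set of mutually non-dominated solutions rather than a single best one.

    # Keep only solutions not weakly dominated by any other solution.
    def pareto_front(solutions):
        return [s for s in solutions
                if not any(all(o <= x for o, x in zip(other, s)) and other != s
                           for other in solutions)]

    # Hypothetical (cost, runtime) outcomes from one optimization run.
    print(pareto_front([(1, 5), (2, 3), (3, 4), (4, 1), (2, 6)]))
    # -> [(1, 5), (2, 3), (4, 1)]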

Citations

... For example, Wang et al. found that a large number of code features can lead models to overfit to the training data [18]. This has likely happened with D05 and L02, whose LSTM input dimensions increased at least 4-fold from D02 and L01, respectively, making it possible that the models would overfit. ...
Full-text available
Conference Paper
Predicting student performance has been a major task in student modeling. Specifically, in open-ended domains such as computer science classes, student submissions contain more information, but they also require more advanced analysis methods to extract it. Traditional student modeling approaches use knowledge components (KCs) to predict a student's success on specific practiced skills. These approaches are useful and necessary in helping learning environments like Intelligent Tutoring Systems (ITS) personalize feedback and hints and identify struggling students. However, when working with programming data, code features provide more information than skill tags representing KCs, and this information is not leveraged by traditional KC models. This work incorporates an implicit representation of KCs into a student model by including features extracted from students' code, using data from an undergraduate introductory programming course. This representation is then evaluated with deep learning predictive models, investigating how well they leverage code features to model student knowledge, and comparing them against other learning models. The study shows a modest but consistent improvement in models that use time-sequential data with even the simplest code features, implying that these aspects may improve student modeling.
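A minimal sketch, assuming each student is represented as a sequence of per-submission code-feature vectors (all dimensions here are hypothetical), of the time-sequential style of model the study evaluates; PyTorch is used for illustration, and the paper's exact architecture may differ.

    import torch
    import torch.nn as nn

    class StudentModel(nn.Module):
        def __init__(self, n_features=16, hidden=32):
            super().__init__()
            self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
            self.head = nn.Linear(hidden, 1)       # predicted success probability

        def forward(self, x):                      # x: (students, submissions, features)
            out, _ = self.lstm(x)
            return torch.sigmoid(self.head(out[:, -1]))

    model = StudentModel()
    batch = torch.randn(4, 10, 16)                 # 4 students, 10 submissions each
    print(model(batch).shape)                      # torch.Size([4, 1])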
... Data-mined features extracted from Process Data: These features are extracted automatically (or semi-automatically) from students' programming process data [39], discovering behaviors that might be predictive of student success beyond what experts have identified. For example, Blikstein, Piech, et al. [6,32] used learning analytics approaches to cluster students' problem-solving trajectories on programming assignments and found these clusters to be predictive of students' final grades, more so than midterm grades. ...
Preprint
Instructors have limited time and resources to help struggling students, and these resources should be directed to the students who most need them. To address this, researchers have constructed models that can predict students' final course performance early in a semester. However, many predictive models are limited to static and generic student features (e.g. demographics, GPA), rather than computing-specific evidence that assesses a student's progress in class. Many programming environments now capture complete time-stamped records of students' actions during programming. In this work, we leverage this rich, fine-grained log data to build a model to predict student course outcomes. From the log data, we extract patterns of behaviors that are predictive of students' success using an approach called differential sequence mining. We evaluate our approach on a dataset from 106 students in a block-based, introductory programming course. The patterns extracted from our approach can predict final programming performance with 79% accuracy using only the first programming assignment, outperforming two baseline methods. In addition, we show that the patterns are interpretable and correspond to concrete, effective -- and ineffective -- novice programming behaviors. We also discuss these patterns and their implications for classroom instruction.
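A simplified frequency-difference sketch in the spirit of this idea (not the authors' exact differential sequence mining algorithm; the action logs are hypothetical): score each short action pattern by how much more of the high-performing group exhibits it than the low-performing group.

    from collections import Counter

    def pattern_support(logs, n=2):
        # Fraction of students whose log contains each length-n action pattern.
        support = Counter()
        for actions in logs:
            seen = {tuple(actions[i:i + n]) for i in range(len(actions) - n + 1)}
            support.update(seen)
        return {p: c / len(logs) for p, c in support.items()}

    high = [["edit", "run", "edit", "run"], ["edit", "run", "test"]]
    low = [["run", "run", "run"], ["edit", "edit", "run", "run"]]
    s_hi, s_lo = pattern_support(high), pattern_support(low)
    diffs = {p: s_hi.get(p, 0) - s_lo.get(p, 0) for p in set(s_hi) | set(s_lo)}
    for pattern, gap in sorted(diffs.items(), key=lambda kv: -abs(kv[1]))[:3]:
        print(pattern, round(gap, 2))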
... Some works also took the next step and extracted further features from code snapshots. For example, Wang et al. [30] compare three popular feature extraction techniques on programming code blocks: bag-of-words, and two abstract syntax tree features (n-grams and pq-grams), for classifying the code behavior of novice programming projects. Abstract syntax trees (ASTs) represent code elements as tree nodes. ...
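For concreteness, a minimal sketch of pq-gram extraction over such a tree, following Augsten et al.'s definition (a stem of p ancestors plus q consecutive children, padded with null nodes written '*'); the tuple-based AST encoding and node labels are assumptions for illustration.

    def pq_grams(node, p=2, q=3, stem=()):
        # node is (label, [children]); stem holds up to p ancestor labels.
        label, children = node
        stem = (stem + (label,))[-p:]
        head = ("*",) * (p - len(stem)) + stem
        if not children:
            ext = ("*",) * q               # leaves get q null children
        else:                              # q-1 null siblings pad each side
            ext = ("*",) * (q - 1) + tuple(c[0] for c in children) + ("*",) * (q - 1)
        grams = [head + ext[i:i + q] for i in range(len(ext) - q + 1)]
        for child in children:
            grams += pq_grams(child, p, q, stem)
        return grams

    # Hypothetical block-based AST.
    ast = ("script", [("ifThen", [("touching", [])]), ("stopAll", [])])
    for g in pq_grams(ast):
        print(g)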
Full-text available
Conference Paper
The role of an integrated IDE with several key features in teaching and learning programming is clear. In this work-in-progress paper, we present the new version of our Caring IDE, a cloud-based IDE system integrated with a Learning Management System (LMS), an autograder, databases for storage, and dashboard prototypes to (1) deliver a smoother programming learning experience for students and (2) enhance instructors' ability to perform informed student-success interventions quickly and early. Here, we report on the design and implementation of the Caring IDE. We also demonstrate the value of the Caring IDE in promoting student self-learning in an online, introductory computer science (CS1) summer course during the COVID-19 pandemic. Finally, we showcase preliminary IDE-based analytics to promote student success in CS courses.