Chapter

Making Sense of Student Success and Risk Through Unsupervised Machine Learning and Interactive Storytelling


Abstract

This paper presents an interactive AI system that enables academic advisors and program leadership to understand patterns of behavior related to student success and risk using data collected from institutional databases. We have worked closely with advisors in our development of an innovative temporal model of student data, an unsupervised k-means algorithm applied to the data, and interactive user experiences with the data. We report on the design and evaluation of FIRST, Finding Interesting stoRies about STudents, which provides an interactive experience in which the advisor can: select relevant student features to be included in a temporal model, interact with a visualization of the unsupervised learning output that presents patterns of student behavior and their correlation with performance, and view automatically generated stories about individual students based on the data in the temporal model. We have developed a high-fidelity prototype of FIRST using 10 years of student data in our College. As part of our iterative design process, we conducted a focus group study with six advisors following a demonstration of the prototype. Our focus group evaluation highlights the sensemaking value of the temporal model, the unsupervised clusters of the behavior of all students in a major, and the stories about individual students.
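The clustering step named in the abstract is standard k-means over student feature vectors. Below is a minimal sketch of that step, assuming hypothetical semester-level features (GPA, credit hours, weekly LMS logins); FIRST's actual feature set, temporal model, and preprocessing are not specified in this excerpt.

```python
# Sketch of the k-means step over semester-level student vectors.
# Features and values are hypothetical; FIRST's actual temporal model
# and preprocessing are not given in this excerpt.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Each row: one student-semester [gpa, credit_hours, lms_logins_per_week]
X = np.column_stack([
    rng.uniform(0.0, 4.0, 300),     # gpa
    rng.integers(6, 19, 300),       # credit hours
    rng.poisson(12, 300),           # LMS logins per week
]).astype(float)

scaler = StandardScaler().fit(X)
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(scaler.transform(X))

# Report cluster centers in original units as candidate behavior patterns
for i, c in enumerate(scaler.inverse_transform(km.cluster_centers_)):
    print(f"cluster {i}: gpa={c[0]:.2f}, credits={c[1]:.1f}, logins/wk={c[2]:.1f}")
```

Advisors could then inspect each cluster center as a candidate "pattern of behavior" and drill into the students assigned to it.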


... This process generates a lot of data for online education (Yu et al., 2021b). These changes have contributed significantly to the development of big data technologies in education and provide a unique opportunity for educational anomaly analytics (Al-Doulat et al., 2020; AlKhuzaey et al., 2021; Ren et al., 2021). ...
... While these scholars achieved better prediction results, they also brought along the biggest drawback of current deep learning: unexplainability. This can easily cause educators to mistrust the predictions (Al-Doulat et al., 2020), and thus affect the diffusion and application of the technology in the industry. ...
Article
Anomalies in education affect the personal careers of students and universities' retention rates. Understanding the laws behind educational anomalies promotes the development of individual students and improves the overall quality of education. However, the inaccessibility of educational data hinders the development of the field. Previous research in this field used questionnaires, which are time- and cost-consuming and hardly applicable to large-scale student cohorts. With the popularity of educational management systems and the rise of online education during the COVID-19 pandemic, a large amount of educational data is available online and offline, providing an unprecedented opportunity to explore educational anomalies from a data-driven perspective. As an emerging field, educational anomaly analytics rapidly attracts scholars from a variety of fields, including education, psychology, sociology, and computer science. This paper intends to provide a comprehensive review of data-driven analytics of educational anomalies from a methodological standpoint. We focus on the five types of research that received the most attention: course failure prediction, dropout prediction, detection of mental health problems, prediction of difficulty in graduation, and prediction of difficulty in employment. Then, we discuss the challenges of current related research. This study aims to provide references for educational policymaking while promoting the development of educational anomaly analytics as a growing field.
... AI can support several activities of storytelling, e.g. monitoring data sources, collecting data, defining characters, developing the plot over time, visualizing data, and measuring success (Al-Doulat et al., 2020; Dur, 2012; Kreminski et al., 2020; Thorne, 2020; Yang et al., 2019). ...
Article
Purpose: Data is collected from all aspects of our lives. Yet, data alone is useless unless converted into information and, ultimately, knowledge. Since data analysts, in most cases, are not the ones in charge of making decisions based on their findings, communicating the results to stakeholders is crucial to passing on data-driven insights. That is where the discipline of data storytelling comes into play. Often, data storytelling is considered merely effective data visualization. Creating data stories is, however, a structured approach to communicating data insights as an interplay of three elements: data, visuals, and narrative. Sharing data-driven insights to support better business decisions requires data storytellers skilled in the "art of storytelling". Design/Method/Approach: In this paper, the authors discuss the use of data storytelling in business to communicate data to stakeholders for improving decision-making. The findings are derived from (1) an extensive literature review and (2) a qualitative analysis of 13 expert interviews with people incorporating data storytelling into their daily work in international companies. Findings: These interviews revealed the importance of providing a flexible tool to support knowledge sharing for people communicating complex data to internal stakeholders. Combining literature with qualitative research enabled the authors to create the "data storytelling cheat sheet", a guide for practical data storytelling. Theoretical Implications: Theories such as psychological distance and dual-process theory ground the research; no new theory is built in this paper. Practical Implications: One of the results is a systematic cheat sheet that helps practitioners implement data storytelling in their daily business. Originality/Value: The theory of data storytelling can be overwhelming at first use; based on a cleanly defined empirical study with experts in the field, a guideline for hands-on use was developed. Research Limitations/Future Research: The paper focuses on internal data storytelling; with external stakeholders the findings might differ slightly. The results apply to the data communication part of any data analytics project.
Article
→ The third wave of AI can be characterized by technological enhancement and application + a human-centered approach. → HCI professionals should take a leading role by providing explainable and comprehensible AI, and useful and usable AI. → HCI professionals should proactively participate in AI R&D to increase their influence, enhance their AI knowledge, and integrate methods between the two fields.
Article
Data models built for analyzing student data often obfuscate temporal relationships for reasons of simplicity, or to aid in generalization. We present a model based on temporal relationships of heterogeneous data as the basis for building predictive models. We show how within- and between-semester temporal patterns can provide insight into the student experience. For example, in a within-semester model, the prediction of the final course grade can be based on weekly activities and submissions recorded in the LMS. In the between-semester model, the prediction of success or failure in a degree program can be based on sequence patterns of grades and activities across multiple semesters. The benefits of our sequence data model include temporal structure, segmentation, contextualization, and storytelling. To demonstrate these benefits, we have collected and analyzed 10 years of student data from the College of Computing at UNC Charlotte in a between-semester sequence model, and used data in an introductory course in computer science to build a within-semester sequence model. Our results for the two sequence models show that analytics based on the sequence data model can achieve higher predictive accuracy than non-temporal models with the same data.
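As a concrete reading of the within-semester model, the sketch below trains a classifier on ordered weekly activity counts to predict pass/fail. The data is synthetic and the feature choice (weekly LMS event counts) is an assumption; the paper's actual sequence encoding is only summarized above.

```python
# Sketch of a within-semester sequence model: weekly LMS activity counts
# (synthetic data) as ordered features predicting pass/fail.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n_students, n_weeks = 500, 15
activity = rng.poisson(8, size=(n_students, n_weeks)).astype(float)
# Synthetic label: later-semester engagement weighted more (illustrative only)
weights = np.linspace(0.2, 1.0, n_weeks)
passed = ((activity * weights).sum(axis=1)
          + rng.normal(0, 5, n_students) > 60).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(activity, passed, random_state=0)
clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```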
Conference Paper
Data science is now impacting the education sector, with a growing number of commercial products and research prototypes providing learning dashboards. From a human-centred computing perspective, the end-user's interpretation of these visualisations is a critical challenge to design for, with empirical evidence already showing that 'usable' visualisations are not necessarily effective from a learning perspective. Since an educator's interpretation of visualised data is essentially the construction of a narrative about student progress, we draw on the growing body of work on Data Storytelling (DS) as the inspiration for a set of enhancements that could be applied to data visualisations to improve their communicative power. We present a pilot study that explores the effectiveness of these DS elements based on educators' responses to paper prototypes. The dual purpose is understanding the contribution of each visual element for data storytelling, and the effectiveness of the enhancements when combined.
Article
This paper explores the phenomena of the emergence of the use of artificial intelligence in teaching and learning in higher education. It investigates educational implications of emerging technologies on the way students learn and how institutions teach and evolve. Recent technological advancements and the increasing speed of adopting new technologies in higher education are explored in order to predict the future nature of higher education in a world where artificial intelligence is part of the fabric of our universities. We pinpoint some challenges for institutions of higher education and student learning in the adoption of these technologies for teaching, learning, student support, and administration and explore further directions for research.
Article
The field of learning analytics was founded with the goal of harnessing the vast amounts of data about learning collected through the extensive use of technology. After its early formation, the field has now entered the next phase of maturation, with a growing community that has an evident impact on research, practice, policy, and decision-making. Although learning analytics is a bricolage field borrowing from many other related disciplines, there is still no systematized model that shows how these different disciplines are pieced together. Existing models and frameworks of learning analytics are valuable in identifying elements and processes of learning analytics, but they insufficiently elaborate on the links with foundational disciplines. With this in mind, this paper proposes a consolidated model of the field of research and practice that is composed of three mutually connected dimensions – theory, design, and data science. The paper defines why and how each of the three dimensions, along with their mutual relations, is critical for research and practice of learning analytics. Finally, the paper stresses the importance of multi-perspective approaches to learning analytics based on its three core dimensions for a healthy development of the field and a sustainable impact on research and practice.
Article
Not all students who fail or drop out would have done so if they had been offered help at the right time. This is particularly true on distance learning modules where there is no direct tutor/student contact, but where it has been shown that making contact at the right time can improve a student's chances. This paper explores the latest work conducted at the Open University, one of Europe's largest distance learning institutions, to identify the optimum time to make student interventions and to develop models that identify the at-risk students in this time frame. This work in progress is taking real-time data and feeding it back to module teams as the module is running. Module teams will indicate which of the predicted at-risk students have received an intervention, and the nature of the intervention.
Article
In this paper, an early intervention solution for collegiate faculty called Course Signals is discussed. Course Signals was developed to allow instructors the opportunity to employ the power of learner analytics to provide real-time feedback to a student. Course Signals relies not only on grades to predict students' performance, but also demographic characteristics, past academic history, and students' effort as measured by interaction with Blackboard Vista, Purdue's learning management system. The outcome is delivered to the students via a personalized email from the faculty member to each student, as well as a specific color on a stoplight -- traffic signal -- to indicate how each student is doing. The system itself is explained in detail, along with retention and performance outcomes realized since its implementation. In addition, faculty and student perceptions will be shared.
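The visible output of Course Signals is a per-student traffic-signal color. Here is a toy sketch of that final delivery step, with an invented risk score in [0, 1] and invented thresholds; the real Purdue model combines grades, demographics, academic history, and LMS effort in ways not detailed here.

```python
# Illustrative traffic-signal mapping from a risk score in [0, 1].
# Thresholds are invented for the sketch; Course Signals' model is richer.
def signal_color(risk: float) -> str:
    if risk < 0.33:
        return "green"   # on track
    if risk < 0.66:
        return "yellow"  # potential problems
    return "red"         # high likelihood of failure

for student, risk in [("s1", 0.12), ("s2", 0.48), ("s3", 0.91)]:
    print(student, signal_color(risk))
```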
Article
Generally, it takes "exotic" manpower (linguists, knowledge engineers) to maintain and adapt an NLG system. To avoid this problem, we have developed a Knowledge Administration station, which is usable by the target population (in our project, weather forecasters). On the other hand, the system is designed to help the forecasters and not replace them. It is able to adapt to each forecaster's style and manage the enhancements they wish to bring to their texts. With that in mind, the Interactive Generation environment was designed to allow forecasters to modify generated texts in their native language, and then generate weather forecasts in several foreign languages based on those modifications. Interactive Generation is a viable alternative to Automatic Translation.
Article
We propose a novel design of a Student Success System (S3), a holistic analytical system for identifying and treating at-risk students. S3 synthesizes several strands of risk analytics: the use of predictive models to identify academically at-risk students, the creation of data visualizations for reaching diagnostic insights, and the application of a case-based approach for managing interventions. Such a system poses numerous design, implementation, and research challenges. In this paper we discuss a core research challenge for designing early warning systems such as S3. We then propose our approach for meeting that challenge. A practical implementation of a student risk early warning system, utilizing predictive models, must meet two design criteria: (a) the methodology for generating predictive models must be flexible enough to allow generalization from one context to another; (b) the underlying mechanism of prediction should be easily interpretable by practitioners whose end goal is to design meaningful interventions on behalf of students. Our proposed solution applies an ensemble method for predictive modeling using a strategy of decomposition. Decomposition provides a flexible technique for generating and generalizing predictive models across different contexts. Decomposition into interpretable semantic units, when coupled with data visualizations and case management tools, allows practitioners, such as instructors and advisors, to build a bridge between prediction and intervention.
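A minimal sketch of prediction by decomposition follows, under our own assumption (not the paper's) that each semantic unit is a feature group with its own interpretable model whose probabilities are averaged.

```python
# Sketch of prediction by decomposition: one interpretable model per semantic
# feature group, combined by averaging probabilities. Groups and data are
# hypothetical; S3's actual decomposition strategy is only outlined above.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 400
groups = {
    "grades":     rng.normal(size=(n, 3)),   # e.g., GPA trend features
    "engagement": rng.normal(size=(n, 4)),   # e.g., LMS activity features
    "attendance": rng.normal(size=(n, 2)),
}
y = (sum(g[:, 0] for g in groups.values()) + rng.normal(0, 1, n) > 0).astype(int)

# One logistic model per group; each is separately inspectable by advisors.
models = {name: LogisticRegression().fit(X, y) for name, X in groups.items()}
probs = np.mean([m.predict_proba(groups[name])[:, 1]
                 for name, m in models.items()], axis=0)
print("ensemble risk for first 5 students:", np.round(probs[:5], 2))
```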
Conference Paper
Multilingual environmental information is communicated via different media: newspapers, TV, the internet, WAP, SMS, etc. Each medium has a presentation mode which fits it best. Thus, it turned out that for newspapers, pictograms indicating good, normal, and bad conditions, a map with pictograms, and/or a very short text are to be preferred. In contrast, the information provided on the internet can be very detailed and personalized, and contain the latest data available at the moment the user requests the information. Furthermore, on the internet, the user can interactively select more details or change the presentation mode. For all media, the most challenging presentation mode is text. Since a template-based method, in which predefined sentences with empty slots are filled at the time of generation, cannot ensure coherent and cohesive text for all contextual settings, full-fledged generation techniques are needed. In this paper, we present the generation techniques used for the production of multilingual air quality information in the framework of the MARQUIS project.
Conference Paper
Making sense of a body of data is a common activity in any kind of analysis. Sensemaking is the process of searching for a representation and encoding data in that representation to answer task-specific questions. Different operations during sensemaking require different cognitive and external resources. Representations are chosen and changed to reduce the cost of operations in an information processing task. The power of these representational shifts is generally under-appreciated, as is the relation between sensemaking and information retrieval. We analyze sensemaking tasks and develop a model of the cost structure of sensemaking. We discuss implications for the integrated design of user interfaces, representational tools, and information retrieval systems.
Conference Paper
Storytelling applications are increasingly being used and researched because they can convey information and experience to users in a natural and familiar way. The range of developed applications grows as we realize new ways to present content as stories, or "sequences of narratively significant events". Nevertheless, implemented storytelling models are usually constrained to a particular application because of the nature of the narrated events and the way those events are linked. In order to develop a more generic model for creating storytelling applications, we need to focus the solution on the manner in which the content is organized and conveyed to the user. We present our proposal for a generic storytelling ontology model based on the organization of events using the relations proposed by Rhetorical Structure Theory (RST), and show how narrative principles are applied to these RST relations to generate coherent stories.
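To make the RST-based organization concrete, here is a toy sketch that links events with RST-style relations and realizes them as a short story. The relation names follow RST; the events, connectives, and realization strategy are invented for illustration.

```python
# Tiny sketch of organizing narrative events with RST-style relations and
# flattening them into a story. Events and connectives are invented; the
# paper's ontology model is far more general.
from dataclasses import dataclass

@dataclass
class Event:
    text: str

@dataclass
class Relation:
    name: str        # e.g., "cause", "elaboration", "sequence"
    nucleus: Event
    satellite: Event

CONNECTIVE = {"cause": "because", "elaboration": "specifically"}

def realize(rel: Relation) -> str:
    if rel.name == "sequence":
        return f"{rel.nucleus.text}, then {rel.satellite.text}"
    return f"{rel.nucleus.text} {CONNECTIVE[rel.name]} {rel.satellite.text}"

story = [
    Relation("sequence", Event("The student enrolled in CS 1"),
             Event("earned a B+")),
    Relation("cause", Event("Her grades dropped the next semester"),
             Event("her course load doubled")),
]
print(". ".join(realize(r) for r in story) + ".")
```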
Conference Paper
Suregen-2 applications are intended for use as add-on modules for clinical information systems. Currently, Suregen-2 permits refinement of the predefined medical ontology, specification of text plans and description knowledge for objects of the ontology. It has built-in constructs for referential expressions, aggregation, enumeration and recurrent semantic constellations. A first application built with Suregen-2, which currently supports German only, is in routine use.
Article
This paper describes a methodology for representing and using medical knowledge about temporal relationships to infer the presence of clinical events that evolve over time. The methodology consists of three steps: (1) the incorporation of patient observations into a generic physiologic model, (2) the conversion of model states and predictions into domain-specific temporal abstractions, and (3) the transformation of temporal abstractions into clinically meaningful descriptive text. The first step converts raw observations to underlying model concepts, the second step identifies temporal features of the fitted model that have clinical interest, and the third step recasts features represented by model parameters and predictions as concepts expressed in clinical language. We describe a program, called TOPAZ, that uses this three-step methodology. TOPAZ generates a narrative summary of the temporal events found in the electronic medical record of patients receiving cancer chemotherapy. A unique feature of TOPAZ is its use of both numeric and symbolic techniques to perform different temporal reasoning tasks. Time is represented both as a continuous process and as a set of temporal intervals. These two temporal models differ in the temporal ontology they assume and in the temporal concepts they encode. Without multiple temporal models, this diversity of temporal knowledge could not be represented.
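A small sketch of steps (2) and (3) of this methodology: labeling a numeric series, merging the labels into temporal intervals, and rendering each interval as text. The lab value, thresholds, and wording are invented, and the physiologic model of step (1) is omitted.

```python
# Sketch of TOPAZ-style steps 2-3: numeric series -> interval abstractions
# -> descriptive text. Values, thresholds, and wording are hypothetical.
def abstract_intervals(values, dates, low, high):
    """Label each reading and merge consecutive dates with the same label."""
    labels = ["low" if v < low else "high" if v > high else "normal"
              for v in values]
    intervals = []
    for date, label in zip(dates, labels):
        if intervals and intervals[-1][2] == label:
            intervals[-1][1] = date          # extend the current interval
        else:
            intervals.append([date, date, label])
    return intervals

wbc = [6.1, 3.2, 2.8, 2.9, 5.5]              # hypothetical white-cell counts
days = ["day 1", "day 2", "day 3", "day 4", "day 5"]
for start, end, label in abstract_intervals(wbc, days, low=4.0, high=11.0):
    span = f"on {start}" if start == end else f"from {start} through {end}"
    print(f"WBC was {label} {span}.")
```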
Article
For pt. 1 see ibid., vol. 21, no. 4, p. 70-73 (2006). In this paper, we have laid out a theory of sensemaking that might be useful for intelligent systems applications. It's a general, empirically grounded account of sensemaking that goes significantly beyond the myths and puts forward some nonobvious, testable hypotheses about the process. When people try to make sense of events, they begin with some perspective, viewpoint, or framework - however minimal. For now, let's use a metaphor and call this a frame. We can express frames in various meaningful forms, including stories, maps, organizational diagrams, or scripts, and can use them in subsequent and parallel processes. Even though frames define what counts as data, they themselves actually shape the data. Furthermore, frames change as we acquire data. In other words, this is a two-way street: frames shape and define the relevant data, and data mandate that frames change in nontrivial ways. We examine five areas of empirical findings: causal reasoning, commitment to hypotheses, feedback and learning, sensemaking as a skill, and confirmation bias. In each area the Data/Frame model, and the research it's based on, doesn't align with common beliefs. For that reason, the Data/Frame model cannot be considered a depiction of commonsense views.
Chapter
We present a prediction model to detect delayed graduation cases based on student network analysis. In the U.S., only 60% of undergraduate students finish their bachelor's degrees within 6 years [1]. We present a range of features based on student networks and activity records. To our knowledge, our feature design, which includes conventional academic performance features, student network features, and fix-point features, is one of the most comprehensive. We achieved an F1 score of 0.85 and an AUC-ROC of 0.86.
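An illustrative sketch of the general recipe: derive a network feature (degree centrality over a stand-in co-enrollment graph) alongside an academic feature, then cross-validate a classifier. The graph, features, and labels are synthetic; the chapter's actual feature design is not reproduced here.

```python
# Sketch of combining student-network features with academic features for
# delayed-graduation prediction. All data is synthetic and illustrative.
import networkx as nx
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
G = nx.erdos_renyi_graph(n=200, p=0.05, seed=3)  # stand-in co-enrollment graph
centrality = nx.degree_centrality(G)

gpa = rng.uniform(2.0, 4.0, 200)
X = np.column_stack([gpa, [centrality[i] for i in range(200)]])
# Synthetic label loosely tied to both features, for illustration only
delayed = (gpa + 10 * X[:, 1] + rng.normal(0, 0.5, 200) < 3.0).astype(int)

clf = RandomForestClassifier(random_state=0)
print("CV F1:", cross_val_score(clf, X, delayed, cv=5, scoring="f1").round(2))
```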
Article
Humans are increasingly coming into contact with artificial intelligence (AI) and machine learning (ML) systems. Human-centered AI is a perspective on AI and ML holding that algorithms must be designed with awareness that they are part of a larger system consisting of humans. We lay forth an argument that human-centered AI can be broken down into two aspects: (a) AI systems that understand humans from a sociocultural perspective, and (b) AI systems that help humans understand them. We further argue that human-centered AI must also address issues of social responsibility such as fairness, accountability, interpretability, and transparency.
Article
A university education is widely considered essential to social advancement. Ensuring that students pass their courses and graduate on time has thus become an issue of concern. This paper proposes a reduced training vector-based support vector machine (RTV-SVM) capable of predicting at-risk and marginal students. It also removes redundant training vectors to reduce the training time and the number of support vectors. To examine the effectiveness of the proposed RTV-SVM, 32,593 university students on seven courses were chosen for performance evaluation. Analysis reveals that the RTV-SVM achieved a training vector reduction of at least 59.7% without altering the margin or accuracy of the classifier. Moreover, the results showed the proposed method to be capable of achieving an overall accuracy of 92.2-93.8% and 91.3-93.5% in predicting at-risk and marginal students, respectively.
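This excerpt does not give the RTV-SVM reduction rule, so the sketch below substitutes a plainly labeled stand-in: keep only the training vectors nearest a preliminary linear decision boundary, then fit the final SVM on the reduced set. It illustrates the reduce-then-train pattern, not the paper's algorithm.

```python
# Illustrative training-vector reduction before SVM fitting. This is NOT the
# RTV-SVM rule from the paper (not detailed in this excerpt); as a stand-in,
# we keep the vectors closest to a preliminary linear decision boundary.
import numpy as np
from sklearn.svm import SVC, LinearSVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

pre = LinearSVC(dual=False).fit(X, y)          # cheap preliminary fit
margin = np.abs(pre.decision_function(X))
keep = margin < np.quantile(margin, 0.4)       # retain 40% nearest the boundary

full = SVC().fit(X, y)
reduced = SVC().fit(X[keep], y[keep])
print(f"kept {keep.sum()} of {len(X)} vectors")
print("accuracy full/reduced:", full.score(X, y), reduced.score(X, y))
```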
Article
While learning analytics (LA) practices have been shown to be practical and effective, most of them require a huge amount of data and effort. This paper reports a case study which demonstrates the feasibility of practising LA at a low cost for instructors to identify at-risk students in an undergraduate business quantitative methods course. Instead of using tracking data from a learning management system as predictive variables, this study utilised clicker responses as formative assessments, together with student demographic data and summative assessments. This LA practice makes use of free cloud services, Google Forms and Google Sheets in particular, for collecting and analysing clicker data. Despite the small dataset used, the LA implementation was effective in identifying at-risk students at an early stage. A systematic proactive advising approach is proposed as an intervention strategy based on students' at-risk probability estimated by a prediction model. The results show that the intervention success rate increases with the number of interventions, and that intervention effects on peer groups are far more successful than on individual students. Overall, the pass rate of students in the study was 7% higher than that for the whole course. Practical recommendations and concerns about using linear regression and logistic regression for classification are also discussed.
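A minimal sketch of this low-cost pipeline: logistic regression over clicker accuracy plus one prior-performance feature, ranking students by estimated at-risk probability for proactive advising. Variable names and data are invented; the study's actual predictors are only summarized above.

```python
# Sketch of a low-cost at-risk model from clicker accuracy and prior GPA.
# All columns and values are hypothetical, for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n = 120
clicker_accuracy = rng.uniform(0.2, 1.0, n)   # share of correct clicker answers
prior_gpa = rng.uniform(2.0, 4.0, n)
X = np.column_stack([clicker_accuracy, prior_gpa])
at_risk = (clicker_accuracy + 0.3 * prior_gpa
           + rng.normal(0, 0.2, n) < 1.4).astype(int)

model = LogisticRegression().fit(X, at_risk)
p = model.predict_proba(X)[:, 1]
flagged = np.argsort(p)[::-1][:10]            # top 10 for proactive advising
print("students to contact first:", flagged)
```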
Article
Knowledge-Based Report Generation is a technique for automatically generating natural language reports from computer databases. It is so named because it applies knowledge-based expert systems software to the problem of text generation. The first application of the technique, a system for generating natural language stock reports from a daily stock quotes database, is partially implemented. Three fundamental principles of the technique are its use of domain-specific semantic and linguistic knowledge, its use of macro-level semantic and linguistic constructs (such as whole messages, a phrasal lexicon, and a sentence-combining grammar), and its production system approach to knowledge representation.
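A toy illustration of the phrasal-lexicon idea described above: whole phrases are selected by market conditions and combined into a report sentence. The phrases and thresholds are invented; the original system is far richer.

```python
# Toy knowledge-based report generation with a phrasal lexicon: pick whole
# phrases by condition, then combine clauses into one sentence.
PHRASES = {
    "big_gain": "surged",
    "gain":     "edged higher",
    "flat":     "was little changed",
    "loss":     "slipped",
    "big_loss": "tumbled",
}

def classify(change_pct: float) -> str:
    if change_pct > 2.0:  return "big_gain"
    if change_pct > 0.3:  return "gain"
    if change_pct > -0.3: return "flat"
    if change_pct > -2.0: return "loss"
    return "big_loss"

def report(quotes: dict) -> str:
    clauses = [f"{name} {PHRASES[classify(pct)]} ({pct:+.1f}%)"
               for name, pct in quotes.items()]
    return "In today's trading, " + ", while ".join(clauses) + "."

print(report({"ACME": 2.7, "Globex": -0.1}))
```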
Article
This article introduces learning analytics dashboards that visualize learning traces for learners and teachers. We present a conceptual framework that helps to analyze learning analytics applications for these kinds of users. We then present our own work in this area and compare it with 15 related dashboard applications for learning. Most evaluations address only part of our conceptual framework and do not assess whether dashboards contribute to behavior change or new understanding, probably also because such assessment requires longitudinal studies.
Article
Natural language generation technology is mature enough for implementing an NLG system in a commercial environment, but the circumstances differ significantly from building a research system. This paper describes the challenges and rewards of building a commercial NLG component for an electronic medical records system. While the resulting NLG system has been successfully completed, the path to that success could have been somewhat smoother knowing the issues in advance.
Conference Paper
Much numerical information is visualized in graphs. However, this is a medium that is problematic for people with visual impairments. We have developed a system called iGraph which provides short verbal descriptions of the information usually depicted in graphs. This system was used as a preliminary solution that was validated through a process of User Needs Analysis (UNA). This process provided some basic data on the needs of people with visual impairments in terms of the components and the language to be used for graph comprehension and also validated our initial approach. The UNA provided important directions for the further development of iGraph particularly in terms of interactive querying of graphs.
Article
Educational data mining is an emerging discipline, concerned with developing methods for exploring the unique types of data that come from the educational context. This work is a survey of the specific application of data mining in learning management systems and a case study tutorial with the Moodle system. Our objective is to introduce it both theoretically and practically to all users interested in this new research area, and in particular to online instructors and e-learning administrators. We describe the full process for mining e-learning data step by step as well as how to apply the main data mining techniques used, such as statistics, visualization, classification, clustering and association rule mining of Moodle data. We have used free data mining tools so that any user can immediately begin to apply data mining without having to purchase a commercial tool or program a specific personalized tool.
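As one concrete instance of the classification step this survey describes, the sketch below fits a small decision tree to a stand-in for an exported Moodle activity summary and prints rules an instructor can read. Column names and values are hypothetical.

```python
# Sketch of classifying pass/fail from aggregated Moodle activity.
# The DataFrame stands in for a hypothetical exported activity summary.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

df = pd.DataFrame({
    "forum_posts":   [0, 5, 2, 9, 1, 7, 3, 0],
    "quiz_attempts": [1, 4, 2, 5, 1, 6, 3, 0],
    "passed":        [0, 1, 0, 1, 0, 1, 1, 0],
})

X, y = df[["forum_posts", "quiz_attempts"]], df["passed"]
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))  # human-readable rules
```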
Article
We present the global k-means algorithm which is an incremental approach to clustering that dynamically adds one cluster center at a time through a deterministic global search procedure consisting of N (with N being the size of the data set) executions of the k-means algorithm from suitable initial positions. We also propose modifications of the method to reduce the computational load without significantly affecting solution quality. The proposed clustering methods are tested on well-known data sets and they compare favorably to the k-means algorithm with random restarts.
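The procedure described translates almost directly into code. Below is a compact, unoptimized sketch of the global k-means idea; the paper's cheaper variants are omitted.

```python
# Sketch of global k-means: grow clusters one at a time, trying every data
# point as the new center's initial position and keeping the best run.
# Faithful to the described procedure, but unoptimized.
import numpy as np
from sklearn.cluster import KMeans

def global_kmeans(X, k):
    centers = X.mean(axis=0, keepdims=True)       # optimal 1-cluster solution
    for j in range(2, k + 1):
        best_inertia, best_centers = np.inf, None
        for x in X:                                # N restarts per stage
            init = np.vstack([centers, x])
            km = KMeans(n_clusters=j, init=init, n_init=1).fit(X)
            if km.inertia_ < best_inertia:
                best_inertia, best_centers = km.inertia_, km.cluster_centers_
        centers = best_centers
    return centers

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
print(global_kmeans(X, k=2).round(2))
```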
Article
This paper describes a system which incorporates natural language technologies, database manipulation and educational theories in order to offer learners a Negotiated Learner Model, for integration into an Intelligent Tutoring System. The system presents the learner with their learner model, offering them the opportunity to compare their own beliefs regarding their capabilities with those inferred by the system. A conversational agent, or “chatbot” has been developed to allow the learner to negotiate over the representations held about them using natural language. The system aims to support the metacognitive goals of self-assessment and reflection, which are increasingly seen as key to learning and are being incorporated into UK educational policy. The paper describes the design of the system, and reports a user trial, in which the chatbot was found to support users in increasing the accuracy of their self-assessments, and in reducing the number of discrepancies between system and user beliefs in the learner model. Some lessons learned in the development have been highlighted and future research and experimentation directions are outlined.
Article
STOP is a Natural Language Generation (NLG) system that generates short tailored smoking cessation letters, based on responses to a four-page smoking questionnaire. A clinical trial with 2553 smokers showed that STOP was not effective; that is, recipients of a non-tailored letter were as likely to stop smoking as recipients of a tailored letter. In this paper we describe the STOP system and clinical trial. Although it is rare for AI papers to present negative results, we believe that useful lessons can be learned from STOP. We also believe that the AI community as a whole could benefit from considering the issue of how, when, and why negative results should be reported; certainly a major difference between AI and more established fields such as medicine is that very few AI papers report negative results.
Article
Improved numerical weather prediction simulations have led weather services to examine how and where human forecasters add value to forecast production. The Forecast Production Assistant (FPA) was developed with that in mind. The authors discuss the Forecast Generator (FOG), the first application developed on the FPA. FOG is a bilingual report generator that produces routine and special-purpose forecasts directly from the FPA's graphical weather predictions. Using rules and a natural-language generator, FOG converts weather maps into forecast text. The natural-language issues involved are relevant to anyone designing a similar system.
Article
Numerical weather prediction (NWP) models produce time series data of basic weather parameters which human forecasters use as guidance while writing textual forecasts. Our studies of humans writing textual weather forecasts led us to build SUMTIME-MOUSAM, a text generator that produces textual marine weather forecasts for offshore oilrig applications.
Article
During the past few years we have been concerned with developing models for the automatic planning and realization of report texts within technical sublanguages of English and French. Since 1987 we have been implementing Meaning-Text language models (MTMs) [6, 7] for the task of realizing sentences from semantic specifications that are output by a text planner. A relatively complete MTM implementation for English was tested in the domain of operating system audit summaries in the Gossip project of 1987-89 [3]. At COLING-90 a report was given on the fully operational FoG system for generating marine forecasts in both English and French at weather centres in Eastern Canada [1]. The work reported on here concerns the experimental generation of extended bilingual summaries of Canadian statistical data. Our first focus has been on labour force surveys (LFS), where an extensive corpus of published reports in each language is available for empirical study. The current LFS
Murphy, R.F.: Artificial intelligence applications to support K-12 teachers and teaching. RAND Corporation (2019). https://doi.org/10.7249/PE315
Van Harmelen, M., Workman, D.: Analytics for learning and teaching. CETIS Anal. Ser. 1(3), 1-40 (2012)
Chui, K.T., Fung, D.C.L., Lytras, M.D., Lam, T.M.: Predicting at-risk university students in a virtual learning environment via a machine learning algorithm.
Gašević, D., Kovanović, V., Joksimović, S.: Piecing the learning analytics puzzle: a consolidated model of a field of research and practice. Learn.: Res. Pract. 3(1), 63-78 (2017)