Article

Integrating Data Science and Machine Learning to Chemistry Education: Predicting Classification and Boiling Point of Compounds

To read the full-text of this research, you can request a copy directly from the authors.

No full-text available

... In contrast to traditional lecture-based teaching, AI and ML technologies provide interactive and engaging learning experiences that can help students build their knowledge more effectively [50]. The utilization of AI and ML in chemistry education revolutionizes the way students learn, process, and apply chemistry knowledge in a variety of contexts [26]. Through the use of these technologies, students can access a personalized learning environment that adapts to their learning style and pace and provides real-time feedback and assistance [11]. ...
... Daher et al. [15] illustrated that the use of AI (ChatGPT) significantly improves students' problem-solving skills (p < 0.01), particularly in an introduction to materials science. In addition, Kim et al. [26] explained that the use of machine learning improves students' understanding of the boiling points of organic compounds such as hydrocarbons, alcohols, and amines. Furthermore, Iyamuremye and Ndihokubwayo [22] showed that the use of ChatGPT increased students' performance in chemical bonding and atomic structure by 16.6%. ...
Article
Full-text available
The current study aimed to critically review the existing literature on the utilization of artificial intelligence (AI) and machine learning (ML) in teaching and learning chemistry. A comprehensive critical literature review was conducted using electronic databases such as Scopus, PubMed, ISI, Google Scholar, ERIC, Web of Science, and JSTOR, from which 62 articles were extracted. Inclusion and exclusion criteria were applied during the selection of the literature. The inclusion criteria covered empirical and theoretical studies examining the effectiveness, challenges, and opportunities of AI/ML that were published between 2018 and 2024 and written in English. The exclusion criteria covered literature that was unrelated to education, lacked empirical evidence, or was not peer-reviewed, as well as non-English publications and those published before 2018. This was done to gain insights into the current implementation status of AI and ML as well as the critical issues in using these approaches in chemistry education. The review involved a critical analysis of the themes and concepts that emerged from the selected literature and identified the opportunities and challenges surrounding the utilization of these technologies. The results revealed opportunities for integrating AI and ML into chemistry education, including personalized learning experiences, teacher assistance, and accessibility of learning materials. In this regard, intelligent tutoring systems and adaptive learning platforms were identified as potential aids for teachers in various aspects of teaching. The study also revealed limitations and challenges surrounding AI and ML, such as dependence on preexisting data, potential biases in models, and concerns around data privacy and security.
Moreover, the findings indicated that the implementation of AI and ML in chemistry education is still at an early stage. Thus, teacher training programs are needed to equip teachers with the skills to use these technologies effectively in the classroom. In addition, more effort should be made to facilitate research, collaboration, and the development of policies and regulations that ensure the responsible use of these technologies in teaching and learning.
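The selection criteria described in this abstract amount to a simple record filter. The following is a minimal sketch, assuming each retrieved record carries hypothetical `year`, `language`, `peer_reviewed`, and `education_related` fields (the review does not describe its record format or tooling). The "lacking empirical evidence" criterion is left out because the inclusion criteria also admit theoretical studies.

```python
from dataclasses import dataclass

@dataclass
class Record:
    # Hypothetical record fields; the review does not describe its data format.
    title: str
    year: int
    language: str
    peer_reviewed: bool
    education_related: bool

def meets_criteria(rec: Record) -> bool:
    """True if a record satisfies the stated year, language, peer-review,
    and education-relevance criteria."""
    return (
        2018 <= rec.year <= 2024
        and rec.language.lower() == "english"
        and rec.peer_reviewed
        and rec.education_related
    )

def screen(records):
    """Split records into (included, excluded) lists."""
    included = [r for r in records if meets_criteria(r)]
    excluded = [r for r in records if not meets_criteria(r)]
    return included, excluded

records = [
    Record("Study A", 2020, "English", True, True),
    Record("Study B", 2016, "English", True, True),   # published before 2018
    Record("Study C", 2022, "German", True, True),    # non-English
    Record("Study D", 2023, "English", False, True),  # not peer-reviewed
]
included, excluded = screen(records)
print([r.title for r in included])  # prints: ['Study A']
```

Writing the exclusion set as the complement of the inclusion test keeps the two lists consistent by construction.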
... For example, Amtul et al. (2024) integrated bioinformatics and DS into a graduate-level course using a modular, inquiry-based structure to engage students in analyzing biomolecular data. Likewise, Kim et al. (2024) introduced a no-code/low-code platform-based activity that allowed students to apply machine learning in chemistry by classifying compounds and estimating boiling points. ...
... For instance, Ref. [10] points out the importance of well-designed educational datasets tailored to different school levels in AI and data science education, highlighting how step-by-step experiences in data science processes can enhance learning outcomes. Likewise, Ref. [11] found that integrating data science and machine learning into scientific inquiry enables students to apply AI technologies to domain-specific problems, fostering practical problem-solving skills. ...
Article
Full-text available
Computer vision education is increasingly important in modern technology curricula, yet it often lacks a systematic approach that integrates both theoretical concepts and practical applications. This study proposes a four-stage framework for computer vision education designed to progressively build learners' competencies, introducing concepts from basic image recognition through advanced video analysis. Validity assessments were conducted twice with 25 experts in AI education and curricula, and the results indicated high validity of the staged framework. Additionally, a pilot program applying computer vision to acid–base titration activities was implemented with 40 upper secondary school students to evaluate the effectiveness of the framework. The pilot program showed significant improvements in students' understanding of and interest in both computer vision and scientific inquiry. This research contributes to the field of AI education by offering a structured, adaptable approach to computer vision education that integrates AI, data science, and computational thinking. It gives educators a guide for implementing progressive, hands-on learning experiences in computer vision, while also highlighting areas for future research and improvement in educational methodologies.
Article
Laser-Induced Breakdown Spectroscopy (LIBS) combined with Artificial Intelligence (AI) offers a powerful method for analyzing and comparing spectral data. This study presents a comparative analysis of conventional and AI-developed methods for processing and interpreting LIBS data, particularly in forensic applications, focusing on toner sample discrimination. We propose a novel AI-developed approach that combines normalization, interpolation, and peak detection techniques to simplify LIBS spectral analysis without user preprocessing and to identify unique spectral features easily. This method was compared with principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA), the conventional methods commonly used for LIBS data analysis. The AI-developed method demonstrated superior performance in discriminating between toner samples from various brands and models of printers and photocopiers. Its performance was evaluated quantitatively using statistical analysis, including accuracy difference percentage, component-wise variance analysis, a paired t-test, and a cross-validation test. The results confirmed a significant improvement in accuracy with the AI-developed method compared with conventional approaches. This work highlights the potential of AI to enhance spectroscopic analysis for forensic applications, offering increased efficiency and accuracy in sample discrimination and classification while accelerating LIBS data analysis without the need for user preprocessing.
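The three processing steps named in this abstract (normalization, interpolation, and peak detection) can be illustrated with a small sketch. The abstract does not specify the algorithms, so this assumes max normalization, linear interpolation onto a shared wavelength grid, and a simple local-maximum rule with an intensity threshold. The function names, threshold, and example spectrum are hypothetical, and the sketch does not reproduce the authors' AI-developed method or the PCA/PLS-DA baselines.

```python
from bisect import bisect_left

def normalize(intensities):
    """Scale intensities so the strongest line equals 1.0 (max normalization)."""
    peak = max(intensities)
    return [v / peak for v in intensities]

def interpolate(wavelengths, intensities, grid):
    """Linearly interpolate a spectrum (sorted wavelengths) onto a common grid.
    Grid points outside the measured range are clamped to the edge values."""
    out = []
    for x in grid:
        if x <= wavelengths[0]:
            out.append(intensities[0])
        elif x >= wavelengths[-1]:
            out.append(intensities[-1])
        else:
            i = bisect_left(wavelengths, x)  # wavelengths[i-1] < x <= wavelengths[i]
            x0, x1 = wavelengths[i - 1], wavelengths[i]
            y0, y1 = intensities[i - 1], intensities[i]
            out.append(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
    return out

def find_peaks(values, threshold):
    """Return indices of interior local maxima whose value is >= threshold."""
    return [
        i for i in range(1, len(values) - 1)
        if values[i] >= threshold
        and values[i] > values[i - 1]
        and values[i] >= values[i + 1]
    ]

wl = [400.0, 401.0, 402.0, 403.0, 404.0]      # hypothetical wavelengths (nm)
counts = [10.0, 50.0, 20.0, 80.0, 15.0]       # hypothetical intensities
grid = [400.0 + 0.5 * k for k in range(9)]    # shared 0.5 nm grid
spectrum = interpolate(wl, normalize(counts), grid)
print(find_peaks(spectrum, threshold=0.5))    # prints: [2, 6]  (401.0 and 403.0 nm)
```

Putting every spectrum on the same grid is what makes samples directly comparable point by point; a real LIBS pipeline would also need baseline correction and more robust peak criteria.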
Article
Full-text available
As artificial intelligence (AI) and data science education gain importance in K-12 curricula, there is a growing need for well-designed, sustainable educational datasets tailored to different school levels. Sustainable datasets should be reusable, adaptable, and accessible to support long-term AI and data science education goals. However, research on the systematic categorization of difficulty levels in educational datasets is limited. This study addresses this gap by developing a framework for sustainable educational dataset standards based on learners' developmental stages and data preprocessing requirements. The proposed framework consists of five levels: Level 1 (grades 1–4), where data preprocessing is unnecessary; Level 2 (grades 5–6), involving basic data cleaning; Level 3 (grades 7–9), requiring attribute manipulation; Level 4 (grades 10–12), involving feature merging and advanced preprocessing; and Level 5 (teachers/adults), requiring the entire data science process. An expert validity survey was conducted with 22 elementary and secondary school teachers holding advanced degrees in AI education. The results showed high validity for Levels 1–4 but relatively lower validity for Level 5, suggesting the need for separate training and resources for teachers. Based on the content validity ratio (CVR) results and expert feedback, the educational dataset standards were revised, particularly for Level 5, which targets teachers and adult learners. The findings highlight the importance of expert validation, step-by-step experiences, and an interdisciplinary approach in developing educational datasets. This study contributes to the theoretical understanding of educational datasets and provides practical implications for teachers, students, educational institutions, and policymakers implementing effective and sustainable AI and data science education in K-12 settings, ultimately fostering a more sustainable future.
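The CVR used in expert validity surveys is conventionally Lawshe's content validity ratio, computed per item as CVR = (n_e − N/2) / (N/2), where n_e is the number of experts who rate the item "essential" and N is the panel size. The abstract does not give the rating scale or per-item counts, so this sketch assumes Lawshe's formula. The counts in the example are hypothetical for a 22-member panel, not the study's results.

```python
def content_validity_ratio(n_essential: int, n_experts: int) -> float:
    """Lawshe's content validity ratio: (n_e - N/2) / (N/2).

    Ranges from -1 (no expert rates the item essential) to +1 (all do).
    """
    if not 0 <= n_essential <= n_experts or n_experts == 0:
        raise ValueError("need 0 <= n_essential <= n_experts and n_experts > 0")
    half = n_experts / 2
    return (n_essential - half) / half

# Hypothetical "essential" counts for a 22-expert panel (not the study's data).
for n_e in (22, 18, 11):
    print(n_e, round(content_validity_ratio(n_e, 22), 3))
```

The ratio is zero when exactly half the panel rates an item essential, which is why it is usually compared against a panel-size-dependent critical value before an item is retained.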
Article
Full-text available
The current work develops intelligent tutoring aspects for the DiscoverOChem learning platform. Intelligent tutoring systems are technology-based learning systems that can adapt the learning experience to better serve individual users. DiscoverOChem (www.discoverochem.com) is a free Internet-based platform for learning undergraduate-level organic chemistry. Data from students in previous years were used to analyze how well individual students performed on various pages of the platform, and correlations between pairs of pages were analyzed. Predictive models, which use a user's results on previous pages to predict that user's likely performance on upcoming pages, were developed and evaluated. The most successful set of models, which uses random forests of one-branch decision trees, was incorporated into the DiscoverOChem platform as a recommender system. This system helps individual users identify pages that are likely to challenge them and provides targeted recommendations about which previous pages to review so that they are better prepared to succeed on the upcoming page. We anticipate that learners will benefit from this individualization of their learning experiences, and that the general six-step framework used to develop this system will be broadly useful for creating intelligent learning platforms for other subjects as well.
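A "random forest of one-branch decision trees" is an ensemble of decision stumps, each making a single threshold split on one feature, with a majority vote across stumps. The sketch below assumes, hypothetically, that the features are a student's fractional scores on earlier pages and that the label marks whether the student later struggled on the upcoming page; it is plain Python written for illustration, since the abstract does not describe the platform's actual code. Each stump predicts "struggle" when the chosen score is at or below its threshold, a simplification that suits scores where lower means weaker preparation.

```python
import random
from collections import Counter

def fit_stump(X, y, features):
    """Pick the (feature, threshold) split with the best training accuracy.

    The stump predicts 1 ("likely to struggle") when row[feature] <= threshold.
    """
    best = None  # (accuracy, feature, threshold)
    for f in features:
        for t in sorted({row[f] for row in X}):
            correct = sum((row[f] <= t) == bool(label) for row, label in zip(X, y))
            acc = correct / len(y)
            if best is None or acc > best[0]:
                best = (acc, f, t)
    return best[1], best[2]

def fit_forest(X, y, n_trees=25, n_features=None, seed=0):
    """Random forest of stumps: each tree sees a bootstrap sample of rows and a
    random subset of features (for a stump, per-tree equals per-split sampling)."""
    rng = random.Random(seed)
    n_cols = len(X[0])
    k = n_features or max(1, int(n_cols ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        feats = rng.sample(range(n_cols), k)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Majority vote of the stumps (an odd n_trees avoids ties)."""
    votes = Counter(int(row[f] <= t) for f, t in forest)
    return votes.most_common(1)[0][0]

# Synthetic data (not DiscoverOChem data): fractional scores on three earlier
# pages; label 1 means the student later struggled on the upcoming page.
X = [[0.2, 0.9, 0.5], [0.3, 0.4, 0.8], [0.4, 0.7, 0.6], [0.8, 0.3, 0.4],
     [0.9, 0.8, 0.2], [0.7, 0.5, 0.9], [0.1, 0.6, 0.3], [0.85, 0.2, 0.7]]
y = [1, 1, 1, 0, 0, 0, 1, 0]
forest = fit_forest(X, y)
print(predict(forest, [0.05, 0.05, 0.05]), predict(forest, [0.95, 0.95, 0.95]))  # prints: 1 0
```

Because each stump splits on a single earlier page, the pages that appear most often in the forest are one plausible signal for which pages to recommend for review, though the abstract does not say how the recommender derives its suggestions.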
Article
Full-text available
Algorithm development has evolved from machine code to low-code/no-code (LCNC) platforms over the past 20 years. Observing the growth of LCNC-based algorithm development, the CEO of GitHub remarked that the future of coding is no coding at all. This paper systematically reviews recent studies that use mainstream LCNC platforms to understand their research areas, the LCNC platforms they used, and the LCNC features applied to individual research questions. We identified 23 research works using LCNC platforms, such as SetXRM, the vf-OS platform, Aure-BPM, CRISP-DM, and Microsoft Power Platform (MPP). About 61% of these studies chose MPP as their primary platform. The research problems addressed fell within global news analysis, social media analysis, landslides, tornadoes, COVID-19, process digitization, manufacturing, logistics, and software/app development. The main reasons for solving research problems with LCNC algorithms were (1) obtaining research data from multiple sources with complete automation and (2) generating artificial intelligence-driven insights without having to code them manually. In addition, this paper demonstrates a practical approach to implementing a cyber-attack monitoring algorithm with the most popular LCNC platform.
Article
Full-text available
Education is one of the sectors that improve the future of societies; unfortunately, the coronavirus disease 2019 (COVID-19) pandemic has caused a variety of problems that directly affect learning. Universities have had to transition toward remote or online educational models, and the only way to guarantee the continuity of classes has been through information and communication technologies. This transition relies on technological platforms that allow interaction and the delivery of classes through synchronous sessions, which has made it possible to continue both administrative and academic activities. However, effective education depends on factors that create an environment in which knowledge can be generated. The move from traditional to remote models has disrupted this environment, significantly affecting student learning. Identifying the factors that influence academic performance has therefore become a priority for universities. This work proposes the use of intelligent techniques to identify the factors that affect learning and to support effective decision-making that improves the educational model.
Technical Report
Full-text available
Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to perform a specific task without being explicitly programmed. Learning algorithms underlie many applications that we use daily. Every time a web search engine such as Google is used to search the internet, one reason it works so well is that a learning algorithm has learned how to rank web pages. These algorithms are used for purposes such as data mining, image processing, and predictive analytics. The main advantage of machine learning is that, once an algorithm learns what to do with data, it can do its work automatically. This paper briefly reviews the wide range of applications of machine learning algorithms and their future prospects.
Article
Full-text available
The past few years have seen a considerable rise in interest in artificial intelligence and machine learning applications in radiology. However, for such systems to perform adequately, large amounts of training data are required. These data should ideally be standardised and of adequate quality to allow for further use in training artificial intelligence algorithms. Unfortunately, in many current clinical and radiological information technology ecosystems, access to relevant pieces of information is difficult, mostly because a significant portion of information is handled as a collection of narrative texts and interoperability is still lacking. This review aims to give a brief overview of how structured reporting can help to facilitate research in artificial intelligence and in the context of big data.
Article
Artificial intelligence (AI) is receiving a lot of attention in various fields, and there is a need to educate students in AI technology. In this context, we have created a new type of carbon dioxide fountain by integrating AI with the well-established Arduino–carbon dioxide fountain experiment. The experiment consists of an Arduino microcontroller, a pressure sensor, a solenoid valve, and AI speech recognition, and demonstrates a carbon dioxide fountain controlled by speech recognition. In previous work describing the Arduino–carbon dioxide fountain experiment, the solenoid valve opens automatically when the measured pressure falls below a set value. In this experiment, by contrast, AI speech recognition is added so that the experimenter can open (or close) the solenoid valve by voice. In particular, this experiment introduces a scientific inquiry activity integrating AI technology that even beginners can easily apply.
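The difference between the earlier automatic design and the voice-controlled version is a small change in the valve decision logic. The following sketch is hypothetical: the function name, the "open"/"close" command words, and the kPa units are assumptions, the speech recognizer is represented only by its text output, and on the actual device this logic would run as Arduino firmware rather than Python.

```python
from typing import Optional

def next_valve_state(is_open: bool, pressure_kpa: float, threshold_kpa: float,
                     voice_command: Optional[str] = None,
                     use_voice: bool = True) -> bool:
    """Return True if the solenoid valve should be open.

    use_voice=False mimics the earlier automatic design: open whenever the
    measured pressure is below the set value. use_voice=True lets a recognized
    spoken word ("open"/"close", hypothetical keywords) decide instead.
    """
    if not use_voice:
        return pressure_kpa < threshold_kpa
    if voice_command is None:
        return is_open                  # no new command: keep the current state
    word = voice_command.strip().lower()
    if word == "open":
        return True
    if word == "close":
        return False
    return is_open                      # unrecognized word: ignore it

print(next_valve_state(False, 95.0, 100.0, use_voice=False))        # True
print(next_valve_state(False, 101.3, 100.0, voice_command="Open"))  # True
print(next_valve_state(True, 101.3, 100.0, voice_command="close"))  # False
```

Ignoring unrecognized words keeps the valve in its last state, which is a conservative default when speech recognition misfires.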
Article
Recent advances in computer hardware and algorithms are spawning an explosive growth in the use of computer-based systems aimed at analyzing and ultimately correlating large amounts of experimental and synthetic data. As these machine learning tools become more widespread, it is becoming imperative that scientists and researchers become familiar with them, both in terms of understanding the tools and the current limitations of artificial intelligence, and more importantly being able to critically separate the hype from the real potential. This article presents a classroom exercise aimed at first-year science and engineering college students, where a task is set to produce a correlation to predict the normal boiling point of organic compounds from an unabridged data set of >6000 compounds. The exercise, which is fully documented in terms of the problem statement and the solution, guides the students to initially perform a linear correlation of the boiling point data with a plausible relevant variable (the molecular weight) and to further refine it using multivariate linear fitting employing a second descriptor (the acentric factor). Finally, the data are processed through an artificial neural network to eventually provide an engineering-quality correlation. The problem statements, data files for the development of the exercise, and solutions are provided within a MATLAB environment but are general in nature. © 2019 American Chemical Society and Division of Chemical Education, Inc.
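The first two stages of this exercise (a correlation with molecular weight alone, then a multivariate fit adding the acentric factor) are ordinary linear least squares. A minimal stdlib Python sketch follows (the published materials use MATLAB); it solves the normal equations directly. The data are synthetic, generated from an invented linear relation, so the fit merely recovers that relation and says nothing about real compounds. The neural-network stage is not sketched.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def least_squares(features, targets):
    """Fit targets ~ b0 + b1*f1 + ... via the normal equations (X^T X) b = X^T y."""
    X = [[1.0] + list(row) for row in features]
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * t for r, t in zip(X, targets)) for i in range(k)]
    return solve(XtX, Xty)

def rmse(features, targets, coef):
    """Root-mean-square error of a fitted linear model."""
    errs = [(coef[0] + sum(c * f for c, f in zip(coef[1:], row)) - t) ** 2
            for row, t in zip(features, targets)]
    return (sum(errs) / len(errs)) ** 0.5

# Synthetic data: invented molecular weights (g/mol) and acentric factors, with
# "boiling points" (K) generated from a made-up relation 150 + 1.2*MW + 80*omega.
mw = [30.0, 44.0, 58.0, 72.0, 86.0, 60.0, 74.0]
omega = [0.10, 0.15, 0.20, 0.25, 0.30, 0.56, 0.59]
tb = [150.0 + 1.2 * m + 80.0 * w for m, w in zip(mw, omega)]

one = [[m] for m in mw]                     # stage 1: MW only
two = [[m, w] for m, w in zip(mw, omega)]   # stage 2: MW + acentric factor
coef1, coef2 = least_squares(one, tb), least_squares(two, tb)
print("MW only:   ", [round(c, 3) for c in coef1], "RMSE", round(rmse(one, tb, coef1), 3))
print("MW + omega:", [round(c, 3) for c in coef2], "RMSE", round(rmse(two, tb, coef2), 3))
```

The drop in RMSE from the first to the second fit mirrors the exercise's refinement step. With real data the normal equations can be ill-conditioned, so an SVD-based solver such as `numpy.linalg.lstsq` is the more robust choice.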
Article
The boiling points for 392 organic compounds are tabulated by carbon chain length and functional group to facilitate a wide range of inquiry-based activities that correlate the effects of chemical structure on physical properties. When combined with literature searching or laboratory experimentation, these data provide a flexible resource for instruction across many different aspects of chemical science, including (i) developing empirical models, (ii) verifying empirical models, (iii) conducting statistical data analysis, (iv) using empirical models for insight into fundamental structure–property relationships, and (v) proposing follow-up research investigations to fill in the gaps in scientific knowledge. Keywords (Audience): Second-Year Undergraduate
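Activities (i) and (iv) can be illustrated by fitting a separate straight line of boiling point against carbon count for each functional-group series. In this sketch the slope estimates the boiling-point increment per added carbon, and the difference between intercepts reflects the offset contributed by the functional group. The rows are synthetic values for two hypothetical series (`series_A`, `series_B`), not entries from the 392-compound table, and the tuple layout is an assumption.

```python
from collections import defaultdict

def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def fit_by_group(rows):
    """rows: (series, carbon_count, boiling_point) tuples; one line per series."""
    groups = defaultdict(lambda: ([], []))
    for series, n_carbons, bp in rows:
        groups[series][0].append(n_carbons)
        groups[series][1].append(bp)
    return {s: fit_line(xs, ys) for s, (xs, ys) in groups.items()}

# Synthetic rows for two hypothetical homologous series (not the published table).
rows = ([("series_A", n, 10.0 + 25.0 * n) for n in range(1, 5)]
        + [("series_B", n, 70.0 + 20.0 * n) for n in range(1, 5)])
fits = fit_by_group(rows)
for series, (intercept, slope) in sorted(fits.items()):
    print(f"{series}: intercept {intercept:.1f}, slope {slope:.1f} per carbon")
```

Comparing the fitted lines across series, and checking them against measured values held out of the fit, corresponds to the "developing" and "verifying" empirical-model activities listed in the abstract.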