Figure 3 - uploaded by Hemant Kumar Gianey
Entropy of a decision tree. Source: Entropy of a DT algorithm [22]. Algorithm for Information Gain: i. Calculate the entropy of the target.
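The caption gives only the first step of the information gain algorithm. As a minimal sketch, assuming the standard textbook definitions (Shannon entropy in bits, and information gain as the target entropy minus the weighted entropy after a split), that step and the quantity it feeds into might look like the following. The function names `entropy` and `information_gain` are illustrative and do not come from the source.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    # Sum -p * log2(p) over the observed classes.
    return sum(-(c / n) * log2(c / n) for c in counts.values())


def information_gain(labels, feature_values):
    """Target entropy minus the weighted entropy after splitting on a feature.

    Assumes `labels` and `feature_values` are parallel sequences.
    """
    n = len(labels)
    # Group the target labels by the value the feature takes.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average of the entropy inside each group.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder
```

Under these definitions, a feature that separates the classes perfectly has a gain equal to the target entropy, while a feature that leaves the class mix unchanged in every branch has a gain of zero.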

Citations

... In the supervised learning method, Support Vector Machine (SVM) + Genetic Algorithm (GA) and Support Vector Machine (SVM) + Convolutional Neural Network (CNN) perform best for explicit aspect-based sentiment analysis. For implicit aspect-based sentiment analysis, a hybrid supervised + unsupervised model gives better results with the LSTM and LDA techniques [84][85][86]. Tables 8 and 9 discuss the latest research papers' respective limitations and results. After this literature survey, we found that implicit ABSA is the less-explored area. ...
Article
Full-text available
In the field of Natural Language Processing (NLP), Aspect-Based Sentiment Analysis (ABSA) is currently a trending area of research. Gathering people’s opinions and exchanging information has always been common practice. With the prevalence of internet-connected devices, individuals can easily share their thoughts and real-time updates. As digital data grows, it can be leveraged to analyze people’s sentiments. Over the last decade, extensive research has been dedicated to sentiment analysis. Aspect extraction has become crucial for effectively categorizing sentiments through Sentiment Analysis (SA). This paper comprehensively reviews aspect-based Sentiment Analysis (ABSA) using the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) methodology. The study encompasses various implicit and explicit studies and their respective limitations and findings. Additionally, it delves into describing ABSA and its core tasks while also highlighting future directions such as aspect extraction, implicit aspect extraction, fake & sarcasm detection, opinion spam detection, handling grammatical errors, hidden emotion extraction, implicit language detection, double implicit, and spam & fake reviews. The study observes that ABSA tasks often utilize supervised learning and hybrid techniques. Ultimately, the review aims to inspire innovative researchers by providing a comprehensive overview of the field, benefiting both novices and seasoned researchers by helping them better understand implicit and explicit aspect extraction.
... The main difference between the two techniques is that the label value in regression is numerical, whereas in classification it is categorical (Hemant and Rishabh, 2017). For clarity, the supervised learning procedure is shown in Fig. 3. ...
Article
Full-text available
Machine learning (ML) approaches cover several aspects of daily life tasks, including knowledge representation, data analysis, regression, classification, recognition, clustering, planning, reasoning, text recommendation, and perception. The ML approaches enable applications to learn and adapt with or without being directly programmed from previous data or experience. The ML techniques, coupled with current technologies, provide a range of solutions, from vision-based applications to text-generation applications. To this end, this article presents a comprehensive overview of the approaches of ML, including supervised, unsupervised, semi-supervised, reinforcement, and self-learning. This review critically examines the roles performed by these approaches in terms of their weaknesses and strengths. Furthermore, within this study, a new comparative analysis is conducted by reviewing existing studies and evaluating ML techniques using metrics including data requirement, accuracy, complexity, interpretability, scalability, applications, and challenges. Thereafter, the implemented ML techniques are classified, and their key findings are examined. The comprehensive review demonstrates that neither standalone nor hybrid ML techniques can completely satisfy all of the evaluated metrics, underscoring the necessity of customized solutions based on the requirements of particular applications.
... The categories are distinguished from one another by their malicious-activity detection capability and retention rate. [7,8] • Anomaly-based detection monitors system activity and classifies deviations from usual behaviour. ...
... In recent years, a great deal of research has been done in this area to safeguard network systems. [7,8,10] An IDS keeps an eye on traffic to look for attacks meant to steal data. When machine learning is used, the detection rate is high and the false alarm rate is low. ...
Article
Artificial intelligence has turned out to be a vital part of our ecosystem. Major sectors of the market are considering artificial intelligence for business, and some sectors are already using the technology heavily. Business advances through machine learning technology are far-reaching. Intrusion detection systems must be robust as more platforms around the globe adopt AI technology. The research in this work is based on the NSL-KDD dataset for detection of malicious activity. The dataset is evaluated using the Naive Bayes algorithm. A machine learning technique that can both learn and adapt to previously unknown patterns has been used in an attempt to develop an intrusion detection system. Unsupervised feature learning has made use of various classifications as required to train and test the model. The NSL-KDD dataset is then classified using a logistic classifier. Accuracy, precision, and recall metrics have been used to measure the system's performance, and the findings are quite positive for possible future modifications and uses.
... These decision trees are generated using a randomized tree-building algorithm, which creates several trees by randomly sampling the original training set, allowing for certain items to appear more than once. When RF is used for regression, the final output is the average of the predictions from all decision trees [72]. We implemented this method using the R package "randomForest" (V4.7-1.2) [73]. ...
Article
Full-text available
Germplasm improvement is essential for maize breeding. Currently, intra-heterotic-group crossing is the major method for germplasm improvement, while inter-heterotic-group crossing is also used in breeding but not in a systematic way. In this study, five inbred lines from four heterotic groups were used to develop a connected segregating population through inter-heterotic-group line crossing (CSPIC), which comprised 5 subpopulations with 535 doubled haploid (DH) lines and 15 related test-cross populations including 1568 hybrids. Significant genetic variation was observed in most subpopulations, with several DH populations exhibiting superior phenotypes regarding traits such as plant height (PH), ear height (EH), days to anthesis (DTA), and days to silking (DTS). Notably, 10.8% of hybrids in the population POP5/C229 surpassed the high-yielding hybrid ND678 (CK). To reduce field planting costs and quickly screen for the best inter-heterotic-group DH lines and test-cross hybrids, we assessed the accuracy of genomic selection (GS) for within- and between-population predictions in the DH populations and the test-cross populations. Within the DH or the hybrid population, the prediction accuracy varied across populations and traits, with an average hybrid yield prediction accuracy of 0.41, reaching 0.54 in POP5/Z58. In the cross DH population predictions, the prediction accuracy of the half-sib population exceeded that of the non-sib cross population prediction, with the highest accuracy observed when the non-shared parents were from the same heterotic group, and the average phenotypic prediction accuracies of POP3 predicting POP2 and POP2 predicting POP3 were 0.54 and 0.45, respectively. In the cross hybrid population predictions, the accuracy was highest when both the training and the test sets came from the same DH populations, with an average accuracy of 0.43. 
The proportion of shared polymorphisms with respect to SNPs between the training and the test sets (PSP) exhibited a significant and strong correlation with the prediction accuracy of cross population prediction. This study demonstrates the feasibility of creating new heterotic groups through inter-heterotic-group crossing in germplasm improvement, and some cross population prediction patterns exhibited excellent prediction accuracy.
... They are widely used in fields such as speech and image recognition. [4,17] In large datasets, Apriori methods find common item sets and association rules. A Markov decision process is a mathematical model for decision making that incorporates both randomness and controllable factors. ...
Article
Introduction: Machine learning has emerged as a powerful tool for data analysis, enabling systems to identify patterns and make predictions without explicit programming. Among its various approaches, unsupervised learning plays a crucial role in discovering hidden structures within data, especially in scenarios where labeled examples are scarce or costly to obtain. This study provides a comprehensive analysis of unsupervised learning techniques, with a particular focus on clustering and reinforcement learning. Research significance: This study provides an in-depth exploration of unsupervised learning techniques, emphasizing their ability to identify patterns in data without the need for labeled training examples. This is particularly significant in domains where acquiring labeled data is costly or impractical. By highlighting the role of reinforcement learning in unsupervised systems, the research advances the understanding of how agents improve behavior through rewards and penalties, which has implications for robotics and strategic game-playing applications. Methodology: Other options include K-Nearest Neighbors (KNN), Neural Networks, Support Vector Machines (SVM), and Decision Trees. Assessment Criteria: Memory Usage, Accuracy, Training Speed, and Error Rate. Result: According to the results, K-Nearest Neighbors (KNN) had the lowest quality, while neural networks had the highest quality. Conclusion: According to the GRA approach, neural networks are the most valuable datasets for machine learning algorithms. Key words: Unsupervised Learning, Reinforcement Learning, Clustering, Generalization & Overfitting, Decision Trees & Random Forests, Neural Networks & Deep Learning, Ensemble Methods, and Medical Imaging & Cybersecurity.
... The main difference between the two techniques is that the label value in regression is numerical, whereas in classification it is categorical (Hemant & Rishabh, 2017). For clarity, the supervised learning procedure is shown in Fig. 3. ...
... For instance, when a lot of factors like nutrition, physical activity, sleep and stress levels enter some systems. To overcome the limitations of PID control and enable more precise blood sugar regulation, researchers are increasingly using Machine Learning (ML) applications. ML is a branch of Artificial Intelligence (AI) that enables computers to learn from data patterns and make predictions or decisions (Choudhary & Gianey, 2017). ...
Conference Paper
Full-text available
Diabetes management is vital for patients with Type 1 Diabetes (T1D). The main goal is to minimize the patient's risk of hypoglycemia or hyperglycemia by keeping blood sugar levels within a certain range. In this study, the OHIOT1DM dataset consisting of data from 12 T1D patients was used in the Artificial Intelligence (AI) based Closed Loop (CL) PID control system. A CL Artificial Pancreas (AP) system was developed using the eXogenous input Nonlinear Autoregressive (NLARX) model integrated with a Wavelet Neural Network (WNN) to model the complex relationship between basal insulin input and glucose. Based on the relationship between glucose and insulin levels, the system is designed to bring the glucose level range in individuals with T1D to the glucose level range of healthy individuals. The design of CLAP models with PID controllers was done in MATLAB R2024b Simulink environment. In this way, the accuracy and reliability of the model were supported by detailed simulations and the performance of the system was visualized and analyzed with the help of graphics. Model performance was evaluated using metrics such as fit to the estimation data (Fit), final prediction error (FPE), and mean square error (MSE). The results were found to be 99.08, 4.97 and 4.91 for Fit, FPE and MSE for 12 T1D patients, respectively. The results show that the CLAP system design, which will increase the quality of life of T1D patients, has positive results. In addition, the CLAP system designed based on NLARX with WNN will contribute positively to the AP technology in terms of automating the sensors and glucose monitoring devices and allowing them to work at more precise intervals.
... Supervised machine learning techniques are algorithms designed to identify common patterns and make predictions based on provided examples. These techniques include but are not limited to methods such as naïve Bayes, support vector machines, neural networks, kernel methods, deep learning, recurrent neural networks, ARIMA models, boosting, and quadratic classifiers (Choudhary & Gianey, 2017; Saravanan & Sujatha, 2018; Singh et al., 2016). ...
Article
Full-text available
To address the challenges associated with measuring and classifying household consumption (poverty) in developing countries, such as cost, time gaps, and inaccurate socio-economic data, this study suggests leveraging machine learning (ML) algorithms. We assessed the performance of various ML algorithms using data from 14,580 sample households from the Integrated Household Living Condition Survey (EICV5), considering 87 features. Among the 12 classifiers evaluated, multiple kernel support vector machines, eXtreme gradient boosting, and multinomial logit demonstrated the highest predictive accuracy, ranging between 86.6% and 88.5%. Notably, household food expenditure, the total number of children (<14 years) in the household, and household own food expenditures emerged as the most predictive features for consumption classification. Interestingly, including shock-coping strategies did not significantly improve prediction accuracy. The multiple kernel support vector machine consistently outperformed eXtreme gradient boosting and multinomial logit. These findings suggest that survey questions used to assess poverty in Rwanda could be streamlined, prioritizing important features, particularly those related to household food characteristics. This approach has the potential to address challenges associated with measuring and classifying household consumption in developing countries more effectively.
... Supervised learning entails mapping inputs to outputs based on labeled input-output pairs, commonly known as labeled data [156]. The model is trained to minimize a loss function, aligning the accuracy of predicted values with ground-truth values. Classification and regression are two key approaches in supervised learning. ...
Article
Full-text available
Polymeric membranes have become essential for energy-efficient gas separations such as natural gas sweetening, hydrogen separation, and carbon dioxide capture. Polymeric membranes face challenges like permeability-selectivity tradeoffs, plasticization, and physical aging, limiting their broader applicability. Machine learning (ML) techniques are increasingly used to address these challenges. This review covers current ML applications in polymeric gas separation membrane design, focusing on three key components: polymer data, representation methods, and ML algorithms. Exploring diverse polymer datasets related to gas separation, encompassing experimental, computational, and synthetic data, forms the foundation of ML applications. Various polymer representation methods are discussed, ranging from traditional descriptors and fingerprints to deep learning-based embeddings. Furthermore, we examine diverse ML algorithms applied to gas separation polymers. It provides insights into fundamental concepts such as supervised and unsupervised learning, emphasizing their applications in the context of polymer membranes. The review also extends to advanced ML techniques, including data-centric and model-centric methods, aimed at addressing challenges unique to polymer membranes, focusing on accurate screening and inverse design.
... DT is a supervised learning algorithm used for both classification and regression. It performs classification by splitting the dataset into subsets according to the values of the input variables, generating a tree of decisions [35]. Nodes in the tree represent decision rules, while each leaf node represents a simple, easily understandable outcome of applying the classification rules. ...
Article
Full-text available
Password strength prediction plays an important role in improving protection against cyber threats as their frequency increases. Typically, rule-based checks are used, but not all of them evaluate passwords effectively. This research aims to explore a more advanced approach to password strength prediction that solves some of the existing shortcomings through a machine learning (ML) and ensemble model for multi-class classification. In this research, we employed Random Forest (RF), Decision Tree (DT), Stochastic Gradient Descent (SGD), and Logistic Regression (LR) algorithms with Bagging and Stacking ensembling techniques. We used the Sber Dataset from Kaggle, which includes 100,000 passwords for the experiment. In the data preprocessing, the main procedures applied were missing value handling and shuffling. Text preprocessing included tokenization using common stop words and Term Frequency-Inverse Document Frequency (TF-IDF). The dataset was balanced using Synthetic Minority Over-sampling Technique (SMOTE) to address the class imbalance. The results for the Bagging and Stacking ensembles of combining multiple ML models showed that our approach outperformed the individual models in classifying password strength into three categories: weak, medium, and strong. Stacking outperformed the other algorithms in the sense that more than one model was used to improve results and minimize errors. Thus, the proposed approach provides a more accurate and versatile measure for password validation, eliminating the problems encountered with the original method. The results proved the high efficiency of the methods used and showed better prediction performance in comparison with the baseline models.