Feature Selection - Science topic
A special issue of Journal of Risk and Financial Management (ISSN 1911-8074). This special issue belongs to the section "Risk".
Deadline for manuscript submissions: 15 June 2024
https://www.mdpi.com/journal/jrfm/special_issues/4QY2C4T410
Keywords
• machine learning solutions, including ensemble models
• variable selection
• explainable artifici...
Dear Colleagues, For this Special Issue, we seek papers concerning current advances in feature selection algorithms for high-dimensional settings, as well as review papers that will motivate ongoing efforts to grasp the challenges commonly faced in this field. High-quality articles that address both theoretical and practical challenges relating to...
The proposed scheme in this research paper is a communication-less islanding detection system based on recurrent neural network (RNN) for hybrid distributed generator (DG) systems that include both synchronous and inverter-based DG devices. The scheme consists of three stages: time-domain feature extraction (FE) from the three-phase voltage signal...
This study investigates the efficacy of Explainable Artificial Intelligence (XAI) methods, specifically Gradient-weighted Class Activation Mapping (Grad-CAM) and Shapley Additive Explanations (SHAP), in the feature selection process for national demand forecasting. Utilising a multi-headed Convolutional Neural Network (CNN), both XAI methods exhibi...
The advent of reliable and inexpensive sensors and advancements in general computing have made data-heavy algorithms feasible for operational, real-time decision-making applications in the geothermal energy industry. This systematic review aims to provide a starting point for researchers interested in developing data-driven systems, tools, and fram...
Evapotranspiration is one of agricultural water management's most significant and impactful hydrologic processes. A new multi-decomposition deep learning-based technique is proposed in this study to forecast weekly reference evapotranspiration (ETo) in western coastal regions of Australia (Redcliffe and Gold Coast). The time-varying filter-based e...
The road system is the main mode used for the transportation of agricultural cargo, and in some cases, it is the only option for handling this type of product. This dependence means that the implementation of tools to support the management of logistical costs can reduce the financial impact of transport felt by the economic agents operating...
Phishing is a cybercrime that has been constantly increasing in recent years due to the increased use of the Internet and its applications. It is one of the most common types of social engineering and aims to disclose or steal users' sensitive or personal information. In this paper, two main objectives are considered. The first is to identify the bes...
Feature selection (FS) is a crucial task in machine learning applications, which aims to select the most appropriate feature subset while maintaining high classification accuracy with the minimum number of selected features. Despite the widespread usage of metaheuristics as wrapper-based FS techniques, they show reduced effectiveness and increased...
In this study, the butt welding of PE100, an HDPE-class material used extensively in underground water and gas transportation, was carried out experimentally. Welded joints were produced using three different welding temperatures, three different joint pressures and three different heating times. The...
Cauliflower disease is a primary cause of reduced cauliflower yield. Preventing cauliflower disease requires early diagnosis. In this study, we propose an agro-medical expert system that makes it easier to diagnose cauliflower disease. In this method, a digital image must be taken from a phone or handheld device to diagnose caul...
The goal of Feature Selection - comprising filter, wrapper, and embedded approaches - is to find the optimal feature subset for designated downstream tasks. Nevertheless, current feature selection methods are limited in that: 1) their selection criteria vary across domains, making them hard to generalize; 2) the selection per...
This paper offers a systematic literature review of real-time detection and classification of Power Quality Disturbances (PQDs). A particular focus is given to voltage sags and notches, as voltage sags cause huge economic losses while research on voltage notches is still very incipient. A systematic method based on scientometrics, text similarity a...
When smart metering systems are used to encourage energy conservation at the residential level, new research challenges arise in the areas of monitoring usage and providing accurate load forecasts. Intelligent smart meters rely on accurate predictions of future electricity use. Many energy and operational improvements, such as more efficient applia...
This study aims to propose a novel backpropagation neural network (BPNN) featured with sequential forward selection (SFS), named the BPNN_s model, to master the leaching characteristics of toxic elements (TEs) in coal fly ash (CFA). A total of 400 datasets and 54 features are involved to predict the fractions of TEs. The determination coefficient (...
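As an illustration of the sequential forward selection (SFS) idea named in the abstract above, here is a minimal sketch in plain Python. It is not the BPNN_s model: the scoring function, the toy data, and the names `sequential_forward_selection` and `toy_score` are assumptions made for this example. The greedy loop repeatedly adds the single feature that most improves the score of the current subset.

```python
def sequential_forward_selection(n_features, score, k):
    """Greedily pick k feature indices, adding the best-scoring one each round."""
    selected = []
    remaining = list(range(n_features))
    while remaining and len(selected) < k:
        # Evaluate each candidate together with the features chosen so far.
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): the target equals feature 0 plus feature 2.
X = [[1, 5, 2], [2, 1, 0], [3, 4, 1], [4, 2, 3]]
y = [row[0] + row[2] for row in X]


def toy_score(subset):
    """Negative squared error of predicting y by the sum of the chosen columns."""
    return -sum((target - sum(row[f] for f in subset)) ** 2
                for row, target in zip(X, y))


chosen = sequential_forward_selection(n_features=3, score=toy_score, k=2)
print(chosen)  # [0, 2]
```

In practice the score would typically be a cross-validated model metric rather than this toy error; the greedy structure stays the same.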
Maintaining the content quality on social media Q&A platforms is pivotal for user attraction and retention. Automating post quality assessment offers benefits such as reduced moderator workload, amplified community impact, enhanced expert user recognition, and importance to expert feedback. While existing approaches for post quality mainly employ b...
Particles, biological molecules, and other harmful chemicals in the Earth's atmosphere generate air pollution, which in turn causes many negative health impacts, human deaths, animal deaths, agricultural failures, and infrastructure deterioration. Air pollution can be caused by both human actions and natural occurrences. Large-scale industry, which...
Depression is a psychological state of mind that often influences a person in an unfavorable manner. While it can occur in people of all ages, students are especially vulnerable to it throughout their academic careers. Beginning in 2020, the COVID-19 epidemic caused major problems in people’s lives by driving them into quarantine and forcing them t...
Background
Antibiotic therapy is a known risk factor for Clostridioides difficile infection (CDI), though the risk varies by agent. The antibiotic spectrum index (ASI) captures information on spectrum of antibiotic activity and days of therapy (DOT). We evaluated ASI and other routinely collected clinical data as predictors of hospital-associated (...
Background
Emergence of antibiotic resistant bacteria is a public health threat. Data on the burden of ESBL-producing bacteria is limited, particularly in low to middle income countries, such as Jamaica. Here we sought to identify clinical and demographic features associated with extended spectrum β-lactamase producing Enterobacterales (ESBLs) in p...
In today’s age, we see the increasing influence of technology on people, which raises the question: “Is society determined by technology?” Rising up within the constraints of each society, technology had its limitations, as it catered to the needs and interests of the masses. As society evolved, so did its requirements. We are at a stage whe...
In this study, we present the acquisition and categorization of a geographically-informed, multi-dialectal Albanian National Corpus, derived from Twitter data. The primary dialects from three distinct regions—Albania, Kosovo, and North Macedonia—are considered. The assembled publicly available dataset encompasses anonymized user information, user-g...
Background
California hospitals report information on antimicrobial stewardship (AS) Core Elements, resources, and practices through the National Healthcare Safety Network Annual Survey. The California Department of Public Health has an AS Program (ASP) Honor Roll to recognize and promote hospital AS. We explored associations between AS-related res...
Background
Multisystem inflammatory syndrome in children (MIS-C) is an uncommon but severe hyperinflammatory syndrome occurring weeks after SARS-CoV-2 infection. Presentation can vary and overlap with other conditions, including acute COVID-19 and Kawasaki disease. Identifying clusters of MIS-C phenotypes informs efforts to reduce misclassification...
This paper investigated the changing relationship between socioeconomic factors and mental health over time. Data were analysed from the Understanding Society Database, a representative sample of the UK population consisting of a potential of 150,393 respondents. Multiple regression coefficients over 13 years were compared over time to analyse effe...
We are motivated by the problem of identifying potentially nonlinear regression relationships between high-dimensional outputs and high-dimensional inputs of heterogeneous data. This requires regression, clustering, and model selection, simultaneously. In this framework, we apply the mixture of experts models which are among the most popular ensemb...
Highlights
Machine Learning (ML) models are identified, reviewed, and analyzed for HAB predictions.
Data preprocessing is vital for efficient ML model development.
ML models for toxin production and monitoring are limited.
Abstract. Harmful algal blooms (HABs) are detrimental to livestock, humans, pets, the environment, and the global economy, whi...
Melanoma, a widespread and hazardous form of cancer, has prompted researchers to prioritize dermoscopic image‐based algorithms for classifying skin lesions. Recently, there has been a growing trend in using pre‐trained convolutional neural networks for detecting skin lesions. However, the features extracted from these classifiers may include irrele...
In the food industry, quality and safety issues are associated with consumers’ health condition. There is a growing interest in applying various noninvasive sensorial techniques to quickly obtain quality attributes. One of them, the hyperspectral/multispectral imaging technique, has been extensively used for inspection of various food products. In this...
Classifying biological datasets and identifying their important features has become a crucial problem due to their high dimensionality. Hence, we propose a hybrid feature selection technique, EO-SCA, as a novel wrapper-based feature selection technique to overcome these problems. Equilibrium Optimizer is an efficient optimization model based on m...
Owing to the rapid expansion of data science, data-driven methods have emerged as a dominant trend in chiller fault detection and diagnosis (FDD). Most of these methods prioritize feature selection to achieve optimal diagnostic performance. However, on-site research indicates a common installation of a limited number of sensors, coupled with a nece...
Heart disease is a leading global cause of mortality, demanding early detection for effective and timely medical intervention. In this study, we propose a machine learning-based model for early heart disease prediction. This model is trained on a dataset from the UC Irvine Machine Learning Repository (UCI) and employs the Extra Trees Classifier for...
State of health (SOH) estimation is a critical technology to guarantee the safe and reliable operation of battery energy systems. Data-driven methods have been widely studied in the field of lithium-ion battery SOH estimation. However, random charging in real operating scenarios will result in difficult extraction of health features, which in turn...
In this paper, we propose a generalized expectation model selection (GEMS) algorithm for latent variable selection in multidimensional item response theory models which are commonly used for identifying the relationships between the latent traits and test items. Under some mild assumptions, we prove the numerical convergence of GEMS for model selec...
This paper examines the direct effect of FinCredit on income inequality and the moderating effect of Financial Inclusion. Using a dual-process variable selection approach, we develop an optimal model to test two hypotheses and explore how FinCredit influences direct and indirect income inequality. Our study has made a pioneering contribution to the...
Feature selection plays a crucial role in establishing an effective speech emotion recognition system. To improve recognition accuracy, people always extract as many features as possible from speech signals. However, this may reduce efficiency. We propose a hybrid filter–wrapper feature selection based on a genetic algorithm specifically designed f...
Background
Automated feature selection methods such as the Least Absolute Shrinkage and Selection Operator (LASSO) have recently gained importance in the prediction of quality-related outcomes as well as the risk-adjustment of quality indicators in healthcare. The methods that have been used so far, however, do not account for the fact that patient...
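To make concrete how LASSO performs feature selection, the following is a minimal sketch of coordinate descent with soft-thresholding in plain Python. It is a generic textbook formulation, not the method of the study above; it assumes no intercept, the objective (1/(2n))·||y − Xw||² + alpha·||w||₁, and hypothetical toy data. Coefficients whose correlation with the partial residual falls below `alpha` are set exactly to zero, which is what removes features.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; return exactly 0.0 inside [-alpha, alpha]."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimise (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 without an intercept."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                partial = y[i] - sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * partial
                z += X[i][j] ** 2
            rho /= n
            z /= n
            w[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return w


# Toy data (hypothetical): two orthogonal features, the second barely matters.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2.1, 1.9, -1.9, -2.1]  # equals 2.0 * feature 0 + 0.1 * feature 1

weights = lasso_coordinate_descent(X, y, alpha=0.5)
kept = [j for j, wj in enumerate(weights) if wj != 0.0]
print(kept)  # [0]
```

The weak second feature is dropped, while the strong first feature is kept with a coefficient shrunk by the penalty.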
Parkinson’s disease is a neurodegenerative disorder that affects the nerve cells that produce dopamine in the brain. In this paper, we compared different scenarios, such as AutoEncoder and Ant Colony Optimization feature selection algorithms, to identify effective features for the diagnosis of Parkinson’s disease. These algorithm...
Cloud computing (CC) is an internet-enabled environment that provides computing services such as networking, databases, and servers to clients and organizations in a cost-effective manner. Despite the benefits rendered by CC, its security remains a prominent concern to overcome. An intrusion detection system (IDS) is generally used to detect both n...
The use of pixel-based remote sensing techniques in archaeology is usually limited by spectral confusion between archaeological material and the surrounding environment because they rely on the spectral contrast between features. To deal with this problem, we investigated the possibility of using geographic object-based image analysis (GEOBIA) to p...
Introduction
Microbes are increasingly (re)considered for environmental assessments because they are powerful indicators for the health of ecosystems. The complexity of microbial communities necessitates powerful novel tools to derive conclusions for environmental decision-makers, and machine learning is a promising option in that context. While am...
This article examines intrusion detection systems in depth using the CSE-CIC-IDS-2018 dataset. The investigation is divided into three stages: to begin, data cleaning, exploratory data analysis, and data normalization procedures (min-max and Z-score) are used to prepare data for use with various classifiers; second, in order to improve processing s...
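The two normalization procedures mentioned in the abstract above (min-max and Z-score) can be sketched in plain Python as follows. This is a generic illustration on a hypothetical column of values, not the preprocessing code of the study; the function names are assumptions, and the Z-score here uses the population standard deviation.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map every entry to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score_scale(values):
    """Center values to zero mean and scale to unit population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


column = [2.0, 4.0, 6.0, 8.0]  # hypothetical feature column
scaled = min_max_scale(column)
standardized = z_score_scale(column)
```

Min-max scaling bounds each feature to [0, 1] but is sensitive to outliers, whereas Z-score scaling is unbounded and expresses each value in standard deviations from the mean.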
Given the high death rate caused by high-risk prostate cancer (PCa) (>40%) and the reliability issues associated with traditional prognostic markers, the purpose of this study is to investigate planning computed tomography (pCT)-based radiomics for the long-term prognostication of high-risk localized PCa patients who received whole pelvic radiother...
Heart disease is a severe illness that can be challenging to diagnose manually. Faster and more precise artificial intelligence models can help diagnose it early. In this work, different detection models were designed to develop a healthy diagnosis system.
The proposed system highlights three objectives: First, an adaptive feature selection techniq...
Objectives
To develop and validate an 18F-FDG PET/CT-based clinical-radiological-radiomics nomogram and evaluate its value in the diagnosis of MYCN amplification (MNA) in paediatric neuroblastoma (NB) patients.
Methods
A total of 104 patients with NB were retrospectively included. We constructed a nomogram to predict MNA based on radiomics signatu...
Background: Noncommunicable diseases (NCDs) continue to pose a significant health challenge globally, with hyperglycemia serving as a prominent indicator of potential diabetes. This study employed machine learning algorithms to predict hyperglycemia in a cohort of asymptomatic individuals and unraveled crucial predictors contributing to early risk...
Epilepsy is a widespread neurological disorder characterized by recurring seizures that have a significant impact on individuals' lives. Accurately recognizing epileptic seizures is crucial for proper diagnosis and treatment. Deep learning models have shown promise in improving seizure recognition accuracy. However, optimizing their performance for...
Solar flares are among the most severe space-weather phenomena, and they have the capacity to generate radiation storms and radio disruptions on Earth. The accurate prediction of solar-flare events remains a significant challenge, requiring continuous monitoring and identification of specific features that can aid in forecasting this phenomenon, pa...
Purpose
This bi-institutional study aimed to establish a robust model for predicting clinically significant prostate cancer (csPCa) (pathological grade group ≥ 2) in PI-RADS 3 lesions in the transition zone by comparing the performance of combination models.
Materials and methods
This study included 243 consecutive men who underwent 3-Tesla magnet...
Discovering disease biomarkers at the single-cell level is crucial for advancing our understanding of diseases and improving diagnostic accuracy. However, current computational methods often have limitations, such as a reliance on prior knowledge, constraints to unimodal data, and the use of conventional statistical tests for feature selection. To...
In the multifaceted field of oceanic engineering, the quality of underwater images is paramount for a range of applications, from marine biology to robotic exploration. This paper presents a novel approach in underwater image quality assessment (UIQA) that addresses the current limitations by effectively combining low-level image properties with hi...
The assessment of wine quality is of paramount importance to both consumers and the wine industry. Recognizing its impact on customer satisfaction and business success, companies are increasingly turning to product quality certification to enhance sales in the global beverage market. Traditionally, quality testing was conducted towards the end of t...
To enhance the accuracy of predicting a borrower's likelihood to repay their debt to a lender, a comprehensive approach is necessary. The loan approval process involves a considerable amount of quantitative data that can be time-consuming to process and subject to human error. To simplify this process, an expert system named CRPESGA (Credit Risk Prediction Ex...
Vibration monitoring is a critical aspect of assessing the health and performance of machinery and industrial processes. This study explores the application of machine learning techniques, specifically the Random Forest (RF) classification model, to predict and classify chatter—a detrimental self-excited vibration phenomenon—during machining operat...
Hydraulic multi-way valves as core components are widely applied in engineering machinery, mining machinery, and metallurgical industries. Due to the harsh working environment, faults in hydraulic multi-way valves are prone to occur, and the faults that occur are hidden. Moreover, hydraulic multi-way valves are expensive, and multiple experiments a...
Software quality is the main criterion for increasing user demand for software. Therefore, software companies seek to ensure software quality by predicting software defects in the software testing phase. Having an intelligent system capable of predicting software defects helps greatly in reducing time and effort consumption. Despite the great trend...
Many countries have attempted to mitigate and manage issues related to harmful algal blooms (HABs) by monitoring and predicting their occurrence. The infrequency and duration of HAB occurrences pose the challenge of data imbalance when constructing machine learning models for their prediction. Furthermore, the appropriate selection of input variabl...
Background
Knowledge of risk factors for attention-deficit/hyperactivity disorder (ADHD) may facilitate early diagnosis; however, studies examining a broad range of potential risk factors for ADHD in adults are limited. This study aimed to identify risk factors associated with newly diagnosed ADHD among adults in the United States (US).
Methods
El...
Background
Reliable pre-surgical prediction of spreading through air spaces (STAS) in primary lung cancer is essential for precision treatment and surgical decision-making. We aimed to develop and validate a dual-delta deep-learning and radiomics model based on pretreatment computed tomography (CT) image series to predict the STAS in patients with...
Numerous biological environments have been characterized with the advent of metagenomic sequencing using next generation sequencing which lays out the relative abundance values of microbial taxa. Modeling the human microbiome using machine learning models has the potential to identify microbial biomarkers and aid in the diagnosis of a variety of di...
Power quality disturbance (PQD) signal classification is crucial for the real-time monitoring of modern power grids, assuring safe and reliable operation and user safety. Traditional power quality disturbance signal classification approaches are sensitive to noise, feature selection, etc. This study introduces a novel approach utilizing a data-driv...
Mental stress is a prevalent and consequential condition that impacts individuals' well-being and productivity. Accurate classification of mental stress levels using electroencephalogram (EEG) signals is a promising avenue for early detection and intervention. In this study, we present a comprehensive investigation into mental stress classification...
Chronic kidney disease (CKD) is a progressive condition characterized by the gradual deterioration of kidney functions, potentially leading to kidney failure if not promptly diagnosed and treated. Machine learning (ML) algorithms have shown significant promise in disease diagnosis, but in healthcare, clinical data pose challenges: missing values, n...
Slope stability is an important topic because slope failures can cause socio-economic losses. Slope stability analysis requires identifying the site profile and obtaining soil strength parameters. This paper presents and discusses the use of the Flat Dilatometer Test (DMT) in the geotechnical site...
Several recent studies have evidenced the relevance of machine-learning for soil salinity mapping using Sentinel-2 reflectance as input data and field soil salinity measurement (i.e., Electrical Conductivity-EC) as the target. As soil EC monitoring is costly and time consuming, most learning databases used for training/validation rely on a limited...
This paper presents methods for selecting independent variables in multiple linear regression equations. It provides details about the independent variable selection algorithms, advantages, and limitations of independent variable selection methods, and trends in studies and research related to the independent variable selection methods...
Breast Cancer (BC) detection and classification are critical tasks in medical diagnostics. The lives of patients can be greatly enhanced by the precise and early detection of BC. This study suggests a novel approach for detecting BC that combines deep learning models and sophisticated image processing techniques to address those shortcomings. The B...
The human microbiome has become an area of intense research due to its potential impact on human health. However, the analysis and interpretation of this data have proven to be challenging due to its complexity and high dimensionality. Machine learning (ML) algorithms can process vast amounts of data to uncover informative patterns and relationship...
Background
Postoperative delirium (POD) contributes to severe outcomes such as death or development of dementia. Thus, it is desirable to identify vulnerable patients in advance during the perioperative phase. Previous studies mainly investigated risk factors for delirium during hospitalization and further used a linear logistic regression (LR) app...
Speech is a direct and rich way of transmitting information and emotions from one point to another. In this study, we aimed to classify different emotions in speech using various audio features and machine learning models. We extracted various types of audio features such as Mel-frequency cepstral coefficients, chromagram, Mel-scale spectrogram, sp...
Purpose
We sought to develop machine learning models to predict the results of patient‐specific quality assurance (QA) for volumetric modulated arc therapy (VMAT), which were represented by several dose‐evaluation metrics—including the gamma passing rates (GPRs)—and criteria based on the radiomic features of 3D dose distribution in a phantom.
Meth...