Book

Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond

Authors: Bernhard Schölkopf, Alexander J. Smola

Abstract

A comprehensive introduction to Support Vector Machines and related kernel methods. In the 1990s, a new type of learning algorithm was developed, based on results from statistical learning theory: the Support Vector Machine (SVM). This gave rise to a new class of theoretically elegant learning machines that use a central concept of SVMs—kernels—for a number of learning tasks. Kernel machines provide a modular framework that can be adapted to different tasks and domains by the choice of the kernel function and the base algorithm. They are replacing neural networks in a variety of fields, including engineering, information retrieval, and bioinformatics. Learning with Kernels provides an introduction to SVMs and related kernel methods. Although the book begins with the basics, it also includes the latest research. It provides all of the concepts necessary to enable a reader equipped with some basic mathematical knowledge to enter the world of machine learning using theoretically well-founded yet easy-to-use kernel algorithms and to understand and apply the powerful algorithms that have been developed over the last few years.
... Kernel methods [19,20] are frequently used for statistical learning and metamodeling. The kernel metamodel eventually yields a prediction formula similar to Kriging (compare (7) and (13)), although the philosophy is different. ...
... Kernel methods, for inputs in a domain D ⊂ R^d, are based on a symmetric nonnegative definite kernel function k : (x, y) ∈ D^2 → R, see [20]. This kernel function defines a Hilbert space H_k of functions from D to R, that is called the Reproducing Kernel Hilbert Space (RKHS) corresponding to the kernel function (see [20] for details). ...
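The symmetry and nonnegative definiteness required of k above can be checked numerically on a finite sample. The following sketch is purely illustrative (it assumes NumPy and a Gaussian kernel, neither of which is prescribed by the excerpt): it builds the Gram matrix K_ij = k(x_i, x_j) and verifies both properties up to round-off.

    import numpy as np

    def gaussian_kernel(X, Y, lengthscale=1.0):
        # k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2)): symmetric and
        # nonnegative (positive semi-) definite on D x D.
        sq_dists = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
        return np.exp(-sq_dists / (2 * lengthscale**2))

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))                   # 50 inputs in D, a subset of R^3
    K = gaussian_kernel(X, X)                      # Gram matrix K_ij = k(x_i, x_j)
    print(np.allclose(K, K.T))                     # symmetry
    print(np.linalg.eigvalsh(K).min() > -1e-10)    # nonnegative spectrum, up to round-off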
Preprint
It is now common practice in nuclear engineering to base extensive studies on numerical computer models. These studies require running computer codes in potentially thousands of numerical configurations, without expert individual control of the computational and physical aspects of each simulation. In this paper, we compare different statistical metamodeling techniques and show how metamodels can help to improve the global behaviour of codes in these extensive studies. We consider the metamodeling of the Germinal thermal-mechanical code by Kriging, kernel regression and neural networks. Kriging provides the most accurate predictions while neural networks yield the fastest metamodel functions. All three metamodels can conveniently detect strong computation failures. It is however significantly more challenging to detect code instabilities, that is, groups of computations that are all valid, but numerically inconsistent with one another. For code instability detection, we find that Kriging provides the most useful tools.
... This observation was discussed informally in Chen et al. (2021), in the setting of the GP-PDE methodology, and our Theorem 5 establishes this connection rigorously. The GP-PDE methodology relies on a representer theorem (see also Smola and Schölkopf 1998) that identifies the solution of (3), in the GP-PDE context, via a finite-dimensional optimization problem. In Chen et al. (2021) it is argued that the natural β → 0 limit of (3), namely (4), can also be solved with a representer theorem. ...
... GPs, as a special instance of Gaussian measures, and, by extension, reproducing kernel Hilbert space (RKHS) methods (Kanagawa et al. 2018; van der Vaart et al. 2008) and support vector machines (Smola and Schölkopf 1998), have a long history in approximation theory (Wendland 2004), statistical modeling and inference (Giné and Nickl 2021), inverse problems (Cressie 1990), and machine learning (Rasmussen and Williams 2007; Smola and Schölkopf 1998). While in this article we mainly focus on solving differential equations with GPs as an application of our theory (Särkkä 2011; Owhadi 2015; Chkrebtii et al. 2016; Cockayne et al. 2017; Raissi et al. 2018; Swiler et al. 2020; Chen et al. 2021; Wang et al. 2021) (see also Sect. ...
Article
Full-text available
The article presents a systematic study of the problem of conditioning a Gaussian random variable ξ on nonlinear observations of the form F ∘ φ(ξ), where φ : X → R^N is a bounded linear operator and F is nonlinear. Such problems arise in the context of Bayesian inference and recent machine learning-inspired PDE solvers. We give a representer theorem for the conditioned random variable ξ | F ∘ φ(ξ), stating that it decomposes as the sum of an infinite-dimensional Gaussian (which is identified analytically) as well as a finite-dimensional non-Gaussian measure. We also introduce a novel notion of the mode of a conditional measure by taking the limit of the natural relaxation of the problem, to which we can apply the existing notion of maximum a posteriori estimators of posterior measures. Finally, we introduce a variant of the Laplace approximation for the efficient simulation of the aforementioned conditioned Gaussian random variables towards uncertainty quantification.
... Limited research exists on the combination of kernel theory and detection theory [24]. To formulate a kernel-based detector, it is necessary to rewrite the test statistics as a function of the original data's inner product [51]. In our case, we can re-express the test statistics (19) and (20) respectively as: ...
... In the detectors of (25) and (27), the Euclidean inner products ⟨f_s, f_r⟩ and ⟨z_s, z_r⟩ each may provide a suitable indicator of similarity between the vectors {f_s, f_r} and {z_s, z_r}, provided that the representation space of the vectors f_s and f_r in (25) and z_s and z_r in (27) is sufficiently rich. The input space, however, is often limited in clarity and expressiveness, and therefore does not generally provide the best setting for this inner-product kind of similarity calculation [13], [51]. A nonlinear mapping function ϕ projects the incoming data from the input space X onto a high-dimensional space H using kernel theory. ...
... To avoid direct implementation, the mapping function ϕ can be defined implicitly through a so-called kernel function K; this is known as the kernel trick. It can be written as follows [51]: K(u, v) = ⟨ϕ(u), ϕ(v)⟩. ...
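The practical value of the identity K(u, v) = ⟨ϕ(u), ϕ(v)⟩ is that quantities defined in the feature space H can be evaluated without ever constructing ϕ. A minimal sketch, assuming an RBF kernel chosen for illustration (not taken from the cited paper), computes a feature-space distance purely from kernel evaluations:

    import numpy as np

    def K(u, v, gamma=0.5):
        # Gaussian (RBF) kernel: K(u, v) = <phi(u), phi(v)> for an implicit,
        # infinite-dimensional feature map phi that is never formed explicitly.
        return np.exp(-gamma * np.sum((u - v) ** 2))

    u = np.array([1.0, 2.0, -0.5])
    v = np.array([0.3, -1.0, 2.0])

    # Squared distance in the feature space H, obtained from kernel values only:
    # ||phi(u) - phi(v)||^2 = K(u, u) - 2 K(u, v) + K(v, v)
    dist_sq = K(u, u) - 2 * K(u, v) + K(v, v)
    print(dist_sq)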
Article
This paper addresses the active radar target detection problem using two different approaches: learning-based and model-based methods. The learning-based approach uses a convolutional neural network (CNN) to detect targets, while the model-based approach employs detection theory to design detectors. The detection theory framework is used to consider the subspace-based generalized likelihood ratio test (S-GLRT) and sample covariance matrix-based GLRT (SCM-GLRT) detectors. A new recursive implementation of the S-GLRT, called RS-GLRT, is proposed to address the possible ill-conditioning in the clutter cancelation stage of the S-GLRT detector. In addition, two new detectors are proposed by combining the detection theory and kernel theory frameworks, which enables the deployment of a richer feature space in the detection and improves the detection performance. A CNN-based detector is also presented, which provides a robust detector against diverse noise and clutter behaviors in various environments. To achieve this, a universal model is considered for receiver noise and clutter, known as the α-stable interference model, which allows for the correct definition of noise and clutter properties in the range of impulsive to Gaussian distributions. Extensive simulation results are presented, demonstrating the superior detection performance of the CNN-based method compared to the detection theory-based methods.
... A wide variety of Kernel Methods can be used to fit the model. We have successfully implemented in preliminary testing both Support Vector Machines and Kernel Regression methods, to name a few [25]. However, their training complexity was too high for this application, since they respectively require solving a Quadratic Optimization problem and inverting a Gram matrix, per frame. ...
... is the Gaussian kernel [25]. After computing K, the estimate of the symbols transmitted by the l-th user can be obtained as ŝ = sign(K s^p_l), where s^p_l is the l-th column of S^p. ...
Article
Full-text available
In this paper, a novel decoding kernel method based on Successive Parzen Windows Interference Cancellation (SPWIC) is proposed for the Non-Orthogonal Multiple Access (NOMA) uplink. The procedure leverages the diversity in both angle and received power at a 3D antenna, combined with Parzen-Windows-based decoding, to achieve better interference cancellation, providing the decoding process with robustness against multi-user interference and user discrimination. This is especially convenient in vehicular scenarios in crowded cities. We have evaluated SPWIC in various scenarios and concluded that it outperforms the standard Successive Interference Cancellation (SIC) approach even in Multiple-Input Multiple-Output (MIMO) cases, such that up to 9 users can be allocated on the same resources, as long as they are not too close to each other. Although it is proposed for mmWave, it can be directly adapted to lower frequencies.
... Despite these strong limitations it is possible to run so-called hybrid quantum algorithms, that combine the power of classical and quantum computing. One such example from QML, that can be implemented with current hardware, is the One-Class (Quantum) Support Vector Machine (OC(Q)SVM) [5][6][7][8][9][10], that leverages quantum kernels [11][12][13][14]. Quantum kernels implicitly map data into higher-dimensional feature spaces, where non-linearly separable classification tasks may become linearly separable by some hyperplane, determined by the Support Vector Machine (SVM) algorithm. ...
... OCSVMs [5][6][7] offer a powerful tool for unsupervised anomaly detection, where the model learns from unlabeled data points to identify anomalies deviating significantly from the normal patterns. Unlike standard SVMs that require labeled data for both normal and anomalous classes, OCSVMs learn a decision boundary, enclosing the normal data in a high-dimensional feature space. ...
Preprint
Full-text available
Whether in fundamental physics, cybersecurity or finance, the detection of anomalies with machine learning techniques is a highly relevant and active field of research, as it potentially accelerates the discovery of novel physics or criminal activities. We provide a systematic analysis of the generalization properties of the One-Class Support Vector Machine (OCSVM) algorithm, using projected quantum kernels for a realistic dataset of the latter application. These results were both theoretically simulated and experimentally validated on trapped-ion and superconducting quantum processors, by leveraging partial state tomography to obtain precise approximations of the quantum states that are used to estimate the quantum kernels. Moreover, we analyzed both platforms' respective hardware-efficient feature maps over a wide range of anomaly ratios and showed that, for our financial dataset, in all anomaly regimes the quantum-enhanced OCSVMs lead to better generalization properties compared to the purely classical approach. As such our work bridges the gap between theory and practice in the noisy intermediate scale quantum (NISQ) era and paves the path towards useful quantum applications.
... [12,[53][54][55]. A particularly important class of machine learning approaches consists of kernel methods [65], which encompass, for example, Gaussian processes (GPs) [71] and support vector machines [68]. Kernel methods allow the systematic modelling of domain knowledge [66], are supported by a well-developed theory [48,68] and lead to efficient and reliable numerical algorithms [65], capable of scaling up to very large data sets [51,52]. ...
... A particularly important class of machine learning approaches consists of kernel methods [65], which encompass, for example, Gaussian processes (GPs) [71] and support vector machines [68]. Kernel methods allow the systematic modelling of domain knowledge [66], are supported by a well-developed theory [48,68] and lead to efficient and reliable numerical algorithms [65], capable of scaling up to very large data sets [51,52]. All of this makes kernel methods natural candidates for machine learning on IPS. ...
Article
Full-text available
Interacting particle systems (IPSs) are a very important class of dynamical systems, arising in different domains like biology, physics, sociology and engineering. In many applications, these systems can be very large, making their simulation and control, as well as related numerical tasks, very challenging. Kernel methods, a powerful tool in machine learning, offer promising approaches for analyzing and managing IPS. This paper provides a comprehensive study of applying kernel methods to IPS, including the development of numerical schemes and the exploration of mean-field limits. We present novel applications and numerical experiments demonstrating the effectiveness of kernel methods for surrogate modelling and state-dependent feature learning in IPS. Our findings highlight the potential of these methods for advancing the study and control of large-scale IPS.
... Kernel methods have strong theoretical foundations in functional analysis and supervised learning (see e.g. [SS03] for an overview). We review some of these useful properties here and how they can be applied to formulate and learn efficient policies for quantum reinforcement learning. ...
... where the operator R : H_K → D can be interpreted as extracting information from the function value which gets penalised during optimisation [SS03]. For instance, it can penalise large higher- or lower-order derivatives, large function values, or still other properties, leading to smoother optimisation landscapes and therefore improved convergence to the global optimum. ...
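For concreteness, the regularised objective that such an operator enters can be written, assuming a pointwise loss c, m training pairs (x_i, y_i), and a regularisation weight λ (generic notation, not taken from the excerpt), as

    R_{\mathrm{reg}}[f] \;=\; \frac{1}{m}\sum_{i=1}^{m} c\bigl(x_i, y_i, f(x_i)\bigr) \;+\; \frac{\lambda}{2}\,\lVert R f \rVert^{2},

so that choosing R as, for example, a differential operator penalises the corresponding derivatives of f, which is the smoothing effect described above.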
Preprint
Parametrised quantum circuits offer expressive and data-efficient representations for machine learning. Due to quantum states residing in a high-dimensional complex Hilbert space, parametrised quantum circuits have a natural interpretation in terms of kernel methods. The representation of quantum circuits in terms of quantum kernels has been studied widely in quantum supervised learning, but has been overlooked in the context of quantum reinforcement learning. This paper proposes parametric and non-parametric policy gradient and actor-critic algorithms with quantum kernel policies in quantum environments. This approach, implemented with both numerical and analytical quantum policy gradient techniques, allows exploiting the many advantages of kernel methods, including available analytic forms for the gradient of the policy and tunable expressiveness. The proposed approach is suitable for vector-valued action spaces and each of the formulations demonstrates a quadratic reduction in query complexity compared to their classical counterparts. Two actor-critic algorithms, one based on stochastic policy gradient and one based on deterministic policy gradient (comparable to the popular DDPG algorithm), demonstrate additional query complexity reductions compared to quantum policy gradient algorithms under favourable conditions.
... RKHS learning has been extensively used in the literature, and has achieved great successes. See, for example, Schölkopf and Smola (2002), Shawe-Taylor and Cristianini (2004), and Hastie et al. (2011). ...
Preprint
Learning with Reproducing Kernel Hilbert Spaces (RKHS) has been widely used in many scientific disciplines. Because a RKHS can be very flexible, it is common to impose a regularization term in the optimization to prevent overfitting. Standard RKHS learning employs the squared norm penalty of the learning function. Despite its success, many challenges remain. In particular, one cannot directly use the squared norm penalty for variable selection or data extraction. Therefore, when there exist noise predictors, or the underlying function has a sparse representation in the dual space, the performance of standard RKHS learning can be suboptimal. In the literature, work has been proposed on how to perform variable selection in RKHS learning, and a data sparsity constraint was considered for data extraction. However, how to learn in a RKHS with both variable selection and data extraction simultaneously remains unclear. In this paper, we propose a unified RKHS learning method, namely, DOuble Sparsity Kernel (DOSK) learning, to overcome this challenge. An efficient algorithm is provided to solve the corresponding optimization problem. We prove that under certain conditions, our new method can asymptotically achieve variable selection consistency. Simulated and real data results demonstrate that DOSK is highly competitive among existing approaches for RKHS learning.
... This information hints towards a non-linear decision boundary between the classes. Kernel methods were introduced by Smola and Scholkopf [36] as a "trick" for generating non-linear classifiers with linear methods like SVM and LR. ...
Preprint
Tests for Esophageal cancer can be expensive, uncomfortable and can have side effects. For many patients, we can predict non-existence of disease with 100% certainty, just using demographics, lifestyle, and medical history information. Our objective is to devise a general methodology for customizing tests using user preferences so that expensive or uncomfortable tests can be avoided. We propose to use classifiers trained from electronic health records (EHR) for selection of tests. The key idea is to design classifiers with 100% false normal rates, possibly at the cost of higher false abnormals. We compare Naive Bayes classification (NB), Random Forests (RF), Support Vector Machines (SVM) and Logistic Regression (LR), and find kernel Logistic regression to be most suitable for the task. We propose an algorithm for finding the best probability threshold for kernel LR, based on test set accuracy. Using the proposed algorithm, we describe schemes for selecting tests, which appear as features in the automatic classification algorithm, using preferences on costs and discomfort of the users. We test our methodology with EHRs collected for more than 3000 patients, as part of a project carried out by a reputed hospital in Mumbai, India. Kernel SVM and kernel LR with a polynomial kernel of degree 3 yield an accuracy of 99.8% and sensitivity 100%, without the MP features, i.e. using only clinical tests. We demonstrate our test selection algorithm using two case studies, one using the cost of clinical tests, and the other using "discomfort" values for clinical tests. We compute the test sets corresponding to the lowest false abnormals for each criterion described above, using exhaustive enumeration of 15 clinical tests. The sets turn out to be different, substantiating our claim that one can customize test sets based on user preferences.
... Mean and covariance feature matching GAN (McGAN) [85] extended this concept to match not only the ℓ_q mean feature but also the second-order moment feature by using the singular value decomposition concept; it aims to also maximize an embedding covariance discrepancy between p_data(x) and p_g(x). Geometric GAN [72] shows that the McGAN framework is equivalent to a support vector machine (SVM) [106], which separates the two distributions with a hyperplane that maximizes the margin. It encourages the discriminator to move away from the separating hyperplane and the generator to move toward the separating hyperplane. ...
Preprint
Generative Adversarial Networks (GAN) have received wide attention in the machine learning field for their potential to learn high-dimensional, complex real data distribution. Specifically, they do not rely on any assumptions about the distribution and can generate real-like samples from latent space in a simple manner. This powerful property leads GAN to be applied to various applications such as image synthesis, image attribute editing, image translation, domain adaptation and other academic fields. In this paper, we aim to discuss the details of GAN for those readers who are familiar with, but do not comprehend GAN deeply or who wish to view GAN from various perspectives. In addition, we explain how GAN operates and the fundamental meaning of various objective functions that have been suggested recently. We then focus on how the GAN can be combined with an autoencoder framework. Finally, we enumerate the GAN variants that are applied to various tasks and other fields for those who are interested in exploiting GAN for their research.
... In this article, SVMs have been used for regression. The SVM approach to regression has been described in Scholkopf and Smola (2001), and some details will be provided here. Assume a training data set ...
Article
Full-text available
In this article we explore the blending of the four models (Satellite, WRF-Solar, Smart Persistence and CIADCast) studied in Part 1 by means of Support Vector Machines with the aim of improving GHI and DNI forecasts. Two blending approaches that use the four models as predictors have been studied: the horizon approach constructs a different blending model for each forecast horizon, while the general approach trains a single model valid for all horizons. The influence on the blending models of adding information about weather types is also studied. The approaches have been evaluated in the same four Iberian Peninsula stations of Part 1. Blending approaches have been extended to a regional context with the goal of obtaining improved regional forecasts. In general, results show that blending greatly outperforms the individual predictors, with no large differences between the blending approaches themselves. Horizon approaches were more suitable to minimize rRMSE and general approaches work better for rMAE. The relative improvement in rRMSE obtained by model blending was up to 17% for GHI (16% for DNI), and up to 15% for rMAE. Similar improvements were observed for the regional forecast. An analysis of performance depending on the horizon shows that while the advantage of blending for GHI remains more or less constant along horizons, it tends to increase with horizon for DNI, with the largest improvements occurring at 6 h. The knowledge of weather conditions helped to slightly improve the forecasts further (up to 3%), but only at some locations and for rRMSE.
... In ridge regression, this overfitting can be avoided by tuning the ridge parameter λ that penalizes solutions with large norms [26]. In our case, even though we set the ridge regularization to zero, turning on ε also acts as a regularizer. ...
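To make the role of the ridge parameter λ concrete, here is a minimal sketch (assuming NumPy and synthetic data of my choosing, not the paper's setup) of the closed-form ridge solution, where larger λ shrinks the norm of the learned weights:

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 5))                  # synthetic design matrix
    w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
    y = X @ w_true + 0.1 * rng.normal(size=100)

    lam = 0.1                                      # ridge parameter penalizing large norms
    # Closed-form ridge solution: w = (X^T X + lam * I)^{-1} X^T y
    w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    print(w_ridge)                                 # shrinks toward zero as lam grows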
Preprint
Full-text available
A key problem in deep learning and computational neuroscience is relating the geometrical properties of neural representations to task performance. Here, we consider this problem for continuous decoding tasks where neural variability may affect task precision. Using methods from statistical mechanics, we study the average-case learning curves for ε-insensitive Support Vector Regression (ε-SVR) and discuss its capacity as a measure of linear decodability. Our analysis reveals a phase transition in the training error at a critical load, capturing the interplay between the tolerance parameter ε and neural variability. We uncover a double-descent phenomenon in the generalization error, showing that ε acts as a regularizer, both suppressing and shifting these peaks. Theoretical predictions are validated both on toy models and deep neural networks, extending the theory of Support Vector Machines to continuous tasks with inherent neural variability.
... Instead, the entire function is estimated directly from the data (Györfi et al. 2002). Examples of these methods include kernel regression (Nadaraya 1964; Watson 1964), local polynomial regression (Cleveland 1979; Fan 1992, 1993), spline-based regression (Stone 1982; Wahba 1990; Friedman 1991; Green and Silverman 1993), random forests (Breiman 2001), and regression in Reproducing Kernel Hilbert Spaces (Scholkopf and Smola 2018), among others. However, only asymptotic results are available for nonparametric regression methods, and the finite sample behavior remains unclear. ...
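Of the methods listed above, kernel regression in the Nadaraya-Watson sense has a particularly compact form: the prediction is a kernel-weighted local average of the observed responses. A small sketch on synthetic 1-D data (illustrative only, with a Gaussian weighting kernel and bandwidth chosen by hand):

    import numpy as np

    def nadaraya_watson(x_query, X, y, bandwidth=0.3):
        # Kernel-weighted local average of the responses around the query point.
        w = np.exp(-0.5 * ((X - x_query) / bandwidth) ** 2)
        return np.sum(w * y) / np.sum(w)

    rng = np.random.default_rng(2)
    X = rng.uniform(0.0, 2.0 * np.pi, size=200)
    y = np.sin(X) + 0.2 * rng.normal(size=200)
    print(nadaraya_watson(np.pi / 2.0, X, y))      # close to sin(pi/2) = 1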
Article
Full-text available
Conformal prediction is a general method used to convert a point predictor into a prediction band. The accuracy of this prediction band is heavily reliant on the base estimator. This paper investigates the use of conformal prediction with least absolute deviation-based deep nonparametric regression. We demonstrate the consistency of the robust deep regression estimator under mild conditions, leading to the proposed prediction band exhibiting finite-sample marginal validity and asymptotic conditional validity. Through extensive simulation studies and a real-data example, we illustrate the benefits of conformal prediction for robust deep regression.
... By observing the right-hand side of Eq. (1), one can make connections of the SDA (specifically its first term) to dot-product formulations appearing in kernel machines [31]. Thus, when the objective is to define an approximate formulation of SDA for large numbers of n, matrix approximation schemes like the Nyström approximation [17], [32] can be used. ...
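The Nyström approximation mentioned here replaces the full n × n kernel matrix by a low-rank surrogate built from m ≪ n landmark points. A rough sketch of the standard construction K ≈ C W⁺ Cᵀ follows; the RBF kernel and uniform landmark sampling are illustrative assumptions, not details of the cited paper.

    import numpy as np

    def rbf_kernel(A, B, gamma=0.5):
        sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-gamma * sq)

    rng = np.random.default_rng(3)
    X = rng.normal(size=(1000, 10))
    m = 100                                        # number of landmark points
    idx = rng.choice(len(X), size=m, replace=False)

    C = rbf_kernel(X, X[idx])                      # n x m block of the Gram matrix
    W = C[idx]                                     # m x m kernel among the landmarks
    K_approx = C @ np.linalg.pinv(W) @ C.T         # rank-m Nystrom surrogate of K
    # In practice one keeps the factors C and pinv(W) instead of forming K_approx,
    # so the exact n x n Gram matrix is never evaluated or stored in full.
    print(K_approx.shape)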
Preprint
Transformers are widely used for their ability to capture data relations in sequence processing, with great success for a wide range of static tasks. However, the computational and memory footprint of their main component, i.e., the Scaled Dot-product Attention, is commonly overlooked. This makes their adoption in applications involving stream data processing with constraints in response latency, computational and memory resources infeasible. Some works have proposed methods to lower the computational cost of transformers, i.e. low-rank approximations, sparsity in attention, and efficient formulations for Continual Inference. In this paper, we introduce a new formulation of the Scaled Dot-product Attention based on the Nyström approximation that is suitable for Continual Inference. In experiments on Online Audio Classification and Online Action Detection tasks, the proposed Continual Scaled Dot-product Attention can lower the number of operations by up to three orders of magnitude compared to the original Transformers while retaining the predictive performance of competing models.
... Support Vector Regression (SVR) is an extension of Support Vector Machines (SVM) designed to address regression tasks. SVM is a powerful ML tool used for both classification and regression tasks [52]. It operates based on the principle of maximizing the margin of the separating hyperplane between different classes in a multidimensional space. ...
Article
Full-text available
Indoor localization of wireless nodes is a relevant task for wireless sensor networks with mobile nodes using mobile robots. Despite the fact that outdoor localization is successfully performed by Global Positioning System (GPS) technology, indoor environments face several challenges due to multipath signal propagation, reflections from walls and objects, along with noise and interference. This results in the need for the development of new localization techniques. In this paper, Long-Range Wide-Area Network (LoRaWAN) technology is employed to address localization problems. A novel approach is proposed, based on the preliminary division of the room into sectors using a Received Signal Strength Indicator (RSSI) fingerprinting technique combined with machine learning (ML). Among various ML methods, the Gated Recurrent Unit (GRU) model reached the most accurate results, achieving localization accuracies of 94.54%, 91.02%, and 85.12% across three scenarios with a division into 256 sectors. Analysis of the cumulative error distribution function revealed the average localization error of 0.384 m, while the mean absolute error reached 0.246 m. These results demonstrate that the proposed sectorization method effectively mitigates the effects of noise and nonlinear signal propagation, ensuring precise localization of mobile nodes indoors.
... To overcome the overfitting that arises due to the presence of outliers or noise in the training data, we use SVR to predict the price using the same dataset. SVR is effective in handling high-dimensional feature spaces and is robust to overfitting, making it a suitable choice for our data, given its characteristics [43,44,45]. After the prediction via both GPR and SVR, we combine both of the predictions by penalizing the prediction with higher error, which we have discussed in Section (6), and this improves the overall prediction. ...
Preprint
Full-text available
This paper presents a new hybrid model for predicting German electricity prices. The algorithm is based on combining Gaussian Process Regression (GPR) and Support Vector Regression (SVR). While GPR is a competent model for learning the stochastic pattern within the data and interpolation, its performance for out-of-sample data is not very promising. By choosing a suitable data-dependent covariance function, we can enhance the performance of GPR for the tested German hourly power prices. However, since the out-of-sample prediction depends on the training data, the prediction is vulnerable to noise and outliers. To overcome this issue, a separate prediction is made using SVR, which applies margin-based optimization, having an advantage in dealing with non-linear processes and outliers, since only certain necessary points (support vectors) in the training data are responsible for regression. Both individual predictions are later combined using the performance-based weight assignment method. A test on historic German power prices shows that this approach outperforms its chosen benchmarks such as the autoregressive exogenous model, the naive approach, as well as the long short-term memory approach of prediction.
... It presents a significant advantage due to its ability to accommodate intricate, non-linear associations between features and survival outcomes through the kernel trick [29]. Through this mechanism, a kernel function adeptly transforms input features into higher-dimensional spaces, enabling the depiction of survival via a hyperplane [30]. This versatility renders SSVMs highly adaptable and suitable for diverse datasets [31]. ...
Article
Full-text available
HIV remains a critical global health issue, with an estimated 39.9 million people living with the virus worldwide by the end of 2023 (according to WHO). Although the epidemic’s impact varies significantly across regions, Africa remains the most affected. In the past decade, considerable efforts have focused on developing preventive measures, such as vaccines and pre-exposure prophylaxis, to combat sexually transmitted HIV. Recently, cytokine profiles have gained attention as potential predictors of HIV incidence due to their involvement in immune regulation and inflammation, presenting new opportunities to enhance preventative strategies. However, the high-dimensional, time-varying nature of cytokine data collected in clinical research presents challenges for traditional statistical methods like the Cox proportional hazards (PH) model to effectively analyze survival data related to HIV. Machine learning (ML) survival models offer a robust alternative, especially for addressing the limitations of the PH model’s assumptions. In this study, we applied survival support vector machine (SSVM) and random survival forest (RSF) models using changes or means in cytokine levels as predictors to assess their association with HIV incidence, evaluate variable importance, measure predictive accuracy using the concordance index (C-index) and integrated Brier score (IBS) and interpret the model’s predictions using Shapley additive explanations (SHAP) values. Our results indicated that RSF models outperformed SSVM models, with the difference covariate model performing better than the mean covariate model. The highest C-index for SSVM was 0.7180 under the difference covariate model, while for RSF, it reached 0.8801 under the difference covariate model using the log-rank split rule. Key cytokines identified as positive predictors of HIV incidence included TNF-A, BASIC-FGF, IL-5, MCP-3, and EOTAXIN, while 29 cytokines were negative predictors. Baseline factors such as condom use frequency, treatment status, number of partners, and sexual activity also emerged as significant predictors. This study underscored the potential of cytokine profiles for predicting HIV incidence and highlighted the advantages of RSF models over SSVMs in analyzing high-dimensional, time-varying data. Through ablation studies, it further emphasized the importance of selecting key features within mean- and difference-based covariate models to achieve an optimal balance between model complexity and predictive accuracy.
... Traditional regressive machine learning methods such as linear regression, the support vector machine [48], neural networks [49], improvements of these methods [50,51], the ensemble method of bagged trees [52], random forests [53], and even more advanced and recent techniques such as the gradient-boosting machine (GBM) [54], XGBoost, and flexible EHD pumps [55] work as a black box, and their primary focus is on prediction. None of these methods provide a comprehensive optimal threshold for use as a book-in-advance parameter. ...
Article
Full-text available
Global corporations frequently grapple with a dilemma between fulfilling business needs and adhering to travel policies to mitigate excessive fare expenditures. This research examines the multifaceted nature of business travel, delving into its key characteristics and the inherent complexities faced by management in formulating effective policies. An optimal travel policy must both be practical to implement and contribute to budget optimization. The specific requirements of each company necessitate tailored policies; for instance, a manufacturing company with scheduled trips demands a distinct policy, unlike a consulting firm with unplanned travel. This study proposes a modified regression decision tree machine learning algorithm to incorporate the unique features of corporate travel policies. Our algorithm is designed to self-adjust based on the specific data of each individual company. The authors implement the proposed approach using travel data from a real-world company and conduct simulations in various scenarios, comparing the results with the industry standard. This research offers a machine-learning-based approach to determining the optimal advance booking policy for corporate travel.
... By incorporating Lagrange expansion and expressing the Karush-Kuhn-Tucker (KKT) terms, we derive a quadratic optimization program involving a unique linear restriction to be solved with the aim of identifying the support vectors [85,86]. To deal with the problem of sutured binding, a number of authors have introduced the concept of a soft boundary [87]. ...
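For reference, the quadratic program with a single linear constraint alluded to above is, in its standard soft-margin form (generic notation, not copied from the cited paper),

    \max_{\alpha}\; \sum_{i=1}^{m}\alpha_i \;-\; \frac{1}{2}\sum_{i,j=1}^{m}\alpha_i\alpha_j\, y_i y_j\, k(x_i, x_j)
    \quad \text{subject to} \quad 0 \le \alpha_i \le C, \qquad \sum_{i=1}^{m}\alpha_i y_i = 0,

where the support vectors are exactly the training points whose optimal α_i are nonzero, and the upper bound C implements the soft boundary mentioned in the excerpt.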
Article
Full-text available
Algorithms involving kernel functions, such as support vector machine (SVM), have attracted huge attention within the artificial learning communities. The performance of these algorithms is greatly influenced by outliers and the choice of kernel functions. This paper introduces a new version of SVM named Deep Decomposition Neural Network Fuzzy SVM (DDNN-FSVM). To this end, we consider an auto-encoder (AE) deep neural network with three layers: input, hidden, and output. Unusually, the AE’s hidden layer comprises a number of neurons greater than the dimension of the input samples, which guarantees linear data separation. The encoder operator is then introduced into the FSVM’s dual to map the training samples to high-dimension spaces. To learn the support vectors and autoencoder parameters, we introduce the loss function and regularization terms in the FSVM dual. To learn from large-scale data, we decompose the resulting model into three small-dimensional submodels using Lagrangian decomposition. To solve the resulting problems, we use SMO, ISDA, and SCG for optimization problems involving large-scale data. We demonstrate that the optimal values of the three submodels solved in parallel provide a good lower bound for the optimal value of the initial model. In addition, thanks to its use of fuzzy weights, DDNN-FSVM is resistant to outliers. Moreover, DDNN-FSVM simultaneously learns the appropriate kernel function and separation path. We tested DDNN-FSVM on several well-known digital and image datasets and compared it to well-known classifiers on the basis of accuracy, precision, f-measure, g-means, and recall. On average, DDNN-FSVM improved on the performance of the classic FSVM across all datasets and outperformed several well-known classifiers.
... Early works (Abu-Jbara et al., 2013; Jurgens et al., 2018) represent citation contexts by hand-engineered features and pre-trained word embeddings (e.g., GloVe (Pennington et al., 2014), ELMo (Peters et al., 2018)). These methods apply traditional classification models such as support vector machines (SVM) (Schölkopf and Smola, 2002) to predict citation intentions. Deep learning-based methods (Cohan et al., 2019) use word embeddings together with a bi-directional long short-term memory network (BiLSTM) (Hochreiter and Schmidhuber, 1997) to learn context representations end-to-end for CIC tasks. ...
... Every kernel function is associated with an implicit function φ : R^p → H which maps the input points into a generic feature space H, possibly of infinite dimensionality, through the expression k(x_i, x_j) = ⟨φ(x_i), φ(x_j)⟩. This relation allows the implicit computation of the dot products in the feature space by applying the kernel function to the input objects, without explicitly computing the mapping function φ [24]. ...
Article
Full-text available
Background Advances in high-throughput technologies have originated an ever-increasing availability of omics datasets. The integration of multiple heterogeneous data sources is currently an issue for biology and bioinformatics. Multiple kernel learning (MKL) has shown to be a flexible and valid approach to consider the diverse nature of multi-omics inputs, despite being an underused tool in genomic data mining. Results We provide novel MKL approaches based on different kernel fusion strategies. To learn from the meta-kernel of input kernels, we adapted unsupervised integration algorithms for supervised tasks with support vector machines. We also tested deep learning architectures for kernel fusion and classification. The results show that MKL-based models can outperform more complex, state-of-the-art, supervised multi-omics integrative approaches. Conclusion Multiple kernel learning offers a natural framework for predictive models in multi-omics data. It proved to provide a fast and reliable solution that can compete with and outperform more complex architectures. Our results offer a direction for bio-data mining research, biomarker discovery and further development of methods for heterogeneous data integration.
... Support vector regression (SVR) [28] is a supervised learning algorithm based on the support vector machine (SVM) [29], which is mainly used to deal with regression problems. Different from traditional linear regression, SVR maps the data into a high-dimensional space [30] and finds the best regression hyperplane in that space [31], thereby improving the generalization ability of the model. The process introduces the "ϵ-insensitive loss function" [32], which allows the model not to penalize prediction errors within a certain range, thereby improving the model's robustness to noise. ...
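For reference, the ε-insensitive loss mentioned in the excerpt and the resulting SVR primal problem take the following standard form (generic notation, with feature map ϕ and trade-off constant C, not copied from the cited paper):

    |y - f(x)|_{\varepsilon} \;=\; \max\{0,\; |y - f(x)| - \varepsilon\},

    \min_{w,\,b,\,\xi,\,\xi^{*}} \;\; \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{m}\bigl(\xi_i + \xi_i^{*}\bigr)
    \quad \text{s.t.} \quad
    y_i - \langle w, \phi(x_i)\rangle - b \le \varepsilon + \xi_i, \;\;
    \langle w, \phi(x_i)\rangle + b - y_i \le \varepsilon + \xi_i^{*}, \;\;
    \xi_i,\, \xi_i^{*} \ge 0,

so errors smaller than ε incur no penalty, which is precisely the robustness-to-noise property described above.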
Article
Full-text available
In recent years, since global warming and human activities have contributed to massive coral bleaching events, it is important to identify the causes and predict the rate of coral bleaching in order to mitigate the impact and decelerate the bleaching rate. The study focused on analyzing a coral bleaching database from 1980 to 2020, revealing that sea surface temperature anomaly (SSTA) and temperature cumulative thermal stress (TSA_DHW) are the major contributors to coral bleaching. In addition, climatic factors such as wind speed and cyclone frequency also contribute to coral bleaching. Based on principal component analysis (PCA) and a random forest regressor, dominant influencing factors are utilized in training a multi-layer long short-term memory RNN (LSTM), support vector regression (SVR), and a stacking regressor model, establishing models that predict coral bleaching percentage. Eventually, mean absolute error (MAE), root mean square error (RMSE), and the coefficient of determination (R2) are used to evaluate the accuracy of the models, revealing that the stacking regression model yielded the most accurate and steady predictions of coral bleaching percentage compared with the other models.
... By substituting in Equation (7), the dual in (6) is transformed into (8) [29]. ...
Article
Full-text available
Support vector machine (SVM) models apply the Karush–Kuhn–Tucker (KKT-OC) optimality conditions in the ordinary derivative to the primal optimisation problem, which has a major influence on the weights associated with the dissimilarity between the selected support vectors and subsequently on the quality of the model’s predictions. Recognising the capacity of fractional derivatives to provide machine learning models with more memory through more microscopic differentiations, in this paper we generalise KKT-OC based on ordinary derivatives to KKT-OC using fractional derivatives (Frac-KKT-OC). To mitigate the impact of noise and identify support vectors from noise, we apply the Frac-KKT-OC method to the fuzzy intuitionistic version of SVM (IFSVM). The fractional fuzzy intuitionistic SVM model (Frac-IFSVM) is then evaluated on six sets of data from the UCI and used to predict the sentiments embedded in tweets posted by people with diabetes. Taking into account four performance measures (sensitivity, specificity, F-measure, and G-mean), the Frac-IFSVM version outperforms SVM, FSVM, IFSVM, Frac-SVM, and Frac-FSVM.
... These methods, such as support vector machines [28] and Gaussian processes [29,30], allow researchers to understand the basis of the model's decisions. Furthermore, the kernel regression model [31] is widely employed in certain regression problems due to its solid theoretical foundation derived from rigorous mathematical reasoning. Bayesian methods, in particular, have found numerous applications in the field of bridge health monitoring. ...
Article
Full-text available
The strain data from health monitoring systems of large-span bridges is influenced by various load effects, with the extraction and forecasting of temperature-induced strain being particularly significant for precise analysis and early warning of monitoring data. This paper presents a forecasting framework for temperature-induced strain in large-span bridges, employing a deep kernel regression (DKR) approach that integrates deep learning with Bayesian regression to enhance accuracy and certainty. Initially, this paper addresses the influence of additional response increment induced by vehicle strain effects and employs a robust data smoothing algorithm to extract temperature-induced effect components from measured strain data offline. Subsequently, a DKR model is proposed, integrating a long short-term memory (LSTM) layer with a fully connected layer. The output of the deep learning module serves as the kernel function parameter for the Gaussian process regression (GPR) module, and the GPR module with updated hyperparameters is used for time series forecasting. This method effectively extracts and utilizes time series features from historical data alongside key environmental factors, enabling real-time forecasting of strain effects and significantly improving the performance and utility of health monitoring systems in large-span bridges. Compared to commonly used time series forecasting algorithms, the algorithm proposed in this paper exhibits significantly improved accuracy, stability, and certainty. Through comparing the inference time, it was verified that the algorithm can meet the performance requirements of real-time inference, which underscores the model's potential as a robust tool in bridge structural health monitoring.
... However, while RKHSs exhibit function space optimality (Scholkopf and Smola, 2001), where the optimal solution to Eq. ...
Preprint
Full-text available
Recent advances in machine learning have led to increased interest in reproducing kernel Banach spaces (RKBS) as a more general framework that extends beyond reproducing kernel Hilbert spaces (RKHS). These works have resulted in the formulation of representer theorems under several regularized learning schemes. However, little is known about an optimization method that encompasses these results in this setting. This paper addresses a learning problem on Banach spaces endowed with a reproducing kernel, focusing on efficient optimization within RKBS. To tackle this challenge, we propose an algorithm based on mirror descent (MDA). Our approach involves an iterative method that employs gradient steps in the dual space of the Banach space using the reproducing kernel. We analyze the convergence properties of our algorithm under various assumptions and establish two types of results: first, we identify conditions under which a linear convergence rate is achievable, akin to optimization in the Euclidean setting, and provide a proof of the linear rate; second, we demonstrate a standard convergence rate in a constrained setting. Moreover, to instantiate this algorithm in practice, we introduce a novel family of RKBSs with p-norm (p ≠ 2), characterized by both an explicit dual map and a kernel.
... Kernel methodology is a highly popular framework for practical approaches to different research fields. It applies to vectorial as well as non-vectorial data, and it has the ability to use efficient linear algorithms in the study of non-linear relationships (Schölkopf and Smola 2018). The underlying idea of the method is to project the data from the original data space to a Hilbert space generated by a given kernel function. ...
Article
Full-text available
The nonparametric multivariate analysis of variance (NPMANOVA) testing procedure has been proven to be a valuable tool for comparing groups. In the present paper, we propose a kernel extension of this technique in order to effectively confront high-dimensionality, a recurrent problem in many fields of science. The new method is called kernel multivariate analysis of variance (KMANOVA). The basic idea is to take advantage of the kernel framework: we propose to project the data from the original data space to a Hilbert space generated by a given kernel function and then perform the NPMANOVA method in the reproducing kernel Hilbert space (RKHS). Dispersion of the embedded points can be measured by the distance induced by the inner product in the RKHS but also by many other distances best suited in high-dimensional settings. For this purpose, we study two promising distances: a Manhattan-type distance and a distance based on an orthogonal projection of the embedded points in the direction of the group centroids. We show that the NPMANOVA method and the KMANOVA method with the induced distance are essentially equivalent. We also show that the KMANOVA method with the other two distances performs considerably better than the NPMANOVA method. We illustrate the advantages of our approach in the context of genetic association studies and demonstrate its usefulness on Alzheimer’s disease data. We also provide a software implementation of the method that is available on GitHub https://github.com/8699vicente/Kmanova .
... The kernel trick in machine learning provides a way to bypass the linearity. [28] It kernelizes a learning algorithm by substituting all inner products in the algorithm with kernel functions, e.g., the Gaussian kernel. [29] Since many online algorithms can be kernelized, they are able to employ a discriminant based on a nonlinear hypothesis (or multiple nonlinear hypotheses) composed of kernel functions to reach a satisfying prediction performance. ...
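A classic instance of the kernelization described above is the dual (kernel) perceptron, in which every inner product of the linear online algorithm is replaced by a kernel evaluation. The sketch below is illustrative only (RBF kernel and a toy XOR dataset of my choosing); it also shows why such models can grow without bound, since one dual coefficient is kept per training point, which is the issue the budgeted approach addresses.

    import numpy as np

    def rbf(u, v, gamma=1.0):
        return np.exp(-gamma * np.sum((u - v) ** 2))

    def kernel_perceptron(X, y, kernel=rbf, epochs=5):
        # Dual (kernelized) perceptron: the hypothesis is a kernel expansion
        # f(x) = sum_i alpha_i * y_i * k(x_i, x), so the linear perceptron's
        # inner products are replaced by kernel evaluations.
        n = len(X)
        alpha = np.zeros(n)
        K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
        for _ in range(epochs):
            for i in range(n):
                margin = y[i] * np.sum(alpha * y * K[:, i])
                if margin <= 0:              # mistake-driven update
                    alpha[i] += 1.0
        return alpha

    # Toy data with a non-linear decision boundary (XOR pattern).
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    y = np.array([-1., 1., 1., -1.])
    alpha = kernel_perceptron(X, y)
    f = lambda x: np.sign(sum(a * yi * rbf(xi, x) for a, yi, xi in zip(alpha, y, X)))
    print([f(x) for x in X])                 # reproduces the XOR labels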
Article
Full-text available
Online learning aims to solve a sequence of consecutive prediction tasks by leveraging the knowledge gained from previous tasks. Linearized confidence‐weighted (LCW) learning is the first online learning algorithm introducing the concept of weight confidence into the prediction model through distributions over weights. It provides the flexibility for weights to update their values at different scales. The kernel trick in machine learning can be applied to LCW for a better prediction performance. However, the kernel‐based LCW algorithm is subject to the curse of kernelization which makes it vulnerable to the unlimited growth of the prediction model in runtime and memory consumption. In this study, we present the budgeted LCW (BLCW) algorithm which puts a limit on the growth by a predefined budget with optimization. Consequently, BLCW performs the LCW update and then reduces the information loss by projection. Based on the resource perspective that reinterprets LCW in terms of resources and utilization degrees, we demonstrated that BLCW approximates the kernel‐based LCW algorithm. We evaluate four budget maintenance strategies and suggest that the mean removal is the most stable. By various numerical experiments on real datasets, we demonstrate that BLCW performs competitively and effectively when compared to leading budgeted online algorithms.
... Random Forests mitigate overfitting by averaging predictions from multiple trees, resulting in robust performance (Breiman, 2001;Fernández-Delgado et al., 2014). SVMs, although computationally expensive, are powerful for high-dimensional data (Schölkopf & Smola, 2002). ...
Article
Full-text available
Efficient data preparation is critical for building robust machine learning models that deliver reliable analysis and decision-making. This paper presents a comprehensive machine learning workflow applied to the UCI Adult Income dataset, aiming to predict whether an individual earns more than $50K per year. The study explores key data preprocessing techniques, including handling missing values, scaling numerical features, and encoding categorical variables. Three machine learning models: Logistic Regression, Decision Tree, and Random Forest were trained and evaluated. Results show that Random Forest achieved an accuracy of 86% and an F1-score of 0.91, demonstrating superior classification performance. Key metrics such as accuracy, precision, recall, and F1-score were used to assess model effectiveness. This research emphasizes the importance of efficient data preparation in ensuring robust machine learning analysis, especially when addressing real-world challenges like those presented by the UCI Adult Income dataset. Future work aims to investigate advanced feature engineering techniques and ensemble models to further enhance classification performance.
... An inner product space is a (possibly infinite-dimensional) real vector space I with a symmetric bilinear form ⟨·, ·⟩_I : I × I → R satisfying ⟨x, x⟩_I > 0 for all non-zero x ∈ I. It is not very hard to prove that a symmetric function K : X × X → R is a kernel function if and only if there is a vector embedding η : X → I of X into some inner product space I, which we can even take to be a Hilbert space, such that K is the mapping induced by ⟨·, ·⟩_I, that is, K(x, y) = ⟨η(x), η(y)⟩_I for all x, y ∈ X (see [49, Lemma 16.2] for a proof). For background on kernels and specifically graph kernels, see [36,47]. ...
Preprint
We give an overview of different approaches to measuring the similarity of, or the distance between, two graphs, highlighting connections between these approaches. We also discuss the complexity of computing the distances.
... SuperAdam (Huang et al., 2021), an adaptive gradient algorithm that integrates variance reduction with AdamW to achieve improved convergence rates. However, these variance-reduced adaptive gradient methods have primarily been validated on basic computer vision tasks, such as MNIST (Schölkopf and Smola, 2002) and CIFAR-10 (Krizhevsky et al., 2009), and simple natural language modeling tasks, like SWB-300 (Saon et al., 2017), using straightforward architectures such as LeNet (LeCun et al., 1998), ResNet-32 (He et al., 2016), 2-layer LSTMs (Graves and Graves, 2012), and 2-layer Transformers (Vaswani, 2017). As a result, a significant gap remains in the successful application of variance reduction techniques to adaptive gradient methods, particularly in the rapidly evolving domain of large language models. ...
Preprint
Training deep neural networks--and more recently, large models--demands efficient and scalable optimizers. Adaptive gradient algorithms like Adam, AdamW, and their variants have been central to this task. Despite the development of numerous variance reduction algorithms in the past decade aimed at accelerating stochastic optimization in both convex and nonconvex settings, variance reduction has not found widespread success in training deep neural networks or large language models. Consequently, it has remained a less favored approach in modern AI. In this paper, to unleash the power of variance reduction for efficient training of large models, we propose a unified optimization framework, MARS (Make vAriance Reduction Shine), which reconciles preconditioned gradient methods with variance reduction via a scaled stochastic recursive momentum technique. Within our framework, we introduce three instances of MARS that leverage preconditioned gradient updates based on AdamW, Lion, and Shampoo, respectively. We also draw a connection between our algorithms and existing optimizers. Experimental results on training GPT-2 models indicate that MARS consistently outperforms AdamW by a large margin.
... Similarly, feature extraction is considered a crucial step for precise classification. Many traditional approaches have been proposed for land use classification using HSI, including support vector machines [13], K-nearest neighbours [14], and Bayesian estimation [15]. These approaches provide the foundation for machine learning algorithms to be used for land use classification utilising HSI. ...
Article
Full-text available
Recent advances in deep learning for hyperspectral image (HSI) classification have shown exceptional performance in resource management and environmental planning through land use classification. Despite these successes, challenges continue to persist in land use classification due to the complex topology of natural and man-made structures. The uneven distribution of land cover introduces spectral-spatial variability, causing inter- and intra-class similarity. To address this issue, this study adopts a hybrid approach that combines convolutional neural networks (CNNs) and a transformer model. The technique comprises three key components: a spectral-spatial convolutional module (SSCM), a spatial attention module (SAM), and a transformer module. Each component facilitates the others in the process of classification. SSCM is used to extract shallow features with the help of dilated convolutional layers, while the SAM enhances spatial features for further processing. Additionally, a transformer module with a local neighborhood attention mechanism is employed to extract local semantic information. Several experiments conducted on the Indian Pines and Pavia University hyperspectral datasets validate the performance of the proposed technique, demonstrating higher classification accuracy compared to recent methods in the literature. The technique achieves average accuracies of 97.24% and 99.33% on the Indian Pines and Pavia University datasets, respectively, thus demonstrating its effectiveness for land resource management and environmental planning.
... Readers will need familiarity with the basic principles of kernel methods and gaussian processes in machine learning (see e.g. [31,36]) to follow along with certain results in this section. ...
Preprint
Full-text available
Neural responses encode information that is useful for a variety of downstream tasks. A common approach to understand these systems is to build regression models or "decoders" that reconstruct features of the stimulus from neural responses. Popular neural network similarity measures like centered kernel alignment (CKA), canonical correlation analysis (CCA), and Procrustes shape distance do not explicitly leverage this perspective and instead highlight geometric invariances to orthogonal or affine transformations when comparing representations. Here, we show that many of these measures can, in fact, be equivalently motivated from a decoding perspective. Specifically, measures like CKA and CCA quantify the average alignment between optimal linear readouts across a distribution of decoding tasks. We also show that the Procrustes shape distance upper bounds the distance between optimal linear readouts and that the converse holds for representations with low participation ratio. Overall, our work demonstrates a tight link between the geometry of neural representations and the ability to linearly decode information. This perspective suggests new ways of measuring similarity between neural systems and also provides novel, unifying interpretations of existing measures.
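The following NumPy sketch shows the standard linear CKA computation between two response matrices, one of the similarity measures discussed above. The toy data and the linear relationship between the two simulated systems are assumptions for illustration.

```python
import numpy as np

def linear_cka(X, Y):
    # Linear centered kernel alignment between two response matrices
    # (rows = stimuli/conditions, columns = neurons/units).
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, 'fro') ** 2
    den = np.linalg.norm(X.T @ X, 'fro') * np.linalg.norm(Y.T @ Y, 'fro')
    return num / den

rng = np.random.default_rng(1)
A = rng.normal(size=(100, 30))
# B is a linearly transformed, slightly noisy copy of A (assumption for the demo).
B = A @ rng.normal(size=(30, 30)) + 0.1 * rng.normal(size=(100, 30))
print(linear_cka(A, B), linear_cka(A, rng.normal(size=(100, 30))))
```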
... Furthermore, the feature mapping of the kernel function should have an explicit expression. Considering that the polynomial kernel satisfies these desired properties, the feature mapping of an inhomogeneous polynomial kernel in the following form [32] is selected here: k(X, Y) = (⟨X, Y⟩ + c)^d, c > 0 (11), where the vector X is composed of data points ...
Preprint
Reducing the scanning time of very-low field (VLF) magnetic resonance imaging (MRI) scanners, commonly employed for stroke diagnosis, can enhance patient comfort and operational efficiency. The conventional parallel imaging (PI) technique for high-field MRI must be tailored to apply here, considering the differences in the direction of the main magnetic field and the presence of noise. A VLF-specific PI algorithm and phased-array coil are proposed, marking the first application of PI in VLF MRI. Reconstruction quality is enhanced by denoising undersampled k-space data using a linear-prediction based Kalman filter. Subsequently, the denoised k-space data are nonlinearly mapped from the original space onto a high-dimensional feature space, utilizing a nonlinear frame defined by a polynomial feature mapping. Frame parameters are calculated using auto-calibration signals (ACS) from the center of k-space, and missing phase-encoding lines in the original space are estimated using acquired lines in the feature space. An 8-channel phased-array coil, designed for a vertical main magnetic field, is decoupled using geometric overlap and a low input impedance (LII) preamplifier. Healthy volunteer head imaging experiments using the proposed PI technique exhibit the lowest mean-squared-error (MSE) value and the highest peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) values compared to two widely used PI methods. The proposed PI technique enables the VLF MRI scanner to achieve similar image quality and a 72.5% improvement in signal-to-noise ratio (SNR) compared to fully sampled images while requiring less than 50% of the scan time. We present a PI technique tailored to VLF MRI scanners for the first time, along with potential research directions to achieve a greater reduction factor.
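To illustrate the explicit feature mapping of an inhomogeneous polynomial kernel mentioned in the excerpt above, the following sketch verifies that, for degree 2 and offset c, the explicit map reproduces the kernel value. The degree, the offset c = 1, and the toy vectors are assumptions for illustration and need not match Eq. (11) of the paper.

```python
import numpy as np

def poly2_feature_map(x, c=1.0):
    # Explicit feature map for the degree-2 inhomogeneous polynomial kernel
    # k(x, y) = (x . y + c)^2: all pairwise products, scaled linear terms,
    # and a constant, so that phi(x) . phi(y) equals k(x, y).
    pairwise = np.outer(x, x).ravel()
    linear = np.sqrt(2.0 * c) * x
    return np.concatenate([pairwise, linear, [c]])

x = np.array([0.5, -1.0, 2.0])
y = np.array([1.5, 0.3, -0.7])
kernel_val = (x @ y + 1.0) ** 2
explicit_val = poly2_feature_map(x) @ poly2_feature_map(y)
print(np.isclose(kernel_val, explicit_val))  # True
```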
... Support Vector Machines (SVM) are ML algorithms that can be used for classification, regression and clustering. [35][36][37] In the case of two-class classification, the fundamental goal of an SVM is to establish a hyperplane that maximises the margin of separation between the two classes, taking noise and outliers into account. By using kernel functions, SVMs can produce non-linear decision boundaries and, with a suitable kernel, can also handle non-vectorial data. ...
Article
Full-text available
Objective The objective of this study was to assess the predictability of admissions to a mental health (MH) inpatient ward using ML models, based on routine data collected during triage in emergency departments (EDs). This research sought to identify the most effective ML model for this purpose while considering the practical implications of model interpretability for clinical use. Methods The study utilised existing data from January 2016 to December 2021. After data pre-processing, an exploratory analysis revealed the non-linear nature of the dataset. Six different ML models were tested: Random Forest, XGBoost, CatBoost, k-Nearest Neighbours (kNN), Explainable Boosting Machine (EBM) using InterpretML, and Support Vector Machine using Support Vector Classification (SVC). The performance of these models was evaluated using various metrics including the Matthews Correlation Coefficient (MCC). Results Among the models evaluated, the CatBoost model achieved the highest MCC score of 0.1952, demonstrating superior balanced accuracy and predictive power, particularly in correctly identifying positive cases. The InterpretML model also performed well, with an MCC score of 0.1914. While CatBoost showed strong predictive capabilities, its complexity poses challenges for clinical interpretation. Conversely, the InterpretML model, though slightly less powerful, offers better transparency and is more practical for clinical use. Conclusion The findings suggest that the CatBoost model is a compelling choice for scenarios prioritising the detection of positive cases. However, the InterpretML model's ease of interpretation makes it more suitable for clinical application. Integrating explanation methods like SHAP with non-linear models could enhance model transparency and foster clinician trust. Further research is recommended to refine non-linear models within decision support systems, explore multi-source data integration, understand clinician attitudes towards ML, and develop real-time data collection systems. This study highlights the potential of ML in predicting MH admissions from ED data while stressing the importance of interpretability, ethical considerations, and ongoing validation for successful clinical implementation.
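Relating the kernel-SVM description in the excerpt to the evaluation used in this study, the following scikit-learn sketch fits an RBF-kernel SVC on synthetic imbalanced data and scores it with the Matthews Correlation Coefficient. The synthetic data and hyperparameters are assumptions for illustration, not the study's actual pipeline.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import matthews_corrcoef

# Synthetic stand-in for tabular triage data (assumption); the real study used routine ED data.
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9, 0.1],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# RBF-kernel SVC: the kernel implicitly maps inputs to a nonlinear feature space,
# so the maximum-margin hyperplane there is a nonlinear boundary in input space.
clf = SVC(kernel="rbf", C=1.0, gamma="scale", class_weight="balanced")
clf.fit(X_tr, y_tr)
print("MCC:", matthews_corrcoef(y_te, clf.predict(X_te)))
```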
... Kernels provide a computationally efficient and mathematically tractable method of extending linear methods into nonlinear ones. This work will provide a brief overview of kernels; for a more comprehensive examination, please refer to [43][44][45]. ...
Article
Full-text available
Modeling the network topology of the human brain within the mesoscale has become an increasing focus within the neuroscientific community due to its variation across diverse cognitive processes, in the presence of neuropsychiatric disease or injury, and over the lifespan. Much research has been done on the creation of algorithms to detect these mesoscopic structures, called communities or modules, but less has been done to conduct inference on these structures. The literature on analysis of these community detection algorithms has focused on comparing them within the same subject. These approaches, however, either do not accommodate a more general association between community structure and an outcome or cannot accommodate additional covariates that may confound the association of interest. We propose a semiparametric kernel machine regression model for either a continuous or binary outcome, where covariate effects are modeled parametrically and brain connectivity measures are modeled nonparametrically. By incorporating notions of similarity between network community structures into a kernel distance function, the high-dimensional feature space of brain networks, defined on input pairs, can be generalized to non-linear spaces, allowing for a wider class of distance-based algorithms. We evaluate our proposed methodology on both simulated and real datasets.
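A minimal sketch of the nonparametric part of such a kernel machine regression: pairwise distances between (placeholder) structures are turned into a Gaussian kernel and passed to kernel ridge regression with a precomputed kernel. The placeholder structures, bandwidth, and outcome model are assumptions, and the parametric covariate component of the semiparametric model is omitted.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

# Illustrative only: suppose dist[i, j] is some distance between the community
# structures of subjects i and j (e.g., one minus a partition-similarity index).
rng = np.random.default_rng(2)
n = 50
Z = rng.normal(size=(n, 5))                    # placeholder "structures" (assumption)
dist = np.linalg.norm(Z[:, None] - Z[None, :], axis=-1)
K = np.exp(-dist ** 2 / (2 * np.median(dist) ** 2))   # Gaussian kernel on distances

y = np.sin(Z[:, 0]) + 0.1 * rng.normal(size=n)  # toy outcome
model = KernelRidge(alpha=1.0, kernel="precomputed")
model.fit(K, y)
print(model.predict(K)[:3])
```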
Article
The randomly pivoted Cholesky algorithm (RPCholesky) computes a factorized rank-k approximation of an N × N positive-semidefinite (psd) matrix. RPCholesky requires only (k + 1)N entry evaluations and O(k²N) additional arithmetic operations, and it can be implemented with just a few lines of code. The method is particularly useful for approximating a kernel matrix. This paper offers a thorough new investigation of the empirical and theoretical behavior of this fundamental algorithm. For matrix approximation problems that arise in scientific machine learning, experiments show that RPCholesky matches or beats the performance of alternative algorithms. Moreover, RPCholesky provably returns low-rank approximations that are nearly optimal. The simplicity, effectiveness, and robustness of RPCholesky strongly support its use in scientific computing and machine learning applications.
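Following the description above, here is a compact NumPy sketch of randomly pivoted Cholesky applied to an RBF kernel matrix; the test matrix and the chosen rank are assumptions for illustration.

```python
import numpy as np

def rp_cholesky(A, k, rng=None):
    # Randomly pivoted Cholesky: pick each pivot with probability proportional
    # to the current diagonal residual, then do one Cholesky step.
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    F = np.zeros((n, k))
    d = np.diag(A).astype(float).copy()
    for i in range(k):
        s = rng.choice(n, p=d / d.sum())        # sample pivot from diagonal residual
        g = A[:, s] - F[:, :i] @ F[s, :i]       # residual column at the pivot
        F[:, i] = g / np.sqrt(g[s])
        d = np.clip(d - F[:, i] ** 2, 0.0, None)
    return F  # A is approximated by F @ F.T

# Example: low-rank approximation of an RBF kernel matrix (assumed test problem).
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 5))
sq = ((X[:, None] - X[None, :]) ** 2).sum(-1)
A = np.exp(-sq / 2)
F = rp_cholesky(A, k=30, rng=0)
print(np.linalg.norm(A - F @ F.T) / np.linalg.norm(A))
```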
Conference Paper
Heart failure and heart attack are serious cardiovascular diseases that are responsible for a significant number of deaths worldwide. Early detection and accurate prediction of these diseases can be challenging, but machine learning models offer a promising approach to improve diagnosis and treatment. In recent years, there has been growing interest in using machine learning models to predict heart failure and heart attack. These models use various types of data, such as patient demographics, medical history, vital signs, and laboratory tests, to identify patterns and predict the risk of disease. Some of the commonly used machine learning algorithms for this task include logistic regression, decision trees, random forests, support vector machines, and neural networks. The use of machine learning models for this purpose has the potential to improve patient outcomes by enabling earlier diagnosis and targeted treatment, leading to better management of cardiovascular diseases and ultimately reducing the burden of these diseases on healthcare systems.
Article
Full-text available
Universal Numerical Integrators (UNIs) can be defined as the coupling of a universal approximator of functions (e.g., an artificial neural network) with some conventional numerical integrator (e.g., Euler or Runge–Kutta). UNIs are used to model non-linear dynamic systems governed by Ordinary Differential Equations (ODEs). Among the main types of UNIs in the literature, we can mention (i) the Euler-Type Universal Numerical Integrator (E-TUNI), (ii) the Runge-Kutta Neural Network (RKNN), and (iii) the Non-linear Auto Regressive Moving Average with Exogenous input (NARMAX) model. All of them are equally accurate, regardless of their order. Furthermore, one of the reasons for writing this article is to show the reader that there are many other UNIs besides these. Thus, this article aims to carry out a detailed bibliographic review of this object of study, with particular attention to the qualitative aspects of these UNIs. Computational experiments are also presented in this article to demonstrate the numerical effectiveness of the main types of UNIs in the literature. Therefore, it is expected that this paper will help researchers in the future development of new UNIs.
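A minimal sketch of the E-TUNI idea: a small (untrained) neural network stands in for the universal approximator of the unknown vector field, and a forward Euler step turns it into a discrete-time dynamic model. The network size, step size, and random weights are assumptions for illustration, and the training procedure is omitted.

```python
import numpy as np

# Tiny one-hidden-layer network as the universal approximator (random, untrained).
rng = np.random.default_rng(4)
W1, b1 = rng.normal(size=(16, 2)) * 0.5, np.zeros(16)
W2, b2 = rng.normal(size=(2, 16)) * 0.5, np.zeros(2)

def net(x):
    # Stand-in for the learned vector field f(x) of the ODE dx/dt = f(x).
    return W2 @ np.tanh(W1 @ x + b1) + b2

def euler_tuni_step(x, h=0.01):
    # Euler-type UNI: x_{k+1} = x_k + h * net(x_k).
    return x + h * net(x)

x = np.array([1.0, 0.0])
trajectory = [x]
for _ in range(500):          # roll the (here: random) learned dynamics forward
    x = euler_tuni_step(x)
    trajectory.append(x)
print(np.array(trajectory).shape)  # (501, 2)
```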
Article
Full-text available
Liquid metal (LM) technologies are rapidly advancing in modern materials science, with low melting point metals playing a pivotal role in emerging applications. Recent studies reveal that doped liquid gallium systems form spectacular and diverse surface structures during cooling, [Tang et al., Nat. Nanotechnol., 2021, 16, 431-439] sparking renewed interest in the possible geometric structuring at the surface of pure liquid gallium. Distinct from the known increase in surface density, this lateral surface order has long been hinted at experimentally and theoretically but has remained enigmatic. Here, we quantitatively characterise the depth and nature of this surface ordering for the first time, using highly accurate and large scale molecular dynamics simulations coupled with machine learning analysis techniques. We also quantify the enhanced structural order introduced by the addition of a gallium oxide film as well as the disruption due to a dopant (bismuth).
Preprint
Adaptive causal representation learning from observational data is presented, integrated with an efficient sample splitting technique within the semiparametric estimating equation framework. The support points sample splitting (SPSS), a subsampling method based on energy distance, is employed for efficient double machine learning (DML) in causal inference. The support points are selected as optimal representative points of the full raw data and then split, in contrast to traditional random splitting, providing an optimal sub-representation of the underlying data-generating distribution. They offer the best representation of a full, large dataset, whereas traditional random data splitting is unlikely to preserve the structural information of the underlying distribution. Three machine learning estimators were adopted for causal inference: support vector machine (SVM), deep learning (DL), and a hybrid super learner (SL) with deep learning (SDL), all using SPSS. A comparative study is conducted between the proposed SVM, DL, and SDL representations using SPSS, and the benchmark results from Chernozhukov et al. (2018), which employed random forest, neural network, and regression trees with a random k-fold cross-fitting technique on the 401(k)-pension plan real data. The simulations show that DL with SPSS and the hybrid methods of DL and SL with SPSS outperform SVM with SPSS in terms of computational efficiency and the estimation quality, respectively.
Article
Short-term metro passenger flow prediction plays an important role in traffic planning and management, and it is an important prerequisite for achieving intelligent transportation. Thus, a novel hybrid Support Vector Regression (SVR) model based on Twice Clustering (TC) is proposed for short-term metro passenger flow prediction. The training sets and test sets are generated by TC with respect to values of passenger flow in different time periods to improve the prediction accuracy. Furthermore, each obtained cluster is decomposed by using the Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) algorithm and the Ensemble Empirical Mode Decomposition (EEMD) algorithm, respectively. The volatility of each component obtained after decomposition is further reduced. Then, the SVR model optimized by the Grey Wolf Optimization (GWO) algorithm is used to predict the decomposed components. Moreover, forecasts are made based on one month of data from Xi’an Metro Line 2 Library Station (China). A comparison of the prediction results under the TC, Once Clustering (OC), and non-clustering conditions shows that the TC approach adequately models the volatility and effectively improves the prediction accuracy. At the same time, experimental results show that the novel hybrid TC–CEEMDAN–GWO–SVR model outperforms the Genetic Algorithm (GA)-optimized SVR (SVR–GA) model and the hybrid Back Propagation Neural Network (BPNN) model.
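A sketch of the decompose-then-predict structure described above: each component of a toy passenger-flow series is forecast by an SVR on lagged values, and the component forecasts are summed. A simple trend/residual split stands in for the CEEMDAN/EEMD decomposition, and the fixed SVR hyperparameters (which GWO would tune) are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import SVR

# Toy passenger-flow series with a daily-like cycle plus noise (assumption).
rng = np.random.default_rng(5)
t = np.arange(400)
flow = 50 + 10 * np.sin(2 * np.pi * t / 48) + rng.normal(scale=2, size=t.size)

# Placeholder decomposition: moving-average trend plus residual,
# standing in for the CEEMDAN/EEMD intrinsic mode functions.
trend = np.convolve(flow, np.ones(12) / 12, mode="same")
residual = flow - trend
lags = 6

def lagged(x):
    # Build a lagged design matrix: predict x[k] from the previous `lags` values.
    X = np.column_stack([x[i:len(x) - lags + i] for i in range(lags)])
    return X, x[lags:]

pred_total = np.zeros(len(flow) - lags)
for component in (trend, residual):
    X, y = lagged(component)
    model = SVR(kernel="rbf", C=10.0)   # GWO would tune C/gamma in the real pipeline
    model.fit(X, y)
    pred_total += model.predict(X)
print(np.mean(np.abs(pred_total - flow[lags:])))  # in-sample MAE of summed forecasts
```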
Article
Full-text available
Vehicular Ad Hoc Networks (VANETs) represent a pivotal element in modern intelligent transportation systems, providing the foundation for vehicle-to-vehicle and vehicle-to-infrastructure communication. Ensuring reliable and stable routing within these networks is paramount for enhancing road safety, traffic management, and the overall efficiency of transportation systems. This paper explores an innovative approach to improving the dynamic behavior of VANETs by integrating game theory and machine learning techniques. In this research, game theory is utilized to model the interactions between vehicles as a strategic game, where each vehicle aims to optimize its routing decisions based on the behavior of other network participants. By applying concepts such as Nash equilibrium, we analyze and predict the optimal strategies for vehicles under various traffic conditions. Concurrently, machine learning algorithms are employed to adaptively learn from the network environment, allowing for real-time adjustments to routing strategies based on historical data and current network states. The proposed methodology involves the development of a hybrid framework that leverages game-theoretic models to determine optimal routing strategies and machine learning techniques to enhance these strategies through continuous learning and adaptation. Specifically, reinforcement learning algorithms are integrated to dynamically adjust routing decisions, providing a robust mechanism to handle the inherent variability and unpredictability of VANETs. Simulation results demonstrate that the integration of game theory and machine learning significantly improves the reliability and stability of routing in VANETs. The hybrid approach not only reduces packet loss and end-to-end delay but also enhances overall network throughput. Additionally, the adaptability of the proposed system ensures its effectiveness in diverse and rapidly changing traffic scenarios. This study contributes to the field by presenting a comprehensive solution that addresses the challenges of dynamic behavior in VANETs through a synergistic application of game theory and machine learning. The findings have the potential to significantly advance the development of intelligent transportation systems, providing a foundation for future research and practical implementations aimed at achieving safer and more efficient vehicular communication networks.
Chapter
We give an overview of different approaches to measuring the similarity of, or the distance between, two graphs, highlighting connections between these approaches. We also discuss the complexity of computing the distances.