Source publication
This paper introduces a new learning paradigm, called Learning Using Statistical Invariants (LUSI), which is different from the classical one. In a classical paradigm, the learning machine constructs a classification rule that minimizes the probability of expected error; it is a data-driven model of learning. In the LUSI paradigm, in order to constru...
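As a hedged illustration of what such a statistical invariant looks like (our notation, not a quotation from the paper): for a chosen predicate ψ(x), LUSI restricts the admissible functions f to those that preserve the corresponding empirical invariant,

\[
\frac{1}{\ell}\sum_{i=1}^{\ell} \psi(x_i)\, f(x_i) \;\approx\; \frac{1}{\ell}\sum_{i=1}^{\ell} \psi(x_i)\, y_i ,
\]

and then minimizes the error criterion within this restricted set; several predicates can be imposed simultaneously, one such constraint per invariant.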
Similar publications
In many real-time applications, we often have to deal with classification, regression, or clustering problems that involve multiple tasks. The conventional machine learning approaches solve these tasks independently, ignoring the task relatedness. In multi-task learning (MTL), these related tasks are learned simultaneously by extracting and utili...
Citations
... Hence, high-complexity architecture designs do not ensure sufficient explanatory and predictive capacities [86,87]. Accordingly, statistical learning theory suggests that learning with low-complexity models should be preferred [88][89][90][91]. Therefore, this research aimed to identify the simplest competent model to adequately predict the geotechnical soil properties (under consideration). ...
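The preference for low-complexity models invoked here follows the structural risk minimization argument; as a hedged reminder (standard VC notation, not taken from the cited study), one commonly quoted bound states that, with probability at least 1 − η,

\[
R(\alpha) \;\le\; R_{\mathrm{emp}}(\alpha) \;+\; \sqrt{\frac{h\left(\ln\frac{2\ell}{h} + 1\right) - \ln\frac{\eta}{4}}{\ell}} ,
\]

so for a fixed sample size ℓ, a smaller VC dimension h yields a tighter guarantee on the expected risk R(α).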
... For example, geometric invariants have been applied in computer vision [5] and have been used in neural networks [6]. Recently, Vapnik and Izmailov [7] proposed a new invariant learning paradigm for solving classification problems, named learning using statistical invariants (LUSI). LUSI incorporates the concept of weak convergence, allowing it to be combined with any method to create a new mechanism that implements both strong and weak convergence. ...
... is the (i, j)-th element of an (l × l)-dimensional positive semidefinite matrix V, which is referred to as the V-matrix [7]. Now, to seek solutions that minimize equation (4) within the set of functions {f(x, α), α ∈ Λ} belonging to the Reproducing Kernel Hilbert Space (RKHS) [33] associated with the continuous positive semi-definite kernel function K(x, x′) defined for x, x′ ∈ R^n, the function to be estimated has the following representation [7]: f ...
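For orientation, a hedged sketch of the finite-dimensional problem this leads to (our notation, following the general V-matrix scheme rather than reproducing the exact equations of [7], and omitting the bias term): with the representer expansion f(x) = Σᵢ αᵢ K(xᵢ, x), a V-weighted least-squares functional with an RKHS penalty becomes

\[
\min_{\alpha}\; (Y - K\alpha)^{\top} V\, (Y - K\alpha) \;+\; \gamma\, \alpha^{\top} K \alpha
\;\;\Longrightarrow\;\;
(K V K + \gamma K)\,\alpha \;=\; K V Y ,
\]

where K is the ℓ × ℓ Gram matrix with entries K(xᵢ, xⱼ), Y is the vector of labels, and γ > 0 is the regularization parameter.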
Learning using statistical invariants (LUSI) is a new learning paradigm that adopts a weak convergence mechanism and can be applied to a wide range of classification problems. However, the computational cost of the invariant matrices in LUSI is high for large-scale datasets during training. To address this issue, this paper introduces a granularity statistical invariant for LUSI and develops a new learning paradigm called learning using granularity statistical invariants (LUGSI). LUGSI employs both strong and weak convergence mechanisms, taking the perspective of minimizing expected risk. To the best of our knowledge, this is the first construction of granularity statistical invariants. Compared to LUSI, the introduction of this new statistical invariant brings two advantages. First, it enhances the structural information of the data. Second, LUGSI transforms a large invariant matrix into a smaller one by maximizing the distance between classes, making classification of large-scale datasets feasible and significantly increasing training speed. Experimental results indicate that LUGSI not only exhibits improved generalization capability but also trains faster, particularly on large-scale datasets.
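To make the granularity idea concrete, here is a minimal conceptual sketch in Python (our own illustration, not the authors' LUGSI algorithm; the clustering step, the RBF similarity, and all names are assumptions). It only shows how an l × l sample-level matrix can be replaced by a much smaller granule-level one:

import numpy as np
from sklearn.cluster import KMeans

def granule_invariant_matrix(X, n_granules=50, random_state=0):
    """Cluster the training inputs into granules and build a small
    (n_granules x n_granules) similarity matrix on the granule centers,
    standing in for the much larger sample-level invariant matrix."""
    km = KMeans(n_clusters=n_granules, n_init=10, random_state=random_state).fit(X)
    centers = km.cluster_centers_
    sq_dists = ((centers[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    bandwidth = np.median(sq_dists) + 1e-12        # heuristic RBF bandwidth
    V_small = np.exp(-sq_dists / (2.0 * bandwidth))
    return km.labels_, centers, V_small

Training then works with the g × g matrix V_small (g ≪ l) instead of an l × l one, which is the source of the claimed speed-up; how the granules and the between-class distances enter the actual LUGSI objective is specified in the paper itself.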
... Does intracellular pressure integration enhance PC clustered input detection (Rvachev, 2003, 2010, 2013)? Is classification within various cortical areas related to learning using statistical invariants (Vapnik and Izmailov, 2020)? How does temporal processing proceed [see Rvachov (2012) for some interesting ideas]? ...
A feature of the brains of intelligent animals is the ability to learn to respond to an ensemble of active neuronal inputs with a behaviorally appropriate ensemble of active neuronal outputs. Previously, a hypothesis was proposed on how this mechanism is implemented at the cellular level within the neocortical pyramidal neuron: the apical tuft or perisomatic inputs initiate “guess” neuron firings, while the basal dendrites identify input patterns based on excited synaptic clusters, with the cluster excitation strength adjusted based on reward feedback. This simple mechanism allows neurons to learn to classify their inputs in a surprisingly intelligent manner. Here, we revise and extend this hypothesis. We modify synaptic plasticity rules to align with behavioral time scale synaptic plasticity (BTSP) observed in hippocampal area CA1, making the framework more biophysically and behaviorally plausible. The neurons for the guess firings are selected in a voluntary manner via feedback connections to apical tufts in the neocortical layer 1, leading to dendritic Ca ²⁺ spikes with burst firing, which are postulated to be neural correlates of attentional, aware processing. Once learned, the neuronal input classification is executed without voluntary or conscious control, enabling hierarchical incremental learning of classifications that is effective in our inherently classifiable world. In addition to voluntary, we propose that pyramidal neuron burst firing can be involuntary, also initiated via apical tuft inputs, drawing attention toward important cues such as novelty and noxious stimuli. We classify the excitations of neocortical pyramidal neurons into four categories based on their excitation pathway: attentional versus automatic and voluntary/acquired versus involuntary. Additionally, we hypothesize that dendrites within pyramidal neuron minicolumn bundles are coupled via depolarization cross-induction, enabling minicolumn functions such as the creation of powerful hierarchical “hyperneurons” and the internal representation of the external world. We suggest building blocks to extend the microcircuit theory to network-level processing, which, interestingly, yields variants resembling the artificial neural networks currently in use. On a more speculative note, we conjecture that principles of intelligence in universes governed by certain types of physical laws might resemble ours.
... The integration of artificial intelligence (AI) in education is expected to increase in the coming years due to advances in AI technology and the demand for flexible and personalized learning (Chen, 2022;Whalley et al., 2021). However, the implementation of AI in education requires a nuanced approach that considers the complexity and diversity of educational settings and emphasizes human-centered design (Blikstein, 2013;Vapnik & Izmailov, 2019). Effective integration of AI into education requires a multidisciplinary strategy that draws on expertise from computer science, education, psychology, and sociology (Raji et al., 2021;Xia & Li, 2022;Yadav et al., 2022). ...
Explainable artificial intelligence (AI) has drawn a lot of attention recently since AI systems are being employed more often across a variety of industries, including education. Building trust and increasing the efficacy of AI systems in educational settings requires the capacity to explain how they make decisions. This article provides a comprehensive review of the current state of explainable AI (XAI) research and its application to education. We begin with the challenges of XAI in education, the complexity of AI algorithms, and the necessity for transparency and interpretability. Furthermore, we discuss the obstacles involved in using AI in education and explore several solutions, including human-AI collaboration, explainability techniques, and ethical and legal frameworks. Subsequently, we discuss the importance of developing new competencies and skills among students and educators to interact with AI effectively, as well as how XAI impacts politics and government. Finally, we provide recommendations for additional research in this field and suggest potential future directions for XAI in educational research and practice.
... Model selection is a crucial step in building statistical models that accurately capture the relationships between variables [13,14]. It involves choosing the most suitable model from a set of candidate models based on criteria such as goodness-of-fit, predictive performance, and interpretability. ...
Many methods have been developed to study nonparametric function-on-function regression models. Nevertheless, there is a lack of model selection approaches for the regression function as a functional function with functional covariate inputs. To study interaction effects among these functional covariates, in this article we first construct a tensor product space of reproducing kernel Hilbert spaces and build an analysis of variance (ANOVA) decomposition of the tensor product space. We then use a model selection method with the L1 criterion to estimate the functional function with functional covariate inputs and detect interaction effects among the functional covariates. The proposed method is evaluated using simulations and stroke rehabilitation data.
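As a hedged, finite-dimensional analogy of the selection step described above (the cited work operates in a tensor-product RKHS on functional covariates; this sketch only illustrates how an L1 penalty zeroes out unneeded main effects and interactions, and the data here are synthetic):

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                                   # four scalar covariates
y = 2.0 * X[:, 0] + 1.5 * X[:, 1] * X[:, 2] + 0.1 * rng.normal(size=200)

# Candidate model components: main effects plus pairwise interactions.
feats = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
Z = feats.fit_transform(X)

model = Lasso(alpha=0.05).fit(Z, y)
selected = [name for name, c in zip(feats.get_feature_names_out(), model.coef_)
            if abs(c) > 1e-6]
print(selected)   # the L1 penalty retains only the components supported by the data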
... The Support Vector Machine (SVM) has been developed to determine a boundary hyperplane that holds the largest gap to the nearest random variables located on both sides of the hyperplane [208][209][210][211]. For a given input set of random variables being trained, denoted as x_train = [x_train,1, x_train,2, . . .
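For precision, the "largest gap" description corresponds to the standard hard-margin formulation (the textbook statement, not necessarily the specific form used in [208]-[211]; soft-margin variants add slack variables):

\[
\min_{w,\,b}\; \tfrac{1}{2}\,\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i\,(w^{\top} x_i + b) \;\ge\; 1, \qquad i = 1, \dots, \ell ,
\]

where maximizing the margin 2/‖w‖ between the two classes is equivalent to minimizing ‖w‖².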
Structural systems are consistently encountering the variabilities in material properties, undesirable defects and loading environments, which may potentially shorten their designed service life. To ensure a reliable structural performance, it is vital to track and quantify the effects of different random/uncertainty factors upon the structural fracture performance. In this research, a critical review of the past, current and future computational modelling of the non-deterministic fracture mechanics is presented. By considering the variously numerical solutions tackling the fracture problems, they are mainly categorized into the discrete and continuous approaches. This study discusses the quantification performance of the extended finite element method, the crack band method and the phase-field approaches combined with different sources of uncertainties. These well-known computational techniques are typical representatives of the common fracture modelling philosophies including the embedded, smeared and regularized ones. The essence of this work is to compare the main differences of the uncertainty quantification models (i.e., probabilistic, non-probabilistic) at the fracture formulation levels and investigate the major progress and challenges existing in the real-life applications for the past and future decades. Some critical remarks, which are denoting the advantages and major issues of various non-deterministic fracture models, are provided and explained in the practical structural failure conditions. Different fracture simulation cases are implemented with comparative results amongst analytical, numerical and experimental methods, and the corresponding fracture quantification ability is evaluated through the standards of the random fracture capacity, load–deflection plots, crack propagation, crack mechanisms, and computational efficiency, etc.
... Recently, Vapnik and Izmailov [14] further studied the classification problem by estimating the cumulative distribution function and the conditional probability, and proposed a new way to estimate the conditional probability P(y = 1|x) by solving the Fredholm integral equation, based directly on the definition of conditional probability and the density function. ...
... They proposed a new way to estimate the conditional probability P(y = 1|x) by solving the Fredholm integral equation, based directly on the definition of conditional probability and the density function. To solve the Fredholm integral equation, [14], [15] presented a form of least-squares implementation called VSVM and compared it with the traditional least-squares and support vector machine classifiers. The advantages of using the integral equation are that f(x) can be solved by estimating the distribution function in the input space, and that the integral equation has wider convergence properties. ...
... In the classical approach to learning machines (analyzed by the VC theory above), we ignored the fact that the desired decision rule is related to the conditional probability function P(y = 1|x) used by the supervisor and to the distribution structure of the data F(x), both of which have an impact on the learned mapping f : X → {0, 1}. [14] proposes to consider the problem of directly estimating the conditional probability function P(y = 1|x) from training data (1) as the main problem of learning, and [14], [15] further give a direct setting of the conditional probability estimation problem based on the standard definitions of a density and a conditional probability. ...
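In hedged form, the direct setting referred to in these excerpts writes f(x) = P(y = 1|x) as the solution of a Fredholm integral equation of the first kind (our transcription of the usual presentation; θ denotes the coordinate-wise step function):

\[
\int \theta(x - x')\, f(x')\, dF(x') \;=\; p_1\, F_1(x) ,
\]

where F(x) is the distribution function of the inputs, F_1(x) is the conditional distribution of the inputs given y = 1, and p_1 = P(y = 1). Replacing F and F_1 by their empirical estimates turns this into the computable problem that VSVM-type methods solve, with the V-matrix arising from the discretization.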
In this paper, we study the classification problem by estimating the conditional probability function of the given data. Different from the traditional expected risk estimation theory on empirical data, we calculate the probability via a Fredholm equation, which leads to estimating the distribution of the data. Based on the Fredholm equation, a new expected risk estimation theory based on estimating the cumulative distribution function is presented. The main characteristic of the new expected risk estimation is that it measures the risk on the distribution of the input space. The corresponding empirical risk estimation is also presented, and an ε-insensitive L1 cumulative support vector machine (ε-L1-VSVM) is proposed by introducing an insensitive loss. It is worth mentioning that the classification model and the classification evaluation indicator based on the new mechanism are different from the traditional ones. Experimental results show the effectiveness of the proposed ε-L1-VSVM and the corresponding cumulative distribution function indicator in terms of validity and interpretability for small-data classification.
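For reference, the ε-insensitive loss mentioned above has the standard form (how exactly it enters the L1 VSVM objective is specified in the paper itself):

\[
\lvert y - f(x) \rvert_{\varepsilon} \;=\; \max\bigl(0,\; \lvert y - f(x) \rvert - \varepsilon \bigr) ,
\]

so residuals no larger than ε incur no penalty.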
... In a machine learning paradigm, learning machines often compute statistical invariants for specific problems with the objective of reducing the expected values of errors in such a way that preserves these invariants. In contrast to classical machine learning, which employs the mechanism of strong convergence for approximations to the desired function, LUSI can significantly increase the rate of convergence by combining the mechanisms of strong convergence and weak convergence [17]. Furthermore, the notion of weak topology is also important when dealing with shift spaces for signal analysis that uses symbolic dynamics, as explained in [18,19]. ...
... The RKHS has many engineering and scientific applications, including those in harmonic analysis, wavelet analysis, and quantum mechanics. In particular, functions from RKHS have special properties that make them useful for function estimation problems in high-dimensional spaces, which is critically important in the fields of statistical learning theory and machine learning [17]. In fact, every function in RKHS that minimizes an empirical risk functional can be expressed as a linear combination of the kernel functions evaluated at the training points. ...
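A minimal numerical illustration of the representer property stated above (our own toy code, not from the cited tutorial; the RBF kernel and regularization value are arbitrary choices): the kernel ridge minimizer is exactly a linear combination of kernel functions centered at the training points.

import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def fit_kernel_ridge(X, y, lam=1e-2, gamma=1.0):
    K = rbf_kernel(X, X, gamma)
    # Solve (K + lam * I) alpha = y; the minimizer is f(.) = sum_i alpha_i K(x_i, .)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict(alpha, X_train, X_new, gamma=1.0):
    return rbf_kernel(X_new, X_train, gamma) @ alpha   # linear combination of kernels

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)
alpha = fit_kernel_ridge(X, y)
print(predict(alpha, X, np.array([[0.5]])))   # approximately sin(0.5)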
Functional analysis is a well-developed field in the discipline of Mathematics, which provides unifying frameworks for solving many problems in applied sciences and engineering. In particular, several important topics (e.g., spectrum estimation, linear prediction, and wavelet analysis) in signal processing had been initiated and developed through collaborative efforts of engineers and mathematicians who used results from Hilbert spaces, Hardy spaces, weak topology, and other topics of functional analysis to establish essential analytical structures for many subfields in signal processing. This paper presents a concise tutorial for understanding the theoretical concepts of the essential elements in functional analysis, which form a mathematical framework and backbone for central topics in signal processing, specifically statistical and adaptive signal processing. The applications of these concepts for formulating and analyzing signal processing problems may often be difficult for researchers in applied sciences and engineering, who are not adequately familiar with the terminology and concepts of functional analysis. Moreover, these concepts are not often explained in sufficient details in the signal processing literature; on the other hand, they are well-studied in textbooks on functional analysis, yet without emphasizing the perspectives of signal processing applications. Therefore, the process of assimilating the ensemble of pertinent information on functional analysis and explaining their relevance to signal processing applications should have significant importance and utility to the professional communities of applied sciences and engineering. The information, presented in this paper, is intended to provide an adequate mathematical background with a unifying concept for apparently diverse topics in signal processing. The main objectives of this paper from the above perspectives are summarized below: (1) Assimilation of the essential information from different sources of functional analysis literature, which are relevant to developing the theory and applications of signal processing. (2) Description of the underlying concepts in a way that is accessible to non-specialists in functional analysis (e.g., those with bachelor-level or first-year graduate-level training in signal processing and mathematics). (3) Signal-processing-based interpretation of functional-analytic concepts and their concise presentation in a tutorial format.
... Various industries have applied k-NN, including cyber security, information security, and aviation. Valuable life forecast, defect categorization, nephropathy diagnosis in children, and infiltration prevention systems have been implemented using the k-NN method [55][56][57][58]. ...
... Vapnik [58] first proposed the SVM method in 1995, and it has received a tremendous amount of attention from the machine learning applications community. Many studies have found that, depending on the data type, the SVM approach outperforms other data classification algorithms in terms of classification accuracy [59,60]. ...
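As a hedged illustration of the kind of comparison reported in [59,60] (the dataset and hyperparameters here are arbitrary choices, not those of the cited studies):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "k-NN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
}
for name, clf in models.items():
    print(name, clf.fit(X_tr, y_tr).score(X_te, y_te))   # test-set accuracy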
Bioelectrical impedance analysis (BIA) was established to quantify diverse cellular characteristics. This technique has been widely used in various species, such as fish, poultry, and humans, for compositional analysis. This technology was previously limited to offline quality assurance/detection of woody breast (WB); however, inline technology that can be retrofitted on the conveyor belt would be more helpful to processors. Freshly deboned (n = 80) chicken breast fillets were collected from a local processor and analyzed by hand-palpation for different WB severity levels. Data collected from both BIA setups were subjected to supervised and unsupervised learning algorithms. The modified BIA showed better detection ability for regular fillets than the probe BIA setup. In the plate BIA setup, detection rates were 80.00% for normal, 66.67% for moderate (data for mild and moderate merged), and 85.00% for severe WB fillets. However, the hand-held BIA showed 77.78, 85.71, and 88.89% for normal, moderate, and severe WB, respectively. The plate BIA setup is more effective in detecting WB myopathies and could be installed without slowing the processing line. Breast fillet detection on the processing line can be significantly improved using a modified automated plate BIA.