Article

Synergy of monotonic rules

Authors: Vladimir Vapnik, Rauf Izmailov

Abstract

This article describes a method for constructing a special rule (we call it synergy rule) that uses as its input information the outputs (scores) of several monotonic rules which solve the same pattern recognition problem. As an example of scores of such monotonic rules we consider here scores of SVM classifiers. In order to construct the optimal synergy rule, we estimate the conditional probability function based on the direct problem setting, which requires solving a Fredholm integral equation. Generally, solving a Fredholm equation is an ill-posed problem. However, in our model, we look for the solution of the equation in the set of monotonic and bounded functions, which makes the problem well-posed. This allows us to solve the equation accurately even with training data sets of limited size. In order to construct a monotonic solution, we use the set of functions that belong to Reproducing Kernel Hilbert Space (RKHS) associated with the INK-spline kernel (splines with Infinite Numbers of Knots) of degree zero. The paper provides details of the methods for finding multidimensional conditional probability in a set of monotonic functions to obtain the corresponding synergy rules. We demonstrate effectiveness of such rules for 1) solving standard pattern recognition problems, 2) constructing multi-class classification rules, 3) constructing a method for knowledge transfer from multiple intelligent teachers in the LUPI paradigm.
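For intuition, here is a minimal, hedged sketch of the construction in Python: several SVMs trained on different feature subsets play the role of monotonic rules, and their decision scores are combined through a conditional-probability model that is monotone non-decreasing in every score. The monotone model used below (a logistic fit with non-negative weights) is only a simplified stand-in for the paper's INK-spline/RKHS solution of the Fredholm equation; the dataset, feature subsets, and variable names are illustrative.

```python
# Simplified stand-in for the synergy rule: SVM scores combined by a
# conditional-probability model that is monotone in each score.
import numpy as np
from scipy.optimize import minimize
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_cal, y_tr, y_cal = train_test_split(X, y, test_size=0.5, random_state=0)

# Several monotonic rules: SVMs trained on (partially overlapping) feature subsets.
subsets = [slice(0, 10), slice(5, 15), slice(10, 20)]
svms = [SVC(kernel="rbf", gamma="scale").fit(X_tr[:, s], y_tr) for s in subsets]
scores = np.column_stack(
    [m.decision_function(X_cal[:, s]) for m, s in zip(svms, subsets)])

def neg_log_lik(params, S, t):
    w, b = params[:-1], params[-1]          # weights are constrained >= 0 below
    p = 1.0 / (1.0 + np.exp(-(S @ w + b)))  # monotone in each score when w >= 0
    eps = 1e-12
    return -np.mean(t * np.log(p + eps) + (1 - t) * np.log(1 - p + eps))

k = scores.shape[1]
res = minimize(neg_log_lik, x0=np.zeros(k + 1), args=(scores, y_cal),
               bounds=[(0, None)] * k + [(None, None)])

def synergy_predict(X_new):
    """Synergy-style prediction: estimated P(y=1 | scores) and the induced label."""
    S = np.column_stack(
        [m.decision_function(X_new[:, s]) for m, s in zip(svms, subsets)])
    p = 1.0 / (1.0 + np.exp(-(S @ res.x[:-1] + res.x[-1])))
    return (p > 0.5).astype(int), p
```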


... We consider a stochastic optimization problem associated with so-called empirical risk minimization in learning theory (see, for instance, [2,3,11,14,15]). We assume that all random vectors and variables appearing in this paper are defined on a common probability space (Ω, F, P). ...
... The Lipschitz (or Hölder) conditions on a loss function (or on an objective function) are common in stochastic programming ([5,7,9,10,12]) and in learning theory ([2,3,11,14,15]). At the same time, the Lipschitz conditions on functions f from F in (1) and (2) are also in use (for example, when considering linear functions or splines of first order, [2,3,5,11,14,15]). ...
Article
Full-text available
We deal with a stochastic programming problem that can be inconsistent. To overcome the inconsistency we apply Tikhonov’s regularization technique, and, using recent results on the convergence rate of empirical measures in Wasserstein metric, we treat the following two related problems: 1. A choice of regularization parameters that guarantee the convergence of the minimization procedure. 2. Estimation of the rate of convergence in probability. Considering both light and heavy tail distributions and Lipschitz objective functions (which can be unbounded), we obtain the power bounds for the convergence rate.
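For orientation, here is a schematic of the regularized empirical problem this abstract refers to; the notation is assumed rather than taken from the paper, with \widehat{F}_n denoting the empirical objective built from n samples and \alpha_n the Tikhonov parameter.

```latex
% Schematic only: the admissible decay rate of \alpha_n is what the cited
% work derives from the Wasserstein convergence of empirical measures.
\begin{equation*}
  x_n^{*} \in \arg\min_{x \in X}\Big\{ \widehat{F}_n(x) + \alpha_n \,\|x\|^{2}\Big\},
  \qquad \alpha_n \to 0 \ \text{as } n \to \infty .
\end{equation*}
```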
... The Fredholm integral operator is a distribution-dependent operator that can be used to identify a desired function in the context of inverse ill-posed problems. Vapnik and Izmailov have demonstrated that by solving this operator, we can obtain the V-Matrix property, which can be used to gain insight into the solution of ill-posed supervised machine learning problems [30,28,31]. Specifically, the V-Matrix method offers a clear understanding of the distributional location of each training sample, as it is represented by the coefficients of the loss function. ...
... These V-Matrices have been used in several works to re-weight least-squares objectives, for example in parametric regression estimation [14], support vector regression [40], SVM [9,31,30], and gradient boosting [12]. All of these are supervised machine learning methods that attempt to boost the accuracy of the model based solely on the training data. ...
Article
Full-text available
Supervised learning problems may become ill-posed when there is a lack of information, resulting in unstable and non-unique solutions. However, instead of solely relying on regularization, initializing an informative ill-posed operator is akin to posing better questions to achieve more accurate answers. The Fredholm integral equation of the first kind (FIFK) is a reliable ill-posed operator that can integrate distributions and prior knowledge as input information. By incorporating input distributions and prior knowledge, the FIFK operator can address the limitations of using high-dimensional input distributions by semi-supervised assumptions, leading to more precise approximations of the integral operator. Additionally, the FIFK's incorporation of probabilistic principles can further enhance the accuracy and effectiveness of solutions. In cases of noisy operator equations and limited data, the FIFK's flexibility in defining problems using prior information or cross-validation with various kernel designs is especially advantageous. This capability allows for detailed problem definitions and facilitates achieving high levels of accuracy and stability in solutions. In our study, we examined the FIFK through two different approaches. Firstly, we implemented a semi-supervised assumption by using the same Fredholm operator kernel and data function kernel and incorporating unlabeled information. Secondly, we used the MSDF method, which involves selecting different kernels on both sides of the equation to define when the mapping kernel is different from the data function kernel. To assess the effectiveness of the FIFK and the proposed methods in solving ill-posed problems, we conducted experiments on a real-world dataset. Our goal was to compare the performance of these methods against the widely used least-squares method and other comparable methods.
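To make the ill-posedness concrete, the following hedged NumPy sketch discretizes a Fredholm integral equation of the first kind on a grid and stabilizes its inversion with plain Tikhonov regularization; it does not reproduce the cited paper's semi-supervised or MSDF constructions, and the kernel, grid, and noise level are arbitrary choices.

```python
# Discretized FIFK: integral of K(x, t) f(t) dt = g(x), solved with Tikhonov
# regularization because the unregularized system is badly ill-conditioned.
import numpy as np

n = 200
t = np.linspace(0.0, 1.0, n)
h = t[1] - t[0]

# Smoothing kernel matrix (quadrature weights folded in); f_true is recovered
# from noisy observations of g.
K = np.exp(-50.0 * (t[:, None] - t[None, :]) ** 2) * h
f_true = np.sin(2 * np.pi * t)
g = K @ f_true + 1e-3 * np.random.default_rng(0).normal(size=n)

alpha = 1e-4  # regularization strength (would be chosen by cross-validation)
f_tik = np.linalg.solve(K.T @ K + alpha * np.eye(n), K.T @ g)
print("max abs error of regularized solution:", np.max(np.abs(f_tik - f_true)))
```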
... However, traditional SVMs only provide classification outputs and do not directly estimate class probabilities, which can be of significant scientific interest as they indicate the strength or confidence of the classification outcome. To obtain probability estimates, additional techniques such as Platt scaling [10,13], Isotonic Regression [19], or monotonic approximation [16] are needed. These techniques enhance the basic SVM framework by enabling it to produce probabilistic outputs, thereby making the models more versatile and applicable to a wider range of real-world problems, such as utility models [12] or credit card fraud detection [14]. ...
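The calibration techniques named in this excerpt are available off the shelf; a short scikit-learn sketch (dataset and parameters are placeholders) might look like this, with method="sigmoid" corresponding to Platt scaling and method="isotonic" to isotonic regression.

```python
# SVM probability calibration: Platt scaling vs. isotonic regression.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)
platt = CalibratedClassifierCV(SVC(kernel="rbf"), method="sigmoid", cv=5).fit(X, y)
iso = CalibratedClassifierCV(SVC(kernel="rbf"), method="isotonic", cv=5).fit(X, y)
print(platt.predict_proba(X[:3]))
print(iso.predict_proba(X[:3]))
```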
Preprint
Full-text available
Support Vector Machines (SVMs) are powerful tools in machine learning, widely used for classification and regression tasks. Over time, various extensions, such as Probability Estimation SVM (PSVM) and Conditional Probability SVM (CPSVM), have been proposed to enhance SVM performance across different conditions and datasets. This article offers a comprehensive review and analysis of SVMs, with a particular focus on the PSVM and CPSVM models. We delve into the intricacies of these models, addressing computational nuances and presenting corrected formulations where necessary. Our empirical evaluation, conducted on diverse benchmark datasets, implements both linear and nonlinear versions of these SVMs. Performance is benchmarked using the Balanced Accuracy metric. The results highlight the comparative strengths of these models in handling varied datasets and their potential advantages over traditional SVM formulations. To rigorously assess and compare the performance of these SVM variants, we employ statistical tests, including the Friedman test and post hoc analysis.
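As a hedged illustration of the evaluation protocol described above (the numbers and model names are placeholders, not results from the paper), balanced accuracy per dataset and the Friedman test across SVM variants can be computed as follows.

```python
# Compare SVM variants across datasets: balanced accuracy + Friedman test.
import numpy as np
from scipy.stats import friedmanchisquare
from sklearn.metrics import balanced_accuracy_score

# Hypothetical balanced-accuracy matrix: rows = datasets, columns = variants
# (e.g. standard SVM, PSVM, CPSVM).
scores = np.array([
    [0.81, 0.84, 0.86],
    [0.74, 0.78, 0.77],
    [0.90, 0.91, 0.93],
    [0.68, 0.72, 0.74],
    [0.83, 0.82, 0.85],
])
stat, p_value = friedmanchisquare(*scores.T)
print(f"Friedman statistic = {stat:.3f}, p = {p_value:.3f}")

# Balanced accuracy for a single dataset/model pair:
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]
print(balanced_accuracy_score(y_true, y_pred))
```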
... If informative features were distributed across modalities, a classifier could combine them to beat any unimodal one. Several approaches for multimodal classification have thus been employed on neuroimaging data, including concatenation, multi-kernel learning, or, as a more recent development, the synergy rule, which is constructed by integrating multiple related monotonic classifiers, for example SVM classifiers (see, for example, Liem et al., 2017; Schmaal et al., 2015; Vapnik & Izmailov, 2016; Youssofzadeh, McGuinness, Maguire, & Wong-Lin, 2017). It remains to be seen, though, whether machine learning based on multimodal MR imaging data can outperform unimodal approaches in strictly independent evaluations, given the higher number of features and (usually for logistic reasons) lower number of observations for the former, which may lead to over-optimistic assessments as discussed above. ...
Article
Full-text available
Computer systems for medical diagnosis based on machine learning are not mere science fiction. Despite undisputed potential benefits, such systems may also raise problems. Two (interconnected) issues are particularly significant from an ethical point of view: The first issue is that epistemic opacity is at odds with a common desire for understanding and potentially undermines information rights. The second (related) issue concerns the assignment of responsibility in cases of failure. The core of the two issues seems to be that understanding and responsibility are concepts that are intrinsically tied to the discursive practice of giving and asking for reasons. The challenge is to find ways to make the outcomes of machine learning algorithms compatible with our discursive practice. This comes down to the claim that we should try to integrate discursive elements into machine learning algorithms. Under the title of “explainable AI” initiatives heading in this direction are already under way. Extensive research in this field is needed for finding adequate solutions.
... Vapnik and other scholars have shown that the selection of the parameters (C, g) directly affects the performance of the SVM classifier [8][9][10][11]. Here, C is the penalty factor for misclassified samples: its main role is to control how strongly misclassifications are penalized, so different values of C lead to optimization over different sub-sample spaces. The parameter g mainly controls the complexity of the sample subspace and therefore plays a key role in minimizing the linear classification error. ...
Article
Full-text available
Defects in product packaging are one of the key factors that affect product sales. Traditional defect detection depends primarily on manual visual inspection. With the rapid development of machine vision, image processing, pattern recognition, and other technologies, automated industrial inspection has become an inevitable trend because machine vision technology can greatly improve accuracy and efficiency; it is therefore of great practical value to study automatic detection of the surface defects encountered in packaging boxes. In this study, machine vision and machine learning were combined to develop a surface defect detection method based on a support vector machine, in which defective products are removed by a sorting robot system. In testing, the support vector machine model trained with a radial basis function kernel detects three kinds of defects simultaneously under ideal parameter selection, with an effective detection rate of 98.0296%.
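As an illustration of the (C, g) selection discussed in the excerpt above, a common way to tune these two hyperparameters is a cross-validated grid search; the sketch below uses synthetic data in place of the packaging-defect image features and is not the paper's actual pipeline.

```python
# Grid search over the penalty C and RBF width gamma (the "g" above).
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=30, random_state=0)
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5).fit(X, y)
print(search.best_params_, search.best_score_)
```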
... However, such correlated features are not necessarily redundant and in many cases may carry additional independent information (Averbeck et al., 2006). An important recent development is known as the synergy rule (Vapnik & Izmailov, 2016), which is constructed based on an integration of multiple related monotonic classifiers (e.g. SVM classifiers) with partially correlated sets of features. ...
Article
Despite its initial promise, neuroimaging has not been widely translated into clinical psychiatry to assist in the prediction of diagnoses, prognoses, and optimal therapeutic strategies. Machine learning approaches may enhance the translational potential of neuroimaging because they specifically focus on overcoming biases by optimizing the generalizability of pipelines that measure complex brain patterns to predict targets at a single-subject level. This article introduces some fundamentals of a translational machine learning approach before selectively reviewing the literature to date. Promising initial results are then balanced by a description of limitations that should be considered in order to interpret existing research and maximize the possibility of future translation. Future directions are then presented in order to inspire further research and progress the field towards clinical translation.
... In particular, it was inspired by recent works: [12], on the application of CP to big data, and [13], covering the theme of merging data splits as well as merging prediction algorithms. We are grateful to Paolo Toccaceli, Vladimir Vovk, Alex Gammerman and Zhiyuan Luo for useful discussions in this area. ...
Article
Full-text available
Conformal Prediction is a recently developed framework for reliable confident predictions. In this work we discuss its possible application to big data coming from different, possibly heterogeneous data sources. Using the example of an anomaly detection problem, we study the question of preserving the validity of Conformal Prediction in this case. We show that the straightforward averaging approach is invalid, while the easy alternative of taking the maximum is valid but not very efficient because of its conservativeness. We propose a third, compromise approach that is valid but much less conservative. It is supported by both theoretical justification and experimental results in the area of energy engineering.
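A hedged sketch of the merging rules under discussion (the paper's compromise rule itself is not reproduced here): the maximum of per-source conformal p-values is always valid but conservative, while the plain average is not valid in general.

```python
# Merging conformal p-values from K sources for one test object.
import numpy as np

def merge_max(p):
    # Valid for any dependence between sources: P(max <= eps) <= eps, but conservative.
    return np.max(p, axis=0)

def merge_average(p):
    # Not a valid p-value in general.
    return np.mean(p, axis=0)

p = np.array([[0.04], [0.20], [0.07]])  # placeholder p-values from K = 3 sources
print("max merge    :", merge_max(p))      # 0.20 -> not flagged at level 0.05
print("average merge:", merge_average(p))  # ~0.103
```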
Article
The paper is devoted to two problems: (1) reinforcement of SVM algorithms, and (2) justification of memorization mechanisms for generalization. (1) The current SVM algorithm was designed for the case when the risk for the set of nonnegative slack variables is defined by the l1 norm. In this paper, along with that classical l1 norm, we consider risks defined by the l2 norm and the l∞ norm. Using these norms, we formulate several modifications of the existing SVM algorithm and show that the resulting modified SVM algorithms can improve (sometimes significantly) the classification performance. (2) The generalization ability of existing learning algorithms is usually explained by arguments involving uniform convergence of empirical losses to the corresponding expected losses over a given set of functions. However, along with bounds for uniform convergence of empirical losses to the expected losses, the VC theory also provides bounds for relative uniform convergence. These bounds lead to a more accurate estimate of the expected loss. Advanced methods of estimating the expected risk of error have to leverage these bounds, which also support mechanisms of training data memorization; as the paper demonstrates, such memorization can improve classification performance.
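For reference, a schematic of the slack-risk variants mentioned in item (1), written in standard soft-margin SVM notation; this notation is assumed and is not necessarily the paper's exact formulation.

```latex
\begin{align*}
  \min_{w,b,\xi}\; & \tfrac{1}{2}\|w\|^{2} + C\,\|\xi\|_{1}
      && \text{(classical $\ell_1$ slack risk)}\\
  \min_{w,b,\xi}\; & \tfrac{1}{2}\|w\|^{2} + C\,\|\xi\|_{2}^{2}
      && \text{($\ell_2$ slack risk)}\\
  \min_{w,b,\xi}\; & \tfrac{1}{2}\|w\|^{2} + C\,\|\xi\|_{\infty}
      && \text{($\ell_\infty$ slack risk)}\\
  \text{subject to}\;\; & y_i\,(w^{\top}x_i + b) \ge 1 - \xi_i,\qquad \xi_i \ge 0,\qquad i = 1,\dots,\ell .
\end{align*}
```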