
Knowledge transfer in SVM and neural networks

Authors: Vladimir Vapnik · Rauf Izmailov

Abstract

The paper considers general machine learning models, where knowledge transfer is positioned as the main method to improve their convergence properties. Previous research was focused on mechanisms of knowledge transfer in the context of SVM framework; the paper shows that this mechanism is applicable to neural network framework as well. The paper describes several general approaches for knowledge transfer in both SVM and ANN frameworks and illustrates algorithmic implementations and performance of one of these approaches for several synthetic examples.
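As a rough illustration of the knowledge-transfer mechanism discussed in the paper (privileged features are approximated by functions of the decision space, and a standard classifier is then trained in the augmented space), the following is a minimal sketch. The synthetic data, the use of kernel ridge regression as the transfer map, and scikit-learn are illustrative assumptions, not the authors' implementation:

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic data: x is the decision-space input, x_star is privileged
# information available only at training time.
n = 400
x = rng.normal(size=(n, 5))
x_star = np.tanh(x @ rng.normal(size=(5, 3)))  # privileged features
y = (x_star.sum(axis=1) + 0.1 * rng.normal(size=n) > 0).astype(int)

x_tr, x_te, xs_tr, _, y_tr, y_te = train_test_split(
    x, x_star, y, test_size=0.5, random_state=0)

# Step 1: approximate each privileged feature as a function of x
# (kernel ridge regression plays the role of the transfer mapping).
transfer = [KernelRidge(kernel="rbf", alpha=1.0).fit(x_tr, xs_tr[:, k])
            for k in range(xs_tr.shape[1])]

def augment(x_in):
    # Append the transferred (approximated) privileged features.
    return np.hstack([x_in] + [m.predict(x_in)[:, None] for m in transfer])

# Step 2: train an ordinary SVM in the augmented space; the privileged
# information is no longer needed at test time.
clf = SVC(kernel="rbf").fit(augment(x_tr), y_tr)
print("test accuracy:", clf.score(augment(x_te), y_te))
```

The key design point is that the transfer models are fitted only on training data, so the augmented representation is available for test points even though the privileged features themselves are not.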
Ann Math Artif Intell (2017) 81:3–19
DOI 10.1007/s10472-017-9538-x
Vladimir Vapnik (1,2) · Rauf Izmailov (3)
Published online: 20 February 2017
© Springer International Publishing Switzerland 2017
Keywords: Intelligent teacher · Privileged information · Similarity control · Knowledge transfer · Knowledge representation · Frames · Support vector machine · Neural network · Classification · Learning theory · Regression
Mathematics Subject Classification (2010): 68Q32 · 68T05 · 68T30 · 83C32
This material is based upon work partially supported by AFRL and DARPA under contract FA8750-14-C-0008, and upon work partially supported by AFRL under contract FA9550-15-1-0502. Any opinions, findings, and/or conclusions in this material are those of the authors and do not necessarily reflect the views of AFRL and DARPA.
Rauf Izmailov
rizmailov@vencorelabs.com
Vladimir Vapnik
vladimir.vapnik@gmail.com
1. Columbia University, New York, NY, USA
2. AI Research Lab, Facebook, New York, NY, USA
3. Vencore Labs, Basking Ridge, NJ, USA
... SVMs are a family of machine learning algorithms developed by Vladimir Vapnik [10], used to solve classification, regression and anomaly detection problems. They aim to separate data into classes using a boundary or hyperplane, while maximizing the distance between the different groups of data and the separating boundary. ...
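The margin-maximization idea in the snippet above can be shown with a minimal linear SVM example (the toy data and scikit-learn usage are my additions, not part of the cited works):

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable point clouds.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.5, size=(50, 2)),
               rng.normal(+2, 0.5, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

# A linear SVM places the separating hyperplane w.x + b = 0 so that the
# margin 2/||w|| between the classes is as large as possible; only the
# points closest to the boundary (support vectors) determine it.
clf = SVC(kernel="linear", C=1.0).fit(X, y)
w = clf.coef_[0]
margin = 2.0 / np.linalg.norm(w)
print("margin width:", margin)
print("number of support vectors:", len(clf.support_vectors_))
```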
Chapter
In the emergency context of COVID-19 and its variants, rapid and accurate diagnosis based on radiographic images is of major importance: it avoids confusion with other types of pneumonia and ensures appropriate treatment. This paper presents a hybrid model combining five pre-trained CNNs (VGG16, VGG19, MobileNet, Inception-v3 and DenseNet201) with an SVM classifier. The study was conducted on a recent public database of radiographic images of COVID-19. Experiments on three classification settings (COVID-19 vs. Normal; COVID-19 vs. Normal vs. Lung Opacity; and COVID-19 vs. Normal vs. Lung Opacity vs. Viral Pneumonia) showed encouraging accuracies, with higher recognition rates than studies using CNNs alone. In addition, the approach was validated on a separate dataset drawn from a Pakistani population, achieving acceptable results.
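The hybrid pipeline described in this abstract, a frozen deep network used as a feature extractor followed by an SVM classifier, can be sketched schematically. The sketch below substitutes PCA for the heavy pretrained CNN backbones (VGG16, etc.) purely to keep the example self-contained; it demonstrates only the extract-then-classify pattern, not the paper's actual models:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Stand-in for a pretrained CNN backbone: PCA plays the role of the
# frozen feature extractor; in the cited work this stage would be a
# VGG/MobileNet/Inception/DenseNet network truncated before its
# classification head.
X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Extract features, then classify the extracted features with an SVM.
model = make_pipeline(PCA(n_components=32), SVC(kernel="rbf", C=10))
model.fit(X_tr, y_tr)
print("hybrid accuracy:", model.score(X_te, y_te))
```

In the real pipeline the extractor stays fixed and only the SVM is trained on the CNN's penultimate-layer activations.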
... Finally, the model is retrained using the pseudo-parallel corpus generated by the machine translation model and the original real corpus. [24] employs the source language monolingual corpus to enhance the efficacy of machine translation. This approach employs a multi-task learning framework to predict the translation at the target end, while reordering the source language sentences to align with the grammatical and syntactic structure of the target language. ...
Preprint
Full-text available
Machine translation technology, which employs computers to autonomously convert text between source and target languages, represents a pivotal realm within artificial intelligence and natural language processing research. This paper introduces a novel algorithm grounded in multi-task learning, aimed at enhancing the efficacy of Chinese-English neural machine translation systems. The proposed approach addresses three key challenges: the scarcity of parallel Chinese-English corpora, substantial disparities in sentence structure between the two languages, and the intricate, mutable nature of word formations in Mongolian, a factor influencing Chinese due to historical linguistic interactions. To counter these issues, we devise a parameter transfer strategy. Our methodology commences with the training of a high-resource neural machine translation model leveraging the encoder-decoder architecture prevalent in neural machine translation systems. Subsequently, the learned parameters are used to initialise a low-resource model, giving its training a more informed starting point. It should be noted that the word embeddings and fully-connected layers of the low-resource model are randomly initialised and undergo continuous updating throughout the iterative training process. The experimental outcomes affirm the superiority of the proposed Dual-Task Multi-Task Learning (DFMTL) method, which achieves a BLEU score of 10.1. This not only outperforms three established baseline models but also demonstrates a notable 0.7 BLEU-score increase over models trained exclusively on a mixed-corpus dataset. These findings highlight the potential of the parameter transfer strategy in enhancing the precision and fluency of Chinese-English machine translation under resource-constrained scenarios.
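The parameter-transfer step in the abstract, warm-starting a low-resource model with high-resource encoder/decoder weights while re-initialising embeddings and the fully-connected output layer, can be sketched with toy parameter dictionaries; the layer names and shapes are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_params(vocab, d):
    # Toy encoder-decoder parameter set, keyed by layer name.
    return {
        "embedding": rng.normal(0, 0.02, size=(vocab, d)),
        "encoder.w": rng.normal(0, 0.02, size=(d, d)),
        "decoder.w": rng.normal(0, 0.02, size=(d, d)),
        "fc.w":      rng.normal(0, 0.02, size=(d, vocab)),
    }

# High-resource model: assumed already trained (random here for brevity).
high_resource = random_params(vocab=32000, d=64)

# Low-resource model: embeddings and the fully-connected output layer
# stay randomly initialised (its vocabulary differs, so they cannot be
# reused), while encoder/decoder weights are copied over as a warm start.
low_resource = random_params(vocab=8000, d=64)
for name in ("encoder.w", "decoder.w"):
    low_resource[name] = high_resource[name].copy()

print("warm-started layers:", ["encoder.w", "decoder.w"])
```

Training then proceeds normally on the low-resource corpus; all layers, including the copied ones, continue to be updated.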
... The perceptron, recognized as the fundamental unit akin to brain cells and synonymous with neurons, serves as the foundational element of neural networks in machine learning [18,19]. This concept also shares similarities with the support vector machine [20] and the expectation of prediction values across different periods. However, as of now, there has been no direct application of the notion of knot and link effects to perceptrons. ...
Article
Full-text available
The Wilson loop is indicative of the pathway encompassed within the market cocycle, which carries the coherent gauge-field behavior present in financial time series data. We enhance the capabilities of the support vector machine by integrating supplementary attributes through the incorporation of the knot and link characteristics of the Wilson loop, as derived from market microstructure contexts. This framework was employed to capture financial market dynamics within time series data, with particular emphasis on the parallel transport of co-state between predictor and predictand along the continuous behavior field's predictive evolutionary lift path trajectory. We found that the average performance of the Wilson loop perceptron, in an empirical analysis of a sample set of closing prices of DE30 and the EUR/USD exchange rate, is 68.42%, compared with 49.17% for the SVM. Furthermore, a statistical DM-test shows that the Wilson loop perceptron outperforms the SVM in directional prediction of intraday financial time series.
Article
Full-text available
In this study, two artificial-intelligence-based classification methods, Artificial Neural Networks (ANN) and Support Vector Machines (SVM), and a traditional method, Logistic Regression (LR), were applied in two different ways to a corporate customer dataset obtained from a bank. The methods were first applied to the "corporate data" set of 8789 corporate customer records (893 "flawed", 7896 "flawless"), and secondly to the "balanced corporate data" set of 1786 records (893 flawed, 893 flawless). ANN achieved the highest accuracy on both datasets (96% and 93%, respectively); SVM also achieved the highest accuracy (96%) on the corporate data; LR yielded a lower accuracy (89%) than the AI-based methods. When moving from the corporate data to the balanced corporate data, despite the loss of roughly 80% of the records, ANN and LR were affected by about 3%, while SVM was affected by about 5%. Among the models, SVM had the smallest standard deviation. The study showed that the AI-based ANN and SVM methods give better results, i.e. classify better, than traditional methods such as LR. (Received: 06/11/2020.)
Keywords: Classification, Artificial neural networks, Support vector machines, Credit application evaluation.
1. PhD, İstanbul Üniversitesi, ORCID ID: 0000-0002-1702-2965, gkhnkrkmz3873@gmail.com.
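The balanced-versus-imbalanced comparison in this study amounts to random undersampling of the majority class before training. A generic sketch (synthetic data and default scikit-learn models stand in for the bank dataset and the paper's exact configurations):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Imbalanced synthetic data: ~10% "flawed" (1), ~90% "flawless" (0).
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1],
                           random_state=0)

# Balanced subset: undersample the majority class to the minority size.
rng = np.random.default_rng(0)
minority = np.flatnonzero(y == 1)
majority = rng.choice(np.flatnonzero(y == 0), size=len(minority),
                      replace=False)
idx = np.concatenate([minority, majority])
X_bal, y_bal = X[idx], y[idx]

models = {
    "ANN": MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000,
                         random_state=0),
    "SVM": SVC(),
    "LR":  LogisticRegression(max_iter=1000),
}
for name, m in models.items():
    full = cross_val_score(m, X, y, cv=3).mean()
    bal = cross_val_score(m, X_bal, y_bal, cv=3).mean()
    print(f"{name}: full={full:.3f} balanced={bal:.3f}")
```

Note that on imbalanced data, raw accuracy flatters any classifier biased toward the majority class; the balanced set removes that advantage, which is why accuracies typically drop there.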
Article
Full-text available
This work considers “Learning Using Privileged Information” (LUPI) paradigm. LUPI improves classification accuracy by incorporating additional information available at training time and not available during testing. In this contribution, the LUPI paradigm is tested on a Wide Area Motion Imagery (WAMI) dataset and on images from the Caltech 101 dataset. In both cases a consistent improvement in classification accuracy is observed. The results are discussed and the directions of future research are outlined.
Article
Full-text available
In some pattern analysis problems, there exists expert knowledge, in addition to the original data involved in the classification process. The vast majority of existing approaches simply ignore such auxiliary (privileged) knowledge. Recently a new paradigm, learning using privileged information, was introduced in the framework of SVM+. This approach is formulated for binary classification and, as typical for many kernel-based methods, can scale unfavorably with the number of training examples. While speeding up training methods and extensions of SVM+ to multiclass problems are possible, in this paper we present a more direct novel methodology for incorporating valuable privileged knowledge in the model construction phase, primarily formulated in the framework of generalized matrix learning vector quantization. This is done by changing the global metric in the input space, based on distance relations revealed by the privileged information. Hence, unlike in SVM+, any convenient classifier can be used after such metric modification, bringing more flexibility to the problem of incorporating privileged information during the training. Experiments demonstrate that the manipulation of an input space metric based on privileged data improves classification accuracy. Moreover, our methods can achieve competitive performance against the SVM+ formulations.
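The metric-modification idea can be sketched in a deliberately simplified form: weight the input features so that distances in the input space better reflect distances in the privileged space, then apply any distance-based classifier. The correlation-based weighting below is a crude stand-in for the paper's generalized matrix LVQ formulation, and the data is synthetic:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n, d = 500, 6

X = rng.normal(size=(n, d))
# Privileged space: only the first two input features actually matter.
X_star = X[:, :2] + 0.05 * rng.normal(size=(n, 2))
y = (X_star[:, 0] + X_star[:, 1] > 0).astype(int)

X_tr, X_te, Xs_tr, _, y_tr, y_te = train_test_split(
    X, X_star, y, test_size=0.5, random_state=0)

# Weight each input feature by how strongly it correlates with the
# privileged representation: a crude global metric change, so that
# distances become ||diag(w)(x - x')||.
w = np.abs(np.corrcoef(X_tr.T, Xs_tr.T)[:d, d:]).max(axis=1)

# Any distance-based classifier can then be used in the modified metric.
plain = KNeighborsClassifier().fit(X_tr, y_tr).score(X_te, y_te)
weighted = KNeighborsClassifier().fit(X_tr * w, y_tr).score(X_te * w, y_te)
print(f"plain kNN: {plain:.3f}  privileged-metric kNN: {weighted:.3f}")
```

Because the privileged data enters only through the learned weights, nothing privileged is required at test time, which matches the flexibility the paper claims over SVM+.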
Conference Paper
This paper introduces an advanced setting of the machine learning problem in which an Intelligent Teacher is involved. During the training stage, the Intelligent Teacher provides the Student with information that contains, along with the classification of each example, additional privileged information (an explanation) of this example. The paper describes two mechanisms that can be used for significantly accelerating the speed of the Student's training: (1) correction of the Student's concepts of similarity between examples, and (2) direct Teacher-Student knowledge transfer.
Article
This paper describes a new paradigm of machine learning in which an Intelligent Teacher is involved. During the training stage, the Intelligent Teacher provides the Student with information that contains, along with the classification of each example, additional privileged information (for example, an explanation) of this example. The paper describes two mechanisms that can be used for significantly accelerating the speed of the Student's learning using privileged information: (1) correction of the Student's concepts of similarity between examples, and (2) direct Teacher-Student knowledge transfer.
Conference Paper
Many computer vision problems have an asymmetric distribution of information between training and test time. In this work, we study the case where we are given additional information about the training data, which however will not be available at test time. This situation is called learning using privileged information (LUPI). We introduce two maximum-margin techniques that are able to make use of this additional source of information, and we show that the framework is applicable to several scenarios that have been studied in computer vision before. Experiments with attributes, bounding boxes, image tags and rationales as additional information in object classification show promising results.
Book
Introduction to Neural Networks with Java, Second Edition, introduces the Java programmer to the world of Neural Networks and Artificial Intelligence. Neural network architectures, such as the feedforward, Hopfield, and self-organizing map architectures are discussed. Training techniques, such as backpropagation, genetic algorithms and simulated annealing are also introduced. Practical examples are given for each neural network. Examples include the traveling salesman problem, handwriting recognition, financial prediction, game strategy, mathematical functions, and Internet bots. All Java source code is available online for easy downloading.
Conference Paper
In this paper we propose a method that utilises privileged information, that is, information available only at the training phase, in order to train Regression Forests for facial feature detection. Our method chooses the split functions at some randomly chosen internal tree nodes according to the information gain calculated from the privileged information, such as head pose or gender. In this way the training patches arrive at leaves that tend to have low variance both in displacements to facial points and in privileged information. At each leaf node, we learn both the probability of the privileged information and regression models conditioned on it. During testing, the marginal probability of privileged information is estimated and the facial feature locations are localised using the appropriate conditional regression models. The proposed model is validated by comparison with very recent methods on two challenging datasets, namely Labelled Faces in the Wild and Labelled Face Parts in the Wild.
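The privileged split-selection idea, choosing a split by information gain computed on a privileged variable while the leaves still model the primary target, can be sketched for a single decision stump. The generic privileged label and regression target below stand in for head pose/gender and facial-point displacements; this is not the paper's forest implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300
X = rng.uniform(size=(n, 4))
priv = (X[:, 2] > 0.5).astype(int)                 # privileged label (train-only)
target = X[:, 2] * 2.0 + 0.1 * rng.normal(size=n)  # primary regression target

def entropy(labels):
    p = np.bincount(labels, minlength=2) / len(labels)
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def split_gain(feature, threshold):
    # Information gain computed on the *privileged* labels, not on the
    # regression target.
    left = priv[feature <= threshold]
    right = priv[feature > threshold]
    if len(left) == 0 or len(right) == 0:
        return 0.0
    return entropy(priv) - (len(left) * entropy(left)
                            + len(right) * entropy(right)) / n

# Pick the (feature, threshold) split with the best privileged info gain.
best = max(((j, t) for j in range(4) for t in np.linspace(0.1, 0.9, 9)),
           key=lambda s: split_gain(X[:, s[0]], s[1]))
print("chosen split:", best)

# The leaves then store ordinary models of the primary target.
mask = X[:, best[0]] <= best[1]
leaf_means = (target[mask].mean(), target[~mask].mean())
print("leaf means:", leaf_means)
```

Because the privileged label here is driven by feature 2, the privileged gain criterion steers the stump toward that feature, and the resulting leaves also happen to have low variance in the primary target, which is the effect the paper exploits.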