Article · PDF Available

Learning Algorithms For Classification: A Comparison On Handwritten Digit Recognition


Abstract and Figures

This paper compares the performance of several classifier algorithms on a standard database of handwritten digits. We consider not only raw accuracy, but also training time, recognition time, and memory requirements. When available, we report measurements of the fraction of patterns that must be rejected so that the remaining patterns have misclassification rates less than a given threshold.

1. Introduction

Great strides have been achieved in pattern recognition in recent years. Particularly striking results have been attained in the area of handwritten digit recognition. This rapid progress has resulted from a combination of a number of developments including the proliferation of powerful, inexpensive computers, the invention of new algorithms that take advantage of these computers, and the availability of large databases of characters that can be used for training and testing. At AT&T Bell Laboratories we have developed a suite of classifier algorithms. In this paper we contras...
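The rejection measurement mentioned above can be computed from any classifier that attaches a confidence score to each prediction: the least confident patterns are rejected first until the error rate on the accepted remainder falls below the threshold. A minimal sketch in Python, assuming NumPy arrays of true labels, predicted labels, and confidence scores (the function name and the 0.5% default are illustrative, not taken from the paper):

import numpy as np

def rejection_fraction(y_true, y_pred, confidence, target_error=0.005):
    """Fraction of patterns that must be rejected (least confident first)
    so that the error rate on the accepted patterns is at most target_error."""
    order = np.argsort(confidence)              # least confident first
    errors = (y_true != y_pred).astype(float)[order]
    n = len(errors)
    for k in range(n + 1):                      # reject the k least confident patterns
        kept = errors[k:]
        if kept.size == 0 or kept.mean() <= target_error:
            return k / n
    return 1.0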
[Figure: architecture of the LeNet convolutional network. Input 28x28; convolution to feature maps 4@24x24; subsampling to 4@12x12; convolution to 12@8x8; subsampling to 12@4x4; convolution to output 10@1x1.]
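As a rough illustration of the architecture sketched in the figure (not the authors' original implementation, which used scaled hyperbolic-tangent units and trainable subsampling layers), a PyTorch model with the same feature-map sizes could be written as follows; the framework choice is mine:

import torch
import torch.nn as nn

class LeNet1Like(nn.Module):
    """Feature-map sizes follow the figure:
    1@28x28 -> 4@24x24 -> 4@12x12 -> 12@8x8 -> 12@4x4 -> 10@1x1."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 4, kernel_size=5),    # convolution:  4@24x24
            nn.Tanh(),
            nn.AvgPool2d(2),                   # subsampling:  4@12x12
            nn.Conv2d(4, 12, kernel_size=5),   # convolution: 12@8x8
            nn.Tanh(),
            nn.AvgPool2d(2),                   # subsampling: 12@4x4
            nn.Conv2d(12, 10, kernel_size=4),  # convolution: 10@1x1, one unit per digit class
        )

    def forward(self, x):                      # x: (batch, 1, 28, 28)
        return self.net(x).flatten(1)          # (batch, 10) class scores

scores = LeNet1Like()(torch.zeros(1, 1, 28, 28))   # quick shape check
assert scores.shape == (1, 10)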
[Figures: bar charts comparing the classifiers (linear 400-10, pairwise, PCA+quadratic, 1000 RBF, fully connected 400-300-10, LeNet 1, LeNet 4, LeNet 4 / Local, LeNet 4 / K-NN, LeNet 5, Boosted LeNet 4, K-NN Euclidean, Tangent Distance, Soft Margin) on error rate, rejection rate, training/recognition time, and memory requirements (MBytes).]
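A comparison of this kind measures each classifier along several axes at once rather than by accuracy alone. The following toy benchmark, using scikit-learn's small 8x8 digits set rather than the paper's 28x28 database and only a few simple classifiers, illustrates the idea; the classifier choices, the pickled-size proxy for memory, and all parameters are mine, not the paper's protocol:

import pickle
import time

from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

classifiers = {
    "linear":        LogisticRegression(max_iter=1000),
    "k-NN (k=3)":    KNeighborsClassifier(n_neighbors=3),
    "MLP 64-300-10": MLPClassifier(hidden_layer_sizes=(300,), max_iter=500, random_state=0),
}

for name, clf in classifiers.items():
    t0 = time.perf_counter()
    clf.fit(X_train, y_train)                   # training time
    train_s = time.perf_counter() - t0

    t0 = time.perf_counter()
    error = 1.0 - clf.score(X_test, y_test)     # test error and recognition time
    test_s = time.perf_counter() - t0

    size_kb = len(pickle.dumps(clf)) / 1024     # crude proxy for memory requirements
    print(f"{name:14s} error={error:5.3f} train={train_s:6.2f}s "
          f"test={test_s:5.2f}s size={size_kb:7.1f}KB")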
... CNN architecture is a widely used deep learning tool for categorizing images, and it is also a promising approach for modelling applications that use a lot of data [33,34]. Using CNN with a gradient-based method, LeCun et al. [35] successfully tackled the problem of classifying handwritten digits. Figure 2 depicts the typical architecture of a CNN. ...
... The development of several fundamental CNN architectures for image recognition applications has resulted in their practical application to various complex visual imagery challenges. CNN architecture is a widely used deep learning tool for categorizing images, and it is also a promising approach for modeling applications that use a lot of data [33-35]. Using CNN in conjunction with a gradient-based method, LeCun et al. [35] successfully tackled the problem of classifying handwritten digits. ...
... CNN architecture is a widely used deep learning tool for categorizing images, and it is also a promising approach for modeling applications that use a lot of data [33-35]. Using CNN in conjunction with a gradient-based method, LeCun et al. [35] successfully tackled the problem of classifying handwritten digits. In this study, we develop the proposed models using pre-trained models such as VGG16, ResNet and Wide ResNet. ...
Article
Full-text available
COVID-19, a global pandemic, has killed thousands in the last three years. Pathogenic laboratory testing is the gold standard but has a high false-negative rate, making alternate diagnostic procedures necessary to fight against it. Computed Tomography (CT) scans help diagnose and monitor COVID-19, especially in severe cases. However, visual inspection of CT images takes time and effort. In this study, we employ a Convolutional Neural Network (CNN) to detect coronavirus infection from CT images. The proposed study utilized transfer learning on three pre-trained deep CNN models, namely VGG-16, ResNet, and Wide ResNet, to diagnose and detect COVID-19 infection from CT images. However, when the pre-trained models are retrained, they lose generalization capability on the data in the original datasets. The novel aspect of this work is the integration of deep CNN architectures with Learning without Forgetting (LwF) to enhance the model’s generalization capabilities on both trained and new data samples. LwF lets the network train on the new dataset while preserving its original competencies. The deep CNN models with LwF are evaluated on original images and CT scans of individuals infected with the Delta variant of the SARS-CoV-2 virus. The experimental results show that of the three fine-tuned CNN models with the LwF method, the Wide ResNet model’s performance is superior and effective in classifying the original and Delta-variant datasets, with accuracies of 93.08% and 92.32%, respectively.
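Learning without Forgetting, as used above, trains on the new task with an ordinary cross-entropy loss while adding a distillation term that keeps the network's outputs for the old task close to those of the frozen pre-trained model. A minimal PyTorch-style sketch of that combined loss, with a temperature and weighting chosen only for illustration (not the values used in the study):

import torch
import torch.nn.functional as F

def lwf_loss(new_task_logits, new_task_labels,
             old_task_logits, frozen_old_task_logits,
             temperature=2.0, distill_weight=1.0):
    """Cross-entropy on the new task plus a distillation penalty that keeps
    the old-task outputs close to the frozen pre-trained model's outputs."""
    ce = F.cross_entropy(new_task_logits, new_task_labels)
    # soften both distributions with a temperature before comparing them
    log_p_new = F.log_softmax(old_task_logits / temperature, dim=1)
    p_frozen = F.softmax(frozen_old_task_logits.detach() / temperature, dim=1)
    distill = F.kl_div(log_p_new, p_frozen, reduction="batchmean") * temperature ** 2
    return ce + distill_weight * distill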
... CNNs have emerged as one of the most potentially fruitful avenues in the computer vision field in recent years, delivering outstanding results in a variety of vision recognition tasks, including pose estimation [1], face recognition [2], digit recognition [3,4], image classification [5,6], medical analysis [7], content-based image retrieval [8,9] and object and pedestrian detection [10,11]. ...
Conference Paper
Full-text available
Drones are unmanned aerial vehicles utilized for a broad range of functions, including delivery, aerial surveillance, traffic monitoring, architecture monitoring and even battlefield use. Indeed, drones confront significant obstacles while navigating independently in unstable and highly dynamic environments. In comparison with the standard "map-localize-plan" approaches, this research paper investigates an information-driven strategy to address the issues mentioned above. For this purpose, a convolutional neural network (CNN) is proposed to pilot a drone across city streets safely. Every input image generates two outputs: a steering angle that moves the drone while dodging obstacles, and a collision probability that alerts the Unmanned Aerial Vehicle (UAV) to risky situations so it can respond quickly. However, gathering enough data in an unstructured outdoor area such as a metropolis is a challenge, and collecting it with drone pilots is impractical, so the data used here was collected by mounting cameras on cars and bikes. This paper suggests that data collected by automobiles and bicycles can be used to train a UAV without threatening other vehicles or pedestrians. The dataset is divided using a train-test split. Despite being trained on urban roads from the perspective of cars, the navigation strategy of the proposed architecture generalizes well. The high recall (99.38%), high accuracy (97.78%) and good F-score (94.81%) of the proposed technique suggest that it can be used for autonomous driving of drones. The proposed technique learns a navigation strategy that achieves higher accuracy, recall, F-score and precision than previous approaches in the literature.
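The two outputs described above, a steering angle and a collision probability for every input frame, amount to a shared convolutional trunk with a regression head and a classification head. A hypothetical sketch of such a two-headed network (layer sizes and names are placeholders, not the paper's architecture):

import torch
import torch.nn as nn

class SteerCollisionNet(nn.Module):
    """Shared CNN trunk with a steering-angle head and a collision-probability head."""
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.steer = nn.Linear(32, 1)          # steering angle (regression)
        self.collision = nn.Linear(32, 1)      # collision logit (binary classification)

    def forward(self, frame):                  # frame: (batch, 3, H, W) camera image
        h = self.trunk(frame)
        return self.steer(h), torch.sigmoid(self.collision(h))

angle, p_collision = SteerCollisionNet()(torch.zeros(1, 3, 120, 160))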
Article
Full-text available
Learning classification tasks of $(2^n \times 2^n)$ inputs typically consist of $\le n$ $(2 \times 2)$ max-pooling (MP) operators along the entire feedforward deep architecture. Here we show, using the CIFAR-10 database, that pooling decisions adjacent to the last convolutional layer significantly enhance accuracies. In particular, average accuracies of the advanced-VGG with $m$ layers (A-VGGm) architectures are 0.936, 0.940, 0.954, 0.955, and 0.955 for m = 6, 8, 14, 13, and 16, respectively. The results indicate A-VGG8’s accuracy is superior to VGG16’s, and that the accuracies of A-VGG13 and A-VGG16 are equal, and comparable to that of Wide-ResNet16. In addition, replacing the three fully connected (FC) layers with one FC layer, A-VGG6 and A-VGG14, or with several linear activation FC layers, yielded similar accuracies. These significantly enhanced accuracies stem from training the most influential input-output routes, in comparison to the inferior routes selected following multiple MP decisions along the deep architecture. In addition, accuracies are sensitive to the order of the non-commutative MP and average pooling operators adjacent to the output layer, varying the number and location of training routes. The results call for the reexamination of previously proposed deep architectures and their accuracies by utilizing the proposed pooling strategy adjacent to the output layer.
Article
Full-text available
Aiming to protect device data privacy, Federated Learning (FL) is a distributed machine learning framework in which devices’ local model parameters are exchanged with a centralized server without revealing the actual data. The Hierarchical Federated Learning (HFL) framework was introduced to improve FL communication efficiency: devices are clustered and seek model consensus with the support of edge servers (e.g., base stations). Devices in a cluster submit their local model updates to their assigned local edge server for aggregation at each iteration. The edge servers transmit the aggregated models to a centralized server and establish a global consensus. However, similar to FL, adversaries may threaten the security and privacy of HFL. The client devices within a cluster may deliberately provide unreliable local model updates through poisoning attacks, or poor-quality model updates due to inconsistent communication channels, increased device mobility, or inadequate device resources. To address the above challenges, this paper investigates the client selection problem in the HFL framework to eliminate the impact of unreliable clients while maximizing the global model accuracy of HFL. Each FL edge server is equipped with a Deep Reinforcement Learning (DRL)-based reputation model to optimally measure the reliability and trustworthiness of FL workers within its cluster. A Multi-Agent Deep Deterministic Policy Gradient (MADDPG) is utilized to enhance the accuracy and stability of the HFL global model, given the workers’ dynamic behaviors in the HFL environment. The experimental results indicate that our proposed MADDPG improves the accuracy and stability of HFL compared with the conventional reputation model and the single-agent DDPG-based reputation model.
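In a hierarchical round of the kind described above, each edge server aggregates the model updates of the selected clients in its cluster, and the central server then aggregates the edge models into a global model. A minimal sketch of that two-level aggregation in Python, with plain weights standing in for the reputation-based client selection that the paper implements with DRL (not reproduced here):

import numpy as np

def weighted_average(models, weights):
    """Average a list of parameter vectors with the given non-negative weights."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * m for wi, m in zip(w, models))

def hierarchical_round(clusters):
    """clusters: list of (client_models, client_weights) pairs, one per edge server."""
    edge_models = [weighted_average(models, weights) for models, weights in clusters]
    # the central server treats every edge model equally in this sketch
    return weighted_average(edge_models, [1.0] * len(edge_models))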
Conference Paper
In the changing era of AI, Deep Learning plays a vital role. It is basically a part of Machine Learning. The main attribute of Deep Learning is that its models work without any human intervention. As new technologies emerge day by day, Deep Learning models are used in several areas such as healthcare, agriculture, bioinformatics and so on. This study discusses one of the deep learning models, namely the CNN: its introduction, an overview, the building blocks of CNN, different CNN architectures, applications in several domain areas where researchers have used the CNN model successfully, and the associated issues and challenges.
Article
Full-text available
A training algorithm that maximizes the margin between the training patterns and the decision boundary is presented. The technique is applicable to a wide variety of classification functions, including Perceptrons, polynomials, and Radial Basis Functions. The effective number of parameters is adjusted automatically to match the complexity of the problem. The solution is expressed as a linear combination of supporting patterns. These are the subset of training patterns that are closest to the decision boundary. Bounds on the generalization performance based on the leave-one-out method and the VC-dimension are given. Experimental results on optical character recognition problems demonstrate the good generalization obtained when compared with other learning algorithms.
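The abstract above describes the maximal-margin approach behind the "Soft Margin" classifier in the charts: the decision function depends only on the supporting patterns closest to the boundary. A small scikit-learn illustration of the idea on the 8x8 digits set (the library, kernel, and parameters are my choices, not the paper's):

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# polynomial kernel, one of the classification functions named in the abstract;
# C controls how soft the margin is
clf = SVC(kernel="poly", degree=3, C=10.0).fit(X_train, y_train)

print("test error:", 1.0 - clf.score(X_test, y_test))
print("supporting patterns per class:", clf.n_support_)  # only these define the boundary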
Article
Results of recent research suggest that carefully designed multilayer neural networks with local receptive fields and shared weights may be unique in providing low error rates on handwritten digit recognition tasks. This study, however, demonstrates that these networks, radial basis function (RBF) networks, and k nearest-neighbor (kNN) classifiers all provide similar low error rates on a large handwritten digit database. The backpropagation network is overall superior in memory usage and classification time but can provide false positive classifications when the input is not a digit. The backpropagation network also has the longest training time. The RBF classifier requires more memory and more classification time, but less training time. When high accuracy is warranted, the RBF classifier can generate a more effective confidence judgment for rejecting ambiguous inputs. The simple kNN classifier can also perform handwritten digit recognition, but requires a prohibitively large amount of memory and is much slower at classification. Nevertheless, the simplicity of the algorithm and its fast training make the kNN classifier an attractive candidate in hardware-assisted classification tasks. These results on a large, high-dimensional problem demonstrate that practical constraints including training time, memory usage, and classification time often constrain classifier selection more strongly than small differences in overall error rate.
Conference Paper
An evaluation is made of several neural network classifiers, comparing their performance on a typical problem, namely handwritten digit recognition. For this purpose, the authors use a database of handwritten digits, with relatively uniform handwriting styles. The authors propose a novel way of organizing the network architectures by training several small networks so as to deal separately with subsets of the problem, and then combining the results. This approach works in conjunction with various techniques including: layered networks with one or several layers of adaptive connections, fully connected recursive networks, ad hoc networks with no adaptive connections, and architectures with second-degree polynomial decision surfaces.
Article
Very rarely are training data evenly distributed in the input space. Local learning algorithms attempt to locally adjust the capacity of the training system to the properties of the training set in each area of the input space. The family of local learning algorithms contains known methods, like the k-nearest neighbors method (kNN) or the radial basis function networks (RBF), as well as new algorithms. A single analysis models some aspects of these algorithms. In particular, it suggests that neither kNN nor RBF, nor nonlocal classifiers, achieve the best compromise between locality and capacity. A careful control of these parameters in a simple local learning algorithm has provided a performance breakthrough for an optical character recognition problem. Both the error rate and the rejection performance have been significantly improved.
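The local learning idea described above (fit a simple classifier in the neighborhood of each query instead of one global model, as in the "LeNet 4 / Local" entry in the charts) can be sketched as follows; the neighborhood size and the choice of a logistic-regression local model are illustrative, not the paper's:

import numpy as np
from sklearn.linear_model import LogisticRegression

def local_predict(x_query, X_train, y_train, k=50):
    """Train a small linear classifier on the k training patterns nearest
    to the query and use it only for this query."""
    distances = np.linalg.norm(X_train - x_query, axis=1)
    idx = np.argsort(distances)[:k]                 # k nearest neighbours
    if len(np.unique(y_train[idx])) == 1:           # pure neighbourhood: answer directly
        return y_train[idx][0]
    local = LogisticRegression(max_iter=1000).fit(X_train[idx], y_train[idx])
    return local.predict(x_query.reshape(1, -1))[0]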
Conference Paper
A mixed analog/digital chip (ANNA) for fast 2-d convolution and matrix-vector multiplication has been developed (peak speed 20,000 MOPS). Two of these chips have been integrated on a 6U VME board, to serve as a high-speed platform for a wide variety of algorithms used in neural-network applications as well as in image analysis. The system has been tested for such tasks as character recognition, noise removal, and text location as well as for emulating cellular neural networks (CNN). A sustained speed of up to 2 billion connections per second (GC/s) and a recognition speed of 1000 characters per second with a sophisticated neural network has been measured
We present an application of back-propagation networks to handwritten digit recognition. Minimal preprocessing of the data was required, but the architecture of the network was highly constrained and specifically designed for the task. The input of the network consists of normalized images of isolated digits. The method has a 1% error rate and about a 9% reject rate on zipcode digits provided by the U.S. Postal Service.

1 INTRODUCTION

The main point of this paper is to show that large back-propagation (BP) networks can be applied to real image-recognition problems without a large, complex preprocessing stage requiring detailed engineering. Unlike most previous work on the subject (Denker et al., 1989), the learning network is directly fed with images, rather than feature vectors, thus demonstrating the ability of BP networks to deal with large amounts of low-level information. Previous work performed on simple digit images (Le Cun, 1989) showed that the architecture of the network s...
Boosting Performance in Neural Networks
  • H Drucker
  • R Schapire
  • P Simard
H. Drucker, R. Schapire, and P. Simard, Boosting Performance in Neural Networks, International Journal of Pattern Recognition and Artificial Intelligence 7, 705-720 (1993).