Article

The MNIST database of handwritten digits

... This research presents the handwritten digit classification task on the MNIST dataset [23], using the HDC model of [24], as a case study for illustration. A distinguishing feature of our approach is the comprehensive analysis of Hamming distance characteristics in HDC classification tasks at the software level before hardware implementation. ...
... To assess the effects of these variations, Monte Carlo simulations were performed, accounting for variations in transistors and MTJ resistances, using the simulation parameters outlined in Table II. The evaluation utilizes query and class vectors derived from the HDC model in [24] trained on the MNIST dataset [23], as previously discussed in Section III, providing a realistic test framework for evaluating the proposed design's performance under actual classification conditions. ...
... To evaluate the energy efficiency of the proposed STT-HDC architecture, we conducted a case study based on the handwritten digit classification task using the MNIST dataset [23], following the hyperdimensional computing (HDC) model described in [24]. The classification task involves 10 classes, with one 1000-bit trained hypervector per class stored in the memory array. ...
Article
Full-text available
This paper presents an efficient in-memory hyperdimensional computing (HDC) design based on spin transfer-torque magnetoresistive RAM (STT-MRAM), named STT-HDC. A novel time-domain sense amplifier circuit is proposed that significantly simplifies Hamming distance computation of HDC models while dramatically improving energy efficiency. Our design is evaluated using HSPICE simulation under the 28nm FD-SOI technology PDK (Process Design Kit). Simulation results indicate that our approach delivers an energy efficiency of 3.12 fJ per bit, achieving a significant reduction in energy consumption relative to previous implementations. This substantial enhancement in energy performance, coupled with the simplified computation model, paves the way for more practical and scalable HDC systems in resource-constrained environments. The influence of variations across different process corners and of temperature is also thoroughly covered in the analysis.
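The excerpts above describe HDC classification as comparing a query hypervector against stored class hypervectors via Hamming distance. Below is a minimal software-level sketch of that query path; the 1000-bit dimensionality and 10 classes follow the excerpts, while the random-projection encoder and the random class hypervectors are illustrative stand-ins rather than the trained model of [24].

```python
# Minimal sketch of Hamming-distance classification in hyperdimensional computing (HDC).
# The encoder and class hypervectors are illustrative assumptions, not the model of [24].
import numpy as np

rng = np.random.default_rng(0)
D, N_CLASSES, N_PIXELS = 1000, 10, 28 * 28

# Assumed encoder: a fixed random bipolar projection followed by sign binarization.
projection = rng.choice([-1, 1], size=(N_PIXELS, D))

def encode(image):
    """Map a flattened grayscale image to a D-bit binary hypervector."""
    return (image.flatten() @ projection > 0).astype(np.uint8)

def hamming(a, b):
    return np.count_nonzero(a != b)

# Class hypervectors would normally be trained by bundling encoded training images;
# here they are random stand-ins just to show the query path.
class_hvs = rng.integers(0, 2, size=(N_CLASSES, D), dtype=np.uint8)

query = encode(rng.random(N_PIXELS))          # stand-in for an MNIST test image
pred = int(np.argmin([hamming(query, c) for c in class_hvs]))
print("predicted class:", pred)
```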
... Note that the autoencoder is fine-tuned in order to minimize the reconstruction loss. The authors reported DEC accuracies of 84.3% and 75.63% using MNIST [40] and REUTERS [41] datasets, respectively. Later, DEC was adopted as a baseline for the empirical comparison of several deep clustering approaches [34,42,43]. ...
... The proposed DC-SSDEC was first assessed using three benchmark datasets widely adopted by deep clustering researchers. The first is the MNIST dataset [40], which comprises 70,000 grayscale images of handwritten digits. Each image is represented as a 28 × 28-pixel grid, resulting in a 784-dimensional feature vector. ...
... These experiments aimed at assessing the performance of DC-SSDEC using benchmark datasets. Specifically, the objective function introduced in (1) was validated using MNIST [40], USPS [69], and STL-10 datasets. One should note that the model hyperparameters went through a tuning process to obtain better settings and initialization. ...
Article
Full-text available
Semi-supervised clustering can be viewed as a clustering paradigm that exploits both labeled and unlabeled data to steer learning toward accurate data clusters and avoid local-minimum solutions. Nonetheless, attempts to refine existing semi-supervised clustering methods remain relatively limited compared to the advancements witnessed in current benchmark methods for fully unsupervised clustering. This research introduces a novel semi-supervised method for deep clustering that leverages deep neural networks and fuzzy memberships to better capture the data partitions. In particular, the proposed Dual-Constraint-based Semi-Supervised Deep Clustering (DC-SSDEC) method utilizes two sets of pairwise soft constraints, “should-link” and “shouldNot-link”, to guide the clustering process. The intended clustering task is expressed as an optimization of a newly designed objective function. Additionally, DC-SSDEC performance was evaluated through comprehensive experiments using three real-world benchmark datasets. Moreover, a comparison with related state-of-the-art clustering techniques was conducted to showcase DC-SSDEC's superior performance. In particular, DC-SSDEC's significance lies in the proposed dual-constraint formulation and its integration into a novel objective function. This contribution yielded an improvement in clustering performance compared to relevant state-of-the-art approaches. In addition, the assessment of the proposed model using real-world datasets represents another contribution of this research. In fact, DC-SSDEC gained increases of 3.25%, 1.44%, and 1.82% in clustering accuracy over the best-performing single-constraint-based approach on the MNIST, STL-10, and USPS datasets, respectively.
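As a concrete illustration of the dual pairwise-constraint idea, the sketch below penalizes soft cluster assignments that violate "should-link" or "shouldNot-link" pairs. The dot-product agreement measure and the unit penalty weights are illustrative assumptions, not the exact terms of the DC-SSDEC objective.

```python
# Minimal sketch of a dual pairwise-constraint penalty on soft cluster assignments.
# The penalty form below is an illustrative stand-in for the paper's objective.
import numpy as np

def constraint_penalty(Q, should_link, should_not_link):
    """Q: (n, k) soft assignments; each constraint is an (i, j) index pair."""
    agree = lambda i, j: float(Q[i] @ Q[j])             # high when i and j share a cluster
    sl = sum(1.0 - agree(i, j) for i, j in should_link)       # penalize disagreement
    snl = sum(agree(i, j) for i, j in should_not_link)        # penalize agreement
    return sl + snl

Q = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])
print(constraint_penalty(Q, should_link=[(0, 1)], should_not_link=[(0, 2)]))
```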
... We evaluate all three dynamics across four dimensions: (i) classification performance on image (MNIST [7], FMNIST [8], CIFAR-10 [9]) and tabular (Iris [10], Breast Cancer [11]) datasets, (ii) resiliency to membership inference attacks (MIAs), (iii) computational cost (GPU/CPU memory and power), and (iv) transferability of learned representations across tasks. ...
... Our experiments utilize both image-based and tabular datasets. For image classification, we used MNIST [7], Fashion-MNIST (FMNIST) [8], and CIFAR-10 [9], each consisting of grayscale or RGB images with varying resolutions. To ensure architectural consistency across experiments, all images were resized to 32 × 32 pixels. ...
Preprint
Full-text available
Biological neurons exhibit diverse temporal spike patterns, which are believed to support efficient, robust, and adaptive neural information processing. While models such as Izhikevich can replicate a wide range of these firing dynamics, their complexity poses challenges for directly integrating them into scalable spiking neural networks (SNN) training pipelines. In this work, we propose two probabilistically driven, input-level temporal spike transformations: Poisson-Burst and Delayed-Burst that introduce biologically inspired temporal variability directly into standard Leaky Integrate-and-Fire (LIF) neurons. This enables scalable training and systematic evaluation of how spike timing dynamics affect privacy, generalization, and learning performance. Poisson-Burst modulates burst occurrence based on input intensity, while Delayed-Burst encodes input strength through burst onset timing. Through extensive experiments across multiple benchmarks, we demonstrate that Poisson-Burst maintains competitive accuracy and lower resource overhead while exhibiting enhanced privacy robustness against membership inference attacks, whereas Delayed-Burst provides stronger privacy protection at a modest accuracy trade-off. These findings highlight the potential of biologically grounded temporal spike dynamics in improving the privacy, generalization and biological plausibility of neuromorphic learning systems.
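The transformations above act at the input level on spike timing. As background, a minimal sketch of plain rate-based Poisson encoding of pixel intensities is shown below; it is the standard baseline that Poisson-Burst modulates, and the simulation length and maximum firing probability are illustrative assumptions rather than the paper's settings.

```python
# Minimal sketch of rate-based Poisson spike encoding of pixel intensities.
# The burst modulation proposed in the paper is not reproduced here.
import numpy as np

def poisson_encode(image, T=100, max_rate=0.5, seed=0):
    """Return a (T, n_pixels) binary spike train; spike probability per step scales with intensity."""
    rng = np.random.default_rng(seed)
    p = np.clip(image.flatten(), 0.0, 1.0) * max_rate   # per-timestep spike probability
    return (rng.random((T, p.size)) < p).astype(np.uint8)

spikes = poisson_encode(np.random.rand(28, 28))
print(spikes.shape, spikes.mean())   # (100, 784) and the average firing rate
```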
... We consider the binary classification problem on MNIST dataset [33] to distinguish similar handwritten digits 4 and 9. We adopt the support vector machine (SVM) as the classification algorithm and solve the problem under the frameworks of BDR, DRO, and SAA, respectively. ...
... Tasks: We apply the proposed BDR model to 2D image classification tasks using MNIST [33], CIFAR-10, and CIFAR-100 [34] datasets, as well as 3D point cloud classification utilizing ModelNet40 [35] dataset. To evaluate the generalization capacity of our method, we perform experiments under a low-shot data setting; that is, the model is learned on a subset of the training dataset. ...
Article
Full-text available
Trustworthy machine learning aims at combating distributional uncertainties in training data distributions compared to population distributions. Typical treatment frameworks include the Bayesian approach, (min-max) distributionally robust optimization (DRO), and regularization. However, three issues have to be raised: 1) the prior distribution in the Bayesian method and the regularizer in the regularization method are difficult to specify; 2) the DRO method tends to be overly conservative; 3) all three methods are biased estimators of the true optimal cost. This paper studies a new framework that unifies the three approaches and addresses the three challenges above. The asymptotic properties (e.g., consistencies and asymptotic normalities), non-asymptotic properties (e.g., generalization bounds and unbiasedness), and solution methods of the proposed model are studied. The new model reveals the trade-off between robustness to unseen data and specificity to the training data. Experiments on various real-world tasks validate the superiority of the proposed learning framework.
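The first excerpt above uses an SVM on MNIST digits 4 versus 9 as the test problem. The sketch below sets up only the plain empirical-risk (SAA) baseline of that experiment with scikit-learn; the BDR and DRO formulations are not reproduced, and the regularization constant and train/test split are illustrative assumptions.

```python
# Minimal sketch of the SAA baseline: a linear SVM separating MNIST digits 4 and 9.
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
mask = (y == "4") | (y == "9")
X, y = X[mask] / 255.0, (y[mask] == "9").astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = LinearSVC(C=1.0).fit(X_tr, y_tr)           # plain sample-average (SAA) training
print("test accuracy:", clf.score(X_te, y_te))
```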
... The environment configuration is shown in . The MNIST [10], Fashion-MNIST [18] and CIFAR-100 [8] datasets are used to test the effect of neural architecture search. For all datasets, the crossover probability is set to 0.9 and the probability of every mutation operator is set to 0.2. ...
... easier to identify and the search space is simpler. For the other datasets, the number of ResNet blocks is chosen randomly in [1, 10]. The fixed part, a 64-channel 3 × 3 convolution followed by a BatchNorm [7] layer and a ReLU [4] layer, is included only for the Fashion-MNIST and CIFAR-100 datasets. ...
Preprint
Full-text available
This paper proposes a neural architecture search space using ResNet as a framework, with search objectives including parameters for convolution, pooling, fully connected layers, and connectivity of the residual network. In addition to recognition accuracy, this paper uses the loss value on the validation set as a secondary objective for optimization. The experimental results demonstrate that the search space of this paper together with the optimisation approach can find competitive network architectures on the MNIST, Fashion-MNIST and CIFAR100 datasets.
... Instead, the vanilla Softmax operator yields a right-stochastic matrix with distance 1.12 ± 0.33. Hence Sinkformer attention is only approximately doubly stochastic. ...
... We evaluate the different ViTs on MNIST [33], Fashion MNIST [57], seven datasets from the MedMNIST benchmark [58] and a compositional task requiring multistep reasoning as proposed by Hoffmann et al. [24]. In that task, a 2 × 2 grid contains two MNIST digits (upper left and lower right) and two FashionMNIST items (upper right and lower left). ...
Preprint
Full-text available
At the core of the Transformer, the Softmax normalizes the attention matrix to be right stochastic. Previous research has shown that this often destabilizes training and that enforcing the attention matrix to be doubly stochastic (through Sinkhorn's algorithm) consistently improves performance across different tasks, domains and Transformer flavors. However, Sinkhorn's algorithm is iterative, approximative, non-parametric and thus inflexible w.r.t. the obtained doubly stochastic matrix (DSM). Recently, it has been proven that DSMs can be obtained with a parametric quantum circuit, yielding a novel quantum inductive bias for DSMs with no known classical analogue. Motivated by this, we demonstrate the feasibility of a hybrid classical-quantum doubly stochastic Transformer (QDSFormer) that replaces the Softmax in the self-attention layer with a variational quantum circuit. We study the expressive power of the circuit and find that it yields more diverse DSMs that better preserve information than classical operators. Across multiple small-scale object recognition tasks, we find that our QDSFormer consistently surpasses both a standard Vision Transformer and other doubly stochastic Transformers. Beyond the established Sinkformer, this comparison includes a novel quantum-inspired doubly stochastic Transformer (based on QR decomposition) that can be of independent interest. The QDSFormer also shows improved training stability and lower performance variation suggesting that it may mitigate the notoriously unstable training of ViTs on small-scale data.
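For reference, the classical route to an (approximately) doubly stochastic attention matrix is Sinkhorn's algorithm, which the abstract contrasts with the proposed quantum circuit. The sketch below shows that iteration on a small random matrix; the iteration count is an illustrative assumption.

```python
# Minimal sketch of Sinkhorn normalization toward a doubly stochastic matrix (DSM).
import numpy as np

def sinkhorn(logits, n_iters=20):
    """Alternately normalize rows and columns of exp(logits)."""
    A = np.exp(logits - logits.max())
    for _ in range(n_iters):
        A /= A.sum(axis=1, keepdims=True)   # make rows sum to 1
        A /= A.sum(axis=0, keepdims=True)   # make columns sum to 1
    return A

A = sinkhorn(np.random.randn(5, 5))
print(A.sum(axis=1), A.sum(axis=0))   # rows approximately 1, columns exactly 1 after the last pass
```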
... The networks will be trained using a standard task of handwritten digit recognition from the MNIST database [25]. This database contains 70,000 grayscale images of size 28×28 pixels. ...
Preprint
In this paper, we investigate the impact of noise on a simplified trained convolutional network. The types of noise studied originate from a real optical implementation of a neural network, but we generalize these types to enhance the applicability of our findings on a broader scale. The noise types considered include additive and multiplicative noise, which relate to how noise affects individual neurons, as well as correlated and uncorrelated noise, which pertain to the influence of noise across the neurons of a single layer. We demonstrate that the propagation of uncorrelated noise primarily depends on the statistical properties of the connection matrices. Specifically, the mean value of the connection matrix following the layer impacted by noise governs the propagation of correlated additive noise, while the mean of its square contributes to the accumulation of uncorrelated noise. Additionally, we propose an analytical assessment of the noise level in the network's output signal, which shows a strong correlation with the results of numerical simulations.
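The claim that the mean of the squared connection matrix controls the accumulation of uncorrelated noise can be checked numerically. The sketch below injects uncorrelated additive noise before one linear layer and compares the empirical output noise variance with the second-moment prediction; the layer sizes and noise level are illustrative assumptions.

```python
# Numerical check: for uncorrelated additive noise, the noise variance after a linear layer
# is governed by the second moment of that layer's weights.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, sigma, trials = 300, 100, 0.1, 20000

W = rng.normal(0, 0.05, size=(n_out, n_in))           # connection matrix after the noisy layer
noise = rng.normal(0, sigma, size=(trials, n_in))      # uncorrelated additive noise per neuron
out_noise = noise @ W.T                                # propagated noise at the next layer

empirical_var = out_noise.var(axis=0).mean()
predicted_var = sigma**2 * n_in * np.mean(W**2)        # second moment of W sets the accumulation
print(empirical_var, predicted_var)                    # the two should closely agree
```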
... We evaluate our Fed-GAN framework on MNIST [57], Fashion-MNIST [58] and CelebA [59]. The experiments revolve around the quality of the synthetic data, including the accuracy of the synthetic data on the test dataset, the visual quality, and the power to protect privacy. ...
Article
Full-text available
Huge amounts of data from various sources are essential to dependable distributed machine learning, especially for trustworthy federated learning (FL). However, existing FL methods struggle to collect enough data for training the global model accurately, especially in cross-device scenarios. In this paper, we propose a new federated generative adversarial network empowered by differential privacy and knowledge transfer, named Fed-GAN, which can be used to address the problem of data shortage and prevent generator leakage from resource-constrained devices, as well as to generate high-quality synthetic data while ensuring strict DP guarantees. Different from other generative model methods, our Fed-GAN framework can achieve efficient and secure generative model training with limited permissions for resource-constrained devices, preventing them from leaking or misusing the generator. In addition, we propose a pHash-KT method for our Fed-GAN framework, which selects potentially high-quality data through the knowledge of each client to improve the utility of synthetic data. Our Fed-GAN framework satisfies $\left(\frac{2kJ\lambda}{\sigma^2}+\frac{\log 1/\delta}{\lambda-1}, \delta\right)$-DP, and also shows high resistance when the number of adversaries is 10%-70% of the total number of clients. Extensive experiments demonstrate that our Fed-GAN framework not only generates high-quality synthetic data, but also provides strict DP guarantees, compared with other generative model methods. Our code is publicly available at https://github.com/daxx1/fed-gan
... To study the coupling between networks we design a special experiment protocol, illustrated in Figure 2. First, we take κ = 0 and train the Hopfield subnetwork to recognise MNIST digits [39]. After this stage we have trained weights W and a bias term b such that, starting from an MNIST digit x(0), the last 10 elements of x(1) converge to class labels encoded as +1 for the correct class and −1 for an incorrect class. ...
Preprint
Full-text available
Artificial Kuramoto oscillatory neurons were recently introduced as an alternative to threshold units. Empirical evidence suggests that oscillatory units outperform threshold units in several tasks including unsupervised object discovery and certain reasoning problems. The proposed coupling mechanism for these oscillatory neurons is heterogeneous, combining a generalized Kuramoto equation with standard coupling methods used for threshold units. In this research note, we present a theoretical framework that clearly distinguishes oscillatory neurons from threshold units and establishes a coupling mechanism between them. We argue that, from a biological standpoint, oscillatory and threshold units realise distinct aspects of neural coding: roughly, threshold units model intensity of neuron firing, while oscillatory units facilitate information exchange by frequency modulation. To derive interaction between these two types of units, we constrain their dynamics by focusing on dynamical systems that admit Lyapunov functions. For threshold units, this leads to Hopfield associative memory model, and for oscillatory units it yields a specific form of generalized Kuramoto model. The resulting dynamical systems can be naturally coupled to form a Hopfield-Kuramoto associative memory model, which also admits a Lyapunov function. Various forms of coupling are possible. Notably, oscillatory neurons can be employed to implement a low-rank correction to the weight matrix of a Hopfield network. This correction can be viewed either as a form of Hebbian learning or as a popular LoRA method used for fine-tuning of large language models. We demonstrate the practical realization of this particular coupling through illustrative toy experiments.
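As a reference point for the Hopfield half of this construction, the sketch below implements a plain Hopfield associative memory with Hebbian weights and sign-update recall on random bipolar patterns. The MNIST-plus-label-unit encoding used in the excerpt and the Kuramoto coupling are not reproduced; pattern size and count are illustrative assumptions.

```python
# Minimal sketch of a Hopfield associative memory: Hebbian storage and sign-update recall.
import numpy as np

rng = np.random.default_rng(0)
N, P = 200, 5
patterns = rng.choice([-1, 1], size=(P, N))

W = (patterns.T @ patterns) / N                 # Hebbian weight matrix
np.fill_diagonal(W, 0)                          # no self-connections

def recall(x, steps=10):
    for _ in range(steps):
        x = np.sign(W @ x)
        x[x == 0] = 1
    return x

noisy = patterns[0] * rng.choice([1, -1], size=N, p=[0.9, 0.1])   # flip ~10% of the bits
print("overlap after recall:", recall(noisy) @ patterns[0] / N)    # close to 1.0
```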
... To demonstrate SMPD's versatility beyond OAM retrieval, we extended its application to handwritten digit recognition using the MNIST dataset [31]. Conventional speckle-based digit classification relies on full-field speckle imaging and computationally intensive spatial feature extraction. ...
Preprint
Full-text available
Orbital angular momentum (OAM) recognition of vortex beams is critical for applications ranging from optical communications to quantum technologies. However, conventional approaches designed for free-space propagation struggle when vortex beams propagate within or through complex media, such as multimode fibers (MMF), and often rely on high-resolution imaging sensors with tens of thousands of pixels to record dense intensity profiles. Here, we introduce a speckle-driven OAM recognition technique that exploits the intrinsic correlation between speckle patterns and OAM states, circumventing the limitations of scattering media while drastically reducing sampling requirements. Our method, termed spatially multiplexed points detection (SMPD), extracts intensity information from spatially distributed points in a multiplexed speckle plane. Remarkably, it achieves >99% retrieval accuracy for OAM recognition using just 16 sampling points, corresponding to a sampling density of 0.024%, which is 4096 times lower than conventional imaging-based approaches. Furthermore, high-capacity OAM-multiplexed communication decoding with an error rate of <0.2% and handwritten digit recognition with an accuracy of 89% are implemented to verify the versatility of SMPD. This work transcends the trade-off between sampling density and accuracy, establishing a scalable platform for resource-efficient photonic applications like quantum communication and endoscopic sensing.
... Datasets. We evaluated the performance of the Epistemic Wrapper on three classification benchmarks: MNIST [29], Fashion-MNIST [55] and CIFAR-10 [27]. The MNIST dataset comprises 70,000 grayscale images of handwritten digits (0-9), each with a resolution of 28 × 28 pixels, and is mostly used for classification and pattern recognition tasks due to its simplicity and accessibility. ...
Preprint
Full-text available
Uncertainty estimation is pivotal in machine learning, especially for classification tasks, as it improves the robustness and reliability of models. We introduce a novel 'Epistemic Wrapping' methodology aimed at improving uncertainty estimation in classification. Our approach uses Bayesian Neural Networks (BNNs) as a baseline and transforms their outputs into belief function posteriors, effectively capturing epistemic uncertainty and offering an efficient and general methodology for uncertainty quantification. Comprehensive experiments employing a Bayesian Neural Network (BNN) baseline and an Interval Neural Network for inference on the MNIST, Fashion-MNIST, CIFAR-10 and CIFAR-100 datasets demonstrate that our Epistemic Wrapper significantly enhances generalisation and uncertainty quantification.
... setup, giving a new untrained model the same training data for training and the same test data for evaluation, allowing for full apples-to-apples comparisons. This approach was wildly successful, with canonical benchmarks such as MNIST (LeCun & Cortes, 2010) and ImageNet (Deng et al., 2009) responsible for driving incredibly rapid progress in computer vision, for instance, and benchmarks such as (Rajpurkar et al., 2016; Marcus et al., 1993; Diemert Eustache, Betlei Artem et al., 2018) moving forward email spam classification, natural language processing, and many others. Websites such as the UCI repository (Kelly et al.) and ...
Preprint
Full-text available
In this position paper, we observe that empirical evaluation in Generative AI is at a crisis point since traditional ML evaluation and benchmarking strategies are insufficient to meet the needs of evaluating modern GenAI models and systems. There are many reasons for this, including the fact that these models typically have nearly unbounded input and output spaces, typically do not have a well defined ground truth target, and typically exhibit strong feedback loops and prediction dependence based on the context of previous model outputs. On top of these critical issues, we argue that the problems of leakage and contamination are in fact the most important and difficult issues to address for GenAI evaluations. Interestingly, the field of AI Competitions has developed effective measures and practices to combat leakage for the purpose of counteracting cheating by bad actors within a competition setting. This makes AI Competitions an especially valuable (but underutilized) resource. Now is the time for the field to view AI Competitions as the gold standard for empirical rigor in GenAI evaluation, and to harness and harvest their results with according value.
... We use the term instance to refer to a data example along with the corresponding verification specification. We adopt some MNIST [22] models with Sigmoid and Tanh activation functions from previous works [27,33,34], along with their data instances. Besides, to test our method on more models with various nonlinearities using a consistent training setting for all the models, we train many new models with various nonlinearities on CIFAR-10 [21] by PGD adversarial training [25], using an ℓ ∞ perturbation with ϵ = 1/255 in both training and verification. ...
Chapter
Full-text available
Branch-and-bound (BaB) is among the most effective techniques for neural network (NN) verification. However, existing works on BaB for NN verification have mostly focused on NNs with piecewise linear activations, especially ReLU networks. In this paper, we develop a general framework, named GenBaB, to conduct BaB on general nonlinearities to verify NNs with general architectures, based on linear bound propagation for NN verification. To decide which neuron to branch, we design a new branching heuristic which leverages linear bounds as shortcuts to efficiently estimate the potential improvement after branching. To decide nontrivial branching points for general nonlinear functions, we propose to pre-optimize branching points, which can be efficiently leveraged during verification with a lookup table. We demonstrate the effectiveness of our GenBaB on verifying a wide range of NNs, including NNs with activation functions such as Sigmoid, Tanh, Sine and GeLU, as well as NNs involving multi-dimensional nonlinear operations such as multiplications in LSTMs and Vision Transformers. Our framework also allows the verification of general nonlinear computation graphs and enables verification applications beyond simple NNs, particularly for AC Optimal Power Flow (ACOPF). GenBaB is part of the latest α,β-CROWN (https://github.com/Verified-Intelligence/alpha-beta-CROWN), the winner of the 4th and the 5th International Verification of Neural Networks Competition (VNN-COMP 2023 and 2024). Code for reproducing the experiments is available at https://github.com/shizhouxing/GenBaB. Appendices can be found at http://arxiv.org/abs/2405.21063.
... One of the major challenges hindering progress in these studies is the limited availability of high-quality emotional speech data. In the realm of image processing, popular datasets such as CIFAR10 [12], ImageNet [13], and MNIST [14] have been extensively utilized to train deep learning models. However, these large-scale datasets are inadequate for emotional speech datasets. ...
Article
Full-text available
Although emotional speech recognition has received increasing emphasis in research and applications, it remains challenging due to the diversity and complexity of emotions and limited datasets. To address these limitations, we propose a novel approach utilizing DCGAN to augment data from the RAVDESS and EmoDB databases. Then, we assess the efficacy of emotion recognition using mel-spectrogram data by utilizing a model that combines CNN and BiLSTM. The preliminary experimental results reveal that the suggested technique contributes to enhancing the emotional speech identification performance. The results of this study provide directions for further development in the field of emotional speech recognition and the potential for practical applications.
... The federated learning training on convolutional neural networks performed in this research utilized the MNIST and CIFAR-10 standard datasets. MNIST comprises 60,000 training images of size 28 × 28 and 10,000 test images, with 1 input channel and images categorized into 10 classes [33]. CIFAR-10 includes 50,000 training images of size 32 × 32 and 10,000 test images, with 3 input channels and images categorized into 10 classes [34]. ...
... In this experiment, we use the MNIST handwritten digit dataset [48] and the Fashion-MNIST dataset [49]. The datasets consist of 60,000 training samples and 10,000 test samples. ...
Article
Full-text available
Autoencoders are a type of deep neural network and are widely used for unsupervised learning, particularly in tasks that require feature extraction and dimensionality reduction. While most research focuses on compressing input data, less attention has been given to reducing the size and complexity of the autoencoder model itself, which is crucial for deployment on resource-constrained edge devices. This paper introduces a layer-wise pruning algorithm specifically for multilayer perceptron-based autoencoder. The resulting pruned model is referred to as a Shapley Value-based Sparse AutoEncoder (SV-SAE). Using cooperative game theory, the proposed algorithm models the autoencoder as a coalition of interconnected units and links, where the Shapley value quantifies their individual contributions to overall performance. This enables the selective removal of less important components, achieving an optimal balance between sparsity and accuracy. Experimental results confirm that the SV-SAE reaches an accuracy of 99.25%, utilizing only 10% of the original links. Notably, the SV-SAE remains robust under high sparsity levels with minimal performance degradation whereas other algorithms experience sharp declines as the pruning ratio increases. Designed for edge environments, the SV-SAE offers an interpretable framework for controlling layer-wise sparsity while preserving essential features in latent representations. The results highlight its potential for efficient deployment in resource-constrained scenarios, where model size and inference speed are critical factors.
... To showcase the applicability of the Brenier-GAN to computable real-world problems, we trained Brenier-GANs to generate handwritten digits based on the MNIST dataset [41] as well as T-shirts based on the Fashion-MNIST dataset [42]. The MNIST dataset consists of 60,000 grayscale training images of handwritten digits from 0 to 9. Each digit is centered in a 32 × 32 image. ...
Preprint
Brenier proved that under certain conditions on a source and a target probability measure there exists a strictly convex function such that its gradient is a transport map from the source to the target distribution. This function is called the Brenier potential. Furthermore, detailed information on the Hölder regularity of the Brenier potential is available. In this work we develop the statistical learning theory of generative adversarial neural networks that learn the Brenier potential. As, by the transformation-of-densities formula, the density of the generated measure depends on the second derivative of the Brenier potential, we develop the universal approximation theory of ReCU networks with cubic activation $\mathtt{ReCU}(x)=\max\{0,x\}^3$ that combines the favorable approximation properties of Hölder functions with a Lipschitz continuous density. In order to assure the convexity of such general networks, we introduce an adversarial training procedure for a potential function represented by the ReCU networks that combines the classical discriminator cross-entropy loss with a penalty term that enforces (strict) convexity. We give a detailed decomposition of learning errors and show that for a suitably high penalty parameter all networks chosen in the adversarial min-max optimization problem are strictly convex. This is further exploited to prove the consistency of the learning procedure for (slowly) expanding network capacity. We also implement the described learning algorithm and apply it to a number of standard test cases from Gaussian mixtures to image data as target distributions. As predicted by the theory, we observe that the convexity loss becomes inactive during the training process and the potentials represented by the neural networks have learned convexity.
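The cubic activation named in the abstract has an explicit form, ReCU(x) = max{0, x}^3, so it is easy to state as a drop-in module. Below is a minimal PyTorch sketch of just that activation; the convexity-penalized adversarial training procedure is not reproduced.

```python
# Minimal sketch of the cubic ReCU activation, ReCU(x) = max{0, x}^3, as a PyTorch module.
import torch
import torch.nn as nn

class ReCU(nn.Module):
    def forward(self, x):
        return torch.clamp(x, min=0.0) ** 3      # max{0, x}^3, twice continuously differentiable

x = torch.linspace(-2, 2, 5)
print(ReCU()(x))                                 # tensor([0., 0., 0., 1., 8.])
```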
... A. Experiment Settings 1) Datasets: Following the existing work, experiments were conducted on three commonly used datasets: NICO-Animal [37], NICO-Vehicle [37], and ColorMNIST [38]. Table I provides detailed statistics for these datasets. ...
Preprint
Federated learning aims to build a model collaboratively by integrating multi-source information, obtaining a model that can generalize across all client data. Existing methods often leverage knowledge distillation or data augmentation to mitigate the negative impact of data bias across clients. However, the limited performance of teacher models on out-of-distribution samples and the inherent quality gap between augmented and original data hinder their effectiveness, and they typically fail to leverage the advantages of incorporating rich contextual information. To address these limitations, this paper proposes a Federated Causal Augmentation method, termed FedCAug, which employs causality-inspired data augmentation to break the spurious correlation between attributes and categories. Specifically, it designs a causal region localization module to accurately identify and decouple the background and objects in the image, providing rich contextual information for causal data augmentation. Additionally, it designs a causality-inspired data augmentation module that integrates causal features and within-client context to generate counterfactual samples. This significantly enhances data diversity, and the entire process does not require any information sharing between clients, thereby contributing to the protection of data privacy. Extensive experiments conducted on three datasets reveal that FedCAug markedly reduces the model's reliance on background to predict sample labels, achieving superior performance compared to state-of-the-art methods.
... In this study, we follow a similar experimental setup to Liu et al. (2020) and explore the relationship between generated image quality and estimated density using the MNIST dataset-a widely recognized benchmark comprising 70,000 grayscale images (28×28 pixels) of handwritten digits (LeCun & Cortes, 2010). We trained a DDPM on this dataset and, by selecting the lowest diffusion timestep (t = 1), obtained an estimate of the score function for individual images. ...
Preprint
We propose a novel method for density estimation that leverages an estimated score function to debias kernel density estimation (SD-KDE). In our approach, each data point is adjusted by taking a single step along the score function with a specific choice of step size, followed by standard KDE with a modified bandwidth. The step size and modified bandwidth are chosen to remove the leading order bias in the KDE. Our experiments on synthetic tasks in 1D, 2D and on MNIST, demonstrate that our proposed SD-KDE method significantly reduces the mean integrated squared error compared to the standard Silverman KDE, even with noisy estimates in the score function. These results underscore the potential of integrating score-based corrections into nonparametric density estimation.
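The abstract above describes a two-step recipe: move each sample one step along an estimated score, then apply standard KDE. A minimal 1D sketch of that recipe is shown below; the paper's specific step size and bandwidth modification are not reproduced (the step size here is a placeholder and the bandwidth is plain Silverman), and the analytic standard-normal score stands in for a learned estimate.

```python
# Minimal 1D sketch of score-debiased KDE: one step along the score, then Gaussian KDE.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
x = rng.normal(0, 1, size=2000)

score = lambda t: -t            # d/dt log N(0,1); a learned score model would go here
eps = 0.05                      # placeholder step size, not the paper's derived value
x_shifted = x + eps * score(x)  # one debiasing step along the score

kde = gaussian_kde(x_shifted, bw_method="silverman")
print(kde(np.array([0.0, 1.0])))   # density estimates at two query points
```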
... These are built from data based on a predefined distance metric and are highly suitable for predominantly discrete feature spaces; here, counterfactuals are retrieved through pathfinding. Specifically, we experiment on three tabular data sets (German Credit (Hofmann, 1994), Adult Income (Becker & Kohavi, 1996) and Credit Default (Yeh, 2009)) and three image data sets (MNIST (LeCun, 1998) as well as BreastMNIST and PneumoniaMNIST from the MedMNIST collection (Yang, Shi, and Ni, 2021)) (Sect. 5). ...
Article
Full-text available
Counterfactual explanations are the de facto standard when tasked with interpreting decisions of (opaque) predictive models. Their generation is often subject to technical and domain-specific constraints that aim to maximise their real-life utility. In addition to considering desiderata pertaining to the counterfactual instance itself, guaranteeing existence of a viable path connecting it with the factual data point has recently gained relevance. While current explainability approaches ensure that the steps of such a journey as well as its destination adhere to selected constraints, they neglect the multiplicity of these counterfactual paths. To address this shortcoming we introduce the novel concept of explanatory multiverse that encompasses all the possible counterfactual journeys. We define it using vector spaces, showing how to navigate, reason about and compare the geometry of counterfactual trajectories found within it. To this end, we overview their spatial properties, such as affinity, branching, divergence and possible future convergence, and propose an all-in-one metric, called opportunity potential, to quantify them. Notably, the explanatory process offered by our method grants explainees more agency by allowing them to select counterfactuals not only based on their absolute differences but also according to the properties of their connecting paths. To demonstrate real-life flexibility, benefit and efficacy of explanatory multiverse we propose its graph-based implementation, which we use for qualitative and quantitative evaluation on six tabular and image data sets.
... We evaluate the performance of LUMEN-PRO on two popular multi-task learning datasets. The first is the MNIST family, which consists of four public image classification datasets: MNIST-10 (MNIST) [10], Fashion-MNIST (FMNIST) [11], Kuzushiji-MNIST (KMNIST) [12], and Extension-MNIST-Letters (EMNIST) [13]. For EMNIST, we customize the dataset by selecting the first ten classes "A-J". ...
Article
Full-text available
The democratization of AI encourages multi-task learning (MTL), demanding more parameters and processing time. To achieve highly energy-efficient MTL, Diffractive Optical Neural Networks (DONNs) have garnered attention due to their extremely low energy consumption and high computation speed. However, implementing MTL on DONNs requires manually reconfiguring & replacing layers, and rebuilding & duplicating the physical optical systems. To overcome these challenges, we propose LUMEN-PRO, an automated MTL framework using DONNs. We first propose to automate MTL utilizing an arbitrary backbone DONN and a set of tasks, resulting in a high-accuracy multi-task DONN model with a small memory footprint that surpasses existing MTL. Second, we leverage the rotatability of the physical optical system and replace task-specific layers with rotations of the corresponding shared layers. This replacement eliminates the storage requirement of task-specific layers, further optimizing the memory footprint. LUMEN-PRO provides flexibility in identifying optimal sharing patterns across diverse datasets, facilitating the search for highly energy-efficient DONNs. Experiments show that LUMEN-PRO provides up to 49.58% higher accuracy and 4× better cost efficiency than single-task and existing DONN approaches. It achieves the memory lower bound of MTL, with memory efficiency matching single-task models. Compared to IBM-TrueNorth, LUMEN-PRO achieves an 8.78× energy efficiency gain, while it matches Nanophotonic in efficiency but surpasses it in per-operator efficiency due to its larger system.
... The dataset used for our experiment is the MNIST image dataset [44] which has 10 classes, 60,000 images used for training and 10,000 images for testing. We perform all our experiments in non-iid settings and keep the total number of clients to be 100. ...
Article
Full-text available
Federated Learning (FL) is a machine learning training method that leverages local model gradients instead of accessing private data from individual clients, ensuring privacy. However, the practical implementation of FL faces significant challenges. Heterogeneous clients and edge devices with varying computational abilities and unreliable communication channels introduce latency issues to the algorithm. Furthermore, the algorithm is susceptible to attacks from malicious clients, allowing them to insert unwanted updates while benefiting from the global model. These challenges severely impact the algorithm’s performance, rendering it unsuitable for real-time applications. To address these issues, we propose FedHSP, a comprehensive system that tackles device heterogeneity and protects against various forms of attacks. FedHSP incorporates multiple model complexities to accommodate heterogeneous clients. Additionally, it employs a Variational Auto Encoder with dynamic thresholding to detect and eliminate malicious clients. In this paper, we demonstrate FedHSP’s effectiveness in detecting model poisoning attacks. We also show the mitigation of malicious model updates sent to the server. The evaluation is done using the MNIST dataset under various settings. Our experiment results show the drastic performance deviation due to attacks and the successful detection and mitigation with the proposed system.
... In this section, we numerically evaluate the performance of our method compared with other methods on both synthetic problems and a data hyper-cleaning (DHC) problem on the MNIST [53] dataset. For our methods, we include results for different choices of α and γ from Theorem 4.1 and Theorem 4.4. ...
Preprint
Bilevel optimization is a fundamental tool in hierarchical decision-making and has been widely applied to machine learning tasks such as hyperparameter tuning, meta-learning, and continual learning. While significant progress has been made in bilevel optimization, existing methods predominantly focus on the nonconvex-strongly convex or the nonconvex-PL settings, leaving the more general nonconvex-nonconvex framework underexplored. In this paper, we address this gap by developing an efficient gradient-based method inspired by the recently proposed Relaxed Gradient Flow (RXGF) framework with a continuous-time dynamic. In particular, we introduce a discretized variant of RXGF and formulate convex quadratic program subproblems with closed-form solutions. We provide a rigorous convergence analysis, demonstrating that under the existence of a KKT point and a regularity assumption (a lower-level gradient PL assumption), our method achieves an iteration complexity of $\mathcal{O}(1/\epsilon^{1.5})$ in terms of the squared norm of the KKT residual for the reformulated problem. Moreover, even in the absence of the regularity assumption, we establish an iteration complexity of $\mathcal{O}(1/\epsilon^{3})$ for the same metric. Through extensive numerical experiments on convex and nonconvex synthetic benchmarks and a hyper-data cleaning task, we illustrate the efficiency and scalability of our approach.
... As the first experiment, we consider an MLP with one hidden layer of width 300 trained for MNIST classification [34]. We train the MLP for 200 epochs using stochastic gradient descent with an initial learning rate η = 0.001 and a momentum of 0.9. ...
Preprint
As state-of-the-art neural networks (NNs) continue to grow in size, their resource-efficient implementation becomes ever more important. In this paper, we introduce a compression scheme that reduces the number of computations required for NN inference on reconfigurable hardware such as FPGAs. This is achieved by combining pruning via regularized training, weight sharing and linear computation coding (LCC). Contrary to common NN compression techniques, where the objective is to reduce the memory used for storing the weights of the NNs, our approach is optimized to reduce the number of additions required for inference in a hardware-friendly manner. The proposed scheme achieves competitive performance for simple multilayer perceptrons, as well as for large-scale deep NNs such as ResNet-34.
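The excerpt above fully specifies the baseline experiment: a one-hidden-layer MLP of width 300 trained on MNIST with SGD (lr = 0.001, momentum = 0.9). A minimal PyTorch sketch of that setup follows; the compression pipeline itself (pruning, weight sharing, linear computation coding) is not reproduced, and only one of the 200 epochs is shown.

```python
# Minimal sketch of the quoted setup: width-300 MLP on MNIST, SGD with lr=0.001, momentum=0.9.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

train_set = datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=128, shuffle=True)

model = nn.Sequential(nn.Flatten(), nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(1):                 # the excerpt trains for 200 epochs; one epoch shown here
    for x, y in loader:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
```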
... We evaluate GRANITE on image and tabular datasets, namely, MNIST [53] and Purchase100 [54]. For the former, we train a CNN comprising two convolutional layers followed by two fully connected layers, similarly to prior work [50]. ...
Preprint
Full-text available
Gossip Learning (GL) is a decentralized learning paradigm where users iteratively exchange and aggregate models with a small set of neighboring peers. Recent GL approaches rely on dynamic communication graphs built and maintained using Random Peer Sampling (RPS) protocols. Thanks to graph dynamics, GL can achieve fast convergence even over extremely sparse topologies. However, the robustness of GL over dynamic graphs to Byzantine (model poisoning) attacks remains unaddressed, especially when Byzantine nodes attack the RPS protocol to scale up model poisoning. We address this issue by introducing GRANITE, a framework for robust learning over sparse, dynamic graphs in the presence of a fraction of Byzantine nodes. GRANITE relies on two key components: (i) a History-aware Byzantine-resilient Peer Sampling protocol (HaPS), which tracks previously encountered identifiers to reduce adversarial influence over time, and (ii) an Adaptive Probabilistic Threshold (APT), which leverages an estimate of Byzantine presence to set aggregation thresholds with formal guarantees. Empirical results confirm that GRANITE maintains convergence with up to 30% Byzantine nodes, improves learning speed via adaptive filtering of poisoned models, and obtains these results in graphs up to 9 times sparser than dictated by current theory.
... We choose + = − = /2, and = 3. For small grayscale images, such as the 28 × 28 size images used in the MNIST dataset [25], our scheme can compute the -polynomial of an image x in about 0.26 seconds. This is a one-off cost for the client, and hence not prohibitive. ...
Preprint
Full-text available
Perceptual hashing is used to detect whether an input image is similar to a reference image, with a variety of security applications. Recently, such schemes have been shown to succumb to adversarial input attacks which make small imperceptible changes to the input image, yet the hashing algorithm does not detect its similarity to the original image. Property-preserving hashing (PPH) is a recent construct in cryptography, which preserves some property (predicate) of its inputs in the hash domain. Researchers have so far shown constructions of PPH for Hamming distance predicates, which, for instance, output 1 if two inputs are within Hamming distance t. A key feature of PPH is its strong correctness guarantee, i.e., the probability that the predicate will not be correctly evaluated in the hash domain is negligible. Motivated by the use case of detecting similar images in an adversarial setting, we propose the first PPH construction for an $\ell_1$-distance predicate. Roughly, this predicate checks if the two one-sided $\ell_1$-distances between two images are within a threshold t. Since many adversarial attacks use the $\ell_2$-distance (related to the $\ell_1$-distance) as the objective function to perturb the input image, by appropriately choosing the threshold t we can force the attacker to add considerable noise to evade detection, and hence significantly deteriorate the image quality. Our proposed scheme is highly efficient, and runs in time $O(t^2)$. For grayscale images of size $28 \times 28$, we can evaluate the predicate in 0.0784 seconds when pixel values are perturbed by up to 1%. For larger RGB images of size $224 \times 224$, by dividing the image into 1,000 blocks, we achieve times of 0.0128 seconds per block for a 1% change, and up to 0.2641 seconds per block for a 14% change.
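The predicate itself is easy to state in the clear: compute the two one-sided l1-distances between images and compare each against a threshold t. The sketch below does exactly that on a small perturbation example; the cryptographic hashing that evaluates the predicate in the hash domain is not reproduced, and the 1% threshold is an illustrative assumption.

```python
# Minimal plaintext sketch of the one-sided l1-distance predicate (no hashing involved).
import numpy as np

def one_sided_l1(x, y):
    d = x.astype(int) - y.astype(int)
    return d.clip(min=0).sum(), (-d).clip(min=0).sum()   # (positive diffs, negative diffs)

rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=(28, 28))
y = np.clip(x + rng.integers(-2, 3, size=x.shape), 0, 255)   # small perturbation of x

t = int(0.01 * 255 * x.size)          # illustrative threshold: a 1% total-intensity budget
d_plus, d_minus = one_sided_l1(x, y)
print(d_plus <= t and d_minus <= t)   # predicate output in the clear
```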
... We conducted comprehensive experiments comparing various datasets, including MNIST [31], Fashion MNIST [32], SVHN [33], CIFAR-10 [34], CIFAR-100 [34], and Tiny ImageNet [35] for image classification. In these experiments, we trained the datasets on different standard models and reported the Top-1 accuracy. ...
Article
Full-text available
In deep learning, the choice of activation function plays a vital role in enhancing model performance. We propose AHerfReLU, a novel activation function that combines the rectified linear unit (ReLU) function with the error function (erf), complemented by a regularization term 1/(1 + x²), ensuring smooth gradients even for negative inputs. The function is zero-centered, bounded below, and nonmonotonic, offering significant advantages over traditional activation functions like ReLU. We compare AHerfReLU with 10 adaptive activation functions and state-of-the-art activation functions, including ReLU, Swish, and Mish. Experimental results show that replacing ReLU with AHerfReLU leads to a 3.18% improvement in Top-1 accuracy on the LeNet network for the CIFAR-100 dataset, a 0.63% improvement on CIFAR-10, and a 1.3% improvement in mean average precision (mAP) on the SSD300 model on the Pascal VOC dataset. Our results demonstrate that AHerfReLU enhances model performance, offering improved accuracy, loss reduction, and convergence stability. The function outperforms existing activation functions, providing a promising alternative for deep learning tasks.
... To mitigate that, the 'healthy' astrocyte tripartite synapses can be boosted with V_BG to recover similar output levels and restore the accuracy. To test the self-repair capabilities on real-world data, we train and test the network on the MNIST handwritten digit recognition dataset [50]. Details regarding the network and simulation are in the Supplementary section. ...
Preprint
Full-text available
Neuromorphic systems seek to replicate the functionalities of biological neural networks to attain significant improvements in performance and efficiency of AI computing platforms. However, these systems have generally remained limited to emulation of simple neurons and synapses; and ignored higher order functionalities enabled by other components of the brain like astrocytes and dendrites. In this work, drawing inspiration from biology, we introduce a compact Double-Gate Ferroelectric Field Effect Transistor (DG-FeFET) cell that can emulate the dynamics of both astrocytes and dendrites within neuromorphic architectures. We demonstrate that with a ferroelectric top gate for synaptic weight programming as in conventional synapses and a non-ferroelectric back gate, the DG-FeFET realizes a synapse with a dynamic gain modulation mechanism. This can be leveraged as an analog for a compact astrocyte-tripartite synapse, as well as enabling dendrite-like gain modulation operations. By employing a fully-depleted silicon-on-insulator (FDSOI) FeFET as our double-gate device, we validate the linear control of the synaptic weight via the back gate terminal (i.e., the gate underneath the buried oxide (BOX) layer) through comprehensive theoretical and experimental studies. We showcase the promise such a tripartite synaptic device holds for numerous important neuromorphic applications, including autonomous self-repair of faulty neuromorphic hardware mediated by astrocytic functionality. Coordinate transformations based on dragonfly prey-interception circuitry models are also demonstrated based on dendritic function emulation by the device. This work paves the way forward for developing truly "brain-like" neuromorphic hardware that goes beyond the current dogma focusing only on neurons and synapses.
... All experiments are implemented in PyTorch [19] and run using fixed random seeds. We use the standard train/test splits for MNIST [17] (60,000/10,000) and CIFAR-10 [16] (50,000/10,000), and validate our theory on the following setups: ...
Preprint
Full-text available
We propose a combinatorial and graph-theoretic theory of dropout by modeling training as a random walk over a high-dimensional graph of binary subnetworks. Each node represents a masked version of the network, and dropout induces stochastic traversal across this space. We define a subnetwork contribution score that quantifies generalization and show that it varies smoothly over the graph. Using tools from spectral graph theory, PAC-Bayes analysis, and combinatorics, we prove that generalizing subnetworks form large, connected, low-resistance clusters, and that their number grows exponentially with network width. This reveals dropout as a mechanism for sampling from a robust, structured ensemble of well-generalizing subnetworks with built-in redundancy. Extensive experiments validate every theoretical claim across diverse architectures. Together, our results offer a unified foundation for understanding dropout and suggest new directions for mask-guided regularization and subnetwork optimization.
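The abstract models training with dropout as a random walk over binary subnetworks. The sketch below makes that object concrete: it samples dropout masks for the hidden layer of a small MLP and scores each induced subnetwork; the architecture, dropout rate, and the negative-loss proxy score are illustrative assumptions rather than the paper's contribution score.

```python
# Minimal sketch: dropout masks as nodes of a subnetwork graph, with a proxy contribution score.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
x, y = torch.randn(256, 20), torch.randint(0, 2, (256,))

def sample_mask(width=64, p_drop=0.5):
    """One node of the subnetwork graph: a binary mask over the hidden units."""
    return (torch.rand(width) > p_drop).float()

def subnetwork_score(mask):
    """Illustrative contribution proxy: negative loss of the masked model on held-out data."""
    h = torch.relu(model[0](x)) * mask          # apply the dropout mask to the hidden layer
    return -nn.functional.cross_entropy(model[2](h), y).item()

for _ in range(3):                              # a few steps of the random walk over masks
    m = sample_mask()
    print(int(m.sum()), "active units, score", round(subnetwork_score(m), 4))
```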
Preprint
Full-text available
In class-incremental learning (CIL), effective incremental learning strategies are essential to mitigate task confusion and catastrophic forgetting, especially as the number of tasks t increases. Current exemplar replay strategies impose $\mathcal{O}(t)$ memory/compute complexities. We propose an autoencoder-based hybrid replay (AHR) strategy that leverages our new hybrid autoencoder (HAE) to function as a compressor to alleviate the requirement for large memory, achieving $\mathcal{O}(0.1t)$ in the worst case with a computing complexity of $\mathcal{O}(t)$ while accomplishing state-of-the-art performance. The decoder later recovers the exemplar data stored in the latent space, rather than in raw format. Additionally, HAE is designed for both discriminative and generative modeling, enabling classification and replay capabilities, respectively. HAE adopts the charged particle system energy minimization equations and repulsive force algorithm for the incremental embedding and distribution of new class centroids in its latent space. Our results demonstrate that AHR consistently outperforms recent baselines across multiple benchmarks while operating with the same memory/compute budgets. The source code is included in the supplementary material and will be open-sourced upon publication.
Preprint
The integration of AI into daily life has generated considerable attention and excitement, while also raising concerns about automating algorithmic harms and re-entrenching existing social inequities. While the responsible deployment of trustworthy AI systems is a worthy goal, there are many possible ways to realize it, from policy and regulation to improved algorithm design and evaluation. In fact, since AI trains on social data, there is even a possibility for everyday users, citizens, or workers to directly steer its behavior through Algorithmic Collective Action, by deliberately modifying the data they share with a platform to drive its learning process in their favor. This paper considers how these grassroots efforts to influence AI interact with methods already used by AI firms and governments to improve model trustworthiness. In particular, we focus on the setting where the AI firm deploys a differentially private model, motivated by the growing regulatory focus on privacy and data protection. We investigate how the use of Differentially Private Stochastic Gradient Descent (DPSGD) affects the collective's ability to influence the learning process. Our findings show that while differential privacy contributes to the protection of individual data, it introduces challenges for effective algorithmic collective action. We characterize lower bounds on the success of algorithmic collective action under differential privacy as a function of the collective's size and the firm's privacy parameters, and verify these trends experimentally by simulating collective action during the training of deep neural network classifiers across several datasets.
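The setting above hinges on the mechanics of DP-SGD: clip each per-example gradient to a fixed norm, then add calibrated Gaussian noise before the parameter update. The sketch below shows one such step for a linear least-squares model; the clip norm, noise multiplier, and learning rate are illustrative assumptions.

```python
# Minimal sketch of one DP-SGD step: per-example gradient clipping plus Gaussian noise.
import numpy as np

rng = np.random.default_rng(0)
n, d = 64, 10
X, y = rng.normal(size=(n, d)), rng.normal(size=n)
w = np.zeros(d)

C, noise_multiplier, lr = 1.0, 1.1, 0.1          # illustrative privacy/optimization parameters

per_example_grads = 2 * (X @ w - y)[:, None] * X                 # gradient of (x.w - y)^2 per example
norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
clipped = per_example_grads / np.maximum(1.0, norms / C)          # clip each gradient to norm <= C
noisy_mean = (clipped.sum(axis=0) +
              rng.normal(0, noise_multiplier * C, size=d)) / n    # add calibrated Gaussian noise
w -= lr * noisy_mean
print(w)
```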
Chapter
In this paper, we experimentally analyze the robustness of selected Federated Learning (FL) systems in the presence of adversarial clients. We find that temporal attacks significantly affect model performance in the FL models tested, especially when the adversaries are active throughout or during the later rounds. We consider a variety of classic learning models, including Multinomial Logistic Regression (MLR), Random Forest, XGBoost, Support Vector Classifier (SVC), as well as various Neural Network models including the Multilayer Perceptron (MLP), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and Long Short-Term Memory (LSTM). Our results highlight the effectiveness of temporal attacks and the need to develop strategies to make the FL process more robust against such attacks. We also briefly consider the effectiveness of defense mechanisms, including outlier detection in the aggregation algorithm.
Chapter
In this paper, we empirically analyze adversarial attacks on selected Federated Learning (FL) models. The specific models considered are FL versions of Multinomial Logistic Regression (MLR), Support Vector Classifier (SVC), Multilayer Perceptron (MLP), Convolutional Neural Network (CNN), Random Forest, XGBoost, and Long Short-Term Memory (LSTM). For each model, we simulate label-flipping attacks, experimenting extensively with 10 federated clients and 100 federated clients. We vary the percentage of adversarial clients from 10 to 100% and, simultaneously, the percentage of labels flipped by each adversarial client is also varied from 10 to 100%. Among other results, we find that models differ in their inherent robustness to the two vectors in our label-flipping attack, i.e., the percentage of adversarial clients and the percentage of labels flipped by each adversarial client. We discuss the potential practical implications of our results.
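To make the two attack vectors concrete, the sketch below partitions a synthetic dataset across 10 clients, lets a chosen fraction of them flip a chosen fraction of their labels, and compares accuracy before and after. Pooled (centralized) training with logistic regression stands in for the federated aggregation loop, and the client count, fractions, and random relabeling rule are illustrative assumptions.

```python
# Minimal sketch of a label-flipping attack parameterized by adversarial-client and flip fractions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=20, n_classes=2, random_state=0)
clients = np.array_split(np.arange(len(y)), 10)            # 10 clients, equal shards

adv_fraction, flip_fraction = 0.3, 0.5                      # the two attack vectors
adversarial = rng.choice(10, size=int(adv_fraction * 10), replace=False)

y_attacked = y.copy()
for c in adversarial:
    idx = rng.choice(clients[c], size=int(flip_fraction * len(clients[c])), replace=False)
    y_attacked[idx] = 1 - y_attacked[idx]                   # flip the chosen labels

clean = LogisticRegression(max_iter=500).fit(X, y).score(X, y)
poisoned = LogisticRegression(max_iter=500).fit(X, y_attacked).score(X, y)
print(f"accuracy clean={clean:.3f} poisoned={poisoned:.3f}")
```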
Preprint
Partial Bayesian neural networks (pBNNs) have been shown to perform competitively with fully Bayesian neural networks while only having a subset of the parameters be stochastic. Using sequential Monte Carlo (SMC) samplers as the inference method for pBNNs gives a non-parametric probabilistic estimation of the stochastic parameters, and has shown improved performance over parametric methods. In this paper we introduce a new SMC-based training method for pBNNs by utilising a guided proposal and incorporating gradient-based Markov kernels, which gives us better scalability on high dimensional problems. We show that our new method outperforms the state-of-the-art in terms of predictive performance and optimal loss. We also show that pBNNs scale well with larger batch sizes, resulting in significantly reduced training times and often better performance.
Preprint
Full-text available
Generative diffusion models have achieved remarkable success in producing high-quality images. However, because these models typically operate in continuous intensity spaces, diffusing independently per pixel and color channel, they are fundamentally ill-suited for applications where quantities such as particle counts or material units are inherently discrete and governed by strict conservation laws such as mass preservation, limiting their applicability in scientific workflows. To address this limitation, we propose Discrete Spatial Diffusion (DSD), a framework based on a continuous-time, discrete-state jump stochastic process that operates directly in discrete spatial domains while strictly preserving mass in both forward and reverse diffusion processes. By using spatial diffusion to achieve mass preservation, we introduce stochasticity naturally through a discrete formulation. We demonstrate the expressive flexibility of DSD by performing image synthesis, class conditioning, and image inpainting across widely-used image benchmarks, with the ability to condition on image intensity. Additionally, we highlight its applicability to domain-specific scientific data for materials microstructure, bridging the gap between diffusion models and mass-conditioned scientific applications.
Preprint
Full-text available
With the advent of score-matching techniques for model training and Langevin dynamics for sample generation, energy-based models (EBMs) have gained renewed interest as generative models. Recent EBMs usually use neural networks to define their energy functions. In this work, we introduce a novel hybrid approach that combines an EBM with an exponential family model to incorporate inductive bias into data modeling. Specifically, we augment the energy term with a parameter-free statistic function to help the model capture key data statistics. Like an exponential family model, the hybrid model aims to align the distribution statistics with data statistics during model training, even when it only approximately maximizes the data likelihood. This property enables us to impose constraints on the hybrid model. Our empirical study validates the hybrid model's ability to match statistics. Furthermore, experimental results show that data fitting and generation improve when suitable informative statistics are incorporated into the hybrid model.
Article
Facial expression recognition (FER) using artificial intelligence remains a challenging task due to data limitations and intra-class variations. This study proposes a novel deep learning framework that synergizes convolutional neural networks (CNNs) for classification with generative adversarial networks (GANs) for data augmentation, effectively addressing data scarcity and enhancing model generalization. The integration of GANs enables the generation of high-fidelity synthetic images, mitigating overfitting and improving classification robustness. Experimental validation on the FER2013 dataset demonstrates the efficacy of the proposed approach, achieving superior accuracy compared to conventional methods. This work contributes to advancing FER by enhancing recognition performance and establishing a scalable solution applicable to diverse facial expression datasets.
Preprint
Full-text available
Recently, neural networks have gained attention for creating parametric and invertible multidimensional data projections. Parametric projections allow for embedding previously unseen data without recomputing the projection as a whole, while invertible projections enable the generation of new data points. However, these properties have never been explored simultaneously for arbitrary projection methods. We evaluate three autoencoder (AE) architectures for creating parametric and invertible projections. Based on a given projection, we train AEs to learn a mapping into 2D space and an inverse mapping into the original space. We perform a quantitative and qualitative comparison on four datasets of varying dimensionality and pattern complexity using t-SNE. Our results indicate that AEs with a customized loss function can create smoother parametric and inverse projections than feed-forward neural networks while giving users control over the strength of the smoothing effect.
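The sketch below illustrates one way to obtain a parametric and (approximately) invertible projection of the kind described: an autoencoder with a 2D bottleneck trained both to reconstruct the data and to match a precomputed 2D embedding such as t-SNE coordinates. The data, the reference projection, and the loss weighting are illustrative stand-ins, not the specific AE architectures or custom loss compared in the abstract.

```python
# Minimal sketch of a parametric, invertible projection: a 2D-bottleneck autoencoder whose
# latent space is pulled toward a precomputed reference projection (e.g. t-SNE coordinates).
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(512, 50)                 # stand-in data
P = torch.randn(512, 2)                  # stand-in for precomputed t-SNE coordinates of X

enc = nn.Sequential(nn.Linear(50, 32), nn.ReLU(), nn.Linear(32, 2))
dec = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 50))
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

for step in range(200):
    z = enc(X)                                          # parametric projection of (new) data
    loss = nn.functional.mse_loss(dec(z), X) \
         + 1.0 * nn.functional.mse_loss(z, P)           # stay close to the reference projection
    opt.zero_grad()
    loss.backward()
    opt.step()

print("final loss:", float(loss))                        # dec(enc(.)) gives the inverse mapping
```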