Figure - available from: Scientific Reports
Number of parameters, i.e., weights, in recent landmark neural networks 1,2,31–43 (references dated by first release, e.g., on arXiv). The number of multiplications (not always reported) is not equivalent to the number of parameters, but larger models tend to require more compute power, notably in fully-connected layers. The two outlying nodes (pink) are AlexNet and VGG16, now considered over-parameterized. Efforts have since been made to reduce DNN sizes, yet model sizes continue to grow exponentially as networks are used to solve increasingly complex problems with higher accuracy.
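To make the parameter-versus-multiplication distinction concrete: in a fully-connected layer the two counts essentially coincide (each weight is used once per forward pass), whereas convolutional layers reuse weights across spatial positions, so their multiplication count exceeds their parameter count. A minimal sketch using illustrative helper functions and the standard 25088-to-4096 first fully-connected layer of VGG16:

```python
# Rough sketch: for a fully-connected layer, the parameter count and the
# per-input multiplication count coincide (ignoring biases), which is why
# large fully-connected layers dominate both memory and compute.

def fc_params(n_in, n_out, bias=True):
    return n_in * n_out + (n_out if bias else 0)

def fc_mults_per_input(n_in, n_out):
    return n_in * n_out  # one multiplication per weight per forward pass

# Example: the first fully-connected layer of VGG16 (25088 -> 4096).
print(fc_params(25088, 4096))           # 102,764,544 parameters
print(fc_mults_per_input(25088, 4096))  # 102,760,448 multiplications
```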
Source publication
As deep neural network (DNN) models grow ever-larger, they can achieve higher accuracy and solve more complex problems. This trend has been enabled by an increase in available compute power; however, efforts to continue to scale electronic processors are impeded by the costs of communication, thermal management, power delivery and clocking. To impr...
Citations
... The computational complexity of processing raw videos increases further with the adoption of larger models, a trend that has continued to grow over the years (Tan & Le, 2019; Bernstein et al., 2021). When training data is limited, using larger input dimensionality and larger models also increases the risk of overfitting (Defernez & Kemsley, 1999). ...
We introduce a novel method for movie genre classification, capitalizing on a diverse set of readily accessible pretrained models. These models extract high-level features related to visual scenery, objects, characters, text, speech, music, and audio effects. To intelligently fuse these pretrained features, we train small classifier models with low time and memory requirements. Employing the transformer model, our approach utilizes all video and audio frames of movie trailers without performing any temporal pooling, efficiently exploiting the correspondence between all elements, as opposed to the fixed and low number of frames typically used by traditional methods. Our approach fuses features originating from different tasks and modalities, with different dimensionalities, different temporal lengths, and complex dependencies as opposed to current approaches. Our method outperforms state-of-the-art movie genre classification models in terms of precision, recall, and mean average precision (mAP). To foster future research, we make the pretrained features for the entire MovieNet dataset, along with our genre classification code and the trained models, publicly available.
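As an illustrative sketch of the general fusion pattern described here (projecting heterogeneous pretrained features to a shared width and letting a transformer attend over all frames jointly, without temporal pooling), and not the authors' actual architecture; all class and parameter names, dimensions, and the genre count below are invented for the example:

```python
import torch
import torch.nn as nn

class FeatureFusionClassifier(nn.Module):
    """Toy fusion classifier: one projection per pretrained feature source,
    a small transformer over all frames, and a multi-label genre head."""

    def __init__(self, feature_dims, d_model=256, num_genres=21):
        super().__init__()
        # One linear projection per feature source (e.g., visual, audio).
        self.projections = nn.ModuleList(nn.Linear(d, d_model) for d in feature_dims)
        encoder_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))  # pooling token
        self.head = nn.Linear(d_model, num_genres)

    def forward(self, feature_seqs):
        # feature_seqs: list of tensors [batch, time_i, dim_i], one per source,
        # with different temporal lengths and dimensionalities.
        tokens = [proj(seq) for proj, seq in zip(self.projections, feature_seqs)]
        x = torch.cat([self.cls.expand(tokens[0].size(0), -1, -1)] + tokens, dim=1)
        x = self.encoder(x)          # attention over all frames, no pooling
        return self.head(x[:, 0])    # multi-label genre logits

# Example: two sources with different dims and temporal lengths.
model = FeatureFusionClassifier(feature_dims=[512, 128])
visual = torch.randn(4, 300, 512)    # 300 video-frame features
audio = torch.randn(4, 800, 128)     # 800 audio-frame features
print(model([visual, audio]).shape)  # torch.Size([4, 21])
```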
... Advances in machine learning, particularly deep learning, have enabled applications in various real-world scenarios [1]. Alongside these performance improvements, the increasing model complexity and exploding number of parameters [2] prevent human users from comprehending the decisions made by these data-driven models, since the decision rules are learned implicitly from the data. The absence of reasoning behind model decisions continues to raise concerns about the transparency of AI-driven systems [3]. ...
Recent literature highlights the critical role of neighborhood construction in deriving model-agnostic explanations, with a growing trend toward deploying generative models to improve synthetic instance quality, especially for explaining text classifiers. These approaches overcome the challenges in neighborhood construction posed by the unstructured nature of texts, thereby improving the quality of explanations. However, the deployed generators are usually implemented via neural networks and lack inherent explainability, sparking arguments over the transparency of the explanation process itself. To address this limitation while preserving neighborhood quality, this paper introduces a probability-based editing method as an alternative to black-box text generators. This approach generates neighboring texts by implementing manipulations based on in-text contexts. Substituting the generator-based construction process with recursive probability-based editing, the resultant explanation method, XPROB (explainer with probability-based editing), exhibits competitive performance according to the evaluation conducted on two real-world datasets. Additionally, XPROB's fully transparent and more controllable construction process leads to superior stability compared to the generator-based explainers.
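As a toy illustration of the general idea of probability-based, context-conditioned editing for neighborhood construction (not XPROB's actual recursive procedure; the corpus, function names, and edit probability below are invented for the example):

```python
import random
from collections import Counter, defaultdict

# Toy corpus used only to estimate in-context word probabilities.
corpus = [
    "the movie was great",
    "the movie was terrible",
    "the film was great",
    "the plot was boring",
]

# Estimate P(word | previous word) from bigram counts.
bigram_counts = defaultdict(Counter)
for sentence in corpus:
    tokens = sentence.split()
    for prev, cur in zip(tokens, tokens[1:]):
        bigram_counts[prev][cur] += 1

def edit_neighbor(text, edit_prob=0.3):
    """Create a neighboring text by probabilistically replacing words with
    alternatives that are likely given the preceding in-text context."""
    tokens = text.split()
    out = [tokens[0]]
    for prev, cur in zip(tokens, tokens[1:]):
        candidates = bigram_counts.get(prev)
        if candidates and random.random() < edit_prob:
            words, counts = zip(*candidates.items())
            out.append(random.choices(words, weights=counts)[0])
        else:
            out.append(cur)
    return " ".join(out)

print(edit_neighbor("the movie was great"))
```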
... - Performance Bottleneck Identification: Analyze architectural designs to find and eliminate performance bottlenecks [288,289]. - Scalability Optimization: Ensure that the chip architecture scales well with increasing system complexity (e.g., more cores or memory) [290,291]. ...
Large Language Models (LLMs) are emerging as promising tools in hardware design and verification, with recent advancements suggesting they could fundamentally reshape conventional practices. In this survey, we analyze over 54 research papers to assess the current role of LLMs in enhancing automation, optimization, and innovation within hardware design and verification workflows. Our review highlights LLM applications across synthesis, simulation, and formal verification, emphasizing their potential to streamline development processes while upholding high standards of accuracy and performance. We identify critical challenges, such as scalability, model interpretability, and the alignment of LLMs with domain-specific languages and methodologies. Furthermore, we discuss open issues, including the necessity for tailored model fine-tuning, integration with existing Electronic Design Automation (EDA) tools, and effective handling of complex data structures typical of hardware projects. This survey not only consolidates existing knowledge but also outlines prospective research directions, underscoring the transformative role LLMs could play in the future of hardware design and verification.
... Photonics is widely known for its high bandwidth and multiplexing ability in multiple domains [21][22][23][24]. Matrix-vector multiplication accelerators using both on-chip 25,26 and free-space 27,28 optics have already revealed great potential for photonic processing 29,30 in general. In particular, demonstrations of integrated photonic NNs have shown superior performance in both processing speed 31 and energy efficiency 32. ...
Physical neural networks (PNNs) are emerging paradigms for neural network acceleration due to their high-bandwidth, in-propagation analogue processing. Despite the advantages of PNNs for inference, training remains a challenge. Imperfect knowledge of the physical transformation causes conventional gradient-based updates from backpropagation (BP) to fail. Here, we present the asymmetrical training (AT) method, which treats the PNN structure as a grey box. AT performs training while knowing only the last-layer output and the neuron topological connectivity of a deep neural network structure, without requiring information about the physical control-transformation mapping. We experimentally demonstrated the AT method on deep grey-box PNNs implemented by uncalibrated photonic integrated circuits (PICs), improving the classification accuracy of Iris flowers and modified MNIST hand-written digits from random guessing to near the theoretical maximum. We also showcased the consistently enhanced performance of AT over BP for different datasets, including MNIST, fashion-MNIST, and Kuzushiji-MNIST. The AT method demonstrated successful training with minimal hardware overhead and reduced computational overhead, serving as a robust lightweight training alternative to fully explore the advantages of physical computation.
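The broader principle of training a physical layer using only its observed output, with no model of the internal control-to-transformation mapping, can be illustrated with a generic perturbation-based (SPSA-style) update loop. This sketch is not the paper's AT algorithm; the toy "physical layer" and all constants are stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for an uncalibrated physical layer: we can set control parameters
# and observe the output, but the internal control-to-transformation mapping
# (here a hidden random matrix) is unknown to the training loop.
hidden = rng.normal(size=(4, 4))
def physical_layer(params, x):
    return np.tanh(hidden @ (params * x))

x = rng.normal(size=4)
target = np.array([0.2, -0.1, 0.4, 0.0])
params = np.zeros(4)

def loss(p):
    return np.mean((physical_layer(p, x) - target) ** 2)

# SPSA: estimate a descent direction from two output-only evaluations.
a, c = 0.1, 0.05
for step in range(1000):
    delta = rng.choice([-1.0, 1.0], size=params.shape)
    g_hat = (loss(params + c * delta) - loss(params - c * delta)) / (2 * c) * delta
    params -= a * g_hat

print(round(loss(params), 4))  # should be much smaller than loss(np.zeros(4))
```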
... Deep learning has seen a significant rise to prominence in recent years due to various technological advancements (Dave et al. 2022;Bernstein et al. 2021;Lucas et al. 2021). Systems powered by this technology have displaced conventional approaches in many domains. ...
Deep learning intellectual properties (IPs) are high-value assets that are frequently susceptible to theft. This vulnerability has led to significant interest in defending the field's intellectual properties from theft. Recently, watermarking techniques have been extended to protect deep learning hardware from piracy. These techniques embed modifications that change the hardware's behavior when activated. In this work, we propose the first method for embedding watermarks in deep learning hardware that incorporates the owner's key samples into the embedding methodology. This improves our watermarks' reliability and efficiency in identifying the hardware over watermarks generated using randomly selected key samples. Our experimental results demonstrate that by considering the target key samples when generating the hardware modifications, we can significantly increase the embedding success rate while targeting fewer functional blocks, decreasing the hardware overhead needed to defend it.
... This has led to a "bigger is better" mentality. 6 However, the disadvantage of this mentality is the energy required to train and use very large networks. For instance, training the language model GPT-3, which has 175 billion parameters, alone consumed 1.3 GWh of electricity, the energy required to fully charge 13,000 Tesla Model S cars. ...
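A quick back-of-the-envelope check of that comparison (assuming a roughly 100 kWh Model S battery pack):

$$ \frac{1.3\ \text{GWh}}{13{,}000\ \text{cars}} = \frac{1.3\times 10^{6}\ \text{kWh}}{1.3\times 10^{4}\ \text{cars}} = 100\ \text{kWh per car}. $$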
... Implementations of weighted addition for Optical Neural Networks (ONNs) include Mach-Zehnder-Interferometer-based Optical Interference Units [18], time-multiplexed and coherent detection [19], free-space systems using spatial light modulators [20], and Micro-Ring-Resonator-based weighting banks on silicon [21]. Furthermore, Indium-phosphide-integrated optical cross-connects using Semiconductor Optical Amplifiers as single-stage weight elements, as well as Semiconductor Optical Amplifier-based wavelength converters [22,23,24], have been demonstrated to enable All-Optical (AO) Neural Networks (NNs). ...
All analog signal processing is fundamentally subject to noise, and this is also the case in modern implementations of Optical Neural Networks (ONNs). To mitigate noise in ONNs, we therefore propose two designs that are constructed from a given, possibly trained, Neural Network (NN) that one wishes to implement. Both designs ensure that the resulting ONNs give outputs close to those of the desired NN. To establish the latter, we analyze the designs mathematically. Specifically, we investigate a probabilistic framework for the first design that establishes that the design is correct, i.e., for any feed-forward NN with Lipschitz continuous activation functions, an ONN can be constructed that produces output arbitrarily close to the original. ONNs constructed with the first design thus also inherit the universal approximation property of NNs. For the second design, we restrict the analysis to NNs with linear activation functions and characterize the ONNs' output distribution using exact formulas. Finally, we report on numerical experiments with LeNet ONNs that give insight into the number of components required in these designs for certain accuracy gains. We specifically study the effect of noise as a function of the depth of an ONN. The results indicate that, in practice, adding just a few components in the manner of the first or second design can already be expected to increase the accuracy of ONNs considerably.
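As rough intuition for why a few added components can buy accuracy (this is not the paper's specific designs), averaging R independent noisy analogue evaluations of the same layer reduces the read-out noise standard deviation by a factor of sqrt(R). A minimal simulation under an assumed additive-Gaussian-noise model:

```python
import numpy as np

rng = np.random.default_rng(1)

# Ideal (noise-free) linear layer output and additive Gaussian analogue noise.
W = rng.normal(size=(10, 64))
x = rng.normal(size=64)
ideal = W @ x
sigma = 0.5

def noisy_eval():
    return ideal + rng.normal(scale=sigma, size=ideal.shape)

for replicas in (1, 4, 16):
    # Average several independent noisy evaluations of the same layer.
    avg = np.mean([noisy_eval() for _ in range(replicas)], axis=0)
    err = np.sqrt(np.mean((avg - ideal) ** 2))
    print(f"replicas={replicas:2d}  rms error={err:.3f}")  # ~ sigma / sqrt(replicas)
```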
... In weight-stationary digital electronics [e.g., systolic arrays like Google's TPU (7)], on the other hand, where inputs are message-passed across the weight matrix due to wiring constraints, the latency is at least N + K clock cycles for an (N × K)-sized MVM. Similarly, in output-stationary architectures (25,55), latency scales with K, as the inputs are streamed in over time. If K = N = 1000 (see the Supplementary Materials), our proposed near-term optical processor then outperforms these architectures by two orders of magnitude. ...
Analog optical and electronic hardware has emerged as a promising alternative to digital electronics for improving the efficiency of deep neural networks (DNNs). However, previous work has been limited in scalability (input vector length K ≈ 100 elements) or has required nonstandard DNN models and retraining, hindering widespread adoption. Here, we present an analog, CMOS-compatible DNN processor that uses free-space optics to reconfigurably distribute an input vector and optoelectronics for static, updatable weighting and the nonlinearity, with K ≈ 1000 and beyond. We demonstrate single-shot-per-layer classification of the MNIST, Fashion-MNIST, and QuickDraw datasets with standard fully connected DNNs, achieving accuracies of 95.6%, 83.3%, and 79.0%, respectively, without preprocessing or retraining. We also experimentally determine the fundamental upper bound on throughput (∼0.9 exaMAC/s), set by the maximum optical bandwidth before a substantial increase in error. Our combination of wide spectral and spatial bandwidths enables highly efficient computing for next-generation DNNs.
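The cycle-count argument quoted in the citing context above can be made concrete with a toy comparison (illustrative numbers only; nothing below is taken from the paper):

```python
# Latency model for an (N x K) matrix-vector multiply, following the
# cycle-count argument quoted above (illustrative only).
N = K = 1000

weight_stationary_cycles = N + K   # inputs message-passed across the weight matrix
output_stationary_cycles = K       # inputs streamed in over time
optical_passes = 1                 # the whole input vector distributed in one shot

print(f"weight-stationary systolic array: >= {weight_stationary_cycles} cycles")
print(f"output-stationary architecture:   ~  {output_stationary_cycles} cycles")
print(f"free-space optical MVM:           {optical_passes} single-shot pass per layer")
```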
... We are experiencing a significant shift in computer resource allocation, with more and more hardware being dedicated to AI applications. As convolutional neural networks with billions of parameters are used to solve a growing set of problems [1], it becomes necessary to optimize their implementations. In these networks, convolution has the largest energy overhead [2], requiring intensive use of multiply-accumulate (MAC) operations. ...
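For context, the multiply-accumulate count of a single standard 2-D convolution follows directly from its shape parameters (a textbook counting argument, not specific to the cited work; the layer shape below is illustrative):

```python
def conv_macs(h_out, w_out, c_in, c_out, k_h, k_w):
    """MAC operations for one forward pass of a standard 2-D convolution."""
    return h_out * w_out * c_out * c_in * k_h * k_w

# Example: a 3x3 convolution, 64 -> 128 channels, on a 56x56 output feature map.
print(f"{conv_macs(56, 56, 64, 128, 3, 3):,}")  # 231,211,008 MACs
```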