Figure - available from: Scientific Reports
Digital fully-connected neural network (FC-NN) and hardware implementations. (a) FC-NN with input activations (red, vector length K) connected to output activations (vector length N) via weighted paths, i.e., weights (blue, matrix size K×N). (b) Matrix representation of one layer of an FC-NN with B-sized batching. (c) Example bit-serial multiplier array, with output-stationary accumulation across k. Fan-out of X across n∈{1…N}; fan-out of W across b∈{1…B}. Bottom panel: all-electronic version with fan-out by copper wire (for clarity, fan-out of W is not illustrated). Top panel: digital optical neural network version, where X and W are fanned out passively using optics and transmitted to an array of photodetectors. Each pixel contains two photodetectors, where the activations and weights can be separated by, e.g., polarization or wavelength filters. Each photodetector pair is directly connected to a multiplier in close proximity.
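The output-stationary, bit-serial scheme of panel (c) can be sketched in software. This is a minimal illustration under our own assumptions, not the paper's implementation: activation bits are streamed LSB-first, and each shifted partial product is accumulated into a stationary output register Y[n, b].

```python
import numpy as np

def bit_serial_matmul(X, W, bits=8):
    """Output-stationary bit-serial multiply: Y[n, b] = sum_k W[n, k] * X[k, b].

    Unsigned activations X are streamed one bit per cycle (LSB first); each
    partial product is shifted and accumulated into a stationary output
    register, mirroring the multiplier array of panel (c).
    """
    K, B = X.shape
    N, _ = W.shape
    Y = np.zeros((N, B), dtype=np.int64)        # output-stationary accumulators
    for t in range(bits):                       # one activation bit per cycle
        x_bit = (X >> t) & 1                    # fan-out of this X bit across n
        Y += (W.astype(np.int64) @ x_bit) << t  # shift-and-accumulate
    return Y
```

Because the shifted bit-planes of X sum back to X itself, the result equals the ordinary product W @ X for non-negative integer activations.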


Source publication
Article
As deep neural network (DNN) models grow ever-larger, they can achieve higher accuracy and solve more complex problems. This trend has been enabled by an increase in available compute power; however, efforts to continue to scale electronic processors are impeded by the costs of communication, thermal management, power delivery and clocking. To impr...

Citations

... Photonic accelerators make use of silicon fabrication to create a small number of high-speed, nonlinear photonic neurons 18–20, and recent implementations have reached computational power rivaling modern-day GPUs 21–23. Free-space accelerators typically have many more neurons at slower operating speeds and are potentially able to achieve even higher computation speeds 16,24–30. ...
... We introduce and experimentally demonstrate a computing paradigm based on paired optoelectronic boards and optical interconnects, which respectively implement the nonlinear activation and weight matrix operations of a neural network (Fig. 1). Our system builds upon, and smoothly extends, prior work implementing optoelectronic activation functions 18,35–37 and matrix operations 16,25,27,34,38–42. ...
Article
Optical approaches have made great strides towards the goal of the high-speed, energy-efficient computing required by modern deep learning and AI applications. Read-in and read-out of data, however, limit the overall performance of existing approaches. This study introduces a multilayer optoelectronic computing framework that alternates between optical and optoelectronic layers to implement matrix-vector multiplications and rectified linear functions, respectively. Our framework is designed for real-time, parallelized operation, leveraging 2D arrays of LEDs and photodetectors connected via independent analog electronics. We experimentally demonstrate this approach with a three-layer network with two hidden layers, operating it to recognize images from the MNIST database with 92% accuracy and to classify a nonlinear spiral dataset with 86% accuracy. By implementing multiple layers of a deep neural network simultaneously, our approach significantly reduces the number of read-ins and read-outs required and paves the way for scalable optical accelerators with ultra-low energy requirements.
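The alternating structure this abstract describes (optical matrix-vector products interleaved with optoelectronic rectification) reduces mathematically to an ordinary ReLU network forward pass. A minimal sketch, with layer sizes chosen only for illustration:

```python
import numpy as np

def relu(v):
    # Rectification performed at each optoelectronic layer
    return np.maximum(v, 0.0)

def forward(x, weight_matrices):
    """Alternate matrix-vector products (optical) with ReLU (optoelectronic)."""
    for W in weight_matrices[:-1]:
        x = relu(W @ x)
    return weight_matrices[-1] @ x  # final read-out layer, no rectification

# Illustrative shapes: 64 inputs, two 16-neuron hidden layers, 10 outputs
rng = np.random.default_rng(0)
Ws = [rng.standard_normal((16, 64)),
      rng.standard_normal((16, 16)),
      rng.standard_normal((10, 16))]
y = forward(rng.standard_normal(64), Ws)
```

Each `W @ x` here stands in for one optical layer; each `relu` stands in for one optoelectronic layer, so a three-matrix stack corresponds to the paper's three-layer network with two hidden layers.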
... The computational complexity of processing raw videos increases further with the adoption of larger models, a trend that has continued to grow over the years (Tan & Le, 2019; Bernstein et al., 2021). When training data is limited, using larger input dimensionality and larger models also increases the chance of overfitting (Defernez & Kemsley, 1999). ...
Article
We introduce a novel method for movie genre classification, capitalizing on a diverse set of readily accessible pretrained models. These models extract high-level features related to visual scenery, objects, characters, text, speech, music, and audio effects. To intelligently fuse these pretrained features, we train small classifier models with low time and memory requirements. Employing the transformer model, our approach utilizes all video and audio frames of movie trailers without performing any temporal pooling, efficiently exploiting the correspondence between all elements, as opposed to the fixed and low number of frames typically used by traditional methods. Our approach fuses features originating from different tasks and modalities, with different dimensionalities, different temporal lengths, and complex dependencies as opposed to current approaches. Our method outperforms state-of-the-art movie genre classification models in terms of precision, recall, and mean average precision (mAP). To foster future research, we make the pretrained features for the entire MovieNet dataset, along with our genre classification code and the trained models, publicly available.
... Advances in machine learning, particularly deep learning, have enabled applications in various real-world scenarios [1]. Alongside these performance improvements, the increasing model complexity and exploding number of parameters [2] prevent human users from comprehending the decisions made by these data-driven models, as the decision rules are learned implicitly from the data. The absence of reasoning for model decisions continues to raise concerns about the transparency of AI-driven systems [3]. ...
Preprint
Recent literature highlights the critical role of neighborhood construction in deriving model-agnostic explanations, with a growing trend toward deploying generative models to improve synthetic instance quality, especially for explaining text classifiers. These approaches overcome the challenges in neighborhood construction posed by the unstructured nature of texts, thereby improving the quality of explanations. However, the deployed generators are usually implemented via neural networks and lack inherent explainability, sparking arguments over the transparency of the explanation process itself. To address this limitation while preserving neighborhood quality, this paper introduces a probability-based editing method as an alternative to black-box text generators. This approach generates neighboring texts by implementing manipulations based on in-text contexts. Substituting the generator-based construction process with recursive probability-based editing, the resultant explanation method, XPROB (explainer with probability-based editing), exhibits competitive performance according to the evaluation conducted on two real-world datasets. Additionally, XPROB's fully transparent and more controllable construction process leads to superior stability compared to the generator-based explainers.
... - Performance Bottleneck Identification: Analyze architectural designs to find and eliminate performance bottlenecks [288,289]. - Scalability Optimization: Ensure that the chip architecture scales well with increasing system complexity (e.g., more cores or memory) [290,291]. ...
Preprint
Large Language Models (LLMs) are emerging as promising tools in hardware design and verification, with recent advancements suggesting they could fundamentally reshape conventional practices. In this survey, we analyze over 54 research papers to assess the current role of LLMs in enhancing automation, optimization, and innovation within hardware design and verification workflows. Our review highlights LLM applications across synthesis, simulation, and formal verification, emphasizing their potential to streamline development processes while upholding high standards of accuracy and performance. We identify critical challenges, such as scalability, model interpretability, and the alignment of LLMs with domain-specific languages and methodologies. Furthermore, we discuss open issues, including the necessity for tailored model fine-tuning, integration with existing Electronic Design Automation (EDA) tools, and effective handling of complex data structures typical of hardware projects. This survey not only consolidates existing knowledge but also outlines prospective research directions, underscoring the transformative role LLMs could play in the future of hardware design and verification.
... Photonics is widely known for its high bandwidth and multiplexing ability in multiple domains 21–24. Matrix-vector multiplication accelerators using both on-chip 25,26 and free-space 27,28 optics have already revealed great potential for photonic processing 29,30 in general. In particular, demonstrations of integrated photonic NNs have shown superior performance in both processing speed 31 and energy efficiency 32. ...
Preprint
Physical neural networks (PNNs) are emerging paradigms for neural network acceleration due to their high-bandwidth, in-propagation analogue processing. Despite the advantages of PNNs for inference, training remains a challenge. Imperfect knowledge of the physical transformation causes conventional gradient-based updates from backpropagation (BP) to fail. Here, we present the asymmetrical training (AT) method, which treats the PNN structure as a grey box. AT performs training while knowing only the last-layer output and the neuron topological connectivity of a deep neural network structure, without requiring information about the physical control-transformation mapping. We experimentally demonstrated the AT method on deep grey-box PNNs implemented with uncalibrated photonic integrated circuits (PICs), improving the classification accuracy on Iris flowers and modified MNIST hand-written digits from random guessing to near the theoretical maximum. We also showcased the consistently enhanced performance of AT over BP on different datasets, including MNIST, Fashion-MNIST, and Kuzushiji-MNIST. The AT method demonstrated successful training with minimal hardware overhead and reduced computational overhead, serving as a robust, lightweight training alternative that fully explores the advantages of physical computation.
... Deep learning has seen a significant rise to prominence in recent years due to various technological advancements (Dave et al. 2022;Bernstein et al. 2021;Lucas et al. 2021). Systems powered by this technology have displaced conventional approaches in many domains. ...
Article
Deep learning intellectual properties (IPs) are high-value assets that are frequently susceptible to theft. This vulnerability has led to significant interest in defending the field's intellectual properties from theft. Recently, watermarking techniques have been extended to protect deep learning hardware from piracy. These techniques embed modifications that change the hardware's behavior when activated. In this work, we propose the first method for embedding watermarks in deep learning hardware that incorporates the owner's key samples into the embedding methodology. This improves our watermarks' reliability and efficiency in identifying the hardware over those generated using randomly selected key samples. Our experimental results demonstrate that, by considering the target key samples when generating the hardware modifications, we can significantly increase the embedding success rate while targeting fewer functional blocks, decreasing the hardware overhead needed to defend the design.
... This has led to a "bigger is better" mentality. 6 However, the disadvantage of this mentality is the energy required to train and use very large networks. For instance, training the language model GPT-3, which has 175 billion parameters, alone consumed 1.3 GWh of electricity, the energy required to fully charge 13,000 Tesla Model S cars. ...
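The comparison in this excerpt is easy to check. Assuming a Model S pack of roughly 100 kWh (our assumption; the excerpt does not state the pack size), 1.3 GWh corresponds to about 13,000 full charges:

```python
# Sanity check of the excerpt's comparison.
# The ~100 kWh Model S pack size is an assumed round figure.
training_energy_gwh = 1.3
training_energy_kwh = training_energy_gwh * 1e6  # 1 GWh = 1,000,000 kWh
pack_kwh = 100                                   # assumed battery capacity
cars = training_energy_kwh / pack_kwh
print(cars)  # → 13000.0
```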
... For example, AlexNet and ResNet-50 each contain more than one million weights, biases, and activations [102], and that is without considering the memory required for the model structure, input data, and application. ...
Article
Downtime caused by failing equipment can be extremely costly for organizations. Predictive Maintenance (PdM), which uses data to predict when maintenance should be conducted, is an essential tool for increasing safety, maximizing uptime and minimizing costs. Contemporary PdM systems primarily have sensors collect information about the equipment under observation. This information is afterwards transmitted off the device for processing at a high-performance computer system. While this can allow high-quality predictions, it also imposes barriers that keep some organisations from adopting PdM. For example, some applications prevent data transmission off sensor devices due to regulatory or infrastructure limitations. Being able to process the collected information right at the sensor device is, therefore, desirable in many sectors - something that recent progress in the field of TinyML promises to deliver. This paper investigates the intersection between PdM and TinyML and explores how TinyML can enable many new PdM applications. We consider a holistic view of TinyML-based PdM, focusing on the full stack of Machine Learning (ML) models, hardware, toolchains, data and PdM applications. Our main findings are that each part of the TinyML stack has received varying degrees of attention. In particular, ML models and their optimisations have seen a lot of attention, while data optimisations and TinyML datasets lack contributions. Furthermore, most TinyML research focuses on image and audio classification, with little attention paid to other application areas such as PdM. Based on our observations, we suggest promising avenues of future research to scale and improve the application of TinyML to PdM.
... Therefore, MRR-based devices that support large matrix dimensions are difficult to realize. Matrix multiplication dominates the computations in deep learning because of batch processing [5]. In this context, multiple input vectors are grouped together into a matrix and multiplied by a weight matrix. ...
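The batching described here, where B input vectors are stacked into a K×B matrix so that B matrix-vector products collapse into one matrix-matrix product, can be illustrated briefly; the shapes below are arbitrary examples:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((32, 128))             # weight matrix, N x K
vectors = [rng.standard_normal(128) for _ in range(8)]  # B = 8 inputs

# Unbatched: eight separate matrix-vector products
y_single = np.stack([W @ v for v in vectors], axis=1)

# Batched: stack inputs into a K x B matrix, then one matrix-matrix product
X = np.stack(vectors, axis=1)                  # shape (128, 8)
y_batched = W @ X                              # shape (32, 8)

assert np.allclose(y_single, y_batched)
```

The two results are numerically identical; batching simply trades B small operations for one large one, which is what makes matrix-matrix multiplication the dominant workload.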