Figure - available from: Scientific Reports


# Number of parameters, i.e., weights, in recent landmark neural networks 1,2,31–43 (references dated by first release, e.g., on arXiv). The number of multiplications (not always reported) is not equivalent to the number of parameters, but larger models tend to require more compute power, notably in fully-connected layers. The two outlying nodes (pink) are AlexNet and VGG16, now considered over-parameterized. Subsequently, efforts have been made to reduce DNN sizes, but there remains an exponential growth in model sizes to solve increasingly complex problems with higher accuracy.
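The distinction the caption draws between parameters and multiplications can be made concrete with a short sketch. The layer sizes below are hypothetical and are not taken from any of the networks in the figure; the helper names `dense_layer` and `conv_layer` are illustrative only.

```python
def dense_layer(n_in, n_out):
    """Return (parameters, multiplications) for a fully-connected layer with bias."""
    params = n_in * n_out + n_out      # one weight per input-output pair, plus biases
    mults = n_in * n_out               # each weight is used exactly once per inference
    return params, mults


def conv_layer(c_in, c_out, kernel, out_h, out_w):
    """Return (parameters, multiplications) for a square-kernel 2D convolution with bias."""
    weights = c_in * c_out * kernel * kernel
    params = weights + c_out
    mults = weights * out_h * out_w    # every weight is reused at each output position
    return params, mults


# Hypothetical layer shapes, chosen only for illustration.
fc_params, fc_mults = dense_layer(4096, 4096)
conv_params, conv_mults = conv_layer(64, 64, 3, 56, 56)

print(fc_params, fc_mults)      # 16781312 16777216
print(conv_params, conv_mults)  # 36928 115605504
```

In this sketch the fully-connected layer needs roughly one multiplication per parameter, while the convolution reuses each weight many times, so the two counts diverge by several orders of magnitude.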

Source publication

As deep neural network (DNN) models grow ever-larger, they can achieve higher accuracy and solve more complex problems. This trend has been enabled by an increase in available compute power; however, efforts to continue to scale electronic processors are impeded by the costs of communication, thermal management, power delivery and clocking. To impr...

## Citations

... Implementations of weighted addition for Optical Neural Networks (ONNs) include Mach-Zehnder Interferometer-based Optical Interference Units [18], time-multiplexed and coherent detection [19], free-space systems using spatial light modulators [20] and Micro-Ring-Resonator-based weighting banks on silicon [21]. Furthermore, indium phosphide-integrated optical cross-connects using Semiconductor Optical Amplifiers as single-stage weight elements, as well as Semiconductor Optical Amplifier-based wavelength converters [22,23,24], have been demonstrated for enabling All-Optical (AO) Neural Networks (NNs). ...

All analog signal processing is fundamentally subject to noise, and this is also the case in modern implementations of Optical Neural Networks (ONNs). Therefore, to mitigate noise in ONNs, we propose two designs that are constructed from a given, possibly trained, Neural Network (NN) that one wishes to implement. Both designs ensure that the resulting ONNs give outputs close to those of the desired NN. To establish this, we analyze the designs mathematically. Specifically, we investigate a probabilistic framework for the first design that establishes that the design is correct, i.e., for any feed-forward NN with Lipschitz continuous activation functions, an ONN can be constructed that produces output arbitrarily close to the original. ONNs constructed with the first design thus also inherit the universal approximation property of NNs. For the second design, we restrict the analysis to NNs with linear activation functions and characterize the ONNs' output distribution using exact formulas. Finally, we report on numerical experiments with LeNet ONNs that give insight into the number of components required in these designs for certain accuracy gains. We specifically study the effect of noise as a function of the depth of an ONN. The results indicate that in practice, adding just a few components in the manner of the first or the second design can already be expected to increase the accuracy of ONNs considerably.
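The abstract does not spell out either design, so the following is only a generic illustration of why extra components can suppress analog noise: averaging several independent noisy copies of the same weighted sum. The additive Gaussian noise model, the averaging scheme, and all names and values here are assumptions, not the authors' constructions.

```python
import random
import statistics


def noisy_sum(weights, inputs, sigma, rng):
    """Weighted sum corrupted by additive Gaussian noise (assumed noise model)."""
    exact = sum(w * x for w, x in zip(weights, inputs))
    return exact + rng.gauss(0.0, sigma)


def averaged_sum(weights, inputs, sigma, copies, rng):
    """Average of `copies` independent noisy evaluations of the same weighted sum."""
    return statistics.fmean(
        noisy_sum(weights, inputs, sigma, rng) for _ in range(copies)
    )


def empirical_variance(sample_fn, trials):
    """Population variance of repeated draws from sample_fn."""
    return statistics.pvariance([sample_fn() for _ in range(trials)])


rng = random.Random(0)  # fixed seed so the sketch is reproducible
weights, inputs, sigma = [0.5, -1.0, 2.0], [1.0, 2.0, 0.5], 0.1

single = empirical_variance(lambda: noisy_sum(weights, inputs, sigma, rng), 2000)
averaged = empirical_variance(
    lambda: averaged_sum(weights, inputs, sigma, 8, rng), 2000
)
print(single, averaged)  # averaged variance is roughly single / 8
```

Under this assumed noise model the output variance falls in proportion to the number of copies, which is consistent in spirit with the reported observation that a few added components already help.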

... In weight-stationary digital electronics [e.g., systolic arrays like Google's TPU (7)], on the other hand, where inputs are message-passed across the weight matrix due to wiring constraints, the latency is at least N + K clock cycles for an (N × K)-sized MVM. Similarly, in output-stationary architectures (25,55), latency scales with K, as the inputs are streamed in over time. If K = N = 1000 (see the Supplementary Materials), our proposed near-term optical processor then outperforms these architectures by two orders of magnitude. ...
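The latency bounds quoted above can be evaluated directly for the K = N = 1000 case. This is a minimal sketch: the function names are hypothetical, and treating the output-stationary latency as exactly K cycles is a simplifying assumption, since the text only says it scales with K.

```python
def weight_stationary_cycles(n, k):
    """Lower bound in clock cycles for an (N x K) MVM on a weight-stationary array."""
    return n + k


def output_stationary_cycles(k):
    """Assumed proportionality constant of 1: inputs streamed in over K cycles."""
    return k


n = k = 1000
ws_cycles = weight_stationary_cycles(n, k)
os_cycles = output_stationary_cycles(k)

print(ws_cycles)  # 2000
print(os_cycles)  # 1000
```

The two-orders-of-magnitude advantage stated in the snippet is measured against cycle counts of this size; the optical processor's own latency figure is given in the cited work's Supplementary Materials and is not reproduced here.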

Analog optical and electronic hardware has emerged as a promising alternative to digital electronics to improve the efficiency of deep neural networks (DNNs). However, previous work has been limited in scalability (input vector length K ≈ 100 elements) or has required nonstandard DNN models and retraining, hindering widespread adoption. Here, we present an analog, CMOS-compatible DNN processor that uses free-space optics to reconfigurably distribute an input vector and optoelectronics for static, updatable weighting and the nonlinearity, with K ≈ 1000 and beyond. We demonstrate single-shot-per-layer classification of the MNIST, Fashion-MNIST, and QuickDraw datasets with standard fully connected DNNs, achieving respective accuracies of 95.6, 83.3, and 79.0% without preprocessing or retraining. We also experimentally determine the fundamental upper bound on throughput (∼0.9 exaMAC/s), set by the maximum optical bandwidth before substantial increase in error. Our combination of wide spectral and spatial bandwidths enables highly efficient computing for next-generation DNNs.

... Photonic matrix-vector multiplication (MVM), one of the typical linear operations [5][6][7], can be implemented by various optical elements such as Mach-Zehnder interferometer (MZI) networks [8][9][10], plane light conversion (PLC) devices [11][12][13], and microring resonator (MRR) arrays [14][15][16]. The advantage of ample reconfigurability has also been demonstrated by numerous studies [17][18][19][20]. However, when it comes to nonlinear operations like optical digital computing [21][22][23], the difficulty of implementing an accurate, targeted nonlinear transform increases dramatically, owing to the limited reconfigurability and diversity of photonic nonlinear schemes. ...

As photonic linear computations are diverse and easy to realize while photonic nonlinear computations are relatively limited and difficult, we propose a novel way to perform photonic nonlinear computations by linear operations in a high-dimensional space, which can achieve many nonlinear functions different from existing optical methods. As a practical application, the arbitrary binary nonlinear computations between two Boolean signals are demonstrated to implement a programmable logic array. In the experiment, by programming the high-dimensional photonic matrix multiplier, we execute fourteen different logic operations with only one fixed nonlinear operation. Then the combined logic functions of half-adder and comparator are demonstrated at 10 Gbit/s. Compared with current methods, the proposed scheme simplifies the devices and the nonlinear operations for programmable logic computing. More importantly, nonlinear realization assisted by space transformation offers a new solution for optical digital computing and enriches the diversity of photonic nonlinear computing.
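The general idea of obtaining Boolean logic from a programmable linear operation in a higher-dimensional space plus one fixed nonlinear step can be sketched in software. The one-hot lifting used here, and the order in which the fixed and programmable steps are applied, are assumptions chosen for clarity; they are not the photonic scheme of the abstract.

```python
from itertools import product


def lift(a, b):
    """Fixed nonlinear step (assumed): map two Boolean inputs to a one-hot 4-vector."""
    v = [0, 0, 0, 0]
    v[2 * a + b] = 1
    return v


def readout(weights, lifted):
    """Programmable linear step: a dot product in the 4-dimensional space."""
    return sum(w * x for w, x in zip(weights, lifted))


def gate(truth_table):
    """Build a gate whose weights are its truth table for inputs 00, 01, 10, 11."""
    return lambda a, b: readout(truth_table, lift(a, b))


XOR = gate((0, 1, 1, 0))
AND = gate((0, 0, 0, 1))


def half_adder(a, b):
    """Sum and carry bits, both obtained from the same fixed lifting."""
    return XOR(a, b), AND(a, b)


# All 16 possible two-input truth tables, each realised by reprogramming the weights only.
ALL_TABLES = list(product((0, 1), repeat=4))

print(half_adder(1, 1))  # (0, 1)
print(half_adder(1, 0))  # (1, 0)
```

Only the linear weights change between gates, which mirrors the programmability described above; the sketch makes no claim about which fourteen operations were demonstrated experimentally.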

... The complexity and performance of classical neural networks employed to solve data-intensive problems have grown dramatically in the last decade. Although algorithmic efficiency has played a partial role in improving performance, hardware development (including parallelism and increased scale and spending) is the primary driver behind the progress of artificial intelligence [13,14]. Unlike their classical counterparts, QNNs are able to learn a generalized model of a dataset from a substantially smaller training set [15][16][17] and typically have the potential to do so with polynomially or exponentially simpler models [18][19][20]. ...

Powerful hardware services and software libraries are vital tools for quickly and affordably designing, testing, and executing quantum algorithms. A robust large‐scale study of how the performance of these platforms scales with the number of qubits is key to providing quantum solutions to challenging industry problems. This work benchmarks the runtime and accuracy for a representative sample of specialized high‐performance simulated and physical quantum processing units. Results show the QMware simulator can reduce the runtime for executing a quantum circuit by up to 78% compared to the next fastest option for algorithms with fewer than 27 qubits. The Amazon Web Service State‐Vector Simulator 1 offers a runtime advantage for larger circuits, up to the maximum 34 qubits. Beyond this limit, QMware can execute circuits as large as 40 qubits. Physical quantum devices, such as Rigetti's Aspen‐M2, can provide an exponential runtime advantage for circuits with more than 30 qubits. However, the high financial cost of physical quantum processing units presents a serious barrier to practical use. Moreover, only IonQ's Harmony quantum device achieves high fidelity with more than four qubits. This study paves the way to understanding the optimal combination of available software and hardware for executing practical quantum algorithms.

... Although the above-mentioned systems emphasize VMM acceleration, it is crucial to note that general matrix-matrix multiplication frequently predominates in deep learning computing [19,20]. Multiple input vectors must be combined into a single matrix and multiplied simultaneously with a weight matrix to achieve optical matrix-matrix multiplication. ...
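The step from vector-matrix to matrix-matrix multiplication described here amounts to stacking the input vectors as columns of one matrix. A minimal sketch with small hypothetical integer matrices checks that the batched product matches the per-vector results; all names and values are illustrative.

```python
def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def stack_columns(vectors):
    """Combine vectors into a matrix whose j-th column is vectors[j]."""
    return [list(row) for row in zip(*vectors)]


def matmat(a, b):
    """Multiply matrix a by matrix b (both given as lists of rows)."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


weights = [[1, 2, 0],
           [0, 1, 3]]
inputs = [[1, 0, 2], [0, 1, 1], [2, 2, 0]]  # three input vectors

batched = matmat(weights, stack_columns(inputs))
per_vector = [matvec(weights, x) for x in inputs]

print(batched)     # [[1, 2, 6], [6, 4, 2]]
print(per_vector)  # [[1, 6], [2, 4], [6, 2]]
```

Column j of `batched` equals `per_vector[j]`, so a single matrix-matrix product carries out all of the individual multiplications at once.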

We propose and experimentally demonstrate a highly parallel photonic acceleration processor based on a wavelength division multiplexing (WDM) system and a non-coherent Mach–Zehnder interferometer (MZI) array for matrix–matrix multiplication. The dimensional expansion is achieved by WDM devices, which play a crucial role in realizing matrix–matrix multiplication together with the broadband characteristics of an MZI. We implemented a 2 × 2 arbitrary nonnegative valued matrix using a reconfigurable 8 × 8 MZI array structure. Through experimentation, we verified that this structure could achieve 90.5% inference accuracy in a classification task for the Modified National Institute of Standards and Technology (MNIST) handwritten dataset. This provides a new effective solution for large-scale integrated optical computing systems based on convolution acceleration processors.

... unseen data than their less complex counterparts, motivating exponential growth each year in the number of model parameters since 2015 and the size of training datasets since 1988. [2][3][4] In particular, the past decade has witnessed models from ResNet-50 ($> 10^{7}$ model parameters) to Generative Pretrained Transformer 3 (GPT-3) ($> 10^{11}$ model parameters) and datasets from ImageNet ($\sim 10^{6}$ images) to JFT-3B ($> 10^{9}$ images). By overcoming this bottleneck in electronic communication, clocking, thermal management, and power delivery, neuromorphic systems bring the promise of scalable hardware that can keep pace with the exponential growth of deep neural networks, leading us to define the first major thrust of neuromorphic computing: acceleration. ...

... By overcoming this bottleneck in electronic communication, clocking, thermal management, and power delivery, neuromorphic systems bring the promise of scalable hardware that can keep pace with the exponential growth of deep neural networks, leading us to define the first major thrust of neuromorphic computing: acceleration [2]. Those neuromorphic systems concerned with acceleration are built for heightened speed and energy efficiency in existing machine-learning models and tend to have a relatively immediate impact. One common example would be crossbar arrays for vector-matrix multiplication (VMM) in the forward pass of deep neural networks. ...

2D materials represent an exciting frontier for devices and architectures beyond von Neumann computing due to their atomically small structure, superior physical properties, and ability to enable gate tunability. All four major classes of emerging non-volatile memory (NVM) devices (resistive, phase change, ferroelectric, and ferromagnetic) have been integrated with 2D materials and their corresponding heterostructures. Device performance for neuromorphic architectures will be compared across each of these classes, and applications ranging from crossbar arrays for multi-layer perceptrons (MLPs) to synaptic devices for spiking neural networks (SNNs) will be presented. To aid in the understanding of neuromorphic computing, the terms "acceleration" and "actualization" are used, with the former referring to neuromorphic systems that heighten the speed and energy efficiency of existing machine-learning models and the latter more broadly representing the realization of human neurobiological functions in non-von Neumann architectures. The benefits of 2D materials are addressed in both contexts. Additionally, the landscape of 2D materials-based optoelectronic devices is briefly discussed. These devices leverage the strong optical properties of 2D materials for actualization-based systems that aim to emulate the human visual cortex. Lastly, limitations of 2D materials are considered, with the progress of 2D materials as a novel class of electronic materials for neuromorphic computing depending on their scalable synthesis in thin-film form with desired crystal quality, defect density, and phase purity.

... Encoded diffraction introduces a mask M (x, y) that shapes the sensed patterns and regularizes the phase retrieval problem. In some cases, the encoder may be a metasurface or reconfigurable diffractive optical element [24,42,43]. The sensor field pattern ψ o (u, v) is proportional to the Fourier transform of the field before the lens aperture ψ(x, y), where the transform pairs are (x, y) ↔ (u, v) [9]. ...
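The relation between the masked field and the sensor pattern can be illustrated numerically. This is a one-dimensional sketch under stated assumptions: the field before the lens is taken as the element-wise product of the signal and the mask, the sensor is assumed to record intensity, and the quadratic-phase mask is a hypothetical stand-in rather than the encoder of the cited work.

```python
import cmath


def dft(field):
    """Naive discrete Fourier transform, standing in for the lens transform."""
    n = len(field)
    return [
        sum(field[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n))
        for u in range(n)
    ]


def sensor_intensity(signal, mask):
    """Intensity pattern |FT(mask * signal)|^2 recorded in the sensor plane."""
    encoded = [s * m for s, m in zip(signal, mask)]
    return [abs(v) ** 2 for v in dft(encoded)]


n = 4
flat_mask = [1.0] * n
phase_mask = [cmath.exp(0.5j * x * x) for x in range(n)]  # hypothetical phase-only mask

uniform_pattern = sensor_intensity([1.0] * n, flat_mask)
encoded_pattern = sensor_intensity([1.0, 2.0, 0.0, 1.0], phase_mask)

print([round(v, 9) for v in uniform_pattern])  # [16.0, 0.0, 0.0, 0.0]
```

Because the sensor keeps only intensity, the phase of the transformed field is lost; the mask changes how signal information is distributed across the recorded pattern, which is the regularizing role the text attributes to it.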

Spectral computational methods leverage modal or nonlocal representations of data, and a physically realized approach to spectral computation pertains to encoded diffraction. Encoded diffraction offers a hybrid approach that pairs analog wave propagation with digital back-end electronics; however, the intermediate sensor patterns are correlations rather than linear signal weights, which limits the development of robust and efficient downstream analyses. Here, with vortex encoders, we show that the solution for the signal field from sensor intensity adopts the form of polynomial regression, which is subsequently solved with a learned, linear transformation. This result establishes an analytic rationale for a spectral-methods paradigm in physically realized machine learning systems. To demonstrate this paradigm, we quantify the learning that is transferred with an image basis using speckle parameters, Singular-Value Decomposition Entropy ($H_{SVD}$) and Speckle-Analogue Density (SAD). We show that $H_{SVD}$, a proxy for image complexity, indicates the rate at which a model converges. Similarly, SAD, an averaged spatial frequency, marks a threshold for structurally similar reconstruction. With a vortex encoder, this approach with parameterized training may be extended to distill features. In fact, with images reconstructed with our models, we achieve classification accuracies that rival decade-old, state-of-the-art computer algorithms. This means that the process of learning compressed spectral correlations distills features to aid image classification, even when the goal images are feature-agnostic speckles. Our work highlights opportunities for analytic and axiom-driven machine-learning designs appropriate for real-time applications.

... In 2018, Lin et al. trained neural networks for all-optical machine learning, which are formed by several transmissive phase masks in a structure similar to MPLC [12]. Since then, the potential of using MPLC to perform matrix multiplication at light speed has attracted much attention [28,29,30], for example in research on photonic Ising machines [31] and neuromorphic optoelectronic computing [32]. In 2022, Ozer et al. reported using an MPLC-based mode sorter for super-resolution imaging [33]. ...

Multi-plane light conversion (MPLC) has recently been developed as a versatile tool for manipulating spatial distributions of the optical field through repeated phase modulations. An MPLC device consists of a series of phase masks separated by free-space propagation. It can convert one orthogonal set of beams into another orthogonal set through unitary transformation, which is useful for a number of applications. In telecommunication, for example, mode-division multiplexing (MDM) is a promising technology that will enable continued scaling of capacity by employing spatial modes of a single fiber. MPLC has shown great potential in MDM devices with ultra-wide bandwidth, low insertion loss (IL), low mode-dependent loss (MDL), and low crosstalk. The fundamentals of design, simulation, fabrication, and characterization of practical MPLC mode (de)multiplexers will be discussed in this tutorial.
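The claim that alternating phase masks and free-space propagation yields a unitary transformation can be checked on a toy model. The sketch below is one-dimensional and lossless by construction; the mask phases, the per-mode propagation phases, and the function names are all hypothetical and do not correspond to any designed device.

```python
import cmath


def dft(values, sign):
    """Unitary discrete Fourier transform; sign=-1 forward, sign=+1 inverse."""
    n = len(values)
    scale = n ** -0.5
    return [
        scale * sum(values[x] * cmath.exp(sign * 2j * cmath.pi * u * x / n) for x in range(n))
        for u in range(n)
    ]


def apply_mask(field, mask_phase):
    """Phase-only mask: multiply each sample by exp(i * phase)."""
    return [f * cmath.exp(1j * p) for f, p in zip(field, mask_phase)]


def propagate(field, mode_phase):
    """Free-space step modelled as a phase applied to each spatial-frequency mode."""
    spectrum = apply_mask(dft(field, -1), mode_phase)
    return dft(spectrum, +1)


def mplc(field, masks, mode_phase):
    """Cascade of phase masks, each followed by free-space propagation."""
    for mask in masks:
        field = propagate(apply_mask(field, mask), mode_phase)
    return field


def inner(a, b):
    """Complex inner product of two fields."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


masks = [[0.0, 0.7, 1.9, 0.4], [1.1, 0.2, 0.0, 2.5]]  # hypothetical mask phases
mode_phase = [0.0, 0.3, 1.2, 0.3]                      # hypothetical propagation phases

out0 = mplc([1, 0, 0, 0], masks, mode_phase)
out1 = mplc([0, 1, 0, 0], masks, mode_phase)

print(abs(inner(out0, out0)))  # close to 1: energy is conserved
print(abs(inner(out0, out1)))  # close to 0: orthogonal inputs stay orthogonal
```

Each step in the cascade is unitary, so their composition is too; this is why an orthogonal set of input beams maps to another orthogonal set, as the tutorial abstract states.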

... Advances in machine learning, deep learning in particular, have prompted its application in diverse real-world scenarios over the past decades [45]. However, the rising model complexity and the exploding number of parameters [6] keep raising concerns over the transparency [21] of the decision-making process in systems steered by artificial intelligence. Explainable AI (XAI) has hence become a popular field that strives to uncover the underlying behaviors of the black boxes wrapped by AI. ...

... With the probability-based edition constrained by edition length, we can gradually approach the target sentence through iterative manipulation on prototypes with words w * ∈ x (Algorithm 3). The prototype pool is initialized by the k closest counterfactuals from the corpus (line 1) and updated by the edited version repeatedly (lines 6, 9). To balance the distributions of the ingredients in x over the constructed neighborhood, we generate equally for each word k |x| variants through editions on randomly chosen prototypes at a step (lines 4, 5). ...

... To balance the distributions of the ingredients in x over the constructed neighborhood, we generate equally for each word k |x| variants through editions on randomly chosen prototypes at a step (lines 4, 5). During the iterative process, editions on prototypes are determined greedily according to the given ingredients as defined by (6). The neighborhood set N absorbs the newly generated variants from each iteration (lines 6, 8) with the repeated instances discarded. ...

The importance of neighborhood construction in local explanation methods has already been highlighted in the literature, and several attempts have been made to improve neighborhood quality for high-dimensional data, for example, texts, by adopting generative models. Although the generators produce more realistic samples, the intuitive sampling approaches in the existing solutions leave the latent space underexplored. To overcome this problem, our work, focusing on local model-agnostic explanations for text classifiers, proposes a progressive approximation approach that refines the neighborhood of a to-be-explained decision with a careful two-stage interpolation using counterfactuals as landmarks. We explicitly specify the two properties that should be satisfied by generative models, the reconstruction ability and the locality-preserving property, to guide the selection of generators for local explanation methods. Moreover, noticing the opacity of generative models during the study, we propose another method that implements progressive neighborhood approximation with probability-based editions as an alternative to the generator-based solution. The explanation results from both methods consist of word-level and instance-level explanations benefiting from the realistic neighborhood. Through exhaustive experiments, we qualitatively and quantitatively demonstrate the effectiveness of the two proposed methods.

... The detection accuracy increased from around 20% to 70% over a relatively short period of time. (b) Trends of total number of parameters included in noted ANN models (data taken from [17]). ...

The characterization of in-place material properties is important for quality control and condition assessment of the built infrastructure. Although various methods have been developed to characterize structural materials in situ, many suffer limitations and cannot provide complete or desired characterization, especially for inhomogeneous and complex materials such as concrete and rock. Recent advances in machine learning and artificial neural networks (ANN) can help address these limitations. In particular, physics-informed neural networks (PINN) portend notable advantages over traditional physics-based or purely data-driven approaches. PINN is a particular form of ANN, where physics-based equations are embedded within an ANN structure in order to regularize the outputs during the training process. This paper reviews the fundamentals of PINN, notes its differences from traditional ANN, and reviews applications of PINN for selected material characterization tasks. A specific application example is presented where mechanical wave propagation data are used to characterize in-place material properties. Ultrasonic data are obtained from experiments on long rod-shaped mortar and glass samples; PINN is applied to these data to extract inhomogeneous wave velocity data, which can indicate mechanical material property variations with respect to length.