Hardware-Aware Co-Optimization of Deep Convolutional Neural Networks
Abstract
The unprecedented success of deep neural networks (DNNs), especially convolutional neural networks (CNNs), stems from their high representational power and their capability to model complex functions. The representational power of DNNs comes from their complex structure, which increases both the computational complexity and the (memory) size of the models. Thus, processing DNNs requires high memory capacity and high compute power. However, embedded devices and mobile platforms have very limited (on-chip) memory and compute capacity, which prohibits the wide deployment of DNNs. To overcome these challenges, compact DNNs with low computational complexity and small model size have been proposed. However, reducing computation without regard for data reuse leads to higher data movement, which consumes orders of magnitude more energy than an arithmetic operation and renders compact models energy-inefficient. Moreover, on systolic-array-based accelerators, the low data reuse in compact DNNs causes underutilization of the processing elements (PEs), which results in higher inference latency. Numerous co-design techniques, spanning the DNN accelerator and the algorithm, have been proposed to address these inefficiencies of compact DNNs (low energy efficiency and sub-optimal latency). However, their generalizability is quite limited, and, more importantly, these co-design techniques are oblivious to the predictive performance of the model, which leads to sub-optimal inference accuracy.

In this thesis, we first investigate the performance and security implications of designing compact DNNs. We find that contemporary methods for reducing the number of parameters and computations increase the total number of activations, which in turn increases the memory footprint and data movement and thus lowers energy efficiency. We also demonstrate that the distinctive characteristics of (compact) DNNs can easily be exploited through side-channel attacks to decipher the architecture of their building blocks, and we propose security-aware design methodologies that are robust against such attacks. Further, we propose a data-reuse-aware co-design that balances computational complexity against data reuse and finds a sweet spot with optimal energy efficiency and latency on both GPUs and systolic accelerators. Unlike previous co-design methods, our approach also enables a trade-off between the representational power of the DNN and its generalization capability, thus maximizing predictive performance (accuracy on image-classification tasks). Furthermore, we propose a subspace self-attention mechanism that improves computational efficiency and boosts the representational power of DNNs; this attention mechanism incurs negligible parameter overhead and is hence suitable for deployment in compact DNNs. Finally, we employ knowledge distillation as a learning paradigm to gain predictive performance without changing the architecture of the DNN. We investigate the efficacy of knowledge distillation as a substitute for residual connections in residual networks and find that knowledge distillation serves as a good (weight) initializer that regularizes the gradient flow in the student network. In effect, training DNNs with knowledge distillation steers optimization away from chaotic regions of the loss surface and enables convergence in well-behaved (convex) regions of the error surface.
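Two claims in the abstract can be made concrete with small sketches. First, the observation that parameter- and FLOP-reducing building blocks can increase the activation count. The PyTorch sketch below is our own illustration, not code from the thesis: it compares a standard 3x3 convolution with a depthwise-separable replacement of the kind used in compact CNNs such as MobileNet; the layer width (64 channels) and feature-map size (56x56) are arbitrary assumptions.

import torch
import torch.nn as nn

def count_params(module: nn.Module) -> int:
    # Total number of learnable parameters.
    return sum(p.numel() for p in module.parameters())

def count_activations(module: nn.Module, x: torch.Tensor) -> int:
    # Sum the elements of every feature map produced by leaf layers,
    # i.e., the intermediate activations that must be moved to/from memory.
    total = 0
    hooks = []
    def hook(_module, _inputs, output):
        nonlocal total
        total += output.numel()
    for m in module.modules():
        if len(list(m.children())) == 0:  # leaf layers only
            hooks.append(m.register_forward_hook(hook))
    with torch.no_grad():
        module(x)
    for h in hooks:
        h.remove()
    return total

C, H, W = 64, 56, 56                  # assumed layer shape
x = torch.randn(1, C, H, W)

standard = nn.Conv2d(C, C, kernel_size=3, padding=1)
separable = nn.Sequential(
    nn.Conv2d(C, C, kernel_size=3, padding=1, groups=C),  # depthwise 3x3
    nn.Conv2d(C, C, kernel_size=1),                       # pointwise 1x1
)

for name, layer in [("standard", standard), ("separable", separable)]:
    print(f"{name}: {count_params(layer)} params, "
          f"{count_activations(layer, x)} activation elements")

For this shape the separable block has roughly 7.7x fewer parameters (4,800 vs. 36,928) but produces twice as many activation elements (401,408 vs. 200,704), because the depthwise and pointwise layers each emit a full-size feature map. This mirrors the abstract's point that compute reduction alone can raise the memory footprint and data movement.

Second, the knowledge-distillation objective. The thesis studies distillation as a substitute for residual connections; the minimal sketch below shows only the standard Hinton-style distillation loss, with the temperature T and mixing weight alpha as assumed hyperparameters, not the thesis's specific training recipe.

import torch
import torch.nn.functional as F

def kd_loss(student_logits: torch.Tensor,
            teacher_logits: torch.Tensor,
            labels: torch.Tensor,
            T: float = 4.0, alpha: float = 0.9) -> torch.Tensor:
    # Soft targets: KL divergence between temperature-softened distributions,
    # rescaled by T^2 so gradient magnitudes are independent of T.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

The student is trained on this combined loss while the teacher's weights stay frozen; the teacher's softened outputs act as the regularizing signal that, per the abstract, shapes the student's gradient flow.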