Figure 2 - available via license: Creative Commons Attribution 4.0 International
Source publication
Dataflow/mapping decides the compute and energy efficiency of DNN accelerators. Many mappers have been proposed to tackle the intra-layer map-space. However, mappers for the inter-layer map-space (aka layer-fusion map-space) have rarely been discussed. In this work, we propose a mapper, DNNFuser, specifically focusing on this layer-fusion map-space. W...
Contexts in source publication
Context 1
... propose DNNFuser, a pre-trained Transformer-based mapper for layer-fusion optimization of DNN workloads. DNNFuser, once fully trained, can infer an optimized mapping for different HW conditions at inference time. As shown in Fig. 2, DNNFuser takes as input a workload (a DNN model and its batch size), HW parameters (number of PEs, on-chip BW, off-chip BW), and an HW condition (the requested on-chip buffer usage), and outputs an optimized ...
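The interface implied by this context can be summarized in a minimal sketch; the dataclass fields and the `infer_mapping` signature below are illustrative assumptions, not the paper's actual API.

```python
# A minimal sketch of the mapper's inference-time interface described above.
# All names (Workload, HWParams, HWCondition, infer_mapping) are hypothetical.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Workload:
    dnn_model: str      # e.g. "resnet50"
    batch_size: int

@dataclass
class HWParams:
    num_pes: int        # number of processing elements
    onchip_bw: float    # on-chip bandwidth
    offchip_bw: float   # off-chip bandwidth

@dataclass
class HWCondition:
    onchip_buffer_kb: int  # requested on-chip buffer usage

def infer_mapping(model: Callable[[List[float]], List[int]],
                  workload: Workload, hw: HWParams,
                  cond: HWCondition) -> List[int]:
    """One forward pass of a trained model returns one fusion decision per
    layer -- no search at inference time."""
    cond_vec = [workload.batch_size, hw.num_pes, hw.onchip_bw,
                hw.offchip_bw, cond.onchip_buffer_kb]
    return model(cond_vec)
```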
Context 2
... ((R̂_0, s_0, a_0), ..., (R̂_t, s_t, a_t), ..., (R̂_N, s_N, a_N)), with N being the number of layers of the target DNN model, and the reward (return-to-go) R̂_t being the sum of future rewards. The Transformer takes this sequence as input and generates a prediction for the next action, a, as shown in Fig. 2. For supervised learning, the loss is taken as the Mean Square Error between the predicted action, â_t, and the actual action, a_t, for t ∈ [0, 1, ..., N ...
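A small sketch of the return-to-go sequence and the MSE objective described here, using NumPy; the shapes, the 8-dimensional states, and the random stand-ins for model outputs are assumptions for illustration only.

```python
# Sketch of the (return-to-go, state, action) sequence and the MSE loss.
import numpy as np

def returns_to_go(rewards):
    """R_hat_t = sum of rewards from step t to the end (return-to-go)."""
    return np.cumsum(rewards[::-1])[::-1]

def mse_action_loss(predicted_actions, actual_actions):
    """Mean Square Error over all steps t in [0, 1, ..., N]."""
    diff = np.asarray(predicted_actions) - np.asarray(actual_actions)
    return float(np.mean(diff ** 2))

# Example for a 4-layer workload (N = 3): one (R_hat_t, s_t, a_t) per layer.
rewards = np.array([0.2, 0.5, 0.1, 0.3])
R_hat = returns_to_go(rewards)               # [1.1, 0.9, 0.4, 0.3]
states = np.random.rand(4, 8)                # hypothetical 8-dim states
actions = np.random.rand(4)                  # teacher (actual) actions
sequence = list(zip(R_hat, states, actions)) # Transformer input sequence

predicted = np.random.rand(4)                # stand-in for model predictions
loss = mse_action_loss(predicted, actions)
```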
Context 3
... than the common practice of taking only states or state-action pairs to generate a policy in traditional RL [16, 17], we take full reward-state-action triples as input, as previously described. The benefit is that the reward is now exposed as an input at the inference phase (Fig. 2), which opens the opportunity to control the output (mapping solution) through the "desired" reward. This technique is also leveraged in ...
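The reward-conditioned decoding this context describes can be sketched roughly as below; `model.predict_next_action` and `env.step` are hypothetical interfaces used only to show how changing the requested return steers the generated mapping.

```python
# Rough sketch of reward-conditioned autoregressive decoding: the desired
# return-to-go is an explicit model input at every step.
def generate_mapping(model, env, initial_state, desired_return, num_layers):
    state, target = initial_state, desired_return
    actions = []
    for _ in range(num_layers):
        # The model conditions on the remaining "desired" return-to-go.
        action = model.predict_next_action(target, state, actions)
        state, reward = env.step(action)
        target -= reward            # return still being requested
        actions.append(action)
    return actions
```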
Context 4
... works well as a layer-fusion mapper, outperforming many optimization methods. However, G-Teacher is still a search-based method, which is inevitably slower than most inference-based DNN models. Therefore, G-Teacher serves only as a training-data generator, sampling several good solutions (trajectories) in the layer-fusion environment (Fig. 2). DNNFuser, as a student, learns from these demonstrations and generalizes that knowledge to generate optimized mappings at inference ...
Context 5
... Training. With all the above-mentioned preparation and setup, we can now train our model. As shown in Fig. 2, the steps are as follows. 1) Data collection with the teacher model. We take G-Teacher and ask it to search for several (4-10) sets of optimized mappings under different conditions (on-chip memory sizes). 2) Formulating into RL state transitions. We take those solutions, which are sequences of actions in RL terminology, and decorate them with ...
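A minimal sketch of steps 1) and 2) above, assuming a `g_teacher.search` call and an environment that can replay an action sequence; every name here is illustrative, not the paper's code.

```python
# Collect teacher trajectories and decorate them with returns-to-go.
def collect_training_data(g_teacher, env, buffer_sizes_kb):
    dataset = []
    # 1) Data collection: one optimized mapping per HW condition (buffer size).
    for buf in buffer_sizes_kb:                      # e.g. 4-10 buffer sizes
        actions = g_teacher.search(onchip_buffer_kb=buf)
        # 2) Formulate into RL state transitions with returns-to-go.
        states, rewards = env.replay(actions, onchip_buffer_kb=buf)
        rtg = [sum(rewards[t:]) for t in range(len(rewards))]
        dataset.append(list(zip(rtg, states, actions)))
    return dataset
```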