Figure B.1: CIFAR100, IPC=50: Inner Loss and gradient norm for Neumann

Source publication (preprint)
Bi-level optimization (BO) has become a fundamental mathematical framework for addressing hierarchical machine learning problems. As deep learning models continue to grow in size, the demand for scalable bi-level optimization solutions has become increasingly critical. Traditional gradient-based bi-level optimization algorithms, due to their inhere...

Contexts in source publication

Context 1
... provide an overview of (FG)²U in Figure 1 to illustrate its strengths and role in large-scale bi-level optimization. The rest of the paper is organized as follows. ...
Context 2
... practice, a more cost-effective two-phase paradigm can be achieved by strategically placing (FG)²U and other methods at different stages of the training process, as we will discuss in Section 3.2. For an illustration of the role of (FG)²U in large-scale bi-level optimization, please refer to Figure 1. ...
Context 3
... memory and computational efficiencies of TRGU, Hessian-Free, and (FG)²U in the most challenging case (CIFAR-100, IPC=50) are reported in Figure 1 (Bottom Right), demonstrating that the efficiency of (FG)²U can be significantly enhanced through intra/inter-GPU parallelism. ...
Context 4
... we increased the unrolled depth while maintaining DistilGPT2 as the base model. We plotted the F1 scores and GPU memory usage for RGU with unrolled depths of {1, 2, 4, 6} and (FG)²U with unrolled depths of {24, 48} on StreamingQA in Figure 1 (Bottom Left). The performance of the weight model is positively correlated with the unrolled depth, substantiating the benefits of training with larger unrolled depths. ...
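Context 4 hinges on differentiating the outer objective through an unrolled inner optimization. The sketch below is an illustration only, not the authors' (FG)²U implementation: it uses a toy ridge-style inner problem, and the names (inner_loss, unrolled_inner, outer_loss), the unrolled depth, and the learning rate are assumptions chosen for the example. It shows how forward-mode differentiation (jax.jvp) propagates a tangent through the full unrolled depth, so memory does not grow with the number of inner steps as it would with reverse-mode backpropagation.

```python
import jax
import jax.numpy as jnp

def inner_loss(w, lam, x, y):
    # Toy inner objective: squared error plus a per-coordinate penalty
    # controlled by the outer variable `lam` (hypothetical example).
    return jnp.mean((x @ w - y) ** 2) + jnp.sum(lam * w ** 2)

def unrolled_inner(lam, w0, x, y, steps=24, lr=0.1):
    # Unroll `steps` gradient-descent steps on the inner loss; the result
    # is a differentiable function of the outer variable `lam`.
    w = w0
    for _ in range(steps):
        w = w - lr * jax.grad(inner_loss)(w, lam, x, y)
    return w

def outer_loss(lam, w0, x_tr, y_tr, x_val, y_val):
    # Outer objective: validation loss of the unrolled inner solution.
    w_k = unrolled_inner(lam, w0, x_tr, y_tr)
    return jnp.mean((x_val @ w_k - y_val) ** 2)

key = jax.random.PRNGKey(0)
k1, k2, k3, k4, k5 = jax.random.split(key, 5)
d = 5
x_tr = jax.random.normal(k1, (32, d))
y_tr = jax.random.normal(k2, (32,))
x_val = jax.random.normal(k3, (16, d))
y_val = jax.random.normal(k4, (16,))
lam = 0.1 * jnp.ones(d)
w0 = jnp.zeros(d)
v = jax.random.normal(k5, (d,))  # probe/tangent direction for `lam`

# Forward-mode JVP through the whole unrolled computation: the memory cost
# is independent of the unrolled depth.
loss_val, dir_grad = jax.jvp(
    lambda l: outer_loss(l, w0, x_tr, y_tr, x_val, y_val), (lam,), (v,)
)
print(float(loss_val), float(dir_grad))
```

Each JVP yields only a directional derivative of the outer loss along the probe `v`; averaging suitably scaled probes over many random directions gives an unbiased gradient estimate, and those independent probes can be distributed across GPUs, which is consistent with the parallelism benefit noted in Context 3.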