April 2017 · 244 Reads · 165 Citations · Distill
... To answer these questions, one has to understand how the momentum method affects the training process. Goh (2017) analyzed the momentum method from the perspectives of convergence and dynamics. Several prior studies (Cutkosky and Orabona, 2019; Ma and Yarats, 2018) speculated that averaging past stochastic gradients through momentum might reduce the variance of the noise in the parameter update, thus making the loss decrease faster. ...
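The averaging effect mentioned here comes from the heavy-ball momentum recursion v_{t+1} = beta * v_t + g_t, theta_{t+1} = theta_t - lr * v_{t+1}: unrolling the velocity shows it is an exponentially weighted sum of past stochastic gradients. Below is a minimal NumPy sketch of that update on a noisy toy objective; the function name, hyperparameters, and objective are illustrative assumptions, not taken from Goh (2017) or the cited papers.

```python
import numpy as np

def momentum_step(theta, velocity, grad, lr=0.01, beta=0.9):
    """One heavy-ball momentum step: the velocity accumulates an
    exponentially weighted sum of past gradients, which is then
    applied to the parameters."""
    velocity = beta * velocity + grad   # fold the new gradient into the running average
    theta = theta - lr * velocity       # parameter update along the averaged direction
    return theta, velocity

# Toy usage: stochastic gradients of f(theta) = 0.5 * theta**2 with additive noise.
rng = np.random.default_rng(0)
theta, velocity = 5.0, 0.0
for _ in range(200):
    noisy_grad = theta + rng.normal(scale=1.0)  # true gradient plus noise
    theta, velocity = momentum_step(theta, velocity, noisy_grad)
print(theta)  # ends near the optimum at 0; the velocity smooths out the gradient noise
```

In this sketch the velocity acts as a low-pass filter on the gradient noise, which is the intuition behind the variance-reduction speculation attributed to the cited works.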