David Woodruff1, Youyu Zhu2, Xiye Zhang3
1 Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA
2 Shanghai Kongjiang Middle School, Shanghai, China
3 Sichuan University, Sichuan, China
Abstract: In batch gradient descent, the most efficient constant learning rate is 1/L, where L is the Lipschitz constant of the gradient. In "Learning the Learning Rate" (Xiaoxia Wu et al., 2018), some of the algorithms appear to be inefficient, and this paper further improves their method. We propose the searching-L algorithm, described in the following passage, which reduces the complexity of WNGrad-Batch from quadratic to linear by gradually driving the learning rate toward 1/L. In stochastic gradient descent, we find that a learning rate conforming to our updating rule gives a more stable upper bound on the time complexity, one that does not involve the parameter γ.
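To make the adaptive step size concrete, the sketch below runs batch gradient descent with the WNGrad-style update from the cited Wu et al. (2018) paper, in which the learning rate is 1/b_j and b_j grows by ||∇f(x_j)||²/b_j at each step. The function name wngrad_batch, the update ordering, and the quadratic test problem are illustrative assumptions; the sketch does not reproduce the searching-L refinement introduced above.

```python
# Minimal sketch of batch gradient descent with a WNGrad-style learning rate
# (Wu et al., 2018): eta_j = 1/b_j, with b_{j+1} = b_j + ||grad f(x_j)||^2 / b_j.
# The update ordering and the test objective are illustrative assumptions.
import numpy as np

def wngrad_batch(grad_f, x0, b0=1.0, n_steps=200):
    x = np.asarray(x0, dtype=float)
    b = float(b0)
    for _ in range(n_steps):
        g = grad_f(x)
        b = b + np.dot(g, g) / b   # grow b_j using the observed gradient norm
        x = x - g / b              # step with learning rate 1/b_j
    return x

# Illustrative use: minimize f(x) = 0.5 * ||A x - y||^2, whose gradient is
# A^T (A x - y); here the Lipschitz constant L is the largest eigenvalue of A^T A.
A = np.array([[2.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, -1.0])
x_star = wngrad_batch(lambda x: A.T @ (A @ x - y), x0=np.zeros(2))
```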
Keywords: Learning rate; Gradient descent
David Woodruff, Youyu Zhu, Xiye Zhang. Simpler Algorithm in Gradient Descent. Frontiers in Educational Research (2020) Vol. 3, Issue 4: 59-62. https://doi.org/10.25236/FER.2020.030414.
Y. Nesterov (1998). Introductory Lectures on Convex Programming, Volume I: Basic Course, pp. 75-76.
Xiaoxia Wu, Rachel Ward, Leon Bottou (2018). WNGrad: Learn the Learning Rate in Gradient Descent, pp. 89-90.