Simpler Algorithm in Gradient Descent

<p>David Woodruff <sup>1</sup>, Youyu Zhu<sup>2</sup>, Xiye Zhang<sup>3</sup></p>

doi:10.25236/FER.2020.030414

Frontiers in Educational Research, 2020, 3(4); doi: 10.25236/FER.2020.030414.

Simpler Algorithm in Gradient Descent

Author(s)

David Woodruff ¹, Youyu Zhu², Xiye Zhang³

Corresponding Author:

David Woodruff

Affiliation(s)

1 Carnegie Mellon University,5000 Forbes Avenue Pittsburgh, PA 15213
2 Shanghai Kongjiang Middle School, Shanghai, China
3 Sichuan University, Sichuan, China

Download PDF
|
Download: 46
|
View: 1361

Abstract

In Batch Gradient Descent, the most efficient constant learning rate should be , and L is the Lipschitz constant. In “Learning the Learning Rate”(Xiaoxia Wu et al., 2018), some of their algorithms seem t-o be inefficient. This paper is gonna to further improve their method. The searching-L algorithm is raised in the following passage, which limits the WNGrad-Batch’s complexity from square to linear by gradually approximating our learning rate to . In stochastic gradient descent, we find that b-y using the learning rate , which conforms the updating rule: , can have a more stable upper bound of the time complexity, which won’t have a parameter γ.

Keywords

Learning rate; Gradient descent

Cite This Paper

David Woodruff , Youyu Zhu, Xiye Zhang. Simpler Algorithm in Gradient Descent. Frontiers in Educational Research (2020) Vol. 3 Issue 4: 59-62. https://doi.org/10.25236/FER.2020.030414.

References

[1] Y Nesterov (1998). Introductory lectures on convex programming volume i: Basic course, pp.75-76.
[2] Xiaoxia Wu, Rachel Ward, Leon Bottou (2018). WNGrad: Learn the Learning Rate in Gradient Descent, pp.89-90.