Deep Learning Series: Optimization Methods (1)


The loss functions defined in deep learning are almost always highly non-convex, and plain gradient descent (SGD) can easily get trapped in poor local optima. This series plans to cover the following methods (a minimal sketch of the baseline updates follows the list):

1. SGD

2. Momentum (On the Importance of Initialization and Momentum in Deep Learning)

3. Nesterov accelerated gradient

4. AdaGrad (Adaptive Subgradient Methods for Online Learning and Stochastic Optimization)

5. RMSProp (Generating Sequences with Recurrent Neural Networks)

6. Rprop (resilient backpropagation algorithm)

7. AdaDelta (ADADELTA: An Adaptive Learning Rate Method)

8. Adam (Adam: A Method for Stochastic Optimization)

9. AMSGrad (On the Convergence of Adam and Beyond)

10. AdaBound (Adaptive Gradient Methods with Dynamic Bound of Learning Rate)
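
Before diving into the individual posts, here is a minimal sketch of the two baseline updates the rest of the list builds on: plain SGD and SGD with classical momentum, applied to a toy quadratic loss. This is my own illustration, not code from the cited papers; the names `grad_fn`, `lr`, and `beta` are purely illustrative.

```python
import numpy as np

def grad_fn(w):
    # Toy quadratic loss L(w) = 0.5 * ||w||^2, whose gradient is simply w.
    return w

def sgd_step(w, lr=0.1):
    # Plain SGD: step against the gradient.
    return w - lr * grad_fn(w)

def momentum_step(w, v, lr=0.1, beta=0.9):
    # Classical momentum: accumulate a velocity, then step along it.
    v = beta * v + grad_fn(w)
    w = w - lr * v
    return w, v

w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for _ in range(200):
    w, v = momentum_step(w, v)
print(w)  # close to the minimizer at the origin
```

The later methods in the list (AdaGrad, RMSProp, Adam, and their variants) keep this same update skeleton but adapt the step size per parameter using statistics of past gradients; each will be covered in its own post.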