# Optimizer

Training a neural network is, in essence, an optimization problem. With forward computation and back-propagation, an optimizer uses the back-propagated gradients to optimize the parameters of the neural network.

## 1.SGD/SGDOptimizer

SGD is a subclass of Optimizer implementing Stochastic Gradient Descent, a variant of Gradient Descent. When a large number of samples need to be trained, SGD is usually chosen because it makes the loss function converge more quickly.

API Reference: SGDOptimizer
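As a minimal illustrative sketch (plain Python, not Paddle's implementation), the SGD update rule applied to the toy objective f(w) = (w - 3)^2:

```python
# Minimal sketch of the SGD update rule (illustrative only, not Paddle's code).

def sgd_step(w, grad, lr=0.1):
    """One SGD update: w <- w - lr * grad."""
    return w - lr * grad

# Toy objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w = 0.0
for _ in range(100):
    w = sgd_step(w, 2.0 * (w - 3.0))
# w has converged close to the minimizer at 3.0
```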

## 2.Momentum/MomentumOptimizer

The Momentum optimizer adds momentum on top of SGD, reducing the noise inherent in stochastic gradient descent. You can set use_nesterov to False or True, corresponding respectively to the traditional Momentum algorithm (Section 4.1 of the paper) and the Nesterov accelerated gradient algorithm (Section 4.2 of the paper).

API Reference: MomentumOptimizer
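A sketch of the two variants in plain Python (the common formulation, not Paddle's implementation; the use_nesterov flag corresponds to the branch below):

```python
# Sketch of the momentum update; illustrative only, not Paddle's code.

def momentum_step(w, v, grad, lr=0.1, mu=0.9, use_nesterov=False):
    v = mu * v + grad                    # accumulate a velocity from gradients
    if use_nesterov:
        w = w - lr * (grad + mu * v)     # Nesterov look-ahead correction
    else:
        w = w - lr * v                   # classical (heavy-ball) momentum
    return w, v

# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3).
w, v = 0.0, 0.0
for _ in range(200):
    w, v = momentum_step(w, v, 2.0 * (w - 3.0))
```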

## 3.Adagrad/AdagradOptimizer

The Adagrad optimizer adaptively assigns a different learning rate to each parameter, addressing the problem that different parameters are updated with different sample frequencies.

API Reference: AdagradOptimizer
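A sketch of the Adagrad rule in plain Python (illustrative only; the toy quadratic objective is an assumption for demonstration): each parameter's learning rate is divided by the root of its own accumulated squared gradients.

```python
# Sketch of the Adagrad rule; illustrative only, not Paddle's code.

def adagrad_step(w, acc, grad, lr=0.5, eps=1e-6):
    acc = acc + grad * grad                   # lifetime sum of squared gradients
    w = w - lr * grad / (acc ** 0.5 + eps)    # per-parameter effective rate
    return w, acc

# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3).
w, acc = 0.0, 0.0
for _ in range(500):
    w, acc = adagrad_step(w, acc, 2.0 * (w - 3.0))
```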

## 4.RMSPropOptimizer

The RMSProp optimizer adaptively adjusts the learning rate. It mainly solves the problem of the learning rate decreasing too sharply in the middle and late stages of training when Adagrad is used.

API Reference: RMSPropOptimizer
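A sketch of the RMSProp rule in plain Python (illustrative only, not Paddle's implementation): unlike Adagrad's ever-growing sum, a decaying moving average of squared gradients keeps the effective rate from vanishing.

```python
# Sketch of the RMSProp rule; illustrative only, not Paddle's code.

def rmsprop_step(w, ms, grad, lr=0.1, rho=0.9, eps=1e-6):
    ms = rho * ms + (1.0 - rho) * grad * grad   # decaying mean of squared grads
    w = w - lr * grad / (ms + eps) ** 0.5
    return w, ms

# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3).
w, ms = 0.0, 0.0
for _ in range(300):
    w, ms = rmsprop_step(w, ms, 2.0 * (w - 3.0))
```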

## 5.Adam/AdamOptimizer

The Adam optimizer adaptively adjusts the learning rate and is suitable for most non-convex optimization problems, as well as for large datasets and high-dimensional scenarios. Adam is the most commonly used optimization algorithm.

API Reference: AdamOptimizer
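A sketch of the Adam rule in plain Python (illustrative only; the toy quadratic objective is an assumption): bias-corrected estimates of the gradient mean (m) and uncentered variance (v) scale each update.

```python
# Sketch of the Adam rule; illustrative only, not Paddle's code.

def adam_step(w, m, v, grad, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1.0 - b1) * grad           # first-moment estimate
    v = b2 * v + (1.0 - b2) * grad * grad    # second-moment estimate
    m_hat = m / (1.0 - b1 ** t)              # bias corrections (t starts at 1)
    v_hat = v / (1.0 - b2 ** t)
    return w - lr * m_hat / (v_hat ** 0.5 + eps), m, v

# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 501):
    w, m, v = adam_step(w, m, v, 2.0 * (w - 3.0), t)
```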

## 6.Adamax/AdamaxOptimizer

Adamax is a variant of the Adam algorithm that simplifies the bound on the learning rate, in particular its upper bound.

API Reference: AdamaxOptimizer
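A sketch of the Adamax rule in plain Python (illustrative only, not Paddle's implementation): Adam's second-moment estimate is replaced by an infinity-norm accumulator u, which yields a simpler bound on the step size.

```python
# Sketch of the Adamax rule; illustrative only, not Paddle's code.

def adamax_step(w, m, u, grad, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1.0 - b1) * grad      # first-moment estimate, as in Adam
    u = max(b2 * u, abs(grad))          # infinity-norm of past gradients
    return w - (lr / (1.0 - b1 ** t)) * m / (u + eps), m, u

# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3).
w, m, u = 0.0, 0.0, 0.0
for t in range(1, 501):
    w, m, u = adamax_step(w, m, u, 2.0 * (w - 3.0), t)
```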

## 7.DecayedAdagrad/DecayedAdagradOptimizer

The DecayedAdagrad optimizer can be regarded as an Adagrad algorithm with a decay rate incorporated, which solves the problem of the learning rate dropping sharply in the middle and late stages of training.

API Reference: DecayedAdagradOptimizer
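A sketch of Adagrad with a decay rate in plain Python (illustrative only, not Paddle's implementation): the squared-gradient accumulator decays each step instead of growing without bound.

```python
# Sketch of a decayed Adagrad rule; illustrative only, not Paddle's code.

def decayed_adagrad_step(w, acc, grad, lr=0.1, decay=0.95, eps=1e-6):
    acc = decay * acc + (1.0 - decay) * grad * grad   # decayed accumulation
    w = w - lr * grad / (acc + eps) ** 0.5
    return w, acc

# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3).
w, acc = 0.0, 0.0
for _ in range(300):
    w, acc = decayed_adagrad_step(w, acc, 2.0 * (w - 3.0))
```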

## 8.ModelAverage

The ModelAverage optimizer accumulates a sliding-window history of parameters during training. At inference time, the averaged parameters are used to improve overall inference accuracy.

API Reference: ModelAverage
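A sketch of the sliding-window averaging idea in plain Python (illustrative only; Paddle's window bookkeeping differs, and the SGD trajectory on a toy quadratic is an assumption for demonstration):

```python
# Sketch of sliding-window parameter averaging; illustrative only.
from collections import deque

def train_with_average(steps=50, window=10, lr=0.1):
    w = 0.0
    history = deque(maxlen=window)       # sliding window of recent parameters
    for _ in range(steps):
        w = w - lr * 2.0 * (w - 3.0)     # plain SGD on the toy f(w) = (w - 3)^2
        history.append(w)
    avg_w = sum(history) / len(history)  # averaged parameters for inference
    return w, avg_w
```

The averaged parameters smooth out fluctuations in the final iterates, which is the source of the accuracy gain at inference time.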