Model Parameters

Model parameters are the weights and biases of a model. In Fluid, they are instances of the fluid.Parameter class, which inherits from fluid.Variable, and they are all persistable variables. Model training is the process of learning and updating these parameters. The attributes of model parameters can be configured through ParamAttr . The configurable items are as follows:

  • Initialization method
  • Regularization
  • Gradient clipping
  • Model averaging

Initialization method

Fluid initializes a single parameter by setting the initializer attribute of ParamAttr .

Example:

param_attrs = fluid.ParamAttr(name="fc_weight",
                          initializer=fluid.initializer.ConstantInitializer(1.0))
y_predict = fluid.layers.fc(input=x, size=10, param_attr=param_attrs)

The following initialization methods are supported by fluid:

1. BilinearInitializer

Bilinear initialization. A deconvolution (transposed convolution) operation initialized by this method performs bilinear interpolation, so it can be used as a linear interpolation (upsampling) operation.

Alias: Bilinear

API reference: BilinearInitializer
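
To see why this initialization yields bilinear interpolation, here is a minimal sketch of the 1-D bilinear kernel the initializer lays out along each spatial axis. This is plain Python illustrating the math, not fluid's actual implementation:

```python
def bilinear_kernel_1d(size):
    """Build the 1-D bilinear interpolation kernel of the given size.

    A transposed convolution whose weights follow this triangular
    pattern (along each spatial axis) performs bilinear upsampling.
    Illustrative sketch only, not fluid's internal code.
    """
    factor = (size + 1) // 2
    # kernel center: integer position for odd sizes, half-offset for even
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    return [1 - abs(i - center) / factor for i in range(size)]

# kernel for 2x upsampling (size 4)
print(bilinear_kernel_1d(4))  # → [0.25, 0.75, 0.75, 0.25]
```

The weights fall off linearly with distance from the kernel center, which is exactly the weighting bilinear interpolation applies to neighboring input values.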

2. ConstantInitializer

Constant initialization. Initialize the parameter to the specified value.

Alias:Constant

API reference: ConstantInitializer

3. MSRAInitializer

MSRA (He) initialization. Please refer to https://arxiv.org/abs/1502.01852 for details.

Alias: MSRA

API reference: MSRAInitializer
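
The key idea of the paper above is to scale the weights by the layer's fan-in so that activation variance is preserved through ReLU layers. A sketch of the commonly used scale formulas (illustrative only; fluid's MSRAInitializer exposes both a uniform and a normal mode):

```python
import math

def msra_normal_std(fan_in):
    # standard deviation for the normal variant: sqrt(2 / fan_in)
    return math.sqrt(2.0 / fan_in)

def msra_uniform_limit(fan_in):
    # half-width of the uniform variant's range: sqrt(6 / fan_in)
    return math.sqrt(6.0 / fan_in)

print(msra_normal_std(50))  # → 0.2
```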

4. NormalInitializer

Initialization with a random Gaussian (normal) distribution.

Alias: Normal

API reference: NormalInitializer

5. TruncatedNormalInitializer

Initialization with a random truncated Gaussian distribution.

Alias: TruncatedNormal

API reference: TruncatedNormalInitializer
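
A truncated normal draws from a Gaussian but discards samples far from the mean, so no weight starts at an extreme value. A minimal sketch of the idea using rejection sampling with the common two-standard-deviation cutoff (illustrative only, not fluid's implementation):

```python
import random

def truncated_normal(mean=0.0, std=1.0, n=1000, seed=0):
    """Sample from a Gaussian, rejecting anything more than two
    standard deviations from the mean (a common truncation rule)."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        x = rng.gauss(mean, std)
        if abs(x - mean) <= 2 * std:
            out.append(x)
    return out

samples = truncated_normal(std=0.1, n=500)
print(max(abs(s) for s in samples))  # always <= 0.2
```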

6. UniformInitializer

Initialization with a random uniform distribution.

Alias: Uniform

API reference: UniformInitializer

7. XavierInitializer

Xavier (Glorot) initialization. Please refer to http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf for details.

Alias: Xavier

API reference: XavierInitializer
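
Xavier initialization scales the weights by both fan-in and fan-out so that activation variance stays roughly constant in both the forward and backward pass. A sketch of the two common variants (illustrative math only, not fluid's implementation):

```python
import math

def xavier_uniform_limit(fan_in, fan_out):
    # half-width of the uniform variant's range: sqrt(6 / (fan_in + fan_out))
    return math.sqrt(6.0 / (fan_in + fan_out))

def xavier_normal_std(fan_in, fan_out):
    # standard deviation of the normal variant: sqrt(2 / (fan_in + fan_out))
    return math.sqrt(2.0 / (fan_in + fan_out))

print(xavier_uniform_limit(100, 50))  # ≈ 0.2
```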

Regularization

Fluid regularizes a single parameter by setting the regularizer attribute of ParamAttr .

param_attrs = fluid.ParamAttr(name="fc_weight",
                          regularizer=fluid.regularizer.L1DecayRegularizer(0.1))
y_predict = fluid.layers.fc(input=x, size=10, param_attr=param_attrs)

The following regularization approaches are supported by fluid:

1. L1DecayRegularizer

L1 weight decay. Penalizes the absolute value of the parameters and encourages sparse weights.

Alias: L1Decay

API reference: L1DecayRegularizer

2. L2DecayRegularizer

L2 weight decay. Penalizes the squared magnitude of the parameters.

Alias: L2Decay

API reference: L2DecayRegularizer
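
In effect, a decay regularizer adds a penalty term to each regularized parameter's gradient before the optimizer step. A sketch of one common convention for the two decay terms, in plain Python (illustrative only, not fluid's implementation):

```python
def l1_decay_grad(w, coeff):
    # L1 decay contributes coeff * sign(w_i) to each element's gradient
    return [coeff * ((w_i > 0) - (w_i < 0)) for w_i in w]

def l2_decay_grad(w, coeff):
    # L2 decay contributes coeff * w_i to each element's gradient
    return [coeff * w_i for w_i in w]

print(l1_decay_grad([0.5, -2.0, 0.0], 0.1))  # → [0.1, -0.1, 0.0]
print(l2_decay_grad([0.5, -2.0, 0.0], 0.1))
```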

Clipping

Fluid sets the clipping method for a single parameter by setting the gradient_clip attribute of ParamAttr .

param_attrs = fluid.ParamAttr(name="fc_weight",
                          gradient_clip=fluid.clip.GradientClipByValue(1.0))
y_predict = fluid.layers.fc(input=x, size=10, param_attr=param_attrs)

The following clipping methods are supported by fluid:

1. ErrorClipByValue

Used to clip the values of a tensor to a specified range.

API reference: ErrorClipByValue

2. GradientClipByGlobalNorm

Used to limit the global L2-norm of multiple Tensors to clip_norm.

API reference: GradientClipByGlobalNorm
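
Global-norm clipping treats all the gradients as one long vector: if their joint L2-norm exceeds clip_norm, every gradient is scaled down by the same factor. A minimal sketch on plain Python lists (illustrative only, not fluid's implementation):

```python
import math

def clip_by_global_norm(tensors, clip_norm):
    """Scale a group of gradient tensors so their joint (global)
    L2-norm does not exceed clip_norm."""
    global_norm = math.sqrt(sum(v * v for t in tensors for v in t))
    if global_norm <= clip_norm:
        return tensors
    scale = clip_norm / global_norm
    return [[v * scale for v in t] for t in tensors]

grads = [[3.0, 4.0], [0.0, 12.0]]   # global norm = sqrt(9+16+144) = 13
print(clip_by_global_norm(grads, 1.0))
```

Because every tensor is multiplied by the same scale, the relative direction of the overall gradient is preserved; only its magnitude shrinks.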

3. GradientClipByNorm

Limit the L2-norm of a Tensor to max_norm . If the Tensor's L2-norm exceeds max_norm , a scale factor is computed, and all values of the Tensor are multiplied by it.

API reference: GradientClipByNorm
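
The scale rule described above can be sketched in a few lines of plain Python (illustrative only, not fluid's implementation):

```python
import math

def clip_by_norm(grad, max_norm):
    # If the tensor's L2-norm exceeds max_norm, every element is
    # multiplied by scale = max_norm / l2_norm; otherwise the
    # gradient passes through unchanged.
    l2 = math.sqrt(sum(v * v for v in grad))
    if l2 <= max_norm:
        return grad
    scale = max_norm / l2
    return [v * scale for v in grad]

print(clip_by_norm([3.0, 4.0], 1.0))  # ≈ [0.6, 0.8]
```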

4. GradientClipByValue

Limit the value of the gradient on a parameter to [min, max].

API reference: GradientClipByValue

Model Averaging

Fluid determines whether to average a single parameter by setting the do_model_average attribute of ParamAttr . Example:

param_attrs = fluid.ParamAttr(name="fc_weight",
                          do_model_average=True)
y_predict = fluid.layers.fc(input=x, size=10, param_attr=param_attrs)

In mini-batch training, parameters are updated once after each batch, and model averaging averages the parameters produced by the latest K updates.

The averaged parameters are only used for testing and prediction, and they do not get involved in the actual training process.

API reference: ModelAverage
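
The idea of a sliding-window average over the latest K updates can be sketched in plain Python (an illustration of the concept only, not fluid's ModelAverage implementation):

```python
from collections import deque

class ModelAverageSketch:
    """Keep the last K parameter snapshots and average them for
    evaluation, while training continues on the live parameters."""

    def __init__(self, k):
        self.window = deque(maxlen=k)  # oldest snapshot drops out automatically

    def record(self, params):
        # call after each mini-batch update with the current parameter values
        self.window.append(list(params))

    def averaged(self):
        # elementwise mean over the snapshots in the window;
        # used for testing/prediction only, never for the training step
        n = len(self.window)
        return [sum(col) / n for col in zip(*self.window)]

avg = ModelAverageSketch(k=3)
for step_params in ([1.0], [2.0], [3.0], [4.0]):
    avg.record(step_params)
print(avg.averaged())  # → [3.0], the mean of the last 3 updates
```

Note how the live parameters keep moving with every update; only the evaluation-time copy is replaced by the window average.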