Single-node training


To perform single-node training in PaddlePaddle Fluid, first read Prepare Data and Set up Simple Model. After finishing Set up Simple Model, you will have two fluid.Program objects: startup_program and main_program. By default, fluid.default_startup_program() and fluid.default_main_program() return these global fluid.Program objects.

For example:

import paddle.fluid as fluid

image = fluid.layers.data(name="image", shape=[784])
label = fluid.layers.data(name="label", shape=[1], dtype="int64")
hidden = fluid.layers.fc(input=image, size=100, act='relu')
prediction = fluid.layers.fc(input=hidden, size=10, act='softmax')
loss = fluid.layers.mean(
    fluid.layers.cross_entropy(input=prediction, label=label))

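sgd = fluid.optimizer.SGD(learning_rate=0.001)
# minimize() appends the backward and parameter-update operators to the main program
sgd.minimize(loss)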

# At this point fluid.default_startup_program() and fluid.default_main_program()
# have been constructed.

Once the model is configured, fluid.default_startup_program() and fluid.default_main_program() are fully constructed.
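The startup program holds an initializer for every parameter the layers created. As a rough sanity check of how much it initializes, the plain-Python sketch below tallies the parameters of the two fc layers in the example. It assumes each fc layer creates one weight matrix of shape [in_features, size] plus one bias vector of shape [size] (the default when the bias is not disabled); the helper names are hypothetical and not part of Fluid.

```python
# Hypothetical helpers: tally the parameters created by the two fc layers above.
# Assumes each fc layer creates a weight of shape (in_features, size) and a
# bias of shape (size,).

def fc_param_shapes(in_features, size):
    """Return the (weight, bias) shapes created by one fully connected layer."""
    return (in_features, size), (size,)


def count_params(shapes):
    """Sum the number of scalar elements over a list of shapes."""
    total = 0
    for shape in shapes:
        n = 1
        for dim in shape:
            n *= dim
        total += n
    return total


# image (784 features) -> fc(size=100) -> fc(size=10)
layer_sizes = [784, 100, 10]
shapes = []
for in_features, size in zip(layer_sizes[:-1], layer_sizes[1:]):
    shapes.extend(fc_param_shapes(in_features, size))

print(shapes)                # [(784, 100), (100,), (100, 10), (10,)]
print(count_params(shapes))  # 79510
```

So running the startup program once initializes four parameter tensors holding 79,510 values in total for this model.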

Initialize Parameters

Random Initialization of Parameters

Once the model is configured, the parameter initialization operators are written into fluid.default_startup_program(). Running this program with fluid.Executor() randomly initializes the parameters in the global scope, fluid.global_scope(). For example:

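exe = fluid.Executor(fluid.CUDAPlace(0))
# Run the startup program once to initialize all parameters in fluid.global_scope()
exe.run(fluid.default_startup_program())
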
Note that in multi-GPU training, the parameters only need to be initialized on GPU 0; fluid.ParallelExecutor then distributes them to the other GPUs.

Load Predefined Parameters

In neural network training, a previously trained model is often loaded so that training can continue. For how to load predefined parameters, please refer to Save, Load Models or Variables & Incremental Learning.

Single-card Training

Single-card training is performed by calling run() on a fluid.Executor() to execute the training fluid.Program. At runtime, pass input data with run(feed=...) and retrieve persistable variables with run(fetch_list=...). For example:

loss = fluid.layers.mean(...)

exe = fluid.Executor(...)
# result is a list of numpy arrays, one per entry in fetch_list
result = exe.run(feed={"image": ..., "label": ...}, fetch_list=[loss])
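The feed dict in this example maps each data layer's name to a NumPy array. A minimal sketch of building one batch is shown below, assuming that fluid.layers.data prepends a batch dimension (its default), so shape=[784] expects arrays of shape (batch_size, 784), and that the label layer is declared with dtype int64, as cross_entropy expects for hard class labels. make_feed is a hypothetical helper, and the random values are placeholders for real training data.

```python
import numpy as np

# Hypothetical batch builder for the "image" and "label" data layers.
# Assumes a prepended batch dimension and int64 class-id labels.

def make_feed(batch_size, num_classes=10, seed=0):
    """Return a feed dict of random placeholder data with the expected shapes."""
    rng = np.random.default_rng(seed)
    image = rng.random((batch_size, 784), dtype=np.float32)
    label = rng.integers(0, num_classes, size=(batch_size, 1), dtype=np.int64)
    return {"image": image, "label": label}


feed = make_feed(batch_size=32)
print(feed["image"].shape, feed["image"].dtype)  # (32, 784) float32
print(feed["label"].shape, feed["label"].dtype)  # (32, 1) int64
# With an executor and loss defined as above, the dict would be passed as:
# result = exe.run(feed=feed, fetch_list=[loss])
```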


  1. For the data types supported by feed, please refer to the article Transfer Train Data to Executor.
  2. Executor.run returns a list containing the values of the variables in fetch_list=[...]. The fetched variables must be persistable. fetch_list accepts either a list of Variables or a list of variable names.
  3. If the fetched data contains sequence information, you can call exe.run(return_numpy=False, ...) to get fluid.LoDTensor objects directly, and then read the sequence information from them.
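To illustrate note 3: a LoDTensor stores the rows of all sequences in one flat tensor together with level-of-detail offsets. In Fluid 1.x, LoDTensor.lod() reports offset-based LoD such as [[0, 2, 5]], meaning two sequences covering rows 0 to 1 and rows 2 to 4. The sketch below splits a flat array into per-sequence arrays from such offsets. It uses a hand-built NumPy array and offsets list as a stand-in for a fetched LoDTensor converted to NumPy, and the helper names are hypothetical.

```python
import numpy as np


def split_by_lod(data, offsets):
    """Split the rows of `data` into sequences using one level of LoD offsets.

    Rows offsets[i] up to (but excluding) offsets[i + 1] belong to sequence i.
    """
    return [data[start:end] for start, end in zip(offsets[:-1], offsets[1:])]


def offsets_to_lengths(offsets):
    """Convert offset-based LoD such as [0, 2, 5] to sequence lengths [2, 3]."""
    return [end - start for start, end in zip(offsets[:-1], offsets[1:])]


# Stand-in for fetched data: 5 rows of width 2 belonging to 2 sequences.
data = np.arange(10, dtype=np.float32).reshape(5, 2)
lod = [[0, 2, 5]]  # same layout as an offset-based LoD with one level

sequences = split_by_lod(data, lod[0])
print(offsets_to_lengths(lod[0]))    # [2, 3]
print([s.shape for s in sequences])  # [(2, 2), (3, 2)]
```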

Multi-card Training

In multi-card training, you can use fluid.compiler.CompiledProgram to compile the fluid.Program and then call its with_data_parallel() method. For example:

exe = fluid.Executor(...)

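compiled_prog = fluid.compiler.CompiledProgram(
    fluid.default_main_program()).with_data_parallel(loss_name=loss.name)

result = exe.run(program=compiled_prog,
                 fetch_list=[loss.name],
                 feed={"image": ..., "label": ...})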


  1. The constructor of CompiledProgram takes the fluid.Program to be run, and this program cannot be modified at runtime.
  2. If exe is initialized with CUDAPlace, the model runs on GPU. In multi-GPU mode, all visible GPUs are used by default. Set CUDA_VISIBLE_DEVICES to choose which GPUs are used.
  3. If exe is initialized with CPUPlace, the model runs on CPU. In this case, multiple threads are used to run the model, and by default the number of threads equals the number of logical cores. Set CPU_NUM to change the number of threads.
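The notes above can be sketched in plain Python. The helpers below are hypothetical and are NOT Paddle APIs: they only mirror the documented behaviour (CUDA_VISIBLE_DEVICES selects the GPUs, CPU_NUM sets the CPU thread count, otherwise the logical core count is used) and illustrate, as an assumption, how data parallelism divides one global batch among the devices. Paddle's own rule for batches that do not divide evenly may differ.

```python
import os

# Hypothetical helpers illustrating notes 2 and 3; Paddle reads these
# environment variables itself, so this is only a model of the behaviour.


def visible_gpu_count(env, total_gpus):
    """GPUs used: the ids listed in CUDA_VISIBLE_DEVICES, else all GPUs."""
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return total_gpus
    return len([d for d in value.split(",") if d.strip()])


def cpu_thread_count(env):
    """CPU threads: CPU_NUM if set, else the number of logical cores."""
    value = env.get("CPU_NUM")
    return int(value) if value else (os.cpu_count() or 1)


def split_batch(batch_size, num_devices):
    """Per-device shard sizes when a batch is divided as evenly as possible."""
    base, extra = divmod(batch_size, num_devices)
    return [base + (1 if i < extra else 0) for i in range(num_devices)]


# Example: restrict training to GPUs 0 and 1, or to 4 CPU threads.
env = {"CUDA_VISIBLE_DEVICES": "0,1", "CPU_NUM": "4"}
print(visible_gpu_count(env, total_gpus=8))  # 2
print(cpu_thread_count(env))                 # 4
print(split_batch(64, 4))                    # [16, 16, 16, 16]
```

In practice the variables are set in the shell before training starts, for example `CUDA_VISIBLE_DEVICES=0,1 python train.py` or `CPU_NUM=4 python train.py`, where train.py stands for your training script.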