fluid

BuildStrategy

class paddle.fluid.BuildStrategy

BuildStrategy allows the user to control more precisely how the SSA Graph is built in ParallelExecutor by setting its properties.

Examples

import paddle.fluid as fluid
build_strategy = fluid.BuildStrategy()
build_strategy.reduce_strategy = fluid.BuildStrategy.ReduceStrategy.Reduce
debug_graphviz_path

The type is STR. debug_graphviz_path indicates the path to which the SSA Graph is written as a file in graphviz format. It is useful for debugging. Default “”.

Examples

import paddle.fluid as fluid
build_strategy = fluid.BuildStrategy()
build_strategy.debug_graphviz_path = ""
enable_sequential_execution

The type is BOOL. If set True, the execution order of ops would be the same as what is in the program. Default False.

Examples

import paddle.fluid as fluid
build_strategy = fluid.BuildStrategy()
build_strategy.enable_sequential_execution = True
fuse_broadcast_ops

The type is BOOL. fuse_broadcast_ops indicates whether to fuse the broadcast ops. Note that in Reduce mode, fusing broadcast ops may make the program faster, because fusing the broadcast ops is equivalent to delaying the execution of all broadcast ops; in this case, all NCCL streams are used only for NCCLReduce operations for a period of time. Default False.

fuse_elewise_add_act_ops

The type is BOOL. fuse_elewise_add_act_ops indicates whether to fuse elementwise_add_op and activation_op; doing so may make the execution faster. Default False.

Examples

import paddle.fluid as fluid
build_strategy = fluid.BuildStrategy()
build_strategy.fuse_elewise_add_act_ops = True
fuse_relu_depthwise_conv

The type is BOOL. fuse_relu_depthwise_conv indicates whether to fuse relu and depthwise_conv2d; doing so saves GPU memory and may make the execution faster. This option is only available on GPU devices. Default False.

Examples

import paddle.fluid as fluid
build_strategy = fluid.BuildStrategy()
build_strategy.fuse_relu_depthwise_conv = True
gradient_scale_strategy

The type is STR. There are three ways of defining \(loss@grad\) in ParallelExecutor: ‘CoeffNumDevice’, ‘One’ and ‘Customized’. By default, ParallelExecutor sets \(loss@grad\) according to the number of devices. If you want to customize \(loss@grad\), choose ‘Customized’. Default ‘CoeffNumDevice’.

Examples

import paddle.fluid as fluid
build_strategy = fluid.BuildStrategy()
build_strategy.gradient_scale_strategy = fluid.BuildStrategy.GradientScaleStrategy.One
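
As a rough illustration of the three strategies, the sketch below computes the initial loss@grad value that each one would give a single device. The 1/num_devices coefficient for CoeffNumDevice is an assumption inferred from the strategy name and the description above, and initial_loss_grad is a hypothetical helper, not part of the fluid API.

```python
def initial_loss_grad(strategy, num_devices, customized_value=None):
    """Return the loss@grad seed for one device under the given strategy."""
    if strategy == "CoeffNumDevice":
        # Assumption: each device is scaled by 1/N so that the summed
        # gradient behaves like an average over devices.
        return 1.0 / num_devices
    if strategy == "One":
        # Every device starts from a gradient of exactly 1.
        return 1.0
    if strategy == "Customized":
        # The user is responsible for supplying the value.
        if customized_value is None:
            raise ValueError("Customized requires a user-supplied loss@grad")
        return customized_value
    raise ValueError("unknown strategy: {}".format(strategy))


print(initial_loss_grad("CoeffNumDevice", num_devices=4))
```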
memory_optimize

The type is BOOL. Memory optimization aims to reduce total memory consumption; set it to True to enable it.

Memory optimization is an experimental feature: some variables may be reused or removed by the optimization strategy. If you need to fetch the values of some variables while using this feature, set the persistable property of those variables to True.

Default False.

reduce_strategy

The type is STR. There are two reduce strategies in ParallelExecutor: ‘AllReduce’ and ‘Reduce’. If you want every parameter to be optimized on all devices independently, choose ‘AllReduce’. If you choose ‘Reduce’, the optimization of the parameters is distributed evenly across the devices, and each optimized parameter is then broadcast to the other devices. For some models, Reduce is faster. Default ‘AllReduce’.

Examples

import paddle.fluid as fluid
build_strategy = fluid.BuildStrategy()
build_strategy.reduce_strategy = fluid.BuildStrategy.ReduceStrategy.Reduce
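
To make the difference between the two strategies concrete, the sketch below maps each device to the parameters it would optimize. It is only an illustration: the round-robin assignment used for ‘Reduce’ is an assumption standing in for "evenly distributed", and optimizer_placement is a hypothetical helper rather than anything exposed by fluid.

```python
def optimizer_placement(param_names, num_devices, strategy):
    """Map each device id to the list of parameters it optimizes."""
    if strategy == "AllReduce":
        # Every device optimizes every parameter independently.
        return {dev: list(param_names) for dev in range(num_devices)}
    if strategy == "Reduce":
        # Each parameter is optimized on one device only (round-robin here,
        # which is an assumption) and would then be broadcast to the others.
        placement = {dev: [] for dev in range(num_devices)}
        for index, name in enumerate(param_names):
            placement[index % num_devices].append(name)
        return placement
    raise ValueError("unknown strategy: {}".format(strategy))


params = ["fc_0.w", "fc_0.b", "fc_1.w", "fc_1.b"]
print(optimizer_placement(params, 2, "Reduce"))
```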
remove_unnecessary_lock

The type is BOOL. If set True, some locks in GPU ops would be released and ParallelExecutor would run faster. Default True.

Examples

import paddle.fluid as fluid
build_strategy = fluid.BuildStrategy()
build_strategy.remove_unnecessary_lock = True
sync_batch_norm

The type is BOOL. sync_batch_norm indicates whether to use synchronous batch normalization, which synchronizes the mean and variance across multiple devices during the training phase.

The current implementation does not support FP16 training or CPU, and it synchronizes only within one machine, not across machines.

Default False.

Examples

import paddle.fluid as fluid
build_strategy = fluid.BuildStrategy()
build_strategy.sync_batch_norm = True

CompiledProgram

class paddle.fluid.CompiledProgram(program_or_graph)[source]

Compiles to Graph for execution.

  1. Users first create the program with layers.
  2. Optionally, users use CompiledProgram to optimize the program before run.
  3. The original program or CompiledProgram is run by executor.

The CompiledProgram is used to transform a program for various optimizations, for example:

  • Pre-compute some logic once so that each run is faster.
  • Transform the program so that it can run on multiple devices.
  • Transform the program for optimized inference or distributed training. Note that this part is not finished.

Example

import paddle.fluid as fluid
import paddle.fluid.compiler as compiler
import numpy
import os

place = fluid.CUDAPlace(0) # fluid.CPUPlace()
exe = fluid.Executor(place)

data = fluid.layers.data(name='X', shape=[1], dtype='float32')
hidden = fluid.layers.fc(input=data, size=10)
loss = fluid.layers.mean(hidden)
fluid.optimizer.SGD(learning_rate=0.01).minimize(loss)

fluid.default_startup_program().random_seed=1
exe.run(fluid.default_startup_program())
compiled_prog = compiler.CompiledProgram(
         fluid.default_main_program())

x = numpy.random.random(size=(10, 1)).astype('float32')
loss_data, = exe.run(compiled_prog,
                     feed={"X": x},
                     fetch_list=[loss.name])
Parameters:program_or_graph (Graph|Program) – If it’s Program, it will be first lowered to a graph for further optimizations. If it’s a graph (potentially optimized before), it will be directly used for further optimizations. Note: graph is only supported when compiled with with_data_parallel option.
with_data_parallel(loss_name=None, build_strategy=None, exec_strategy=None, share_vars_from=None, places=None)[source]

Configures the program to run in a data-parallel way.

Example

import paddle.fluid as fluid
import paddle.fluid.compiler as compiler
import numpy
import os

use_cuda = True
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()

# NOTE: If you run the program on CPU, you need to
# specify CPU_NUM; otherwise, fluid will use the
# number of logical cores as CPU_NUM. In that case,
# the batch size of the input should be greater than
# CPU_NUM; if not, the process will fail with an
# exception.
if not use_cuda:
    os.environ['CPU_NUM'] = str(2)

exe = fluid.Executor(place)

data = fluid.layers.data(name='X', shape=[1], dtype='float32')
hidden = fluid.layers.fc(input=data, size=10)
loss = fluid.layers.mean(hidden)
fluid.optimizer.SGD(learning_rate=0.01).minimize(loss)

fluid.default_startup_program().random_seed=1
exe.run(fluid.default_startup_program())
compiled_prog = compiler.CompiledProgram(
         fluid.default_main_program()).with_data_parallel(
                  loss_name=loss.name)

x = numpy.random.random(size=(10, 1)).astype('float32')
loss_data, = exe.run(compiled_prog,
                     feed={"X": x},
                     fetch_list=[loss.name])
Parameters:
  • loss_name (str) – The name of the loss variable; it must be set in training. Default None.
  • build_strategy (BuildStrategy) – build_strategy is used to build the graph so it can run on multiple devices/cores with optimized topology. For more information, please refer to fluid.BuildStrategy. Default None.
  • exec_strategy (ExecutionStrategy) – exec_strategy is used to select a way to execute the graph, for example how many threads are used and how many iterations pass before the temp variables are cleaned up. For more information, please refer to fluid.ExecutionStrategy. Default None.
  • share_vars_from (CompiledProgram) – If provided, this CompiledProgram will share variables from share_vars_from. share_vars_from must be run by the executor before this CompiledProgram so that vars are ready.
  • places (list(CUDAPlace)|list(CPUPlace)|None) – If provided, the program is compiled only for the given places. Otherwise, the places used at compile time are determined by the Executor and controlled by environment variables: FLAGS_selected_gpus or CUDA_VISIBLE_DEVICES if using GPU, or CPU_NUM if using CPU. For example, if you want to run on GPU 0 and 1, set places=[fluid.CUDAPlace(0), fluid.CUDAPlace(1)]. If you want to run on 2 CPU cores, set places=[fluid.CPUPlace()]*2.
Returns:

self
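
The CPU_NUM note in the example above says that the input batch must be larger than the number of devices. The sketch below shows one plausible reason: in data-parallel execution every device needs its own share of the batch. The contiguous, near-even split is an assumption for illustration only, and split_batch is a hypothetical helper, not how fluid is known to divide data internally.

```python
def split_batch(batch, num_devices):
    """Split a batch (a list of samples) into one contiguous chunk per device."""
    if len(batch) < num_devices:
        # Some device would receive no data at all.
        raise ValueError("batch size {} is smaller than device count {}".format(
            len(batch), num_devices))
    base, extra = divmod(len(batch), num_devices)
    chunks, start = [], 0
    for dev in range(num_devices):
        # The first `extra` devices take one additional sample.
        size = base + (1 if dev < extra else 0)
        chunks.append(batch[start:start + size])
        start += size
    return chunks


print(split_batch(list(range(10)), 2))
```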

with_inference_optimize(config)[source]

Adds inference optimizations.

Parameters:config – instance of NativeConfig or AnalysisConfig to create predictor
Returns:self

cpu_places

paddle.fluid.cpu_places(device_count=None)[source]

Create a list of fluid.CPUPlace objects.

If device_count is None, the device count would be determined by environment variable CPU_NUM. If CPU_NUM is not set, the default value is 1, i.e. CPU_NUM=1.

Parameters:device_count (None|int) – device number.
Returns:cpu place list.
Return type:out (list(fluid.CPUPlace))
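
The fallback rule described above can be sketched in plain Python. This is only an illustration of the documented behavior; resolve_cpu_device_count and its environ argument are hypothetical names introduced here so the logic can be tried without touching the real environment.

```python
import os


def resolve_cpu_device_count(device_count=None, environ=None):
    """Return how many CPUPlace objects a cpu_places-like call would create."""
    if environ is None:
        environ = os.environ
    if device_count is None:
        # Fall back to CPU_NUM, and to 1 when CPU_NUM is not set.
        device_count = int(environ.get("CPU_NUM", 1))
    return device_count


print(resolve_cpu_device_count(environ={"CPU_NUM": "4"}))
```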

Examples

import paddle.fluid as fluid
cpu_places = fluid.cpu_places()

CPUPlace

class paddle.fluid.CPUPlace

CPUPlace is a descriptor of a device. It represents a CPU, and the memory of CPUPlace can be accessed by the CPU.

Examples

import paddle.fluid as fluid
cpu_place = fluid.CPUPlace()

create_lod_tensor

paddle.fluid.create_lod_tensor(data, recursive_seq_lens, place)[source]

Create a lod tensor from a numpy array, a list, or an existing lod tensor.

Create a lod tensor by doing the following:

  1. Check that the length-based level of detail (LoD), also known as recursive_sequence_lengths, of the input is valid.
  2. Convert recursive_sequence_lengths to an offset-based LoD.
  3. Copy the data from a numpy array, a list or an existing lod tensor to the CPU or GPU device (based on the input place).
  4. Set the level of detail (LoD) using the offset-based LoD.

Examples

Suppose we want a LoDTensor to hold data for sequences of words, where each word is represented by an integer, and we want it to represent two sentences, one of 2 words and one of 3 words.

Then data can be a numpy array of integers with shape (5, 1). recursive_seq_lens will be [[2, 3]], indicating the length (number of words) of each sentence. This length-based recursive_seq_lens [[2, 3]] will be converted to the offset-based LoD [[0, 2, 5]] inside the function call.

import paddle.fluid as fluid
import numpy as np

t = fluid.create_lod_tensor(np.ndarray([5, 30]), [[2, 3]], fluid.CPUPlace())

Please refer to api_guide_low_level_lod_tensor for more details regarding LoD.
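
The validity check and the length-to-offset conversion listed in the steps above can be sketched in plain Python. Both functions are hypothetical helpers written for illustration; in particular, the validity rule (each level sums to the number of entries in the next level, and the last level sums to the number of data rows) is an assumption about what "valid" means here.

```python
def lengths_to_offsets(recursive_seq_lens):
    """Convert a length-based LoD to an offset-based LoD, level by level."""
    offsets = []
    for level in recursive_seq_lens:
        level_offsets = [0]
        for length in level:
            # Each offset is the running total of the preceding lengths.
            level_offsets.append(level_offsets[-1] + length)
        offsets.append(level_offsets)
    return offsets


def lengths_are_valid(recursive_seq_lens, num_rows):
    """Check the assumed consistency rule between levels and the data rows."""
    for i, level in enumerate(recursive_seq_lens):
        if not level or any(n < 0 for n in level):
            return False
        is_last = i == len(recursive_seq_lens) - 1
        expected = num_rows if is_last else len(recursive_seq_lens[i + 1])
        if sum(level) != expected:
            return False
    return True


print(lengths_to_offsets([[2, 3]]))  # [[0, 2, 5]]
```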

Parameters:
  • data (numpy.ndarray|list|LoDTensor) – a numpy array or a LoDTensor or a list holding the data to be copied.
  • recursive_seq_lens (list) – a list of lists indicating the length-based level of detail info specified by the user.
  • place (Place) – CPU or GPU place indicating where the data in the new LoDTensor will be stored.
Returns:

A fluid LoDTensor object with tensor data and recursive_seq_lens info.

create_random_int_lodtensor

paddle.fluid.create_random_int_lodtensor(recursive_seq_lens, base_shape, place, low, high)[source]

Create a LoDTensor containing random integers.

This function is frequently used in the book examples, so we revised it based on the new create_lod_tensor API and put it here in the lod_tensor module to simplify the code.

The function does the following:

  1. Calculate the overall shape of the LoDTensor based on the length-based recursive_seq_lens input and the shape of the basic element in base_shape.
  2. Create a numpy array of this shape.
  3. Create the LoDTensor using create_lod_tensor API.

Suppose we want a LoDTensor to hold data for sequences of words, where each word is represented by an integer, and we want it to represent two sentences, one of 2 words and one of 3 words. Then ‘base_shape’ is [1] and the input length-based ‘recursive_seq_lens’ is [[2, 3]]. The overall shape of the LoDTensor would then be [5, 1], holding 5 words for two sentences.
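
The shape calculation in this example can be sketched with the standard library alone. The helpers below are hypothetical and use nested lists instead of numpy arrays; treating high as an inclusive upper bound is an assumption, since the text above does not say whether the bound is inclusive.

```python
import random


def overall_shape(recursive_seq_lens, base_shape):
    """Total number of basic elements (sum of the last level) plus base_shape."""
    return [sum(recursive_seq_lens[-1])] + list(base_shape)


def random_int_data(shape, low, high, rng):
    """Nested lists of random integers in [low, high] with the given shape."""
    if len(shape) == 1:
        return [rng.randint(low, high) for _ in range(shape[0])]
    return [random_int_data(shape[1:], low, high, rng) for _ in range(shape[0])]


shape = overall_shape([[2, 3]], [1])
print(shape)  # [5, 1]
data = random_int_data(shape, low=0, high=10, rng=random.Random(0))
```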

Parameters:
  • recursive_seq_lens (list) – a list of lists indicating the length-based level of detail info specified by the user.
  • base_shape (list) – the shape of the basic element to be held by the LoDTensor.
  • place (Place) – CPU or GPU place indicating where the data in the new LoDTensor will be stored.
  • low (int) – the lower bound of the random integers.
  • high (int) – the upper bound of the random integers.
Returns:

A fluid LoDTensor object with tensor data and recursive_seq_lens info.

Examples

import paddle.fluid as fluid

t = fluid.create_random_int_lodtensor(recursive_seq_lens=[[2, 3]],
      base_shape=[30], place=fluid.CPUPlace(), low=0, high=10)

cuda_pinned_places

paddle.fluid.cuda_pinned_places(device_count=None)[source]

Create a list of fluid.CUDAPinnedPlace objects.

If device_count is None, the device count would be determined by environment variable CPU_NUM. If CPU_NUM is not set, the device count would be determined by multiprocessing.cpu_count().

Parameters:device_count (None|int) – device number.
Returns:cuda pinned place list.
Return type:out (list(fluid.CUDAPinnedPlace))

Examples

import paddle.fluid as fluid
cuda_pinned_places_cpu_num = fluid.cuda_pinned_places()
# or
cuda_pinned_places = fluid.cuda_pinned_places(1)

cuda_places

paddle.fluid.cuda_places(device_ids=None)[source]

Create a list of fluid.CUDAPlace objects.

If device_ids is None, environment variable of FLAGS_selected_gpus would be checked first. If FLAGS_selected_gpus=0,1,2, the returned list would be [fluid.CUDAPlace(0), fluid.CUDAPlace(1), fluid.CUDAPlace(2)]. If FLAGS_selected_gpus is not set, all visible gpu places would be returned.

If device_ids is not None, it should be the device ids of gpus. For example, if device_ids=[0,1,2], the returned list would be [fluid.CUDAPlace(0), fluid.CUDAPlace(1), fluid.CUDAPlace(2)].

Parameters:device_ids (None|list(int)|tuple(int)) – gpu device id list.
Returns:gpu place list.
Return type:out (list(fluid.CUDAPlace))
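
The lookup order described above can be sketched as follows. resolve_gpu_ids is a hypothetical helper; the visible_gpu_count and environ arguments are assumptions added so that the logic can run without a GPU or a real environment variable.

```python
import os


def resolve_gpu_ids(device_ids=None, visible_gpu_count=0, environ=None):
    """Return the GPU ids for which a cuda_places-like call would create places."""
    if environ is None:
        environ = os.environ
    if device_ids is not None:
        # Explicit ids take precedence over the environment.
        return list(device_ids)
    flag = environ.get("FLAGS_selected_gpus")
    if flag:
        # For example "0,1,2" becomes [0, 1, 2].
        return [int(part) for part in flag.split(",") if part.strip()]
    # Otherwise fall back to every visible GPU.
    return list(range(visible_gpu_count))


print(resolve_gpu_ids(environ={"FLAGS_selected_gpus": "0,1,2"}))  # [0, 1, 2]
```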

Examples

import paddle.fluid as fluid
cuda_places = fluid.cuda_places()

CUDAPinnedPlace

class paddle.fluid.CUDAPinnedPlace

CUDAPinnedPlace is a descriptor of a device. The memory of CUDAPinnedPlace can be accessed by GPU and CPU.

Examples

import paddle.fluid as fluid
place = fluid.CUDAPinnedPlace()

CUDAPlace

class paddle.fluid.CUDAPlace

CUDAPlace is a descriptor of a device. It represents a GPU, and each CUDAPlace has a dev_id that indicates which GPU card the current CUDAPlace represents. Memory belonging to CUDAPlaces with different dev_id values is not mutually accessible.

Examples

import paddle.fluid as fluid
gpu_place = fluid.CUDAPlace(0)

DataFeedDesc

class paddle.fluid.DataFeedDesc(proto_file)[source]

Datafeed descriptor, describing the format of the input training data. This class is currently only used for AsyncExecutor (see the comments for class AsyncExecutor for a brief introduction).

DataFeedDesc shall be initialized from a valid protobuf message from disk.

See paddle/fluid/framework/data_feed.proto for message definition. A typical message might look like:

import paddle.fluid as fluid

# Write a minimal MultiSlotDataFeed description to disk.
with open("data.proto", "w") as f:
    f.write('''name: "MultiSlotDataFeed"
batch_size: 2
multi_slot_desc {
    slots {
        name: "words"
        type: "uint64"
        is_dense: false
        is_used: true
    }
    slots {
        name: "label"
        type: "uint64"
        is_dense: false
        is_used: true
    }
}
''')
data_feed = fluid.DataFeedDesc('data.proto')

However, users usually shouldn’t care about the message format; instead, they are encouraged to use Data Generator as a tool to generate a valid data description while converting their raw log files to training files acceptable to AsyncExecutor.

DataFeedDesc can also be changed at runtime. Once you are familiar with what each field means, you can modify it to better suit your needs. E.g.:

import paddle.fluid as fluid
data_feed = fluid.DataFeedDesc('data.proto')
data_feed.set_batch_size(128)
data_feed.set_dense_slots(['words'])  # The slot named 'words' will be dense
data_feed.set_use_slots(['words'])    # The slot named 'words' will be used

Finally, the content can be dumped out for debugging purposes:

print(data_feed.desc())
Parameters:proto_file (string) – Disk file containing a data feed description.
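
If the description file has to be produced for many slots, the text can also be generated programmatically. The sketch below is a hypothetical helper, not the Data Generator mentioned above; it only reproduces the field names that appear in the sample message and assumes nothing else about data_feed.proto.

```python
def multi_slot_desc_text(batch_size, slots):
    """Build the text of a MultiSlotDataFeed description.

    slots is a list of (name, type, is_dense, is_used) tuples.
    """
    lines = [
        'name: "MultiSlotDataFeed"',
        'batch_size: {}'.format(batch_size),
        'multi_slot_desc {',
    ]
    for name, slot_type, is_dense, is_used in slots:
        lines.append('    slots {')
        lines.append('        name: "{}"'.format(name))
        lines.append('        type: "{}"'.format(slot_type))
        # Protobuf text format spells booleans in lower case.
        lines.append('        is_dense: {}'.format(str(bool(is_dense)).lower()))
        lines.append('        is_used: {}'.format(str(bool(is_used)).lower()))
        lines.append('    }')
    lines.append('}')
    return "\n".join(lines) + "\n"


text = multi_slot_desc_text(2, [("words", "uint64", False, True),
                                ("label", "uint64", False, True)])
print(text)
```

The resulting string can be written to a file such as data.proto and then passed to fluid.DataFeedDesc.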
set_batch_size(batch_size)[source]

Sets the batch size. It takes effect during training.

Example

import paddle.fluid as fluid

# Write a minimal MultiSlotDataFeed description to disk.
with open("data.proto", "w") as f:
    f.write('''name: "MultiSlotDataFeed"
batch_size: 2
multi_slot_desc {
    slots {
        name: "words"
        type: "uint64"
        is_dense: false
        is_used: true
    }
    slots {
        name: "label"
        type: "uint64"
        is_dense: false
        is_used: true
    }
}
''')
data_feed = fluid.DataFeedDesc('data.proto')
data_feed.set_batch_size(128)
Parameters:batch_size – batch size
set_dense_slots(dense_slots_name)[source]

Sets whether specific slots will be dense. It takes effect during training. Features for a dense slot will be fed into a Tensor, while those for a sparse slot will be fed into a LoDTensor.

Example

import paddle.fluid as fluid

# Write a minimal MultiSlotDataFeed description to disk.
with open("data.proto", "w") as f:
    f.write('''name: "MultiSlotDataFeed"
batch_size: 2
multi_slot_desc {
    slots {
        name: "words"
        type: "uint64"
        is_dense: false
        is_used: true
    }
    slots {
        name: "label"
        type: "uint64"
        is_dense: false
        is_used: true
    }
}
''')
data_feed = fluid.DataFeedDesc('data.proto')
data_feed.set_dense_slots(['words'])
Parameters:dense_slots_name – a list of slot names which will be set dense

Note

Default is sparse for all slots

set_use_slots(use_slots_name)[source]

Sets whether specific slots will be used for training. A dataset may contain many features; with this function one can select which of them will be used for a specific model.

Example

import paddle.fluid as fluid

# Write a minimal MultiSlotDataFeed description to disk.
with open("data.proto", "w") as f:
    f.write('''name: "MultiSlotDataFeed"
batch_size: 2
multi_slot_desc {
    slots {
        name: "words"
        type: "uint64"
        is_dense: false
        is_used: true
    }
    slots {
        name: "label"
        type: "uint64"
        is_dense: false
        is_used: true
    }
}
''')
data_feed = fluid.DataFeedDesc('data.proto')
data_feed.set_use_slots(['words'])
Parameters:use_slots_name – a list of slot names which will be used in training

Note

Default is not used for all slots

desc()[source]

Returns a protobuf message for this DataFeedDesc

Example

import paddle.fluid as fluid

# Write a minimal MultiSlotDataFeed description to disk.
with open("data.proto", "w") as f:
    f.write('''name: "MultiSlotDataFeed"
batch_size: 2
multi_slot_desc {
    slots {
        name: "words"
        type: "uint64"
        is_dense: false
        is_used: true
    }
    slots {
        name: "label"
        type: "uint64"
        is_dense: false
        is_used: true
    }
}
''')
data_feed = fluid.DataFeedDesc('data.proto')
print(data_feed.desc())
Returns:A string message

DataFeeder

class paddle.fluid.DataFeeder(feed_list, place, program=None)[source]

DataFeeder converts the data returned by a reader into a data structure that can be fed into Executor and ParallelExecutor. The reader usually returns a list of mini-batch data entries. Each data entry in the list is one sample. Each sample is a list or a tuple with one feature or multiple features.

A simple usage example is shown below:

import paddle.fluid as fluid
place = fluid.CPUPlace()
img = fluid.layers.data(name='image', shape=[1, 28, 28])
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
feeder = fluid.DataFeeder([img, label], fluid.CPUPlace())
result = feeder.feed([([0] * 784, [9]), ([1] * 784, [1])])

If you want to feed data to the GPU side separately in advance when training a model on multiple GPUs, you can use the decorate_reader function.

import paddle
import paddle.fluid as fluid

place=fluid.CUDAPlace(0)
data = fluid.layers.data(name='data', shape=[3, 224, 224], dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64')

feeder = fluid.DataFeeder(place=place, feed_list=[data, label])
reader = feeder.decorate_reader(
        paddle.batch(paddle.dataset.flowers.train(), batch_size=16), multi_devices=False)
Parameters:
  • feed_list (list) – The Variables, or the names of the Variables, that will be fed into the model.
  • place (Place) – place indicates whether the data is fed into the CPU or the GPU. To feed data into the GPU, use fluid.CUDAPlace(i) (i represents the GPU id); to feed data into the CPU, use fluid.CPUPlace().
  • program (Program) – The Program that the data will be fed into. If program is None, default_main_program() is used. Default None.
Raises:

ValueError – If some Variable is not in this Program.

Examples

import numpy as np
import paddle
import paddle.fluid as fluid

place = fluid.CPUPlace()

def reader():
    yield [np.random.random([4]).astype('float32'), np.random.random([3]).astype('float32')],

main_program = fluid.Program()
startup_program = fluid.Program()

with fluid.program_guard(main_program, startup_program):
    data_1 = fluid.layers.data(name='data_1', shape=[1, 2, 2])
    data_2 = fluid.layers.data(name='data_2', shape=[1, 1, 3])
    out = fluid.layers.fc(input=[data_1, data_2], size=2)
    # ...

feeder = fluid.DataFeeder([data_1, data_2], place)

exe = fluid.Executor(place)
exe.run(startup_program)
for data in reader():
    outs = exe.run(program=main_program,
                   feed=feeder.feed(data),
                   fetch_list=[out])
feed(iterable)[source]

According to feed_list and iterable, converts the input into a data structure that can be fed into Executor and ParallelExecutor.

Parameters:iterable (list|tuple) – the input data.
Returns:the result of conversion.
Return type:dict
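
Conceptually, the conversion pairs the i-th feature of every sample with the i-th entry of feed_list and groups the features by variable name. The sketch below shows that idea with plain lists; feed_to_dict is a hypothetical helper, and it leaves out the tensor conversion and placement that the real feed performs.

```python
def feed_to_dict(feed_names, minibatch):
    """Group the features of a mini-batch by the name of the fed variable."""
    columns = {name: [] for name in feed_names}
    for sample in minibatch:
        if len(sample) != len(feed_names):
            raise ValueError("each sample needs one feature per feed variable")
        for name, feature in zip(feed_names, sample):
            columns[name].append(feature)
    return columns


minibatch = [([0, 0], [9]), ([1, 1], [1])]  # two samples: (image, label)
print(feed_to_dict(["image", "label"], minibatch))
```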

Examples

import numpy.random as random
import paddle.fluid as fluid

def reader(limit=5):
    for i in range(limit):
        yield random.random([784]).astype('float32'), random.random([1]).astype('int64'), random.random([256]).astype('float32')

data_1 = fluid.layers.data(name='data_1', shape=[1, 28, 28])
data_2 = fluid.layers.data(name='data_2', shape=[1], dtype='int64')
data_3 = fluid.layers.data(name='data_3', shape=[16, 16], dtype='float32')
feeder = fluid.DataFeeder(['data_1','data_2', 'data_3'], fluid.CPUPlace())

result = feeder.feed(reader())
feed_parallel(iterable, num_places=None)[source]

Takes multiple mini-batches. Each mini-batch will be fed to one device in advance.

Parameters:
  • iterable (list|tuple) – the input data.
  • num_places (int) – the number of devices. Default None.
Returns:

the result of conversion.

Return type:

dict

Notes

The number of devices and the number of mini-batches must be the same.
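
The constraint in this note can be sketched as follows: one mini-batch is converted per device, so the two counts have to match. feed_parallel_sketch is a hypothetical stand-in that uses plain lists and dictionaries; which exception the real method raises on a mismatch is not stated above, so the ValueError here is an assumption.

```python
def feed_parallel_sketch(feed_names, minibatches, num_places):
    """Return one feed dictionary per device, in device order."""
    if len(minibatches) != num_places:
        raise ValueError("got {} mini-batches for {} devices".format(
            len(minibatches), num_places))
    feeds = []
    for minibatch in minibatches:
        # Group the features of this mini-batch by variable name.
        columns = {name: [] for name in feed_names}
        for sample in minibatch:
            for name, feature in zip(feed_names, sample):
                columns[name].append(feature)
        feeds.append(columns)
    return feeds


batches = [[([0.1], 3)], [([0.2], 7)]]  # one single-sample mini-batch per device
print(feed_parallel_sketch(["x", "y"], batches, num_places=2))
```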

Examples

import numpy.random as random
import paddle.fluid as fluid

def reader(limit=10):
    for i in range(limit):
        yield [random.random([784]).astype('float32'), random.randint(10)],

x = fluid.layers.data(name='x', shape=[1, 28, 28])
y = fluid.layers.data(name='y', shape=[1], dtype='int64')

feeder = fluid.DataFeeder(['x','y'], fluid.CPUPlace())
place_num = 2
places = [fluid.CPUPlace() for _ in range(place_num)]
data = []
exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())
program = fluid.CompiledProgram(fluid.default_main_program()).with_data_parallel(places=places)
for item in reader():
    data.append(item)
    if place_num == len(data):
        exe.run(program=program, feed=list(feeder.feed_parallel(data, place_num)), fetch_list=[])
        data = []
decorate_reader(reader, multi_devices, num_places=None, drop_last=True)[source]

Converts the data returned by the reader into multiple mini-batches. Each mini-batch will be fed to one device.

Parameters:
  • reader (function) – the reader is the function which can generate data.
  • multi_devices (bool) – whether to use multiple devices or not.
  • num_places (int) – if multi_devices is True, you can specify the number of GPUs to use; if num_places is None, the function will use all the GPUs of the current machine. Default None.
  • drop_last (bool) – whether to drop the last batch if the size of the last batch is less than batch_size. Default True.
Returns:

the result of conversion.

Return type:

dict

Raises:

ValueError – If drop_last is False and the data batches cannot be distributed evenly across the devices.
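
The interaction between drop_last and the number of devices can be sketched with a plain generator. group_for_devices is a hypothetical helper that only groups mini-batches; it assumes the error is raised once the reader is exhausted with an incomplete group left over, which is one reading of the Raises entry above.

```python
def group_for_devices(reader, num_places, drop_last=True):
    """Wrap a reader so that it yields groups of num_places mini-batches."""
    def grouped_reader():
        group = []
        for minibatch in reader():
            group.append(minibatch)
            if len(group) == num_places:
                yield group
                group = []
        if group and not drop_last:
            # Leftover mini-batches cannot be spread over all devices.
            raise ValueError("{} mini-batches left for {} devices".format(
                len(group), num_places))
    return grouped_reader


def reader():
    for i in range(5):
        yield "batch_{}".format(i)


for group in group_for_devices(reader, num_places=2)():
    print(group)
```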

Examples

import numpy.random as random
import paddle
import paddle.fluid as fluid

def reader(limit=5):
    for i in range(limit):
        yield (random.random([784]).astype('float32'), random.random([1]).astype('int64')),

place=fluid.CUDAPlace(0)
data = fluid.layers.data(name='data', shape=[1, 28, 28], dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64')

feeder = fluid.DataFeeder(place=place, feed_list=[data, label])
reader = feeder.decorate_reader(reader, multi_devices=False)

exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
for data in reader():
    exe.run(feed=data)

default_main_program

paddle.fluid.default_main_program()[source]

Get default/global main program. The main program is used for training or testing.

All layer functions in fluid.layers will append operators and variables to the default_main_program.

The default_main_program is the default program in a lot of APIs. For example, the Executor.run() will execute the default_main_program when the program is not specified.

Returns:main program
Return type:Program

Examples

import paddle.fluid as fluid

# Sample Network:
data = fluid.layers.data(name='image', shape=[3, 224, 224], dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64')

conv1 = fluid.layers.conv2d(data, 4, 5, 1, act=None)
bn1 = fluid.layers.batch_norm(conv1, act='relu')
pool1 = fluid.layers.pool2d(bn1, 2, 'max', 2)
conv2 = fluid.layers.conv2d(pool1, 16, 5, 1, act=None)
bn2 = fluid.layers.batch_norm(conv2, act='relu')
pool2 = fluid.layers.pool2d(bn2, 2, 'max', 2)

fc1 = fluid.layers.fc(pool2, size=50, act='relu')
fc2 = fluid.layers.fc(fc1, size=102, act='softmax')

loss = fluid.layers.cross_entropy(input=fc2, label=label)
loss = fluid.layers.mean(loss)
opt = fluid.optimizer.Momentum(
    learning_rate=0.1,
    momentum=0.9,
    regularization=fluid.regularizer.L2Decay(1e-4))
opt.minimize(loss)

print(fluid.default_main_program())

default_startup_program

paddle.fluid.default_startup_program()[source]

Get default/global startup program.

The layer functions in fluid.layers will create parameters, readers and NCCL handles as global variables. The startup program initializes them with the operators it contains. The layer functions append these initialization operators to the startup program.

This method will return the default or the current startup program. Users can use fluid.program_guard to switch program.

Returns:startup program
Return type:Program

Examples

import paddle.fluid as fluid

main_program = fluid.Program()
startup_program = fluid.Program()
with fluid.program_guard(main_program=main_program, startup_program=startup_program):
    x = fluid.layers.data(name="x", shape=[-1, 784], dtype='float32')
    y = fluid.layers.data(name="y", shape=[-1, 1], dtype='int32')
    z = fluid.layers.fc(name="fc", input=x, size=10, act="relu")

    print("main program is: {}".format(fluid.default_main_program()))
    print("start up program is: {}".format(fluid.default_startup_program()))

DistributeTranspiler

class paddle.fluid.DistributeTranspiler(config=None)[source]

Converts the fluid program into distributed data-parallel programs. Two modes are supported: pserver mode and nccl2 mode.

In pserver mode, the main_program will be transformed to use a remote parameter server to do parameter optimization, and the optimization graph will be put into a parameter server program.

In nccl2 mode, the transpiler will append an NCCL_ID broadcasting op to startup_program to share the NCCL_ID across the job nodes. After transpile_nccl2 is called, you *must* pass the trainer_id and num_trainers arguments to ParallelExecutor to enable NCCL2 distributed mode.

Examples

import paddle.fluid as fluid
x = fluid.layers.data(name='x', shape=[13], dtype='float32')
y = fluid.layers.data(name='y', shape=[1], dtype='float32')
y_predict = fluid.layers.fc(input=x, size=1, act=None)

cost = fluid.layers.square_error_cost(input=y_predict, label=y)
avg_loss = fluid.layers.mean(cost)

sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.001)
sgd_optimizer.minimize(avg_loss)

# for pserver mode
pserver_endpoints = "192.168.0.1:6174,192.168.0.2:6174"
trainer_endpoints = "192.168.0.1:6174,192.168.0.2:6174"
current_endpoint = "192.168.0.1:6174"
trainer_id = 0
trainers = 4
role = "PSERVER"
t = fluid.DistributeTranspiler()
t.transpile(
     trainer_id, pservers=pserver_endpoints, trainers=trainers)
if role == "PSERVER":
     pserver_program = t.get_pserver_program(current_endpoint)
     pserver_startup_program = t.get_startup_program(current_endpoint,
                                                    pserver_program)
elif role == "TRAINER":
     trainer_program = t.get_trainer_program()

# for nccl2 mode
trainer_num = 2
trainer_id = 0
config = fluid.DistributeTranspilerConfig()
config.mode = "nccl2"
trainer_endpoints = "192.168.0.1:6174,192.168.0.2:6174"
t = fluid.DistributeTranspiler(config=config)
t.transpile(trainer_id=trainer_id, trainers=trainer_endpoints, current_endpoint="192.168.0.1:6174")
exe = fluid.ParallelExecutor(
    use_cuda=True,
    loss_name=avg_loss.name,
    num_trainers=trainer_num,
    trainer_id=trainer_id
)
transpile(trainer_id, program=None, pservers='127.0.0.1:6174', trainers=1, sync_mode=True, startup_program=None, current_endpoint='127.0.0.1:6174')[source]

Run the transpiler. Transpile the input program.

Parameters:
  • trainer_id (int) – id of the current trainer worker; if you have n workers, the id ranges from 0 to n-1.
  • program (Program|None) – program to transpile, default is fluid.default_main_program().
  • startup_program (Program|None) – startup_program to transpile, default is fluid.default_startup_program().
  • pservers (str) – comma separated ip:port string for the pserver list.
  • trainers (int|str) – in pserver mode this is the number of trainers; in nccl2 mode this is a string of trainer endpoints.
  • sync_mode (bool) – whether to do sync training. Default is True.
  • current_endpoint (str) – the current endpoint, which must be passed when transpiling in nccl2 distributed mode. In pserver mode this argument is not used.
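
The endpoint strings and the two meanings of trainers can be illustrated with a small sketch. Both functions are hypothetical helpers written for this explanation, not part of DistributeTranspiler; they only assume the comma separated ip:port format described above.

```python
def parse_endpoints(endpoints):
    """Split a comma separated "ip:port" string into (host, port) tuples."""
    parsed = []
    for endpoint in endpoints.split(","):
        host, port = endpoint.strip().rsplit(":", 1)
        parsed.append((host, int(port)))
    return parsed


def trainer_count(trainers):
    """trainers is an int in pserver mode and an endpoint string in nccl2 mode."""
    if isinstance(trainers, int):
        return trainers
    return len(parse_endpoints(trainers))


print(parse_endpoints("192.168.0.1:6174,192.168.0.2:6174"))
print(trainer_count(4))
print(trainer_count("192.168.0.1:6174,192.168.0.2:6174"))
```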

Examples

import paddle.fluid as fluid
t = fluid.DistributeTranspiler()
t.transpile(
    trainer_id=0,
    pservers="127.0.0.1:7000,127.0.0.1:7001",
    trainers=2,
    sync_mode=False,
    current_endpoint="127.0.0.1:7000")
get_trainer_program(wait_port=True)[source]

Get transpiled trainer side program.

Returns:trainer side program.
Return type:Program

Examples

import paddle.fluid as fluid
#this is an example, find available endpoints in your case
pserver_endpoints = "192.168.0.1:6174,192.168.0.2:6174"
trainer_id = 0
trainers = 4
t = fluid.DistributeTranspiler()
t.transpile(trainer_id, trainers=trainers, pservers=pserver_endpoints)
trainer_program = t.get_trainer_program()
get_pserver_program(endpoint)[source]

Get parameter server side program.

Parameters:endpoint (str) – current parameter server endpoint.
Returns:the program for current parameter server to run.
Return type:Program

Examples

import paddle.fluid as fluid
#this is an example, find available endpoints in your case
pserver_endpoints = "192.168.0.1:6174,192.168.0.2:6174"
current_endpoint = "192.168.0.1:6174"
trainer_id = 0
trainers = 4
t = fluid.DistributeTranspiler()
t.transpile(
    trainer_id, pservers=pserver_endpoints, trainers=trainers)
pserver_program = t.get_pserver_program(current_endpoint)
get_pserver_programs(endpoint)[source]

Get pserver side main program and startup program for distributed training.

Parameters:endpoint (str) – current pserver endpoint.
Returns:(main_program, startup_program), of type “Program”
Return type:tuple

Examples

import paddle.fluid as fluid
#this is an example, find available endpoints in your case
pserver_endpoints = "192.168.0.1:6174,192.168.0.2:6174"
current_endpoint = "192.168.0.1:6174"
trainer_id = 0
trainers = 4
t = fluid.DistributeTranspiler()
t.transpile(
    trainer_id, pservers=pserver_endpoints, trainers=trainers)
pserver_program, pserver_startup_program = t.get_pserver_programs(current_endpoint)
get_startup_program(endpoint, pserver_program=None, startup_program=None)[source]

Deprecated

Get startup program for current parameter server. Modify operator input variables if there are variables that were split to several blocks.

Parameters:
  • endpoint (str) – current pserver endpoint.
  • pserver_program (Program) – deprecated, call get_pserver_program first.
  • startup_program (Program) – deprecated, should pass startup_program when initializing
Returns:

parameter server side startup program.

Return type:

Program

Examples

import paddle.fluid as fluid
# this is an example, find available endpoints in your case
pserver_endpoints = "192.168.0.1:6174,192.168.0.2:6174"
current_endpoint = "192.168.0.1:6174"
trainer_id = 0
trainers = 4

t = fluid.DistributeTranspiler()
t.transpile(trainer_id, pservers=pserver_endpoints, trainers=trainers)
pserver_program = t.get_pserver_program(current_endpoint)
pserver_startup_program = t.get_startup_program(current_endpoint,
                                                pserver_program)

DistributeTranspilerConfig

class paddle.fluid.DistributeTranspilerConfig[source]
slice_var_up(bool)

Do Tensor slice for pservers, default is True.

split_method(PSDispatcher)

RoundRobin or HashName can be used. Try to choose the best method to balance loads for pservers.
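
As a rough illustration of how the two split methods differ, the sketch below assigns variable names to pserver endpoints in both ways. The function names are hypothetical, and zlib.crc32 is an assumption chosen only because it is stable across runs; this document does not specify which hash the library uses.

```python
import zlib


def round_robin_dispatch(var_names, endpoints):
    """Assign variables to endpoints in turn (hypothetical sketch)."""
    return {name: endpoints[i % len(endpoints)]
            for i, name in enumerate(var_names)}


def hash_name_dispatch(var_names, endpoints):
    """Assign each variable by a hash of its name (hypothetical sketch).

    zlib.crc32 is an assumption made for reproducibility.
    """
    return {name: endpoints[zlib.crc32(name.encode("utf-8")) % len(endpoints)]
            for name in var_names}


endpoints = ["192.168.0.1:6174", "192.168.0.2:6174"]
names = ["fc_0.w_0", "fc_0.b_0", "fc_1.w_0", "fc_1.b_0"]
print(round_robin_dispatch(names, endpoints))
print(hash_name_dispatch(names, endpoints))
```

Round-robin spreads the variable count evenly, while name hashing keeps a variable on the same endpoint regardless of the order in which variables are listed.
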

min_block_size(int)

Minimum number of elements in a block after splitting.

According to https://github.com/PaddlePaddle/Paddle/issues/8638#issuecomment-369912156, bandwidth is used efficiently when the data size is larger than 2MB. If you want to change this value, please make sure you have read the slice_variable function.

Examples

config = fluid.DistributeTranspilerConfig()
config.slice_var_up = True
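
To make the effect of min_block_size more concrete, here is a simplified sketch of how a variable could be divided into blocks. It is not the library's slice_variable function: plan_blocks is a hypothetical name, and the value 8192 is only an example input.

```python
def plan_blocks(numel, pserver_count, min_block_size):
    """Return (block_count, block_size) for a variable with `numel` elements.

    Simplified sketch: never create blocks smaller than min_block_size
    and never create more blocks than there are pservers.
    """
    max_blocks = max(1, numel // min_block_size)
    block_count = min(pserver_count, max_blocks)
    block_size = -(-numel // block_count)   # ceiling division
    block_count = -(-numel // block_size)   # blocks actually needed
    return block_count, block_size


# 8192 is an example value for min_block_size.
print(plan_blocks(numel=1000, pserver_count=4, min_block_size=8192))
print(plan_blocks(numel=100000, pserver_count=4, min_block_size=8192))
```

A small variable stays in one block, while a large one is spread over all pservers.
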

ExecutionStrategy

class paddle.fluid.ExecutionStrategy

ExecutionStrategy allows the user to control more precisely how to run the program in ParallelExecutor by setting its properties.

Examples

import paddle.fluid as fluid
x = fluid.layers.data(name='x', shape=[13], dtype='float32')
y = fluid.layers.data(name='y', shape=[1], dtype='float32')
y_predict = fluid.layers.fc(input=x, size=1, act=None)

cost = fluid.layers.square_error_cost(input=y_predict, label=y)
avg_loss = fluid.layers.mean(cost)

sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.001)
sgd_optimizer.minimize(avg_loss)

exec_strategy = fluid.ExecutionStrategy()
exec_strategy.num_threads = 4

train_exe = fluid.ParallelExecutor(use_cuda=False,
                                   loss_name=avg_loss.name,
                                   exec_strategy=exec_strategy)
allow_op_delay

The type is BOOL. allow_op_delay indicates whether to delay running the communication operators, which may make execution faster. Note that this option currently has no effect and will be removed in the next version. Default False.

num_iteration_per_drop_scope

The type is INT. num_iteration_per_drop_scope indicates how many iterations run between cleanups of the temporary variables generated during execution. It may make execution faster, because the shapes of the temporary variables may be the same between two iterations. Default 1.

Notes

  1. If you fetch data when calling ‘run’, the ParallelExecutor cleans up the temporary variables at the end of the current iteration.
  2. In some NLP models, this may cause GPU memory to be insufficient; in that case, you should reduce num_iteration_per_drop_scope.
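
The cleanup interval described above can be pictured with a toy loop. drop_schedule is a hypothetical function that only counts iterations; it does not model fetching, memory use, or anything else ParallelExecutor does.

```python
def drop_schedule(total_iters, num_iteration_per_drop_scope):
    """Return the iterations after which temporary variables are dropped.

    Toy model: a counter is reset every time the scope is dropped.
    """
    drop_points = []
    since_last_drop = 0
    for it in range(1, total_iters + 1):
        since_last_drop += 1
        if since_last_drop >= num_iteration_per_drop_scope:
            drop_points.append(it)
            since_last_drop = 0
    return drop_points


print(drop_schedule(10, 1))  # cleanup after every iteration
print(drop_schedule(10, 4))  # cleanup after iterations 4 and 8
```
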
num_iteration_per_run

This option sets how many iterations the executor runs each time the user calls pe.run() in Python.

num_threads

The type is INT. num_threads is the size of the thread pool used to run the operators of the current program in ParallelExecutor. If \(num\_threads=1\), all the operators execute one by one, but the order may differ between iterations. If it is not set, ParallelExecutor sets it according to the device type and device count: for GPU, \(num\_threads=device\_count*4\); for CPU, \(num\_threads=CPU\_NUM*4\). \(CPU\_NUM\) is explained in ParallelExecutor; if it is not set, ParallelExecutor gets the CPU count by calling multiprocessing.cpu_count(). Default 0.
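
The default rule above can be written out as a short sketch. resolve_num_threads is a hypothetical function that only mirrors the description in this section; it is not the code ParallelExecutor runs.

```python
import multiprocessing
import os


def resolve_num_threads(num_threads, use_cuda, gpu_count=0):
    """Mirror the documented default: device_count * 4 or CPU_NUM * 4."""
    if num_threads > 0:
        return num_threads  # explicitly set by the user
    if use_cuda:
        return gpu_count * 4
    # CPU_NUM falls back to the number of CPUs reported by the system.
    cpu_num = int(os.environ.get("CPU_NUM", multiprocessing.cpu_count()))
    return cpu_num * 4


os.environ["CPU_NUM"] = "2"  # example value
print(resolve_num_threads(0, use_cuda=False))               # 8
print(resolve_num_threads(0, use_cuda=True, gpu_count=2))   # 8
print(resolve_num_threads(3, use_cuda=False))               # 3
```
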

Executor

class paddle.fluid.Executor(place)[source]

An Executor in Python. It supports running on a single GPU or multiple GPUs, and on a single CPU or multiple CPUs. The Python executor takes a program and adds feed operators and fetch operators to it according to the feed map and fetch_list. The feed map provides the input data for the program; fetch_list provides the variables (or names) that the user wants to get after the program runs. Note: the executor runs all operators in the program, not only the operators that fetch_list depends on. It stores the global variables in the global scope and creates a local scope for the temporary variables. The contents of the local scope may be discarded after each minibatch forward/backward pass finishes, but the global scope variables persist across runs.

Examples

import paddle.fluid as fluid
import paddle.fluid.compiler as compiler
import numpy
import os

use_cuda = True
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
exe = fluid.Executor(place)

train_program = fluid.Program()
startup_program = fluid.Program()
with fluid.program_guard(train_program, startup_program):
    data = fluid.layers.data(name='X', shape=[1], dtype='float32')
    hidden = fluid.layers.fc(input=data, size=10)
    loss = fluid.layers.mean(hidden)
    fluid.optimizer.SGD(learning_rate=0.01).minimize(loss)

# Run the startup program once and only once.
# There is no need to optimize/compile the startup program.
startup_program.random_seed=1
exe.run(startup_program)

# Run the main program directly without compile.
x = numpy.random.random(size=(10, 1)).astype('float32')
loss_data, = exe.run(train_program,
                     feed={"X": x},
                     fetch_list=[loss.name])

# Or, compile the program and run it. See `CompiledProgram`
# for more detail.
# NOTE: If you run the program on CPU, you need to specify
# CPU_NUM; otherwise, fluid uses the number of logical cores
# as CPU_NUM. In that case, the batch size of the input
# should be greater than CPU_NUM, or the process will fail
# with an exception.
if not use_cuda:
    os.environ['CPU_NUM'] = str(2)

compiled_prog = compiler.CompiledProgram(
    train_program).with_data_parallel(
    loss_name=loss.name)
loss_data, = exe.run(compiled_prog,
                     feed={"X": x},
                     fetch_list=[loss.name])
Parameters:place (fluid.CPUPlace|fluid.CUDAPlace(n)) – the device on which the executor runs.
close()[source]

Close this executor.

You can no longer use this executor after calling this method. For the distributed training, this method would free the resource on PServers related to the current Trainer.

Examples

import paddle.fluid as fluid

cpu = fluid.CPUPlace()
exe = fluid.Executor(cpu)
# execute training or testing
exe.close()
run(program=None, feed=None, fetch_list=None, feed_var_name='feed', fetch_var_name='fetch', scope=None, return_numpy=True, use_program_cache=False)[source]

Run the program with this Executor. Data is fed through the feed map and results are fetched through fetch_list. The Python executor takes a program and adds feed operators and fetch operators to it according to the feed map and fetch_list. The feed map provides the input data for the program; fetch_list provides the variables (or names) that the user wants to get after the program runs.

Note: the executor runs all operators in the program, not only the operators that fetch_list depends on.

Examples

import paddle.fluid as fluid
import numpy

# First create the Executor.
place = fluid.CPUPlace() # fluid.CUDAPlace(0)
exe = fluid.Executor(place)

data = fluid.layers.data(name='X', shape=[1], dtype='float32')
hidden = fluid.layers.fc(input=data, size=10)
loss = fluid.layers.mean(hidden)
adam = fluid.optimizer.Adam()
adam.minimize(loss)

# Run the startup program once and only once.
exe.run(fluid.default_startup_program())

x = numpy.random.random(size=(10, 1)).astype('float32')
outs = exe.run(feed={'X': x},
               fetch_list=[loss.name])
Parameters:
  • program (Program|CompiledProgram) – the program to run. If not provided, default_main_program (not compiled) is used.
  • feed (dict) – feed variable map, e.g. {“image”: ImageData, “label”: LabelData}
  • fetch_list (list) – a list of variables or variable names that the user wants to get; this method returns them according to this list.
  • feed_var_name (str) – the name of the input variable of the feed operator.
  • fetch_var_name (str) – the name of the output variable of the fetch operator.
  • scope (Scope) – the scope used to run this program; you can switch it to a different scope. Default is global_scope.
  • return_numpy (bool) – whether to convert the fetched tensors to numpy arrays.
  • use_program_cache (bool) – whether to use the cached program settings across batches. Setting it to True is faster only when (1) the program is not compiled with data parallel, and (2) the program, the feed variable names and the fetch_list variable names have not changed since the last step.
Returns:

fetch result according to fetch_list.

Return type:

list(numpy.array)
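
The condition on use_program_cache (same program, same feed names, same fetch names) can be illustrated with a small cache keyed on those three things. ProgramCache is a hypothetical class; how Executor actually keys its cache is not described in this document.

```python
class ProgramCache(object):
    """Toy cache keyed on (program id, feed names, fetch names)."""

    def __init__(self):
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def get_or_build(self, program_id, feed_names, fetch_names, build):
        key = (program_id, tuple(sorted(feed_names)), tuple(fetch_names))
        if key in self._cache:
            self.hits += 1
        else:
            # Any change in the key forces the prepared program to be rebuilt.
            self.misses += 1
            self._cache[key] = build()
        return self._cache[key]


cache = ProgramCache()
cache.get_or_build(1, ["X"], ["loss"], lambda: "prepared program")
cache.get_or_build(1, ["X"], ["loss"], lambda: "prepared program")
print(cache.hits, cache.misses)  # 1 1
```

The cache only pays off when consecutive steps produce the same key, which matches condition (2) above.
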

infer_from_dataset(program=None, dataset=None, scope=None, thread=0, debug=False, fetch_list=None, fetch_info=None, print_period=100)[source]

The documentation of infer_from_dataset is almost the same as that of train_from_dataset, except that in distributed training, pushing gradients is disabled in infer_from_dataset. infer_from_dataset() makes multi-threaded evaluation easy.

Parameters:
  • program (Program|CompiledProgram) – the program that needs to be run, if not provided, then default_main_program (not compiled) will be used.
  • dataset (paddle.fluid.Dataset) – dataset created outside this function, a user should provide a well-defined dataset before calling this function. Please check the document of Dataset if needed. default is None
  • scope (Scope) – the scope used to run this program, you can switch it to different scope for each run. default is global_scope
  • thread (int) – number of threads the user wants to run in this function. The actual number of threads will be min(Dataset.thread_num, thread) if thread > 0, default is 0
  • debug (bool) – whether to run infer_from_dataset in debug mode, default is False
  • fetch_list (Variable List) – fetch variable list, each variable will be printed during the run, default is None
  • fetch_info (String List) – print information for each variable, default is None
  • print_period (int) – the number of mini-batches for each print, default is 100
Returns:

None

Examples

import paddle.fluid as fluid

place = fluid.CPUPlace() # you can set place = fluid.CUDAPlace(0) to use gpu
exe = fluid.Executor(place)
x = fluid.layers.data(name="x", shape=[10, 10], dtype="int64")
y = fluid.layers.data(name="y", shape=[1], dtype="int64", lod_level=1)
dataset = fluid.DatasetFactory().create_dataset()
dataset.set_use_var([x, y])
dataset.set_thread(1)
filelist = [] # you should set your own filelist, e.g. filelist = ["dataA.txt"]
dataset.set_filelist(filelist)
exe.run(fluid.default_startup_program())
exe.infer_from_dataset(program=fluid.default_main_program(),
                       dataset=dataset)
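
The rule for the thread parameter above can be spelled out as a one-function sketch. effective_thread_num is a hypothetical name, and falling back to the thread number of the Dataset when thread is 0 is an assumption; the text only defines the case thread > 0.

```python
def effective_thread_num(dataset_thread_num, thread=0):
    """Number of threads actually used, following the documented rule."""
    if thread > 0:
        return min(dataset_thread_num, thread)
    # Assumption: with thread == 0, the Dataset setting is used as-is.
    return dataset_thread_num


print(effective_thread_num(dataset_thread_num=4, thread=2))  # 2
print(effective_thread_num(dataset_thread_num=1, thread=8))  # 1
```
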
train_from_dataset(program=None, dataset=None, scope=None, thread=0, debug=False, fetch_list=None, fetch_info=None, print_period=100)[source]

Train from a pre-defined Dataset. Dataset is defined in paddle.fluid.dataset. Given a program, either a Program or a CompiledProgram, train_from_dataset consumes all data samples in the dataset. The scope can be given by the user; by default, it is global_scope(). The number of threads used in training is the minimum of the thread number set in the Dataset and the value of thread in this interface. Debug can be set so that the executor displays the run time of all operators and the throughput of the current training task.

Note: train_from_dataset will destroy all resources created within executor for each run.

Parameters:
  • program (Program|CompiledProgram) – the program that needs to be run, if not provided, then default_main_program (not compiled) will be used.
  • dataset (paddle.fluid.Dataset) – dataset created outside this function, a user should provide a well-defined dataset before calling this function. Please check the document of Dataset if needed.
  • scope (Scope) – the scope used to run this program, you can switch it to different scope for each run. default is global_scope
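  • thread (int) – number of threads the user wants to run in this function. The actual number of threads will be min(Dataset.thread_num, thread)
  • debug (bool) – whether to run train_from_dataset in debug mode, in which the executor displays the run time of all operators and the throughput of the current training task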
  • fetch_list (Variable List) – fetch variable list, each variable will be printed during training
  • fetch_info (String List) – print information for each variable
  • print_period (int) – the number of mini-batches for each print
Returns:

None

Examples

import paddle.fluid as fluid

place = fluid.CPUPlace() # you can set place = fluid.CUDAPlace(0) to use gpu
exe = fluid.Executor(place)
x = fluid.layers.data(name="x", shape=[10, 10], dtype="int64")
y = fluid.layers.data(name="y", shape=[1], dtype="int64", lod_level=1)
dataset = fluid.DatasetFactory().create_dataset()
dataset.set_use_var([x, y])
dataset.set_thread(1)
filelist = [] # you should set your own filelist, e.g. filelist = ["dataA.txt"]
dataset.set_filelist(filelist)
exe.run(fluid.default_startup_program())
exe.train_from_dataset(program=fluid.default_main_program(),
                       dataset=dataset)

global_scope

paddle.fluid.global_scope()[source]

Get the global/default scope instance. Many APIs use global_scope as the default value, e.g., Executor.run.

Examples

import paddle.fluid as fluid
import numpy

fluid.global_scope().var("data").get_tensor().set(numpy.ones((2, 2)), fluid.CPUPlace())
numpy.array(fluid.global_scope().find_var("data").get_tensor())
Returns:The global/default scope instance.
Return type:Scope

gradients

paddle.fluid.gradients(targets, inputs, target_gradients=None, no_grad_set=None)[source]

Backpropagate the gradients of targets to inputs.

Parameters:
  • targets (Variable|list[Variable]) – The target variables.
  • inputs (Variable|list[Variable]) – The input variables.
  • target_gradients (Variable|list[Variable]|None) – The gradient variables of targets, which have the same shapes as targets. If None, ones will be created for them.
  • no_grad_set (set[string]) – The names of variables that have no gradients in Block 0. All variables with stop_gradient=True from all blocks will be automatically added.
Returns:

A list of gradients for inputs. If an input does not affect targets, the corresponding gradient variable will be None.

Return type:

(list[Variable])

Examples

import paddle.fluid as fluid

x = fluid.layers.data(name='x', shape=[2,8,8], dtype='float32')
x.stop_gradient=False
y = fluid.layers.conv2d(x, 4, 1, bias_attr=False)
y = fluid.layers.relu(y)
y = fluid.layers.conv2d(y, 4, 1, bias_attr=False)
y = fluid.layers.relu(y)
z = fluid.gradients([y], x)
print(z)

in_dygraph_mode

paddle.fluid.in_dygraph_mode()[source]

Check the program status (tracer): whether it is running in dygraph mode or not.

Returns:True if the program is running in dynamic graph mode
Return type:out (boolean)

Examples

import paddle.fluid as fluid
if fluid.in_dygraph_mode():
    pass

LoDTensor

class paddle.fluid.LoDTensor

LoDTensor is a Tensor with optional LoD information.

np.array(lod_tensor) can convert LoDTensor to numpy array. lod_tensor.lod() can retrieve the LoD information.

LoD is short for Level of Details and is usually used for sequences of varying length. You can skip the following comment if you don’t need the optional LoD.

For example, a LoDTensor X can look like the example below. It contains 2 sequences. The first has length 2 and the second has length 3, as described by x.lod.

The first tensor dimension 5=2+3 is calculated from the LoD if it is available. It is the total number of sequence elements. In X, each element has 2 columns, hence the shape [5, 2].

x.lod = [[2, 3]]

x.data = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]

x.shape = [5, 2]

LoD can have multiple levels (for example, a paragraph can have multiple sentences and a sentence can have multiple words). In the following LoDTensor Y, the lod_level is 2. It means there are 2 sequences: the first sequence has length 2 (it has 2 sub-sequences) and the second has length 1. The 2 sub-sequences of the first sequence have lengths 2 and 2, respectively, and the 1 sub-sequence of the second sequence has length 3.

y.lod = [[2, 1], [2, 2, 3]]

y.shape = [2+2+3, ...]

Examples:
import paddle.fluid as fluid

t = fluid.LoDTensor()

Note

In the description above, LoD is length-based. In the internal implementation of Paddle, lod is offset-based. Hence, internally, y.lod is represented as [[0, 2, 3], [0, 2, 4, 7]] (the length-based equivalent would be [[2-0, 3-2], [2-0, 4-2, 7-4]]).

Sometimes LoD is called recursive_sequence_length to be more self-explanatory. In this case, it must be length-based. For historical reasons, when LoD is called lod in the public API, it might be offset-based. Users should be careful about this.
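
The relation between the length-based and the offset-based form can be checked with a few lines of plain Python. lengths_to_offsets is a hypothetical helper written for this note, not a paddle.fluid function.

```python
def lengths_to_offsets(recursive_sequence_lengths):
    """Convert a length-based LoD into the offset-based form."""
    offsets = []
    for level in recursive_sequence_lengths:
        level_offsets = [0]
        for length in level:
            # Each offset is the running total of the lengths so far.
            level_offsets.append(level_offsets[-1] + length)
        offsets.append(level_offsets)
    return offsets


print(lengths_to_offsets([[2, 3]]))             # [[0, 2, 5]]
print(lengths_to_offsets([[2, 1], [2, 2, 3]]))  # [[0, 2, 3], [0, 2, 4, 7]]
```
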

has_valid_recursive_sequence_lengths(self: paddle.fluid.core_avx.LoDTensor) → bool

Check whether the lod of the LoDTensor is valid.

Returns:whether the lod is valid.
Return type:out (bool)

Examples

import paddle.fluid as fluid
import numpy as np

t = fluid.LoDTensor()
t.set(np.ndarray([5, 30]), fluid.CPUPlace())
t.set_recursive_sequence_lengths([[2, 3]])
print(t.has_valid_recursive_sequence_lengths()) # True
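
What "valid" could mean can be sketched from the LoDTensor description: each level has to account for every entry of the level below it, and the last level has to add up to the first tensor dimension. has_valid_lengths is a hypothetical reimplementation under that assumption; the exact rules the library checks are not spelled out in this document.

```python
def has_valid_lengths(recursive_sequence_lengths, first_dim):
    """Check a length-based LoD against the first dimension of the tensor."""
    levels = recursive_sequence_lengths
    for upper, lower in zip(levels, levels[1:]):
        # The lengths of one level count the entries of the next level.
        if sum(upper) != len(lower):
            return False
    # The last level counts the rows of the tensor.
    if levels and sum(levels[-1]) != first_dim:
        return False
    return True


print(has_valid_lengths([[2, 3]], first_dim=5))             # True
print(has_valid_lengths([[2, 3]], first_dim=6))             # False
print(has_valid_lengths([[2, 1], [2, 2, 3]], first_dim=7))  # True
```
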
lod(self: paddle.fluid.core_avx.LoDTensor) → List[List[int]]

Return the LoD of the LoDTensor.

Returns:the lod of the LoDTensor.
Return type:out (List[List[int]])

Examples

import paddle.fluid as fluid
import numpy as np

t = fluid.LoDTensor()
t.set(np.ndarray([5, 30]), fluid.CPUPlace())
t.set_lod([[0, 2, 5]])
print(t.lod()) # [[0, 2, 5]]
recursive_sequence_lengths(self: paddle.fluid.core_avx.LoDTensor) → List[List[int]]

Return the sequence length of the LoDTensor corresponding to LoD.

Returns:the sequence lengths.
Return type:out (List[List[int]])

Examples

import paddle.fluid as fluid
import numpy as np

t = fluid.LoDTensor()
t.set(np.ndarray([5, 30]), fluid.CPUPlace())
t.set_recursive_sequence_lengths([[2, 3]])
print(t.recursive_sequence_lengths()) # [[2, 3]]
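
Going from the offset-based lod() output back to sequence lengths is a matter of taking differences of neighbouring offsets. offsets_to_lengths is a hypothetical helper for illustration, not part of paddle.fluid.

```python
def offsets_to_lengths(lod):
    """Convert an offset-based LoD into the length-based form."""
    return [[end - start for start, end in zip(level, level[1:])]
            for level in lod]


print(offsets_to_lengths([[0, 2, 5]]))                # [[2, 3]]
print(offsets_to_lengths([[0, 2, 3], [0, 2, 4, 7]]))  # [[2, 1], [2, 2, 3]]
```
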
set(*args, **kwargs)

Overloaded function.

  1. set(self: paddle.fluid.core_avx.Tensor, arg0: numpy.ndarray[float32], arg1: paddle::platform::CPUPlace) -> None
  2. set(self: paddle.fluid.core_avx.Tensor, arg0: numpy.ndarray[int32], arg1: paddle::platform::CPUPlace) -> None
  3. set(self: paddle.fluid.core_avx.Tensor, arg0: numpy.ndarray[float64], arg1: paddle::platform::CPUPlace) -> None
  4. set(self: paddle.fluid.core_avx.Tensor, arg0: numpy.ndarray[int64], arg1: paddle::platform::CPUPlace) -> None
  5. set(self: paddle.fluid.core_avx.Tensor, arg0: numpy.ndarray[bool], arg1: paddle::platform::CPUPlace) -> None
  6. set(self: paddle.fluid.core_avx.Tensor, arg0: numpy.ndarray[uint16], arg1: paddle::platform::CPUPlace) -> None
  7. set(self: paddle.fluid.core_avx.Tensor, arg0: numpy.ndarray[uint8], arg1: paddle::platform::CPUPlace) -> None
  8. set(self: paddle.fluid.core_avx.Tensor, arg0: numpy.ndarray[int8], arg1: paddle::platform::CPUPlace) -> None
set_lod(self: paddle.fluid.core_avx.LoDTensor, lod: List[List[int]]) → None

Set LoD of the LoDTensor.

Parameters:lod (List[List[int]]) – the lod to be set.

Examples

import paddle.fluid as fluid
import numpy as np

t = fluid.LoDTensor()
t.set(np.ndarray([5, 30]), fluid.CPUPlace())
t.set_lod([[0, 2, 5]])
set_recursive_sequence_lengths(self: paddle.fluid.core_avx.LoDTensor, recursive_sequence_lengths: List[List[int]]) → None

Set LoD of the LoDTensor according to recursive sequence length.

For example, if recursive_sequence_lengths=[[2, 3]], meaning that there are two sequences with length 2 and 3 respectively, the corresponding lod would be [[0, 2, 2+3]], i.e, [[0, 2, 5]].

Parameters:recursive_sequence_lengths (List[List[int]]) – sequence lengths.

Examples

import paddle.fluid as fluid
import numpy as np

t = fluid.LoDTensor()
t.set(np.ndarray([5, 30]), fluid.CPUPlace())
t.set_recursive_sequence_lengths([[2, 3]])
shape(self: paddle.fluid.core_avx.Tensor) → List[int]

LoDTensorArray

class paddle.fluid.LoDTensorArray

Array of LoDTensor.

Examples

import paddle.fluid as fluid

arr = fluid.LoDTensorArray()
append(self: paddle.fluid.core_avx.LoDTensorArray, tensor: paddle.fluid.core_avx.LoDTensor) → None

Append a LoDTensor to LoDTensorArray.

Examples

import paddle.fluid as fluid
import numpy as np

arr = fluid.LoDTensorArray()
t = fluid.LoDTensor()
t.set(np.ndarray([5, 30]), fluid.CPUPlace())
arr.append(t)

memory_optimize

paddle.fluid.memory_optimize(input_program, skip_opt_set=None, print_log=False, level=0, skip_grads=True)[source]
Legacy memory optimization strategy. It reduces total memory consumption by reusing variable memory between different operators.
A simple sample to explain the algorithm:
c = a + b  # assume this is the last time a is used
d = b * c
Since a will not be used anymore after “c = a + b”, and a and d have the same size, we can use variable a to replace variable d, so the above code can be optimized to:
c = a + b
a = b * c
Please note that, in this legacy design, variable a replaces d directly. This means that after you call this API, some variables may disappear and some variables may hold unexpected values; in the case above, a actually holds the value of d after execution.
To protect important variables from being reused or removed in the optimization, skip_opt_set lets you specify a variable whitelist. The variables in skip_opt_set will not be affected by the memory_optimize API.
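
The reuse idea above can be sketched as a toy pass over a list of operations. reuse_plan is a hypothetical function: it only handles straight-line code with equal-size reuse, and it is not the algorithm memory_optimize implements.

```python
def reuse_plan(ops, sizes, skip_opt_set=()):
    """Map each output variable to the dead variable whose memory it reuses.

    ops: list of (output_name, [input_names]) in execution order.
    sizes: dict from variable name to size.
    """
    last_use = {}
    for index, (_, inputs) in enumerate(ops):
        for name in inputs:
            last_use[name] = index

    free = []     # dead variables whose memory can be reused
    renamed = {}  # original output name -> reused variable name
    for index, (out, inputs) in enumerate(ops):
        if out not in skip_opt_set:
            for candidate in free:
                if sizes[candidate] == sizes[out]:
                    renamed[out] = candidate
                    free.remove(candidate)
                    break
        # Inputs read for the last time here become reusable afterwards.
        for name in inputs:
            target = renamed.get(name, name)
            if (last_use[name] == index and name not in skip_opt_set
                    and target not in free):
                free.append(target)
    return renamed


ops = [("c", ["a", "b"]), ("d", ["b", "c"])]
sizes = {"a": 4, "b": 4, "c": 4, "d": 4}
print(reuse_plan(ops, sizes))                      # {'d': 'a'}
print(reuse_plan(ops, sizes, skip_opt_set={"a"}))  # {}
```

With a in skip_opt_set, its memory is never handed to d, which is the protection the whitelist is meant to give.
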

Note

This API is deprecated; please avoid using it in new code.
It does not support operators that create sub-blocks, such as While and IfElse.
Parameters:
  • input_program (Program) – the input program
  • skip_opt_set (set) – variables that will be skipped in memory optimization
  • print_log (bool) – whether to print debug log.
  • level (int) – 0 or 1, 0 means we replace a with b only when a.size == b.size, 1 means we can replace a with b if a.size <= b.size
Returns:

None

Examples

import paddle.fluid as fluid
main_prog = fluid.Program()
startup_prog = fluid.Program()

place = fluid.CPUPlace()
exe = fluid.Executor(place)

exe.run(startup_prog)
fluid.memory_optimize(main_prog)

name_scope

paddle.fluid.name_scope(prefix=None)[source]

Generate hierarchical name prefix for the operators.

Note: This should only be used for debugging and visualization purposes. Don’t use it for serious analysis such as graph/program transformations.

Parameters:prefix (str) – prefix.

Examples

import paddle.fluid as fluid
with fluid.name_scope("s1"):
    a = fluid.layers.data(name='data', shape=[1], dtype='int32')
    b = a + 1
    with fluid.name_scope("s2"):
        c = b * 1
    with fluid.name_scope("s3"):
        d = c / 1
with fluid.name_scope("s1"):
    f = fluid.layers.pow(d, 2.0)
with fluid.name_scope("s4"):
    g = f - 1
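
How nested scopes build a hierarchical prefix can be sketched with a stack and a context manager. Everything here is hypothetical: the "/" separator is an assumption, and any renaming the library may apply when a scope name is entered twice is not modeled.

```python
import contextlib

_scope_stack = []


@contextlib.contextmanager
def name_scope(prefix):
    """Push a prefix for the duration of the with-block (toy version)."""
    _scope_stack.append(prefix)
    try:
        yield
    finally:
        _scope_stack.pop()


def current_prefix():
    """Return the prefix an operator created now would receive."""
    if not _scope_stack:
        return "/"
    return "/" + "/".join(_scope_stack) + "/"


with name_scope("s1"):
    print(current_prefix())      # /s1/
    with name_scope("s2"):
        print(current_prefix())  # /s1/s2/
print(current_prefix())          # /
```
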

ParallelExecutor

class paddle.fluid.ParallelExecutor(use_cuda, loss_name=None, main_program=None, share_vars_from=None, exec_strategy=None, build_strategy=None, num_trainers=1, trainer_id=0, scope=None)[source]

ParallelExecutor is designed for data parallelism, which distributes the data across different nodes so that every node operates on its data in parallel. If you use ParallelExecutor to run the current program on GPU, a node is a GPU device, and ParallelExecutor detects the available GPU devices on the current machine automatically. If you use ParallelExecutor to run the current program on CPU, a node is a CPU device, and you can specify the number of CPU devices with the ‘CPU_NUM’ environment variable, for example ‘CPU_NUM=4’. If the environment variable is not found, ParallelExecutor calls multiprocessing.cpu_count to get the number of CPUs in the system.

Examples

import paddle.fluid as fluid
import numpy
import os

use_cuda = True
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()

# NOTE: If you run the program on CPU, you need to specify
# CPU_NUM; otherwise, fluid uses the number of logical cores
# as CPU_NUM. In that case, the batch size of the input
# should be greater than CPU_NUM, or the process will fail
# with an exception.
if not use_cuda:
    os.environ['CPU_NUM'] = str(2)

exe = fluid.Executor(place)

train_program = fluid.Program()
startup_program = fluid.Program()
with fluid.program_guard(train_program, startup_program):
    data = fluid.layers.data(name='X', shape=[1], dtype='float32')
    hidden = fluid.layers.fc(input=data, size=10)
    loss = fluid.layers.mean(hidden)
    test_program = fluid.default_main_program().clone(for_test=True)
    fluid.optimizer.SGD(learning_rate=0.01).minimize(loss)

startup_program.random_seed=1
exe.run(startup_program)

train_exe = fluid.ParallelExecutor(use_cuda=use_cuda,
                                   main_program=train_program,
                                   loss_name=loss.name)
test_exe = fluid.ParallelExecutor(use_cuda=use_cuda,
                                  main_program=test_program,
                                  share_vars_from=train_exe)

x = numpy.random.random(size=(10, 1)).astype('float32')
loss_data, = train_exe.run(feed={"X": x},
                           fetch_list=[loss.name])

loss_data, = test_exe.run(feed={"X": x},
                          fetch_list=[loss.name])
Parameters:
  • use_cuda (bool) – Whether to use CUDA or not.
  • loss_name (str) – The name of the loss variable; it must be set in training. Default None.
  • main_program (Program) – The program to run. If not provided, default_main_program is used. Default None.
  • share_vars_from (ParallelExecutor) – If provided, variables are shared from the specified ParallelExecutor. Default None.
  • exec_strategy (ExecutionStrategy) – Controls how to run the program in ParallelExecutor, for example how many threads are used to execute the program and how many iterations run between cleanups of the temporary variables generated during execution. For more information, please refer to fluid.ExecutionStrategy. Default None.
  • build_strategy (BuildStrategy) – Controls how to build the SSA Graph in ParallelExecutor by setting properties, for example reduce_strategy and gradient_scale_strategy. For more information, please refer to fluid.BuildStrategy. Default None.
  • num_trainers (int) – If greater than 1, NCCL is initialized with multiple ranks of nodes, and each node should have the same number of GPUs. Distributed training is then enabled. Default 1.
  • trainer_id (int) – Must be used together with num_trainers. trainer_id is the “rank” of the current node, starting from 0. Default 0.
  • scope (Scope) – The scope to run with. Default is fluid.global_scope().
Returns:

The initialized ParallelExecutor object.

Return type:

ParallelExecutor

Raises:

TypeError – If share_vars_from is provided, but not ParallelExecutor object.

run(fetch_list, feed=None, feed_dict=None, return_numpy=True)[source]

Run a parallel executor with fetch_list.

The feed parameter can be a dict or a list. If feed is a dict, the feed data will be split across multiple devices. If feed is a list, we assume the data has already been split for the devices, and each element in the list will be copied to its device directly.

Examples

import paddle.fluid as fluid
import numpy
import os

use_cuda = True
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()

# NOTE: If you run the program on CPU, you need to specify
# CPU_NUM; otherwise, fluid uses the number of logical cores
# as CPU_NUM. In that case, the batch size of the input
# should be greater than CPU_NUM, or the process will fail
# with an exception.
if not use_cuda:
    os.environ['CPU_NUM'] = str(2)

exe = fluid.Executor(place)

train_program = fluid.Program()
startup_program = fluid.Program()
with fluid.program_guard(train_program, startup_program):
    data = fluid.layers.data(name='X', shape=[1], dtype='float32')
    hidden = fluid.layers.fc(input=data, size=10)
    loss = fluid.layers.mean(hidden)
    fluid.optimizer.SGD(learning_rate=0.01).minimize(loss)

startup_program.random_seed=1
exe.run(startup_program)

train_exe = fluid.ParallelExecutor(use_cuda=use_cuda,
                                   main_program=train_program,
                                   loss_name=loss.name)

# If the feed is a dict:
# the image will be split across devices. If there are two devices,
# each device will process an image with shape (5, 1)
x = numpy.random.random(size=(10, 1)).astype('float32')
loss_data, = train_exe.run(feed={"X": x},
                           fetch_list=[loss.name])

# If the feed is a list:
# each device will process one element of the list.
# the 1st device will process an image with shape (10, 1)
# the 2nd device will process an image with shape (9, 1)
#
# you can use train_exe.device_count to get the number of devices.
x2 = numpy.random.random(size=(9, 1)).astype('float32')
loss_data, = train_exe.run(feed=[{"X": x}, {"X": x2}],
                           fetch_list=[loss.name])
Parameters:
  • fetch_list (list) – The names of the variables to fetch.
  • feed (list|dict|None) – The feed variables. If the feed is a dict, the tensors in that dict will be split across the devices. If the feed is a list, each element of the list will be copied to one device. Default None.
  • feed_dict – Alias for the feed parameter, kept for backward compatibility. This parameter has been deprecated. Default None.
  • return_numpy (bool) – Whether to convert the fetched tensors to numpy arrays. Default: True.
Returns:

The fetched result list.

Return type:

List

Raises:

ValueError – If the feed is a list, but its length is not equal to the number of active places, or one of its elements is not a dict.

Notes

  1. If the feed is a dict, the number of data samples fed to ParallelExecutor must be greater than the number of active places. Otherwise, an exception is thrown from the C++ side. Pay special attention to whether the last batch of the dataset is larger than the number of active places.
  2. If there is more than one active place, the fetch result for each variable is a list, and each element of this list is the variable from the respective active place.
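
The two feed forms can be sketched in plain Python, with lists standing in for tensors. split_batch and prepare_feed are hypothetical helpers. Splitting into contiguous, near-equal parts is an assumption, and the sketch only rejects batches with fewer samples than devices, so the library may be stricter.

```python
def split_batch(batch, device_count):
    """Split a list of samples into contiguous, near-equal parts."""
    if len(batch) < device_count:
        raise ValueError("batch has fewer samples than devices")
    base, extra = divmod(len(batch), device_count)
    parts, start = [], 0
    for i in range(device_count):
        size = base + (1 if i < extra else 0)
        parts.append(batch[start:start + size])
        start += size
    return parts


def prepare_feed(feed, device_count):
    """Return one feed dict per device for a dict feed or a list feed."""
    if isinstance(feed, dict):
        split = {name: split_batch(data, device_count)
                 for name, data in feed.items()}
        return [{name: parts[i] for name, parts in split.items()}
                for i in range(device_count)]
    if isinstance(feed, list):
        if len(feed) != device_count or not all(
                isinstance(item, dict) for item in feed):
            raise ValueError("a list feed needs one dict per device")
        return feed  # already split by the caller
    raise TypeError("feed must be a dict or a list of dicts")


print(prepare_feed({"X": list(range(10))}, device_count=2))
print(prepare_feed([{"X": [0, 1, 2]}, {"X": [3, 4]}], device_count=2))
```
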

Examples

pe = fluid.ParallelExecutor(use_cuda=use_cuda,
                            loss_name=avg_cost.name,
                            main_program=fluid.default_main_program())
loss = pe.run(feed=feeder.feed(cur_batch),
              fetch_list=[avg_cost.name])
drop_local_exe_scopes()[source]

Drop the local execution scope immediately.

During the execution of a Program, the intermediate results it generates are placed in the local execution scope. In some models, creating and deleting those intermediate results is time-consuming. To address this, ParallelExecutor provides an option in ExecutionStrategy, num_iteration_per_drop_scope, which indicates how many iterations to run before dropping the local execution scope. In some situations, however, each iteration generates different intermediate results, so the memory needed by the local execution scope gradually increases. If you then want to run another program, there may be insufficient memory; at that point you should drop the local execution scopes of the other Programs.
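
The interaction between periodic dropping and an explicit drop can be illustrated with a toy model. This is a plain-Python sketch and does not reflect how Paddle manages scopes internally: ToyScopeRunner is a hypothetical class, and only the names num_iteration_per_drop_scope and drop_local_exe_scopes are taken from the text above.

```python
class ToyScopeRunner:
    """Toy model: intermediates accumulate until the scope is dropped."""

    def __init__(self, num_iteration_per_drop_scope):
        self.every = num_iteration_per_drop_scope
        self.iterations = 0
        self.local_scope = []

    def run(self, intermediates):
        # Each iteration leaves its intermediate results in the local scope.
        self.local_scope.extend(intermediates)
        self.iterations += 1
        # Periodic drop, every `num_iteration_per_drop_scope` iterations.
        if self.iterations % self.every == 0:
            self.drop_local_exe_scopes()

    def drop_local_exe_scopes(self):
        # Explicit drop: release everything held by the local scope now.
        self.local_scope = []


runner = ToyScopeRunner(num_iteration_per_drop_scope=100)
for i in range(3):
    # Every iteration produces a differently named intermediate result.
    runner.run(["tmp_%d" % i])
held_before = len(runner.local_scope)

runner.drop_local_exe_scopes()
held_after = len(runner.local_scope)
print(held_before, held_after)
```

In the model, a large drop interval keeps memory growing between drops, and the explicit call releases it without waiting for the interval to elapse.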

Examples

import paddle.fluid as fluid
import numpy
import os

use_cuda = True
# NOTE: If you use the CPU to run the program, you need
# to specify CPU_NUM; otherwise, fluid will use the
# number of logical cores as CPU_NUM. In that case,
# the batch size of the input should be greater than
# CPU_NUM; if not, the process will fail with an
# exception.
if not use_cuda:
    os.environ['CPU_NUM'] = str(2)

train_program = fluid.Program()
startup_program = fluid.Program()
with fluid.program_guard(train_program, startup_program):
    data = fluid.layers.data(name='X', shape=[1], dtype='float32')
    hidden = fluid.layers.fc(input=data, size=10)
    loss = fluid.layers.mean(hidden)

place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(startup_program)

parallel_exe = fluid.ParallelExecutor(use_cuda=use_cuda,
                                      main_program=train_program,
                                      loss_name=loss.name)

x = numpy.random.random(size=(10, 1)).astype('float32')
loss_data, = parallel_exe.run(feed={"X": x},
                              fetch_list=[loss.name])

parallel_exe.drop_local_exe_scopes()

ParamAttr

class paddle.fluid.ParamAttr(name=None, initializer=None, learning_rate=1.0, regularizer=None, trainable=True, gradient_clip=None, do_model_average=False)[source]

Parameter attributes object. To fine-tune the network training process, users can set a parameter’s attributes to control training details, such as the learning rate, regularization, trainable, do_model_average and the method used to initialize the parameter.

Parameters:
  • name (str) – The parameter’s name. Default None.
  • initializer (Initializer) – The method used to initialize this parameter. Default None.
  • learning_rate (float) – The parameter’s learning rate. The learning rate used during optimization is \(global\_lr * parameter\_lr * scheduler\_factor\). Default 1.0.
  • regularizer (WeightDecayRegularizer) – Regularization factor. Default None.
  • trainable (bool) – Whether this parameter is trainable. Default True.
  • gradient_clip (BaseGradientClipAttr) – The method to clip this parameter’s gradient. Default None.
  • do_model_average (bool) – Whether this parameter should do model average. Default False.
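
As a worked example of the learning-rate formula above, the following plain-Python sketch multiplies the three factors. The function effective_lr is a hypothetical helper for illustration, not a Paddle API, and the numeric values are arbitrary.

```python
def effective_lr(global_lr, parameter_lr=1.0, scheduler_factor=1.0):
    """Learning rate applied to one parameter, per the formula above."""
    return global_lr * parameter_lr * scheduler_factor


# An optimizer learning rate of 0.01 combined with
# ParamAttr(learning_rate=0.5) and no scheduler scaling.
lr = effective_lr(global_lr=0.01, parameter_lr=0.5)
print(lr)
```

A per-parameter learning_rate below 1.0 therefore slows that parameter down relative to the rest of the network, and a value above 1.0 speeds it up.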

Examples

import paddle.fluid as fluid

w_param_attrs = fluid.ParamAttr(name="fc_weight",
                                learning_rate=0.5,
                                regularizer=fluid.regularizer.L2Decay(1.0),
                                trainable=True)
x = fluid.layers.data(name='X', shape=[1], dtype='float32')
y_predict = fluid.layers.fc(input=x, size=10, param_attr=w_param_attrs)

Program

class paddle.fluid.Program[source]

Python Program. Beneath it is a ProgramDesc, which is used to create the C++ Program. A program is a self-contained, programming-language-like container. It has at least one Block; when a control flow op such as conditional_block or while_op is included, it will contain nested blocks. Please refer to framework.proto for details.

Notes: default_startup_program and default_main_program exist by default, and the pair shares parameters. The default_startup_program runs only once to initialize parameters; the default_main_program runs in every mini-batch and adjusts the weights.

Returns: An empty program.

Examples

import paddle.fluid as fluid

main_program = fluid.Program()
startup_program = fluid.Program()
with fluid.program_guard(main_program=main_program, startup_program=startup_program):
    x = fluid.layers.data(name="x", shape=[-1, 784], dtype='float32')
    y = fluid.layers.data(name="y", shape=[-1, 1], dtype='int32')
    z = fluid.layers.fc(name="fc", input=x, size=10, act="relu")

print("main program is: {}".format(main_program))
print("start up program is: {}".format(startup_program))
to_string(throw_on_error, with_details=False)[source]

Convert the program to a debug string.

Parameters:
  • throw_on_error (bool) – Raise a ValueError when any required field is not set.
  • with_details (bool) – True if more details about variables and parameters (e.g., trainable, optimize_attr) need to be printed.
Returns:

The debug string.

Return type:

str

Raises:

ValueError – If any required field is not set and throw_on_error is True.

Examples

import paddle.fluid as fluid

prog = fluid.default_main_program()
prog_string = prog.to_string(throw_on_error=True, with_details=False)
print(prog_string)
clone(for_test=False)[source]

Create a new, duplicated program.

Some operators, e.g., batch_norm, behave differently in training and testing. They have an attribute, is_test, to control this behaviour. This method changes their is_test attribute to True when for_test=True.

  • Set for_test to False to clone the program for training.
  • Set for_test to True to clone the program for testing. No pruning is done on the program here, so if you just want a forward program for testing, call clone before using Optimizer.minimize.

Notes: 1. The Program.clone() method DOES NOT clone py_reader. 2. This API DOES NOT prune any operators. Please use clone(for_test=True) before backward and optimization. E.g.

test_program = fluid.default_main_program().clone(for_test=True)
optimizer = fluid.optimizer.Momentum(learning_rate=0.01, momentum=0.9)
optimizer.minimize(loss)  # loss: the loss variable of your network (not defined in this snippet)
Parameters: for_test (bool) – True to change the is_test attribute of operators to True.
Returns: The new, duplicated Program object.
Return type: Program
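
The effect of for_test, and the reason clone should be called before the optimizer adds its operators, can be shown with a toy model. This is a plain-Python sketch in which a program is reduced to a list of dicts; clone_ops is a hypothetical helper and not how Program.clone is implemented.

```python
import copy


def clone_ops(ops, for_test=False):
    """Toy model of clone: deep-copy the ops, optionally flip is_test."""
    cloned = copy.deepcopy(ops)
    if for_test:
        for op in cloned:
            if "is_test" in op["attrs"]:
                op["attrs"]["is_test"] = True
    return cloned


# A program to which the optimizer has already appended its op.
train_ops = [
    {"type": "fc", "attrs": {}},
    {"type": "dropout", "attrs": {"is_test": False}},
    {"type": "sgd", "attrs": {}},
]

test_ops = clone_ops(train_ops, for_test=True)
print([op["type"] for op in test_ops])
print(test_ops[1]["attrs"]["is_test"], train_ops[1]["attrs"]["is_test"])
```

Because nothing is pruned, the clone made after the optimizer step still carries the sgd op; cloning before calling minimize avoids that.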

Examples:

Notes: The order of the Program Descs may be different after clone; this will not affect your training or testing progress. The following example provides a simple method, print_prog(program), that prints the Program Descs in order, so that you get the same printed result after clone:

import paddle.fluid as fluid
import six


def print_prog(prog):
    for name, value in sorted(six.iteritems(prog.block(0).vars)):
        print(value)
    for op in prog.block(0).ops:
        print("op type is {}".format(op.type))
        print("op inputs are {}".format(op.input_arg_names))
        print("op outputs are {}".format(op.output_arg_names))
        for key, value in sorted(six.iteritems(op.all_attrs())):
            if key not in ['op_callstack', 'op_role_var']:
                print(" [ attrs: {}:   {} ]".format(key, value))
  1. To clone a test program, the sample code is:
    import paddle.fluid as fluid
    import six
    
    def print_prog(prog):
        for name, value in sorted(six.iteritems(prog.block(0).vars)):
            print(value)
        for op in prog.block(0).ops:
            print("op type is {}".format(op.type))
            print("op inputs are {}".format(op.input_arg_names))
            print("op outputs are {}".format(op.output_arg_names))
            for key, value in sorted(six.iteritems(op.all_attrs())):
                if key not in ['op_callstack', 'op_role_var']:
                    print(" [ attrs: {}:   {} ]".format(key, value))
    
    train_program = fluid.Program()
    startup_program = fluid.Program()
    with fluid.program_guard(train_program, startup_program):
        with fluid.unique_name.guard():
            img = fluid.layers.data(name='image', shape=[784])
            hidden = fluid.layers.fc(input=img, size=200, act='relu')
            hidden = fluid.layers.dropout(hidden, dropout_prob=0.5)
            loss = fluid.layers.cross_entropy(
                input=fluid.layers.fc(hidden, size=10, act='softmax'),
                label=fluid.layers.data(name='label', shape=[1], dtype='int64'))
            avg_loss = fluid.layers.mean(loss)
            # for_test=True sets is_test on ops such as dropout
            test_program = train_program.clone(for_test=True)
    print_prog(test_program)
    with fluid.program_guard(train_program, startup_program):
        with fluid.unique_name.guard():
            sgd = fluid.optimizer.SGD(learning_rate=1e-3)
            sgd.minimize(avg_loss)
    
  2. The clone method can be avoided if you create the training program and the testing program individually.
    import paddle.fluid as fluid
    import six
    
    def print_prog(prog):
        for name, value in sorted(six.iteritems(prog.block(0).vars)):
            print(value)
        for op in prog.block(0).ops:
            print("op type is {}".format(op.type))
            print("op inputs are {}".format(op.input_arg_names))
            print("op outputs are {}".format(op.output_arg_names))
            for key, value in sorted(six.iteritems(op.all_attrs())):
                if key not in ['op_callstack', 'op_role_var']:
                    print(" [ attrs: {}:   {} ]".format(key, value))
    def network(is_test):
        img = fluid.layers.data(name='image', shape=[784])
        hidden = fluid.layers.fc(input=img, size=200, act='relu')
        hidden = fluid.layers.dropout(hidden, dropout_prob=0.5, is_test=is_test)
        loss = fluid.layers.cross_entropy(
            input=fluid.layers.fc(hidden, size=10, act='softmax'),
            label=fluid.layers.data(name='label', shape=[1], dtype='int64'))
        avg_loss = fluid.layers.mean(loss)
        return avg_loss
    
    
    train_program_2 = fluid.Program()
    startup_program_2 = fluid.Program()
    test_program_2 = fluid.Program()
    with fluid.program_guard(train_program_2, startup_program_2):
        with fluid.unique_name.guard():
            # build the training network before minimizing its loss
            avg_loss = network(is_test=False)
            sgd = fluid.optimizer.SGD(learning_rate=1e-3)
            sgd.minimize(avg_loss)
    # the test startup program is not used.
    with fluid.program_guard(test_program_2, fluid.Program()):
        with fluid.unique_name.guard():
            avg_loss = network(is_test=True)
    print_prog(test_program_2)
    

The two code snippets above will generate and print the same programs.

static parse_from_string(binary_str)[source]

Deserialize a program desc from protobuf binary string.

Notes: All information about parameters will be lost after serialization and deserialization.

Parameters: binary_str (str) – The binary protobuf string.
Returns: A deserialized program desc.
Return type: Program
num_blocks

The number of blocks in this program.

Examples

import paddle.fluid as fluid

prog = fluid.default_main_program()
num_blocks = prog.num_blocks
print(num_blocks)
random_seed

The default random seed for random operators in the Program. Zero means the random seed is obtained from the random device.

Notes: It must be set before the operators have been added.

Examples

import paddle.fluid as fluid

prog = fluid.default_main_program()
random_seed = prog.random_seed
print(random_seed)
prog.random_seed = 1
print(prog.random_seed)
global_block()[source]

Get the first block of this program.

Examples

import paddle.fluid as fluid

prog = fluid.default_main_program()
gb_block = prog.global_block()
print(gb_block)
block(index)[source]

Get the block of this program at the given index.

Parameters: index (int) – The index of the block to get.
Returns: The block at the given index.
Return type: Block

Examples

import paddle.fluid as fluid

prog = fluid.default_main_program()
block_0 = prog.block(0)
print(block_0)
current_block()[source]

Get the current block. The current block is the block to append operators.

Examples

import paddle.fluid as fluid

prog = fluid.default_main_program()
current_blk = prog.current_block()
print(current_blk)
list_vars()[source]

Get all variables from this Program. An iterable object is returned.

Returns: A generator that yields every variable in this program.
Return type: iterable

Examples

import paddle.fluid as fluid

prog = fluid.default_main_program()
img = fluid.layers.data(name='img', shape=[1,28,28], dtype='float32')
label = fluid.layers.data(name='label', shape=[128,1], dtype='int64')
for var in prog.list_vars():
    print(var)

program_guard

paddle.fluid.program_guard(main_program, startup_program=None)[source]

Change the global main program and startup program with a Python “with” statement. Layer functions called inside the “with” block will append operators and variables to the new main program.

Examples

import paddle.fluid as fluid

main_program = fluid.Program()
startup_program = fluid.Program()
with fluid.program_guard(main_program, startup_program):
    data = fluid.layers.data(name='image', shape=[784, 784], dtype='float32')
    hidden = fluid.layers.fc(input=data, size=10, act='relu')

Notes: A temporary Program can be passed if the user does not need to construct either the startup program or the main program.

Examples

import paddle.fluid as fluid

main_program = fluid.Program()
# does not care about startup program. Just pass a temporary value.
with fluid.program_guard(main_program, fluid.Program()):
    data = fluid.layers.data(name='image', shape=[784, 784], dtype='float32')
Parameters:
  • main_program (Program) – New main program inside “with” statement.
  • startup_program (Program) – New startup program inside “with” statement. None means not changing startup program.
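
The swap-and-restore behaviour of such a guard can be sketched with a plain-Python context manager. This only illustrates the pattern and is not Paddle's implementation: main_guard, current_main and the _state dict are hypothetical names, and strings stand in for Program objects.

```python
import contextlib

# Stand-in for the global default main program.
_state = {"main": "default_main"}


def current_main():
    """Return whichever program layer functions would currently append to."""
    return _state["main"]


@contextlib.contextmanager
def main_guard(new_main):
    """Swap the default program for the duration of the with block."""
    previous = _state["main"]
    _state["main"] = new_main
    try:
        yield
    finally:
        # Restore the previous default even if the block raises.
        _state["main"] = previous


with main_guard("my_main"):
    inside = current_main()
after = current_main()
print(inside, after)
```

The restore in the finally clause is what makes the change local to the block, so code after the “with” statement sees the original default again.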

release_memory

paddle.fluid.release_memory(input_program, skip_opt_set=None)[source]

Modify the input program and insert delete_op to drop unused variables early. The modification is performed in place.

Notes: This is an experimental API and may be removed in the next few releases. Users should not use this API.

Parameters:
  • input_program (Program) – The program into which delete_op will be inserted.
  • skip_opt_set (set) – Variables that will be skipped in memory optimization.
Returns:

None

Examples

import paddle.fluid as fluid

# build network
# ...

# deprecated API
fluid.release_memory(fluid.default_main_program())

scope_guard

paddle.fluid.scope_guard(scope)[source]

Change the global/default scope instance with a Python “with” statement. All variables at runtime will be assigned to the new scope.

Parameters: scope – The new global/default scope.

Examples

import paddle.fluid as fluid
import numpy

new_scope = fluid.Scope()
with fluid.scope_guard(new_scope):
    fluid.global_scope().var("data").get_tensor().set(numpy.ones((2, 2)), fluid.CPUPlace())
numpy.array(new_scope.find_var("data").get_tensor())

Tensor

paddle.fluid.Tensor

alias of LoDTensor

WeightNormParamAttr

class paddle.fluid.WeightNormParamAttr(dim=None, name=None, initializer=None, learning_rate=1.0, regularizer=None, trainable=True, gradient_clip=None, do_model_average=False)[source]

Used for weight Norm. Weight Norm is a reparameterization of the weight vectors in a neural network that decouples the length of those weight vectors from their direction. Weight Norm has been implemented as discussed in this paper: Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks.
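
The reparameterization from the cited paper writes a weight vector as w = g * v / ||v||, where the scalar g carries the length and v carries the direction. The following plain-Python sketch applies it to a single vector; weight_norm is a hypothetical helper, and how dim selects the axes of a weight tensor in Paddle is not modelled here.

```python
import math


def weight_norm(v, g):
    """Return w = g * v / ||v|| for one weight vector (list of floats)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [g * x / norm for x in v]


v = [3.0, 4.0]            # direction parameter, ||v|| = 5
w = weight_norm(v, g=2.0)

# The length of w is set by g alone; rescaling v leaves w unchanged.
w_scaled = weight_norm([30.0, 40.0], g=2.0)
length = math.sqrt(sum(x * x for x in w))
print(w, w_scaled, length)
```

Since the length of w equals g regardless of the scale of v, the optimizer can adjust length and direction independently, which is the decoupling the description refers to.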

Parameters:
  • dim (list) – The dimension(s) over which to compute the norm. Default None.
  • name (str) – The parameter’s name. Default None.
  • initializer (Initializer) – The method used to initialize this parameter. Default None.
  • learning_rate (float) – The parameter’s learning rate. The learning rate used during optimization is \(global\_lr * parameter\_lr * scheduler\_factor\). Default 1.0.
  • regularizer (WeightDecayRegularizer) – Regularization factor. Default None.
  • trainable (bool) – Whether this parameter is trainable. Default True.
  • gradient_clip (BaseGradientClipAttr) – The method to clip this parameter’s gradient. Default None.
  • do_model_average (bool) – Whether this parameter should do model average. Default False.

Examples

import paddle.fluid as fluid
data = fluid.layers.data(name="data", shape=[3, 32, 32], dtype="float32")
fc = fluid.layers.fc(input=data,
                     size=1000,
                     param_attr=fluid.WeightNormParamAttr(
                          dim=None,
                          name='weight_norm_param'))