# Take NumPy Array as Training Data

PaddlePaddle Fluid supports configuring a data layer with fluid.layers.data() . You can then use a NumPy array, or build a C++ fluid.LoDTensor directly from Python, and feed it to fluid.Executor or fluid.ParallelExecutor through Executor.run(feed=...) .

## Configure Data Layer

With fluid.layers.data() , you can configure a data layer in the neural network. Details are as follows:

import paddle.fluid as fluid

image = fluid.layers.data(name="image", shape=[3, 224, 224])
label = fluid.layers.data(name="label", shape=[1], dtype="int64")

# use image/label as layer input
prediction = fluid.layers.fc(input=image, size=1000, act="softmax")
loss = fluid.layers.cross_entropy(input=prediction, label=label)
...


In the code above, image and label are two input data layers created by fluid.layers.data . image is float32 data of shape [3, 224, 224] ; label is int64 data of shape [1] . Note that:

1. In Fluid, -1 represents the batch size dimension by default, and it is prepended to the first dimension of shape automatically. Therefore, in the code above it is fine to feed a NumPy array of shape [32, 3, 224, 224] to image . If you want to customize the position of the batch size dimension, set fluid.layers.data(append_batch_size=False) . Please refer to the tutorial in the advanced user guide: Customize the BatchSize dimension .
2. Category labels in Fluid have data type int64 and start from 0. For the supported data types, please refer to Data types supported by Fluid .
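The shape rule in note 1 can be sketched with plain NumPy, without Paddle installed: the declared shape gains a leading -1 batch dimension, so an array with any batch size matches as long as the remaining dimensions agree. The variable names below are illustrative, not part of the Fluid API.

```python
import numpy as np

declared_shape = [3, 224, 224]      # shape passed to fluid.layers.data
full_shape = [-1] + declared_shape  # Fluid prepends -1 for the batch size

# a batch of 32 samples: the non-batch dimensions match declared_shape
batch = np.random.random(size=(32, 3, 224, 224)).astype('float32')
assert list(batch.shape[1:]) == declared_shape

# labels must be int64 and start from 0 (note 2)
labels = np.random.randint(0, 1000, size=(32, 1)).astype('int64')
assert labels.dtype == np.int64 and labels.min() >= 0
```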

## Transfer Train Data to Executor

Both Executor.run and ParallelExecutor.run receive a parameter feed . The parameter is a Python dict. Its keys are the names of data layers, such as image in the code above, and its values are the corresponding NumPy arrays.

For example:

import numpy
import paddle.fluid as fluid

exe = fluid.Executor(fluid.CPUPlace())
# run the startup program to initialize parameters
exe.run(fluid.default_startup_program())
exe.run(feed={
    "image": numpy.random.random(size=(32, 3, 224, 224)).astype('float32'),
    "label": numpy.random.random(size=(32, 1)).astype('int64')
})


### How to Feed Sequence Data

Sequence data is a unique data type supported by PaddlePaddle Fluid; it is represented by the LoDTensor input data type.

You need to:

1. Feed all data to be trained in a mini-batch.
2. Get the length of each sequence.

You can use fluid.create_lod_tensor to create a LoDTensor .

To feed sequence information, you need to set the sequence nesting depth lod_level .

For instance, if the training data are sentences consisting of words, lod_level=1 ; if the training data are paragraphs consisting of sentences, which in turn consist of words, lod_level=2 .
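The nesting rule for lod_level=2 can be sketched with plain NumPy. The structure below is a hypothetical illustration, not Fluid API output: the outer level counts sentences per paragraph, the inner level counts words per sentence, and the inner lengths must sum to the number of rows in the data.

```python
import numpy as np

# two paragraphs: the first has 2 sentences, the second has 1
# sentence lengths: [3, 2] for paragraph 1, [4] for paragraph 2
recursive_seq_lens = [[2, 1], [3, 2, 4]]

# the outer level's entries must sum to the number of inner-level entries
assert sum(recursive_seq_lens[0]) == len(recursive_seq_lens[1])

# the inner level's entries must sum to the total number of words
total_words = sum(recursive_seq_lens[-1])
data = np.arange(total_words, dtype='int64').reshape(-1, 1)
assert len(data) == total_words
```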

For example:

sentence = fluid.layers.data(name="sentence", dtype="int64", shape=[1], lod_level=1)

...

exe.run(feed={
    "sentence": fluid.create_lod_tensor(
        data=numpy.array([1, 3, 4, 5, 3, 6, 8], dtype='int64').reshape(-1, 1),
        recursive_seq_lens=[[4, 1, 2]],
        place=fluid.CPUPlace()
    )
})


The training data sentence contains three sequences, whose lengths are 4, 1, and 2 respectively.

They are data[0:4], data[4:5], and data[5:7] respectively.
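The mapping from sequence lengths to those slices can be checked with plain NumPy, no Paddle needed: cumulative sums of the lengths give the slice offsets.

```python
import numpy as np

data = np.array([1, 3, 4, 5, 3, 6, 8], dtype='int64').reshape(-1, 1)
seq_lens = [4, 1, 2]  # recursive_seq_lens[0] for lod_level=1

# cumulative offsets: [0, 4, 5, 7]
offsets = np.concatenate([[0], np.cumsum(seq_lens)])
sequences = [data[offsets[i]:offsets[i + 1]] for i in range(len(seq_lens))]

assert [len(s) for s in sequences] == [4, 1, 2]
assert sequences[0].flatten().tolist() == [1, 3, 4, 5]  # data[0:4]
assert sequences[2].flatten().tolist() == [6, 8]        # data[5:7]
```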

### How to prepare training data for every device in ParallelExecutor

When you feed data to ParallelExecutor.run(feed=...) , you can explicitly assign data to every training device (such as a GPU).

You need to pass a list to feed . Each element of the list is a dict.

The keys of the dict are the names of data layers and the values are the data for those layers.

For example:

import numpy
import paddle.fluid as fluid

parallel_executor = fluid.ParallelExecutor(use_cuda=True)
parallel_executor.run(
    feed=[
        {
            "image": numpy.random.random(size=(32, 3, 224, 224)).astype('float32'),
            "label": numpy.random.random(size=(32, 1)).astype('int64')
        },
        {
            "image": numpy.random.random(size=(16, 3, 224, 224)).astype('float32'),
            "label": numpy.random.random(size=(16, 1)).astype('int64')
        },
    ]
)


In the code above, GPU0 will train 32 samples and GPU1 will train 16 samples.
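In practice you often start from one large batch and split it into per-device dicts. A minimal NumPy sketch of that split, matching the 32/16 division above (the feed_list name and the split sizes are illustrative, not part of the Fluid API):

```python
import numpy as np

# a full batch of 48 samples, to be divided between two devices
images = np.random.random(size=(48, 3, 224, 224)).astype('float32')
labels = np.random.randint(0, 1000, size=(48, 1)).astype('int64')

# per-device batch sizes: 32 for GPU0, 16 for GPU1
splits = [32, 16]
offsets = np.concatenate([[0], np.cumsum(splits)])

# one feed dict per device, sliced out of the full batch
feed_list = [
    {"image": images[offsets[i]:offsets[i + 1]],
     "label": labels[offsets[i]:offsets[i + 1]]}
    for i in range(len(splits))
]

assert feed_list[0]["image"].shape[0] == 32
assert feed_list[1]["image"].shape[0] == 16
```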

### Customize the BatchSize dimension

Batch size is the first dimension of data by default in PaddlePaddle Fluid, indicated by -1 . But in advanced usage, the batch size could be fixed, or represented by another dimension or by multiple dimensions. This can be achieved by setting fluid.layers.data(append_batch_size=False) .

1. fixed BatchSize dimension

image = fluid.layers.data(name="image", shape=[32, 784], append_batch_size=False)


Here image is always a matrix with size of [32, 784] .

2. batch size expressed by another dimension

sentence = fluid.layers.data(name="sentence",
                             shape=[80, -1, 1],
                             append_batch_size=False,
                             dtype="int64")


Here the middle dimension of sentence is the batch size. This data layout is used in fixed-length recurrent neural networks.
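Data is usually stored batch-major, so it must be rearranged into this time-major layout before feeding. A NumPy sketch of that transpose, assuming 4 sentences each padded to the fixed length of 80 steps (the sizes are illustrative):

```python
import numpy as np

# batch-major data: 4 sentences, each padded to 80 time steps, one id per step
batch_major = np.random.randint(0, 100, size=(4, 80, 1)).astype('int64')

# the layer above expects time-major layout [80, batch, 1], so transpose
time_major = batch_major.transpose(1, 0, 2)
assert time_major.shape == (80, 4, 1)

# step t of sentence b is the same element in both layouts
assert (time_major[5, 2] == batch_major[2, 5]).all()
```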