Performance Profiling for TensorRT Library

Test Environment

  • CPU: Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz
  • GPU: Tesla P4
  • TensorRT 4.0, CUDA 8.0, cuDNN v7
  • Test models: ResNet50, MobileNet, ResNet101, Inception V3

Test Targets

PaddlePaddle, PyTorch, TensorFlow

  • In these tests, PaddlePaddle integrates TensorRT through subgraph optimization.
  • PyTorch uses its native implementation. Model address 1, address 2.
  • The TensorFlow tests cover both native TF and TF-TRT. The TF-TRT results have not yet met expectations and will be added later. Model address.
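
The measurement procedure behind the latency figures is not described here. As a rough illustration only, the following sketch shows one common way to collect a per-batch latency in milliseconds: a few warm-up runs, then the mean over repeated timed runs. The function name `measure_latency_ms`, the `warmup` and `repeats` defaults, and the placeholder workload are all assumptions, not the harness used for these tests.

```python
import time
from statistics import mean


def measure_latency_ms(run_once, warmup=10, repeats=100):
    """Return the mean wall-clock time of run_once() in milliseconds.

    run_once is a hypothetical zero-argument callable that performs one
    forward pass for a fixed batch size. For GPU inference it must block
    until the result is ready (for example by copying the output back to
    the host); otherwise only the kernel launch time is measured.
    """
    # Warm-up runs are excluded from timing so that one-time costs
    # (memory allocation, engine building, caches) do not skew the mean.
    for _ in range(warmup):
        run_once()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    return mean(samples)


if __name__ == "__main__":
    # Placeholder CPU workload standing in for a real model call.
    def fake_inference():
        sum(i * i for i in range(10_000))

    print(f"mean latency: {measure_latency_ms(fake_inference):.3f} ms")
```

In a real run, `run_once` would wrap the framework's predictor call for one input batch, and the measurement would be repeated for each batch size (1, 5, 10).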

ResNet50

batch_size  PaddlePaddle (ms)  PyTorch (ms)  TensorFlow (ms)
1           4.64117            16.3          10.878
5           6.90622            22.9          20.62
10          7.9758             40.6          34.36

MobileNet

batch_size  PaddlePaddle (ms)  PyTorch (ms)  TensorFlow (ms)
1           1.7541             7.8           2.72
5           3.04666            7.8           3.19
10          4.19478            14.47         4.25

ResNet101

batch_size  PaddlePaddle (ms)  PyTorch (ms)  TensorFlow (ms)
1           8.95767            22.48         18.78
5           12.9811            33.88         34.84
10          14.1463            61.97         57.94

Inception V3

batch_size  PaddlePaddle (ms)  PyTorch (ms)  TensorFlow (ms)
1           15.1613            24.2          19.1
5           18.5373            34.8          27.2
10          19.2781            54.8          36.7
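
To make the four tables easier to compare, the latencies can be turned into speedup ratios relative to PaddlePaddle. The sketch below copies the values from the tables; the names `LATENCY_MS` and `speedups` are introduced here for illustration only.

```python
# Latencies in ms copied from the tables above.
# Each tuple is (PaddlePaddle, PyTorch, TensorFlow), keyed by batch size.
LATENCY_MS = {
    "ResNet50": {
        1: (4.64117, 16.3, 10.878),
        5: (6.90622, 22.9, 20.62),
        10: (7.9758, 40.6, 34.36),
    },
    "MobileNet": {
        1: (1.7541, 7.8, 2.72),
        5: (3.04666, 7.8, 3.19),
        10: (4.19478, 14.47, 4.25),
    },
    "ResNet101": {
        1: (8.95767, 22.48, 18.78),
        5: (12.9811, 33.88, 34.84),
        10: (14.1463, 61.97, 57.94),
    },
    "Inception V3": {
        1: (15.1613, 24.2, 19.1),
        5: (18.5373, 34.8, 27.2),
        10: (19.2781, 54.8, 36.7),
    },
}


def speedups(latency_ms):
    """Return (model, batch_size, pytorch/paddle, tensorflow/paddle) rows."""
    rows = []
    for model, by_batch in latency_ms.items():
        for batch_size, (paddle, pytorch, tensorflow) in by_batch.items():
            rows.append((model, batch_size, pytorch / paddle, tensorflow / paddle))
    return rows


for model, batch_size, vs_pytorch, vs_tf in speedups(LATENCY_MS):
    print(
        f"{model:<12} batch={batch_size:<2} "
        f"PyTorch/Paddle={vs_pytorch:.2f}x  TensorFlow/Paddle={vs_tf:.2f}x"
    )
```

A ratio above 1 means PaddlePaddle with TensorRT was faster for that model and batch size. In these measurements every ratio is above 1, with the smallest margin on MobileNet against TensorFlow.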