Hello everyone!
I am getting inconsistent results when running inference with TensorRT. I converted a TensorFlow (Keras) model following the example from the NVIDIA DevBlog:
My goal is to deploy an application that uses TensorRT on the Jetson TX2. Unfortunately there is no Python API available for TensorRT on Jetson. I have modified this example to fit my needs:
There are already small differences between the prediction output of the Keras model and the TensorRT (Python) results after conversion. When running the image classification application (C++), the results differ from the first two as well. I honestly have no idea what causes these differences, but some of them are substantial.
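For what it's worth, a quick way to quantify how far two outputs drift apart is to compare them element-wise. This is only a minimal sketch with made-up placeholder prediction values (not my actual results), but it shows the kind of check I am doing:

```python
# Hypothetical sketch: quantify the deviation between two prediction
# vectors for the same input image (e.g. Keras vs. TensorRT output).
# The lists below are made-up placeholders, not real model outputs.

def max_abs_diff(a, b):
    """Return the largest element-wise absolute difference."""
    assert len(a) == len(b), "outputs must have the same length"
    return max(abs(x - y) for x, y in zip(a, b))

keras_out = [0.01, 0.02, 0.95, 0.02]  # placeholder Keras softmax output
trt_out   = [0.01, 0.03, 0.94, 0.02]  # placeholder TensorRT softmax output

diff = max_abs_diff(keras_out, trt_out)
print(f"max abs diff: {diff:.4f}")
```

My understanding is that tiny FP32 rounding noise (on the order of 1e-5 or less) is expected between frameworks; differences much larger than that would point to something else, such as a preprocessing mismatch.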
A simple example can be found here:
For testing purposes, all of this is run on an x64 host!
Log output of “train.py”:
log_train_py.txt
Using TensorFlow backend.
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Use the retry module or similar alternatives.
2018-04-25 21:27:47.392990: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-04-25 21:27:47.527610: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-04-25 21:27:47.528025: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
name: Quadro M500M major: 5 minor: 0 memoryClockRate(GHz): 1.124
pciBusID: 0000:06:00.0
totalMemory: 1.96GiB freeMemory: 1.55GiB
Log output of “convert.py”:
Log output of “test_keras.py”:
log_test_keras_py.txt
Using TensorFlow backend.
Loading network...
2018-04-25 21:40:00.962271: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-04-25 21:40:01.086140: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-04-25 21:40:01.086521: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
name: Quadro M500M major: 5 minor: 0 memoryClockRate(GHz): 1.124
pciBusID: 0000:06:00.0
totalMemory: 1.96GiB freeMemory: 1.55GiB
2018-04-25 21:40:01.086540: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-04-25 21:40:01.696561: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
Log output of “test_trt.py”:
Log output of “test_classifier.py”:
Thank you!
Greetings,
Mario
We created a new “Deep Learning Training and Inference” section in Devtalk to improve the experience for deep learning, accelerated computing, and HPC users:
https://devtalk.nvidia.com/default/board/301/deep-learning-training-and-inference-/
We are moving active deep learning threads to the new section.
URLs for topics will not change with the re-categorization, so your bookmarks and links will continue to work as before.
-Siddharth