TensorFlow inference result is wrong on TX2 GPU

I trained a model on a server with TensorFlow and ran inference on the GPU of a TX2, but got a wrong result; the correct result can be obtained on the TX2's CPU. My TensorFlow version is 1.8, from https://nvidia.box.com/v/TF180-Py35-wTRT, using CUDA 9.0 and cuDNN 7.1.5 from JetPack 3.3. Can you help me solve this problem? Thank you!


Do you use the same TensorFlow version as in the training process?

We found that some implementations (likely the GPU implementation in your use case) differ between versions.
Could you first check whether this issue still occurs when inference uses the same TensorFlow version as training?



Following your suggestion, I set the TensorFlow version used to train the model to 1.11.0, and the version on the TX2 is also 1.11.0, from https://nvidia.box.com/v/JP33-TF1-11-0-py35-wTRT. But inference on the TX2's GPU still yields the wrong result, while specifying the device with tf.device('/cpu:0') and running inference on the TX2's CPU gives the correct result.
The server environment for training the model:
Ubuntu16.04, CUDA9.0 cudnn7.3.0 tensorflow-gpu==1.11.0
TX2 environment:
Ubuntu16.04, CUDA9.0 cudnn7.1.5 from JetPack3.3, tensorflow==1.11.0 from https://nvidia.box.com/v/JP33-TF1-11-0-py35-wTRT.
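To quantify how far apart the CPU and GPU results are, one minimal sketch (assuming both outputs have been fetched into NumPy arrays; the helper name `max_abs_diff` is hypothetical, not from the original code) is:

```python
import numpy as np

def max_abs_diff(cpu_out, gpu_out):
    """Return the largest element-wise absolute difference between two outputs."""
    cpu_out = np.asarray(cpu_out, dtype=np.float64)
    gpu_out = np.asarray(gpu_out, dtype=np.float64)
    return float(np.max(np.abs(cpu_out - gpu_out)))

# Example: identical outputs differ by 0; a corrupted output does not.
a = np.array([0.0, 1.0, 2.0])
b = np.array([0.0, 1.0, 2.5])
print(max_abs_diff(a, a))  # 0.0
print(max_abs_diff(a, b))  # 0.5
```

A genuine kernel bug typically shows a large difference here, whereas ordinary float32 rounding between devices stays within roughly 1e-5.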

My image preprocessing code is as follows. I visualized the preprocessed image and it is correct. Inference on the TX2's CPU gives the correct result; only the GPU gives the wrong result, which is very strange.
import cv2
import numpy as np

image = cv2.imread(alltestPath, -1)
save_image = cv2.imread(alltestPath, -1)
image = cv2.resize(image, (self.imageWidth, self.imageHight))
image0 = image
save_image = cv2.resize(save_image, (self.imageWidth, self.imageHight))
h, w, c = save_image.shape
image = (image * 1.0 / 255) * 2 - 1   # normalize pixels to [-1, 1]
image = np.expand_dims(image, 0)      # add batch dimension
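The preprocessing above maps 8-bit pixel values from [0, 255] into [-1, 1]. A quick self-contained check of that mapping (using a dummy array in place of cv2.imread) looks like:

```python
import numpy as np

# Dummy uint8 "image" covering the full pixel range.
img = np.array([[0, 127, 255]], dtype=np.uint8)

# Same normalization as in the preprocessing code above.
norm = (img * 1.0 / 255) * 2 - 1

print(norm.min(), norm.max())  # -1.0 1.0
```

Pixel 0 maps to -1.0 and pixel 255 maps to 1.0, confirming the inputs fed to the model are in the expected range on both devices.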

I need your help, thank you!

Hi AastaLLL
Is there any way to solve my problem?
Thank you!


Sorry for keeping you waiting.

There is a known issue in TensorFlow on TX2:

If there is a batch_to_space_nd() layer inside your model, TensorFlow may handle it incorrectly on the GPU.
To check this, you can execute your application with cuda-memcheck and see whether any errors are reported:


$ cuda-memcheck python myApp.py
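Another way to confirm whether the model contains this op, without running it, is to scan the node types of the graph. A minimal sketch (the TF 1.x GraphDef loading is shown only as comments, and `find_op_nodes` is a hypothetical helper that just scans a list of (name, op_type) pairs):

```python
def find_op_nodes(nodes, target="BatchToSpaceND"):
    """Return names of graph nodes whose op type matches `target`.

    `nodes` is an iterable of (name, op_type) pairs, which for a
    TF 1.x frozen graph could be built like:
        # import tensorflow as tf
        # gd = tf.GraphDef()
        # with open("frozen_model.pb", "rb") as f:
        #     gd.ParseFromString(f.read())
        # nodes = [(n.name, n.op) for n in gd.node]
    """
    return [name for name, op in nodes if op == target]

# Example with a hand-made node list:
nodes = [("conv1", "Conv2D"),
         ("b2s", "BatchToSpaceND"),
         ("out", "Softmax")]
print(find_op_nodes(nodes))  # ['b2s']
```

An empty result means the suspect op is not in the graph and the discrepancy must come from somewhere else.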


Hi AastaLLL,
Thank you for your help. According to your guidance, I checked my code. There is a batch_to_space_nd() layer inside my model. Do you have any solution?


There are two possible solutions for your reference.

1) This issue is fixed on Jetson Xavier.

2) Fall back to CUDA 8.0, which is included in JetPack 3.1.