[Environment]
JetPack 3.3, with the CUDA and cuDNN versions it ships with
Python 3.5.2
TensorFlow 1.9.0, installed via:
pip3 install --extra-index-url https://developer.download.nvidia.com/compute/redist/jp33 tensorflow-gpu
I tried to run my semantic segmentation network on a TX2, but the GPU produces incorrect output.
Using only the CPU, the segmentation result is correct:
https://image.ibb.co/mi0EnA/cpu-capture.png
Using the GPU, the result is wrong:
https://image.ibb.co/f1adfV/gpu-capture.png
By tracing the internal tensors in the model, I found that the CPU and GPU results first differ at the SpaceToBatchND operation. Below is part of the list of operations in my TF model; the results are wrong starting from line 24 of this list:
fres9/conv_a_1x3/Relu
fres9/conv_b_3x1/weights/Initializer/random_uniform/shape
fres9/conv_b_3x1/weights/Initializer/random_uniform/min
fres9/conv_b_3x1/weights/Initializer/random_uniform/max
fres9/conv_b_3x1/weights/Initializer/random_uniform/RandomUniform
fres9/conv_b_3x1/weights/Initializer/random_uniform/sub
fres9/conv_b_3x1/weights/Initializer/random_uniform/mul
fres9/conv_b_3x1/weights/Initializer/random_uniform
fres9/conv_b_3x1/weights
fres9/conv_b_3x1/weights/Assign
fres9/conv_b_3x1/weights/read
fres9/conv_b_3x1/biases/Initializer/zeros
fres9/conv_b_3x1/biases
fres9/conv_b_3x1/biases/Assign
fres9/conv_b_3x1/biases/read
fres9/conv_b_3x1/dilation_rate
fres9/conv_b_3x1/filter_shape
fres9/conv_b_3x1/stack
fres9/conv_b_3x1/required_space_to_batch_paddings/input_shape
fres9/conv_b_3x1/required_space_to_batch_paddings/paddings
fres9/conv_b_3x1/required_space_to_batch_paddings/crops
fres9/conv_b_3x1/SpaceToBatchND/block_shape
fres9/conv_b_3x1/SpaceToBatchND/paddings
fres9/conv_b_3x1/SpaceToBatchND # GPU result is wrong here
fres9/conv_b_3x1/Conv2D
fres9/conv_b_3x1/BatchToSpaceND/block_shape
fres9/conv_b_3x1/BatchToSpaceND/crops
fres9/conv_b_3x1/BatchToSpaceND
fres9/conv_b_3x1/BiasAdd
fres9/conv_b_3x1/Relu
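For anyone who wants to repeat the tensor-by-tensor comparison, here is a minimal sketch of one way to find the first op whose output differs. It assumes the CPU and GPU outputs have already been fetched into two dicts keyed by op name; the function name `first_divergence` and the tolerance `atol` are hypothetical and are not taken from the linked code.

```python
import numpy as np

def first_divergence(names, cpu_values, gpu_values, atol=1e-4):
    """Return (op_name, max_abs_diff) for the first differing tensor, or None.

    names      -- op names in graph order (e.g. the list above)
    cpu_values -- dict: op name -> array fetched from the CPU run
    gpu_values -- dict: op name -> array fetched from the GPU run
    atol       -- assumed absolute tolerance; tune it for your model
    """
    for name in names:
        cpu = np.asarray(cpu_values[name], dtype=np.float64)
        gpu = np.asarray(gpu_values[name], dtype=np.float64)
        if cpu.shape != gpu.shape:
            # A shape mismatch is reported as an infinite difference.
            return name, float("inf")
        diff = float(np.max(np.abs(cpu - gpu))) if cpu.size else 0.0
        if diff > atol:
            return name, diff
    return None
```

Small floating-point differences between CPU and GPU kernels are normal, so the tolerance should be loose enough to ignore rounding noise but tight enough to catch a genuinely wrong result.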
You can see the difference (left: CPU; right: GPU):
https://image.ibb.co/hLbOEq/diff801.png
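To decide independently which device is wrong, the output of SpaceToBatchND can be checked against a plain NumPy reference. The sketch below follows the documented semantics of the op for a 4-D NHWC tensor with a 2-D block shape (zero-pad, reshape, transpose, reshape). The function name `space_to_batch_nd_ref` is hypothetical, and since the dilation rate is not listed above, any block shape you pass in is your own input rather than a value from my model.

```python
import numpy as np

def space_to_batch_nd_ref(x, block_shape, paddings):
    """NumPy reference for SpaceToBatchND on an NHWC tensor.

    x           -- array of shape [batch, height, width, channels]
    block_shape -- (block_h, block_w)
    paddings    -- ((pad_top, pad_bottom), (pad_left, pad_right))
    """
    x = np.asarray(x)
    bh, bw = block_shape
    (pt, pb), (pl, pr) = paddings
    # Zero-pad the two spatial dimensions only.
    x = np.pad(x, ((0, 0), (pt, pb), (pl, pr), (0, 0)), mode="constant")
    n, h, w, c = x.shape
    if h % bh != 0 or w % bw != 0:
        raise ValueError("padded spatial dims must be divisible by block_shape")
    # Split each spatial dim into (number of blocks, offset inside block).
    x = x.reshape(n, h // bh, bh, w // bw, bw, c)
    # Move the in-block offsets in front of the batch dimension.
    x = x.transpose(2, 4, 0, 1, 3, 5)
    # Fold the offsets into the batch dimension.
    return x.reshape(n * bh * bw, h // bh, w // bw, c)
```

Feeding the input of `fres9/conv_b_3x1/SpaceToBatchND` together with the fetched `block_shape` and `paddings` tensors through this function, and comparing against both the CPU and GPU outputs of the op, should show which of the two deviates.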
If you would be willing to reproduce this problem, the code can be found at https://drive.google.com/file/d/1GqaKeqX49dr0NWy6dKpa66uwyFot7OEJ/view?usp=sharing. Simply run
python3 cpu-main.py
and
python3 gpu-main.py
to see the difference.
Thank you!
See also https://devtalk.nvidia.com/default/topic/1037898/tensorflow-batch_to_space_nd-not-working-for-large-channel-sizes-on-tx2/. In that post, however, the failure is attributed to the dimensions being large (shape=(4, 35, 35, 543)), while in my case the shape is (4, 36, 60, 128).
Your staff mentioned that NVIDIA will fix this problem, but how long will it take before it works?