TensorFlow OP SpaceToBatchND Does Not Work Correctly on TX2

[Environment]
JetPack 3.3, shipped with CUDA and cuDNN
Python 3.5.2
TensorFlow 1.9.0, installed via:

pip3 install --extra-index-url https://developer.download.nvidia.com/compute/redist/jp33 tensorflow-gpu

I tried to run my semantic segmentation network on the TX2 and ran into an unexpected error.
Using only the CPU, the segmentation result is correct:
https://image.ibb.co/mi0EnA/cpu-capture.png
Using the GPU, the result is:
https://image.ibb.co/f1adfV/gpu-capture.png

By tracking the internal tensors of the model, I found that the CPU and GPU results diverge at the SpaceToBatchND operation. Below is the relevant part of the operation list from my TF model; the results start to differ at the SpaceToBatchND op (marked below):

fres9/conv_a_1x3/Relu 
fres9/conv_b_3x1/weights/Initializer/random_uniform/shape
fres9/conv_b_3x1/weights/Initializer/random_uniform/min
fres9/conv_b_3x1/weights/Initializer/random_uniform/max
fres9/conv_b_3x1/weights/Initializer/random_uniform/RandomUniform
fres9/conv_b_3x1/weights/Initializer/random_uniform/sub
fres9/conv_b_3x1/weights/Initializer/random_uniform/mul
fres9/conv_b_3x1/weights/Initializer/random_uniform
fres9/conv_b_3x1/weights  
fres9/conv_b_3x1/weights/Assign
fres9/conv_b_3x1/weights/read
fres9/conv_b_3x1/biases/Initializer/zeros
fres9/conv_b_3x1/biases
fres9/conv_b_3x1/biases/Assign
fres9/conv_b_3x1/biases/read
fres9/conv_b_3x1/dilation_rate
fres9/conv_b_3x1/filter_shape
fres9/conv_b_3x1/stack
fres9/conv_b_3x1/required_space_to_batch_paddings/input_shape
fres9/conv_b_3x1/required_space_to_batch_paddings/paddings
fres9/conv_b_3x1/required_space_to_batch_paddings/crops
fres9/conv_b_3x1/SpaceToBatchND/block_shape
fres9/conv_b_3x1/SpaceToBatchND/paddings
fres9/conv_b_3x1/SpaceToBatchND # GPU result is wrong here 
fres9/conv_b_3x1/Conv2D
fres9/conv_b_3x1/BatchToSpaceND/block_shape
fres9/conv_b_3x1/BatchToSpaceND/crops
fres9/conv_b_3x1/BatchToSpaceND
fres9/conv_b_3x1/BiasAdd
fres9/conv_b_3x1/Relu
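
To confirm which device is producing the wrong tensor, the op's documented semantics can be reimplemented in a few lines of NumPy and compared against both the CPU and GPU outputs. This is a minimal NHWC sketch of SpaceToBatchND (an independent reference, not the actual TF kernel):

```python
import numpy as np

def space_to_batch_nd(x, block_shape, paddings):
    """Reference NHWC SpaceToBatchND following the op's documented semantics."""
    b, h, w, c = x.shape
    bh, bw = block_shape
    (pt, pb), (pl, pr) = paddings
    # 1. Zero-pad the spatial dimensions.
    x = np.pad(x, ((0, 0), (pt, pb), (pl, pr), (0, 0)), mode="constant")
    hp, wp = h + pt + pb, w + pl + pr
    # 2. Split each spatial dim into (outer position, in-block offset).
    x = x.reshape(b, hp // bh, bh, wp // bw, bw, c)
    # 3. Move the in-block offsets in front of the batch dimension.
    x = x.transpose(2, 4, 0, 1, 3, 5)
    # 4. Fold the offsets into the batch dimension.
    return x.reshape(bh * bw * b, hp // bh, wp // bw, c)
```

Running session.run() on fres9/conv_b_3x1/SpaceToBatchND for a fixed input and diffing against this reference makes it easy to tell whether the CPU or the GPU kernel is the one that deviates.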

You can see the difference (left: CPU; right: GPU):
https://image.ibb.co/hLbOEq/diff801.png

To reproduce this problem, the code can be found at https://drive.google.com/file/d/1GqaKeqX49dr0NWy6dKpa66uwyFot7OEJ/view?usp=sharing; simply run

python3 cpu-main.py

and

python3 gpu-main.py

to see the difference.

Thank you!

See also https://devtalk.nvidia.com/default/topic/1037898/tensorflow-batch_to_space_nd-not-working-for-large-channel-sizes-on-tx2/, but in that post the cause was large dimensions (shape=(4, 35, 35, 543)), while in my case the shape is only (4, 36, 60, 128).
Your staff mentioned that NVIDIA would solve this problem, but how long will it take to get it working?

Hi,

As you said, this is a known issue. We have fixed it on Xavier, but the fix is not available for TX2.

There is a workaround worth trying:
We found that TensorFlow can now perform dilated convolutions natively in NCHW format, instead of decomposing them into SpaceToBatchND/BatchToSpaceND.
So you could try switching to NCHW to see if it helps.
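
For reference on why this substitution is safe: a dilated (atrous) convolution computes the same result as the SpaceToBatchND → Conv2D → BatchToSpaceND decomposition, just by sampling the input with a stride equal to the dilation rate instead of rearranging it. A minimal single-channel NumPy sketch of what the native path computes (VALID padding, illustration only; the helper name is ours):

```python
import numpy as np

def dilated_conv2d_valid(x, k, rate):
    """Single-channel 2-D convolution with dilation `rate`, VALID padding."""
    kh, kw = k.shape
    # Effective kernel extent once the taps are spread `rate` apart.
    eh, ew = (kh - 1) * rate + 1, (kw - 1) * rate + 1
    oh, ow = x.shape[0] - eh + 1, x.shape[1] - ew + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Sample the input with gaps of `rate - 1` between kernel taps.
            out[i, j] = np.sum(x[i:i + eh:rate, j:j + ew:rate] * k)
    return out
```

In the graph itself, building the convolutions with data_format='NCHW' (and NCHW inputs) should let TensorFlow take the native dilated-convolution path rather than emitting the SpaceToBatchND pair; whether a given TF 1.9 build actually does so is worth verifying by inspecting the resulting graph for SpaceToBatchND ops.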

Thanks.

Is a fix or workaround available for this yet? This single issue has prevented me from being able to deploy TX2s for 4+ months now.

This issue is fixed in Xavier.

Thanks.

Thanks – any chance on fixing for TX2 or is TX2 planning to be EOL’d? Xavier is overkill for some applications and it would be nice if Deeplab V3+ can be run on a TX2 as it’s one of the best segmentation networks out there. This is largely a dealbreaker in moving forward with an NVIDIA Jetson platform vs. an Intel + NVIDIA combo. All of my TX2s have been unusable for several months now due to this fatally critical issue.

I don’t think they will propose any solution, since it has been so long and they seem to focus only on Xavier.

Hi,

The TX2 request was passed to our engineering team when this topic was filed.
It is prioritized internally, and we will update you if there is any progress.

Thanks.

Hi,

We have tested this issue on JetPack 4.2 with TensorFlow 1.13.1:
https://devtalk.nvidia.com/default/topic/1038957/jetson-tx2/tensorflow-for-jetson-tx2-/

The application runs correctly there without hitting the original error.
We recommend giving it a try.

Thanks.

Thanks!
I’m unable to use 4.2. Quoting Auvidea’s support:

“We are indeed planning on supporting Jetpack 4.2 and our software developer is already on the case, but currently, he has no clear estimate on how long it will take for it to be done.”