TensorFlow operation tf.batch_to_space_nd() not working as expected on Jetson TX2

I am working on a TensorFlow model that uses the operation tf.batch_to_space_nd(). The operation works correctly on CPU and on an NVIDIA GeForce GPU, but it fails on the Jetson TX2.
Looking into it further, I found that it produces correct output only up to a certain input size; beyond that, the result is a matrix of zeros.
https://www.tensorflow.org/api_docs/python/tf/batch_to_space_nd

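import numpy as np
import tensorflow as tf

# Random input of shape (batch, height, width, channels)
mat = np.random.rand(1, 65, 65, 728)
inp = tf.constant(mat, tf.float32)  # named `inp` because `in` is a reserved word in Python
block_shape = tf.constant([2, 2], tf.int32)
paddings = tf.constant([[2, 3], [2, 3]], tf.int32)
op = tf.space_to_batch_nd(inp, block_shape, paddings)
print('in', inp)
print('op', op)
with tf.Session() as sess:
    out = sess.run(op)
    print('sum of elements in out:', np.sum(out))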

The output obtained from the above code is as follows:
('in', <tf.Tensor 'Const:0' shape=(1,65,65,728) dtype=float32>)
('op', <tf.Tensor 'SpaceToBatchND:0' shape=(4,35,35,728) dtype=float32>)
('sum of elements in out:', 0.0)

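c = 542  # largest channel count that still gives a non-zero result on the TX2
mat = np.random.rand(1, 65, 65, c)
inp = tf.constant(mat, tf.float32)  # named `inp` because `in` is a reserved word in Python
block_shape = tf.constant([2, 2], tf.int32)
paddings = tf.constant([[2, 3], [2, 3]], tf.int32)
op = tf.space_to_batch_nd(inp, block_shape, paddings)
print('in', inp)
print('op', op)
with tf.Session() as sess:
    out = sess.run(op)
    print('sum of elements in out:', np.sum(out))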

The output obtained from the above code is as follows:
('in', <tf.Tensor 'Const:0' shape=(1,65,65,542) dtype=float32>)
('op', <tf.Tensor 'SpaceToBatchND:0' shape=(4,35,35,542) dtype=float32>)
('sum of elements in out:', 326.3739)

When running on the Jetson TX2 GPU, I have observed that the operation works correctly and produces a non-zero output matrix as long as the number of channels c <= 542. If the number of channels c > 542, it produces a zero matrix of size (4,35,35,c).

Running the same code on CPU produces correct output regardless of the number of channels.
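
To cross-check a device result without depending on another TensorFlow build, the expected output can be computed in plain NumPy. The following is only a minimal sketch: it assumes a 4-D NHWC input with a 2-D block shape and follows the pad, reshape, transpose, reshape steps described in the TensorFlow documentation for SpaceToBatchND. The function name space_to_batch_nd_reference is hypothetical and not part of TensorFlow.

```python
import numpy as np


def space_to_batch_nd_reference(x, block_shape, paddings):
    """NumPy sketch of SpaceToBatchND for 4-D NHWC input and a 2-D block shape.

    Hypothetical helper for comparison only; not a TensorFlow API.
    """
    bh, bw = block_shape
    (pad_top, pad_bottom), (pad_left, pad_right) = paddings

    # Step 1: zero-pad the two spatial dimensions.
    padded = np.pad(
        x,
        [(0, 0), (pad_top, pad_bottom), (pad_left, pad_right), (0, 0)],
        mode="constant",
    )
    n, hp, wp, c = padded.shape
    if hp % bh != 0 or wp % bw != 0:
        raise ValueError("padded spatial dims must be divisible by block_shape")

    # Step 2: split each spatial dimension into (blocks, offset within block).
    reshaped = padded.reshape(n, hp // bh, bh, wp // bw, bw, c)

    # Step 3: move the within-block offsets in front of the batch dimension.
    permuted = reshaped.transpose(2, 4, 0, 1, 3, 5)

    # Step 4: fold the offsets into the batch dimension.
    return permuted.reshape(n * bh * bw, hp // bh, wp // bw, c)


mat = np.random.rand(1, 65, 65, 728).astype(np.float32)
expected = space_to_batch_nd_reference(mat, [2, 2], [[2, 3], [2, 3]])
print(expected.shape)
# Padding only adds zeros, so the two sums should agree up to rounding.
print(np.sum(expected), np.sum(mat))
```

If the same mat is fed to the TensorFlow graph, np.allclose(out, expected) on the value returned by sess.run(op) would show whether the device result matches, rather than only checking whether the sum is zero.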

I want to get this op working on TX2 for input size (1,65,65,728).

Any inputs on what might be causing this issue or any fix would be of great help.

Hi,

We want to reproduce this issue internally.
Could you share your TensorFlow version and wheel package with us?

Thanks.

Hi,
I am using TensorFlow version 1.7.0.
Could you please look into the issue described above?

Thanks

Hi,

Could you share the TensorFlow package you are using?
Did you build it from source or download a publicly available prebuilt package?

Based on your description, it looks like your package may have been built for the wrong GPU architecture.
It works for small tensor sizes.
The error appears only with larger tensor sizes, which can happen when the architecture the package was built for does not match that of the TX2.

Could you first confirm that your TensorFlow package was built for GPU compute capability 6.2?
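
As a rough first check, the sketch below prints the compute capability that TensorFlow reports for each visible GPU. It assumes the TensorFlow 1.x device description string contains a "compute capability: X.Y" field; the helper names are hypothetical. Note that this shows what the device reports, not which architectures the wheel was compiled for. For a source build, that is determined by the TF_CUDA_COMPUTE_CAPABILITIES value chosen during ./configure.

```python
import re


def parse_compute_capability(device_desc):
    """Extract 'X.Y' from a device description string, or None if absent.

    Assumes the description contains 'compute capability: X.Y'.
    """
    match = re.search(r"compute capability:\s*(\d+\.\d+)", device_desc)
    return match.group(1) if match else None


def report_gpu_compute_capabilities():
    """Print the compute capability reported for every GPU TensorFlow sees."""
    # Imported here so the parser above can be used without TensorFlow installed.
    from tensorflow.python.client import device_lib

    for dev in device_lib.list_local_devices():
        if dev.device_type == "GPU":
            print(dev.name, parse_compute_capability(dev.physical_device_desc))
```

Calling report_gpu_compute_capabilities() on the TX2 should print 6.2 for the integrated GPU if the device is detected as expected.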
Thanks.