TX2 - TensorFlow object detection model works with CPU but gives cuDNN error with GPU

I am using the model “faster_rcnn_resnet152_v1_640x640_coco17_tpu-8”. When I use only the CPU, detection takes around 13 seconds per image. So I wanted to see how the GPU performs, but when I use the GPU, I get this error:

"UnknownError: 2 root error(s) found.
(0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node StatefulPartitionedCall/model/conv1_conv/Conv2D}}]]
[[StatefulPartitionedCall/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/unstack/_64]]
(1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node StatefulPartitionedCall/model/conv1_conv/Conv2D}}]]
0 successful operations.
0 derived errors ignored. [Op:__inference_signature_wrapper_48771]

Function call stack:
signature_wrapper → signature_wrapper"

This suggests there is not enough memory. When I check, RAM usage is at around 6800/7500, so I guess it is a memory problem. But there are some things I don’t understand.

I have a PC with a GTX 860M GPU with 2 GB of VRAM, and the “mask_rcnn_inception_v2_coco” model works fine on it, taking around 4 seconds per image. I know the models are different, but how come 2 GB of VRAM is enough while 7 GB of RAM isn’t? Is it because the TX2 uses shared RAM, which is not really dedicated to the GPU?

For the TX2, should I use a lighter model such as MobileNet?

And lastly, is the reason it works with the CPU but not the GPU that the CPU does calculations one by one, so it doesn’t require as much memory as the GPU, where multiple calculations are done at the same time?

Hi,

Failed to get convolution algorithm. This is probably because cuDNN failed to initialize ...

Based on the error message above, it seems you hit an issue when initializing cuDNN.

Is there another error that mentions an out-of-memory issue?
If not, this looks more like an environment issue.

May I know how you set up the environment?
Which JetPack do you use? How did you install TensorFlow?
Did you install it following the instructions in the document below?

https://docs.nvidia.com/deeplearning/frameworks/install-tf-jetson-platform/index.html
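If it does turn out to be memory related, one workaround worth trying (a general TensorFlow 2.x suggestion, not verified on this exact setup) is to enable GPU memory growth before any other TensorFlow call, so the runtime allocates memory on demand instead of reserving most of it up front. This matters on the TX2, where the CPU and GPU share the same physical RAM:

```python
import tensorflow as tf

# Allocate GPU memory on demand instead of pre-allocating nearly
# all of it at startup. This often avoids "Failed to get convolution
# algorithm" errors caused by cuDNN running out of memory.
# Must run before any other TensorFlow op touches the GPU.
gpus = tf.config.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
```

If memory growth alone is not enough, a hard cap can also be set with `tf.config.set_logical_device_configuration(gpu, [tf.config.LogicalDeviceConfiguration(memory_limit=2048)])`; the 2048 MB limit here is only an example value.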

Thanks.

The TX2 also gives a warning about running out of memory.

I am using Python 3.6, JetPack 4.6, and TensorFlow 2.6.0+nv21.11.

Yes, I followed the instructions in that document to install TensorFlow.

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks

Hi,

Could you share the complete TensorFlow output log with us?
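Before sharing the log, a quick sanity check (a generic TensorFlow snippet, nothing TX2-specific) can confirm whether your build was compiled with CUDA and actually sees the GPU:

```python
import tensorflow as tf

# Print the installed version, whether this build has CUDA support,
# and which GPU devices TensorFlow can enumerate. An empty GPU list
# points to an installation/environment problem rather than memory.
print("TF version:", tf.__version__)
print("Built with CUDA:", tf.test.is_built_with_cuda())
print("Visible GPUs:", tf.config.list_physical_devices('GPU'))
```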
Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.