I am using the model “faster_rcnn_resnet152_v1_640x640_coco17_tpu-8”. When I use only the CPU, detection time is around 13 seconds per image. So I wanted to see how the GPU performs, but when I use the GPU, I get this error:
"UnknownError: 2 root error(s) found.
(0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node StatefulPartitionedCall/model/conv1_conv/Conv2D}}]]
[[StatefulPartitionedCall/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/unstack/_64]]
(1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node StatefulPartitionedCall/model/conv1_conv/Conv2D}}]]
0 successful operations.
0 derived errors ignored. [Op:__inference_signature_wrapper_48771]
Function call stack:
signature_wrapper → signature_wrapper"
This suggests there is not enough memory. When I check, RAM usage is at around 6800/7500 MB, so I guess it is a memory problem. But there are some things I don’t understand.
I have a PC with a GTX 860M GPU with 2 GB of VRAM, and the “mask_rcnn_inception_v2_coco” model works fine on it, taking around 4 seconds per image. I know the models are different, but how come 2 GB of VRAM is enough there while 7 GB of RAM isn’t here? Is it because the TX2 uses shared RAM, which is not really dedicated to the GPU?
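For reference, a commonly suggested workaround for this “Failed to get convolution algorithm” error is to stop TensorFlow from pre-allocating nearly all of the memory up front, either by enabling memory growth or by setting a hard cap. Below is a minimal sketch of what I mean, assuming TF 2.x; the 4096 MB cap is only an illustrative value, and I have not confirmed that this actually helps on the TX2. It has to run before the model is loaded:

import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    # Option 1: allocate GPU memory on demand instead of grabbing
    # (almost) all of the shared memory at start-up.
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)

    # Option 2 (use instead of option 1, not together with it): hard-cap
    # how much memory TensorFlow may take; 4096 MB is just an example.
    # tf.config.set_logical_device_configuration(
    #     gpus[0],
    #     [tf.config.LogicalDeviceConfiguration(memory_limit=4096)])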
For the TX2, should I use an ARM-based model, such as MobileNet?
And lastly, is the reason it works on the CPU but not on the GPU that the CPU does its calculations “one by one”, so it doesn’t need as much memory at once as the GPU, where multiple calculations are done at the same time?
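In case it matters for the comparison: a simple way to force a CPU-only run is to hide the GPU before TensorFlow initializes it, e.g. (a sketch, not necessarily exactly how my script does it):

import os
# Hide the GPU before TensorFlow is imported/initialized,
# so the same script runs CPU-only for the timing comparison.
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'

import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))  # prints [] when the GPU is hidden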