cuDNN failed to initialize while running the evaluation method for detectnet_v2

I am trying to build a Dockerized version of the Transfer Learning Toolkit (TLT): I use the NVIDIA NGC TLT image as the base of my own Docker image and run the training from a .py script.
Dockerfile line that pulls the NGC image:
FROM nvcr.io/nvidia/tlt-streamanalytics:v2.0_py3

I have converted the Jupyter notebook used for training into a Python script (attached for your reference).
detectnet_v2.py (26.0 KB)

Training works properly, but the evaluation step fails with the following error:

2021-01-04 04:51:19.838020: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-04 04:51:19.838031: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2021-01-04 04:51:19.838041: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2021-01-04 04:51:19.838159: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7274 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2060 SUPER, pci bus id: 0000:01:00.0, compute capability: 7.5)
INFO:tensorflow:Running local_init_op.
2021-01-04 04:51:20,118 [INFO] tensorflow: Running local_init_op.
INFO:tensorflow:Done running local_init_op.
2021-01-04 04:51:20,165 [INFO] tensorflow: Done running local_init_op.
2021-01-04 04:51:20,663 [INFO] iva.detectnet_v2.evaluation.evaluation: step 0 / 30, 0.00s/step
2021-01-04 04:51:21.042845: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2021-01-04 04:51:21.615459: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2021-01-04 04:51:21.618820: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
  File "/usr/local/bin/tlt-evaluate", line 10, in
    sys.exit(main())
  File "./common/magnet_evaluate.py", line 38, in main
  File "</usr/local/lib/python2.7/dist-packages/decorator.pyc:decorator-gen-2>", line 2, in main
  File "./detectnet_v2/utilities/timer.py", line 46, in wrapped_fn
  File "./detectnet_v2/scripts/evaluate.py", line 126, in main
  File "./detectnet_v2/evaluation/evaluation.py", line 156, in evaluate
  File "./detectnet_v2/evaluation/evaluation.py", line 116, in _get_validation_iterator
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node resnet18_nopool_bn_detectnet_v2/conv1/convolution (defined at /opt/nvidia/third_party/keras/tensorflow_backend.py:93) ]]
[[node strided_slice_34 (defined at ./detectnet_v2/model/utilities.py:53) ]]

Any insights would be appreciated.

Thanks!

Can you run tlt-evaluate successfully in the default jupyter notebook without your detectnet_v2.py?

Also, please refer to "How to resize KITTI dataset images and labels - #9 by xhuv_NV" and
"Error with cuDNN when attempting to perform inference after training an SSD model with TLT".

@Morganh Yes, I am able to run the default Jupyter notebook for DetectNet_v2. It is also worth mentioning that the analogous Python scripts for the YOLO and SSD models work perfectly fine inside my Docker environment.

Please refer to the links I mentioned above.

I found a solution in another thread linked from the one you provided.
Adding this to the Dockerfile allows the evaluation script to run:
ENV TF_FORCE_GPU_ALLOW_GROWTH=true
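
For anyone who cannot rebuild the image, the same variable can presumably be set from the Python script that launches the TLT commands. The variable asks TensorFlow to grow GPU memory on demand instead of reserving nearly all of it at startup, which is a common reason the cuDNN handle cannot be created. Below is a minimal sketch, assuming the script starts tlt-evaluate as a subprocess; the helper names env_with_gpu_growth and run_with_gpu_growth are hypothetical and not part of TLT.

```python
import os
import subprocess


def env_with_gpu_growth(base_env=None):
    """Return a copy of the environment with TF_FORCE_GPU_ALLOW_GROWTH enabled.

    Hypothetical helper: copies the given mapping (or os.environ) so the
    parent process environment is left untouched.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
    return env


def run_with_gpu_growth(command):
    """Run a command (list of arguments) with the variable set.

    Returns the exit code of the child process.
    """
    return subprocess.call(command, env=env_with_gpu_growth())


# Usage (not executed here): pass the same tlt-evaluate arguments that the
# notebook uses, for example
#   run_with_gpu_growth(["tlt-evaluate", "detectnet_v2", ...])
```

Setting the variable in the Dockerfile remains the simpler choice, because it then applies to every TLT command run in the container.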

Thanks for your help @Morganh.