Hi, we retrained detectnet_v2 using the docker image “Transfer Learning Toolkit for Video Streaming Analytics”. The retraining process worked fine, and our new models worked great in DeepStream. The problem is this:
In NVIDIA’s Jupyter notebook, in steps 4 and 7 (“Evaluate Trained” for the trained and the pruned models), we get a fatal error and evaluation fails:
Unknown Error (see above for traceback): Failed to get convolution algorithm. This is probably because cuDNN failed to initialize.
We think the issue is:
Caused by op u’resnet18_nopool_bn_detectnet_v2/conv1/convolution’, defined at:
File “/usr/local/bin/tlt-evaluate”, line 8,
We suspect the issue is with cuDNN. We tried installing cuDNN directly into the container (it appears cuDNN is not in the container and is not being passed through by Docker), but that did not solve the problem.
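For anyone who wants to check the same thing, here is a minimal sketch of how one could test whether a cuDNN shared library is loadable from Python inside the container. It is not part of the TLT notebook. The helper names `find_cudnn_version` and `decode_cudnn_version` are made up for illustration, and the list of library file names is an assumption about a typical Linux install.

```python
import ctypes
import ctypes.util


def decode_cudnn_version(raw):
    """Split the integer returned by cudnnGetVersion() into (major, minor, patch)."""
    # cuDNN 7.x/8.x encode the version as major*1000 + minor*100 + patch
    # (8.0.2 -> 8002); cuDNN 9.x uses major*10000 + minor*100 + patch.
    base = 10000 if raw >= 90000 else 1000
    major, rest = divmod(raw, base)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def find_cudnn_version():
    """Return the raw cuDNN version integer, or None if no library can be loaded."""
    # Assumed candidate names; adjust for the install being checked.
    candidates = [
        ctypes.util.find_library("cudnn"),
        "libcudnn.so.8",
        "libcudnn.so.7",
        "libcudnn.so",
    ]
    for name in candidates:
        if not name:
            continue
        try:
            lib = ctypes.CDLL(name)
            get_version = lib.cudnnGetVersion
        except (OSError, AttributeError):
            # Library not found, or found without the expected symbol.
            continue
        get_version.restype = ctypes.c_size_t
        return int(get_version())
    return None


if __name__ == "__main__":
    raw = find_cudnn_version()
    if raw is None:
        print("libcudnn could not be loaded in this environment")
    else:
        print("cuDNN version: %d.%d.%d" % decode_cudnn_version(raw))
```

This only shows whether the dynamic loader can find a libcudnn and which version it reports. It does not show which copy TensorFlow itself picks up, or whether that version matches what the TLT build expects.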
Clearly this model is engineered by NVIDIA, so we are trying to find a way to evaluate trained models within the notebooks. Incidentally, other models, like Yolact 3, don’t have this problem.
We are running Driver Version: 450.51.05, CUDA Toolkit: 11.0 (hence, cuDNN is 8.0.2). When launching the container we use --gpus all, and it is clearly working for model training in the notebook.
Suggestions? Cheers - G