I’m deploying my inference software using OpenCV’s multi-backend DNN API.
I’m happy to support NVIDIA backends, at the cost of distributing ~200 MB of additional cuDNN files (cudnn_cnn_infer64_8.dll and cudnn_ops_infer64_8.dll) on top of ~100 MB of CUDA 10.x DLLs (cudart64_102.dll, cublas64_10.dll, cublasLt64_10.dll).
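Concretely, the files I redistribute alongside my application look roughly like this (the executable and OpenCV DLL names are placeholders, and smaller support files are omitted):

```
my_inference_app.exe        <- placeholder name
opencv_world4xx.dll         <- placeholder OpenCV build
cudart64_102.dll            \
cublas64_10.dll              | ~100 MB of CUDA 10.x DLLs
cublasLt64_10.dll           /
cudnn_ops_infer64_8.dll     \  ~200 MB of cuDNN DLLs
cudnn_cnn_infer64_8.dll     /
```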
But I found out that if I want to support the latest GeForce RTX 30x0 cards, I need to ship with CUDA 11.x.
Now the problem is that the two aforementioned cuDNN DLLs for CUDA 11 are 800 MB big!
What happened? Is there a way this could be split into smaller, more granular packages? I don’t need support for float16 or int8 inference, for instance; could we save space without these kernels?
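For reference, here is a minimal sketch of how my code enables the NVIDIA backend (the model file name is a placeholder; everything stays in FP32):

```cpp
#include <opencv2/dnn.hpp>

int main() {
    // "model.onnx" is a placeholder for my real network file.
    cv::dnn::Net net = cv::dnn::readNetFromONNX("model.onnx");

    // Ask for the CUDA backend (available when OpenCV's DNN
    // module is built with CUDA/cuDNN support).
    net.setPreferableBackend(cv::dnn::DNN_BACKEND_CUDA);

    // Plain FP32 inference is all I need -- I never request
    // DNN_TARGET_CUDA_FP16, and int8 is not used either.
    net.setPreferableTarget(cv::dnn::DNN_TARGET_CUDA);

    // Dummy 224x224 input, just to exercise the pipeline.
    cv::Mat image = cv::Mat::zeros(224, 224, CV_32FC3);
    net.setInput(cv::dnn::blobFromImage(image));
    cv::Mat out = net.forward();
    return 0;
}
```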
Thanks for your message. We agree that the growth of the cuDNN DLL size is problematic, and we’re working on resolving this. For example, you may have noticed that the size of cudnn_ops_infer was reduced by about 70% from 8.2.0 to 8.3.0. We know this doesn’t completely solve your problem; we mention it only to point out that we’re working on it.
To answer your specific questions:
> What happened?
As we’ve added more capabilities to cuDNN (e.g. new GPU architectures), the library size has grown.
> Is there a way this could be split into smaller, more granular packages? I don’t need support for float16 or int8 inference, for instance; could we save space without these kernels?
We are exploring various options for splitting the library further. It’s useful to know that your use case would benefit from splitting based on data type.
Thank you very much for your reply.
I had not noticed the DLL size reduction since I’m still shipping with CUDA 10. cudnn_ops_infer is indeed much smaller, but cudnn_cnn_infer has grown to 737 MB. Still, it’s great news to know that you’re working on it.
Can you confirm that there is no other way to support RTX 3080 cards?
Do you have any timeframe for a test release of the smaller packages?
Will you share on this forum which options you are considering for splitting the library?