The cudnnGetVersion() function crashes, and I don’t understand why.
Could you please take a look at the dump at this link to find the reason? Unfortunately, without sources or PDBs I’m unable to figure it out on my own.
I don’t fully understand what should be done based on that thread. From what I see, two things are suggested:
update the driver
reboot PC
Our QA member did both, and it did not help; the app still crashes. That was expected because, respectively:
the problem occurs with different drivers on different PCs (to be precise, the QA member updated from 461.40 to 461.72)
the problem occurs on different PCs (though not on every PC, which makes debugging harder)
I was not clear in the initial message: the problem happens on Windows 10, not Linux, so there is no GCC; the compiler is MSVC. It’s also worth emphasizing that the crash is an actual crash: there is no error message or error status. That’s why I provided the crash dump in the initial message.
I recommend opening the dump linked in my first message. In my opinion, that is the easiest way to figure out the cause. cudnn.dll is at the top of the call stack, so you should be able to see what we are doing wrong, if anything.
Hi @arutyunovg, cudnnGetVersion does nothing except return the cuDNN version. The most likely reason for the crash is that the binary cannot find all the necessary libraries. Note that in addition to cudnn.dll, there are also several sub-libraries: ops_infer, ops_train, cnn_infer, cnn_train, adv_infer, and adv_train.
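One way to rule out missing libraries is to probe each DLL at startup before the first cuDNN call. The following is only a minimal sketch: the file names assume the cuDNN 8.x naming on Windows (this thread does not state the version), and `find_missing_libraries`, `assumed_cudnn_dlls`, and `probe_with_load_library` are hypothetical helper names, not part of cuDNN.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#endif

// Callback that tries to load a library by file name and reports success.
using LibraryProbe = std::function<bool(const std::string&)>;

// ASSUMPTION: cuDNN 8.x file names on Windows. Adjust the list to the
// DLLs that actually ship with the application.
inline std::vector<std::string> assumed_cudnn_dlls() {
    return {
        "cudnn64_8.dll",
        "cudnn_ops_infer64_8.dll",
        "cudnn_ops_train64_8.dll",
        "cudnn_cnn_infer64_8.dll",
        "cudnn_cnn_train64_8.dll",
        "cudnn_adv_infer64_8.dll",
        "cudnn_adv_train64_8.dll",
    };
}

// Returns the names that the probe could not load, in input order.
inline std::vector<std::string> find_missing_libraries(
        const std::vector<std::string>& names, const LibraryProbe& probe) {
    assert(probe && "a probe callback is required");
    std::vector<std::string> missing;
    for (const std::string& name : names) {
        if (!probe(name)) {
            missing.push_back(name);
        }
    }
    return missing;
}

#ifdef _WIN32
// Windows probe: load the DLL through the normal search path, then
// release it again so the check has no lasting effect.
inline bool probe_with_load_library(const std::string& name) {
    HMODULE handle = LoadLibraryA(name.c_str());
    if (handle == nullptr) {
        return false;
    }
    FreeLibrary(handle);
    return true;
}
#endif
```

On Windows, calling `find_missing_libraries(assumed_cudnn_dlls(), probe_with_load_library)` and logging the result would show whether any sub-library fails to load on the affected PCs. The probe is passed as a callback so the check can also be exercised with a fake loader on other platforms.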
That’s true. We had such crashes before, but figured out those runtime dependencies on our own.
This crash has something to do with resource consumption. If we leave the GPU some free resources, memory in particular, there are no crashes.
My guess is that you are doing some initialization internally that fails when there are not enough resources. If my guess is correct, how much free memory do you recommend we have before calling cudnnGetVersion?
Also, were you able to open the dump, or did you have issues with it? If you have symbols for the cuDNN DLL, you should be able to see the call stack and, from it, the exact crash reason, so we don’t have to guess anymore.