Cudnn8 crash with stack overflow

interface: cudnnConvolutionForward
cuda:11.4
cudnn:8.2.4
Windows10, RTX2080Ti, driver version:471.41
I am trying to update my project from CUDA 10.2 to CUDA 11.4. I didn't change any code, but the program crashes with a stack overflow inside the cuDNN 8 function cudnnConvolutionForward.

Unhandled exception at 0x00007FFC9A31A3C8 (cudnn_cnn_infer64_8.dll) in PerformanceTool.exe: 0xC00000FD: Stack overflow (parameters: 0x0000000000000001, 0x0000004067303000). occurred

Hi @lei.yan
Could you please share the detailed logs with us so we can assist better?

Thanks!

Hi, thank you for your reply. My program doesn't output any logs. Does CUDA have a function to print logs?
My program uses multiple CPU threads, and each thread creates its own CUDA streams and cuDNN handles.
It calls these functions:
cudnnFindConvolutionForwardAlgorithmEx,
cudnnGetConvolutionForwardWorkspaceSize,
cudnnConvolutionForward etc.
The program crashes with a stack overflow. Did the stack size that cuDNN needs change from cuDNN 7 to cuDNN 8?

My program doesn't output any logs. Does CUDA have a function to print logs?

cuDNN has API logging, which you can enable with environment variables or by using the API.
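
As a minimal sketch of the environment-variable route, assuming the CUDNN_LOGINFO_DBG and CUDNN_LOGDEST_DBG variables used by cuDNN 8, the Python launcher below starts a program with API logging switched on. The helper names and the log file name are illustrative only; PerformanceTool.exe is the executable named in the crash report above.

```python
import os
import subprocess


def cudnn_logging_env(log_dest="cudnn_api.log", base_env=None):
    """Return a copy of the environment with cuDNN API logging enabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDNN_LOGINFO_DBG"] = "1"       # turn on informational API logging
    env["CUDNN_LOGDEST_DBG"] = log_dest  # "stdout", "stderr", or a file path
    return env


def run_with_cudnn_logging(command, log_dest="cudnn_api.log"):
    """Launch command (a list of arguments) with cuDNN logging enabled."""
    return subprocess.run(command, env=cudnn_logging_env(log_dest))


# Hypothetical invocation; adjust the executable path to your build:
# run_with_cudnn_logging(["PerformanceTool.exe"])
```

The variables have to be set before the process loads cuDNN, which is why the sketch sets them in a launcher rather than inside the program itself.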

Thank you for your support. I got the log files for cuDNN 7 (OK) and cuDNN 8 (NG).
Please help check them. Thanks a lot.
cudnn_log.zip (attachment, 1.3 MB)

Hi @lei.yan ,
Looks like you may need to share the attachments again.
Thanks!

The crash also happens on my new laptop:

cudnnCreateConvolutionDescriptor(&conv_desc) triggers the failure.

Information from the debugger output:
Unhandled exception at 0x00007FFD5B3A62BD (cudnn64_8.dll) in RQNetd.exe: Fatal program exit requested.

Information from console:
Could not load library cudnn_cnn_infer64_8.dll. Error code 126
Please make sure cudnn_cnn_infer64_8.dll is in your library path!
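
Error code 126 is the Windows error ERROR_MOD_NOT_FOUND: either the named DLL or one of the DLLs it depends on could not be located. A rough first check, sketched here in Python with a hypothetical helper name, is to see which PATH directories actually contain the file. It does not reproduce the full Windows DLL search order (application directory, system directories and so on), so treat an empty result as a hint rather than proof.

```python
import os


def find_on_path(file_name, path=None):
    """Return every directory in path (default: PATH) that contains file_name."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        directory = directory.strip('"')  # PATH entries are sometimes quoted
        if directory and os.path.isfile(os.path.join(directory, file_name)):
            hits.append(directory)
    return hits


# Report where (if anywhere) the cuDNN DLLs named in the messages are visible.
for dll in ("cudnn64_8.dll", "cudnn_cnn_infer64_8.dll"):
    print(dll, "->", find_on_path(dll) or "not found on PATH")
```

If the DLL is found but loading still fails with 126, a missing dependency of that DLL is the more likely culprit.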

Environment:
Windows 11
Visual Studio 2022
CUDA 11.5
cuDNN 8.3.0
RTX 3060

The error is in cudnn64_8.dll.
CUDA 10.2 with cuDNN 8.0.2 works correctly.

Sounds crazy, but this worked for me and one other person... NVIDIA should be ashamed!

Copy zlibwapi.dll from “C:\Program Files\Microsoft Office\root\Office16\ODBC Drivers\Salesforce\lib” (the other person got it by installing Microsoft 365 x64 on Windows 11, but I already had it) and paste it into “C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin”.

Hey presto... no more crash when calling cudnnCreateConvolutionDescriptor.

Now, you wouldn't have guessed that one!

Copying zlibwapi.dll worked for me too. In hindsight it is explained here

For reference, I only encountered the crash after upgrading from Visual Studio 2019, CUDA 11.2 and cuDNN 8.2 to VS 2022, CUDA 11.7 and cuDNN 8.4.
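
The workaround in the posts above amounts to putting zlibwapi.dll in a directory that is searched when cuDNN loads. A minimal Python sketch of that copy step follows. The function name is hypothetical, the paths in the comment are the ones quoted in this thread and will differ per machine, and writing under C:\Program Files normally requires an elevated prompt.

```python
import os
import shutil


def copy_dll_if_missing(source_file, target_dir):
    """Copy source_file into target_dir unless a file of that name is already there.

    Returns True if a copy was made, False if the target already existed.
    """
    target = os.path.join(target_dir, os.path.basename(source_file))
    if os.path.exists(target):
        return False
    shutil.copy2(source_file, target)
    return True


# Example call using the paths quoted in this thread (adjust to your install):
# copy_dll_if_missing(
#     r"C:\Program Files\Microsoft Office\root\Office16\ODBC Drivers\Salesforce\lib\zlibwapi.dll",
#     r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin",
# )
```

Adding the folder that holds zlibwapi.dll to PATH, instead of copying the file, should work as well and avoids modifying the CUDA install directory.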

The same problem once bothered me too. Make sure you install the correct package, “Microsoft.ML.OnnxRuntime.Gpu”, rather than “Microsoft.ML.OnnxRuntime”!

Hello.

I can confirm that this crash still happens on Windows 11!

Gary, thanks man - your solution worked for me! I could have spent days reinstalling toolkits and drivers. My crash was also in the cudnnCreateConvolutionDescriptor call, and it was fixed after I copied zlibwapi.dll and the libs (downloaded with the CUDA toolkit) into the CUDA install folders.

I use a laptop with:
RTX 3050 Ti
CUDA 11.8
cuDNN 8.6.0/8.5.0
Visual Studio 2017, 2022