PyTorch stack overflow - conv1d.forward in DLL on CUDA

I ran into a problem calling conv1d.forward from a DLL on CUDA. In the following C++ code snippet, line 4 (the conv1d.forward call) crashes with a stack overflow. The full CPP file (27 lines) is attached at the bottom of this message.

auto Net = torch::nn::Conv1d(torch::nn::Conv1dOptions(21, 2, 3));
Net->to(device);
torch::Tensor X = torch::rand({ 5,21,25 }).to(device);
torch::Tensor Y = Net->forward(X);

Having experimented on two PCs with different GPU types, I found that the problem reproduces consistently when all of the following criteria are met:

  1. DLL. The same code in a console EXE application runs normally.
  2. GPU/CUDA. There is no problem running the same code on CPU.
  3. Convolutional layer. There is no problem with other layer types, e.g. linear.

I tried debugging the DLL in Visual Studio (debugger window screenshot attached). The call stack suggests that the stack overflow happens inside the cudnn_cnn_infer64_8.dll module, which is part of the NVIDIA cuDNN library.

I am not sure whether this error comes from PyTorch or from NVIDIA cuDNN. If anyone has a suggestion on how to resolve this, please respond.


TestDLL.cpp (640 Bytes)

Hi @acheglakov,
Can you please share the detailed logs with us?
Thanks!

UPDATE: this was my mistake. I used rundll32.exe to execute my DLL. Apparently, because rundll32 was originally designed to run Windows modules, it is not suitable for custom-made DLLs.

My DLL runs normally when it is called by something other than rundll32.
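
For anyone hitting the same issue, here is a minimal sketch of one way to call a DLL export from a small console host instead of rundll32. It is not the exact host I used. The helper name call_dll_export, the export name RunTest and its signature (extern "C" void RunTest()) are assumptions for illustration; only LoadLibraryA, GetProcAddress, FreeLibrary and GetLastError are real Win32 calls.

```cpp
// Minimal host sketch: load a DLL from a console EXE and call one export.
// The export name and its signature are hypothetical, not taken from TestDLL.cpp.
#include <cstdio>

#ifdef _WIN32
#include <windows.h>
#endif

// Assumed signature of the exported function: extern "C" void RunTest();
using TestFn = void (*)();

// Returns 0 on success, non-zero if the library or the symbol cannot be used.
int call_dll_export(const char* dll_path, const char* symbol) {
#ifdef _WIN32
    HMODULE module = LoadLibraryA(dll_path);
    if (module == nullptr) {
        std::fprintf(stderr, "LoadLibraryA failed for %s (error %lu)\n",
                     dll_path, GetLastError());
        return 1;
    }

    FARPROC address = GetProcAddress(module, symbol);
    if (address == nullptr) {
        std::fprintf(stderr, "GetProcAddress failed for %s (error %lu)\n",
                     symbol, GetLastError());
        FreeLibrary(module);
        return 2;
    }

    // Call the export on the host's own main thread.
    reinterpret_cast<TestFn>(address)();

    // The module is deliberately left loaded until the process exits,
    // which keeps the sketch simple.
    return 0;
#else
    // Non-Windows builds: nothing to load, report failure.
    (void)dll_path;
    (void)symbol;
    std::fprintf(stderr, "This sketch only loads DLLs on Windows\n");
    return -1;
#endif
}
```

From the main function of a console EXE this would be called as call_dll_export("TestDLL.dll", "RunTest"), where both names are placeholders for the actual DLL file and export.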