Hello,
I am posting here because I have spent too much time searching this forum and the rest of the Internet without finding an answer to my problem.
I have a piece of software that loads data into device memory, runs a kernel, and then fetches the data back: the “classic” use of a GPU.
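For reference, the “classic” flow looks roughly like this. It is a minimal sketch, assuming a hypothetical `scale_kernel` and `round_trip` helper (neither comes from the original code). The point is where each step's error shows up: a launch failure such as cudaErrorInvalidDeviceFunction is reported by cudaGetLastError() right after the `<<<...>>>` launch, while errors during kernel execution surface at the synchronizing device-to-host copy.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: multiplies every element by `factor` in place.
__global__ void scale_kernel(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Copies `host` to the device, runs the kernel, and copies the result back.
// Returns the first CUDA error encountered (cudaSuccess if none).
cudaError_t round_trip(float* host, int n, float factor) {
    if (n <= 0) return cudaSuccess;  // a 0-block launch would be an invalid configuration
    size_t bytes = static_cast<size_t>(n) * sizeof(float);

    float* dev = nullptr;
    cudaError_t err = cudaMalloc(&dev, bytes);
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        scale_kernel<<<blocks, threads>>>(dev, n, factor);
        // Launch errors (e.g. cudaErrorInvalidDeviceFunction) are reported here.
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) {
        // cudaMemcpy to the host waits for the kernel, so execution errors surface here.
        err = cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);
    }
    if (err != cudaSuccess)
        std::printf("round_trip failed: %s\n", cudaGetErrorName(err));

    cudaFree(dev);
    return err;
}
```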
But recently, after changing who knows what (I will come back to that later), none of my kernels will load anymore.
Specifically, I get “cudaErrorInvalidDeviceFunction” when I try to launch them. To make sure it was not something else entirely, I called cudaFuncGetAttributes on the kernel and got the same error. I also tried commenting out all the code inside the kernel I’m calling, but that didn’t help either. So my guess is that something is preventing the kernel from loading.
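The cudaFuncGetAttributes check can be wrapped in a small probe that runs without launching anything. This is a minimal sketch, assuming a hypothetical kernel `my_kernel` standing in for the real one. On failure it prints the error name; on success it prints `binaryVersion` and `ptxVersion`, which report the architecture of the code the runtime actually loaded (encoded as major*10 + minor, so 86 for sm_86).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for one of the kernels that fails to launch.
__global__ void my_kernel(int* out) {
    if (out != nullptr) out[threadIdx.x] = static_cast<int>(threadIdx.x);
}

// Asks the runtime for my_kernel's attributes on the current device.
// This forces the runtime to find a usable image of the kernel, without launching it.
cudaError_t probe_kernel() {
    cudaFuncAttributes attr{};
    cudaError_t err = cudaFuncGetAttributes(&attr, my_kernel);
    if (err != cudaSuccess) {
        std::printf("cudaFuncGetAttributes: %s (%s)\n",
                    cudaGetErrorName(err), cudaGetErrorString(err));
        return err;
    }
    // binaryVersion / ptxVersion use the major*10 + minor encoding, e.g. 86.
    std::printf("loaded: binaryVersion=%d ptxVersion=%d numRegs=%d\n",
                attr.binaryVersion, attr.ptxVersion, attr.numRegs);
    return cudaSuccess;
}
```

Calling probe_kernel() at the very start of the program, before any allocation or copy code runs, is one way to check whether the failure depends on the host-side changes at all.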
The issue is that I did not change anything in the kernels (or in the device functions they call) between the moment it worked and the moment it stopped working. I can compare an earlier, working version with the broken one, and I honestly don’t see what could have triggered this.
What I changed was: setting up memory pooling (which I disabled when I noticed the problem with the kernels), and grouping allocations and copies in the same places in the code (only that, no big reorganization).
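The post does not say which pooling mechanism was used. As a minimal sketch, assuming the stream-ordered allocator (cudaMallocAsync with the device's default memory pool), pooled allocation with every call checked individually looks like this; `pooled_alloc_demo` is a hypothetical helper. Checking each return value matters because cudaGetLastError() returns the last error from *any* earlier runtime call on the thread, so an unchecked allocation failure could otherwise show up after a later launch and look like a launch problem.

```cuda
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

// Allocates `bytes` from the current device's default memory pool on `stream`,
// zeroes the buffer, and returns it to the pool. Returns the first error seen.
cudaError_t pooled_alloc_demo(size_t bytes, cudaStream_t stream) {
    int dev = 0, pools_supported = 0;
    cudaError_t err = cudaGetDevice(&dev);
    if (err != cudaSuccess) return err;
    err = cudaDeviceGetAttribute(&pools_supported, cudaDevAttrMemoryPoolsSupported, dev);
    if (err != cudaSuccess) return err;
    if (!pools_supported) return cudaErrorNotSupported;

    // Keep freed memory cached in the pool instead of releasing it at each sync.
    cudaMemPool_t pool;
    err = cudaDeviceGetDefaultMemPool(&pool, dev);
    if (err != cudaSuccess) return err;
    uint64_t threshold = UINT64_MAX;
    err = cudaMemPoolSetAttribute(pool, cudaMemPoolAttrReleaseThreshold, &threshold);
    if (err != cudaSuccess) return err;

    void* buf = nullptr;
    err = cudaMallocAsync(&buf, bytes, stream);
    if (err != cudaSuccess) return err;
    err = cudaMemsetAsync(buf, 0, bytes, stream);
    cudaError_t free_err = cudaFreeAsync(buf, stream);  // always return the buffer
    if (err == cudaSuccess) err = free_err;
    if (err == cudaSuccess) err = cudaStreamSynchronize(stream);
    if (err != cudaSuccess)
        std::printf("pooled_alloc_demo failed: %s\n", cudaGetErrorName(err));
    return err;
}
```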
The code is running on an A1000 GPU with CUDA 12.5 installed (the target compute capability is set to 86).
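A frequent cause of cudaErrorInvalidDeviceFunction is a binary that contains no kernel image usable by the device, for example if the build setting (nvcc `-arch=sm_86`, or `CMAKE_CUDA_ARCHITECTURES` in CMake) is not applied to every target. `cuobjdump --list-elf` and `cuobjdump --list-ptx` on the executable show which images are actually embedded. The sketch below only checks the device side. It assumes a hypothetical constant `kTargetSm = 86` mirroring the build setting and applies the documented cubin compatibility rule (same major version, device minor version greater than or equal); embedded PTX can still be JIT-compiled for newer devices, which this check ignores.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical constant mirroring the "compute capability is set to 86" build setting.
constexpr int kTargetSm = 86;

// Packs a compute capability as major*10 + minor, e.g. (8, 6) -> 86.
constexpr int sm_number(int major, int minor) { return major * 10 + minor; }

// A cubin built for target_sm runs only on devices with the same major
// version and an equal or higher minor version (PTX JIT is not considered).
constexpr bool cubin_runs_on(int target_sm, int device_sm) {
    return target_sm / 10 == device_sm / 10 && target_sm % 10 <= device_sm % 10;
}

// Queries the current device and reports whether an sm_<kTargetSm> cubin fits it.
bool check_device_against_target() {
    int dev = 0, major = 0, minor = 0;
    if (cudaGetDevice(&dev) != cudaSuccess) return false;
    if (cudaDeviceGetAttribute(&major, cudaDevAttrComputeCapabilityMajor, dev) != cudaSuccess ||
        cudaDeviceGetAttribute(&minor, cudaDevAttrComputeCapabilityMinor, dev) != cudaSuccess)
        return false;
    int device_sm = sm_number(major, minor);
    bool ok = cubin_runs_on(kTargetSm, device_sm);
    std::printf("device %d is sm_%d; an sm_%d cubin %s\n", dev, device_sm, kTargetSm,
                ok ? "is compatible" : "is NOT compatible");
    return ok;
}
```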
Thanks for any help or suggestions.