I have several simulations running in parallel inside CUDA kernels, and I want to perform some neural network computation within them, so the first thing I thought of was cuDNN. But after studying it for a while, I realized that cuDNN has no device-side interface, which means it can only be called from host (CPU) code.
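To make the setup concrete, here is a minimal sketch of the kind of thing I mean. Everything in it is a placeholder I made up for illustration: the names `simulate`, `dense_relu`, and `run_simulations`, the layer sizes, and the fake simulation state. Each thread runs one simulation and needs a small forward pass; since cuDNN cannot be called at that point, the sketch falls back to a hand-written dense layer as a device function.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical layer sizes, chosen only for illustration.
constexpr int kIn = 4;
constexpr int kOut = 3;

// Hand-written dense layer followed by ReLU, callable from device code.
// w is row-major with kOut rows and kIn columns.
__device__ void dense_relu(const float* w, const float* b,
                           const float* x, float* y) {
    for (int o = 0; o < kOut; ++o) {
        float acc = b[o];
        for (int i = 0; i < kIn; ++i) {
            acc += w[o * kIn + i] * x[i];
        }
        y[o] = fmaxf(acc, 0.0f);
    }
}

// One thread per simulation. The state below is a stand-in for real
// simulation data.
__global__ void simulate(const float* w, const float* b, float* out, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    float state[kIn];
    for (int i = 0; i < kIn; ++i) {
        state[i] = 0.1f * (tid + i);
    }

    // This is the point where I would like to call cuDNN, but its
    // functions are host-only, so a device function is used instead.
    dense_relu(w, b, state, out + tid * kOut);
}

// Host-side helper: uploads fixed weights, launches the kernel, and
// returns n * kOut outputs. Error checking is omitted for brevity.
std::vector<float> run_simulations(int n) {
    std::vector<float> w(kOut * kIn, 1.0f);
    std::vector<float> b = {0.0f, -1.0f, 1.0f};
    std::vector<float> out(n * kOut);

    float *d_w = nullptr, *d_b = nullptr, *d_out = nullptr;
    cudaMalloc(&d_w, w.size() * sizeof(float));
    cudaMalloc(&d_b, b.size() * sizeof(float));
    cudaMalloc(&d_out, out.size() * sizeof(float));
    cudaMemcpy(d_w, w.data(), w.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b.data(), b.size() * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    simulate<<<blocks, threads>>>(d_w, d_b, d_out, n);

    // This copy runs after the kernel on the default stream.
    cudaMemcpy(out.data(), d_out, out.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_w);
    cudaFree(d_b);
    cudaFree(d_out);
    return out;
}
```

Calling `run_simulations(n)` from host code would run `n` such simulations, one per thread. This works for tiny layers, but it gives up everything cuDNN optimizes, which is why I am asking.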
Is there a way to use cuDNN without leaving the CUDA kernel? Thank you!