CUDA per-thread default stream and cuDNN behaviour

I'm using --default-stream per-thread so that I can issue kernels from 2 host threads into 2 separate streams. Everything works fine until I want to use cuDNN in both threads, because every cuDNN API call executes on the legacy default stream instead of on the calling thread's per-thread stream.
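For reference, a minimal sketch of the setup described above, assuming a single .cu file built with `nvcc --default-stream per-thread`. The kernel `scale`, the functions `worker` and `run_two_workers`, and the buffer size are hypothetical. Each host thread launches on stream 0. Because the flag is applied at compile time, stream 0 in this file means the calling thread's per-thread default stream. A precompiled library such as cuDNN was not built with that flag, which is why its own NULL stream still means the legacy default stream.

```cuda
// Build (assumed): nvcc --default-stream per-thread -o two_threads two_threads.cu
#include <thread>
#include <cuda_runtime.h>

__global__ void scale(float *x, int n, float a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

// Hypothetical worker: each host thread owns its buffer and launches on stream 0.
// Because this file is compiled with --default-stream per-thread, stream 0 here is
// the calling thread's per-thread default stream, not the legacy default stream.
static void worker(cudaError_t *status) {
    const int n = 1 << 20;
    float *d = nullptr;
    cudaError_t err = cudaMalloc(&d, n * sizeof(float));
    if (err != cudaSuccess) { *status = err; return; }
    cudaMemsetAsync(d, 0, n * sizeof(float));      // stream 0 == per-thread stream
    scale<<<(n + 255) / 256, 256>>>(d, n, 2.0f);   // same per-thread stream
    err = cudaGetLastError();                       // catch launch errors
    if (err == cudaSuccess) err = cudaStreamSynchronize(cudaStreamPerThread);
    cudaFree(d);
    *status = err;
}

// Runs two host threads; returns 0 if both finished without CUDA errors.
int run_two_workers() {
    cudaError_t s0 = cudaErrorUnknown, s1 = cudaErrorUnknown;
    std::thread t0(worker, &s0), t1(worker, &s1);
    t0.join();
    t1.join();
    return (s0 == cudaSuccess && s1 == cudaSuccess) ? 0 : 1;
}

int main() { return run_two_workers(); }
```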

I would like each cuDNN call to run in the stream associated with the host thread that made the call. I know I can use cudnnSetStream() to set a non-default stream, but I need to get the stream associated with the current host thread in order to pass it to cudnnSetStream().

How do I get, on the host side, the stream that CUDA associates with the current host thread, i.e. the thread from which I want to call the cuDNN API?

Thank you.

Cross-posted at:

https://stackoverflow.com/questions/46244238/cuda-stream-per-thread-and-library-behaviour

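In a nutshell, use the special cudaStreamPerThread handle, which always refers to the per-thread default stream of the calling host thread, regardless of how the library was compiled: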

cudnnSetStream(handle, cudaStreamPerThread);
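A fuller sketch of that answer, assuming each host thread creates its own cuDNN handle (the cuDNN documentation advises against using one handle from several threads at the same time). The functions `cudnn_worker` and `run_cudnn_workers`, the tensor shape, and the use of cudnnSetTensor as a small stand-in operation are illustrative choices, not part of the original answer. Each thread binds its handle to cudaStreamPerThread, reads the stream back with cudnnGetStream to confirm the binding, queues one cuDNN call, and then waits only on its own stream.

```cuda
// Build (assumed): nvcc --default-stream per-thread -o cudnn_threads cudnn_threads.cu -lcudnn
#include <thread>
#include <cuda_runtime.h>
#include <cudnn.h>

// Hypothetical worker: one cuDNN handle per host thread, bound to that thread's stream.
static void cudnn_worker(bool *ok) {
    *ok = false;
    cudnnHandle_t handle;
    if (cudnnCreate(&handle) != CUDNN_STATUS_SUCCESS) return;

    // cudaStreamPerThread resolves to the calling thread's per-thread default stream,
    // even inside a library that was not built with --default-stream per-thread.
    cudnnStatus_t st = cudnnSetStream(handle, cudaStreamPerThread);

    // Read the stream back to confirm the handle is bound to it.
    cudaStream_t s = nullptr;
    if (st == CUDNN_STATUS_SUCCESS) st = cudnnGetStream(handle, &s);

    // Queue a small cuDNN call: fill a 1x1x4x4 float tensor with 1.0f.
    cudnnTensorDescriptor_t desc;
    cudnnCreateTensorDescriptor(&desc);
    cudnnSetTensor4dDescriptor(desc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT, 1, 1, 4, 4);
    float *d = nullptr;
    cudaMalloc(&d, 16 * sizeof(float));
    const float one = 1.0f;
    if (st == CUDNN_STATUS_SUCCESS) st = cudnnSetTensor(handle, desc, d, &one);

    // Wait only for the work this thread queued on its own stream.
    cudaError_t err = cudaStreamSynchronize(cudaStreamPerThread);

    cudaFree(d);
    cudnnDestroyTensorDescriptor(desc);
    cudnnDestroy(handle);
    *ok = (st == CUDNN_STATUS_SUCCESS) && (s == cudaStreamPerThread) && (err == cudaSuccess);
}

// Runs two host threads; returns 0 if both handles were bound and used successfully.
int run_cudnn_workers() {
    bool ok0 = false, ok1 = false;
    std::thread t0(cudnn_worker, &ok0), t1(cudnn_worker, &ok1);
    t0.join();
    t1.join();
    return (ok0 && ok1) ? 0 : 1;
}

int main() { return run_cudnn_workers(); }
```

Binding the handle to cudaStreamPerThread rather than to a stream created per thread keeps cuDNN work in the same stream as the thread's own kernel launches, so ordering between them is preserved without extra events.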