How to execute multiple cuDNN forward functions concurrently

Hello everyone,

I am trying to run the forward passes of several tasks at the same time. I assigned each task's cuDNN handle to a different CUDA stream, but the functions are still not executed concurrently as expected.

Does anyone have suggestions for this problem?

Many thanks!