Hello everyone,
Recently I have been trying to run forward passes for multiple tasks at the same time. I assigned each task's cuDNN handle to a different CUDA stream, but the functions still do not execute concurrently as I expected.
Does anyone have suggestions for this problem?
Many thanks!