Parallel execution of kernels from C#

I have run into a problem where kernel launches fail when they are executed in parallel. Is this expected, or might the problem be somewhere else?

I'm calling functions of a DLL containing CUDA code from a C# application. The DLL contains several filters.
I tried to execute the filters both within a parallel for loop (multithreaded) and in a normal for loop. In the parallel for loop I get strange behavior: memory copies or kernel launches fail randomly.

Is there a way around this, or does CUDA simply not allow the filters to be executed in parallel? (I know it doesn't make much sense, since the kernels already run in parallel on the GPU, but I'm still wondering.)

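CUDA allows kernel launches and memory copies from parallel host threads, but the DLL needs to be prepared for that. If it keeps shared state between calls (for example, global buffers reused by every filter), concurrent calls can overwrite each other's data. Consult the documentation of the DLL to see whether it is thread-safe; if it is not, serialize the calls into it.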
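A minimal sketch of the "serialize the calls" option, assuming the DLL is not thread-safe because it reuses one global scratch buffer. Everything here is hypothetical: `filter_dll`, `g_scratch`, `apply_scale_filter`, and the wrapper names are made up for illustration, and plain host memory stands in for the device buffer so the sketch runs without a GPU. It is written in C++ because the host side of a CUDA DLL usually is; on the C# side the same idea is a `lock` statement around the P/Invoke call.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a filter DLL that keeps one global scratch buffer.
// In a real CUDA DLL this might be device memory allocated once and reused by
// every call; host memory is used here so the sketch runs without a GPU.
namespace filter_dll {
std::vector<float> g_scratch;  // shared state: NOT thread-safe

// Hypothetical filter: "upload" into the scratch buffer, scale it in place
// (the "kernel"), then "download" the result. Two threads inside this
// function at once can overwrite or reallocate each other's data.
void apply_scale_filter(const float* in, float* out, std::size_t n, float factor) {
    assert(in != nullptr && out != nullptr);
    g_scratch.resize(n);
    for (std::size_t i = 0; i < n; ++i) g_scratch[i] = in[i];
    for (std::size_t i = 0; i < n; ++i) g_scratch[i] *= factor;
    for (std::size_t i = 0; i < n; ++i) out[i] = g_scratch[i];
}
}  // namespace filter_dll

// Host-side wrapper: many threads may call it, but only one thread at a
// time enters the non-thread-safe library.
std::mutex g_filter_mutex;

void apply_scale_filter_locked(const float* in, float* out, std::size_t n, float factor) {
    std::lock_guard<std::mutex> lock(g_filter_mutex);
    filter_dll::apply_scale_filter(in, out, n, factor);
}

// Runs the filter from num_threads threads (thread t scales by t + 1)
// and reports whether every output value is correct.
bool run_parallel_filters(int num_threads, std::size_t n) {
    const std::vector<float> input(n, 1.0f);
    std::vector<std::vector<float>> outputs(static_cast<std::size_t>(num_threads),
                                            std::vector<float>(n));
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            apply_scale_filter_locked(input.data(), outputs[t].data(), n,
                                      static_cast<float>(t + 1));
        });
    }
    for (auto& w : workers) w.join();
    for (int t = 0; t < num_threads; ++t)
        for (float v : outputs[t])
            if (v != static_cast<float>(t + 1)) return false;
    return true;
}
```

Calling `run_parallel_filters(8, 1000)` from your own `main` (compile with `-pthread` on GCC/Clang) returns `true`. Removing the `lock_guard` line turns the calls into a data race on `g_scratch`, so results may be wrong or the program may crash, which matches the "random failures" described in the question. Serializing is the simplest fix but gives up any overlap between filters; the alternative is to make the DLL itself thread-safe, e.g. by giving each call its own buffers.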