I have a GTX 480 on which I want to run some
CUDA kernels concurrently.
Are there any specific compile options I need, in
addition to making sure the kernels I wish to
run are launched in separate streams?
I have tried running the same kernel code on
different data regions, each in its own stream,
so that the launches can be scheduled concurrently. However, when I analyze
the streams in Parallel Nsight, the kernels appear to be serialized
rather than running concurrently.
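
For illustration, the launch pattern described above looks roughly like the sketch below. It is not my actual code: the kernel processRegion, the stream count, the region size and the launch configuration are all placeholder assumptions chosen to keep the example small.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel (placeholder name): updates one region of the buffer in place.
__global__ void processRegion(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] = data[i] * factor + 1.0f;
    }
}

int main()
{
    const int kStreams = 4;        // assumed number of streams
    const int kRegion  = 1 << 12;  // assumed elements per region
    const int kThreads = 256;
    const size_t bytes = (size_t)kStreams * kRegion * sizeof(float);

    float *d_data = NULL;
    cudaMalloc((void **)&d_data, bytes);
    cudaMemset(d_data, 0, bytes);  // every element starts as 0.0f

    // One non-default stream per data region.
    cudaStream_t streams[kStreams];
    for (int s = 0; s < kStreams; ++s) {
        cudaStreamCreate(&streams[s]);
    }

    // Same kernel, different region, different stream for each launch.
    const int blocks = (kRegion + kThreads - 1) / kThreads;
    for (int s = 0; s < kStreams; ++s) {
        processRegion<<<blocks, kThreads, 0, streams[s]>>>(
            d_data + s * kRegion, kRegion, 2.0f);
    }

    // Wait for all streams, then report any launch or execution error.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Copy back and count elements that were not updated to 0 * 2 + 1 = 1.
    float *h_data = (float *)malloc(bytes);
    cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);
    int bad = 0;
    for (int i = 0; i < kStreams * kRegion; ++i) {
        if (h_data[i] != 1.0f) {
            ++bad;
        }
    }
    printf("mismatches: %d\n", bad);

    for (int s = 0; s < kStreams; ++s) {
        cudaStreamDestroy(streams[s]);
    }
    free(h_data);
    cudaFree(d_data);
    return bad == 0 ? 0 : 1;
}
```

The sketch only shows how the launches are issued. Whether they actually overlap on the device is what I am trying to determine from the Nsight timeline.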
Any help would be gratefully received.