I have two questions about using CUDA streams on a GTX 480 together with OpenMP.
First, can I transfer data to and from the GPU from several OpenMP threads in parallel, with each thread using its own CUDA stream? Can I also use OpenMP threads to launch concurrent kernels on different streams? If neither is allowed, is that because GPU/CPU communication can only go through thread 0?
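To make the first question concrete, here is a minimal sketch of the pattern I have in mind. The kernel `scale`, the function name `run_per_thread_streams`, and the sizes are all hypothetical placeholders, and the sketch simply assumes that a stream created by the master thread can be used from the other OpenMP threads, which is exactly the part I am unsure about.

```cuda
#include <omp.h>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the real work: doubles each element.
__global__ void scale(float *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

// Returns the number of wrong results (0 if all correct), or -1 if allocation fails.
int run_per_thread_streams()
{
    const int nStreams = 4;        // assumed: one stream per OpenMP thread
    const int chunk    = 1 << 20;  // assumed: elements handled per stream
    const size_t bytes = chunk * sizeof(float);

    float *h = 0, *d = 0;
    // Page-locked host memory, so that cudaMemcpyAsync can really be asynchronous.
    if (cudaMallocHost((void **)&h, nStreams * bytes) != cudaSuccess) return -1;
    if (cudaMalloc((void **)&d, nStreams * bytes) != cudaSuccess) {
        cudaFreeHost(h);
        return -1;
    }
    for (int i = 0; i < nStreams * chunk; ++i) h[i] = 1.0f;

    cudaStream_t s[nStreams];
    for (int t = 0; t < nStreams; ++t) cudaStreamCreate(&s[t]);

    // Each loop iteration (ideally one per OpenMP thread) drives its own stream:
    // upload, kernel, download, then wait for that stream only.
    #pragma omp parallel for num_threads(nStreams)
    for (int t = 0; t < nStreams; ++t) {
        float *hp = h + t * chunk;
        float *dp = d + t * chunk;
        cudaMemcpyAsync(dp, hp, bytes, cudaMemcpyHostToDevice, s[t]);
        scale<<<(chunk + 255) / 256, 256, 0, s[t]>>>(dp, chunk);
        cudaMemcpyAsync(hp, dp, bytes, cudaMemcpyDeviceToHost, s[t]);
        cudaStreamSynchronize(s[t]);
    }

    // Every element started at 1.0f and should now be 2.0f.
    int bad = 0;
    for (int i = 0; i < nStreams * chunk; ++i)
        if (h[i] != 2.0f) ++bad;

    for (int t = 0; t < nStreams; ++t) cudaStreamDestroy(s[t]);
    cudaFree(d);
    cudaFreeHost(h);
    return bad;
}
```

I would call `run_per_thread_streams()` from `main` and build with something like `nvcc -Xcompiler -fopenmp`; a return value of 0 would mean every chunk made the round trip correctly.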
Second, I have a nested ("square") SIMD algorithm: an outer SIMD computation (SIMDB, B for Big) that calls an inner one (SIMDS, S for Small). Is it more efficient to run one big loop (the SIMDB) in which each iteration launches the kernel that performs SIMDS, or to shorten that loop and have each iteration launch several concurrent kernels on different streams? I suspect the answer depends on the memory size, but I am not sure how.
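The two variants I am comparing look roughly like the sketch below. The kernel `simds`, the function name `run_both_variants`, and the sizes `nBig`, `nSmall`, and `nStreams` are hypothetical stand-ins for my real code, and I am assuming the SIMDB iterations are independent and work on disjoint slices of one device array.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the SIMDS kernel: adds 1 to each of n elements.
__global__ void simds(float *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1.0f;
}

// Runs both variants once on the same data.
// Returns the number of wrong results (0 if all correct), or -1 if allocation fails.
int run_both_variants()
{
    const int nBig = 64;      // assumed SIMDB loop length (multiple of nStreams)
    const int nSmall = 4096;  // assumed elements per SIMDS call
    const int nStreams = 4;   // assumed number of concurrent streams
    const int threads = 256;
    const int blocks = (nSmall + threads - 1) / threads;
    const int total = nBig * nSmall;

    float *d = 0;
    if (cudaMalloc((void **)&d, total * sizeof(float)) != cudaSuccess) return -1;
    cudaMemset(d, 0, total * sizeof(float));

    // Variant A: one big loop, one kernel launch per iteration, default stream.
    for (int b = 0; b < nBig; ++b)
        simds<<<blocks, threads>>>(d + b * nSmall, nSmall);
    cudaDeviceSynchronize();

    // Variant B: a loop nStreams times shorter; each iteration launches
    // nStreams kernels on different streams so that they may overlap.
    cudaStream_t s[nStreams];
    for (int k = 0; k < nStreams; ++k) cudaStreamCreate(&s[k]);
    for (int b0 = 0; b0 < nBig; b0 += nStreams)
        for (int k = 0; k < nStreams; ++k)
            simds<<<blocks, threads, 0, s[k]>>>(d + (b0 + k) * nSmall, nSmall);
    cudaDeviceSynchronize();

    // Each element was incremented once per variant, so it should equal 2.0f.
    std::vector<float> h(total);
    cudaMemcpy(&h[0], d, total * sizeof(float), cudaMemcpyDeviceToHost);
    int bad = 0;
    for (int i = 0; i < total; ++i)
        if (h[i] != 2.0f) ++bad;

    for (int k = 0; k < nStreams; ++k) cudaStreamDestroy(s[k]);
    cudaFree(d);
    return bad;
}
```

Both variants compute the same result here; what I cannot judge is which one runs faster as `nSmall` and the total memory footprint change.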
Thank you for your response.