Running 32-256+ different processes in parallel

I have an application I want to develop that requires 32-256+ different processes running in parallel. I need performance, so I’m looking at using a C-1060.

  1. Is this possible with a C-1060?
  2. If so, where do I begin? I need some help getting started. What should I read? I’m trying not to reinvent the wheel.

Thanks in advance.

You can’t run “different processes” in parallel in CUDA. You can only run a single “process” at a time in a massively parallel fashion - basically many thousands of threads of a single computational kernel of some kind.

To be fair, you could probably run a kernel with one thread per block where each block performs a different task. Still, it’s unlikely that this will be a good way to use the GPU.
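For illustration only, here is a rough sketch of that one-thread-per-block idea. The three task functions and the `task % 3` dispatch are made up for the example (they stand in for whatever the "different processes" really are), and `runMultiTask` is just a hypothetical host helper.

```cuda
#include <cuda_runtime.h>

// Hypothetical tasks, stand-ins for the poster's "different processes".
__device__ float taskSquare(float x) { return x * x; }
__device__ float taskNegate(float x) { return -x; }
__device__ float taskHalve(float x)  { return 0.5f * x; }

// Launched with ONE thread per block, so blockIdx.x selects the task.
__global__ void multiTaskKernel(const float* in, float* out, int numTasks)
{
    int task = blockIdx.x;
    if (task >= numTasks) return;
    switch (task % 3) {  // assumed dispatch rule, purely illustrative
        case 0:  out[task] = taskSquare(in[task]); break;
        case 1:  out[task] = taskNegate(in[task]); break;
        default: out[task] = taskHalve(in[task]);  break;
    }
}

// Hypothetical host helper: copy in, launch numTasks blocks of 1 thread, copy out.
cudaError_t runMultiTask(const float* hostIn, float* hostOut, int numTasks)
{
    float* dIn = nullptr;
    float* dOut = nullptr;
    size_t bytes = numTasks * sizeof(float);
    cudaError_t err = cudaMalloc(&dIn, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dOut, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(dIn, hostIn, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        multiTaskKernel<<<numTasks, 1>>>(dIn, dOut, numTasks);
        err = cudaGetLastError();  // catches launch errors
    }
    // cudaMemcpy waits for the kernel to finish before copying.
    if (err == cudaSuccess) err = cudaMemcpy(hostOut, dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return err;
}
```

With one thread per block, the other 31 lanes of each warp sit idle, which is a big part of why this layout is unlikely to use the GPU well.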

If by “different processes” you mean you need to run the same, serial task, 32-256 times in parallel, then that may also be a candidate for using CUDA.
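A minimal sketch of that case, assuming each instance is an independent serial loop that differs only in its input parameter. `serialTask` is a placeholder (it just sums `i * param`), and `runSameTask` with its block size of 64 is a hypothetical helper, not a recommendation.

```cuda
#include <cuda_runtime.h>

// Placeholder serial task: a plain loop that depends only on its own parameter.
__device__ int serialTask(int param, int steps)
{
    int acc = 0;
    for (int i = 1; i <= steps; ++i) acc += i * param;
    return acc;
}

// One thread per instance; every thread runs the same code on different data.
__global__ void sameTaskKernel(int* out, int numInstances, int steps)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id < numInstances) out[id] = serialTask(id, steps);
}

// Hypothetical host helper: launch enough blocks to cover all instances.
cudaError_t runSameTask(int* hostOut, int numInstances, int steps)
{
    int* dOut = nullptr;
    size_t bytes = numInstances * sizeof(int);
    cudaError_t err = cudaMalloc(&dOut, bytes);
    if (err == cudaSuccess) {
        int threads = 64;  // assumed block size
        int blocks = (numInstances + threads - 1) / threads;
        sameTaskKernel<<<blocks, threads>>>(dOut, numInstances, steps);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaMemcpy(hostOut, dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    return err;
}
```

Calling `runSameTask(results, 256, 100)` would run 256 instances at once. Keep in mind that 32-256 threads is a small workload for a GPU, so whether it beats a CPU depends on how much work each instance does.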

If these processes all follow very divergent execution paths, then CUDA is not a good option (say, lots of if/else statements where each process will likely take a different path).

CUDA is fastest when groups of 32 threads (a warp) all take the same execution path, because then you get the full benefit of SIMD execution. As soon as threads within a warp diverge, performance is reduced.
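To make the difference concrete, here is a small sketch with two kernels that do the same arithmetic but branch differently. The kernel names, the arithmetic, and the `runBranchDemo` helper are invented for the example. With branches this tiny the compiler may well remove the divergence on its own, so treat it as an illustration of the pattern rather than a benchmark.

```cuda
#include <cuda_runtime.h>

// Branch on the thread index: even and odd lanes of the SAME warp disagree,
// so the warp has to execute both sides one after the other.
__global__ void divergentKernel(float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (threadIdx.x % 2 == 0) out[i] = 2.0f * i;
    else                      out[i] = i + 1.0f;
}

// Branch on the warp index: all 32 threads of a warp take the same side.
__global__ void uniformKernel(float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int warp = threadIdx.x / warpSize;  // warpSize is a CUDA built-in (32)
    if (warp % 2 == 0) out[i] = 2.0f * i;
    else               out[i] = i + 1.0f;
}

// Hypothetical host helper: runs one of the two kernels and copies results back.
cudaError_t runBranchDemo(bool uniform, float* hostOut, int n)
{
    float* dOut = nullptr;
    size_t bytes = n * sizeof(float);
    cudaError_t err = cudaMalloc(&dOut, bytes);
    if (err == cudaSuccess) {
        int threads = 128;  // assumed block size, a multiple of the warp size
        int blocks = (n + threads - 1) / threads;
        if (uniform) uniformKernel<<<blocks, threads>>>(dOut, n);
        else         divergentKernel<<<blocks, threads>>>(dOut, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaMemcpy(hostOut, dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    return err;
}
```

If your 32-256 tasks can be grouped so that each warp's threads follow the same path, as in `uniformKernel`, you avoid the serialization that `divergentKernel` is exposed to.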