I’m fairly new to CUDA, and while a lot of it seems pretty straightforward (thank goodness for the people who made GPU parallelization this easy!), I feel I’m in the dark about choosing a healthy ratio of blocks to threads (or whether it even matters).
Here’s the general layout of my application-to-be:
Basically, it’s a 1-, 2-, or 3-D fluid dynamics grid solver, so at some point I want one thread to run per grid cube. The order in which the grid cubes are calculated doesn’t matter. I’m starting with the 1-D case, which could have anywhere from 2 to more than 70,000 grid cubes, so I don’t want to hard-code the number of blocks in the kernel call. Would a reasonable technique be to use a series of IF statements to pick one thread per block, 128 threads per block, or 512, depending on the total number of threads required?
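To make the idea concrete, here’s a sketch of the host-side launch logic I have in mind (plain C++; the struct and function names are just made up for illustration, and the actual CUDA kernel launch is only shown in a comment):

```cpp
// Pick threads per block from the cell count, then derive the block
// count with ceiling division so every grid cube gets a thread.
struct LaunchConfig {
    int blocks;
    int threadsPerBlock;
};

LaunchConfig pickLaunch(int nCells) {
    int threadsPerBlock;
    if (nCells <= 1)
        threadsPerBlock = 1;    // tiny problems: one thread per block
    else if (nCells <= 128)
        threadsPerBlock = 128;  // small problems
    else
        threadsPerBlock = 512;  // large problems (e.g. >70,000 cells)

    // Round up so blocks * threadsPerBlock >= nCells; the kernel then
    // needs a guard like `if (idx >= nCells) return;` so the spare
    // threads in the last block do nothing.
    int blocks = (nCells + threadsPerBlock - 1) / threadsPerBlock;

    // In the real CUDA code this would precede a launch such as:
    //   solveKernel<<<blocks, threadsPerBlock>>>(dev_grid, nCells);
    return { blocks, threadsPerBlock };
}
```

So for 70,000 cells this would give 512 threads per block and 137 blocks (137 × 512 = 70,144 threads, with the last 144 masked off by the guard). Is this roughly the right approach, or is there a smarter way to pick the split?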
For all I know this issue is more or less unimportant, but minimizing runtime is the objective, and I’m afraid that since I don’t have much formal CS training, the theory of optimizing hardware usage is somewhat opaque to me.