cuda-gdb for non-running blocks

How do we switch to blocks that are not running?
It looks to me like only 16 or 32 blocks can be active concurrently.

I tried ‘block (793,0)’, but it kept giving me block (15,0).
Thank you for any input.

The maximum number of concurrently resident blocks depends on your device and on your kernel's resource requirements. Once started, a CUDA block runs until completion, and a block doesn't start until the scheduler dispatches it to an SM. Blocks are not time-sliced like processes on a CPU, which would explain why you can't switch to a block that hasn't been dispatched yet.
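If you want the exact number for your device and kernel, newer CUDA toolkits have an occupancy query that reports how many blocks of a given kernel can be resident per SM. A minimal sketch (my_kernel, the 256-thread block size, and the zero bytes of dynamic shared memory are just placeholders for your own setup):

    #include <cstdio>

    __global__ void my_kernel(float *data) { /* stand-in for your kernel */ }

    int main()
    {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);

        // How many blocks of my_kernel can be resident on one SM, given its
        // register and shared memory usage and a 256-thread block size.
        int blocksPerSM = 0;
        cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSM, my_kernel,
                                                      256, 0);

        printf("blocks per SM: %d, SMs: %d, max concurrent blocks: %d\n",
               blocksPerSM, prop.multiProcessorCount,
               blocksPerSM * prop.multiProcessorCount);
        return 0;
    }

Whatever that number comes out to, blocks beyond it simply haven't been launched yet, so there is nothing for cuda-gdb to switch focus to.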

I haven’t used cuda-gdb much. Is there a way to set a breakpoint that triggers once block 793 starts?
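In ordinary gdb I would reach for a conditional breakpoint, so if cuda-gdb lets the condition refer to the built-in blockIdx variable, something along these lines might work (the file name and line number are made up):

    break solver.cu:42 if blockIdx.x == 793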

Thanks seibert.
Yes, shared memory usage and the number of available registers limit how many blocks (or threads) can run concurrently on the GPU.

I kinda solved my previous problem, where threads in block 793 kept giving me NaN: it turned out I hadn't initialized all the array elements, so some held garbage values that showed up as really large numbers.
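(In case anyone hits the same thing: the fix boils down to making sure every element is written before it is read, e.g. zeroing the device buffer right after allocation. The names below are placeholders rather than my actual code.)

    // Zero the buffer so no element carries garbage left over from cudaMalloc.
    float *d_u = NULL;
    size_t bytes = n * sizeof(float);
    cudaMalloc(&d_u, bytes);
    cudaMemset(d_u, 0, bytes);   // an all-zero bit pattern is 0.0f for floats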

Now, another problem. I am implementing a PDE solver on the GPU. For a small problem size there is no NaN in the error and it converges well. But for a larger problem size (not that much larger), it starts producing NaN. If I shrink the time step a bit, it gets a little further and can handle a somewhat larger problem size.

So my question is: can the GPU ruin the convergence properties of a numerical algorithm?

Thank you for any input.

The finite precision of floating-point arithmetic is what matters here: if you are accustomed to running things in double precision on the CPU but then switch to single precision on the GPU, you can see differences. (Well, you can see differences for lots of reasons, but these will be bigger ones.)

It really depends on how your PDE solver algorithm deals with round-off error.
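One quick way to convince yourself it is round-off rather than a GPU bug is to accumulate the same kind of sum in float and in double and compare. The toy example below (arbitrary numbers, nothing to do with your solver) shows how far single precision can drift once the running sum dwarfs the individual terms:

    #include <cstdio>

    int main()
    {
        float  sf = 0.0f;
        double sd = 0.0;
        for (int i = 0; i < 10000000; ++i) {
            sf += 1e-4f;   // rounding error accumulates in single precision
            sd += 1e-4;    // double stays essentially exact at this scale
        }
        printf("float sum:  %f\n", sf);   // noticeably off from 1000.0
        printf("double sum: %f\n", sd);   // very close to 1000.0
        return 0;
    }

If your solver accumulates residuals or inner products in a similar way, moving those accumulations to double (or reordering/compensating the sums) is usually the first thing to try.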