Shared memory and register usage - just 1 thread/block


I want to run just 1 thread per block. My program is very branch-heavy and has no use for data parallelism
(it does data compression).

Still, I'd like to know whether it's possible to run just 1 thread per block, with 30 blocks running on 30 different SMs simultaneously.

(All of my blocks will use the same entry point, but each will do a completely different task with different memory accesses,
i.e. different algorithms altogether, on different data.)
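
A minimal sketch of that launch pattern, assuming the per-block tasks can be selected from blockIdx.x. The names taskA, taskB, dispatch and runDispatch are hypothetical placeholders, not part of my real code, and the two tasks are trivial stand-ins for the actual algorithms. Note that CUDA offers no way to pin a block to a particular SM; where each block runs is up to the hardware scheduler.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-ins for two unrelated algorithms.
__device__ void taskA(const unsigned char* in, unsigned char* out, int n) {
    for (int i = 0; i < n; ++i) out[i] = in[i] ^ 0xFF;    // invert every byte
}
__device__ void taskB(const unsigned char* in, unsigned char* out, int n) {
    for (int i = 0; i < n; ++i) out[i] = in[n - 1 - i];   // reverse the chunk
}

// Single entry point: each block works on its own chunk and picks its own task.
// in/out hold one chunk of chunkSize bytes per block.
__global__ void dispatch(const unsigned char* in, unsigned char* out, int chunkSize) {
    const unsigned char* myIn  = in  + blockIdx.x * chunkSize;
    unsigned char*       myOut = out + blockIdx.x * chunkSize;
    if (blockIdx.x % 2 == 0) taskA(myIn, myOut, chunkSize);
    else                     taskB(myIn, myOut, chunkSize);
}

// Host helper: copies the input over, launches numBlocks blocks of 1 thread each,
// and copies the result back. Returns the first CUDA error encountered.
cudaError_t runDispatch(const unsigned char* hIn, unsigned char* hOut,
                        int numBlocks, int chunkSize) {
    size_t bytes = (size_t)numBlocks * chunkSize;
    unsigned char* dIn = NULL;
    unsigned char* dOut = NULL;
    cudaError_t err = cudaMalloc((void**)&dIn, bytes);
    if (err != cudaSuccess) return err;
    err = cudaMalloc((void**)&dOut, bytes);
    if (err != cudaSuccess) { cudaFree(dIn); return err; }

    err = cudaMemcpy(dIn, hIn, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dispatch<<<numBlocks, 1>>>(dIn, dOut, chunkSize);  // 1 thread per block
        err = cudaGetLastError();                          // launch errors
    }
    if (err == cudaSuccess)                                // blocks until the kernel is done
        err = cudaMemcpy(hOut, dOut, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dIn);
    cudaFree(dOut);
    return err;
}
```

For the case above, the call would be runDispatch(hostIn, hostOut, 30, chunkSize), with hostIn and hostOut each holding 30 chunks.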


If I have only 1 thread, can I use the entire shared memory, i.e. approx. 16 KB? And will the compiler use the entire
64 KB of registers for optimization? Register usage is very important for me, because my data size is only 16 KB or 32 KB.


This will be much slower than any CPU implementation, possibly by an order of magnitude. None of the latencies will be hidden, and even if they were hidden, it would still be slower.

I think registers are per core, so at best you could use 2048 registers, but it's possible that the maximum for one thread is 512. Either way, a whole warp will be constructed, with 31 of its threads idle.
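
Rather than guessing at the numbers, the limits can be read from the runtime. Below is a rough sketch, assuming a reasonably recent CUDA toolkit. singleThreadKernel and reportLimits are made-up names for illustration. cudaGetDeviceProperties and cudaFuncGetAttributes are the runtime calls that expose the per-device limits and the per-kernel usage the compiler actually settled on.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the real single-thread workload.
__global__ void singleThreadKernel(int* out) {
    __shared__ int scratch[1024];   // 4 KB of statically allocated shared memory
    scratch[0] = 42;
    out[0] = scratch[0];
}

// Prints the device limits and what the compiler allocated for the kernel above.
cudaError_t reportLimits(int device) {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) return err;

    printf("multiprocessors:            %d\n", prop.multiProcessorCount);
    printf("warp size:                  %d\n", prop.warpSize);
    printf("shared memory per block:    %zu bytes\n", prop.sharedMemPerBlock);
    printf("32-bit registers per block: %d\n", prop.regsPerBlock);

    cudaFuncAttributes attr;
    err = cudaFuncGetAttributes(&attr, singleThreadKernel);
    if (err != cudaSuccess) return err;

    printf("registers per thread:       %d\n", attr.numRegs);
    printf("static shared memory:       %zu bytes\n", attr.sharedSizeBytes);
    printf("local memory per thread:    %zu bytes\n", attr.localSizeBytes);
    return cudaSuccess;
}
```

Calling reportLimits(0) prints the figures for the first device. Compiling with nvcc --ptxas-options=-v reports the same per-kernel register and shared memory usage at build time, and a nonzero local memory size is a hint that the compiler spilled registers instead of using more of them.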