I want to run just one thread per block. My program branches too heavily to get any benefit from parallelism.
But anyway, I want to know if it's possible to run just one thread per block, with 30 blocks running simultaneously on 30 different SMs.
(Each of my blocks will use the same entry point but will do a completely different task with different memory accesses,
i.e. entirely different algorithms on different data.)
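A minimal sketch of the "same entry point, different task per block" setup, assuming a hypothetical kernel `dispatch` and three placeholder algorithms (sum, max, and a shared-memory staging pass) standing in for the real ones. It launches 30 blocks of 1 thread and branches on `blockIdx.x`. Note that the launch itself does not pin blocks to SMs; placement is up to the hardware block scheduler. The 12 KB shared array (`SCRATCH_INTS` is a hypothetical name) is deliberately kept below 16 KB to leave some headroom.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define NUM_TASKS 30       // hypothetical: one independent task per block
#define SCRATCH_INTS 3072  // 3072 * 4 B = 12 KB of shared memory per block

__global__ void dispatch(const int *in, int n, int *out)
{
    // Each block gets its own copy; with one thread, that thread owns it all.
    __shared__ int scratch[SCRATCH_INTS];

    // One thread per block: threadIdx.x is always 0, so blockIdx.x alone
    // decides which placeholder algorithm this block runs.
    int task = blockIdx.x;
    int result = 0;

    switch (task % 3) {
    case 0: // placeholder algorithm A: sum of the input
        for (int i = 0; i < n; ++i)
            result += in[i];
        break;
    case 1: // placeholder algorithm B: maximum of the input (assumes n >= 1)
        result = in[0];
        for (int i = 1; i < n; ++i)
            result = max(result, in[i]);
        break;
    default: { // placeholder algorithm C: stage in shared memory, count evens
        int m = min(n, SCRATCH_INTS);
        for (int i = 0; i < m; ++i)
            scratch[i] = in[i];
        for (int i = 0; i < m; ++i)
            result += (scratch[i] % 2 == 0);
        break;
    }
    }
    out[task] = result;
}

int main()
{
    const int n = 1024;
    int h_in[n];
    for (int i = 0; i < n; ++i)
        h_in[i] = i;

    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, NUM_TASKS * sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);

    // 30 blocks x 1 thread. Which SM runs which block is decided by the
    // hardware scheduler; nothing here guarantees one block per SM.
    dispatch<<<NUM_TASKS, 1>>>(d_in, n, d_out);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    int h_out[NUM_TASKS];
    cudaMemcpy(h_out, d_out, NUM_TASKS * sizeof(int), cudaMemcpyDeviceToHost);
    for (int b = 0; b < NUM_TASKS; ++b)
        printf("block %2d -> %d\n", b, h_out[b]);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

The `switch` is just one way to share a single entry point; with one thread per block there is no intra-warp divergence to worry about, since every block runs exactly one path.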
If I have only one thread, can it use the entire shared memory (approx. 16 KB)? And will the compiler "use" the entire
64 KB of registers for optimization? This register usage is very important for me, because my data size is only 16 KB/32 KB.
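One way to check these limits on a given card, rather than guessing, is to ask the runtime. The sketch below (the kernel `one_thread_kernel` is a hypothetical stand-in for the real one) prints the per-block shared memory and register-file limits from `cudaGetDeviceProperties`, then uses `cudaFuncGetAttributes` to report how many registers per thread the compiler actually assigned. The register file is a per-SM/per-block budget, but the compiler picks a fixed per-thread register count capped well below that budget, and anything that doesn't fit is spilled to local memory, which `localSizeBytes` reports.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical trivial kernel, present only so we can query its attributes.
__global__ void one_thread_kernel(float *out)
{
    __shared__ float buf[256];
    buf[0] = 1.0f;
    out[blockIdx.x] = buf[0] * 2.0f;
}

int main()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "no CUDA device found\n");
        return 1;
    }
    printf("device: %s\n", prop.name);
    printf("SMs: %d\n", prop.multiProcessorCount);
    printf("shared memory per block: %zu bytes\n", prop.sharedMemPerBlock);
    printf("32-bit registers per block: %d\n", prop.regsPerBlock);

    // What the compiler actually produced for this kernel.
    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, one_thread_kernel) != cudaSuccess) {
        fprintf(stderr, "cudaFuncGetAttributes failed\n");
        return 1;
    }
    printf("registers per thread: %d\n", attr.numRegs);
    printf("static shared memory: %zu bytes\n", attr.sharedSizeBytes);
    printf("local (spill) memory per thread: %zu bytes\n", attr.localSizeBytes);
    return 0;
}
```

The same per-thread register count is also printed at compile time with `nvcc --ptxas-options=-v`, which is handy for seeing whether a one-thread kernel spills.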