This is something of a beginner question, so please bear in mind that I am still quite new to CUDA programming.
I have a Quadro K1000M in my laptop, and when I queried its properties with cudaGetDeviceProperties() I got:
Max threads per SM: 2048 -- Num multiprocessors: 1
Given that my GPU has just one SM, does it make sense to launch more than one block when running my kernel, i.e.
Note that the kernel does not use shared memory, since the threads do not need to communicate with each other.