When I use the CUDA profiler to optimize my code, the reported occupancy is always 0 or 1, never anything in between. My colleague, who has a similar computer, gets fractional values such as 0.25 or 0.5. Has anyone encountered this problem?
When I use the CUDA Occupancy Calculator and try to set the number of threads per block to anything larger than 512, I get “non valid value”, even when I select compute capability 2.0. As far as I know, a compute capability 2.0 card such as the GTX 480 can run up to 1024 threads per block. Registers per thread and shared memory per block, on the other hand, accept any value, even something like 50000000000000000. I use OpenOffice Calc rather than Microsoft Office Excel, in case that makes a difference.
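For context on why fractional values like 0.25 or 0.5 are normal, here is a minimal, simplified sketch of how theoretical occupancy is derived: the number of warps that can be resident on one multiprocessor divided by the maximum it supports. It assumes the compute capability 2.0 limits from the CUDA C Programming Guide (1024 threads per block, 48 warps and 8 blocks per SM, 32768 registers and 48 KB of shared memory per SM, at most 63 registers per thread). It deliberately ignores register and shared-memory allocation granularity, so its numbers can differ from the official spreadsheet. The function `theoretical_occupancy` and the `CC20_LIMITS` table are hypothetical illustrations, not the calculator's actual formulas.

```python
# Simplified per-SM limits for compute capability 2.0 (e.g. GTX 480).
# Assumption: allocation granularity for registers and shared memory is
# ignored, so results may be slightly optimistic versus the official tool.
CC20_LIMITS = {
    "warp_size": 32,
    "max_threads_per_block": 1024,
    "max_warps_per_sm": 48,
    "max_blocks_per_sm": 8,
    "registers_per_sm": 32768,
    "max_registers_per_thread": 63,
    "shared_mem_per_sm": 49152,  # 48 KB shared-memory configuration
}


def theoretical_occupancy(threads_per_block, regs_per_thread, smem_per_block,
                          limits=CC20_LIMITS):
    """Return resident warps per SM divided by the maximum warps per SM."""
    # Reject inputs the hardware cannot run at all.
    if not 1 <= threads_per_block <= limits["max_threads_per_block"]:
        raise ValueError("threads per block out of range")
    if not 0 <= regs_per_thread <= limits["max_registers_per_thread"]:
        raise ValueError("registers per thread out of range")
    if not 0 <= smem_per_block <= limits["shared_mem_per_sm"]:
        raise ValueError("shared memory per block out of range")

    warp_size = limits["warp_size"]
    warps_per_block = -(-threads_per_block // warp_size)  # ceiling division

    # Each resource independently caps how many blocks fit on one SM.
    by_warps = limits["max_warps_per_sm"] // warps_per_block
    by_blocks = limits["max_blocks_per_sm"]
    regs_per_block = regs_per_thread * warps_per_block * warp_size
    by_regs = (limits["registers_per_sm"] // regs_per_block
               if regs_per_block else by_blocks)
    by_smem = (limits["shared_mem_per_sm"] // smem_per_block
               if smem_per_block else by_blocks)

    active_blocks = min(by_warps, by_blocks, by_regs, by_smem)
    return active_blocks * warps_per_block / limits["max_warps_per_sm"]


print(theoretical_occupancy(384, 40, 0))      # 0.5  (limited by registers)
print(theoretical_occupancy(192, 20, 24576))  # 0.25 (limited by shared memory)
```

In this sketch a block size of 1024 is accepted while 1025 raises `ValueError`, and out-of-range register or shared-memory values are rejected as well, which is the kind of validation one would expect from the calculator for compute capability 2.0.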