Question about CUDA profiler and CUDA occupancy calculator

When I use the CUDA profiler to optimize my code, the reported occupancy is always 0 or 1, never anything in between. My colleague, who has a similar computer, does get fractional values like 0.25 or 0.5. Has anyone encountered this problem?

When I use the CUDA occupancy calculator and try to set the number of threads per block to anything larger than 512, I get “non valid value”, even if I select compute capability 2.0. As far as I know, the GTX 480, for example, can run 1024 threads per block. Registers per thread and shared memory per block, however, can be set to any value, even something like 50000000000000000. I use OpenOffice Calc rather than Microsoft Office Excel, in case that makes any difference.

Using OpenOffice 3.1, the Excel spreadsheet works fine for me: the graph is updated, etc. Are you editing only sections 1) and 2) as indicated?

The profiler output seems weird. Which parameters do you use to launch your kernel when you get occupancy 0?

Yes, I also use OpenOffice Calc 3.1.

I get occupancy 0 for a lot of different kernel parameters, but I know that in several of those cases the occupancy is really 0.5.
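
For anyone who wants to cross-check the profiler or the spreadsheet by hand, here is a rough sketch of the theoretical occupancy calculation for compute capability 2.0. It is not the spreadsheet's actual formula: the function name and structure are my own, and it ignores register and shared memory allocation granularity, so the real calculator may give slightly lower numbers. The hardware limits are the published values for compute capability 2.0.

```python
# Simplified theoretical occupancy estimate for one multiprocessor (SM).
# Limits are the published compute capability 2.0 (Fermi) values.
# Assumption: allocation granularity of registers and shared memory is ignored.
WARP_SIZE = 32
MAX_THREADS_PER_BLOCK = 1024
MAX_WARPS_PER_SM = 48
MAX_BLOCKS_PER_SM = 8
REGISTERS_PER_SM = 32768
SHARED_MEM_PER_SM = 48 * 1024  # bytes, assuming the 48 KB shared / 16 KB L1 split


def theoretical_occupancy(threads_per_block, regs_per_thread, smem_per_block):
    """Return active warps per SM divided by the maximum warps per SM."""
    if not 1 <= threads_per_block <= MAX_THREADS_PER_BLOCK:
        raise ValueError("threads per block must be between 1 and 1024")

    # Threads are scheduled in whole warps, so round up.
    warps_per_block = -(-threads_per_block // WARP_SIZE)

    # How many blocks fit on one SM under each resource limit.
    limit_warps = MAX_WARPS_PER_SM // warps_per_block
    regs_per_block = regs_per_thread * warps_per_block * WARP_SIZE
    limit_regs = REGISTERS_PER_SM // regs_per_block if regs_per_block else MAX_BLOCKS_PER_SM
    limit_smem = SHARED_MEM_PER_SM // smem_per_block if smem_per_block else MAX_BLOCKS_PER_SM

    # The tightest limit decides the number of resident blocks.
    # A result of 0 blocks means the kernel could not launch at all.
    blocks = min(MAX_BLOCKS_PER_SM, limit_warps, limit_regs, limit_smem)
    return blocks * warps_per_block / MAX_WARPS_PER_SM


# 256 threads per block, 42 registers per thread, no shared memory:
# registers limit the SM to 3 blocks of 8 warps, i.e. 24 of 48 warps.
print(theoretical_occupancy(256, 42, 0))  # 0.5
```

Under these assumptions, a kernel that actually runs always has at least one resident block, so its theoretical occupancy cannot be 0. That suggests a reported 0 is more likely a display or rounding problem in the tool than a property of the kernel.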