I’ve got two kernels (K1 and K2) doing approximately the same thing.
Both take similar time (less than 1% difference).
However, the occupancies are 0.66 for K1 and 1 for K2.
What does that mean? Which kernel should I choose?
It probably means that both of your kernels are memory-bandwidth bound. Choose the one with the simpler code, or the one with the higher occupancy, since that one is likely to be the better kernel once cards with higher memory bandwidth come along.
Occupancy is the ratio of the number of warps of your kernel that run concurrently on a multiprocessor to the maximum number of concurrent warps the multiprocessor supports (currently 24).
So an occupancy of 1 means you have as many threads/warps running per multiprocessor as is currently possible.
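To make the ratio concrete, here is a minimal sketch of the arithmetic in Python. The limit of 24 warps per multiprocessor is the figure quoted above, and 32 threads per warp is the standard warp size. The block size (256 threads) and the number of resident blocks (2 for K1, 3 for K2) are made-up values chosen only to reproduce the occupancies from the question; the real values depend on each kernel's register and shared memory usage.

```python
MAX_WARPS_PER_SM = 24  # limit quoted in this thread; other GPU generations differ
WARP_SIZE = 32         # threads per warp


def warps_per_block(threads_per_block):
    """Number of warps needed for one block (rounded up to whole warps)."""
    return -(-threads_per_block // WARP_SIZE)


def occupancy(resident_warps, max_warps=MAX_WARPS_PER_SM):
    """Ratio of warps resident on one multiprocessor to the hardware maximum."""
    if not 0 <= resident_warps <= max_warps:
        raise ValueError("resident_warps must be between 0 and max_warps")
    return resident_warps / max_warps


# Hypothetical launch configurations (assumed, not taken from the question):
# K1 fits 2 blocks of 256 threads per multiprocessor, K2 fits 3.
k1 = occupancy(2 * warps_per_block(256))  # 16 of 24 warps
k2 = occupancy(3 * warps_per_block(256))  # 24 of 24 warps

print(f"K1: {k1:.2f}, K2: {k2:.2f}")
```

Under these assumptions K1 has 16 of 24 warps resident (2/3, which tools often display truncated as 0.66) and K2 has all 24.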
In memory-bound kernels, increasing occupancy from 0.33 to 0.66 usually improves performance. Many people, including NVIDIA employees, have reported that optimizing beyond 0.66 rarely offers any gain. My own optimization experience corroborates this.