Trying to improve occupancy

I have a kernel that I launch with a block size of 16x16x1. Each thread uses 15 registers. Since I’m running on compute capability 1.1, I have 8192 registers available per multiprocessor, so an occupancy of 0.667 should be possible. However, the profiler reports an occupancy of only 0.333. Is there a reason for this?

Shared memory usage, probably.
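
To make the arithmetic concrete, here is a rough sketch of the occupancy calculation for compute capability 1.1. It is a simplified model, not the official occupancy calculator: it ignores register and shared memory allocation granularity, and the 9000-byte shared memory figure is a hypothetical value picked only to illustrate the effect, not something taken from the kernel in question. The function name `occupancy` is made up for this example.

```python
# Per-multiprocessor limits for compute capability 1.1.
MAX_THREADS_PER_SM = 768
MAX_BLOCKS_PER_SM = 8
REGISTERS_PER_SM = 8192
SHARED_MEM_PER_SM = 16384  # bytes


def occupancy(threads_per_block, regs_per_thread, smem_per_block):
    """Estimate occupancy as resident threads / maximum threads per SM.

    Simplified: allocation granularity is ignored (assumption).
    """
    # Each resource caps the number of blocks that can be resident at once.
    limits = [
        MAX_BLOCKS_PER_SM,
        MAX_THREADS_PER_SM // threads_per_block,
        REGISTERS_PER_SM // (regs_per_thread * threads_per_block),
    ]
    if smem_per_block > 0:
        limits.append(SHARED_MEM_PER_SM // smem_per_block)
    resident_blocks = min(limits)
    return resident_blocks * threads_per_block / MAX_THREADS_PER_SM


# 16x16x1 block, 15 registers per thread, no shared memory:
# registers allow 2 blocks, so 512 of 768 threads are resident.
registers_only = occupancy(256, 15, 0)

# Same kernel with a hypothetical 9000 bytes of shared memory per block:
# only 1 block fits in 16 KB, so 256 of 768 threads are resident.
with_shared_memory = occupancy(256, 15, 9000)

print(round(registers_only, 3), round(with_shared_memory, 3))
```

In this model, any shared memory footprint above 8192 bytes per block limits the multiprocessor to one resident block, which would give the 0.333 the profiler reports. The actual shared memory use of the kernel can be checked by compiling with `--ptxas-options=-v`.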

Yep, you’re right. I don’t know why I missed it.