CUDA Occupancy Calculation: how should occupancy be interpreted?

NVIDIA Tesla D870 + AMD Athlon Dual Core 4400+, Fedora 7
Program: matrixMul
I used 4×4, 8×8, 12×12 and 16×16 tiles.

I tested these configurations with the CUDA Occupancy Calculator: the 8×8 and 16×16 tiles both give the same occupancy, 67%. Can occupancy explain the performance differences between tile sizes? I don't understand how to choose a tile size in terms of occupancy.

In general, a higher occupancy gives the device more opportunities to hide the latency of memory accesses, so you can use the occupancy calculator as a guide when choosing your block size.
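To see how two different tile sizes can land on the same 67%, here is a simplified sketch of the arithmetic behind the calculator. It assumes the per-multiprocessor limits of compute capability 1.0 (the G80 chip in the Tesla D870): 24 resident warps (768 threads), 8 resident blocks, 8192 registers and 16 KB of shared memory. It ignores allocation granularity, so results can differ slightly from the spreadsheet. The value of 14 registers per thread is hypothetical; nvcc reports the real count for your build with `-Xptxas -v`. Shared memory per block assumes two `float` tiles of T×T elements, as in matrixMul's `As` and `Bs` arrays.

```cpp
#include <algorithm>
#include <cassert>

// Per-multiprocessor limits for compute capability 1.0 (G80, e.g. Tesla D870).
struct SmLimits {
    int warpSize = 32;
    int maxWarps = 24;        // 768 resident threads per multiprocessor
    int maxBlocks = 8;        // resident blocks per multiprocessor
    int registers = 8192;     // 32-bit registers per multiprocessor
    int sharedBytes = 16384;  // shared memory per multiprocessor
};

// Warps resident on one multiprocessor for a given launch configuration.
// Simplified model: ignores register and shared-memory allocation granularity.
int activeWarps(int threadsPerBlock, int regsPerThread, int sharedBytesPerBlock,
                const SmLimits& sm = SmLimits{}) {
    assert(threadsPerBlock > 0 && threadsPerBlock <= 512);  // CC 1.0 block limit
    // A partially filled warp still occupies a whole warp slot.
    int warpsPerBlock = (threadsPerBlock + sm.warpSize - 1) / sm.warpSize;
    int byWarps = sm.maxWarps / warpsPerBlock;
    int byRegs = regsPerThread > 0 ? sm.registers / (regsPerThread * threadsPerBlock)
                                   : sm.maxBlocks;
    int bySmem = sharedBytesPerBlock > 0 ? sm.sharedBytes / sharedBytesPerBlock
                                         : sm.maxBlocks;
    // The tightest of the four limits decides how many blocks fit.
    int blocks = std::min({sm.maxBlocks, byWarps, byRegs, bySmem});
    return blocks * warpsPerBlock;
}

// Occupancy = resident warps / maximum resident warps.
double occupancy(int threadsPerBlock, int regsPerThread, int sharedBytesPerBlock,
                 const SmLimits& sm = SmLimits{}) {
    return static_cast<double>(activeWarps(threadsPerBlock, regsPerThread,
                                           sharedBytesPerBlock, sm)) / sm.maxWarps;
}

// Hypothetical helper: a T x T tile uses T*T threads and two float tiles of
// shared memory. The default of 14 registers per thread is an assumption.
int tileActiveWarps(int tile, int regsPerThread = 14) {
    return activeWarps(tile * tile, regsPerThread,
                       2 * tile * tile * static_cast<int>(sizeof(float)));
}
```

Under these assumptions the 8×8 tile is capped by the 8-blocks-per-multiprocessor limit (8 blocks × 2 warps = 16 warps), while the 16×16 tile is capped by registers (2 blocks × 8 warps = 16 warps), so both reach 16/24 ≈ 67% for different reasons. The 4×4 tile gets only 8 warps (33%), and the 12×12 tile gets 20 warps (83%). Dropping the 16×16 kernel to 10 registers per thread would let 3 blocks fit and reach 100%, which is why register usage matters as much as tile size.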
However, it is always wise to also measure how long the kernel takes for different block sizes, because other factors besides occupancy can affect the performance you get.
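A minimal host-side sketch of that measurement loop, assuming `run` launches the kernel with the given tile size and waits for it to finish (for example by calling `cudaDeviceSynchronize()` after the launch). Kernel launches return before the kernel completes, so without that synchronization the timer would mostly measure launch overhead. `fastestConfig` is a hypothetical helper, not part of the CUDA toolkit; inside CUDA code, timing with `cudaEventRecord` and `cudaEventElapsedTime` is a common alternative to a host clock.

```cpp
#include <cassert>
#include <chrono>
#include <limits>
#include <vector>

// Times run(config) for each candidate and returns the fastest one.
// Assumption: run() blocks until the GPU work it launched has completed.
template <typename Config, typename Run>
Config fastestConfig(const std::vector<Config>& candidates, Run run, int repetitions = 5) {
    assert(!candidates.empty() && repetitions > 0);
    Config best = candidates.front();
    double bestMs = std::numeric_limits<double>::infinity();
    for (const Config& c : candidates) {
        run(c);  // warm-up call, excluded from the timing
        auto start = std::chrono::steady_clock::now();
        for (int i = 0; i < repetitions; ++i) run(c);
        auto stop = std::chrono::steady_clock::now();
        // Average wall-clock time per call, in milliseconds.
        double ms = std::chrono::duration<double, std::milli>(stop - start).count()
                    / repetitions;
        if (ms < bestMs) {
            bestMs = ms;
            best = c;
        }
    }
    return best;
}
```

Calling it as `fastestConfig(std::vector<int>{4, 8, 12, 16}, runMatrixMul)`, with `runMatrixMul` a hypothetical wrapper that launches matrixMul for one tile size and synchronizes, returns the tile size with the lowest measured average time, which may or may not be the one with the highest occupancy.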