Is there any tool that lets us calculate the occupancy of the device, akin to the Excel sheet that we have for a Windows installation?
OpenOffice runs the XLS sheet just fine.
As it just calculates the number of concurrently runnable threads on a multiprocessor, you can easily calculate it yourself or build your own spreadsheet.
Do you have the number of registers your kernel uses from the cubin file?
Divide 8192 by this number and then by 32, and you have the runnable warps (drop the decimal places).
Take the shared memory figure from the cubin file, which is reported per block, and add the dynamically allocated shared memory to it.
Now divide 16000 by this result.
This gives you the number of concurrent blocks per multiprocessor (again, drop the decimal places). Now multiply this by the threads per block and divide by 32, and you have the number of warps.
Take the minimum of both calculations.
I think the last number should be 16384, not 16000, since a multiprocessor has 16 KB of shared memory.
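The steps above, with 16384 in place of 16000, can be sketched in a few lines of Python. This is only a rough illustration of the arithmetic described in this thread, not the occupancy calculator itself: the function name and parameter names are made up for the example, the thread count per block is assumed to be a multiple of 32, and the total shared memory per block is passed in as a single figure, so how you arrive at it is up to the caller. It also ignores any other hardware limits (such as caps on resident warps or blocks) that the thread does not cover.

```python
WARP_SIZE = 32
REGISTERS_PER_MP = 8192     # register count per multiprocessor quoted in the thread
SHARED_MEM_PER_MP = 16384   # bytes per multiprocessor, per the correction above


def warps_per_multiprocessor(regs_per_thread, threads_per_block, smem_per_block_bytes):
    """Hypothetical helper: warps that fit on one multiprocessor.

    Follows the two calculations from the thread and returns their minimum.
    Assumes threads_per_block is a multiple of 32.
    """
    # Register limit: 8192 / registers per thread / 32, decimals dropped.
    warps_by_registers = REGISTERS_PER_MP // (regs_per_thread * WARP_SIZE)

    # A kernel that uses no shared memory is limited by registers only.
    if smem_per_block_bytes == 0:
        return warps_by_registers

    # Shared memory limit: how many whole blocks fit, then convert to warps.
    blocks_by_smem = SHARED_MEM_PER_MP // smem_per_block_bytes
    warps_by_smem = blocks_by_smem * threads_per_block // WARP_SIZE

    # Take the minimum of both calculations.
    return min(warps_by_registers, warps_by_smem)


if __name__ == "__main__":
    # 10 registers per thread, 256 threads per block, 4096 bytes of shared memory per block
    print(warps_per_multiprocessor(10, 256, 4096))
```

With these example inputs the register limit gives 25 warps and the shared memory limit gives 32, so the register count is what restricts the kernel here.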
But as said, OpenOffice has no trouble opening the occupancy calculator sheet, which gives you a nice graph for choosing your block size optimally (as long as shared memory does not change with the number of threads per block).