Is the register allocation granularity per warp on compute capability 5.2 and 6.0?

If so, does that mean that with blockDim.x = 55 and 100 registers per thread, I will need 64 (NOT 55) * 100 = 6400 registers per block?

Add --ptxas-options -v when compiling with nvcc, and it will print the detailed register usage of each kernel. You can then check the exact register allocation for a thread block with 55 threads where each thread uses 100 registers.

Yes. Check out the Occupancy Calculator if you are unsure about any part of the calculation.
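
For anyone who wants to redo the arithmetic by hand, here is a rough sketch of the per-warp rounding in plain Python. It is not NVIDIA's code. The function name `registers_per_block` and the `alloc_unit` parameter are made up for illustration. The 256-register per-warp allocation unit is an assumption based on what the Occupancy Calculator lists for cc 5.x/6.x, so please verify it there for your device.

```python
WARP_SIZE = 32               # threads per warp
ASSUMED_REG_ALLOC_UNIT = 256 # ASSUMPTION: per-warp register allocation unit for cc 5.x/6.x


def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def registers_per_block(threads_per_block, regs_per_thread,
                        alloc_unit=ASSUMED_REG_ALLOC_UNIT):
    """Estimate registers reserved for one block when allocation is per warp.

    Hypothetical helper: rounds the thread count up to whole warps, then
    rounds each warp's register count up to a multiple of alloc_unit.
    """
    warps = ceil_div(threads_per_block, WARP_SIZE)
    regs_per_warp = regs_per_thread * WARP_SIZE
    regs_per_warp = ceil_div(regs_per_warp, alloc_unit) * alloc_unit
    return warps * regs_per_warp


if __name__ == "__main__":
    # Warp rounding only (alloc_unit=1): 2 warps * 32 threads * 100 registers
    print(registers_per_block(55, 100, alloc_unit=1))
    # With the assumed 256-register unit, each warp's 3200 rounds up to 3328
    print(registers_per_block(55, 100))
```

With warp rounding alone, this gives the 6400 from the question, since 55 threads occupy 2 full warps. If the 256-register unit assumption holds, the reserved amount would be slightly higher (6656), because each warp's 3200 registers are rounded up to 3328.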