PTX assembler output is incorrect?


I was tweaking my ray tracer, so I wanted to know the register and local memory usage for different configurations. I compiled it with



It says 50 registers in one case and it still runs perfectly.

Why is this the case?

This is the exact ptxas output:

1>ptxas info    : Compiling entry function '_Z13cudaRayTracerPiii6float3PKf'

1>ptxas info    : Used 50 registers, 8448+0 bytes lmem, 48+48 bytes smem, 212 bytes cmem[1], 8 bytes cmem[14]

The program runs correctly.



I think I’m missing something. What’s the problem with the compiler using 50 registers? That’s more registers than I’ve seen a kernel use before, and it means you can only have 160 threads active per multiprocessor (on a device with 8,192 registers per multiprocessor, such as compute capability 1.0/1.1). Otherwise, it seems fine to me.
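The 160-thread figure can be reproduced with a rough calculation. This is a simplified estimate, not the hardware's exact allocation rule: it assumes a register file of 8,192 registers per multiprocessor (compute capability 1.0/1.1), counts residency in whole 32-thread warps, and ignores register allocation granularity and the other per-multiprocessor limits (maximum resident threads, shared memory). The function name `max_resident_threads` is hypothetical.

```c
/* Hypothetical helper: rough estimate of how many threads can be resident
   on one multiprocessor when register usage is the only limit.
   Assumptions: regs_per_sm is the register file size (8192 on compute
   capability 1.0/1.1), residency is counted in whole warps of 32 threads,
   and the hardware's register allocation granularity is ignored. */
static int max_resident_threads(int regs_per_sm, int regs_per_thread)
{
    const int warp_size = 32;

    /* Threads whose registers fit in the register file. */
    int threads = regs_per_sm / regs_per_thread;

    /* Round down to a whole number of warps. */
    return (threads / warp_size) * warp_size;
}
```

Under these assumptions, `max_resident_threads(8192, 50)` gives 160 (8192 / 50 = 163, rounded down to five warps), and `max_resident_threads(16384, 50)` gives 320 for a device with twice the register file.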

One of my raytracing kernels uses more than 60 registers. I use 64 threads per block and it also works fine.
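A high register count only prevents a kernel from running when a single block no longer fits in one multiprocessor's register file; in that case the launch fails with `cudaErrorLaunchOutOfResources` instead of running. The helpers below are a hypothetical, simplified version of that check, again assuming 8,192 registers per multiprocessor and ignoring allocation granularity. The 64 registers per thread used in the usage note is an assumed stand-in for "more than 60".

```c
#include <stdbool.h>

/* Hypothetical helper: does one block's register demand fit in the
   register file of a single multiprocessor? If not, the launch fails.
   Simplification: register allocation granularity is ignored. */
static bool block_fits(int regs_per_sm, int regs_per_thread, int threads_per_block)
{
    return regs_per_thread * threads_per_block <= regs_per_sm;
}

/* Hypothetical helper: how many such blocks can be resident at once,
   considering the register limit only. */
static int resident_blocks(int regs_per_sm, int regs_per_thread, int threads_per_block)
{
    return regs_per_sm / (regs_per_thread * threads_per_block);
}
```

With these assumptions, `block_fits(8192, 64, 64)` holds (64 × 64 = 4,096 registers) and `resident_blocks(8192, 64, 64)` gives 2 blocks per multiprocessor. By contrast, a 256-thread block at 50 registers per thread would need 12,800 registers and would not fit.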