I have a CUDA kernel that is not delivering the performance I expected. I checked the cubin file for its memory usage; 56 bytes of local memory are used even though I didn't force a register cap on the kernel:
lmem = 56
smem = 144
reg = 21
bar = 0
My kernel is listed below. Can anyone let me know why lmem is used in my kernel? Thanks,