lmem overrun?

Hello, when I try to run some code that uses GPU VRAM on the device, I get the error:
Cuda error: Kernel execution failed in file ‘CUDAprog.cu’ in line 54 : out of memory.

The cubin lists the following requirements for the function:

code {
name = muFync
lmem = 163840
smem = 28
reg = 28
bar = 0
bincode {…}

Why would I run out of memory? I'm launching fewer than 256 threads, so it's not the register usage either.

It has been 4 years since your question, but I hope this newer post is relevant to it: Why does a simple single-threaded CUDA kernel consume large amounts of global memory?