The platform is a Jetson Nano. My CUDA kernel contains a single call to malloc that is executed only once. This call cannot allocate more than 2 MB of memory: it fails at 4 MB.
However, when I call cudaMalloc from the host code, I can allocate much more than 2 MB.
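For reference, here is a minimal sketch of the kind of test that shows the difference. It is not my actual code: the names tryDeviceMalloc, deviceMallocSucceeds and hostCudaMallocSucceeds are made up for illustration, it assumes a single thread performs the in-kernel malloc, and the 64 MB size is just an arbitrary stand-in for "much more than 2 MB".

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: one thread makes a single in-kernel malloc call
// and records whether a non-NULL pointer came back.
__global__ void tryDeviceMalloc(size_t bytes, int *ok)
{
    void *p = malloc(bytes);      // device-side malloc; NULL means failure
    *ok = (p != NULL) ? 1 : 0;
    if (p != NULL) {
        free(p);                  // release the block back to the device heap
    }
}

// Returns true if an in-kernel malloc of `bytes` succeeded.
bool deviceMallocSucceeds(size_t bytes)
{
    int *d_ok = NULL;
    int h_ok = 0;
    if (cudaMalloc(&d_ok, sizeof(int)) != cudaSuccess) {
        return false;
    }
    tryDeviceMalloc<<<1, 1>>>(bytes, d_ok);
    // cudaMemcpy waits for the kernel to finish before copying the flag.
    cudaError_t err = cudaMemcpy(&h_ok, d_ok, sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_ok);
    return err == cudaSuccess && h_ok != 0;
}

// Returns true if a host-side cudaMalloc of `bytes` succeeded.
bool hostCudaMallocSucceeds(size_t bytes)
{
    void *p = NULL;
    if (cudaMalloc(&p, bytes) != cudaSuccess) {
        return false;
    }
    cudaFree(p);
    return true;
}

int main()
{
    const size_t MB = 1024 * 1024;
    const size_t sizes[] = {1 * MB, 2 * MB, 4 * MB, 64 * MB};
    const int count = sizeof(sizes) / sizeof(sizes[0]);

    // Try each size with both allocation paths and print the outcome.
    for (int i = 0; i < count; ++i) {
        printf("%zu MB: device malloc %s, cudaMalloc %s\n",
               sizes[i] / MB,
               deviceMallocSucceeds(sizes[i]) ? "ok" : "FAILED",
               hostCudaMallocSucceeds(sizes[i]) ? "ok" : "FAILED");
    }
    return 0;
}
```

Built with nvcc and run on the Nano, this prints one line per size, so the point where the in-kernel malloc starts returning NULL can be compared directly with cudaMalloc.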
Why does malloc in device code fail for larger allocations when cudaMalloc does not?