Allocating space in global memory from the device

Hello:

Is it possible to dynamically allocate memory from within code that runs on the device, or is my only option to cudaMalloc from the host and copy the data to the device before calling my kernel? I tried calling cudaMalloc from within a device function I wrote, but the compiler complained that I can't do that, since cudaMalloc is a host function and only callable from the host.

Thanks!

No, you cannot allocate memory from the device; cudaMalloc is host-only. Allocate one large buffer with cudaMalloc from the host before launching your kernel, then use atomicAdd on a counter to grab blocks of memory out of it if you don't know in advance how much each GPU block will consume.
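
In case it helps, here's a minimal sketch of that pattern: one big pool allocated from the host, and each block carving off its chunk with a single atomicAdd on a device-side counter. The names (g_pool, g_next), sizes, and launch configuration are just my own, and I've left out error and overflow checks:

```
#include <cuda_runtime.h>

// Device-side pool state (hypothetical names).
__device__ float *g_pool;        // base of the buffer cudaMalloc'd on the host
__device__ unsigned int g_next;  // next free element index in that buffer

__global__ void kernel(unsigned int elems_per_thread)
{
    __shared__ float *block_mem;

    // One thread per block grabs this block's chunk of the pool.
    if (threadIdx.x == 0) {
        unsigned int n = blockDim.x * elems_per_thread;
        // atomicAdd returns the old counter value, i.e. the chunk's start.
        block_mem = g_pool + atomicAdd(&g_next, n);
        // (No check against the pool size, for brevity.)
    }
    __syncthreads();

    // Every thread now writes into its own slice of the block's chunk.
    for (unsigned int i = 0; i < elems_per_thread; ++i)
        block_mem[threadIdx.x * elems_per_thread + i] = (float)threadIdx.x;
}

int main(void)
{
    const size_t POOL_ELEMS = 1 << 20;

    // The one real allocation: done from the host, as suggested above.
    float *pool_d;
    cudaMalloc(&pool_d, POOL_ELEMS * sizeof(float));

    // Point the device globals at the pool and zero the counter.
    unsigned int zero = 0;
    cudaMemcpyToSymbol(g_pool, &pool_d, sizeof(pool_d));
    cudaMemcpyToSymbol(g_next, &zero, sizeof(zero));

    kernel<<<4, 64>>>(8);
    cudaDeviceSynchronize();

    cudaFree(pool_d);
    return 0;
}
```

Since atomicAdd is serialized across all blocks, it's best to do one coarse grab per block (as above) rather than one per thread.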