Use of cudaMalloc in a kernel

Hi, I would like to allocate memory on the device directly within a kernel, but it seems that cudaMalloc is not the right function.
When I run my kernel in emu mode (so far I haven't tested it on the device, but I doubt it will work), it hangs as soon as I call cudaMalloc.

How can I manage device memory within a kernel?

You can’t.
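
A common workaround, given that cudaMalloc is a host-side call, is to allocate one buffer from the host before the launch and pass the pointer to the kernel, so that each thread works on its own slice. Below is a minimal sketch of that idea, not a definitive implementation. The kernel name `useScratch`, the per-thread slice size `perThread`, and the launch dimensions are hypothetical and chosen only for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread gets a private slice of a buffer
// that the host allocated before the launch.
__global__ void useScratch(float *scratch, int perThread, float *out)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    float *mine = scratch + tid * perThread;  // this thread's slice

    float sum = 0.0f;
    for (int i = 0; i < perThread; ++i) {
        mine[i] = (float)(tid + i);           // use the slice as working memory
        sum += mine[i];
    }
    out[tid] = sum;
}

int main()
{
    // Assumed sizes, for illustration only.
    const int threads = 64, blocks = 4, perThread = 16;
    const int n = threads * blocks;

    float *scratch = NULL, *out = NULL;

    // Allocate on the host side, sized for all threads at once.
    if (cudaMalloc((void **)&scratch, n * perThread * sizeof(float)) != cudaSuccess ||
        cudaMalloc((void **)&out, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    useScratch<<<blocks, threads>>>(scratch, perThread, out);

    // Check both launch errors and errors raised while the kernel ran.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    float first = 0.0f;
    cudaMemcpy(&first, out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("out[0] = %f\n", first);

    cudaFree(scratch);
    cudaFree(out);
    return 0;
}
```

The trade-off is that the worst-case amount of memory per thread has to be known before the launch, since the buffer cannot grow once the kernel is running.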