Can cudaMalloc be used in a kernel?

Is cudaMalloc not supposed to be used in a CUDA kernel? When I call it there, cudaMalloc returns error code 999. I remember being able to use it in a kernel before.

You can use cudaMalloc in device code. In that case it behaves much like in-kernel malloc or new: it allocates from the device heap.
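To make that concrete, here is a minimal sketch of a device-side cudaMalloc call. It assumes a GPU and toolkit that support the CUDA device runtime (dynamic parallelism), and that the file is built with relocatable device code and linked against cudadevrt. The kernel name allocInKernel, the status variable, and the buffer size are made up for illustration. It is not a diagnosis of the 999 error above, only a small test case you could compare against.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: allocates a small buffer with device-side cudaMalloc,
// touches it, frees it, and reports the cudaMalloc result through *status.
__global__ void allocInKernel(int *status)
{
    int *buf = nullptr;

    // Device-side cudaMalloc draws from the device heap, like in-kernel malloc.
    cudaError_t err = cudaMalloc((void **)&buf, 16 * sizeof(int));
    *status = (int)err;
    if (err != cudaSuccess || buf == nullptr) {
        return;
    }

    for (int i = 0; i < 16; ++i) {
        buf[i] = i;
    }

    // Memory allocated on the device side is also released on the device side.
    cudaFree(buf);
}

int main()
{
    int *dStatus = nullptr;
    int hStatus = -1;

    // Host-side allocation for the status word the kernel writes.
    if (cudaMalloc((void **)&dStatus, sizeof(int)) != cudaSuccess) {
        printf("host cudaMalloc failed\n");
        return 1;
    }

    // One thread is enough for this check.
    allocInKernel<<<1, 1>>>(dStatus);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();
    }
    if (err != cudaSuccess) {
        printf("kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(dStatus);
        return 1;
    }

    cudaMemcpy(&hStatus, dStatus, sizeof(int), cudaMemcpyDeviceToHost);
    printf("device-side cudaMalloc returned %d (%s)\n",
           hStatus, cudaGetErrorString((cudaError_t)hStatus));

    cudaFree(dStatus);
    return hStatus == 0 ? 0 : 1;
}
```

Assuming the file is saved as devmalloc.cu (a hypothetical name), a build line along these lines should work: `nvcc -rdc=true devmalloc.cu -o devmalloc -lcudadevrt`, with an `-arch` flag matching your GPU if the default does not. Since device-side allocations come out of the device heap, larger requests may need the heap enlarged from the host with `cudaDeviceSetLimit(cudaLimitMallocHeapSize, bytes)` before the first kernel launch.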