Is cudaMalloc not supposed to be used in a CUDA kernel? When I call it there, cudaMalloc returns error code 999. I remember being able to use it in a kernel before.
You can use cudaMalloc in device code. In that case, it behaves in a similar fashion to in-kernel malloc or new, allocating from the device heap.
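As a rough illustration, here is a minimal sketch of calling cudaMalloc from a kernel and reporting its return code back to the host. The kernel name `allocInKernel`, the buffer size, and the `status` output pointer are made up for this example. It assumes a device and toolkit that support the CUDA device runtime, and a build with relocatable device code, for example `nvcc -rdc=true example.cu -lcudadevrt`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: a single thread allocates a small buffer from the
// device heap, writes to it, frees it, and reports the cudaMalloc result.
__global__ void allocInKernel(int n, int *status)
{
    int *buf = nullptr;

    // Device-side cudaMalloc takes a void**, so cast explicitly.
    cudaError_t err = cudaMalloc((void **)&buf, n * sizeof(int));

    if (err == cudaSuccess && buf != nullptr) {
        for (int i = 0; i < n; ++i) {
            buf[i] = i;
        }
        // Memory allocated in device code is released in device code as well.
        cudaFree(buf);
    }

    *status = (int)err;
}

int main()
{
    int *d_status = nullptr;
    cudaMalloc(&d_status, sizeof(int));

    allocInKernel<<<1, 1>>>(16, d_status);

    // Check both the launch and the kernel execution for errors.
    cudaError_t launchErr = cudaGetLastError();
    cudaError_t syncErr = cudaDeviceSynchronize();

    int h_status = -1;
    cudaMemcpy(&h_status, d_status, sizeof(int), cudaMemcpyDeviceToHost);

    printf("launch: %s, sync: %s, in-kernel cudaMalloc returned %d\n",
           cudaGetErrorString(launchErr), cudaGetErrorString(syncErr),
           h_status);

    cudaFree(d_status);
    return 0;
}
```

Checking the return value inside the kernel, as above, is one way to see whether the in-kernel call itself fails or whether the error comes from somewhere else in the program. The memory comes from the device heap, so its size limit applies here just as it does for in-kernel malloc.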