I am new to CUDA and would appreciate some help with the following. Am I allowed to use malloc in a CUDA kernel function?
I need to dynamically allocate memory in my device code. In my host code I use cudaMalloc and cudaMemcpy to transfer the input to the device, and I also use cudaMalloc there to allocate memory for the output. But how do I allocate memory for the "working space" of each thread?
How do I dynamically allocate per-thread local memory?
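
To make the question concrete, here is a minimal sketch of the pattern being asked about. It assumes a device of compute capability 2.0 or later, where malloc and free can be called from inside a kernel. The kernel name sum_with_workspace, the helper run_workspace_demo, and the sizes are hypothetical and only there for illustration. Error checking on the CUDA calls is omitted for brevity.

```cuda
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread allocates its own scratch buffer with
// in-kernel malloc, uses it, and frees it before returning.
__global__ void sum_with_workspace(const int *in, int *out, int n, int work_elems)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    // Per-thread working space, taken from the device heap.
    int *work = (int *)malloc(work_elems * sizeof(int));
    if (work == NULL) {   // device heap exhausted
        out[tid] = -1;
        return;
    }

    // Fill the scratch buffer, then reduce it to a single value.
    for (int i = 0; i < work_elems; ++i) work[i] = in[tid] + i;
    int sum = 0;
    for (int i = 0; i < work_elems; ++i) sum += work[i];
    out[tid] = sum;

    free(work);           // memory from in-kernel malloc is released with in-kernel free
}

// Hypothetical host helper: input and output buffers are allocated with
// cudaMalloc from the host, as described in the question.
std::vector<int> run_workspace_demo(const std::vector<int> &input, int work_elems)
{
    int n = (int)input.size();
    std::vector<int> output(n, 0);
    size_t bytes = n * sizeof(int);
    int *d_in = NULL, *d_out = NULL;

    // The device heap used by in-kernel malloc defaults to 8 MB; it can be
    // raised with cudaDeviceSetLimit(cudaLimitMallocHeapSize, ...) before
    // the first kernel that uses malloc is launched.
    cudaMalloc((void **)&d_in, bytes);
    cudaMalloc((void **)&d_out, bytes);
    cudaMemcpy(d_in, input.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    sum_with_workspace<<<blocks, threads>>>(d_in, d_out, n, work_elems);
    cudaDeviceSynchronize();

    cudaMemcpy(output.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return output;
}
```

An alternative that avoids the device heap would be to cudaMalloc one large scratch buffer from the host, sized as number of threads times the per-thread workspace, and let each thread index into its own slice. That only works when the workspace size is known before the launch.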