Dynamically allocate array memory in my __global__ code?

I recently found that this won’t work in the __global__ CUDA C++ kernel that I plan to compile and later call from MATLAB (so there is no host code in CUDA C):

int M = 10;            // or M imported from the MATLAB host code
float V[M];

I really need to write it as V[M] because M changes. I also tried operator new, malloc, and cudaMalloc, but it seems they can only be used in host code, not in __global__ code. Perhaps device-side malloc can work, but apparently some compilation parameter (sm_20 or something) has to be set to enable the device code to use it. Since my host code is in MATLAB rather than CUDA C, I can’t do that.

Is there any way to dynamically allocate array memory in my __global__ code?
I very much look forward to your advice. Thank you so much!

You can use malloc inside a kernel just as you do in C/C++; however, each thread that calls it will allocate its own M floats. I don’t know whether you want a single V vector shared between all threads, or one V vector per thread.
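As a minimal sketch of the per-thread case (the kernel name and the way the scratch buffer is used are just illustrative): device-side malloc requires compiling for compute capability 2.0 or higher, e.g. with `-arch=sm_20`, and allocations come from a device heap whose default size is limited, so the return value should always be checked.

```cuda
// Sketch: each thread allocates its own M-float scratch buffer
// from the device heap. Compile with e.g. nvcc -arch=sm_20.
__global__ void perThreadScratch(float *out, int M)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    float *V = (float *)malloc(M * sizeof(float));
    if (V == NULL)
        return;  // the device heap can run out; always check

    for (int i = 0; i < M; ++i)
        V[i] = (float)(tid + i);  // fill the scratch buffer

    out[tid] = V[M - 1];  // use the result somehow

    free(V);  // free inside the kernel, or the heap leaks across launches
}
```

Note that the device heap size is normally raised from host code with `cudaDeviceSetLimit(cudaLimitMallocHeapSize, ...)`; since your host is MATLAB, you may be stuck with the default, which is one more reason to prefer host-side allocation below.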
If you want a single vector, I think the best way is to allocate it in host code and then pass it as a parameter to the kernel. MATLAB certainly provides an API for that.
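In that approach the kernel does no allocation at all; it just receives a pointer and a size (the kernel name and the in-place scaling are only an example):

```cuda
// The buffer is allocated outside the kernel and passed in as a parameter.
__global__ void useVector(float *V, int M)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < M)
        V[i] = 2.0f * V[i];  // example: scale the shared vector in place
}
```

On the MATLAB side (assuming you have the Parallel Computing Toolbox), you can load the compiled kernel with `parallel.gpu.CUDAKernel`, create the buffer as a `gpuArray` of single precision, and call the kernel with `feval`; the `gpuArray` argument maps onto the `float *V` parameter, so M can be whatever you need at run time.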

I found that:

I hope it helps you. I’m sure there are many MATLAB functions to help you use CUDA.