Could someone please tell me how to allocate shared memory inside a global or device function?
I noticed that we can allocate a block of shared memory of a certain size when we launch the kernel, but I have not found a way to use this block in a more controlled manner. For example, inside a global function I want to allocate an array in shared memory whose size is only known at run time (see the sketch below for what I mean).
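To show what I am doing right now, here is a minimal sketch of the launch-time allocation I mentioned (the kernel name `myKernel` and the parameter `nElems` are just placeholders I made up):

```cuda
// Current approach: one dynamic shared-memory block, sized by the
// third parameter of the launch configuration.
__global__ void myKernel(int nElems)
{
    extern __shared__ float buffer[];   // size is fixed at launch, not inside the kernel
    int tid = threadIdx.x;
    if (tid < nElems)
        buffer[tid] = static_cast<float>(tid);
    __syncthreads();
    // ... use buffer ...
}

// Host side: the run-time size goes into the launch configuration.
// myKernel<<<gridDim, blockDim, nElems * sizeof(float)>>>(nElems);
```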
Furthermore, from this global function I want to call a device function that needs to allocate its own shared memory (again, the size can only be determined at run time). Is this possible with the current CUDA structure?
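Roughly, this is what I would like to be able to write (the names `helper` and `n` are again placeholders; the commented-out line is the part I cannot figure out how to express):

```cuda
// Desired pattern: the device function manages its own shared scratch space,
// sized at run time.
__device__ void helper(int n)
{
    // Something like this, but as far as I can tell a __shared__ array
    // cannot be sized at run time inside a device function:
    // __shared__ float scratch[n];   // not allowed
    // ... work on scratch ...
}

__global__ void myKernel(int n)
{
    helper(n);   // helper should handle its own shared allocation
}
```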