In my kernel, I need to read a dynamically allocated array (allocated by host code). However, it appears that a dynamic array can only be allocated in global memory? That is very inefficient for my kernel. Are there any solutions? I know I could use shared memory, but even then I would need to know beforehand the size of the tiled array I want to read, and I do not know it.
Also, is there a way to allocate memory dynamically in device code? In all the examples I have seen, the dynamic array is allocated by host code only.
Only global and constant memory are accessible from the host. You can allocate memory dynamically (outside the kernel) in global memory and in shared memory; for shared memory, the size in bytes is passed as the third parameter of the kernel launch configuration, kernel<<<grid, block, sharedBytes>>>(...). You cannot allocate constant memory dynamically.
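A minimal sketch of dynamic shared memory, assuming the tile size is only known at run time: the kernel declares an unsized extern __shared__ array and the host picks its size in bytes at launch. The names reverseTile and reverseTiles are hypothetical, the kernel just reverses each block-sized tile so there is something to check, and error checking on the CUDA calls is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Reverses each block-sized tile of `in` into `out`, staging it in shared memory.
__global__ void reverseTile(const float* in, float* out, int n)
{
    // Unsized: the byte count comes from the third launch parameter.
    extern __shared__ float tile[];

    int t = threadIdx.x;
    int nt = blockDim.x;
    int base = blockIdx.x * nt;
    int gid = base + t;

    if (gid < n) tile[t] = in[gid];
    __syncthreads();  // the whole tile must be loaded before anyone reads it

    // The last block may only hold a partial tile.
    int valid = n - base;
    if (valid > nt) valid = nt;
    if (t < valid) out[base + t] = tile[valid - 1 - t];
}

// Hypothetical host helper: copies data in, launches, copies the result back.
std::vector<float> reverseTiles(const std::vector<float>& host, int threadsPerBlock)
{
    int n = static_cast<int>(host.size());
    size_t bytes = n * sizeof(float);
    float *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, host.data(), bytes, cudaMemcpyHostToDevice);

    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    size_t sharedBytes = threadsPerBlock * sizeof(float);  // decided at run time
    reverseTile<<<blocks, threadsPerBlock, sharedBytes>>>(dIn, dOut, n);

    std::vector<float> result(n);
    cudaMemcpy(result.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}
```

For example, reverseTiles({0, 1, 2, 3, 4}, 2) works on the tiles [0, 1], [2, 3] and [4] and returns [1, 0, 3, 2, 4].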
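You can cache reads from global memory by using textures (but note that the texture cache is read-only and not coherent: writes to the same buffer made during the kernel are not guaranteed to be seen through the texture).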
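A minimal sketch of reading global memory through the texture cache. It uses texture objects (cudaCreateTextureObject and tex1Dfetch, compute capability 3.0 and later), which replace the older texture references available when this thread was written; that swap is an assumption about the reader's toolkit. The kernel blur3 and helper blurViaTexture are hypothetical; a 3-point sum is used because neighbouring threads re-read the same elements, which is where a cache helps. Error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <cstring>
#include <vector>

// out[i] = in[i-1] + in[i] + in[i+1], with missing neighbours treated as absent.
// All reads of the input go through the read-only texture path.
__global__ void blur3(cudaTextureObject_t tex, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float sum = tex1Dfetch<float>(tex, i);
    if (i > 0) sum += tex1Dfetch<float>(tex, i - 1);
    if (i < n - 1) sum += tex1Dfetch<float>(tex, i + 1);
    out[i] = sum;  // written to a different buffer, never read back via the texture
}

std::vector<float> blurViaTexture(const std::vector<float>& host)
{
    int n = static_cast<int>(host.size());
    size_t bytes = n * sizeof(float);
    float *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, host.data(), bytes, cudaMemcpyHostToDevice);

    // Describe the existing linear global-memory buffer as a 1D texture resource.
    cudaResourceDesc resDesc;
    std::memset(&resDesc, 0, sizeof(resDesc));
    resDesc.resType = cudaResourceTypeLinear;
    resDesc.res.linear.devPtr = dIn;
    resDesc.res.linear.desc = cudaCreateChannelDesc<float>();
    resDesc.res.linear.sizeInBytes = bytes;

    cudaTextureDesc texDesc;
    std::memset(&texDesc, 0, sizeof(texDesc));
    texDesc.readMode = cudaReadModeElementType;  // return the stored floats as-is

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &resDesc, &texDesc, nullptr);

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    blur3<<<blocks, threads>>>(tex, dOut, n);

    std::vector<float> result(n);
    cudaMemcpy(result.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaDestroyTextureObject(tex);
    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}
```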
If you access each memory location only once and can make your accesses coalesced, then reading directly from global memory is about the best you can do.
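To illustrate what "coalesced" means here, a minimal sketch with two hypothetical kernels that compute the same result with different access patterns. In the first, consecutive threads of a warp read consecutive floats, so a warp's loads can be combined into a few wide transactions; in the second, each thread walks its own contiguous chunk, so neighbouring threads are a whole chunk apart and the loads are scattered. The fixed 4 x 128 launch and the helper scaleOnDevice are assumptions for the example; error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Coalesced: grid-stride loop, thread k of a warp touches element base + k.
__global__ void scaleCoalesced(const float* in, float* out, float a, int n)
{
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += gridDim.x * blockDim.x)
        out[i] = a * in[i];
}

// Not coalesced: each thread owns a contiguous chunk, so on any given
// iteration the threads of a warp are `chunk` elements apart.
__global__ void scaleChunked(const float* in, float* out, float a, int n)
{
    int threads = gridDim.x * blockDim.x;
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    int chunk = (n + threads - 1) / threads;
    int begin = t * chunk;
    int end = min(begin + chunk, n);
    for (int i = begin; i < end; ++i)
        out[i] = a * in[i];
}

std::vector<float> scaleOnDevice(const std::vector<float>& host, float a, bool coalesced)
{
    int n = static_cast<int>(host.size());
    size_t bytes = n * sizeof(float);
    float *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, host.data(), bytes, cudaMemcpyHostToDevice);

    const int blocks = 4, threads = 128;
    if (coalesced)
        scaleCoalesced<<<blocks, threads>>>(dIn, dOut, a, n);
    else
        scaleChunked<<<blocks, threads>>>(dIn, dOut, a, n);

    std::vector<float> result(n);
    cudaMemcpy(result.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}
```

Both variants return the same values; on large inputs the coalesced one is typically the faster of the two, because each element is still read only once but with far fewer memory transactions.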
Could you explain in slightly more detail how I could dynamically allocate more than one variable in shared memory? An example would be much appreciated.
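A minimal sketch of the usual pattern: a kernel gets only one dynamic shared memory allocation, so you declare a single extern __shared__ array, size it at launch for all your arrays together, and carve it into typed pointers yourself. The names blockDot and blockDots are hypothetical, x and w are assumed to have the same length, and error checking is omitted. Each pointer must stay aligned for its type, so when mixing types of different sizes, place the most strictly aligned type (e.g. double) first.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Per-block dot product of x (float) and w (int), staged in two shared arrays
// that share one dynamic allocation.
__global__ void blockDot(const float* x, const int* w, float* blockSums, int n)
{
    extern __shared__ float smem[];                          // one allocation
    float* sX = smem;                                        // first blockDim.x floats
    int* sW = reinterpret_cast<int*>(&sX[blockDim.x]);       // next blockDim.x ints

    int t = threadIdx.x;
    int nt = blockDim.x;
    int gid = blockIdx.x * nt + t;
    sX[t] = (gid < n) ? x[gid] : 0.0f;  // pad the last tile with zeros
    sW[t] = (gid < n) ? w[gid] : 0;
    __syncthreads();

    if (t == 0) {  // simple serial sum, just to show both arrays in use
        float sum = 0.0f;
        for (int k = 0; k < nt; ++k) sum += sX[k] * sW[k];
        blockSums[blockIdx.x] = sum;
    }
}

std::vector<float> blockDots(const std::vector<float>& x, const std::vector<int>& w,
                             int threadsPerBlock)
{
    int n = static_cast<int>(x.size());
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    float* dX = nullptr; int* dW = nullptr; float* dSums = nullptr;
    cudaMalloc(&dX, n * sizeof(float));
    cudaMalloc(&dW, n * sizeof(int));
    cudaMalloc(&dSums, blocks * sizeof(float));
    cudaMemcpy(dX, x.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dW, w.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    // Total bytes for both arrays, chosen at run time.
    size_t sharedBytes = threadsPerBlock * (sizeof(float) + sizeof(int));
    blockDot<<<blocks, threadsPerBlock, sharedBytes>>>(dX, dW, dSums, n);

    std::vector<float> sums(blocks);
    cudaMemcpy(sums.data(), dSums, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dX);
    cudaFree(dW);
    cudaFree(dSums);
    return sums;
}
```

For example, blockDots({1, 2, 3, 4, 5}, {1, 1, 2, 2, 3}, 2) returns [3, 14, 15]. More arrays follow the same way: add their sizes to sharedBytes and start each new pointer just past the end of the previous array.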