I have a problem that I am trying to resolve.
I want to use a stack for each thread in a kernel, and the stack size will be decided by each thread.
But I don’t know how to dynamically allocate local memory inside a kernel.
If anybody knows how or has any suggestions, please help me.
It is simply not possible to dynamically allocate memory on the device from inside a running kernel. The memory has to be allocated from the host.
And if I’m not wrong, CUDA does not implement a stack the way you have one on the host/CPU side.
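Since the memory has to come from the host, one possible workaround is to have the host reserve a single global-memory pool and hand each thread its own slice of it. The sketch below is only an illustration under stated assumptions: it assumes the per-thread stack sizes are known on the host before launch, and the names `stack_from_pool`, `run_pool_demo`, `pool`, `offsets` and `depths` are made up for this example.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Each thread treats its own slice of a host-allocated pool as a stack.
// All names here (pool, offsets, depths) are illustrative assumptions.
__global__ void stack_from_pool(int *pool, const int *offsets,
                                const int *depths, int *out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    int *stack = pool + offsets[tid];  // base of this thread's slice
    int top = 0;                       // index of the next free slot

    // Push 1..depth; this never exceeds the slice reserved by the host.
    for (int i = 1; i <= depths[tid]; ++i) stack[top++] = i;

    // Pop everything and accumulate: the result is depth * (depth + 1) / 2.
    int sum = 0;
    while (top > 0) sum += stack[--top];
    out[tid] = sum;
}

// Host driver: returns the number of wrong results, or -1 on a CUDA error.
int run_pool_demo()
{
    const int n = 256;
    std::vector<int> depths(n), offsets(n), out(n, 0);
    int total = 0;
    for (int t = 0; t < n; ++t) {
        depths[t] = (t % 16) + 1;  // per-thread stack size, chosen on the host
        offsets[t] = total;        // exclusive prefix sum of the sizes
        total += depths[t];
    }

    int *d_pool = nullptr, *d_offsets = nullptr, *d_depths = nullptr, *d_out = nullptr;
    bool ok = cudaMalloc(&d_pool, total * sizeof(int)) == cudaSuccess
           && cudaMalloc(&d_offsets, n * sizeof(int)) == cudaSuccess
           && cudaMalloc(&d_depths, n * sizeof(int)) == cudaSuccess
           && cudaMalloc(&d_out, n * sizeof(int)) == cudaSuccess
           && cudaMemcpy(d_offsets, offsets.data(), n * sizeof(int),
                         cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(d_depths, depths.data(), n * sizeof(int),
                         cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        stack_from_pool<<<(n + 127) / 128, 128>>>(d_pool, d_offsets, d_depths, d_out, n);
        // The blocking copy also waits for the kernel to finish.
        ok = cudaMemcpy(out.data(), d_out, n * sizeof(int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(d_pool); cudaFree(d_offsets); cudaFree(d_depths); cudaFree(d_out);
    if (!ok) return -1;

    int errors = 0;
    for (int t = 0; t < n; ++t)
        if (out[t] != depths[t] * (depths[t] + 1) / 2) ++errors;
    return errors;
}
```

Calling `run_pool_demo()` from `main` should return 0 when every thread's result matches. The pool only wastes nothing if the sizes really are known before launch; a size that is discovered inside the kernel would still need a worst-case bound.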
This thread has some information about maintaining a local stack, complete with some example code:
It looks like the MAX DEPTH of the stack has to be known ahead of time (since you can’t dynamically allocate memory).
Thanks @Tobi_W and jph4599. :)
I statically allocated local memory for each thread, and I use a variable to index the elements of the array (the same way a stack pointer would). It works correctly.
But I have to pay for the wasted local memory, because I had no other way to do it.
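For anyone finding this later, the approach described above might look roughly like the following. This is a minimal sketch, not the poster's actual code: the `MAX_DEPTH` value, the kernel name `count_tree_nodes`, and the tree-counting workload are assumptions chosen only to give the stack something to do. Each thread walks an implicit binary tree (children of node `i` at `2*i + 1` and `2*i + 2`) using a fixed-size per-thread array and an index variable as the stack top.

```cuda
#include <vector>
#include <cuda_runtime.h>

#define MAX_DEPTH 32  // assumed worst-case stack size, fixed at compile time

// Each thread counts the nodes of an implicit binary tree with sizes[tid]
// nodes by depth-first traversal with an explicit, fixed-size stack.
__global__ void count_tree_nodes(const int *sizes, int *counts, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    int stack[MAX_DEPTH];  // per-thread array; unused slots are the "waste"
    int top = 0;           // index of the next free slot
    int size = sizes[tid];
    int visited = 0;
    bool overflow = false;

    if (size > 0) stack[top++] = 0;  // push the root
    while (top > 0 && !overflow) {
        int node = stack[--top];     // pop
        ++visited;
        // Push the right child first so the left child is popped first.
        for (int child = 2 * node + 2; child >= 2 * node + 1; --child) {
            if (child >= size) continue;
            if (top == MAX_DEPTH) { overflow = true; break; }
            stack[top++] = child;
        }
    }
    counts[tid] = overflow ? -1 : visited;  // -1 signals MAX_DEPTH was too small
}

// Host driver: returns the number of wrong results, or -1 on a CUDA error.
int run_local_stack_demo()
{
    const int n = 256;
    std::vector<int> sizes(n), counts(n, 0);
    for (int t = 0; t < n; ++t) sizes[t] = t;  // tree sizes 0..255

    int *d_sizes = nullptr, *d_counts = nullptr;
    bool ok = cudaMalloc(&d_sizes, n * sizeof(int)) == cudaSuccess
           && cudaMalloc(&d_counts, n * sizeof(int)) == cudaSuccess
           && cudaMemcpy(d_sizes, sizes.data(), n * sizeof(int),
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        count_tree_nodes<<<(n + 127) / 128, 128>>>(d_sizes, d_counts, n);
        // The blocking copy also waits for the kernel to finish.
        ok = cudaMemcpy(counts.data(), d_counts, n * sizeof(int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_sizes); cudaFree(d_counts);
    if (!ok) return -1;

    int errors = 0;
    for (int t = 0; t < n; ++t)
        if (counts[t] != sizes[t]) ++errors;  // every node is visited exactly once
    return errors;
}
```

Calling `run_local_stack_demo()` from `main` should return 0. The explicit overflow check is a design choice for the sketch: with a compile-time bound, a thread that needs more than `MAX_DEPTH` entries reports -1 instead of writing past the end of its array.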