Need help understanding the __device__ qualifier for functions

How does this actually work?
Does calling a __device__ function create a new thread?
Or does it work like a standard function, meaning it runs only within the same calling thread?
I'll need this to split an array into two for a quicksort design.

From what I've learnt, a recursive function on the GPU would be very expensive.
I might as well do it with a standard recursive function on the CPU.
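
For comparison, the standard recursive CPU quicksort can be sketched as follows. This is a minimal sketch, assuming a Lomuto partition with the last element as pivot; the names `partition_host` and `quicksort_host` are hypothetical. It is plain host code, so it compiles with nvcc as a .cu file or as ordinary C++.

```cuda
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <utility>
#include <vector>

// Lomuto partition of a[lo..hi] around a[hi]; returns the pivot's final index.
int partition_host(std::vector<int>& a, int lo, int hi) {
    int pivot = a[hi];
    int i = lo;
    for (int j = lo; j < hi; ++j) {
        if (a[j] < pivot) std::swap(a[i++], a[j]);
    }
    std::swap(a[i], a[hi]);
    return i;
}

// Classic recursive quicksort on the closed range [lo, hi].
void quicksort_host(std::vector<int>& a, int lo, int hi) {
    if (lo >= hi) return;                 // 0 or 1 elements: already sorted
    int p = partition_host(a, lo, hi);
    quicksort_host(a, lo, p - 1);         // left part: smaller than the pivot
    quicksort_host(a, p + 1, hi);         // right part: not smaller than the pivot
}

int main() {
    std::vector<int> a = {5, 2, 9, 1, 5, 6, 3};
    quicksort_host(a, 0, static_cast<int>(a.size()) - 1);
    assert(std::is_sorted(a.begin(), a.end()));
    for (int x : a) std::printf("%d ", x);
    std::printf("\n");
    return 0;
}
```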

Anyway, my idea for the sort is to split the array into two first (which means creating only a single thread for it).
After the split, the array becomes two parts, and this continues until all the elements are sorted.
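
A minimal sketch of that first split, assuming a Lomuto partition (last element as pivot): a single GPU thread, launched with `<<<1, 1>>>`, calls a `__device__` partition function on the whole array. The names `partition_dev`, `first_split` and `split_once` are hypothetical, the array is assumed non-empty, and CUDA error checking is omitted for brevity.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// A __device__ function: it runs inside whichever thread calls it.
// Lomuto partition of d[lo..hi] around d[hi]; returns the pivot's final index.
__device__ int partition_dev(int* d, int lo, int hi) {
    int pivot = d[hi];
    int i = lo;
    for (int j = lo; j < hi; ++j) {
        if (d[j] < pivot) {
            int t = d[i]; d[i] = d[j]; d[j] = t;
            ++i;
        }
    }
    int t = d[i]; d[i] = d[hi]; d[hi] = t;
    return i;
}

// Launched with a single thread: performs only the first split.
__global__ void first_split(int* d, int n, int* pivot_pos) {
    *pivot_pos = partition_dev(d, 0, n - 1);
}

// Host helper (assumes v is non-empty): copies v to the GPU, splits it once,
// copies it back. Returns the pivot index; v ends up partitioned in place.
int split_once(std::vector<int>& v) {
    int n = static_cast<int>(v.size());
    int *d = nullptr, *d_pos = nullptr;
    cudaMalloc(&d, n * sizeof(int));
    cudaMalloc(&d_pos, sizeof(int));
    cudaMemcpy(d, v.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    first_split<<<1, 1>>>(d, n, d_pos);
    int pos = -1;
    cudaMemcpy(&pos, d_pos, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(v.data(), d, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    cudaFree(d_pos);
    return pos;
}

int main() {
    std::vector<int> v = {7, 3, 8, 1, 6, 2, 5};
    int p = split_once(v);
    // Everything left of p is smaller than the pivot, everything right is not.
    for (int i = 0; i < p; ++i) assert(v[i] < v[p]);
    for (int i = p + 1; i < static_cast<int>(v.size()); ++i) assert(v[i] >= v[p]);
    std::printf("pivot %d at index %d\n", v[p], p);
    return 0;
}
```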

The growth would be 1 -> 2 -> 4 -> 8 -> 16 and so on; that's the number of threads running.
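
Under the simplifying assumption that every pivot lands exactly in the middle, the doubling can be written out as below; the second case shows what happens with the worst possible pivots (for example, already-sorted input with the last element as pivot), where only one segment is still being split at each level:

```latex
\begin{aligned}
\text{best case:}\quad & \text{level } k \text{ has } 2^k \text{ segments of about } n/2^k \text{ elements,}\\
& \text{so about } \log_2 n \text{ levels, e.g. } n = 2^{20} \Rightarrow \text{about } 20 \text{ levels;}\\
\text{worst case:}\quad & \text{each level peels off one element, so } n - 1 \text{ levels with one active thread each.}
\end{aligned}
```

So the 1 -> 2 -> 4 -> 8 growth only happens when the pivots split evenly; a poor pivot choice leaves almost the whole GPU idle.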

Is it possible to use a __device__ function to create a new thread, or is there any workaround for this implementation?
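
One common workaround is to let the host drive the levels: launch one kernel per level, with one thread per segment, and relaunch with the new segment list until nothing is left to split. A minimal sketch under these assumptions: a Lomuto partition, an `atomicAdd` counter to append child segments, and hypothetical names `Segment`, `split_level` and `gpu_quicksort`. CUDA error checking is omitted, and this is meant to show the structure, not to be a fast sort.

```cuda
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <utility>
#include <vector>
#include <cuda_runtime.h>

struct Segment { int lo, hi; };  // closed range [lo, hi]

// Lomuto partition of d[lo..hi] around d[hi]; returns the pivot's final index.
__device__ int partition_dev(int* d, int lo, int hi) {
    int pivot = d[hi], i = lo;
    for (int j = lo; j < hi; ++j) {
        if (d[j] < pivot) { int t = d[i]; d[i] = d[j]; d[j] = t; ++i; }
    }
    int t = d[i]; d[i] = d[hi]; d[hi] = t;
    return i;
}

// One thread per segment: partition it and append the halves that still have
// two or more elements to the next level's list (atomicAdd hands out slots).
__global__ void split_level(int* d, const Segment* cur, int count,
                            Segment* next, int* next_count) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= count) return;               // surplus threads in the last block exit
    Segment s = cur[t];
    int p = partition_dev(d, s.lo, s.hi);
    if (p - 1 > s.lo) next[atomicAdd(next_count, 1)] = Segment{s.lo, p - 1};
    if (p + 1 < s.hi) next[atomicAdd(next_count, 1)] = Segment{p + 1, s.hi};
}

void gpu_quicksort(std::vector<int>& v) {
    int n = static_cast<int>(v.size());
    if (n < 2) return;
    int *d = nullptr, *d_count = nullptr;
    Segment *cur = nullptr, *next = nullptr;
    cudaMalloc(&d, n * sizeof(int));
    cudaMalloc(&cur, n * sizeof(Segment));   // at most n/2 segments exist at once
    cudaMalloc(&next, n * sizeof(Segment));
    cudaMalloc(&d_count, sizeof(int));
    cudaMemcpy(d, v.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    Segment root{0, n - 1};
    cudaMemcpy(cur, &root, sizeof(Segment), cudaMemcpyHostToDevice);
    int count = 1;                           // level 0: a single active thread
    while (count > 0) {
        cudaMemset(d_count, 0, sizeof(int));
        int threads = 256, blocks = (count + threads - 1) / threads;
        split_level<<<blocks, threads>>>(d, cur, count, next, d_count);
        // Blocking copy on the default stream: waits for the kernel to finish.
        cudaMemcpy(&count, d_count, sizeof(int), cudaMemcpyDeviceToHost);
        std::swap(cur, next);                // this level's output feeds the next launch
    }
    cudaMemcpy(v.data(), d, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d); cudaFree(cur); cudaFree(next); cudaFree(d_count);
}

int main() {
    std::vector<int> v = {9, 4, 7, 1, 8, 2, 6, 3, 5, 0};
    gpu_quicksort(v);
    assert(std::is_sorted(v.begin(), v.end()));
    for (int x : v) std::printf("%d ", x);
    std::printf("\n");
    return 0;
}
```

Because segments at the same level never overlap, the threads of one launch never touch the same elements. With balanced pivots the number of active threads follows the 1 -> 2 -> 4 -> 8 pattern from the question.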

Here's a picture of how quicksort works, just to make it clearer.

__device__ functions are executed by the calling thread, just like an ordinary function call; they do not spawn new threads. Once a kernel is launched, its number of threads is fixed, and device code cannot add more threads to it.
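
A small sketch that illustrates this: each thread calls a `__device__` function that returns the global index of the thread executing it, and the host checks that every thread got its own index back, i.e. the call ran in the caller's thread. The names `my_global_index`, `record_caller` and `count_matches` are hypothetical; error checking is omitted.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// An ordinary __device__ function: calling it is just a function call
// inside the caller's thread; no new thread is created.
__device__ int my_global_index() {
    return blockIdx.x * blockDim.x + threadIdx.x;
}

__global__ void record_caller(int* out, int n) {
    int me = blockIdx.x * blockDim.x + threadIdx.x;
    if (me < n) out[me] = my_global_index();  // same value as `me`
}

// Launches n threads (rounded up to whole blocks of 128) and
// returns how many entries satisfied out[i] == i.
int count_matches(int n) {
    int* d_out = nullptr;
    cudaMalloc(&d_out, n * sizeof(int));
    record_caller<<<(n + 127) / 128, 128>>>(d_out, n);
    std::vector<int> h(n);
    cudaMemcpy(h.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    int matches = 0;
    for (int i = 0; i < n; ++i) matches += (h[i] == i);
    return matches;
}

int main() {
    const int n = 1000;
    assert(count_matches(n) == n);
    std::printf("all %d threads saw their own index\n", n);
    return 0;
}
```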

Then there's no point in doing a heavy recursion-based method on the GPU, right?
It's a very expensive operation for the GPU, I guess.
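
For what it's worth, device code on GPUs of compute capability 2.x and higher does support recursion, but every call level uses per-thread stack, whose size can be raised with `cudaDeviceSetLimit(cudaLimitStackSize, ...)`. A common way to avoid recursion entirely is to replace the call stack with a small explicit array. The sketch below (hypothetical names, Lomuto partition assumed) sorts in a single GPU thread, so it only illustrates the recursion-free structure and would be slow on real data; always deferring the larger half keeps the explicit stack at about log2(n) entries.

```cuda
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Lomuto partition of d[lo..hi] around d[hi]; returns the pivot's final index.
__device__ int partition_dev(int* d, int lo, int hi) {
    int pivot = d[hi], i = lo;
    for (int j = lo; j < hi; ++j) {
        if (d[j] < pivot) { int t = d[i]; d[i] = d[j]; d[j] = t; ++i; }
    }
    int t = d[i]; d[i] = d[hi]; d[hi] = t;
    return i;
}

// Quicksort without recursion: pending ranges live in an explicit stack.
// The smaller half is handled by the inner loop and the larger half is pushed,
// so at most about log2(n) ranges are pending; 64 slots cover any int-sized n.
__device__ void quicksort_iterative(int* d, int n) {
    int st_lo[64], st_hi[64];
    int top = 0;
    st_lo[top] = 0; st_hi[top] = n - 1; ++top;
    while (top > 0) {
        --top;
        int lo = st_lo[top], hi = st_hi[top];
        while (lo < hi) {
            int p = partition_dev(d, lo, hi);
            if (p - lo < hi - p) {        // left half is smaller: defer the right
                st_lo[top] = p + 1; st_hi[top] = hi; ++top;
                hi = p - 1;
            } else {                      // right half is smaller or equal: defer the left
                st_lo[top] = lo; st_hi[top] = p - 1; ++top;
                lo = p + 1;
            }
        }
    }
}

__global__ void sort_one_thread(int* d, int n) { quicksort_iterative(d, n); }

void gpu_sort_single_thread(std::vector<int>& v) {
    int n = static_cast<int>(v.size());
    if (n < 2) return;
    int* d = nullptr;
    cudaMalloc(&d, n * sizeof(int));
    cudaMemcpy(d, v.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    sort_one_thread<<<1, 1>>>(d, n);
    cudaMemcpy(v.data(), d, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
}

int main() {
    std::vector<int> v(1000);
    for (int i = 0; i < 1000; ++i) v[i] = 999 - i;   // reversed input
    gpu_sort_single_thread(v);
    assert(std::is_sorted(v.begin(), v.end()));
    std::printf("sorted %zu elements in one GPU thread\n", v.size());
    return 0;
}
```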