Worklist-based algorithms on GPU

Two questions regarding this –

Suppose I have a worklist-based algorithm in which each thread block generates a set of new values (new work) to process. The number of values generated isn't fixed, but the maximum is equal to the number of threads. This goes on until the worklist is empty.

Q1 – Situations like this may hog a lot of global memory if I make the most conservative guess for the worklist size and allocate the memory from the CPU. Will in-kernel dynamic memory allocation (which is available on devices of compute capability 2.0+) help in this case? What are the downsides of using in-kernel dynamic memory allocation? What's the smart alternative on a GPU?

Q2 – Also, what's a smart way to implement worklist-type algorithms on a GPU, where one run of the kernel generates more work? I am looking for something other than re-running the kernel on the new work, since that would require copying all the worklist data out to the CPU first and then re-injecting it into the GPU on the next kernel call.

Thanks
Sid.

NVIDIA has used such a scheme to speed up ray tracing. Search for “CUDA persistent threads” to find more info.
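Separately from persistent threads, it may be worth noting that relaunching a kernel does not by itself force the worklist through the CPU: memory from cudaMalloc stays on the device between launches, so only a 4-byte counter has to come back each round. The following is a minimal sketch of that simpler baseline, not a persistent-threads implementation. The names process_round and run_worklist and the "v spawns v / 2" rule are made up for illustration, and it assumes each thread emits at most one new item, so two buffers as large as the initial worklist are enough.

```cuda
#include <utility>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical work rule, for illustration only: an item v > 1 spawns one
// child v / 2, so each thread emits at most one new item per round.
__global__ void process_round(const int* in, unsigned int n_in,
                              int* out, unsigned int* n_out)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_in) return;
    int v = in[i];
    if (v > 1) {
        // Reserve a unique slot in the next worklist, then fill it.
        unsigned int slot = atomicAdd(n_out, 1u);
        out[slot] = v / 2;
    }
}

// Runs rounds until the worklist is empty and returns the total number of
// items processed. Error handling is reduced to the bare minimum.
unsigned int run_worklist(const std::vector<int>& seeds)
{
    unsigned int n = static_cast<unsigned int>(seeds.size());
    if (n == 0) return 0;

    int* cur = nullptr;
    int* next = nullptr;
    unsigned int* d_count = nullptr;
    // The worklist never grows under the assumed rule, so n slots suffice.
    if (cudaMalloc((void**)&cur, n * sizeof(int)) != cudaSuccess) return 0;
    if (cudaMalloc((void**)&next, n * sizeof(int)) != cudaSuccess) {
        cudaFree(cur);
        return 0;
    }
    if (cudaMalloc((void**)&d_count, sizeof(unsigned int)) != cudaSuccess) {
        cudaFree(cur);
        cudaFree(next);
        return 0;
    }
    cudaMemcpy(cur, seeds.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    const unsigned int threads = 256;
    unsigned int processed = 0;
    while (n > 0) {
        processed += n;
        cudaMemset(d_count, 0, sizeof(unsigned int));
        unsigned int blocks = (n + threads - 1) / threads;
        process_round<<<blocks, threads>>>(cur, n, next, d_count);
        // Only the counter crosses the bus; this blocking copy also waits
        // for the kernel to finish.
        if (cudaMemcpy(&n, d_count, sizeof(unsigned int),
                       cudaMemcpyDeviceToHost) != cudaSuccess) break;
        std::swap(cur, next);  // the new work becomes the next round's input
    }

    cudaFree(cur);
    cudaFree(next);
    cudaFree(d_count);
    return processed;
}
```

Under the made-up rule, calling run_worklist({8}) from host code processes 8, 4, 2 and 1 and returns 4. The per-round launch and counter readback still cost something, which is the overhead a persistent-threads design tries to avoid.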
