Stream Compaction Sample code needed

grangerfx · September 9, 2007, 7:30pm

As mentioned in another thread, the topic of Stream Compaction was discussed in GPU Gems 3 in section 39.3.1 on page 866. Unfortunately there is no sample code to go along with this section. Stream compaction is based on the Scan algorithm. The SDK comes with projects that implement scan for large and small arrays which I am sure can be modified to implement stream compaction.

Stream compaction is an important algorithm. It can be used to solve a common problem for GPGPU programming. If you have threads that need to execute a different number of times to generate data, it would be more efficient to run them for a set number of times, generate a list of threads that require additional processing time and then continue processing just those threads.

A simple example where this would be helpfull is a Mandelbrot generator (like the one I posted). It would make sense to run each pixel’s Mandelbrot loop a set number of times. The unfinished pixels could then continue processing for another 100 iterations until iteration limit is reached or all the pixels have completed. Between each pass, stream compaction could be used to gather the IDs of the unfinished threads for additional processing.

A more common example where stream compaction could be use to optimize processing is adaptive sampling. The Mandelbrot generator I posted uses this technique to anti-alias only the pixels that need it. Unfortunately due to the GPGPU architecture, it will end up being a lot less efficient than it could be. The 32 threads in a warp will have to wait if just one of their pixels is outside the adaptive tolerance and needs additional sampling. Stream compaction could be used to generate a list of aliased pixels that could use additional sampling. In this case, stream compaction needs to be done only once and then additional passes can resample the same set of pixels in multiple passes while there is still time remaining in the frame.

I believe that there are many other cases where stream compaction will be useful to optimize algorithms. I think that it should probably be implemented in one of the CUDA libraries in addition to having SDK examples. I imagine there could be a library function that given a base address, stride and data size(1, 2 or 4 bytes) could generate a separate list of indices to the data elements. You could also allow threads to exit with a flag to indicate whether they need additional processing and then automatically peform stream compaction and then re-launch those threads. The threads themselves could control when they exited for additional stream compaction.

The previous thread that mentioned stream compaction for reference:
[url=“The Official NVIDIA Forums | NVIDIA”]http://forums.nvidia.com/index.php?showtopic=38410[/url]

-Mark Granger
New Tek

Simon_Green · September 10, 2007, 2:51pm

I agree, stream compaction is a very useful primitive for data parallel computing. It’s pretty simple to implement once you have a fast prefix sum, but we’ll see what we can do about putting an example in the SDK.

I recommend reading this paper from Graphics Hardware this year if you haven’t seen it already:
http://graphics.idav.ucdavis.edu/publicati…_pub?pub_id=915

I believe these guys are planning on releasing an open-source library (CUDPP) based on this work, but I’ll let Mark comment on that.

NotSoNewCudaDeveloper · July 13, 2011, 12:27pm

Ok…so I am new to CUDA and frankly I dont get the “scatter through gather” part of the stream compaction algorithm presented in Gem.What is exactly a parallel binary search means?

Could please explain this part…as in explaint it for dummieS?

thanks a lot for making my life easier

Topic		Replies	Views
Transform feedback in CUDA? CUDA Programming and Performance	6	6177	May 24, 2007
Opposite of Stream Compaction? CUDA Programming and Performance	2	746	February 8, 2013
how to parallel this simple algorithm? someone has met this before? CUDA Programming and Performance	2	7347	December 3, 2010
Stream compaction in OpenCL CUDA Programming and Performance	0	923	March 20, 2014
Funny development problem CUDA Programming and Performance	2	1213	March 13, 2013
'coompacting' result array CUDA Programming and Performance	4	4520	May 13, 2010
parallel search CUDA Programming and Performance	2	1256	June 26, 2009
Optimizing workload with Large amounts of computations but small amount of results CUDA Programming and Performance	1	542	March 25, 2018
Algorithm Strategy brainstroming.. Please help me choose the best algo.. CUDA Programming and Performance	5	1785	December 26, 2009
CUDPP Compact for 2D Array CUDA Programming and Performance	0	826	October 4, 2011

Stream Compaction Sample code needed

Related topics