Return variable number of results...?

Most, if not all, of the programming examples focus on calculating a fixed-size result from the given inputs. Are there any examples out there that produce a variable number of outputs, and what is the best way to code this?

For example, say I have a large array of values on the card. On each kernel call I pass in some value, compute a result against every element of the large array, and then return the values for only those elements where the result falls within certain parameters.

I expect the number of selected elements to be small compared with the overall list. Is there a good way to do this?

There’s no dynamic memory allocation on the device BY the device, so you have to plan ahead. If your expected output is small, you might just preallocate a buffer large enough to handle your worst-case size, write into that, and have the CPU read the buffer back and check it for results.

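Global atomics are an easy way to record results like this: each thread that has something to report bumps a shared counter and uses the old value as its slot in the output buffer. If you’re writing a LOT of results, you may want to avoid the atomic overhead and instead use classic reduction and/or prefix-sum operations to compact the data without atomics.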
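Here is a minimal sketch of the preallocated-buffer-plus-atomic-counter idea, not a tuned implementation. The names (selectInRange, runSelect, capacity) and the "data[i] - query" test are made up for illustration; substitute your own computation. Error checking is left out to keep it short.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread tests one element. A thread whose result is in range reserves
// an output slot with atomicAdd on a global counter and writes its index there.
__global__ void selectInRange(const float* data, int n, float query,
                              float lo, float hi,
                              int* outIdx, int capacity, int* count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float r = data[i] - query;           // placeholder for the real computation
    if (r >= lo && r <= hi) {
        int slot = atomicAdd(count, 1);  // old counter value is this thread's slot
        if (slot < capacity)             // never write past the preallocated buffer
            outIdx[slot] = i;
    }
}

// Hypothetical host wrapper: returns the indices of the selected elements.
// The order of the results is not deterministic.
std::vector<int> runSelect(const std::vector<float>& h, float query,
                           float lo, float hi, int capacity)
{
    int n = (int)h.size();
    if (n == 0) return {};

    float* dData = nullptr;
    int* dOut = nullptr;
    int* dCount = nullptr;
    cudaMalloc(&dData, n * sizeof(float));
    cudaMalloc(&dOut, capacity * sizeof(int));
    cudaMalloc(&dCount, sizeof(int));
    cudaMemcpy(dData, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dCount, 0, sizeof(int));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    selectInRange<<<blocks, threads>>>(dData, n, query, lo, hi,
                                       dOut, capacity, dCount);

    // Read the counter first, then only as many results as were written.
    int count = 0;
    cudaMemcpy(&count, dCount, sizeof(int), cudaMemcpyDeviceToHost);
    if (count > capacity) count = capacity;  // buffer overflowed; extras were dropped

    std::vector<int> result(count);
    cudaMemcpy(result.data(), dOut, count * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dData);
    cudaFree(dOut);
    cudaFree(dCount);
    return result;
}
```

The counter keeps incrementing even after the buffer is full, so the host can tell from count > capacity that some results were dropped.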

Worst case, if you preallocate memory but it’s not enough, you could just structure your algorithm to allow multiple passes if the CPU detects that you filled the whole buffer.
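One possible way to structure the multi-pass fallback, as a rough sketch: run the kernel over a range of the input, and if the counter says the buffer overflowed, retry on half the range. The names (selectChunk, selectMultiPass) and the halving scheme are my own assumptions, not something from the SDK, and it assumes capacity is at least 1 so a one-element range can never overflow.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <vector>

// Same atomic-append idea, restricted to elements in [begin, end).
__global__ void selectChunk(const float* data, int begin, int end, float query,
                            float lo, float hi,
                            int* outIdx, int capacity, int* count)
{
    int i = begin + blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= end) return;

    float r = data[i] - query;           // placeholder for the real computation
    if (r >= lo && r <= hi) {
        int slot = atomicAdd(count, 1);  // keeps counting even when the buffer is full
        if (slot < capacity)
            outIdx[slot] = i;
    }
}

// Hypothetical host driver: collects all matches using a fixed-size device buffer.
// Assumes capacity >= 1.
std::vector<int> selectMultiPass(const std::vector<float>& h, float query,
                                 float lo, float hi, int capacity)
{
    int n = (int)h.size();
    std::vector<int> result;
    if (n == 0) return result;

    float* dData = nullptr;
    int* dOut = nullptr;
    int* dCount = nullptr;
    cudaMalloc(&dData, n * sizeof(float));
    cudaMalloc(&dOut, capacity * sizeof(int));
    cudaMalloc(&dCount, sizeof(int));
    cudaMemcpy(dData, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    std::vector<int> buf(capacity);
    int begin = 0;
    int chunk = n;                       // optimistic: try everything in one pass
    while (begin < n) {
        int end = begin + std::min(chunk, n - begin);
        cudaMemset(dCount, 0, sizeof(int));

        int threads = 256;
        int blocks = (end - begin + threads - 1) / threads;
        selectChunk<<<blocks, threads>>>(dData, begin, end, query, lo, hi,
                                         dOut, capacity, dCount);

        int count = 0;
        cudaMemcpy(&count, dCount, sizeof(int), cudaMemcpyDeviceToHost);
        if (count > capacity) {
            // Overflow: some results were dropped, so redo a smaller range.
            chunk = std::max((end - begin) / 2, 1);
            continue;
        }

        cudaMemcpy(buf.data(), dOut, count * sizeof(int), cudaMemcpyDeviceToHost);
        result.insert(result.end(), buf.begin(), buf.begin() + count);
        begin = end;                     // this range is done, move on
    }

    cudaFree(dData);
    cudaFree(dOut);
    cudaFree(dCount);
    return result;
}
```

If overflow is rare, almost every call finishes in one pass and the retry path costs nothing; if it is common, a bigger buffer or the scan-based approach is probably the better fit.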

This kind of “uncertain result” computation is common. Particles of an n-body simulation may merge or escape, and don’t need to be reported. You may not bother to return results from missed ray/object intersections. Your code cracker probably won’t return ANYTHING except the 1-in-a-billion success value.

You could also do a stream compaction (using scan) to remove the results you’re not interested in, and then read back the rest. Take a look at the “scan” example in the SDK.
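To make the scan-based compaction concrete, here is a small sketch: flag the elements you want, run an exclusive prefix sum over the flags to get each survivor's output position, then scatter. Note that it uses Thrust's exclusive_scan for the scan step instead of the SDK "scan" code, and that the names (markFlags, scatter, compactInRange) and the simple range test are illustrative assumptions.

```cuda
#include <thrust/device_vector.h>
#include <thrust/scan.h>
#include <thrust/copy.h>
#include <vector>

// Step 1: flags[i] = 1 if element i should be kept, else 0.
__global__ void markFlags(const float* data, int n, float lo, float hi, int* flags)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        flags[i] = (data[i] >= lo && data[i] <= hi) ? 1 : 0;
}

// Step 3: each kept element writes itself to the position given by the scan.
__global__ void scatter(const float* data, const int* flags, const int* offsets,
                        int n, float* out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && flags[i])
        out[offsets[i]] = data[i];
}

// Hypothetical host wrapper: returns the kept values in their original order.
std::vector<float> compactInRange(const std::vector<float>& h, float lo, float hi)
{
    int n = (int)h.size();
    if (n == 0) return {};

    thrust::device_vector<float> data(h.begin(), h.end());
    thrust::device_vector<int> flags(n);
    thrust::device_vector<int> offsets(n);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    markFlags<<<blocks, threads>>>(thrust::raw_pointer_cast(data.data()), n, lo, hi,
                                   thrust::raw_pointer_cast(flags.data()));

    // Step 2: exclusive prefix sum of the flags gives each output index.
    thrust::exclusive_scan(flags.begin(), flags.end(), offsets.begin());

    // Total kept = last offset + last flag (two small device-to-host reads).
    int lastOffset = offsets[n - 1];
    int lastFlag = flags[n - 1];
    int total = lastOffset + lastFlag;

    thrust::device_vector<float> out(total);
    scatter<<<blocks, threads>>>(thrust::raw_pointer_cast(data.data()),
                                 thrust::raw_pointer_cast(flags.data()),
                                 thrust::raw_pointer_cast(offsets.data()),
                                 n, thrust::raw_pointer_cast(out.data()));

    // Only the compacted results cross the bus back to the host.
    std::vector<float> result(total);
    thrust::copy(out.begin(), out.end(), result.begin());
    return result;
}
```

Unlike the atomic version, this keeps the survivors in their original order and needs no atomics, at the cost of two extra arrays the size of the input.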

For a direct “generate” pattern, please refer to our paper “in-memory grid files…” in this forum.