Parallel Prefix Sum (Scan) SDK Sample with single block


I’ve been playing around with the NVIDIA OpenCL scan example and ran into a problem: the implementation seems to work on batches, calculating an exclusive prefix sum for each batch separately. The first element of each batch is therefore zero. I tried to modify the parameters so that only one batch is created no matter how many elements are to be processed, i.e. a prefix sum over the entire range, but without luck.

Has anyone modified the scan to perform a prefix sum over the entire array, or am I missing something obvious?

Best regards,

Hey Christoph,

I’m not certain I understand the context of your question - I don’t know the OpenCL scan example, but I do know some things about prefix sums. If you are receiving an exclusive prefix sum, you can convert it to an inclusive one: drop the leading zero and append one extra element whose value is the final element of the exclusive array plus the final element of the original array.

Doing that might be the easiest way past your problem. Hope that’s at least a little helpful!
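
To make the conversion above concrete, here is a minimal CPU-side sketch in Python. The function name exclusive_to_inclusive is made up for illustration and is not part of the NVIDIA sample; it assumes you have both the original values and their exclusive scan on the host.

```python
def exclusive_to_inclusive(values, exclusive):
    """Turn an exclusive prefix sum of `values` into the inclusive one.

    Hypothetical helper for illustration; not part of any SDK.
    """
    if not values:
        return []
    # Drop the leading zero, then append the grand total:
    # last exclusive entry + last original entry.
    return exclusive[1:] + [exclusive[-1] + values[-1]]


values = [3, 1, 4, 1, 5]
exclusive = [0, 3, 4, 8, 9]
inclusive = exclusive_to_inclusive(values, exclusive)  # [3, 4, 8, 9, 14]
```

Note that this only changes exclusive into inclusive within one scanned range; it does not by itself join separately scanned batches.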

It will take more than a bit of parameter tweaking to turn the example into a full-range scan. I haven’t done it myself, but you can have a look at

which I believe does what you request.
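
For reference, the usual way to get from per-batch scans to a scan over the entire array is a three-step scheme: scan each batch, scan the batch totals, then add each batch's offset back in. The sketch below shows that idea serially in Python. It is only an illustration of the general technique, not the code from the NVIDIA sample or from the link above, and the names exclusive_scan, full_range_exclusive_scan and batch_size are assumptions of mine.

```python
from itertools import accumulate


def exclusive_scan(xs):
    """Serial exclusive prefix sum: out[i] == sum(xs[:i])."""
    if not xs:
        return []
    return [0] + list(accumulate(xs))[:-1]


def full_range_exclusive_scan(xs, batch_size):
    """Exclusive scan over all of xs, built from independent per-batch scans.

    Illustrative only; on a GPU each step would be its own kernel launch.
    """
    # Step 1: scan every batch on its own (each result starts at zero).
    batches = [xs[i:i + batch_size] for i in range(0, len(xs), batch_size)]
    local = [exclusive_scan(b) for b in batches]

    # Step 2: scan the per-batch totals to get each batch's starting offset.
    totals = [sum(b) for b in batches]
    offsets = exclusive_scan(totals)

    # Step 3: add each batch's offset to every element of that batch.
    return [v + off for scanned, off in zip(local, offsets) for v in scanned]


data = [1, 2, 3, 4, 5, 6, 7, 8]
result = full_range_exclusive_scan(data, batch_size=4)
# result == [0, 1, 3, 6, 10, 15, 21, 28]
```

If the number of batches is itself too large for one work-group, step 2 can be applied recursively.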