Reduce kernel in OpenCL


I want to write a reduce kernel that takes one stream as
input and produces a single value as output. Is this possible
in OpenCL? What does the syntax look like?


I implemented this in CUDA and ported it to OpenCL, and it still seems to work.

(That said, you shouldn't rely on omitting the barriers within a warp in OpenCL, because OpenCL does not guarantee that work-items execute in lockstep. All the other optimisations still apply.)


I thought there might be a keyword similar to the one in Brook+, like this:

reduce void add(double values<>, reduce double sum)

No, OpenCL has no Brook+-style reduce keyword or built-in reduction over a whole stream; you'll have to implement it yourself.