I’m trying to find an easy way to get the sum of a large array (of varying size) in CUDA, without success. I found a reduction example, but the code is very old and not easy to work with. I just have a device pointer of type float4 and its size, and I want the sum.
Thank you!
Reduction is what you want.
There is a CUDA parallel reduction sample code which should be useful. There is also an accompanying PDF if you search for “Mark Harris parallel reduction”.
Finally, libraries like Thrust (and CUB) offer simple, convenient methods for reduction (search for thrust::reduce).
Reduction of a vector type (float4) immediately raises questions in my head about your exact intent, but that doesn’t seem to be central to the very general question you have asked.
Thank you. I tried using Thrust, but got an error from just two lines of code … :
The error just says: error: no suitable constructor exists to convert from “int” to “float4”
dW is simply a float4 *.
I’m using float4 for a quaternion neural network and I want to add L2 regularization, so I need to perform a large summation over dW, which is my weight matrix.
Why not just describe what you want in simple math?
I have an array of float4. I want a summation where the float4 result contains the result of each component, e.g. result.x = summation(element.x), and the same for .y, .z, .w
This is exactly what you described. The sum of float4 one = (a, b, c, d) and float4 two = (e, f, g, h) would be float4 three = (a+e, b+f, c+g, d+h).
In Thrust, write a functor that does exactly that.
Then pass that functor to thrust::reduce.
Here is a worked example for the double2 vector type; it should be trivial to convert it to the float4 vector type:
gpgpu - CUDA Thrust reduction with double2 arrays - Stack Overflow
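For float4, a minimal sketch of the functor approach described above might look like the following. It is not the code from the linked answer. The names AddFloat4 and sumFloat4 are made up for illustration, and the test data in main is an assumption. The "int" to "float4" error is likely what you get when thrust::reduce is called without an explicit initial value (or with a plain 0), because float4 cannot be constructed from 0 and has no built-in operator+. Passing both an initial float4 and a binary functor avoids that.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <thrust/execution_policy.h>
#include <thrust/reduce.h>

// Hypothetical functor: component-wise addition of two float4 values.
struct AddFloat4
{
    __host__ __device__
    float4 operator()(const float4 &a, const float4 &b) const
    {
        return make_float4(a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w);
    }
};

// Hypothetical helper: sums n float4 elements that live in device memory.
float4 sumFloat4(const float4 *d_data, size_t n)
{
    // Explicit float4 initial value, so Thrust never tries to build one from 0.
    const float4 init = make_float4(0.0f, 0.0f, 0.0f, 0.0f);
    // thrust::device tells Thrust that the raw pointers refer to device memory.
    return thrust::reduce(thrust::device, d_data, d_data + n, init, AddFloat4());
}

int main()
{
    // Assumed test data: every element is (1, 2, 3, 4).
    const size_t n = 1000;
    std::vector<float4> h_data(n, make_float4(1.0f, 2.0f, 3.0f, 4.0f));

    float4 *dW = nullptr;
    cudaMalloc(&dW, n * sizeof(float4));
    cudaMemcpy(dW, h_data.data(), n * sizeof(float4), cudaMemcpyHostToDevice);

    const float4 s = sumFloat4(dW, n);
    printf("%.1f %.1f %.1f %.1f\n", s.x, s.y, s.z, s.w);
    // prints "1000.0 2000.0 3000.0 4000.0"

    cudaFree(dW);
    return 0;
}
```

This should build with nvcc as a .cu file. Note that this sums the components as they are; for an L2 penalty you would presumably want the sum of squares instead, which thrust::transform_reduce can do by squaring each component in a unary functor before the same component-wise addition.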