I am wondering if anyone could suggest the best approach to computing the mean / standard deviation of a large number of relatively small but differently sized arrays in CUDA?
The parallel reduction example in the SDK works on a single very large array and it seems the size is conveniently a multiple of the number of threads per block… but my case is rather different:
Conceptually, however, I have a large number of objects, each containing two components, upper and lower, and each of these components has an x and a y coordinate, i.e. lower.x, lower.y, upper.x, upper.y.
Each of these arrays is approximately 800 elements long, but the length varies between objects (not within an object), e.g.
Object1.lower.x = 1.1, 2.2, 3.3
Object1.lower.y = 4.4, 5.5, 6.6
Object1.upper.x = 7.7, 8.8, 9.9
Object1.upper.y = 1.1, 2.2, 3.3
Object2.lower.x = 1.0, 2.0, 3.0, 4.0, 5.0
Object2.lower.y = 6.0, 7.0, 8.0, 9.0, 10.0
Object2.upper.x = 11.0, 12.0, 13.0, 14.0, 15.0
Object2.upper.y = 16.0, 17.0, 18.0, 19.0, 20.0
Please note the above is just my way of representing the data; it is not stored in C structs or anything like that, and I can organise it in whatever way I need. The point is that for each array the mean, standard deviation and eventually a histogram need to be computed, and within one particular object, ratios and differences between arrays need to be computed as well.
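In case it helps, this is roughly how I was thinking of laying the data out on the host before copying it to the device: flatten all the variable-length arrays into one contiguous buffer, with per-array offset and length tables so a kernel can find its own array. (The names here, like pack_arrays and FlatArrays, are just placeholders I made up, not anything from the SDK.)

```c
#include <stdlib.h>
#include <string.h>

/* Flatten many variable-length arrays into one contiguous buffer,
 * recording each array's start offset and element count so a kernel
 * can index into its own array. */
typedef struct {
    float *data;    /* all arrays stored back to back        */
    int   *offset;  /* offset[i] = start index of array i    */
    int   *length;  /* length[i] = element count of array i  */
    int    count;   /* number of arrays                      */
} FlatArrays;

FlatArrays pack_arrays(const float **arrays, const int *lengths, int count)
{
    FlatArrays f;
    f.count  = count;
    f.offset = malloc(count * sizeof(int));
    f.length = malloc(count * sizeof(int));

    /* First pass: compute offsets and the total size. */
    int total = 0;
    for (int i = 0; i < count; ++i) {
        f.offset[i] = total;
        f.length[i] = lengths[i];
        total += lengths[i];
    }

    /* Second pass: copy each array into place. */
    f.data = malloc(total * sizeof(float));
    for (int i = 0; i < count; ++i)
        memcpy(f.data + f.offset[i], arrays[i], lengths[i] * sizeof(float));
    return f;
}
```

Then data, offset and length would each be copied to the device with a single cudaMemcpy. Is that a sensible layout, or is there a better-established scheme for this?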
So how should I go about sending this data to the GPU and organising my thread/block hierarchy? One idea I had was to zero-pad all my arrays to the same length and have a group of blocks work on each object… but it seems there are all sorts of problems with that method, if it would work at all.
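To make my thinking concrete, here is the kind of kernel I was imagining as an alternative to padding: one block per array, a strided load loop so any array length works, then a shared-memory tree reduction over the per-thread partial sums. The kernel name, the 256-thread block size, and the offset/length arrays are all just my own guesses, and I'm aware the sum-of-squares formula for the variance can lose precision; I'd welcome corrections.

```cuda
#include <math.h>

#define THREADS 256  // assumed block size; must be a power of two here

// One thread block per array: block b reduces
// data[offset[b] .. offset[b] + length[b]) to a mean and std deviation.
__global__ void mean_std_kernel(const float *data,
                                const int *offset, const int *length,
                                float *mean_out, float *std_out)
{
    __shared__ float s_sum[THREADS];
    __shared__ float s_sq[THREADS];

    const int base = offset[blockIdx.x];
    const int n    = length[blockIdx.x];

    // Each thread accumulates a strided partial sum and sum of squares,
    // so arrays of any length are handled without zero padding.
    float sum = 0.0f, sq = 0.0f;
    for (int i = threadIdx.x; i < n; i += blockDim.x) {
        float v = data[base + i];
        sum += v;
        sq  += v * v;
    }
    s_sum[threadIdx.x] = sum;
    s_sq[threadIdx.x]  = sq;
    __syncthreads();

    // Standard shared-memory tree reduction, as in the SDK sample.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride) {
            s_sum[threadIdx.x] += s_sum[threadIdx.x + stride];
            s_sq[threadIdx.x]  += s_sq[threadIdx.x + stride];
        }
        __syncthreads();
    }

    if (threadIdx.x == 0) {
        float m = s_sum[0] / n;
        mean_out[blockIdx.x] = m;
        // Population std dev via E[x^2] - mean^2 (not numerically ideal).
        std_out[blockIdx.x]  = sqrtf(s_sq[0] / n - m * m);
    }
}
```

I'd launch it as mean_std_kernel<<<numArrays, THREADS>>>(...), one block per array. With ~800 elements per array, is one block per array enough work to keep the device busy, or should I be grouping several arrays per block?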
Thanks in advance.