The CUDA SDK examples demonstrate the GPU's parallel computing capabilities, for example:

(1) square sum (2) bitonic sort (3) transpose

But all of these demos work on a single random data set (for example, 20000 items). There are no examples that process many data sets at the same time.

My question is: if I have many data sets, say 1000, is there any benefit to processing them all at the same time on the GPU, for example with a square sum or a sort?

(What I actually want is to compute statistics for each of these 1000 data sets: mean, sigma, max, min, range, percentile, …)

If the answer is “yes”, can anyone provide demo code?

By the way, my hardware is a 9800 GT with 1 GB of RAM and 14 multiprocessors.
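
To make the question concrete, here is a rough sketch of the kind of thing I have in mind. It assumes that all data sets have the same length n and are stored back to back in one array, and that one thread block handles one data set with a shared-memory reduction. The names Stats, batchStatsKernel and batchStats are made up for illustration. Sigma here is the population standard deviation (divide by n), computed in single precision, so it may lose accuracy on large or badly scaled data. Percentiles are not covered; I assume they would need a per-set sort.

```cuda
#include <cuda_runtime.h>
#include <cfloat>
#include <cmath>
#include <vector>

#define THREADS 256  // threads per block; must be a power of two

// Hypothetical result record: one per data set.
struct Stats { float mean, sigma, min, max; };

// One thread block per data set (assumed layout: data[set * n + i]).
// Must be launched with exactly THREADS threads per block.
__global__ void batchStatsKernel(const float *data, int n, Stats *out)
{
    __shared__ float sSum[THREADS], sSq[THREADS], sMin[THREADS], sMax[THREADS];
    const float *x = data + (size_t)blockIdx.x * n;
    int t = threadIdx.x;

    // Each thread accumulates a strided slice of its block's data set.
    float sum = 0.0f, sq = 0.0f, lo = FLT_MAX, hi = -FLT_MAX;
    for (int i = t; i < n; i += blockDim.x) {
        float v = x[i];
        sum += v;
        sq  += v * v;
        lo = fminf(lo, v);
        hi = fmaxf(hi, v);
    }
    sSum[t] = sum; sSq[t] = sq; sMin[t] = lo; sMax[t] = hi;
    __syncthreads();

    // Tree reduction in shared memory.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (t < s) {
            sSum[t] += sSum[t + s];
            sSq[t]  += sSq[t + s];
            sMin[t] = fminf(sMin[t], sMin[t + s]);
            sMax[t] = fmaxf(sMax[t], sMax[t + s]);
        }
        __syncthreads();
    }

    // Thread 0 writes the statistics for this data set.
    if (t == 0) {
        float mean = sSum[0] / n;
        float var  = sSq[0] / n - mean * mean;  // population variance
        out[blockIdx.x].mean  = mean;
        out[blockIdx.x].sigma = sqrtf(fmaxf(var, 0.0f));
        out[blockIdx.x].min   = sMin[0];
        out[blockIdx.x].max   = sMax[0];
    }
}

// Hypothetical host helper: copies numSets * n floats to the GPU, launches
// one block per data set, and copies the results back.
// Error checking is omitted to keep the sketch short.
std::vector<Stats> batchStats(const std::vector<float> &data, int numSets, int n)
{
    float *dData = 0;
    Stats *dOut = 0;
    std::vector<Stats> result(numSets);

    cudaMalloc((void **)&dData, data.size() * sizeof(float));
    cudaMalloc((void **)&dOut, numSets * sizeof(Stats));
    cudaMemcpy(dData, &data[0], data.size() * sizeof(float),
               cudaMemcpyHostToDevice);

    batchStatsKernel<<<numSets, THREADS>>>(dData, n, dOut);

    cudaMemcpy(&result[0], dOut, numSets * sizeof(Stats),
               cudaMemcpyDeviceToHost);
    cudaFree(dData);
    cudaFree(dOut);
    return result;
}
```

With 1000 data sets this would launch 1000 blocks, so all 14 multiprocessors should have work to do. What I cannot judge is whether the copy to and from the card costs more than the computation saves.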