Simple question about GPU multithreading: processing 1000+ data sets at once

The CUDA SDK examples demonstrate the GPU's parallel computing capabilities, for example
(1) square sum, (2) bitonic sort, (3) transpose.
But all of these demos work on a single random data set (for example, 20000 items); there are no examples that process many data sets at the same time.

My question: if I have many data sets, say 1000 of them, is there any benefit to processing them all on the GPU at the same time, for example computing a square sum (or a sort) of each?
(What I actually want is to calculate statistics for each of these 1000 data sets, such as mean, sigma, max, min, range, percentile, …)
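
To make the question concrete, here is a minimal sketch of the layout I have in mind: one thread block per data set, with a shared-memory reduction inside each block. The names batchStats, Stats and THREADS are made up for illustration, and I am assuming every data set has the same length (20000 floats, as in the SDK demo) and that all sets are stored back to back in one array. Percentiles are left out because they would need a per-set sort.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cfloat>
#include <cuda_runtime.h>

#define THREADS 256  // threads per block; must be a power of two

// Hypothetical result record, one per data set.
struct Stats { float mean, sigma, mn, mx; };

// One block per data set. Set s occupies data[s*n] .. data[s*n + n - 1].
__global__ void batchStats(const float *data, int n, Stats *out)
{
    __shared__ float sSum[THREADS], sSq[THREADS], sMin[THREADS], sMax[THREADS];
    const float *x = data + (size_t)blockIdx.x * n;
    int t = threadIdx.x;

    // Each thread accumulates a strided slice of its block's data set.
    float sum = 0.0f, sq = 0.0f, lo = FLT_MAX, hi = -FLT_MAX;
    for (int i = t; i < n; i += THREADS) {
        float v = x[i];
        sum += v;
        sq  += v * v;
        lo = fminf(lo, v);
        hi = fmaxf(hi, v);
    }
    sSum[t] = sum; sSq[t] = sq; sMin[t] = lo; sMax[t] = hi;
    __syncthreads();

    // Tree reduction in shared memory: halve the active threads each step.
    for (int s = THREADS / 2; s > 0; s >>= 1) {
        if (t < s) {
            sSum[t] += sSum[t + s];
            sSq[t]  += sSq[t + s];
            sMin[t] = fminf(sMin[t], sMin[t + s]);
            sMax[t] = fmaxf(sMax[t], sMax[t + s]);
        }
        __syncthreads();
    }

    // Thread 0 writes the result for this data set.
    if (t == 0) {
        float mean = sSum[0] / n;
        float var  = sSq[0] / n - mean * mean;  // population variance
        out[blockIdx.x].mean  = mean;
        out[blockIdx.x].sigma = sqrtf(fmaxf(var, 0.0f));
        out[blockIdx.x].mn    = sMin[0];
        out[blockIdx.x].mx    = sMax[0];
    }
}

int main()
{
    const int numSets = 1000, n = 20000;  // assumed sizes
    size_t count = (size_t)numSets * n;
    size_t bytes = count * sizeof(float);

    // Fill all data sets with random values in [0, 1].
    float *h = (float *)malloc(bytes);
    for (size_t i = 0; i < count; ++i)
        h[i] = rand() / (float)RAND_MAX;

    float *dData; Stats *dOut;
    cudaMalloc((void **)&dData, bytes);
    cudaMalloc((void **)&dOut, numSets * sizeof(Stats));
    cudaMemcpy(dData, h, bytes, cudaMemcpyHostToDevice);

    // One block per data set, so all 1000 sets are processed in one launch.
    batchStats<<<numSets, THREADS>>>(dData, n, dOut);

    // The copy back waits for the kernel and also surfaces earlier errors.
    Stats *r = (Stats *)malloc(numSets * sizeof(Stats));
    cudaError_t err = cudaMemcpy(r, dOut, numSets * sizeof(Stats),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("set 0: mean=%f sigma=%f min=%f max=%f range=%f\n",
           r[0].mean, r[0].sigma, r[0].mn, r[0].mx, r[0].mx - r[0].mn);

    cudaFree(dData); cudaFree(dOut); free(h); free(r);
    return 0;
}
```

Range is just max minus min on the host. The variance here uses the sum-of-squares shortcut in single precision, which can lose accuracy when the mean is large compared to the spread, so that part may need a more careful formula. Is this one-block-per-data-set approach the right idea, or is there a better way?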

Can anyone provide demo code if the answer is “Yes”?

By the way, my hardware is a 9800 GT with 1 GB of RAM and 14 multiprocessors.