Can someone straighten me out on the scan example in CUDA 6.0? I was expecting it to take a sequence of numbers and produce a summed (scanned) version, choosing intra- or inter-block scans according to the input count N. But the sample actually seems to take a supplied arrayLength and then scans the input into many separate scanned segments of the original data, each arrayLength long. Am I reading this right, and if so, how is this helpful?
It depends on whether you wish to scan or merely reduce: end up with a summed sequence, or just a single sum.
But more importantly, it depends on the input array length, and on ensuring the work fits on the device, given the device's maximum applicable thread-block dimension.
For example, you cannot scan a 5k-element array in a single thread block without breaking it up into smaller sub-sequences.
It is generally very easy to move from scanned sub-sequences to a single scanned sequence, if you want a scan rather than just a reduction.
The scanExclusiveLarge function can do batches of same-size scans in parallel. If you only want to do a single scan, pass 1 as the 3rd parameter (batchSize) and your array's length as the 4th parameter (arrayLength). Refer to the scan.cu file: depending on the size of your array, you would use different scan functions, with the size ranges delineated in scan.cu by constants such as MIN_SHORT_ARRAY_SIZE and MIN_LARGE_ARRAY_SIZE.
Sadly there’s a factorRadix2 check against arrayLength, which limits its usefulness.
I think I’m mostly struggling with the batched-array concept itself. Assuming a large input set, I thought we’d just have each block scan its segment of the data, and then launch subsequent kernels to update the per-block sub-scans with the global results. So why do we need the batching?
You could pad your array up to the next size that satisfies the factorRadix2 check. And it is, after all, sample code, not a production library.
If you’re just looking for a handy scan function, Thrust and CUB both have implementations that should be pretty flexible.