I need to compute the prefix sum of very large arrays, with over 20 million elements. I've been digging through the SDK sample, but I can't get it to work with more than 16777218 elements; all tests fail.
Is there any limit? As far as I understand, the limit should be 33553920 (65535*512), right?
You should try the Thrust library, or CUDPP. Their scan primitives will work for large inputs and they should also be significantly faster than the scan from the SDK.