I have been using the cuRAND Sobol device API “sphere” example as a starting point for my own code, but I have noticed a couple of potential problems with it. The first is that it is unexpectedly slow: I found it could be made several times faster by caching the generator states in local variables inside the kernel (at lower dimensionality), presumably because this reduces the memory bandwidth needed to read and write the (quite large) states.
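A minimal sketch of the state-caching change, assuming a 32-bit unscrambled Sobol generator (`curandStateSobol32_t`) initialised from the `CURAND_DIRECTION_VECTORS_32_JOEKUO6` set, three dimensions per thread as in the sphere example, and a hypothetical 64×64 launch. The helper `estimate_sphere_volume` and the `k…` constants are illustrative names, not part of the example or of cuRAND. Each thread copies its three states from global memory into local variables once before the loop and writes them back once afterwards, rather than handing pointers into the global state array to every `curand_uniform` call.

```cuda
#include <cuda_runtime.h>
#include <curand.h>
#include <curand_kernel.h>
#include <vector>

// Hypothetical launch configuration (not taken from the cuRAND example).
constexpr int kThreadsPerBlock = 64;
constexpr int kBlockCount = 64;
constexpr int kDimsPerThread = 3;  // x, y, z

// One-time setup: thread `id` owns Sobol dimensions 3*id, 3*id+1, 3*id+2.
__global__ void setup_kernel(curandDirectionVectors32_t *vectors,
                             curandStateSobol32_t *state)
{
    int id = threadIdx.x + blockIdx.x * blockDim.x;
    for (int d = 0; d < kDimsPerThread; ++d) {
        int dim = kDimsPerThread * id + d;
        curand_init(vectors[dim], 0, &state[dim]);  // offset 0: start of the sequence
    }
}

__global__ void generate_kernel(curandStateSobol32_t *state, int n,
                                unsigned long long *result)
{
    int id = threadIdx.x + blockIdx.x * blockDim.x;
    int base = kDimsPerThread * id;
    // Cache the states in local variables: one global read per state...
    curandStateSobol32_t sx = state[base];
    curandStateSobol32_t sy = state[base + 1];
    curandStateSobol32_t sz = state[base + 2];
    unsigned long long count = 0;
    for (int i = 0; i < n; ++i) {
        float x = curand_uniform(&sx);
        float y = curand_uniform(&sy);
        float z = curand_uniform(&sz);
        if (x * x + y * y + z * z < 1.0f) ++count;  // inside the unit sphere
    }
    // ...and one global write per state, so a later launch continues the sequence.
    state[base] = sx;
    state[base + 1] = sy;
    state[base + 2] = sz;
    result[id] += count;
}

// Hypothetical helper: estimates the unit-sphere volume from points in [0,1]^3.
// Returns -1.0 on a cuRAND or CUDA error.
double estimate_sphere_volume(int n)
{
    const int nThreads = kThreadsPerBlock * kBlockCount;
    const int nDims = kDimsPerThread * nThreads;  // 12288 dimensions in total

    curandDirectionVectors32_t *hostVectors = nullptr;  // host memory owned by cuRAND
    if (curandGetDirectionVectors32(&hostVectors, CURAND_DIRECTION_VECTORS_32_JOEKUO6)
        != CURAND_STATUS_SUCCESS)
        return -1.0;

    curandDirectionVectors32_t *devVectors;
    curandStateSobol32_t *devStates;
    unsigned long long *devResult;
    cudaMalloc(&devVectors, nDims * sizeof(curandDirectionVectors32_t));
    cudaMalloc(&devStates, nDims * sizeof(curandStateSobol32_t));
    cudaMalloc(&devResult, nThreads * sizeof(unsigned long long));
    cudaMemcpy(devVectors, hostVectors, nDims * sizeof(curandDirectionVectors32_t),
               cudaMemcpyHostToDevice);
    cudaMemset(devResult, 0, nThreads * sizeof(unsigned long long));

    setup_kernel<<<kBlockCount, kThreadsPerBlock>>>(devVectors, devStates);
    generate_kernel<<<kBlockCount, kThreadsPerBlock>>>(devStates, n, devResult);

    std::vector<unsigned long long> hostResult(nThreads);
    cudaError_t err = cudaMemcpy(hostResult.data(), devResult,
                                 nThreads * sizeof(unsigned long long),
                                 cudaMemcpyDeviceToHost);  // waits for the kernels
    cudaFree(devVectors);
    cudaFree(devStates);
    cudaFree(devResult);
    if (err != cudaSuccess) return -1.0;

    unsigned long long inside = 0;
    for (unsigned long long c : hostResult) inside += c;
    // Fraction of the positive octant inside the sphere, times 8 octants.
    return 8.0 * double(inside) / (double(nThreads) * n);
}
```

Called from host code (compiled with `nvcc` and linked with `-lcurand`), `estimate_sphere_volume(1 << 16)` should come out close to 4π/3 ≈ 4.19. The caching matters because `curandStateSobol32_t` carries its own copy of the 32 direction vectors for its dimension, so each state is much larger than a few counters.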
More concerning is the poor accuracy, which seems to be related to the example's use of separate dimensions for each thread. This is probably not the best strategy for performance anyway, but it is useful for validating the generator at high dimensions. I found that reducing the dimensionality improved accuracy significantly. This may be related to the initialization (direction) vectors: I note that the cuRAND documentation cites only Joe & Kuo’s first paper, not their second one, which addresses some problems at high dimensions (see their “Sobol sequence generator” page). It is also worth noting that even the corrected vector set satisfies “Property A” only up to dimension 1111.
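For reference, a summary of the standard construction the initialization vectors feed into, written roughly in Joe & Kuo’s notation (this describes the textbook Sobol construction, not how cuRAND stores its tables). Dimension j is defined by a primitive polynomial of degree s_j over GF(2); only the initial values m_{1,j}, …, m_{s_j,j} (odd, with m_{k,j} < 2^k) are free, and a direction-number set is essentially a table of these polynomials and initial values.

```latex
\begin{align*}
&x^{s_j} + a_{1,j}x^{s_j-1} + \cdots + a_{s_j-1,j}\,x + 1
  \quad(\text{primitive over } \mathbb{F}_2)\\
&m_{k,j} = 2a_{1,j}m_{k-1,j} \oplus 2^2a_{2,j}m_{k-2,j} \oplus \cdots
  \oplus 2^{s_j-1}a_{s_j-1,j}m_{k-s_j+1,j} \oplus 2^{s_j}m_{k-s_j,j} \oplus m_{k-s_j,j},
  \qquad k > s_j\\
&v_{k,j} = \frac{m_{k,j}}{2^k}, \qquad
  x_{i,j} = i_1 v_{1,j} \oplus i_2 v_{2,j} \oplus i_3 v_{3,j} \oplus \cdots,
  \qquad i = i_1 + 2i_2 + 4i_3 + \cdots
\end{align*}
```

As a worked case, the degree-1 polynomial x + 1 (no a coefficients) with m_1 = 1 gives m_k = 2m_{k-1} ⊕ m_{k-1}, i.e. m = 1, 3, 5, 15, 17, … and v = 1/2, 3/4, 5/8, 15/16, …. Property A, in turn, requires that every binary segment of 2^d consecutive points (indices l·2^d to (l+1)·2^d - 1) of the d-dimensional sequence place exactly one point in each of the 2^d subcubes obtained by halving every axis, which is a condition on the chosen initial values.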
Are the developers aware of the problems with the original Joe & Kuo set, and which of the two sets is included in cuRAND?