Poor accuracy of curand Sobol at high dimensions

I have been using the curand Sobol device API “sphere” example as a starting point for my own code, but I have noticed a couple of potential problems with it. One is that it is unexpectedly slow: at lower dimensionality I could make it several times faster by caching the generator states in the kernel, presumably because this reduces the bandwidth required for transferring the (quite large) states.

More concerning is the poor accuracy, which seems to be related to the example's use of separate dimensions for each thread. Assigning dimensions this way is probably not a good strategy for performance anyway, but it is useful for validating the generator at high dimensions. I found that reducing the number of dimensions significantly improved accuracy. This is possibly related to the initialization vectors: I note that curand cites only Joe & Kuo’s first paper, not their second one, which addresses some problems at high dimensions (see https://web.maths.unsw.edu.au/~fkuo/sobol/). It is also worth noting that even the corrected vector set only satisfies “Property A” up to dimension 1111.
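A minimal host-side sketch of this kind of per-triple check, written in plain Python rather than against curand, might look like the following. It is not curand code and says nothing about what curand does internally. It assumes an unscrambled Sobol sequence in Gray-code order, 32-bit resolution, and direction-number rows copied by hand from the first lines of the new-joe-kuo-6.21201 file on the Joe & Kuo page linked above. The names `JOE_KUO_ROWS`, `direction_numbers`, `sobol_points` and `inside_fraction` are made up for illustration.

```python
import math

BITS = 32  # output resolution: coordinates are multiples of 2**-32

# Assumption: first rows of new-joe-kuo-6.21201, copied by hand as
# dimension -> (s, a, [m_1, ..., m_s]). Dimension 1 needs no row.
JOE_KUO_ROWS = {
    2: (1, 0, [1]),
    3: (2, 1, [1, 3]),
    4: (3, 1, [1, 3, 1]),
    5: (3, 2, [1, 1, 1]),
    6: (4, 1, [1, 1, 3, 3]),
}


def direction_numbers(dim):
    """Return the BITS direction numbers for one dimension as integers."""
    if dim == 1:
        # First dimension is the van der Corput sequence in base 2.
        return [1 << (BITS - i) for i in range(1, BITS + 1)]
    s, a, m = JOE_KUO_ROWS[dim]
    v = [m[i] << (BITS - (i + 1)) for i in range(s)]
    for i in range(s, BITS):
        x = v[i - s] ^ (v[i - s] >> s)
        for k in range(1, s):
            # Bit k of the primitive polynomial's interior coefficients.
            if (a >> (s - 1 - k)) & 1:
                x ^= v[i - k]
        v.append(x)
    return v


def sobol_points(n, dims):
    """First n Sobol points restricted to the given dimensions (1-based)."""
    vs = [direction_numbers(d) for d in dims]
    x = [0] * len(dims)
    points = []
    for i in range(n):
        points.append([xi / 2.0**BITS for xi in x])
        # Gray-code update: index of the lowest zero bit of i.
        c = (~i & (i + 1)).bit_length() - 1
        for j, v in enumerate(vs):
            x[j] ^= v[c]
    return points


def inside_fraction(n, dims):
    """Fraction of points with x^2 + y^2 + z^2 < 1 (exact value: pi/6)."""
    pts = sobol_points(n, dims)
    return sum(1 for p in pts if sum(c * c for c in p) < 1.0) / n


if __name__ == "__main__":
    n = 4096
    for dims in [(1, 2, 3), (4, 5, 6)]:
        est = inside_fraction(n, dims)
        print(dims, est, abs(est - math.pi / 6))
```

To test higher dimensions, the table would need to be filled from further rows of the same file. The per-triple error can then be compared with what the device code reports for the same dimensions and point count.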

Are the developers aware of the problems with the original Joe & Kuo set, and is the first or second set included in curand?

cuRAND uses the new-joe-kuo-6.21201 set. The Sobol example was not focused on getting the best performance.

To get the quality you’re looking for, you’ll need to use the Host API.

Thanks for the info. Is there any reason to expect a difference in output from the host API? In my testing (at low dimensions) the results seemed to be the same. I was also able to get a bit more speed from the device API, but perhaps my implementation with the host API wasn’t quite optimal.