I have implemented XORWOW generator given in the paper http://www.jstatsoft.org/v08/i14/paper in C and got a set of random numbers as output. Below is the snippet of the logic:

uint32_t xorwow() {
static uint64_t x = 123456789, y = 362436069, z = 521288629,
w = 88675123, v = 5783321, d = 6615241;
uint64_t;
t = (x ^ (x >> 2));
x = y;
y = z;
z = w;
w = v;
v = (v ^ (v << 4)) ^ (t ^ (t << 1));
return (d += 362437) + v;
}
//Marsaglia's paper uses unsigned long as the word type. This has been changed to uint32_t to ensure the word type is an unsigned 32-bit integer.

Then I tried using the same XORWOW generator using curandCreateGenerator function with CURAND_RNG_PSEUDO_XORWOW as parameter and set the seed value to zero (ie. A value of 0 for seed sets the state to the values of the original published version of the xorwow algorithm). The code that is used is an example given in the cuRAND documentation http://docs.nvidia.com/cuda/curand/host-api-overview.html#host-api-example

According to my understanding both the implementation should give the same set of random numbers because its the same logic. But I am getting different; Is my understanding wrong or Is there anything that I missing out ?

No, it doesn’t. You are mistaken. That is not published anywhere, and it is not correct, even according to your own test.

The original algorithm had at least 4 32-bit seeds. The CURAND XORWOW algorithm has a single 64-bit seed, and it is not published anywhere how that seed relates to the original algorithm seeds.

They have also mentioned that the XORWOW generator uses XORWOW implementation of the Xorshift RNG and given this paper as reference http://www.jstatsoft.org/v08/i14/paper.

My doubt is since we use only one seed and it is given that

Here what does “values” mean? Does it mean the x, y, z, w, v, d that is used in the paper?

The CURAND documentation does indeed say that “when the seed is set to 0, state is initialized to the values of the original xorwow algorithm”, and it is a reasonable interpretation that this means that the internal state of CURAND’s XORWOW generator is initialized the way it is specified in Marsaglia’s original paper.

What is not clear is whether your code that sets up CURAND configures XORWOW in all aspects (not just the seed) such that one would expect the output to match the serial implementation. Parallelization of PRNGs involves splitting the original sequence into portions for individual threads, which could happen by interleaving or block-partitioning, so it is not immediately clear that any particular thread should produce the original sequence produced by the serial implementation.

I have not used CURAND so I cannot help assess the validity of your CURAND setup. I actually do not see buildable code in the original post above that would someone let knowledgeable asses it, either.

I guess the bigger question here is: Why would it be important whether CURAND’s XORWOW generator, for any particular thread, produces a sequence identical to the serial XORWOW generator? The significance of that escapes me.

#include <unistd.h>
#include <stdio.h>
/* we need these includes for CUDA's random number stuff */
#include <curand.h>
#include <curand_kernel.h>
/* this GPU kernel function calculates a random number and stores it in the parameter */
__global__ void random(unsigned int* result) {
/* CUDA's random number library uses curandState_t to keep track of the seed value
we will store a random state for every thread */
curandStateXORWOW_t state;
/* we have to initialize the state */
curand_init(0, /* the seed controls the sequence of random values that are produced */
0, /* the sequence number is only important with multiple cores */
0, /* the offset is how much extra we advance in the sequence for each call, can be 0 */
&state);
/* curand works like rand - except that it takes a state as a parameter */
*result = curand(&state);
}
int main( ) {
/* allocate an int on the GPU */
unsigned int* gpu_x;
cudaMalloc((void**) &gpu_x, sizeof (unsigned int));
/* invoke the GPU to initialize all of the random states */
random<<<1, 1>>>(gpu_x);
/* copy the random number back */
unsigned int x;
cudaMemcpy(&x, gpu_x, sizeof(unsigned int), cudaMemcpyDeviceToHost);
printf("Random number = %u.\n", x);
/* free the memory we allocated */
cudaFree(gpu_x);
return 0;
}

The above code is equal to a serial code in all aspects since the parameters are set to <<1,1>>, while calling the kernel function. Used nvcc to compile.

Below is the normal C implementation of the Marsaglia’s XORWOW random number generator. Used gcc to compile.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
uint32_t xorwow() {
static uint32_t x = 123456789, y = 362436069, z = 521288629, w = 88675123,
v = 5783321, d = 6615241;
uint32_t t = (x ^ (x >> 2));
x = y;
y = z;
z = w;
w = v;
v = (v ^ (v << 4)) ^ (t ^ (t << 1));
return (d += 362437) + v;
}
int main(void) {
size_t i;
printf("xorwow:");
printf(" %" PRIu32, xorwow());
printf("\n");
return 0;
}

Both give different random numbers.

Since, it both implements Marsaglia’s XORWOW random number generator, should it not give the same output given the same seed?

As I said, I am not familiar with CURAND, so I cannot assess whether the initialization shown in the code above should result in the same sequence being returned as the serial code. If the initialization is appropriate, presumably the answer to your question is “yes”. Assuming that is the case, I see at least two possible scenarios (which doesn’t mean yet other scenarios couldn’t apply):

[1] The CURAND documentation could be in error, and passing a seed of zero is not in fact intended to be equivalent to the seeding specified for the published serial algorithm. Cut & paste errors have known to occur in CUDA documentation, and I see that identical language about a seed of zero is mentioned for at least one other generator.

[2] The CURAND documentation correctly states what is supposed to happen, but there is an error in the implementation of the internal seeding (this could be as simple as a typo in one of the published seed values). As far as I understand, given a fairly short sequence of output values, one can determine the internal state of simple generators like XORWOW, but I do not know off-hand how to do it, so I can’t determined which seed values are actually used.

Best I can tell, you have posted nearly identical questions about CURAND’s XORWOW generator to at least three different forums, so if you are worried about possibilities [1] and [2] above, I would suggest filing a bug report with NVIDIA. Personally, I don’t see how the exact seed values matter in practical terms.