Mersenne Twister SDK: what's going on?

Hi all,

I’m new to CUDA, and I’m trying to use the SDK Mersenne Twister sample for some Monte Carlo simulations.

Just to test, I’m setting PATH_N = 8192, leaving SEED = 777 as the default, commenting out DO_BOXMULLER, and writing the uniform randoms in h_RandGPU out to a file.
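The dump is nothing fancy, something like this (mt_uniform.txt is just a name I picked):

	// write the uniforms copied back into h_RandGPU out as text, one per line
	FILE* f = fopen("mt_uniform.txt", "w");
	if(f){
		for(int i = 0; i < PATH_N; i++)
			fprintf(f, "%f\n", h_RandGPU[i]);
		fclose(f);
	}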

Why are there no numbers over 0.3 in the first 4096?

Are the numbers along the MT_RNG_COUNT and N_PER_RNG dimensions all supposed to be independent?

Am I missing something about the way this works?

Here’s a simple frequency count I added to check:

	const unsigned int Nbins = 10;
	int* bins = (int*)calloc(Nbins, sizeof(int));

	for(unsigned int i = 0; i < 4096; i++){
		int bucket = (int)(h_RandGPU[i] * Nbins);
		// guard against a value of exactly 1.0f, which would index bins[Nbins]
		if(bucket >= (int)Nbins) bucket = Nbins - 1;
		bins[bucket] += 1;
	}

	for(unsigned int i = 0; i < Nbins; i++){
		printf("%d ", bins[i]);
	}
	printf("\n");

	free(bins);

It’s not just the first block of 4096 that has empty bins: blocks 3 and 11 have no numbers less than 0.5.

Using this code:

printf("\n\nPATH_N %d, RAND_N %d, N_PER_RNG %d, MT_RNG_COUNT %d\n", PATH_N, RAND_N, N_PER_RNG, MT_RNG_COUNT);

	 

		float mean;

		float var;

for(unsigned int k=0; k<N_PER_RNG; k++){

		 printf("block %d\n", k);

		 mean = 0;

		 var = 0;

		 getStats(h_RandGPU + k*MT_RNG_COUNT, MT_RNG_COUNT, &mean, &var);

		 printf("mean %1.4f, variance %1.4f\n", mean, var);

		 const unsigned int Nbins = 10;

		 int* bins = (int*)calloc(Nbins, sizeof(int));

		 for(unsigned int i=k*MT_RNG_COUNT; i<(k+1)*MT_RNG_COUNT; i++){

			 int bucket = (int) (h_RandGPU[i]*Nbins);

			 bins[bucket] += 1;

		 }

		 for(unsigned int i=0; i<Nbins; i++){

				 printf("%d ", bins[i]);

		 }

			   printf("\n\n");

		 free(bins);

}

		 getStats(h_RandGPU, RAND_N, &mean, &var);

		 printf("mean %1.4f, variance %1.4f\n", mean, var);

I get this:

PATH_N 131072, RAND_N 131072, N_PER_RNG 32, MT_RNG_COUNT 4096

block 0
mean 0.1125, variance 0.0056
1889 1883 324 0 0 0 0 0 0 0

block 1
mean 0.4620, variance 0.0920
513 513 518 442 454 236 235 326 428 431

block 2
mean 0.4852, variance 0.0788
377 470 404 407 512 423 415 363 349 376

block 3
mean 0.6769, variance 0.0179
0 0 0 0 0 1916 130 1013 955 82

block 4
mean 0.5036, variance 0.0634
80 829 91 468 715 101 1118 42 302 350

block 5
mean 0.5572, variance 0.0795
40 911 8 501 292 0 1176 2 819 347

block 6
mean 0.4650, variance 0.0824
492 425 505 448 451 392 292 356 405 330

block 7
mean 0.5395, variance 0.0913
433 403 331 242 247 476 423 502 522 517

block 8
mean 0.4676, variance 0.0837
462 488 490 381 500 323 367 368 330 387

block 9
mean 0.4556, variance 0.0883
515 533 475 462 455 335 184 328 438 371

block 10
mean 0.5439, variance 0.0750
369 201 567 135 167 421 1014 529 211 482

block 11
mean 0.8077, variance 0.0169
0 0 0 0 0 43 1315 622 620 1496

block 12
mean 0.5410, variance 0.0837
424 270 356 329 277 477 492 522 493 456

block 13
mean 0.4602, variance 0.0818
516 444 447 487 427 383 342 372 369 309

block 14
mean 0.4629, variance 0.0466
341 373 246 320 472 1142 1014 44 40 104

block 15
mean 0.5224, variance 0.0886
375 437 384 361 369 342 437 424 435 532

block 16
mean 0.5098, variance 0.0784
393 305 380 487 414 478 417 399 434 389

block 17
mean 0.4888, variance 0.0578
110 713 356 215 350 950 651 339 227 185

block 18
mean 0.5144, variance 0.0873
105 40 1336 135 1041 25 2 531 49 832

block 19
mean 0.5089, variance 0.0646
205 485 359 434 413 230 1022 473 242 233

block 20
mean 0.5054, variance 0.0822
386 380 442 419 387 410 437 400 413 422

block 21
mean 0.4980, variance 0.0853
426 436 413 371 400 408 395 416 408 423

block 22
mean 0.5909, variance 0.0723
249 186 260 509 66 407 853 590 439 537

block 23
mean 0.6350, variance 0.0506
3 28 301 385 714 402 586 493 477 707

block 24
mean 0.4357, variance 0.0809
838 308 243 217 245 1142 426 183 246 248

block 25
mean 0.5012, variance 0.0838
403 412 415 420 394 399 386 442 403 422

block 26
mean 0.5061, variance 0.0848
400 426 380 391 410 418 421 387 412 451

block 27
mean 0.5041, variance 0.0821
424 355 419 424 376 431 421 414 459 373

block 28
mean 0.4938, variance 0.0820
435 387 394 441 425 442 381 391 435 365

block 29
mean 0.6008, variance 0.0768
258 249 248 248 221 591 512 666 376 727

block 30
mean 0.5051, variance 0.0835
407 402 400 403 369 446 403 431 430 405

block 31
mean 0.4917, variance 0.0910
504 543 367 233 254 510 461 325 526 373

mean 0.5110, variance 0.0828

Anyone have any ideas? Does anyone know what kind of statistical testing the SDK MT has been put through?

MT itself works fine. Maybe the GPU implementation has a fine nuance that you are missing… there is a comment in the sample source about how the data are arranged in a column-major way or something; read that.
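From memory, the write in the RandomGPU kernel looks something like this (paraphrased, so check the actual sample source; nextTemperedValue is a made-up stand-in for the state update and tempering):

	// Paraphrased from memory, NOT the exact SDK source: generator iRng
	// writes its iOut-th value MT_RNG_COUNT elements apart, so 4096
	// consecutive floats in h_RandGPU are one output from EACH generator,
	// not 4096 outputs from one generator.
	for(int iOut = 0; iOut < nPerRng; iOut++){
		unsigned int x = nextTemperedValue(/* this thread's state */);
		d_Random[iRng + iOut * MT_RNG_COUNT] = ((float)x + 1.0f) / 4294967296.0f;
	}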

I have a CPU version of MT that I used to output 4096 values… here is a small portion of it.
"
0.734556 0.392499 0.704268 0.910215 0.893834 0.088856 0.002587 0.402766 0.658523 0.906760 0.708254 0.522216 0.825580 0.579092 0.043581 0.140917 0.471140 0.000711 0.254525 0.682086 0.971312 0.796372 0.059733 0.175758 0.057710 0.058751 0.687020 0.674676 0.690644 0.352411 0.597682 0.885546 0.843171 0.045824 0.763688 0.838252 0.136387 0.609334 0.262008 0.841279 0.424717 0.697412 0.022627 0.707624 0.399793 0.390493 0.707571 0.511026 0.656523 0.271968 0.934782 0.245935 0.524560 0.856338 0.791016 0.739560 0.580990 0.400961 0.039301 0.866973 0.523911 0.875278 0.690672 0.050260 0.383755 0.951849 0.472743 0.725286 0.637499 0.144497 0.781664 0.700307 0.298719 0.477379 0.782223 0.951083 0.365425 0.168270 0.872842 0.069135 0.332499 0.223942 0.622118 0.508557 0.683386 0.546276 0.808184 0.100295 0.250565 0.749764 0.056069 0.224864 0.466401 0.897846 0.183736 0.308357 0.017142 0.513527 0.103204 0.549385 0.306311 0.499438 0.189911 0.190347 0.874911 0.618095 0.483580 0.963632 0.296933 0.626331 0.295577 0.550114 0.076271 0.261043 0.577208 0.376509 0.949016 0.922442 0.766362 0.635737 0.018353 0.832577 0.676174 0.683619 0.512210 0.870770 0.876085 0.084748 0.433051 0.919519 0.127257 0.275303 0.848927 0.315964 0.560584 0.749607 0.438548 0.023271 0.475919 0.971597 0.430635 0.169895 0.181464 0.160650 0.847565 0.897873 0.312009 0.114712 0.893709 0.827831 0.781068 0.581521 0.247060 0.391118 0.235313 0.511961 0.450161 0.680828 0.967356 0.754529 0.875971 0.542963 0.308333 0.374463 0.442573 0.546507 0.885199 0.022775 0.215887 0.106419 0.054654 0.159828 0.699942 0.256472 0.648416 0.335730 0.371126 0.344692 0.451940 0.548022 0.703863 0.748433 0.066958 0.363695 0.819448 0.391312 0.452771 0.793580 0.775785 0.737253 0.762723 0.904085 0.969488 0.784382 0.616543 0.519395 0.412310 0.246841 0.657333 0.594507 0.685154 0.202528
"

I think it has a good mix.

Check the CPU version in the SDK first (I use a slightly different CPU version).

I know the serial CPU MT is well established; it’s the SDK’s parallel implementation I’m having trouble with.

If the per-thread generators are correlated, it’s not really a parallel algorithm, is it?

I am not sure about the parallel implementation. I need to go through it to see how they have done it.

Try writing to the author of the MC SDK sample. The author’s contact details are in the whitepaper.

Usually they respond quickly.

I tried that, but no response yet. Not sure he still works there.

I had to change MT also; see ntung.com/papers (CUDA video implementation). As I’m not doing any sort of security work, XORing with the thread ID worked.
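Roughly like this, as a sketch (I’m showing the reference MT19937 tempering constants here; the SDK loads per-generator constants from its data file, and temperAndConvert is just a name I made up):

	// After tempering the 32-bit output x, fold in the global thread index
	// before converting to a float in (0, 1].
	__device__ float temperAndConvert(unsigned int x, unsigned int tid)
	{
		// reference MT19937 tempering, for illustration only
		x ^= (x >> 11);
		x ^= (x << 7)  & 0x9D2C5680U;
		x ^= (x << 15) & 0xEFC60000U;
		x ^= (x >> 18);
		// the hack: cheap per-thread scrambling; fine for video work,
		// not a fix for statistical correlation between generators
		x ^= tid;
		return ((float)x + 1.0f) / 4294967296.0f;
	}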

cheers,
Nicholas

Thanks, I’ll try that.

NVIDIA should have this working out of the box by now. I wonder how many people are doing Monte Carlo without checking this.

XORing with the thread ID doesn’t remove the correlation across generators, though.

I’ve been recommended the leapfrog method, but that throws away all except one generator, so as I understand it, it is not really a parallel solution, and it wastes memory storing random numbers I can’t use.
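My understanding of leapfrogging, as a sketch (leapfrog_sample is a made-up helper): with T consumers sharing one stream, consumer t takes every T-th value starting at offset t.

	// Everything comes from ONE generator's stream; applied to the SDK
	// buffer, the other generators' numbers go unused.
	float leapfrog_sample(const float* stream, int t, int T, int k)
	{
		return stream[t + k * T];   // k-th value for consumer t
	}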

I thought the dcmt library, which the SDK implementation uses, is supposed to seed each generator independently with its own parameter set?
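From what I’ve read of dcmt, per-generator setup is roughly this (a sketch going off the dcmt docs; check dc.h for the exact signatures, and make_generator is my own wrapper name):

	#include <stdint.h>
	#include "dc.h"   // dcmt (Dynamic Creator of Mersenne Twisters)

	// One independent twister per generator id: a distinct parameter set
	// (distinct characteristic polynomial) plus its own state seed.
	mt_struct* make_generator(int id, uint32_t state_seed)
	{
		// w = 32-bit words, p = 607 (the 2^607 - 1 period family, which I
		// believe is what the SDK sample uses); 4172 is an arbitrary seed
		// for the parameter search itself. Returns NULL if the search fails.
		mt_struct* mts = get_mt_parameter_id_st(32, 607, id, 4172);
		if(mts)
			sgenrand_mt(state_seed, mts);   // seed this generator's own state
		return mts;   // draw with genrand_mt(mts), release with free_mt_struct(mts)
	}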

Can anybody clarify?

How many random numbers does the default SDK MT implementation generate? Please let me know: what is 2400000 samples? Does it mean they are generating that many random numbers?