I am using the CUDA Mersenne Twister from the PGI Insider article.
I have the following calls in my code:
    call seedmtgpu(it*2)
    call randomgpu<<<blocksPerGrid,threadsPerBlock>>> &
         (RANd, n*nran/4096)
(I use it*2 because even seeds seem to give more uniform output.) However, I see something very odd if I add the following lines, where RANh is a host copy of the device array RANd:
    RANh = RANd
    print*,'-->',minval(RANh),maxval(RANh),minval(RANd),maxval(RANd)
Here is some typical output (from a loop):
    --> 1.1294847E-04 0.9999350 0.7902580     0.2735343
    --> 9.5369294E-05 0.9997641 8.4910914E-03 4.5726325E-02
    --> 1.6205013E-06 0.9997230 0.4195687     0.6608130
    --> 1.7785118E-04 0.9999308 3.0623030E-02 0.4581388
    --> 4.8627215E-04 0.9999614 0.3619932     0.1537497
    --> 1.6827625E-04 0.9999825 0.9266415     0.2022597
    --> 1.5367521E-04 0.9991314 0.3081011     0.6696860
    --> 5.7792291E-05 0.9998940 0.6558207     0.6036910
    --> 6.8081543E-05 0.9998164 0.5965781     0.8798736
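The first two columns (computed from the host copy RANh) look sensible, close to 0 and 1. The last two (computed directly on the device array RANd) are clearly wrong: they are nowhere near 0 and 1, and in several lines the "minimum" is larger than the "maximum". So the compiler doesn't stop me from applying minval and maxval to device data, but the results are wrong. Could the way the numbers are returned from the Mersenne Twister (via the Fortran 2003 ISO C bindings) be introducing a bug, and could that explain why the spread of random numbers I see isn't very uniform?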
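To rule out the device-side intrinsics, I could run every check on the host copy only. Below is a minimal sketch. It assumes RANh is a one-dimensional default-real array of samples in [0,1). The module uniformity_check and the routines host_range and chi2_uniform are hypothetical names of my own and are not part of the PGI Mersenne Twister code. chi2_uniform computes a Pearson chi-square statistic over equal-width bins. That gives a concrete number to compare against, which is more reliable than judging the spread by eye.

```fortran
! Hypothetical helper module (names are my own, not from the PGI article):
! host-side sanity checks for a batch of uniform [0,1) samples.
module uniformity_check
  implicit none
  private
  public :: host_range, chi2_uniform
contains

  ! Smallest and largest values of a HOST array.
  ! Intended for the host copy (e.g. RANh after RANh = RANd),
  ! not for the device array itself.
  subroutine host_range(x, xmin, xmax)
    real, intent(in)  :: x(:)
    real, intent(out) :: xmin, xmax
    xmin = minval(x)
    xmax = maxval(x)
  end subroutine host_range

  ! Pearson chi-square statistic of x against a uniform distribution
  ! on [0,1), using nbins equal-width bins (nbins-1 degrees of freedom).
  function chi2_uniform(x, nbins) result(chi2)
    real, intent(in)    :: x(:)
    integer, intent(in) :: nbins
    real :: chi2
    integer :: counts(nbins), i, b
    real :: expected

    counts = 0
    do i = 1, size(x)
      ! Map x in [0,1) to bin 1..nbins; clamp anything out of range.
      b = int(x(i) * nbins) + 1
      b = max(1, min(nbins, b))
      counts(b) = counts(b) + 1
    end do

    ! Every bin expects the same count for uniform samples.
    expected = real(size(x)) / real(nbins)
    chi2 = sum((real(counts) - expected)**2) / expected
  end function chi2_uniform

end module uniformity_check
```

After RANh = RANd, this would be called as call host_range(RANh, lo, hi) followed by print*, chi2_uniform(RANh, 100). For genuinely uniform samples, the statistic should typically come out near nbins-1 (about 99 here). Values many times larger would point at a real uniformity problem rather than a reporting problem.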
Also, the CUDA Fortran Programming Guide says that the Fortran intrinsic MATMUL is available for device data, but the compiler reports it as an unsupported feature. Is MATMUL supported for device arrays in the latest release?