Odd results with minval and maxval on Mersenne Twister data

I am using the CUDA Mersenne Twister from the PGI Insider article.

I have the following call in my code:

         call seedmtgpu   ( it*2 )
         call randomgpu   <<<blocksPerGrid,threadsPerBlock>>>            &
                          ( RANd, n*nran/4096 )

(The seed is it*2 because even seeds seem to give more uniformity.) However, I see something very odd if I include the following lines, where RANh is a host array version of RANd:

         RANh = RANd


Here is some typical output (from a loop):

 -->   1.1294847E-04   0.9999350       0.7902580       0.2735343
 -->   9.5369294E-05   0.9997641       8.4910914E-03   4.5726325E-02
 -->   1.6205013E-06   0.9997230       0.4195687       0.6608130
 -->   1.7785118E-04   0.9999308       3.0623030E-02   0.4581388
 -->   4.8627215E-04   0.9999614       0.3619932       0.1537497
 -->   1.6827625E-04   0.9999825       0.9266415       0.2022597
 -->   1.5367521E-04   0.9991314       0.3081011       0.6696860
 -->   5.7792291E-05   0.9998940       0.6558207       0.6036910
 -->   6.8081543E-05   0.9998164       0.5965781       0.8798736

So the compiler doesn’t prevent me from using minval and maxval on device data in a print statement, but the results it generates are clearly wrong. Could the way the numbers are returned from the Mersenne Twister (through the Fortran 2003 ISO C bindings) be introducing a bug, and is that why the spread of random numbers I see isn’t very uniform?

Also, the CUDA Fortran Programming Guide says that the Fortran intrinsic MATMUL is available for device data, but the compiler reports it as an unsupported feature. Is MATMUL available in the latest release?

Using minval and maxval on the device array in the print statement is not allowed. The compiler should have flagged this as an error at compile time. I will file a bug for it.
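
In the meantime, a workaround is to copy the device array to the host first and apply minval and maxval to the host copy only. Here is a minimal sketch of that pattern. It is not the code from the original post: the program name, the array size n, and the use of random_number as a stand-in for the Mersenne Twister kernel are all assumptions made for illustration.

```fortran
program minmax_on_host
  use cudafor
  implicit none

  integer, parameter :: n = 4096          ! hypothetical array size
  real, device, allocatable :: RANd(:)    ! device array, named as in the question
  real, allocatable         :: RANh(:)    ! host copy of RANd

  allocate(RANd(n), RANh(n))

  ! Stand-in for the Mersenne Twister kernel (assumption): generate
  ! values on the host and copy them to the device.
  call random_number(RANh)
  RANd = RANh
  RANh = 0.0

  ! Device-to-host copy by assignment, then reduce the host array only.
  RANh = RANd
  print *, ' --> ', minval(RANh), maxval(RANh)

  deallocate(RANd, RANh)
end program minmax_on_host
```

This should build as a CUDA Fortran source file, for example with a .cuf extension or the -Mcuda flag. The intrinsics only ever see host memory here, so the printed minimum and maximum describe the values that were actually copied back from the device.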

The matmul intrinsic is not available in the current release.

Thanks for that, toepfer.