The errors are below:
FAILED: 719(unspecified launch failure)
0: DEALLOCATE: misaligned address
I ran the program under cuda-memcheck and found that the errors occur when the thread index is odd in CUDA's 0-based numbering (so it is even in Fortran's 1-based threadidx%x). Then I tried the code below to check this:
module mphilox
use curand_device
integer, parameter :: n = 6
type(curandStatePhilox4_32_10), device :: h(n)
real, device :: a(n), b(n)
contains
attributes(global) subroutine curandinit()
integer(8) :: seed, seq, offset
integer :: iam
iam = (blockidx%x-1)*blockdim%x+threadidx%x
seed = 12345; offset = 0
if (mod(iam,2) == 0) return
! call curand_init(seed, iam-1, offset, h(iam-1))
call curand_init(seed, iam, offset, h(iam))
! call curand_init(seed, iam-1, offset, h(iam-1))
h(iam-1) = h(iam)
end subroutine
attributes(global) subroutine curandphilox()
integer :: iam
iam = (blockidx%x-1)*blockdim%x+threadidx%x
a(iam) = curand_uniform(h(iam))
b(iam) = curand_normal(h(iam))
!b(iam:iam+1) = curand_norm2(h(iam))
return
end subroutine
end module mphilox
program philox
use mphilox
a = 0.0; b = 0.0
call curandinit<<<2,3>>> ()
call curandphilox<<<2,3>>> ()
end
If I change lines 12-15 to
! if (mod(iam,2) == 0) return
call curand_init(seed, iam, offset, h(iam-1))
call curand_init(seed, iam, offset, h(iam))
! call curand_init(seed, iam, offset, h(iam-1))
then cuda-memcheck reports errors “by thread (0,0,0) in block (0,0,0); by thread (2,0,0) in block (0,0,0); by thread (1,0,0) in block (1,0,0)”.
And if I change them to
! if (mod(iam,2) == 0) return
! call curand_init(seed, iam-1, offset, h(iam-1))
call curand_init(seed, iam, offset, h(iam))
call curand_init(seed, iam-1, offset, h(iam-1))
then the errors are reported “by thread (1,0,0) in block (0,0,0); by thread (0,0,0) in block (1,0,0); by thread (2,0,0) in block (1,0,0)”.
So I guess that curand_init(seed, seq, offset, h(iam)) fails when iam is even.
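Converting the 0-based (thread, block) coordinates that cuda-memcheck prints into the 1-based iam used in the kernel (blockdim%x = 3 for the <<<2,3>>> launch) makes the pattern explicit. Here t and b are just shorthand for the reported thread and block numbers:

```latex
\mathit{iam} = 3\,b + t + 1

\begin{array}{l|c|c|c}
 & (t, b)\ \text{reported} & \mathit{iam} & \text{first element passed to } \texttt{curand\_init} \\ \hline
\text{first change}  & (0,0),\ (2,0),\ (1,1) & 1,\ 3,\ 5 & h(0),\ h(2),\ h(4) \\
\text{second change} & (1,0),\ (0,1),\ (2,1) & 2,\ 4,\ 6 & h(2),\ h(4),\ h(6)
\end{array}
```

In both cases the faulting threads are exactly the ones whose first curand_init call receives an even-indexed element of h. Note that h(0) in the first case is also outside the declared bounds h(1:6).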
One of the cuda-memcheck messages is below:
========= Invalid __global__ write of size 16
========= at 0x000000c0 in mphilox_curandinit_
========= by thread (0,0,0) in block (0,0,0)
========= Address 0x7f9616c0fcd8 is misaligned
========= Device Frame:mphilox_curandinit_ (mphilox_curandinit_ : 0xc0)
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame:/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libcuda.so.1 (cuLaunchKernel + 0x34e) [0x2e46de]
========= Host Frame:/home/zirui/soft/NVHPC/Linux_x86_64/22.3/cuda/11.0/lib64/libcudart.so.11.0 [0xf62b]
========= Host Frame:/home/zirui/soft/NVHPC/Linux_x86_64/22.3/cuda/11.0/lib64/libcudart.so.11.0 (cudaLaunchKernel + 0x1c1) [0x4f5b1]
========= Host Frame:/home/zirui/soft/NVHPC/Linux_x86_64/22.3/compilers/lib/libcudafor.so (__pgiLaunchKernel + 0x1a6) [0x1202c]
========= Host Frame:./philox.out [0x1511]
========= Host Frame:./philox.out [0x11b3]
========= Host Frame:/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libc.so.6 (__libc_start_main + 0xf5) [0x22555]
========= Host Frame:./philox.out [0x1029]
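The faulting address can be connected to the even/odd pattern with a little modular arithmetic. This is only a sketch under assumptions I have not verified. B (the base address of h) and S (its element stride in bytes) are hypothetical symbols. B is assumed to be 16-byte aligned and S a multiple of 8, written S = 8m. The 16-byte store is assumed to hit an offset inside the element that is a multiple of 16:

```latex
\begin{aligned}
\texttt{0x7f9616c0fcd8} \bmod 16 &= \texttt{0x8} = 8
  && \text{(8-byte aligned, not 16-byte aligned)}\\
\operatorname{addr}\bigl(h(k)\bigr) &= B + (k-1)\,S, \qquad S = 8m\\
\operatorname{addr}\bigl(h(k)\bigr) \bmod 16 &= 8\,\bigl((k-1)\,m \bmod 2\bigr)
  && (B \equiv 0 \pmod{16})
\end{aligned}
```

For every odd k this is 0. For even k it is 8 exactly when m is odd. So the observed pattern (only even indices fault, at an address that is 8 mod 16) fits an element stride that is an odd multiple of 8 bytes, such as 56 or 72. A 64-byte stride (m = 8) would keep every element 16-byte aligned. One way to check the actual stride would be to print storage_size(x)/8 for a host variable x of type curandStatePhilox4_32_10. storage_size is a standard Fortran 2008 intrinsic that returns the size in bits.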
The NVHPC version and the build/run command are below:
nvfortran --version
nvfortran 22.3-0 64-bit target on x86-64 Linux -tp haswell
nvfortran -cuda -cudalib=curand -o philox.out curand_philox_test.f90 && cuda-memcheck ./philox.out
By the way, I’d like to use curand_norm2(), but calling it also produces an error:
NVFORTRAN-S-0155-Calls from device code to a host function are allowed only in emulation mode - curand_norm2 (curand_philox_test.f90: 23)
But the cuRAND documentation describes it as a device API function.
How should I call this function? Thanks.