Invalid __global__ read when using constant memory

I am trying to use constant memory to store a number of arrays that are used throughout the code and are not changed. However, when one of the arrays in constant memory appears in a “cuf kernel do” loop, I get (from compute-sanitizer):
Invalid global read of size 8 bytes
========= at test_kernel_do_23+0xe50 in /home/…/test_kernel_do_re.cuf:24
========= by thread (0,0,0) in block (0,0,0)
(repeated many times - I removed the path).
This happens with both double-precision real and double-precision complex arrays. However, the error goes away if I give the variable ad (see below) the “device” attribute instead of the “constant” attribute. A failing example is included below. Am I breaking some rule for the use of constant memory?
Any help would be greatly appreciated.
Some specs:

nvfortran -V
nvfortran 24.5-1 64-bit target on x86-64 Linux -tp skylake-avx512
NVIDIA Compilers and Tools
Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

OS: CentOS Linux 7 (Core). Device: NVIDIA GeForce RTX 2080 Ti

Example code:

module globals
use cudafor
implicit none
integer, parameter :: n=100
real(kind=8), constant :: ad(n)
end module globals

program test_kernel_do
use globals
implicit none
integer :: k
real(kind=8) :: a(n),b(n),c(n)
real(kind=8), device :: bd(n),cd(n)

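! Fill the host arrays (arbitrary example values).
do k=1,n
a(k)=1.0d0*k
b(k)=2.0d0
end do

! Copy the host data to constant and device memory.
ad=a
bd=b

!$cuf kernel do <<<*,*>>>
do k=1,n
cd(k)=ad(k)*bd(k) ! This is line 24 from the compute-sanitizer message.
end do

! Copy the result back to the host.
c=cd
print *,c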

end program test_kernel_do

Thanks for the report. This looks to be the same as a known issue (tracked as TPR#35715). I added your example to the report and let engineering know.


Thanks! As a work-around, can I change the “constant” attribute to “device”? In this simple example that works, but is the scope the same for a more complex code? I need the arrays to be available (read-only) to any device-based function throughout the code.

You should be fine changing these to “device”. They won't technically be “read-only”, but they will be available to any device code, provided that they are declared in a module that the code can see.
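As a rough sketch of that work-around, the example below declares ad with the “device” attribute in the globals module and reads it from a kernel that lives in a second module. The kernels module, the multiply kernel, the program name, the launch configuration and the initial values are all made up for illustration; only globals, n, ad, bd and cd come from the original post.

```fortran
module globals
  use cudafor
  implicit none
  integer, parameter :: n = 100
  ! "device" instead of "constant": the array lives in global memory and is
  ! visible to any host or device code that uses this module.
  real(kind=8), device :: ad(n)
end module globals

module kernels            ! hypothetical module, for illustration only
  use cudafor
  use globals
  implicit none
contains
  ! Hypothetical kernel: reads the module-level device array ad directly.
  attributes(global) subroutine multiply(bd, cd)
    real(kind=8), intent(in)  :: bd(n)
    real(kind=8), intent(out) :: cd(n)
    integer :: k
    k = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (k <= n) cd(k) = ad(k) * bd(k)
  end subroutine multiply
end module kernels

program test_device_workaround
  use cudafor
  use globals
  use kernels
  implicit none
  integer :: k
  real(kind=8) :: a(n), c(n)
  real(kind=8), device :: bd(n), cd(n)

  ! Arbitrary example values.
  do k = 1, n
    a(k) = real(k, kind=8)
  end do

  ad = a          ! host-to-device copy into the module array
  bd = 2.0d0      ! scalar assignment fills the device array

  ! Assumed launch configuration: 128 threads per block.
  call multiply<<<(n + 127) / 128, 128>>>(bd, cd)

  c = cd          ! device-to-host copy
  print *, c(1), c(n)
end program test_device_workaround
```

Nothing in this sketch stops device code from writing to ad, so keeping the array read-only becomes a matter of convention rather than something the compiler enforces.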

Constant memory can be a bit faster but is limited to 64 KB in total. So for a larger code, constant memory might be too limiting if you’re using multiple larger arrays, and you’d need to use “device” anyway.
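To get a feel for that budget, here is a small host-only sketch that adds up the bytes a set of arrays would need and compares the total with 64 KB. The program name, the variable names and the choice of one real and one complex array of length n are assumptions for illustration; it uses only the standard storage_size intrinsic.

```fortran
program constant_budget
  implicit none
  ! Total constant memory quoted above: 64 KB.
  integer, parameter :: limit_bytes = 64 * 1024
  integer, parameter :: n = 100
  real(kind=8)    :: r_sample
  complex(kind=8) :: z_sample
  integer :: bytes_real, bytes_cmplx, total

  ! storage_size returns bits per element, so divide by 8 to get bytes.
  bytes_real  = n * (storage_size(r_sample) / 8)
  bytes_cmplx = n * (storage_size(z_sample) / 8)
  total = bytes_real + bytes_cmplx

  print *, 'real(8) array bytes:    ', bytes_real
  print *, 'complex(8) array bytes: ', bytes_cmplx
  print *, 'total bytes:            ', total
  print *, 'fits in constant memory:', total <= limit_bytes

  ! Largest single real(8) array that would fit on its own.
  print *, 'max real(8) elements:   ', limit_bytes / (storage_size(r_sample) / 8)
end program constant_budget
```

With 8-byte reals, a single array tops out at 8192 elements, so a handful of moderately sized arrays can already use up the whole budget.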