CUDA Fortran program works on GTX 1050 but fails on Tesla V100

Hi,

I’m experiencing an issue with my CUDA Fortran program. It runs fine on a GTX 1050, but fails on a Tesla V100. The program uses two kernel functions with the following thread configurations:

  • 31x64 threads for the first kernel
  • 64x64 threads for the second kernel

The error only occurs on the Tesla V100. I’ve attached the error message and a snippet of the code.

Could this issue be related to the Tesla V100’s hardware or resource limitations? Any suggestions would be appreciated!

Error message (debugger output):

^C
Thread 1 "1" received signal SIGINT, Interrupt.
[Switching focus to CUDA kernel 0, grid 2, block (7,0,0), thread (0,2,0), device 0, sm 14, warp 2, lane 16]
flux::reconstruction_x<<<(8,8,1),(8,8,1)>>> (
aadens=<error reading variable: Cannot access memory at address 0x5>,
aaxmom=<error reading variable: Cannot access memory at address 0x0>,
aaymom=<error reading variable: Cannot access memory at address 0x0>,
aaener=<error reading variable: Cannot access memory at address 0x0>,
aad=<error reading variable: Cannot access memory at address 0x0>,
aax=<error reading variable: Cannot access memory at address 0x0>,
aay=<error reading variable: Cannot access memory at address 0x0>,
aae=<error reading variable: Cannot access memory at address 0x0>)
at 3.for:155

Thanks!

It would probably be helpful if you posted the program and how you launch it.

I usually recommend that CUDA Fortran questions be posted on the nvfortran forum.

As here, a complete example is always a good idea when posting there.

Address 0 is never a legal device address starting point. My guess would be that you have device allocations that are failing for some reason. When you launch the kernel, and the kernel code attempts to access those allocations, you will have a machine fault.

I wouldn’t be able to explain why it works on GTX 1050 but not Tesla V100.

In any event, I would start with proper CUDA error checking, and run your code under compute-sanitizer.
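
As a rough illustration of what "proper CUDA error checking" could look like in CUDA Fortran, here is a minimal sketch. It is not the original poster's code: the module name check_mod, the helper check, the kernel scale, and the array a_d are all hypothetical stand-ins. The idea is to test the status of every device allocation, and to check for errors both right after the kernel launch and after a synchronize, so that a failure is reported where it happens rather than showing up later as an invalid address inside the kernel.

```fortran
module check_mod
  use cudafor
  implicit none
contains
  ! Hypothetical helper: stop with a readable message if a CUDA call failed.
  subroutine check(istat, what)
    integer, intent(in) :: istat
    character(*), intent(in) :: what
    if (istat /= cudaSuccess) then
      write(*,*) 'CUDA error at ', what, ': ', trim(cudaGetErrorString(istat))
      stop 1
    end if
  end subroutine check

  ! Hypothetical kernel standing in for the real one: doubles each element.
  attributes(global) subroutine scale(a, n)
    integer, value :: n
    real :: a(n)
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) a(i) = 2.0 * a(i)   ! guard against out-of-range threads
  end subroutine scale
end module check_mod

program main
  use cudafor
  use check_mod
  implicit none
  integer, parameter :: n = 4096
  real, device, allocatable :: a_d(:)
  real :: a_h(n)
  integer :: istat

  ! Check the allocation status instead of assuming it succeeded.
  allocate(a_d(n), stat=istat)
  if (istat /= 0) stop 'device allocation failed'

  a_d = 1.0
  call scale<<<(n + 255) / 256, 256>>>(a_d, n)
  call check(cudaGetLastError(), 'kernel launch')         ! launch-time errors
  call check(cudaDeviceSynchronize(), 'kernel execution') ! errors during the run

  a_h = a_d   ! copy the result back to the host
  write(*,*) 'max value: ', maxval(a_h)
  deallocate(a_d)
end program main
```

Assuming the file is saved as check.cuf (a hypothetical name), it can be built with something like `nvfortran -g check.cuf` and then run as `compute-sanitizer ./a.out`, which reports invalid device memory accesses together with the kernel in which they occurred.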