variables when walking back from __assert_fail are incorrect on Volta

cuda-gdb displays corrupted variable information when walking back from __assert_fail on the Volta architecture, specifically with a Quadro QV100 card. On all previous architectures I've tried, this works correctly. This is with CUDA 9.1. I don't have sufficient control of the system with this card to experiment with a newer CUDA version.

I'll provide an example here, but you can also download it from:

https://concurrentrt-my.sharepoint.com/:u:/g/personal/todd_allen_concurrent-rt_com/EcH0252hXyVBsAwH6tRrYaMBIE_TKmB-LzhxnT53zKnDMA

The source is pretty simple:

#include <stdio.h>
#include <unistd.h>
#include <time.h>
#include <signal.h>
#include <assert.h>

#define  blocksPerGrid    13
#define  threadsPerBlock  27
#define  N                (blocksPerGrid * threadsPerBlock)

// CUDA kernel
__global__ void Kernel(float*  myFormal)
{
   int i = blockDim.x * blockIdx.x + threadIdx.x;

   float  result = myFormal[i] - 1.0;
   assert(i != 261);
   myFormal[i] = result;
}

#define  HOST_WRITE  (cudaHostAllocPortable \
                      | cudaHostAllocWriteCombined)
#define  HOST_RW     (cudaHostAllocPortable)

// Host code
int main(int argc, char** argv)
{
   size_t  floats_size = N * sizeof(float);

   float*  h_args;
   float*  d_args;
   cudaHostAlloc((void**)&h_args, floats_size, HOST_WRITE);
   cudaMalloc((void**)&d_args, floats_size);

   for (int i = 0; i < N; i++) {
      h_args[i] = 3.0;
   }

   cudaMemcpy(d_args, h_args, floats_size, cudaMemcpyHostToDevice);

   Kernel<<<blocksPerGrid, threadsPerBlock>>>(d_args);
   cudaError_t err = cudaGetLastError();
   if (err != cudaSuccess) {
      fprintf(stderr,
              "kernel launch failure: %s (%d)\n",
              cudaGetErrorString(err), err);
      exit(1);
   }

   cudaThreadSynchronize();

   cudaMemcpy(h_args, d_args, floats_size, cudaMemcpyDeviceToHost);

   cudaFree(d_args);
   cudaFreeHost(h_args);

   cudaThreadExit();

   exit(0);
}
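The thread doesn't show the exact build command. Based on the later reply confirming -g/-G and the CUDA 9.1 toolkit, the reproducer was presumably built with something like this (the -arch value is an assumption; sm_70 is the Volta compute capability):

```shell
# Hypothetical build line for the reproducer above.
# -g/-G: host and device debug info (confirmed later in the thread)
# -arch=sm_70: assumed, since the failing card is a Volta part
/usr/local/cuda/bin/nvcc -g -G -arch=sm_70 -o assert assert.cu
```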

To reproduce this:

/usr/local/cuda/bin/cuda-gdb ./assert
r
# Expect "Assertion failed", and to be stopped in block 9, thread 18 in __assert_fail.
i frame
# This shows only R21 being saved by __assert_fail
up
# Expect to be stopped at the assert call (line 17 in my copy of the source).
p i
# This displays i = 0.  That obviously is incorrect because the assertion is (i != 261).
# Therefore, i must be 261.
i addr i
# This shows $R17
p $R17
# This also shows that $R17 = 0

My first guess is that the .debug_frame information for __assert_fail is incomplete: it doesn't describe where R17 is saved (if it is saved anywhere), nor does it mark R17 as DW_CFA_undefined if it isn't saved at all.

I discovered this originally with my company's NightView debugger, but then determined that it was a problem with the DWARF information, and I was able to reproduce it with cuda-gdb.

Thanks for this feedback.
We are currently looking into this.

Would it be possible for you to attach the executable built with the 9.1 toolkit (ideally for Linux/amd64)?
Also, what version of nvcc are you using, and was the application compiled with or without -g/-G?

I can’t find any way to attach a binary in these forums. But here’s a link that should get you to the file:

https://concurrentrt-my.sharepoint.com/:u:/g/personal/todd_allen_concurrent-rt_com/EffWb5hUx3VEhx_ktjFq6WwBVMYD9VkSoDi_28TXul2ZsA?e=YDb9HE

Unfortunately, I've lost access to the card that demonstrated the problem. (It was on loan.) So I can't confirm that the executable I've given you still exhibits it. But it was built the same way as the one from 3 weeks ago, so it should.

In response to your second question: yes, the executable was built with both -g and -G. The CUDA DWARF information was present, just incorrect. And nvcc reports:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

Tracking this issue with a new email address.