Hi
I have observed that I cannot use cuda-gdb and/or cuobjdump -sass with some of my CUDA programs.
I am using CUDA toolkit V9.2.88.
In both cases, the nvdisasm process runs at 100% utilization of one CPU core and uses up all available memory (32 GB).
(The command line is "nvdisasm /tmp/sometmpfile -b SM61 -hex".)
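To keep the machine usable while reproducing this, the disassembly can be run under a time limit and a memory cap. The following is only a minimal sketch, assuming a bash-like shell whose ulimit -v takes kilobytes and the coreutils timeout command; run_capped is a hypothetical helper name and the limits shown are arbitrary examples. It does not fix the hang, it only stops the runaway process from exhausting RAM.

```shell
# Hypothetical helper: run a command with a wall-clock limit and a
# virtual-memory cap, so a runaway child process cannot eat all RAM.
# Usage: run_capped SECONDS MAX_KB COMMAND [ARGS...]
run_capped() {
    local secs="$1" max_kb="$2"
    shift 2
    (
        # The cap applies to this subshell and is inherited by every
        # process started from it (cuobjdump and the nvdisasm it spawns).
        ulimit -v "$max_kb" || exit 125
        # timeout exits with status 124 if the time limit is reached.
        exec timeout "$secs" "$@"
    )
}

# Example (assumed limits): at most 60 s and about 4 GB
# run_capped 60 4194304 cuobjdump -sass main
```

An exit status of 124 means the time limit was hit. If the memory cap is reached first, the capped process should fail with an allocation error instead of filling the 32 GB.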
For example, a simple program which I cannot use cuobjdump -sass on is the dynamic parallelism example from the other post:
#include <stdio.h>

__global__
void kernel2(int n){
    printf("thread %d in kernel2 called by thread %d from kernel1\n", threadIdx.x, n);
}

__global__
void kernel1(){
    kernel2<<<1,4>>>(threadIdx.x);
}

int main(){
    kernel1<<<1,2>>>();
    cudaDeviceSynchronize();
    cudaDeviceReset();
}
Compiled with nvcc -arch=sm_61 -g -G -rdc=true -o main main.cu
If I run cuobjdump -sass main, I only get the following output:
Fatbin elf code:
arch = sm_61
code version = [1,7]
producer =
host = linux
compile_size = 64bit
has debug info
compressed
identifier = main.cu
code for sm_61
Function : cudaDeviceGetAttribute
.headerflags @"EF_CUDA_SM61 EF_CUDA_PTX_SM(EF_CUDA_SM61)"
Then the lock-up occurs.
Is this a bug? Does someone else observe the same behavior?
In a more complex example, I was able to use cuobjdump after compiling with -G; without -G, the lock-up occurs.