SEGV during CUDA graph launch

chavan.harshada · March 20, 2020, 8:11pm

Hello,

I have been hitting the following SEGV while launching a CUDA graph:

#0 0x00007f333ee7e4e0 in cuMemGetAttribute_v2 () from /usr/lib64/libcuda.so.1
#1 0x00007f333ef381de in cuEGLApiInit () from /usr/lib64/libcuda.so.1
#2 0x00007f333f00e117 in cuVDPAUCtxCreate () from /usr/lib64/libcuda.so.1
#3 0x00007f333ef397f7 in cuEGLApiInit () from /usr/lib64/libcuda.so.1
#4 0x00007f333ef2b6a8 in cuEGLApiInit () from /usr/lib64/libcuda.so.1
#5 0x00007f333ef2b83b in cuEGLApiInit () from /usr/lib64/libcuda.so.1
#6 0x00007f333ee4589a in ?? () from /usr/lib64/libcuda.so.1
#7 0x00007f333ef8dddf in cuGraphLaunch_ptsz () from /usr/lib64/libcuda.so.1
#8 0x00007f333fe71362 in cudart::cudaApiGraphLaunchCommon(CUgraphExec_st*, CUstream_st*, bool) () from gpu.so
#9 0x00007f333fec4966 in cudaGraphLaunch_ptsz () from gpu.so
#10 0x00007f333fe63289 in cuda_graph_run (graph=0x7f33380d9a30)

There are many nodes in the graph; however, the part that’s causing this issue is the following sequence:

cuda_add_memcpy_node(G, A1); // G: graph, A1: memcpy node
cuda_add_kernel_node(G, A); // A: kernel node
cuda_graph_dependency(G, A1, A);
cuda_graph_instantiate(G); // calls cudaGraphInstantiate and returns cudaSuccess
cuda_graph_run(G); // calls cudaGraphLaunch and hits SEGV
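
For reference, here is roughly how that sequence maps onto the CUDA runtime graph API. This is only a minimal sketch, not our actual code, and I have not verified that it reproduces the SEGV: the kernel `scale`, the buffer size, the `CHECK` macro and the function name `run_graph` are all placeholders I made up for illustration. The five-argument `cudaGraphInstantiate` call is the CUDA 10/11 form; as far as I know, newer toolkits take a flags argument instead.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Placeholder kernel standing in for node A (not the real kernel).
__global__ void scale(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Hypothetical helper: report the failing call and bail out.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            return 1;                                                 \
        }                                                             \
    } while (0)

int run_graph() {
    int n = 1024;                      // assumed size, for illustration
    size_t bytes = n * sizeof(float);
    std::vector<float> host(n, 1.0f);
    float* dev = nullptr;
    CHECK(cudaMalloc(&dev, bytes));

    cudaGraph_t graph;
    CHECK(cudaGraphCreate(&graph, 0));

    // A1: host-to-device memcpy node, added with no dependencies.
    cudaMemcpy3DParms copy = {};
    copy.srcPtr = make_cudaPitchedPtr(host.data(), bytes, n, 1);
    copy.dstPtr = make_cudaPitchedPtr(dev, bytes, n, 1);
    copy.extent = make_cudaExtent(bytes, 1, 1);
    copy.kind   = cudaMemcpyHostToDevice;
    cudaGraphNode_t a1;
    CHECK(cudaGraphAddMemcpyNode(&a1, graph, nullptr, 0, &copy));

    // A: kernel node, also added with no dependencies.
    void* args[] = { &dev, &n };
    cudaKernelNodeParams kp = {};
    kp.func         = (void*)scale;
    kp.gridDim      = dim3((n + 255) / 256);
    kp.blockDim     = dim3(256);
    kp.kernelParams = args;
    cudaGraphNode_t a;
    CHECK(cudaGraphAddKernelNode(&a, graph, nullptr, 0, &kp));

    // The A1 -> A edge that triggers the problem in our case.
    CHECK(cudaGraphAddDependencies(graph, &a1, &a, 1));

    // CUDA 10/11 signature (error node and log buffer unused here).
    cudaGraphExec_t exec;
    CHECK(cudaGraphInstantiate(&exec, graph, nullptr, nullptr, 0));

    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));
    CHECK(cudaGraphLaunch(exec, stream));
    CHECK(cudaStreamSynchronize(stream));

    // Copy back so the result of the copy + kernel can be inspected.
    CHECK(cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost));
    printf("host[0] after graph launch: %f\n", host[0]);

    CHECK(cudaGraphExecDestroy(exec));
    CHECK(cudaGraphDestroy(graph));
    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(dev));
    return 0;
}

int main() { return run_graph(); }
```

In this sketch the edge is added after both nodes exist, which matches the order of our wrapper calls. The same edge could instead be expressed by passing `&a1` as the dependency list when adding the kernel node.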

The SEGV is hit only when the A1->A dependency is specified, and it looks like it happens on the host side (i.e. before any of the kernels are launched). The earlier call to cudaGraphInstantiate() returns cudaSuccess. If we do not specify the A1->A dependency, the graph launch completes and the kernels run successfully. Have you seen a similar issue before, and do you know what could be causing it? In general, is there any reason a dependency cannot be added to a kernel node?

Thanks,
Harshada