Hello,
I am testing my application, which calculates the Strongly Connected Components of several graphs in parallel. Each graph has its own stream, which is used to offload the calculation and retrieve the result. With a few small graphs everything works, but when I increase the number and size of the graphs, I get an error with the following backtrace after several minutes of computation:
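For reference, a minimal sketch of the one-stream-per-graph pattern described above might look like the following. It is not the actual application: `process_graph` is a hypothetical stand-in for the SCC kernel, and `GraphJob`, `run_graph_jobs` and `CUDA_CHECK` are illustrative names. It assumes pinned host buffers (`cudaMallocHost`), which `cudaMemcpyAsync` needs in order to run asynchronously, and it frees every buffer only after `cudaStreamSynchronize` on the stream that uses it.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with file/line information if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                 \
  do {                                                                   \
    cudaError_t err_ = (call);                                           \
    if (err_ != cudaSuccess) {                                           \
      std::fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__, __LINE__, \
                   #call, cudaGetErrorString(err_));                     \
      std::exit(EXIT_FAILURE);                                           \
    }                                                                    \
  } while (0)

// Hypothetical stand-in for the SCC kernel: out[i] = in[i] + 1.
__global__ void process_graph(const int* in, int* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = in[i] + 1;
}

// Hypothetical per-graph state: one stream plus the buffers it owns.
struct GraphJob {
  int n = 0;
  cudaStream_t stream = nullptr;
  int* h_in = nullptr;   // pinned host input
  int* h_out = nullptr;  // pinned host output
  int* d_in = nullptr;   // device input
  int* d_out = nullptr;  // device output
};

// Enqueues num_graphs jobs, each on its own stream, then waits for them and
// returns how many output values differ from the expected result.
int run_graph_jobs(int num_graphs) {
  std::vector<GraphJob> jobs(num_graphs);  // never resized below

  for (int g = 0; g < num_graphs; ++g) {
    GraphJob& j = jobs[g];
    j.n = 1024 * (g + 1);
    size_t bytes = j.n * sizeof(int);
    CUDA_CHECK(cudaStreamCreate(&j.stream));
    CUDA_CHECK(cudaMallocHost(&j.h_in, bytes));
    CUDA_CHECK(cudaMallocHost(&j.h_out, bytes));
    CUDA_CHECK(cudaMalloc(&j.d_in, bytes));
    CUDA_CHECK(cudaMalloc(&j.d_out, bytes));
    for (int i = 0; i < j.n; ++i) j.h_in[i] = i;

    // Copy in, compute, copy out: all ordered on this graph's stream.
    CUDA_CHECK(cudaMemcpyAsync(j.d_in, j.h_in, bytes,
                               cudaMemcpyHostToDevice, j.stream));
    int threads = 256;
    int blocks = (j.n + threads - 1) / threads;
    process_graph<<<blocks, threads, 0, j.stream>>>(j.d_in, j.d_out, j.n);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaMemcpyAsync(j.h_out, j.d_out, bytes,
                               cudaMemcpyDeviceToHost, j.stream));
  }

  int mismatches = 0;
  for (GraphJob& j : jobs) {
    // Buffers must stay valid until their stream has finished with them.
    CUDA_CHECK(cudaStreamSynchronize(j.stream));
    for (int i = 0; i < j.n; ++i)
      if (j.h_out[i] != i + 1) ++mismatches;
    CUDA_CHECK(cudaFree(j.d_in));
    CUDA_CHECK(cudaFree(j.d_out));
    CUDA_CHECK(cudaFreeHost(j.h_in));
    CUDA_CHECK(cudaFreeHost(j.h_out));
    CUDA_CHECK(cudaStreamDestroy(j.stream));
  }
  return mismatches;
}
```

Keeping each graph's stream and buffers together, and releasing them only after that stream has been synchronized, makes it harder to free or reuse a host buffer while an asynchronous copy may still be reading from or writing to it.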
#0 0x00007ffff631b3ea in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#1 0x00007ffff628512b in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#2 0x00007ffff6285498 in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007ffff65103d7 in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007ffff65108a2 in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#5 0x00007ffff62b767e in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#6 0x00007ffff651548d in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#7 0x00007ffff626bbf0 in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#8 0x00007ffff626c3c4 in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#9 0x00007ffff626e019 in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#10 0x00007ffff62dd959 in ?? () from /lib/x86_64-linux-gnu/libcuda.so.1
#11 0x000055555569d819 in __cudart601 ()
#12 0x00005555556709cd in __cudart738 ()
#13 0x00005555556c1d92 in cudaMemcpyAsync ()
The memory pointers involved in the cudaMemcpyAsync call seem OK. If I process the graphs sequentially, everything seems to work fine. Memcheck does not help, since it forces every call to be synchronous, which makes the whole process sequential.
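One way to gather information without forcing synchronization could be to wrap every `cudaMemcpyAsync` in a function that logs its arguments first, so the last line printed before the crash identifies the failing call. The sketch below is only an illustration: `logged_memcpy_async` and the `graph_id` parameter are hypothetical names, and the pinned/unregistered classification assumes CUDA 11 or later, where `cudaPointerGetAttributes` reports ordinary pageable host memory as `cudaMemoryTypeUnregistered` instead of returning an error.

```cuda
#include <cstdio>
#include <mutex>
#include <cuda_runtime.h>

// Keeps log lines from different host threads from interleaving.
static std::mutex g_log_mutex;

// Hypothetical wrapper: prints the arguments of each async copy (and what kind
// of memory the host-side pointer is) before issuing it, flushing stderr so
// the line is visible even if the process crashes inside the call.
cudaError_t logged_memcpy_async(int graph_id, void* dst, const void* src,
                                size_t bytes, cudaMemcpyKind kind,
                                cudaStream_t stream) {
  const char* host_kind = "n/a";
  if (kind == cudaMemcpyHostToDevice || kind == cudaMemcpyDeviceToHost) {
    const void* host_ptr = (kind == cudaMemcpyHostToDevice) ? src : dst;
    cudaPointerAttributes attr;
    if (cudaPointerGetAttributes(&attr, host_ptr) == cudaSuccess) {
      switch (attr.type) {
        case cudaMemoryTypeHost:         host_kind = "pinned"; break;
        case cudaMemoryTypeDevice:       host_kind = "device(!)"; break;
        case cudaMemoryTypeManaged:      host_kind = "managed"; break;
        case cudaMemoryTypeUnregistered: host_kind = "pageable"; break;
        default:                         host_kind = "unknown"; break;
      }
    } else {
      cudaGetLastError();  // clear the error older runtimes report here
      host_kind = "query failed";
    }
  }
  {
    std::lock_guard<std::mutex> lock(g_log_mutex);
    std::fprintf(stderr,
                 "[graph %d] cudaMemcpyAsync dst=%p src=%p bytes=%zu "
                 "kind=%d host=%s stream=%p\n",
                 graph_id, dst, src, bytes, static_cast<int>(kind), host_kind,
                 static_cast<void*>(stream));
    std::fflush(stderr);
  }
  return cudaMemcpyAsync(dst, src, bytes, kind, stream);
}
```

The logging adds host-side overhead but does not wait on any stream, so the graphs can still be processed concurrently.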
Do you have any suggestions on how to debug this error?