Unified Memory: Signal 139 when profiling with CUDA 10.1

Hi all,

After upgrading to CUDA Toolkit 10.1 (and the 418 driver it includes), we are having issues with profiling. At first we hit the “no permissions” issue, which we fixed with the modprobe.d configuration change.

Now we get signal 139 whenever we profile an application that uses unified memory:
https://imgur.com/a/Xjdf7Ok
As you can see, it happens with or without root access.
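The post does not say so explicitly, but “signal 139” is most likely a shell-style exit status. Shells report a process killed by signal N as 128 + N, so 139 would be 128 + 11, which is SIGSEGV (a segmentation fault). The small Python helper below is a sketch for decoding such statuses; the function name `describe_status` is hypothetical. It also handles Python's `subprocess` convention, where a child killed by signal N has return code -N.

```python
import signal


def describe_status(status: int) -> str:
    """Explain a process exit status in terms of the signal that killed it, if any.

    Handles two conventions:
      * shells report death by signal N as 128 + N (so 139 -> signal 11);
      * Python's subprocess reports it as a negative return code, -N.
    """
    if status < 0:
        signum = -status
    elif status > 128:
        signum = status - 128
    else:
        return f"exited with code {status}"
    try:
        name = signal.Signals(signum).name
    except ValueError:
        name = "unknown signal"
    return f"terminated by signal {signum} ({name})"


print(describe_status(139))  # terminated by signal 11 (SIGSEGV)
```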
It works if we disable unified memory profiling:
https://imgur.com/a/nvdAju3
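The post does not name the profiler. Assuming it is nvprof from CUDA 10.x, unified memory profiling can be turned off with nvprof's `--unified-memory-profiling off` option (the default mode is `per-process-device`). The sketch below runs the same application under both settings so the two exit statuses can be compared side by side. `./um_app` is a hypothetical placeholder for any binary that uses unified memory. This sketch does not confirm whether nvprof's own exit status reflects the application's crash.

```python
import shutil
import subprocess
import sys


def nvprof_cmd(app, um_profiling=True):
    """Build an nvprof command line for `app` (a list of program arguments).

    Assumes nvprof (CUDA 10.x) is the profiler. With um_profiling=False, the
    `--unified-memory-profiling off` option is added to disable UM profiling.
    """
    cmd = ["nvprof"]
    if not um_profiling:
        cmd += ["--unified-memory-profiling", "off"]
    return cmd + list(app)


def main(argv):
    # "./um_app" is a hypothetical binary that allocates with cudaMallocManaged.
    app = argv or ["./um_app"]
    if shutil.which("nvprof") is None:
        print("nvprof not found on PATH; skipping the comparison")
        return 1
    for um_profiling in (True, False):
        cmd = nvprof_cmd(app, um_profiling)
        # A negative returncode means nvprof itself was killed by that signal.
        code = subprocess.run(cmd).returncode
        print(f"{' '.join(cmd)} -> exit status {code}")
    return 0


if __name__ == "__main__":
    main(sys.argv[1:])
```

Usage: `python compare_um.py ./my_um_app`, where the script name is arbitrary.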
It also works if we do not use unified memory at all:
https://imgur.com/a/csTIS5X

As the following screenshot shows, our runtime, configuration, and driver versions all match:
https://imgur.com/a/papoRnR

We also tried the 430 driver, without success.

Full disclosure: I had to patch the 418 driver to get it to work with our kernel (5.1.5 at first, now 5.1.8). The 430 driver needed no patching. The patches were not functional changes, only adjustments to some function signatures (e.g. changing int to unsigned int).

You can find the contents of the patch here:
https://gist.github.com/tallendev/bdd3965313f01df2f48b2ade709e4931

I realize I may not be able to get help, since the driver does not officially support our kernel. However, I don’t think the patches matter much. This looks like a deeper issue, possibly the return of an old bug from CUDA ~7/8, but I’m not sure. The last version where profiling worked for us was CUDA 9.2.

Any suggestions would be appreciated. Perhaps this would be better filed as a bug report.
Thanks.

Hi, I am having the exact same issue with the same CUDA 10.1 toolkit. Were you able to resolve it? Thanks in advance for any reply.

Hi StereoGraphics,

Could you give the CUDA 10.2 toolkit a try? If you can wait, it would be better to use CUDA 11, which will be available soon.

If the issue still occurs, more details would help us investigate it on our end: specifically, which GPU you are using, and a minimal reproducer.