After upgrading to CUDA toolkit 10.1 (and the bundled 418 driver), we are having issues profiling. Originally we hit the “no permissions” issue, which we fixed with the modprobe.d configuration change.
Now we get signal 139 (a segfault) whenever we profile an application that uses unified memory.
As you can see, it happens with or without root access.
It works if we disable unified memory profiling:
It also works if we do not use unified memory at all:
You can see that our runtime, configuration, and driver versions all match:
We also tried the 430 driver, without success.
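For anyone who wants to try reproducing this without our code, here is a minimal sketch of a unified memory program. It is hypothetical (the file name um_repro.cu and the function run_um_repro are made up for illustration) and is not one of the applications we actually profiled. Since every unified memory application we tried crashes under the profiler, we would expect something this small to hit the same signal 139 crash, but that is an assumption.

```cuda
// um_repro.cu: hypothetical minimal unified memory program (not our application).
#include <cstdio>
#include <cuda_runtime.h>

// Each thread adds 1.0 to one element of the managed array.
__global__ void increment(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] += 1.0f;
    }
}

// Allocates managed memory, touches it from host and device, and checks the result.
// Returns 0 on success, 1 on a CUDA error, 2 if any value is wrong.
int run_um_repro(int n) {
    float *data = nullptr;
    cudaError_t err = cudaMallocManaged(&data, n * sizeof(float));
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMallocManaged: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Host writes first, so the pages are populated on the CPU side.
    for (int i = 0; i < n; ++i) data[i] = static_cast<float>(i);

    // Device access forces the managed pages to migrate to the GPU.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    increment<<<blocks, threads>>>(data, n);
    err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "kernel: %s\n", cudaGetErrorString(err));
        cudaFree(data);
        return 1;
    }

    // Host reads after synchronizing, so the pages migrate back to the CPU.
    int bad = 0;
    for (int i = 0; i < n; ++i) {
        if (data[i] != static_cast<float>(i) + 1.0f) ++bad;
    }
    cudaFree(data);
    return bad == 0 ? 0 : 2;
}

int main() {
    int status = run_um_repro(1 << 20);
    std::puts(status == 0 ? "ok" : "failed");
    return status;
}
```

Build it with nvcc -o um_repro um_repro.cu, then compare nvprof ./um_repro against nvprof --unified-memory-profiling off ./um_repro. If the first crashes and the second does not, that would match what we see with our own applications.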
Full disclosure: I had to patch driver version 418 to get it working with our 5.1.5 (and now 5.1.8) kernel; 430 needed no patching. The patches were not functional changes, just adjustments to some function signatures (e.g. changing int to unsigned int).
You can find the contents of the patch here:
I realize I may not be able to get help, since our kernel version is not one the driver officially supports. However, I don’t think the patches matter much. This looks like a deeper issue, possibly the return of an old bug from around CUDA 7/8, but I’m not sure. The last version where this worked for us was CUDA 9.2.
Any suggestions would be appreciated. Maybe this would be better filed as a bug report.