I’m trying to debug a program remotely using the Nsight debugger on my host PC (Ubuntu 16.04), but it cannot step over any CUDA function. Instead, it shows the following fatal error:
The CUDA driver initialization failed. (error code = CUDBG_ERROR_INITIALIZATION_FAILURE(0x14))
However, I can step over calls into the other libraries I use, such as OpenCV.
I set everything up with SDK Manager (JetPack 4.4 DP). Is there anything else I need to do to solve this problem?
To rule out a driver mismatch, can you run your application, or any other CUDA app, outside the debugger and see if that works?
What is your remote target?
My remote target is a Jetson Xavier NX module. I have run a CUDA sample (oceanFFT) and my own application locally on the NX module, and both work without any errors.
I also checked the CUDA version on both the host PC and the NX by running nvcc --version; both report the same version, V10.2.89.
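A note on that version check: nvcc --version reports the toolkit (compiler) version, so matching output on host and target rules out a toolkit mismatch but says nothing directly about the driver on the target (the deviceQuery sample prints the driver and runtime versions separately). If you want to compare the two outputs mechanically, here is a minimal sketch that parses saved nvcc output from each machine. The helper names nvcc_version and same_toolkit are hypothetical, and the regex assumes the usual "Cuda compilation tools, release X.Y, VX.Y.Z" line.

```python
import re
from typing import Optional

# Assumed format of the last line of `nvcc --version`, e.g.
# "Cuda compilation tools, release 10.2, V10.2.89"
_NVCC_RE = re.compile(r"release\s+\d+\.\d+,\s+V(\d+(?:\.\d+)+)")


def nvcc_version(output: str) -> Optional[str]:
    """Return the full toolkit version (e.g. "10.2.89") from nvcc output, or None."""
    match = _NVCC_RE.search(output)
    return match.group(1) if match else None


def same_toolkit(host_output: str, target_output: str) -> bool:
    """True only when both outputs parse and report the same toolkit version."""
    host, target = nvcc_version(host_output), nvcc_version(target_output)
    return host is not None and host == target


if __name__ == "__main__":
    sample = (
        "nvcc: NVIDIA (R) Cuda compiler driver\n"
        "Cuda compilation tools, release 10.2, V10.2.89\n"
    )
    print(nvcc_version(sample))
    print(same_toolkit(sample, sample))
```

Returning None for unparseable output (rather than raising) keeps same_toolkit from reporting a false match when one machine's output was empty or truncated.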
Is it possible that debugging on a production module is not supported?
In addition, the profiler does not show any timeline, so apart from simply running applications, I cannot use either the debugger or the profiler.
Another error I found:
When I checked the CUDA toolkit path with the [Browse…] button, an error appeared below the cuda-gdb icon saying that an operation failed.
From the screenshot above, I can see that you have CUDA software preemption enabled. That is a beta feature that is not needed on the NX.
Can you try the following:
- Disable CUDA software preemption
- Run the following command on the target before debugging:
sudo chmod a+rw /dev/nvhost-dbg-gpu /dev/nvgpu-pci/*
Hi neel, thank you for your reply!
I tried running
sudo chmod a+rw /dev/nvhost-dbg-gpu /dev/nvgpu-pci/*
on the NX, but it prints this message:
cannot access ‘/dev/nvgpu-pci/*’: No such file or directory.
I then checked, and there is no nvgpu-pci directory in /dev. Does that mean I missed something during installation?
It’s OK, you didn’t miss anything. That device path exists on Drive, which has both an iGPU and a dGPU, whereas Jetson has only an iGPU.
Can you please try running
sudo chmod a+rw /dev/nvhost-dbg-gpu
after disabling CUDA software preemption?
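The earlier failure comes from the shell: when a pattern such as /dev/nvgpu-pci/* matches nothing, bash (with its default settings) passes the literal text to chmod, which then reports "No such file or directory". Below is a minimal sketch of a pre-debug check that tolerates that case: it expands each pattern, skips the ones that match nothing on this board, and only suggests chmod for nodes the current user cannot already read and write. The function name check_debug_nodes is hypothetical; the two paths are the ones from this thread. It prints the sudo command rather than running it.

```python
import glob
import os
from typing import Dict, List, Sequence

# Device nodes mentioned in this thread. /dev/nvgpu-pci/* only exists on
# platforms with a dGPU (e.g. Drive); on Jetson it matches nothing.
DEFAULT_PATTERNS = ["/dev/nvhost-dbg-gpu", "/dev/nvgpu-pci/*"]


def check_debug_nodes(patterns: Sequence[str]) -> Dict[str, List[str]]:
    """Sort expanded patterns into nodes already readable/writable by the
    current user, nodes that still need chmod, and patterns matching nothing."""
    report: Dict[str, List[str]] = {"ok": [], "needs_chmod": [], "missing": []}
    for pattern in patterns:
        # glob.glob returns [] for a pattern with no match, instead of
        # handing the literal pattern on the way bash does.
        matches = sorted(glob.glob(pattern))
        if not matches:
            report["missing"].append(pattern)
            continue
        for path in matches:
            if os.access(path, os.R_OK | os.W_OK):
                report["ok"].append(path)
            else:
                report["needs_chmod"].append(path)
    return report


if __name__ == "__main__":
    result = check_debug_nodes(DEFAULT_PATTERNS)
    for pattern in result["missing"]:
        print(f"skipping {pattern}: no such node on this board")
    for path in result["ok"]:
        print(f"{path}: already readable/writable")
    if result["needs_chmod"]:
        # Only print the command; nothing is changed without the user running it.
        print("sudo chmod a+rw " + " ".join(result["needs_chmod"]))
```

Note that os.access always succeeds for root, so run the check as the same (non-root) user that the debugger session uses on the target.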
It works! After running
sudo chmod a+rw /dev/nvhost-dbg-gpu
I can now debug both my CUDA application and the CUDA samples (I tested with oceanFFT) without any problem. It solved the "CUDA driver initialization failed" error. Pleased beyond words!
However, I now have another problem with the profiler. When I profile my CUDA app, nothing appears in the perspective window, and the console shows a warning: ERR_NVGPUCTRPERM - The user does not have permission to profile on the target device.