Hello NVIDIA Community,
I’m encountering a CUDA initialization issue on an AWS EC2 g4dn.12xlarge instance with NVIDIA driver 560.x and CUDA 12.6, and I would greatly appreciate any help resolving it.
System Configuration:
- Instance Type: AWS g4dn.12xlarge
- Operating System: Ubuntu 24.04
- Architecture: x86_64
- NVIDIA Driver Version: 560.35.03
- CUDA Version: 12.6.77
- GPUs: 4 (NVIDIA T4)
Problem Description:
After installing the NVIDIA driver and CUDA 12.6 using the runfile method, I ran the deviceQuery sample from the CUDA samples to verify that everything is set up correctly. Unfortunately, it fails with the following error:
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 3
-> initialization error
Result = FAIL
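A minimal sketch for narrowing this down, assuming Python 3 on the instance: it calls the CUDA driver API directly (cuInit, cuGetErrorName, cuDeviceGetCount) through ctypes, bypassing the CUDA runtime that deviceQuery uses. If cuInit already fails here, the problem sits in the driver/kernel layer rather than in the toolkit. The library name libcuda.so.1 is the usual Linux soname but is an assumption, and the function name probe_cuda_driver is hypothetical. Note that the runtime's code 3 (cudaErrorInitializationError) comes from a different enum than the driver API's codes, which is why the driver's own name for the code is looked up.

```python
import ctypes


def probe_cuda_driver(libname="libcuda.so.1"):
    """Hypothetical helper: call cuInit/cuDeviceGetCount via the driver API.

    Returns a dict describing the outcome; never raises if the library is missing.
    """
    result = {"loaded": False, "cuInit": None, "error_name": None, "device_count": None}
    try:
        lib = ctypes.CDLL(libname)  # assumption: driver library soname on Linux
    except OSError:
        return result  # driver library not found on the dynamic loader path
    result["loaded"] = True

    # CUresult cuInit(unsigned int Flags); Flags must be 0.
    lib.cuInit.argtypes = [ctypes.c_uint]
    lib.cuInit.restype = ctypes.c_int
    rc = lib.cuInit(0)
    result["cuInit"] = rc

    # CUresult cuGetErrorName(CUresult error, const char **pStr)
    name = ctypes.c_char_p()
    lib.cuGetErrorName.argtypes = [ctypes.c_int, ctypes.POINTER(ctypes.c_char_p)]
    lib.cuGetErrorName.restype = ctypes.c_int
    if lib.cuGetErrorName(rc, ctypes.byref(name)) == 0 and name.value:
        result["error_name"] = name.value.decode()

    # Only query devices if initialization succeeded (0 == CUDA_SUCCESS).
    if rc == 0:
        count = ctypes.c_int()
        lib.cuDeviceGetCount.argtypes = [ctypes.POINTER(ctypes.c_int)]
        lib.cuDeviceGetCount.restype = ctypes.c_int
        if lib.cuDeviceGetCount(ctypes.byref(count)) == 0:
            result["device_count"] = count.value
    return result


if __name__ == "__main__":
    print(probe_cuda_driver())
```

On a healthy setup one would expect cuInit to return 0 and device_count to match the number of GPUs shown by nvidia-smi.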
Troubleshooting Steps I’ve Tried:
- Followed the Post-installation Actions in the CUDA Toolkit documentation.
- Ran echo $PATH and echo $LD_LIBRARY_PATH to make sure both include the CUDA paths.
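The same check can be scripted so it does not depend on eyeballing echo output. This is a sketch that assumes the runfile's default prefix /usr/local/cuda-12.6 (adjust cuda_home if a different prefix was chosen); check_cuda_paths is a hypothetical name. Paths are resolved with os.path.realpath, so an entry pointing at the /usr/local/cuda symlink still matches. Since deviceQuery reports "CUDART static linking", the runtime is built into the binary, so LD_LIBRARY_PATH mainly matters for dynamically linked programs.

```python
import os


def check_cuda_paths(env=None, cuda_home="/usr/local/cuda-12.6"):
    """Hypothetical helper: does PATH / LD_LIBRARY_PATH contain the toolkit dirs?

    cuda_home is an assumption (runfile default prefix); pass your own if needed.
    """
    env = os.environ if env is None else env
    wanted = {
        "PATH": os.path.join(cuda_home, "bin"),
        "LD_LIBRARY_PATH": os.path.join(cuda_home, "lib64"),
    }
    report = {}
    for var, directory in wanted.items():
        # Split on ':' and resolve symlinks/trailing slashes on both sides.
        entries = {os.path.realpath(p) for p in env.get(var, "").split(os.pathsep) if p}
        report[var] = os.path.realpath(directory) in entries
    return report


if __name__ == "__main__":
    print(check_cuda_paths())
```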
- Verified the NVIDIA driver installation:
- Ran nvidia-smi and nvcc -V to confirm that the driver and toolkit are installed and that the GPUs are recognized. Everything appears normal in the output.
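nvidia-smi talks to the driver through NVML and can succeed even when the CUDA runtime cannot initialize; one commonly reported cause is that the nvidia_uvm kernel module is not loaded or the /dev/nvidia-uvm device node is missing. The sketch below only checks for those pieces on a standard Linux driver install; the expected module and node names are assumptions, and check_nvidia_kernel_side is a hypothetical name.

```python
import os

# Assumed names on a standard Linux driver install.
EXPECTED_MODULES = ("nvidia", "nvidia_uvm")   # as listed in /proc/modules
EXPECTED_NODES = ("nvidiactl", "nvidia-uvm")  # device files under /dev


def check_nvidia_kernel_side(dev_dir="/dev", modules_path="/proc/modules"):
    """Hypothetical helper: report loaded NVIDIA modules and present device nodes."""
    try:
        with open(modules_path) as fh:
            # First whitespace-separated field of each line is the module name.
            loaded = {line.split()[0] for line in fh if line.strip()}
    except OSError:
        loaded = set()
    modules = {name: name in loaded for name in EXPECTED_MODULES}
    nodes = {name: os.path.exists(os.path.join(dev_dir, name)) for name in EXPECTED_NODES}

    # One /dev/nvidiaN node per GPU (nvidia0, nvidia1, ...).
    try:
        names = os.listdir(dev_dir)
    except OSError:
        names = []
    gpu_nodes = sorted(n for n in names if n.startswith("nvidia") and n[6:].isdigit())
    return {"modules": modules, "nodes": nodes, "gpu_nodes": gpu_nodes}


if __name__ == "__main__":
    print(check_nvidia_kernel_side())
```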
- Checked for Compatibility:
- Confirmed that CUDA 12.6 is compatible with NVIDIA driver 560.35.03.
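The compatibility check can be made explicit by comparing dotted driver versions numerically. This is a sketch: the driver version is read with nvidia-smi --query-gpu=driver_version --format=csv,noheader (one line per GPU), and the minimum is a parameter. 525.60.13 is the Linux minimum for CUDA 12.x minor-version compatibility; the exact minimum for a specific 12.6 release should be taken from the CUDA Toolkit release notes. Function names are hypothetical.

```python
import shutil
import subprocess


def parse_driver_version(text):
    """Turn '560.35.03' into (560, 35, 3) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_at_least(installed, minimum):
    """True if the installed driver version is >= the required minimum."""
    return parse_driver_version(installed) >= parse_driver_version(minimum)


def installed_driver_versions():
    """Driver version reported per GPU by nvidia-smi, or [] if unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []
    out = subprocess.run(
        [exe, "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True,
    )
    if out.returncode != 0:
        return []
    return [line.strip() for line in out.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    for v in installed_driver_versions():
        print(v, driver_at_least(v, "525.60.13"))
```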
Request for Assistance:
Despite these efforts, the issue persists. I would greatly appreciate any insights or suggestions on how to resolve it.
- Are there specific logs or diagnostic steps that I should check to identify why CUDA is failing to initialize?
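As a starting point for the logs question, kernel messages from the NVIDIA driver (prefixed NVRM) often explain initialization failures. The sketch below just filters dmesg output for NVIDIA-related lines; reading dmesg may require root on Ubuntu, and the helper names are hypothetical. The driver package also ships nvidia-bug-report.sh, which collects a much fuller diagnostic bundle.

```python
import re
import subprocess

# Lines from the NVIDIA kernel driver usually carry "NVRM" or mention nvidia/nvidia-uvm.
PATTERN = re.compile(r"NVRM|nvidia", re.IGNORECASE)


def nvidia_kernel_messages(log_text):
    """Return only the log lines that look NVIDIA-related."""
    return [line for line in log_text.splitlines() if PATTERN.search(line)]


def read_kernel_log():
    """Return dmesg output, or '' if it cannot be read (it may need root)."""
    try:
        return subprocess.run(["dmesg"], capture_output=True, text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError):
        return ""


if __name__ == "__main__":
    for line in nvidia_kernel_messages(read_kernel_log()):
        print(line)
```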
Thank you in advance for your help!