CUDA Initialization Issue: cudaGetDeviceCount returned 3 on g4dn.12xlarge with NVIDIA Driver 560.x and CUDA 12.6

Hello NVIDIA Community,

I’m encountering a CUDA initialization issue on my EC2 g4dn.12xlarge instance with NVIDIA Driver 560.x and CUDA 12.6, and would greatly appreciate any assistance in resolving it.

System Configuration:

  • Server Model: AWS g4dn.12xlarge
  • Operating System: Ubuntu 24.04
  • Architecture: x86_64
  • NVIDIA Driver Version: 560.35.03
  • CUDA Version: 12.6.77
  • cuDNN Version: (Specify the version if applicable)
  • GPUs: 4

Problem Description:

After installing the NVIDIA driver and CUDA 12.6 using the runfile method, I attempted to run the deviceQuery sample from the CUDA samples to verify that everything is set up correctly. Unfortunately, I’m getting the following error:

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 3
-> initialization error
Result = FAIL

Troubleshooting Steps I’ve Tried:

  1. Followed the Post-installation Actions in the CUDA Toolkit documentation:
  • Ran echo $PATH and echo $LD_LIBRARY_PATH to make sure they include the CUDA path.
  2. Verified the NVIDIA Driver Installation:
  • Ran nvidia-smi and nvcc -V to confirm that the driver is installed and recognizes the GPUs correctly. Everything appears normal in the output.
  3. Checked for Compatibility:
  • Confirmed that CUDA 12.6 is compatible with the NVIDIA driver version 560.x.
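
For anyone repeating the PATH and LD_LIBRARY_PATH check above, here is a minimal Python sketch of it. It assumes the toolkit sits under the default /usr/local/cuda prefix (adjust if yours differs); the function name missing_cuda_entries is made up for illustration and is not part of any NVIDIA tool.

```python
import os


def missing_cuda_entries(path_value, ld_value, prefix="/usr/local/cuda"):
    """Return the names of the variables that have no entry under `prefix`.

    `prefix` is an assumption: the default runfile install location.
    Versioned directories such as /usr/local/cuda-12.6 also match.
    """
    def has_entry(value):
        # Both variables are colon-separated lists on Linux.
        return any(p.startswith(prefix) for p in value.split(":") if p)

    missing = []
    if not has_entry(path_value):
        missing.append("PATH")
    if not has_entry(ld_value):
        missing.append("LD_LIBRARY_PATH")
    return missing


if __name__ == "__main__":
    result = missing_cuda_entries(
        os.environ.get("PATH", ""),
        os.environ.get("LD_LIBRARY_PATH", ""),
    )
    print("missing CUDA entries:", result)
```

An empty list means both variables contain a CUDA directory. As this thread shows, that alone does not guarantee that CUDA can initialize.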

Request for Assistance:

Despite these efforts, the issue persists. I would greatly appreciate any insights or suggestions you can provide on how to resolve this issue.

  • Are there specific logs or diagnostic steps that I should check to identify why CUDA is failing to initialize?

Thank you in advance for your help!

Hello @allison2, I ran into the same problem on a g4dn.xlarge with slightly different driver versions and managed to solve it with a simple fix (after many hours of despair).

System Configuration:

  • Server Model: AWS g4dn.xlarge
  • Operating System: Amazon Linux 2
  • Architecture: x86_64
  • NVIDIA Driver Version: 565.57.01
  • CUDA Version: 12.6.85
  • cuDNN Version: (Specify the version if applicable)
  • GPUs: 1

Solution:
The solution that worked for me was switching the driver kernel module flavour from open to proprietary. I initially overlooked the Driver Installation step because the CUDA Toolkit installation appeared to succeed when I followed the steps, but deviceQuery then failed with the same error you mentioned.

Now with proprietary drivers installed CUDA works just fine.
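
If you are not sure which flavour is currently loaded, a rough way to check is to read /proc/driver/nvidia/version. The sketch below assumes that the open flavour identifies itself with the phrase "Open Kernel Module" in that file, which is what I would expect but have not verified on every driver release; module_flavour is a hypothetical helper name, not an NVIDIA API.

```python
from pathlib import Path

# Exposed by the NVIDIA kernel module once it is loaded.
VERSION_FILE = Path("/proc/driver/nvidia/version")


def module_flavour(version_text):
    """Classify the loaded kernel module from the version file contents.

    Assumption: the open flavour contains the phrase "Open Kernel Module"
    on its "NVRM version" line, and the proprietary flavour does not.
    """
    if "NVRM version" not in version_text:
        return "unknown"
    if "Open Kernel Module" in version_text:
        return "open"
    return "proprietary"


if __name__ == "__main__":
    if VERSION_FILE.exists():
        print(module_flavour(VERSION_FILE.read_text()))
    else:
        print("NVIDIA kernel module does not appear to be loaded")
```

If this prints "open" and deviceQuery fails as described, trying the proprietary flavour is what fixed it in my case.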

Let me know if this helps or if you found another solution!

Hi Jimmy,

Thanks for the tip. I tried downgrading my driver from 560.35.03 to 550.54.14 and my CUDA version from 12.6 to 12.4, and it works!