I am working on a g5.2xlarge EC2 instance running Ubuntu 24.04. I am trying to get CUDA working on it, but I keep running into the error "cudaGetDeviceCount returned 3 -> initialization error".
I installed CUDA toolkit version 12.6.3 and nvidia-driver version 560.35.05 using the runfile.
This is the nvidia-smi output:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.05              Driver Version: 560.35.05      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A10G                    Off |   00000000:00:1E.0 Off |                    0 |
|  0%   22C    P0             54W /  300W |       1MiB /  23028MiB |     12%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
After installation, when I try to run deviceQuery from cuda-samples, I get the following error:
cudaGetDeviceCount returned 3
-> initialization error
Result = FAIL
I also tried turning on persistence mode for the GPU with
sudo nvidia-smi -pm 1
but I still get the same error.
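In case it helps narrow things down, here is a minimal sketch of a check I can run and report back on. It rests on an assumption I have not confirmed as the cause: that a working install would normally expose the device nodes /dev/nvidia0, /dev/nvidiactl and /dev/nvidia-uvm and have the nvidia and nvidia_uvm kernel modules loaded. The function names and the list of expected nodes are my own, not taken from any NVIDIA tool.

```python
import os

# Assumption: these are the device nodes a single-GPU CUDA setup would
# normally expose. The list is illustrative, not authoritative.
EXPECTED_NODES = ["/dev/nvidia0", "/dev/nvidiactl", "/dev/nvidia-uvm"]


def missing_nodes(nodes=EXPECTED_NODES, exists=os.path.exists):
    """Return the expected device nodes that are not present."""
    return [node for node in nodes if not exists(node)]


def loaded_nvidia_modules(proc_modules="/proc/modules"):
    """Return the sorted names of loaded kernel modules starting with 'nvidia'."""
    try:
        with open(proc_modules) as handle:
            return sorted(
                line.split()[0] for line in handle if line.startswith("nvidia")
            )
    except OSError:
        # /proc/modules is unreadable or absent (e.g. not on Linux).
        return []


if __name__ == "__main__":
    print("Missing device nodes:", missing_nodes())
    print("Loaded nvidia modules:", loaded_nvidia_modules())
```

If /dev/nvidia-uvm or the nvidia_uvm module turns out to be missing, that might explain why nvidia-smi works while the CUDA runtime fails to initialize, but I have not verified this on my instance.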
What could be the problem and how should I proceed?