SLURM and CUDA: cudaGetDeviceCount returned 30


I have a cluster with 4 GPU nodes, and I use SLURM to run jobs. CUDA already works on each node. My problem appears when I want to run CUDA programs from the main node.

I can log in to a node either with SSH or with the SLURM srun command.

Using SSH, CUDA works fine, both as a normal user and as root.

But if I open a session on a node as a normal user with “srun -w node1 --pty bash” and run deviceQuery, I get this:

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 30
-> unknown error
Result = FAIL

I also get the same error with “srun -w node1 ./deviceQuery”. This form is equivalent, but it runs the program without opening a shell on the node.
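Since the same binary works over SSH but fails under srun, one way to narrow this down is to compare what each kind of session actually gives the process. The sketch below is only a diagnostic under assumptions: it assumes the standard NVIDIA driver device nodes under /dev (nvidiactl, nvidia0, ...). It also assumes the difference might lie in the environment (CUDA_VISIBLE_DEVICES, SLURM_JOB_ID) or in which device files the session is allowed to open. It prints those variables and tries to open each /dev/nvidia* character device for reading and writing.

```python
import glob
import os
import stat


def try_open(path):
    """Try to open a device node for reading and writing; report the outcome."""
    try:
        fd = os.open(path, os.O_RDWR)
    except OSError as exc:
        return f"cannot open ({exc.strerror})"
    os.close(fd)
    return "open OK"


def gpu_access_report():
    """Collect GPU-related environment variables and device access for this process.

    Assumption: the NVIDIA driver exposes its devices as /dev/nvidia* character
    devices; adjust the glob pattern if this cluster differs.
    """
    nodes = [
        p for p in sorted(glob.glob("/dev/nvidia*"))
        if stat.S_ISCHR(os.stat(p).st_mode)  # skip directories such as nvidia-caps
    ]
    return {
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "SLURM_JOB_ID": os.environ.get("SLURM_JOB_ID"),
        "device_nodes": {p: try_open(p) for p in nodes},
    }


if __name__ == "__main__":
    report = gpu_access_report()
    for key in ("CUDA_VISIBLE_DEVICES", "SLURM_JOB_ID"):
        print(f"{key} = {report[key]}")
    if not report["device_nodes"]:
        print("no /dev/nvidia* character devices found")
    for path, status in report["device_nodes"].items():
        print(f"{path}: {status}")
```

To use it, save it under a name of your choice (gpu_check.py here is hypothetical). Run it once with python3 gpu_check.py in an SSH session on node1, and once with srun -w node1 python3 gpu_check.py. Any difference between the two outputs points at what the srun session changes.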