Problem with not getting accurate results with CUDA programs

We have some PCs with GPU cards installed. Multiple users can log in to those PCs over SSH and run their CUDA programs.

The problem is that when the CUDA programs run, the PCs hang and the results are not accurate. Sometimes the results are all zero for every user. The problem persists even after we restart the PCs.

Thanks in advance


Tharindu Gamage

I assume the PC runs Linux.
Is the nvidia kernel module loaded and are the /dev/nvidia* device files properly set up? Check the installation notes for an example script to ensure this at boot time.

You should also be sure to check the error codes returned by CUDA functions. Then you can discover problems directly instead of having to infer them from incorrect results.
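
To illustrate the point about return codes, here is a minimal sketch of what that checking might look like. The helper `cudaOk` and the `fill` kernel are made-up names for this example, not part of CUDA. The runtime calls themselves (`cudaMalloc`, `cudaGetLastError`, `cudaDeviceSynchronize`, `cudaMemcpy`, `cudaGetErrorString`) are standard, though the sketch assumes a toolkit recent enough to provide `cudaDeviceSynchronize`.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not a CUDA API): print a message if a CUDA call
// failed and return false so the caller can bail out.
static bool cudaOk(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Trivial kernel used only for illustration: writes `value` to every element.
__global__ void fill(int *out, int n, int value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = value;
}

int main() {
    const int n = 1024;
    static int h_out[n];
    int *d_out = nullptr;

    if (!cudaOk(cudaMalloc(&d_out, n * sizeof(int)), "cudaMalloc"))
        return EXIT_FAILURE;

    fill<<<(n + 255) / 256, 256>>>(d_out, n, 42);
    // A kernel launch returns no status itself, so query it explicitly.
    if (!cudaOk(cudaGetLastError(), "kernel launch"))
        return EXIT_FAILURE;
    // Errors during execution surface at the next synchronizing call.
    if (!cudaOk(cudaDeviceSynchronize(), "kernel execution"))
        return EXIT_FAILURE;

    if (!cudaOk(cudaMemcpy(h_out, d_out, n * sizeof(int),
                           cudaMemcpyDeviceToHost), "cudaMemcpy"))
        return EXIT_FAILURE;

    cudaFree(d_out);
    std::printf("h_out[0] = %d\n", h_out[0]);
    return EXIT_SUCCESS;
}
```

With checks like these, a machine where the driver is not reachable should fail at the first call with a readable message rather than silently leaving the output buffer untouched.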

Assuming the problem is not related to the software you are running (easiest way to check - run some of the SDK examples): does a hard reset fix the problem?

I’ve run into cases where my GPU starts producing incorrect results that a soft reset won’t fix, but a hard reset will.

The PC runs Ubuntu. I checked the /dev/ folder, and there are no files or folders there named nvidia*.

OK, this is the problem. Without these files, the CUDA library cannot communicate with the CUDA devices. Your users' jobs have not been doing any calculation on the GPU (and they really, really should be checking return codes!). You should look at the Linux Release Notes to see what commands are required to recreate these files at boot time. If you run X, they are created for you.
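
Once the device files have been recreated, a small diagnostic along these lines could confirm that the CUDA runtime can actually see a device. This is only a sketch: `deviceFileExists` is a hypothetical helper, and checking just `/dev/nvidiactl` and `/dev/nvidia0` assumes a single-GPU machine (additional GPUs would appear as `/dev/nvidia1` and so on).

```cuda
#include <cstdio>
#include <unistd.h>
#include <cuda_runtime.h>

// Hypothetical helper: true if the given path exists.
static bool deviceFileExists(const char *path) {
    return access(path, F_OK) == 0;
}

int main() {
    // Device nodes the driver normally exposes; the list assumes one GPU.
    const char *paths[] = {"/dev/nvidiactl", "/dev/nvidia0"};
    for (int i = 0; i < 2; ++i) {
        std::printf("%-16s %s\n", paths[i],
                    deviceFileExists(paths[i]) ? "present" : "MISSING");
    }

    // Ask the runtime how many devices it can reach and report any error.
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }
    std::printf("CUDA devices found: %d\n", count);
    return count > 0 ? 0 : 1;
}
```

Running it as an ordinary SSH user, not as root, is the more useful test, since it also exercises the permissions on the device files.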