Remote visualization error (GPU cluster)

We have a GPU cluster installed in our lab. It has one master node and three compute nodes. Each compute node is equipped with two Tesla K40 GPU cards. The CUDA toolkit and the NVIDIA drivers have been installed.

Our task is to do remote visualization, and we have found the following white paper by NVIDIA:
www.nvidia.com/content/PDF/remote-viz-tesla-gpus.pdf

So, we connected to one of the compute nodes remotely from a machine that has a GeForce card, and tried to run the simulation examples that come with the CUDA toolkit (such as smokeParticles, oceanFFT etc.). But it shows the following error:

[root@compute-0-3 release]# ./smokeParticles
CUDA Smoke Particles Starting…

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

Loaded ‘…/…/…/…/5_Simulations/smokeParticles/data/floortile.ppm’, 256 x 256 pixels
CUDA error at ParticleSystem_cuda.cu:91 code=10(cudaErrorInvalidDevice) “cudaMalloc3DArray(&noiseArray, &channelDesc, size)”

Please help.
Thanks.

in some cases, cudaErrorInvalidDevice may be interpreted as “no device”, or “wrong device”

is “compute-0-3” (from the prompt “[root@compute-0-3 release]”) the workstation with the GeForce card, or the remote node?

knowing what “@compute-0-3 release” refers to would help in interpreting the error

Jimmy,
Thanks for the reply,

No, ‘compute-0-3’ is the compute node in the cluster (i.e. the remote node), which has two Tesla K40 cards. The workstation from which we accessed that compute node remotely has an NVIDIA GeForce card.

[root@compute-0-3 release]

  • root is the user we logged in to the compute node as
  • compute-0-3 is the node name
  • release is the directory that contains the smokeParticles binary and the other simulation program binaries

so, have you logged in to the compute node locally, or remotely?

if you have logged in to the compute node locally and you get this error, i would double-check that the gpus are detected, and related things such as whether the paths are exported properly

if you have logged in to the compute node remotely, how did you do this?
would ./smokeParticles not simply mean running the application on the local machine?
what strikes me is that i do not see how this would also automatically bring the rendering to your remote station
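A minimal sketch of the "are the gpus detected" check, run on the compute node itself. It assumes `nvidia-smi` is on the PATH and that `nvidia-smi -L` prints one line per GPU in the form `GPU <index>: <name> (UUID: <uuid>)`; the function names `parse_gpu_list` and `list_gpus` are made up for this illustration.

```python
import re
import subprocess

# Assumed line format of `nvidia-smi -L`:
#   GPU <index>: <name> (UUID: <uuid>)
_GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: (.+)\)$")


def parse_gpu_list(text):
    """Return a list of (index, name) tuples parsed from `nvidia-smi -L` style text."""
    gpus = []
    for line in text.splitlines():
        match = _GPU_LINE.match(line.strip())
        if match:
            gpus.append((int(match.group(1)), match.group(2)))
    return gpus


def list_gpus():
    """Run `nvidia-smi -L` on this machine and parse what it prints."""
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
    )
    return parse_gpu_list(result.stdout)
```

Calling `list_gpus()` on compute-0-3 should return two entries if both K40 cards are visible to the driver. An empty list, or an exception from `subprocess.run`, would point at a driver or path problem rather than at the sample itself.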

Jimmy,

We have logged in to the compute node remotely.
We are using ssh (secure shell) to connect to the remote node, with the -X option to forward the X display output.
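With `ssh -X`, the `DISPLAY` variable inside the session normally points at a forwarding proxy (typically `localhost:10.0` with OpenSSH defaults), so OpenGL output is sent back to the workstation's X server instead of an X server running on the Tesla node. The sketch below is only a heuristic for spotting that situation; the function name `classify_display` and the category labels are invented for this example.

```python
import re

# DISPLAY has the general form [host]:display[.screen]
_DISPLAY = re.compile(r"^(?P<host>[^:]*):(?P<num>\d+)(?:\.(?P<screen>\d+))?$")


def classify_display(display):
    """Guess what kind of X display a DISPLAY value refers to (heuristic only)."""
    if not display:
        return "unset"
    match = _DISPLAY.match(display)
    if match is None:
        return "unrecognized"
    host = match.group("host")
    number = int(match.group("num"))
    # Assumption: OpenSSH X11 forwarding uses localhost and display numbers
    # starting at 10 (its default X11DisplayOffset).
    if host in ("localhost", "127.0.0.1") and number >= 10:
        return "probably-ssh-forwarded"
    if host in ("", "unix"):
        return "local"
    return "network"
```

On the compute node, `classify_display(os.environ.get("DISPLAY"))` (after `import os`) gives a quick hint whether the sample is about to render through the forwarded connection or on a local X server.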

have you looked at the x server config file?

for which device did you compile the sample?

Jimmy,

Can you please tell me specifically which details I should check in the X server config file?

i may be vastly ignorant, but i perceive sections:

SETTING UP THE X SERVER FOR HEADLESS OPERATION

TESTING THE HEADLESS X OPERATION

from the very document that you linked to be pertinent to your case, and a good starting point for troubleshooting

Thanks Jimmy,

I shall surely check our X server settings on the compute node against these two sections and let you know the results.

Remote Viz in a CUDA/OpenGL interop environment (smokeParticles is an interop code) can be challenging.

Based on what you’ve described here, it doesn’t look like you’ve properly followed the steps in the document you linked to, e.g. pages 18 and 19. For example, you don’t indicate you are using vglrun.

This related thread, as well as the Stack Overflow article linked in it, may be of interest:

https://devtalk.nvidia.com/default/topic/830917/installation-of-cuda-on-rhel-6-with-turbovnc-and-virtualgl/
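
As a rough illustration of the vglrun step mentioned above, here is a minimal sketch that launches the sample through VirtualGL. It assumes VirtualGL is installed on the compute node, that a headless X server for 3D rendering is running there on display `:0` (an assumption, adjust it to your setup), and that the session comes from a VirtualGL-capable connection such as TurboVNC rather than plain `ssh -X`. The helper names are hypothetical.

```python
import shutil
import subprocess


def build_vglrun_command(app_argv, display=":0"):
    """Build the argv for running an application under vglrun.

    `-d` tells vglrun which X display to use for 3D rendering;
    ":0" is an assumed value for the headless X server on the node.
    """
    return ["vglrun", "-d", display] + list(app_argv)


def run_with_virtualgl(app_argv, display=":0"):
    """Run the application under vglrun and return its exit code."""
    if shutil.which("vglrun") is None:
        raise FileNotFoundError("vglrun not found on PATH; is VirtualGL installed?")
    return subprocess.run(build_vglrun_command(app_argv, display)).returncode
```

For example, `run_with_virtualgl(["./smokeParticles"])` from the release directory would start the sample with its OpenGL rendering redirected to the node-side X server.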