I am currently working on accessing GPUs from inside a container.
These are my setup details:
The host machine is an SGI UV100 with a Tesla S2050 (containing 4 M2050 GPUs) attached to it.
The OS on the host machine is RHEL 6.1, and the CUDA 4.0 toolkit is installed.
I have succeeded in accessing the GPUs from inside the container, and
I am able to run the CUDA benchmark test cases (deviceQuery, matrixMul, nvidia-smi -q, etc.) successfully from inside
the container when I give it access to all 4 GPU devices,
i.e. when the devices /dev/nvidia0, /dev/nvidia1, /dev/nvidia2, /dev/nvidia3 and /dev/nvidiactl
are allowed in the container.
I want to share GPUs between containers, so I tried running the container with access to only two GPU devices and the control device,
i.e. I started the container with only the devices /dev/nvidia0, /dev/nvidia1 and /dev/nvidiactl allowed in it. GPUs 3 and 4 are deliberately not accessible from inside this container (creating the nodes for them, /dev/nvidia2 or /dev/nvidia3, inside the container fails with a kernel error, which is expected).
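To double-check what the restricted container actually sees, the expected device nodes can be compared with the ones present under /dev. This is only a minimal sketch: it assumes one /dev/nvidiaN node per GPU, numbered from 0, plus /dev/nvidiactl (as in the device list above), and `missing_nvidia_nodes` with its optional directory argument is a hypothetical helper, not part of any NVIDIA tool.

```shell
#!/bin/sh
# Print the NVIDIA device nodes that are expected but absent.
# Assumption: one node per GPU, numbered from 0, plus the control node,
# matching the /dev/nvidia0 ... /dev/nvidia3 and /dev/nvidiactl list above.
# missing_nvidia_nodes is a hypothetical helper name.
missing_nvidia_nodes() {
    gpu_count=$1          # number of GPUs physically attached to the host
    dev_dir=${2:-/dev}    # directory to inspect (defaults to /dev)
    i=0
    while [ "$i" -lt "$gpu_count" ]; do
        # -e only tests that the path exists; it does not prove that the
        # container is permitted to open the device.
        [ -e "$dev_dir/nvidia$i" ] || echo "$dev_dir/nvidia$i"
        i=$((i + 1))
    done
    [ -e "$dev_dir/nvidiactl" ] || echo "$dev_dir/nvidiactl"
    return 0
}

# Example: the host described above has 4 GPUs.
missing_nvidia_nodes 4
```

With only /dev/nvidia0, /dev/nvidia1 and /dev/nvidiactl present, this should list /dev/nvidia2 and /dev/nvidia3.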
When I do this, I am not able to run any CUDA test cases or use nvidia-smi successfully.
Querying only GPU 1 with the command
nvidia-smi -q -i 0
gives an error saying /dev/nvidia2 is not available. (Why is it trying to access nvidia2 when I am querying nvidia0?)
./deviceQuery gives an error saying cudaGetDeviceCount failed, with a message about a mismatched runtime version…
./matrixMul reports cudaSafeCall: invalid device ordinal.
From what I can observe, the CUDA driver tries to access all 4 GPUs and therefore fails.
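The per-GPU behaviour can be collected in one pass by repeating the same nvidia-smi query for each index. A rough sketch, assuming nvidia-smi returns a non-zero exit status when a query fails (I have not verified this for every failure mode); `SMI` and `query_each_gpu` are hypothetical names, and only the -q and -i options used above are relied on.

```shell
#!/bin/sh
# Query each GPU index separately and report which queries fail.
# SMI and query_each_gpu are hypothetical names used for illustration.
# Assumption: a failed query makes nvidia-smi exit with a non-zero status.
SMI=${SMI:-nvidia-smi}

query_each_gpu() {
    gpu_count=$1          # how many indices to try, starting from 0
    i=0
    while [ "$i" -lt "$gpu_count" ]; do
        if "$SMI" -q -i "$i" >/dev/null 2>&1; then
            echo "GPU $i: query ok"
        else
            echo "GPU $i: query failed"
        fi
        i=$((i + 1))
    done
}

# Example: try the two GPUs this container is allowed to use.
query_each_gpu 2
```

Dropping the output redirection shows the full error text for each index, which makes it easier to see which device node each query complains about.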
Is it possible to hide some GPUs on multi-GPU hardware? If so, please let me know how.
Also, is some kind of handshake between all 4 GPUs necessary in order to use even one GPU?
Thank you for your time.
Thanks in advance.