Accessing the GPU inside an LXC (Linux container)

Hi

I am currently working on accessing the GPU from inside a container.

These are my setup details: the host machine is an SGI UV100 with a Tesla S2050 attached (it contains 4 M2050 GPUs). The OS on the host machine is RHEL 6.1, with the CUDA 4.0 toolkit installed.

I have succeeded in accessing the GPUs from inside the container: I am able to run the CUDA benchmark test cases (deviceQuery, matrixMul, nvidia-smi -q, etc.) successfully from inside the container when I give it access to all 4 GPU devices, i.e. when the devices /dev/nvidia0, /dev/nvidia1, /dev/nvidia2, /dev/nvidia3 and /dev/nvidiactl are all allowed in the container.
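
For reference, the device whitelist for that working setup looks roughly like the lines below (only an illustrative sketch of the lxc config, assuming the standard NVIDIA device numbering: character major 195, minors 0-3 for the GPUs and 255 for /dev/nvidiactl):

# allow all four GPUs plus the control device in the device cgroup
lxc.cgroup.devices.allow = c 195:0 rwm
lxc.cgroup.devices.allow = c 195:1 rwm
lxc.cgroup.devices.allow = c 195:2 rwm
lxc.cgroup.devices.allow = c 195:3 rwm
lxc.cgroup.devices.allow = c 195:255 rwm

The device nodes themselves are created inside the container with mknod, e.g. mknod -m 666 /dev/nvidia0 c 195 0 and mknod -m 666 /dev/nvidiactl c 195 255.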

I want to share the GPUs between containers, so I tried running the container with access to only two GPU devices plus the control device,
i.e. I started the container with only the devices /dev/nvidia0, /dev/nvidia1 and /dev/nvidiactl allowed in it. The other two GPUs are deliberately not accessible from inside this container (creating /dev/nvidia2 or /dev/nvidia3 inside the container fails with a kernel error, which is expected).
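
The restricted container's whitelist is along the same lines as above, just without minors 2 and 3 (again only a sketch):

# only the first two GPUs and the control device are allowed
lxc.cgroup.devices.allow = c 195:0 rwm
lxc.cgroup.devices.allow = c 195:1 rwm
lxc.cgroup.devices.allow = c 195:255 rwm
# minors 2 and 3 (/dev/nvidia2, /dev/nvidia3) are deliberately left out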

When I do this, I am not able to run any of the CUDA test cases or use nvidia-smi successfully.
Querying only the first GPU (index 0) with the command

nvidia-smi -q -i 0

gives an error saying /dev/nvidia2 is not available. (Why is it trying to access nvidia2 when I am only querying nvidia0?)

Also, running ./deviceQuery gives an error saying cudaGetDeviceCount failed because of a runtime version mismatch (rest of the message omitted).

./matrixMul fails with cudaSafeCall(): invalid device ordinal.

All I can observe is that the CUDA driver is trying to access all 4 GPUs, and it fails because two of them are not available inside the container.
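
To narrow this down, a minimal program that does nothing but query the device count should hit the same call that deviceQuery fails on. This is just an illustrative sketch, not one of the SDK samples:

#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    /* cudaGetDeviceCount is the first runtime API call deviceQuery makes;
       inside the restricted container this is the call that fails. */
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("visible CUDA devices: %d\n", count);
    return 0;
}

(compiled with, say, nvcc devcount.cu -o devcount and run from inside the restricted container)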

Please let me know whether it is possible to hide some of the GPUs of a multi-GPU board like this from the driver, and if so, how.

Also, please let me know whether some kind of handshake between all 4 GPUs is necessary in order to use even one of them.

Your time in replying is appreciated.

Thanks in advance

Regards
Devendra

Please let me know if this isn't the right place to post.

Regards
Devendra