Installing the GPU operator on SELinux-enforcing nodes

Hi,

I have an issue installing the GPU operator on nodes that run SELinux in enforcing mode.

SELinux is enabled on the nodes, but disabled in containerd. As a result, the SELinux types are not propagated into the containers. If we check the drivers directory on the node, the video folder is of type modules_object_t (the type needed by kernel modules):

$ ls -Z /usr/lib/modules/5.19.9-200.fc36.x86_64/kernel/drivers
...
system_u:object_r:modules_object_t:s0 video
...

Inside the pod (the nvidia-driver-daemonset pods), however, the same folder is of type var_lib_t:

$ ls -Z /usr/lib/modules/5.19.9-200.fc36.x86_64/kernel/drivers/
...
system_u:object_r:var_lib_t:s0 video
...

When trying to install the gpu-operator, the installation fails, and the error on the node is:

[ 5039.462792] audit: type=1400 audit(1684925801.740:1438): avc:  denied  { module_load } for  pid=189114 comm="modprobe" path="/usr/lib/modules/5.19.9-200.fc36.x86_64/kernel/drivers/video/nvidia.ko" dev="overlay" ino=95506836 scontext=system_u:system_r:unconfined_service_t:s0 tcontext=system_u:object_r:var_lib_t:s0 tclass=system permissive=0

The error tells me that unconfined_service_t is not allowed to perform module_load on a file of type var_lib_t. This is expected behavior, because a file must carry the modules_object_t type to be loadable as a kernel module. But because containerd is not propagating the types into the pods, the drivers lose their modules_object_t type and I cannot load them.
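For anyone comparing their own denial against this one, here is a minimal sketch that pulls the source type, target type, and object class out of an AVC line. The helpers avc_field and context_type are hypothetical names made up for this example, not part of any SELinux tooling, and the sample line is an abbreviated copy of the denial quoted above.

```shell
#!/usr/bin/env bash
# Print the value of one key=value field from an AVC denial line.
# $1 = field name (e.g. scontext), $2 = the audit line.
avc_field() {
    printf '%s\n' "$2" | sed -n "s/.*[[:space:]]$1=\([^[:space:]]*\).*/\1/p"
}

# An SELinux context is user:role:type:level; the type is the third component.
context_type() {
    printf '%s\n' "$1" | cut -d: -f3
}

# Abbreviated copy of the denial from the post (path, dev and ino omitted).
line='avc:  denied  { module_load } for  pid=189114 comm="modprobe" scontext=system_u:system_r:unconfined_service_t:s0 tcontext=system_u:object_r:var_lib_t:s0 tclass=system permissive=0'

context_type "$(avc_field scontext "$line")"   # prints unconfined_service_t
context_type "$(avc_field tcontext "$line")"   # prints var_lib_t
avc_field tclass "$line"                       # prints system
```

The three values map directly onto the source, target, and class slots of an allow rule, which is why the denial can be read as "source type may not do module_load on target type".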

One easy fix is to add a policy rule (allow unconfined_service_t var_lib_t:system module_load;). This is better than running SELinux in permissive mode, but it still grants more permissions than required, so I was wondering whether there is another way to solve this. Enabling SELinux in containerd is not an option.
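For reference, the rule would typically be packaged as a small local policy module. The sketch below writes such a module; the module name gpu_modload and the helper write_policy are placeholders chosen for this example, and it assumes the standard policy tools (checkmodule, semodule_package, semodule) are available on the node, which may not be the case on every image.

```shell
#!/usr/bin/env bash
# Write a minimal type-enforcement source file containing only the one rule.
# $1 = output directory. "gpu_modload" is a placeholder module name.
write_policy() {
    cat > "$1/gpu_modload.te" <<'EOF'
module gpu_modload 1.0;

require {
    type unconfined_service_t;
    type var_lib_t;
    class system module_load;
}

allow unconfined_service_t var_lib_t:system module_load;
EOF
}

# Build and install on the node (as root, with the policy tools installed):
#   write_policy .
#   checkmodule -M -m -o gpu_modload.mod gpu_modload.te
#   semodule_package -o gpu_modload.pp -m gpu_modload.mod
#   semodule -i gpu_modload.pp
```

Note that this rule applies to every var_lib_t file and every process running as unconfined_service_t, not only to the NVIDIA driver, which is exactly the over-permissiveness described above.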

If I run chcon -t modules_object_t /usr/lib/modules/5.19.9-200.fc36.x86_64/kernel/drivers/video before the nvidia-driver init, the installation succeeds, but there is no easy way to do this through the values file.
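A rough sketch of how that manual relabel could be scripted so that it follows the running kernel instead of a hard-coded version. The function names video_dir and relabel_video_dir are made up for this example, and the script assumes it runs as root in the same place where the chcon command above was run by hand.

```shell
#!/usr/bin/env bash
# Build the drivers/video path for a given kernel release.
video_dir() {
    printf '/usr/lib/modules/%s/kernel/drivers/video\n' "$1"
}

# Relabel the directory unless it already carries modules_object_t.
# $1 = kernel release (defaults to the running kernel).
relabel_video_dir() {
    local dir
    dir=$(video_dir "${1:-$(uname -r)}")
    if ls -Zd "$dir" | grep -q ':modules_object_t:'; then
        echo "already labeled: $dir"
    else
        chcon -t modules_object_t "$dir"
    fi
}

# Usage (as root, before the nvidia-driver init):
#   relabel_video_dir 5.19.9-200.fc36.x86_64
```

This only automates the manual step; it still has to be triggered by something outside the gpu-operator chart, so it does not solve the values-file limitation.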

Versions used:

  • Kubernetes v1.25
  • Fedora CoreOS 36
  • containerd 1.7.1
  • gpu-operator 22.9.1

Any ideas would be highly appreciated.

Best regards, Diana.

Hello Diana,

I tried to find a solution for your scenario, but unfortunately had no luck. The issue looks fairly complex and would need to be reproduced first.

If you have an active Enterprise Support contract, please open a case, and we’ll do our best to reproduce the issue and engage experts to look for a fix or workaround for the SELinux problem.

Regards,
Vladislav

We have encountered the same issue.
OS: RHEL 8.8
Kubernetes: v1.24.9
containerd: 1.6.15
gpu-operator: v23.3.2