What is unique about the Nvidia container runtime? Does CDI replace it?

I have been trying to understand what is unique about the Nvidia container runtime. The container toolkit installs it as an optional alternative runtime in the Docker or containerd configuration. It appears to be a modified version of runc.

What is “modified” about it? What does it do that regular upstream runc does not?

Relatedly, versions 1.12.0 and above support the Container Device Interface (CDI). Does using CDI obviate the forked runtime (which would be great)?

I have a few more questions on containers; I will put them in their own topics.

Hi,

We need to check with our internal team.
We will share more information with you later.

Thanks.


I found this excellent post from 2.5 years ago on a containerd issue. The post is by @klueska1; is it still up to date? And what is the long-term goal: to focus on CDI, or to keep the runtime alive? I imagine it is a maintenance headache, as its interface has to match the runc one precisely.

Hi,

Here is the info from our internal team.

The NVIDIA Container Runtime is a wrapper around runc; it does not replace a low-level runtime such as runc.
It makes modifications to the incoming OCI runtime specification before invoking runc.
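
To make the wrapper idea concrete, here is a minimal sketch in Python of the kind of spec modification described above. It is not the toolkit's actual code: the function name `inject_prestart_hook`, the hook path, and the hook arguments are assumptions chosen for illustration. Only the `hooks.prestart` layout comes from the OCI runtime specification.

```python
import copy
import json


def inject_prestart_hook(spec: dict, hook_path: str) -> dict:
    """Return a copy of an OCI runtime spec with one extra prestart hook.

    Hypothetical helper: a wrapper runtime could do something like this
    to the bundle's config.json before handing control to runc.
    """
    modified = copy.deepcopy(spec)
    prestart = modified.setdefault("hooks", {}).setdefault("prestart", [])
    # Skip the insert if the hook is already present, so the edit is idempotent.
    if not any(hook.get("path") == hook_path for hook in prestart):
        # args[0] is the program name, as in the OCI hook definition.
        prestart.append({"path": hook_path, "args": [hook_path, "prestart"]})
    return modified


# Hypothetical, heavily trimmed spec; a real config.json has many more fields.
spec = {"ociVersion": "1.0.2", "process": {"args": ["nvidia-smi"]}}

# The hook path below is an assumption used for illustration only.
modified = inject_prestart_hook(spec, "/usr/bin/nvidia-container-runtime-hook")
print(json.dumps(modified["hooks"], indent=2))

# A real wrapper would now write the modified spec back to the bundle and
# exec runc with the original command-line arguments unchanged.
```

Because the wrapper only rewrites the spec and then delegates, it has to accept the same command line as runc, which is the interface-matching concern raised earlier in this thread.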

Moving to CDI could allow the NVIDIA Container Runtime to be replaced, assuming that a container runtime that supports CDI is being used.
Note that the NVIDIA Container Toolkit includes the nvidia-ctk command-line tool, which can generate CDI specs for supported NVIDIA platforms; support for Jetson systems is being added in version 1.14.0.
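
As a rough illustration of why a CDI-aware runtime can take over this job, the Python sketch below resolves a fully qualified CDI device name (`vendor/class=device`) against a CDI spec and applies the listed edits to an OCI spec. The spec contents, the environment variable, and the function names are hypothetical; a generated spec (for example from `nvidia-ctk cdi generate`) contains far more entries, and a real implementation also handles mounts, hooks, and full device-node details.

```python
import copy


def parse_cdi_name(name: str) -> tuple[str, str, str]:
    """Split a fully qualified CDI name 'vendor/class=device' into parts."""
    kind, eq, device = name.partition("=")
    vendor, slash, cls = kind.partition("/")
    if not (eq and slash and vendor and cls and device):
        raise ValueError(f"not a fully qualified CDI device name: {name!r}")
    return vendor, cls, device


def apply_cdi_device(oci_spec: dict, cdi_spec: dict, name: str) -> dict:
    """Return a copy of oci_spec with the edits for one CDI device applied."""
    vendor, cls, device = parse_cdi_name(name)
    if cdi_spec.get("kind") != f"{vendor}/{cls}":
        raise KeyError(f"spec does not describe kind {vendor}/{cls}")
    entry = next((d for d in cdi_spec["devices"] if d["name"] == device), None)
    if entry is None:
        raise KeyError(f"device {device!r} not found in spec")

    out = copy.deepcopy(oci_spec)
    env = out.setdefault("process", {}).setdefault("env", [])
    devices = out.setdefault("linux", {}).setdefault("devices", [])
    # Spec-wide edits are applied first, then the device-specific ones.
    for edits in (cdi_spec.get("containerEdits", {}), entry.get("containerEdits", {})):
        env.extend(edits.get("env", []))
        for node in edits.get("deviceNodes", []):
            # Simplification: a real runtime also fills in type, major and
            # minor for each node from the host device file.
            devices.append({"path": node["path"]})
    return out


# Hypothetical, trimmed CDI spec for illustration only.
cdi_spec = {
    "cdiVersion": "0.5.0",
    "kind": "nvidia.com/gpu",
    "containerEdits": {"deviceNodes": [{"path": "/dev/nvidiactl"}]},
    "devices": [
        {
            "name": "0",
            "containerEdits": {
                "deviceNodes": [{"path": "/dev/nvidia0"}],
                "env": ["EXAMPLE_GPU_INDEX=0"],  # made-up variable
            },
        }
    ],
}
oci_spec = {"ociVersion": "1.0.2", "process": {"args": ["nvidia-smi"], "env": ["PATH=/usr/bin"]}}

result = apply_cdi_device(oci_spec, cdi_spec, "nvidia.com/gpu=0")
print([d["path"] for d in result["linux"]["devices"]])
print(result["process"]["env"])
```

With this model, the device-specific knowledge lives in the declarative spec file rather than in a wrapper binary, so any runtime that understands CDI can apply it.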

Thanks.

This is pretty helpful, thank you.

The point about CDI support is especially important. Podman mostly supports CDI already, but containerd only partially: client-side as of 2.0, and I am not sure about server-side.
