Jetson - Driver persistence mode

Hello,
I have both Jetson TX2 and Xavier boards with JetPack 4.3 installed.

I wanted to know whether the nvidia-smi tool is really not part of the CUDA toolkit for Jetson (TX2/Xavier)?

I wanted to activate the CUDA driver persistence mode as described here:
driver-persistence

In particular, I wanted to set up the persistence daemon mode.

If the nvidia-smi tool isn’t supported for the jetson:

  • Is there a plan to add it for future CUDA versions?

  • If not, do any alternatives exist that I can use to achieve persistence mode with my compute application only (without the need to run an additional graphics application in the background)?

Thanks,

“nvidia-smi” is dependent upon a PCI bus, and talks to a GPU via PCI (a discrete “dGPU”). The GPU on any Jetson is directly wired into the memory controller and does not use PCI (an integrated “iGPU”). Thus there is no possibility of nvidia-smi ever working on an iGPU which does not use PCI…the wiring and PCI structure is literally missing.

You might want to provide more detail on the problem you want to work around. In the case of being tied to a GUI (a “context”), this is usually related to the user space driver being loaded into the X11 application, and not into the kernel. There may be other cases (which I cannot describe) which directly access the GPU.

Many people are under the misunderstanding that X11 is a GUI environment. Think of X as really being an API-defined interface to a buffer space which has the characteristics a video display would require. One does not necessarily have to have a GUI to run a virtual X server in which no real display exists, since X and the GPU do not really care whether a monitor is attached, nor is there any requirement for such a buffer to have desktop software running on it. X itself does not provide logins, does not provide windowing, and so on…but a desktop application such as Gnome or KDE can run on X, and all of those properties are associated with that desktop manager or login manager. Don’t run those applications, and the buffer is still there, just without a desktop environment running. X runs only one application, but if that application is a desktop, then suddenly you are able to run a full GUI…but it is the desktop doing this, not X.

I have no recommendations on a particular X virtual desktop, but others probably do. Someone else may also be able to mention how some applications may function without a $DISPLAY environment variable (which is what points at an X server instance, the “context”).

Hello,
Thanks for your detailed answer.

More details about my applications:
I’m running on Jetson Xavier AGX and Jetson TX2.
These applications are compute-only, with no need for any graphics functionality.
Each of them includes a set of CUDA kernels and CPU C++ software that manages them and their flow.

We have two problems:

  • Usually, after power-up or restart, in order to achieve a deterministic average execution time for our GPU computing logic (CUDA kernels), we have to run it multiple times.
    The number of warm-up runs required changes from one logic (set of CUDA kernels) to another.

  • Additionally, every so often (after a number of cycles that is not stable) we see an unexplained peak in the time consumed that produces a large standard deviation in our measured cycle execution times.
    So we cannot plan a deterministic cycle time with a worst-case bound on the consumed time.

When we run the jetson_clocks script the second problem disappears and we get a periodic, deterministic cycle time with a minor standard deviation across all measured cycle times.
But the first problem still happens.
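
To illustrate the first problem, here is a minimal sketch of the kind of warm-up we currently do before the timing-critical phase; the kernel `computeKernel` is only a placeholder, not our real code:

```cpp
// warmup_sketch.cu - minimal sketch (placeholder kernel, not the real application)
// showing how the first-use cost is paid up front, before the timed cycles.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void computeKernel(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f + 1.0f;   // stand-in for real work
}

int main() {
    const int n = 1 << 20;
    float* d_data = nullptr;

    // Force context creation and driver/GPU initialization explicitly,
    // instead of paying for it inside the first "real" cycle.
    cudaFree(0);
    cudaMalloc(&d_data, n * sizeof(float));

    // Warm-up launches: run the same kernel a few times and synchronize,
    // so clocks, caches and lazy initialization settle before measurement.
    for (int i = 0; i < 10; ++i)
        computeKernel<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaDeviceSynchronize();

    // From here on, the timed cycles should be much closer to steady state.
    printf("warm-up done: %s\n", cudaGetErrorString(cudaGetLastError()));
    cudaFree(d_data);
    return 0;
}
```

The question for us is whether this warm-up should be necessary at all, or whether driver persistence would remove the need for it.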

After I read about driver persistence I thought that maybe this is the root cause of both problems, and that using it would solve them.

But then I found that I cannot activate it on Jetson.

Please share your opinion about these problems…
Can you think of any other reason why they happen?
How can we solve them?
Or work around them?

Someone else with more knowledge of this will need to answer, but from what I can tell via “the driver persistence doc” the answer will depend on whether the X server is running or not, and the case where X server is running would be simpler. Can someone comment on that document as it applies to the iGPU of a Jetson (and whether PCI function changes the application in that document)?

Keep in mind that running a virtual X server would simplify your life (and make available virtual desktop remote access) without requiring an actual HDMI monitor (the driver would load into this as described in the doc regardless of being dGPU/PCI versus iGPU). You can even run simultaneous virtual and actual X servers. Any virtual server could be a purely non-desktop environment without the windowing software. I am unsure if a headless environment setup for GPU driver persistence would be the same as in that document since this could depend on PCI functions (dGPU) if the document was unaware of your iGPU case.

You will almost always have issues if you want a truly deterministic boot, since neither hardware nor software is capable of hard real-time. I can see how a carefully set up init (including reboot) would improve things, but I cannot answer; you will want to provide details on what kind of hard/warm boot timings you are trying to improve, e.g., what the circumstances and requirements are (the use-case, perhaps).

You also mentioned working with both a TX2 and Xavier. Currently the TX2 uses U-Boot after CBoot, but on the Xavier U-Boot is not used and some U-Boot features were put into CBoot of the Xavier (which then boots directly from CBoot without U-Boot). You may need to separate your question into a TX2 case versus Xavier case if the environment prior to loading the Linux kernel is part of the consideration.

Hello,
Sorry for the late response.

We have no problem with the system boot process or time.
The two problems described above in my previous response relate to our application's execution phase, after system boot has already completed and the OS has started working.

We start our application, stop it after a while, go over the generated logs, and find the following:

  • Startup latency -
    The first activation of the CUDA kernels takes much more time than those that come after.
    Sometimes the same CUDA kernels have to be run several times (dozens) until their execution time becomes stable with lower jitter.

  • Non-deterministic time consumption -
    Every so many cycles (CUDA kernel activations; the number is dynamic) there is a non-deterministic peak in the time consumed that produces a large standard deviation in our measured cycle execution times.

When we run the jetson_clocks script the second problem disappears and we get a periodic, deterministic cycle time with a minor standard deviation across all measured cycle times.
But the first problem still happens.
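
For reference, this is roughly how we measure the cycle times, simplified here to a single placeholder kernel (`cycleKernel` is not our real code); we use CUDA events so the per-cycle mean and standard deviation come from GPU-side timestamps:

```cpp
// timing_sketch.cu - simplified sketch of the per-cycle timing, not the real application.
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void cycleKernel(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = sinf(data[i]);           // stand-in for real work
}

int main() {
    const int n = 1 << 20, cycles = 1000;
    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    std::vector<float> ms(cycles);
    for (int c = 0; c < cycles; ++c) {
        cudaEventRecord(start);
        cycleKernel<<<(n + 255) / 256, 256>>>(d, n);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);               // one sync point per cycle
        cudaEventElapsedTime(&ms[c], start, stop);
    }

    // Mean and standard deviation over the measured cycles.
    double sum = 0, sq = 0;
    for (float t : ms) { sum += t; sq += double(t) * t; }
    double mean = sum / cycles;
    double stddev = std::sqrt(sq / cycles - mean * mean);
    printf("mean %.3f ms, stddev %.3f ms\n", mean, stddev);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    return 0;
}
```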

Additionally,

  • The problems are not only a Jetson issue; they also reproduce with desktop NVIDIA GPUs such as Quadro and GeForce.

  • Additionally, we saw that there is a direct relation between jitter intensity and CPU usage.
    The more of our logic runs purely in CUDA and C++, the lower the jitter intensity (and vice versa), and the more synchronization points we have between CUDA and C++ in our logic, the higher the intensity again; see the sketch after this list.
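
As a rough sketch of what we mean by synchronization points (placeholder kernels `stageA`/`stageB`, not our real code): synchronizing the CPU after every launch exposes each cycle to CPU-side scheduling, while chaining the kernels on one stream and waiting only once per cycle reduces that exposure:

```cpp
// sync_sketch.cu - sketch contrasting per-kernel synchronization with a single
// synchronization per cycle; the kernels are placeholders for the real stages.
#include <cuda_runtime.h>

__global__ void stageA(float* d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1.0f;
}
__global__ void stageB(float* d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 0.5f;
}

void cycle_many_syncs(float* d, int n) {
    // Each synchronize blocks the CPU thread, so CPU load and OS scheduling
    // jitter get added to the cycle time before the next launch.
    stageA<<<(n + 255) / 256, 256>>>(d, n);
    cudaDeviceSynchronize();
    stageB<<<(n + 255) / 256, 256>>>(d, n);
    cudaDeviceSynchronize();
}

void cycle_one_sync(float* d, int n, cudaStream_t s) {
    // Kernels on the same stream already execute in order, so the CPU only
    // needs to wait once, at the end of the cycle.
    stageA<<<(n + 255) / 256, 256, 0, s>>>(d, n);
    stageB<<<(n + 255) / 256, 256, 0, s>>>(d, n);
    cudaStreamSynchronize(s);
}

int main() {
    const int n = 1 << 20;
    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaStream_t s;
    cudaStreamCreate(&s);
    cycle_many_syncs(d, n);
    cycle_one_sync(d, n, s);
    cudaStreamDestroy(s);
    cudaFree(d);
    return 0;
}
```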

I won’t be able to answer, but I suspect that setup time to first use of a CUDA kernel is related to memory controller performance, and perhaps also initial setup of the GPU. What follows once things are running should be closer to deterministic (but never fully deterministic). In cases where memory is shared with the CPU there would be additional variance. This is most likely true to some extent on both PCIe (dGPU) and iGPU variants, but a desktop PC tends to have higher memory performance, and the memory of a dGPU is actually integrated with the dGPU, whereas the iGPU shares with the CPU (making iGPU less predictable for timing related to memory transfers).
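
If the variance really is tied to transfers over the shared memory controller, one thing that might be worth trying (this is an assumption on my part, not something I have verified for your workload) is mapped pinned memory: on a Jetson the iGPU and CPU share the same physical DRAM, so the kernel can work on host memory without an explicit cudaMemcpy. A rough sketch:

```cpp
// zerocopy_sketch.cu - sketch of zero-copy (mapped pinned) memory; the kernel is
// a placeholder, and whether this reduces jitter for a given workload is untested.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scaleKernel(float* d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 3.0f;
}

int main() {
    const int n = 1 << 20;

    // Allocate pinned host memory that is mapped into the device address space.
    // The flag must be set before the CUDA context is created.
    float* h = nullptr;
    cudaSetDeviceFlags(cudaDeviceMapHost);
    cudaHostAlloc(&h, n * sizeof(float), cudaHostAllocMapped);
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    // Get the device-side pointer for the same physical memory: no cudaMemcpy.
    float* d = nullptr;
    cudaHostGetDevicePointer(&d, h, 0);

    scaleKernel<<<(n + 255) / 256, 256>>>(d, n);
    cudaDeviceSynchronize();

    printf("h[0] = %f\n", h[0]);   // expected 3.0
    cudaFreeHost(h);
    return 0;
}
```

Whether this actually lowers the jitter would have to be measured; it trades explicit copies for direct accesses whose cost depends on the access pattern.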

Other than nvpmodel and jetson_clocks I don’t know of any good way to improve on this. Someone from NVIDIA with better knowledge of the underlying drivers and hardware will probably need to answer.

Can someone from NVIDIA give any updates on this?

Is there an option (config file, some tool, etc.) to use driver persistence mode on Jetson that we are missing?
Or is it possible but not implemented yet?
Or is it impossible at all?