Under DriveOS 6.0.8.1 CUDA operations are not completely deterministic

Software Version
DRIVE OS Linux 6.0.8.1 and DriveWorks 5.10

Hardware Platform
DRIVE AGX Orin

Hi,

In the previous thread, @VickNV revealed an important detail of the CUDA ↔ NvMedia interaction.

It is super important for us to understand what is going on under the hood as there is some strange behavior of our code under DriveOS 6.0.8.1 which we don’t understand yet.

We have a visual validation pipeline that loads a video, processes it, and produces a quality score at the end. The processing uses both NvMedia and CUDA operations. Under DriveOS 5 this score is deterministic: every run of the pipeline produces exactly the same score, and all images are pixel-identical across consecutive runs.

Under DriveOS 6 we get a deterministic score as long as we don’t use CUDA operations on the images. Whenever CUDA is involved, the end score fluctuates slightly. The images look good and nothing is visibly broken, but the images from consecutive runs are not pixel-identical. The difference is subtle and not noticeable to the human eye, but it is there.
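To narrow down where two runs start to differ, the intermediate images can be dumped and hashed frame by frame. The following is only a minimal sketch, assuming each run is dumped as one raw file of fixed-size frames; `frame_digests` and `first_divergent_frame` are hypothetical helper names, not part of DriveWorks, NvMedia, or CUDA.

```python
import hashlib
from typing import List, Optional


def frame_digests(raw: bytes, frame_size: int) -> List[str]:
    """Split a raw dump into fixed-size frames and hash each frame."""
    if frame_size <= 0 or len(raw) % frame_size != 0:
        raise ValueError("dump length must be a multiple of frame_size")
    return [
        hashlib.sha256(raw[offset:offset + frame_size]).hexdigest()
        for offset in range(0, len(raw), frame_size)
    ]


def first_divergent_frame(run_a: bytes, run_b: bytes, frame_size: int) -> Optional[int]:
    """Return the index of the first frame that differs, or None if all match."""
    digests_a = frame_digests(run_a, frame_size)
    digests_b = frame_digests(run_b, frame_size)
    if len(digests_a) != len(digests_b):
        raise ValueError("runs contain a different number of frames")
    for index, (digest_a, digest_b) in enumerate(zip(digests_a, digests_b)):
        if digest_a != digest_b:
            return index
    return None
```

Running this on dumps taken after each pipeline stage would show which stage first produces a different frame, which may help separate the CUDA step from the NvMedia and encoder steps.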

Is it possible that the padded image content (between width and pitch) somehow affects CUDA operations, NvMedia operations, or the encoder? We previously saw a non-deterministic pipeline score under DriveOS 5 as well; it was caused by leftover junk in memory when the height of the video was not divisible by 16.
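One way to test the padding hypothesis offline is to count differing bytes separately for the visible area and for the padding between width and pitch. This is a sketch under stated assumptions: a single pitch-linear plane copied to host memory, with `width_bytes`, `pitch`, and `height` supplied by the caller. The function names are hypothetical and do not come from any NVIDIA API.

```python
from typing import Tuple


def diff_by_region(buf_a: bytes, buf_b: bytes,
                   width_bytes: int, pitch: int, height: int) -> Tuple[int, int]:
    """Count differing bytes in the visible area and in the row padding."""
    if not 0 < width_bytes <= pitch:
        raise ValueError("width_bytes must be in the range 1..pitch")
    if len(buf_a) < pitch * height or len(buf_b) < pitch * height:
        raise ValueError("buffers are smaller than pitch * height")
    visible = 0
    padding = 0
    for row in range(height):
        start = row * pitch
        for col in range(pitch):
            if buf_a[start + col] != buf_b[start + col]:
                if col < width_bytes:
                    visible += 1   # byte belongs to the image itself
                else:
                    padding += 1   # byte lies between width and pitch
    return visible, padding


def zero_padding(buf: bytes, width_bytes: int, pitch: int, height: int) -> bytes:
    """Return a copy of the plane with every padding byte set to zero."""
    out = bytearray(buf)
    for row in range(height):
        start = row * pitch + width_bytes
        end = (row + 1) * pitch
        out[start:end] = bytes(end - start)
    return bytes(out)
```

If only the padding count is non-zero at the input of a stage while the visible count becomes non-zero at its output, that would suggest the stage reads the padding. The same split could be extended to rows beyond the visible height if the allocation is taller than the image.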

We don’t have an explanation yet, so we are brainstorming about what could cause the slight end-score fluctuation under DriveOS 6 when CUDA operations are involved. An unexpected side effect of the changed pitch seems to be a good candidate. Do you have any idea what we should double-check? We have already double- and triple-checked the synchronization, so that should not be the problem.

Thank you,
Adam

How do you calculate the quality score? Is this method applicable to any of our sample applications to reproduce the issue? Additionally, have you tried using Nsight to observe whether it provides any insights into the root cause of the score fluctuation?