HDMI flickers on demo kit when running a GPU stress test

Hi Wayne,

This week we applied the updated patch on R32.3.1 and ran the test. The issue happened after 1 to 2 days. Can you help check it?

Bill

Hi Bill,

Could you apply that patch on rel-32.4.2 and see if you can still hit it?

Also, please check the dmesg to confirm whether it is the same error or a new issue.
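For comparing logs between runs, a small filter can pull the relevant lines out of a saved dmesg dump. This is only a sketch: the keyword pattern (`nvgpu`, `hdmi`) is an assumption and should be adjusted to whatever messages appeared in the earlier report.

```python
import re

# Assumed keywords; replace with the strings seen in the earlier error log.
DEFAULT_PATTERN = re.compile(r"nvgpu|hdmi", re.IGNORECASE)

def filter_dmesg(text, pattern=DEFAULT_PATTERN):
    """Return the log lines that match the pattern, in their original order."""
    return [line for line in text.splitlines() if pattern.search(line)]

def new_messages(old_text, new_text, pattern=DEFAULT_PATTERN):
    """Return matching lines from new_text whose message part (after the
    leading "[timestamp]") did not appear in old_text."""
    def message(line):
        # Drop a leading "[   12.345678] " timestamp if present.
        return re.sub(r"^\[\s*\d+\.\d+\]\s*", "", line)
    seen = {message(line) for line in filter_dmesg(old_text, pattern)}
    return [line for line in filter_dmesg(new_text, pattern)
            if message(line) not in seen]
```

Calling `new_messages(open("old.log").read(), open("error.log").read())` (file names are placeholders) lists the matching messages that are present only in the newer log, which is one way to tell a new issue from the same error.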

Hi WayneWWW,

We are testing R32.4.2 with the patch.

However, the nvgpu.ko we built is about 88M.
The original nvgpu.ko is around 2.4M.

We see the same size difference on both R32.4.2 and R32.3.1.

Do you see the same size difference?

Thank you,

Hi HuiW,

That big size difference is because you didn’t strip the symbols from your nvgpu.ko.

https://docs.nvidia.com/jetson/l4t/index.html#page/Tegra%2520Linux%2520Driver%2520Package%2520Development%2520Guide%2Fkernel_custom.html%23wwpID0EUHA
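A quick way to tell whether a given nvgpu.ko still carries debug information is to look for debug section names inside the file. The sketch below is a heuristic, not a full ELF parser: it assumes an unstripped module contains the section name `.debug_info` as a plain string, which is how ELF section names are normally stored. The stripping step itself is covered by the guide linked above.

```python
def looks_unstripped(data: bytes) -> bool:
    """Heuristic check: True if the bytes look like an ELF file that still
    contains a .debug_info section name (i.e. debug info was not stripped)."""
    is_elf = data[:4] == b"\x7fELF"
    return is_elf and b".debug_info" in data

def describe_module(path):
    """Return (size in bytes, looks_unstripped) for a module file."""
    with open(path, "rb") as f:
        data = f.read()
    return len(data), looks_unstripped(data)
```

For example, `describe_module("nvgpu.ko")` (path is a placeholder) would be expected to report `True` for the 88M build and `False` for the 2.4M original, if the size difference is indeed due to debug symbols.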

Hi WayneWWW,

Thank you for your prompt support.
So the .ko file with symbols should not affect the patch test, right?

Thanks,

It should not, but it is better to strip it.

Dear Wayne,

We tried R32.4.2 (with the updated patch) and the same issue happened.

Bill

Hi Bill,

As I said, please also share the dmesg so that we can confirm this issue is the same as before.
Also, what is the repro rate? How long did it take to hit this issue?

error.log (109.7 KB)

Dear Wayne,

On R32.4.2, we reproduced the issue within one day.

Bill

Hi Bill,

Thanks for sharing. I also want to make sure: you’ve replaced the nvgpu.ko, right?

Dear Wayne,

Yes, we tested with the new nvgpu.ko, and this is the result of that test.

Bill

We will check. Thanks for the report.

Hi Bill,

We verified this patch on rel-32.4.2 again. It ran for 22 hours and no error was shown.
I notice your device is running at a very high temperature. Could you put a fan on it and try again? I don’t want any throttling to affect the test.
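To keep an eye on temperature during a long run, the standard Linux thermal sysfs interface can be polled. The sketch below assumes the usual layout (`/sys/class/thermal/thermal_zone*/type` and `temp`, with `temp` in millidegrees Celsius); the zone names on a given board are whatever the kernel reports and are not assumed here.

```python
from pathlib import Path

def read_thermal_zones(base="/sys/class/thermal"):
    """Return {zone type: temperature in degrees Celsius} for each readable zone."""
    temps = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            name = (zone / "type").read_text().strip()
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # skip zones that cannot be read or parsed
        temps[name] = millideg / 1000.0
    return temps

def hottest(temps):
    """Return the (zone, temperature) pair with the highest reading, or None."""
    return max(temps.items(), key=lambda kv: kv[1]) if temps else None
```

Logging `hottest(read_thermal_zones())` every few seconds alongside the stress test makes it easier to see whether the failure lines up with a temperature peak.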

Also, could you add a print in nvgpu so that we can confirm that you really replaced the kernel module?
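As a complementary check to the print, the file you built can be compared against the file installed on the device by checksum. This is a sketch under assumptions: both paths are placeholders, and the comparison only makes sense for the exact file that was copied (stripping the module changes its checksum). It does not show that the running kernel has loaded the new module; that is what the print is for.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_file_contents(built_path, installed_path):
    """True if both files have identical contents."""
    return sha256_of(built_path) == sha256_of(installed_path)
```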

Hi Bill and HuiW,

Have you checked the temperature issue?

Dear Wayne,

We added a fan and reduced the temperature, but the issue still happened after 2 days.

Bill

error.log (128.3 KB)

Hi Bill,

Does it always take 2 days to reproduce? So far we have not run the test for that long.

Dear Wayne,

We tried again and reproduced it within 5 hours (with the fan on).

Bill

Hi Bill,

Could you try to reproduce this issue with the patch on multiple Nano devices? For example, B01/A02 and a production module. It is fine to use a custom carrier board for this test.

I just checked and noticed our test has also been running for 2 days with no issue.

Dear Wayne,

We tried two B01 modules and reproduced the same issue on both. Running for 4 hours (power mode = MaxN) is enough to reproduce it.

Bill

Do you have an A02 module to run the test on?