gk20a error

How many resets were run for this test?

Hello WayneWWW,
“tegradc tegradc.0: Display timing doesn’t meet restrictions.” comes from the eDP port.
After the TK1 reads the EDID of the monitor on the eDP port, it tries to determine the output timing for that monitor.
It uses check_mode_timings() to check whether the timing is OK.
If the timing is not OK, it prints “tegradc tegradc.0: Display timing doesn’t meet restrictions.” and tries another timing.
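
The retry behaviour described here can be sketched as follows. This is only an illustration in plain C: `struct mode_sketch`, `mode_meets_restrictions()`, `pick_first_valid_mode()` and the pixel-clock cap are hypothetical names and rules, and they do not reproduce the actual checks inside check_mode_timings().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified mode description; not the real driver struct. */
struct mode_sketch {
	unsigned int h_active;  /* visible pixels per line */
	unsigned int v_active;  /* visible lines per frame */
	unsigned long pclk_hz;  /* pixel clock in Hz */
};

/* Assumed restriction, for illustration only: nonzero size, pclk under a cap. */
int mode_meets_restrictions(const struct mode_sketch *m,
			    unsigned long max_pclk_hz)
{
	return m->h_active > 0 && m->v_active > 0 &&
	       m->pclk_hz > 0 && m->pclk_hz <= max_pclk_hz;
}

/* Return the index of the first acceptable mode, or -1 if none passes. */
int pick_first_valid_mode(const struct mode_sketch *modes, size_t count,
			  unsigned long max_pclk_hz)
{
	assert(modes != NULL);
	for (size_t i = 0; i < count; i++) {
		if (mode_meets_restrictions(&modes[i], max_pclk_hz))
			return (int)i;
		/* A rejected candidate is logged, then the next one is tried. */
		fprintf(stderr, "Display timing doesn't meet restrictions.\n");
	}
	return -1;
}
```

In this sketch the message is printed once per rejected candidate, so it can appear in the log even when a later candidate is accepted and the display works.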

We also see the message “tegradc tegradc.0: Display timing doesn’t meet restrictions.” when the system is OK (both monitors display, no gk20a message).

P.S. My colleague ran a 300-reset test and the error happened once, on the 265th reset. I also ran a 300-reset test and it did not happen. The probability is very low.
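
To put these numbers in context, here is a minimal sketch of how likely a fully clean series is for a rare failure. It assumes that resets are independent and that the failure rate is constant, and it takes 1 failure in 600 resets (the two series combined) as a rough estimate. `prob_all_pass()` is a hypothetical helper, not part of any test tool.

```c
#include <assert.h>

/* Probability that `runs` independent resets all pass when each reset
 * fails with probability p_fail, i.e. (1 - p_fail)^runs. */
double prob_all_pass(double p_fail, unsigned int runs)
{
	assert(p_fail >= 0.0 && p_fail <= 1.0);

	double p = 1.0;
	for (unsigned int i = 0; i < runs; i++)
		p *= 1.0 - p_fail;
	return p;
}
```

Under those assumptions, `prob_all_pass(1.0 / 600.0, 300)` comes out at roughly 0.6, so a 300-reset series without any failure is the more likely outcome even if the bug is still present.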

Thanks!

I am not sure what your system design is (how many displays are there?). If the display timing is invalid, it may work sometimes and fail at other times.

Does this error also happen when no display is connected?

Hello WayneWWW,
We connect two monitors to the K1’s HDMI port and eDP port.
PS1. If the display timing is invalid, only the monitor on the eDP port should fail; the monitor on the HDMI port should still work normally.
PS2. Are our “fifo_gk20a.c” and “gr_gk20a.h” correct?

Thanks!

Yes, your files are the same as mine.

Hi Tim,

Please also apply the following patch for this.

---

diff --git a/drivers/gpu/nvgpu/gk20a/pmu_gk20a.c b/drivers/gpu/nvgpu/gk20a/pmu_gk20a.c
index 3abf1b4..c23d83c 100644
--- a/drivers/gpu/nvgpu/gk20a/pmu_gk20a.c
+++ b/drivers/gpu/nvgpu/gk20a/pmu_gk20a.c
@@ -1227,7 +1227,7 @@
 	u32 data, owner, max_retry;
 
 	if (!pmu->initialized)
-		return -EINVAL;
+		return 0;
 
 	BUG_ON(!token);
 	BUG_ON(!PMU_MUTEX_ID_IS_VALID(id));
@@ -1296,7 +1296,7 @@
 	u32 owner, data;
 
 	if (!pmu->initialized)
-		return -EINVAL;
+		return 0;
 
 	BUG_ON(!token);
 	BUG_ON(!PMU_MUTEX_ID_IS_VALID(id));
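
As a rough illustration of what this change means for callers, the sketch below models only the early return shown in the diff. `struct pmu_sketch`, `sketch_mutex_acquire()` and `sketch_caller_proceeds()` are hypothetical stand-ins written for user space; they are not the driver's code, and the `patched` flag exists only to compare the two behaviours side by side.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pmu_gk20a; only the flag used here. */
struct pmu_sketch {
	bool initialized;
};

/* Mirrors only the early return of the patched functions. `patched`
 * selects the behaviour after the patch (0) or before it (-EINVAL). */
int sketch_mutex_acquire(const struct pmu_sketch *pmu, uint32_t *token,
			 bool patched)
{
	if (!pmu->initialized)
		return patched ? 0 : -EINVAL;

	assert(token != NULL);  /* stands in for BUG_ON(!token) */
	*token = 1;             /* placeholder for the real hardware token */
	return 0;
}

/* Hypothetical caller that gives up when the acquire reports an error. */
bool sketch_caller_proceeds(const struct pmu_sketch *pmu, bool patched)
{
	uint32_t token = 0;

	return sketch_mutex_acquire(pmu, &token, patched) == 0;
}
```

In this model, a caller that treats a nonzero return as a failure stops when the PMU is not yet initialized before the patch, and continues (without holding any PMU mutex) after it.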

Hello WayneWWW,
Thanks! We will try this patch.

Hello WayneWWW,
Our pmu_gk20a.c seems to be different. Our kernel code is based on Git tag tegra-l4t-r21.6.
Is our pmu_gk20a.c correct?

Thanks!
pmu_gk20a.c (94.4 KB)

Do you have a conflict? Please apply the new patch on top of the previous patch I gave you.

Hello WayneWWW,
We can build zImage successfully, but here is our patch:

diff --git a/kernel/drivers/gpu/nvgpu/gk20a/pmu_gk20a.c b/kernel/drivers/gpu/nvgpu/gk20a/pmu_gk20a.c
index 4899319..2691013 100644
--- a/kernel/drivers/gpu/nvgpu/gk20a/pmu_gk20a.c
+++ b/kernel/drivers/gpu/nvgpu/gk20a/pmu_gk20a.c
@@ -1157,7 +1157,7 @@ int pmu_mutex_acquire(struct pmu_gk20a *pmu, u32 id, u32 *token)
 	u32 data, owner, max_retry;
 
 	if (!pmu->initialized)
-		return -EINVAL;
+		return 0;
 
 	BUG_ON(!token);
 	BUG_ON(!PMU_MUTEX_ID_IS_VALID(id));
@@ -1226,7 +1226,7 @@ int pmu_mutex_release(struct pmu_gk20a *pmu, u32 id, u32 *token)
 	u32 owner, data;
 
 	if (!pmu->initialized)
-		return -EINVAL;
+		return 0;
 
 	BUG_ON(!token);
 	BUG_ON(!PMU_MUTEX_ID_IS_VALID(id));
=======================================================================================================
In our tree the hunks are at lines 1157 and 1226, but in your patch they are at lines 1227 and 1296. So I am wondering whether there is anything wrong with our pmu_gk20a.c. Our pmu_gk20a.c is the file attached in comment #28.
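
Both hunks are shifted by the same amount (1227 - 1157 = 70 and 1296 - 1226 = 70), which is what one would expect if one file simply has 70 more lines ahead of these functions. A minimal sketch of that comparison, using a hypothetical helper `hunk_old_start()` that reads the start line from a unified-diff hunk header:

```c
#include <assert.h>
#include <stdio.h>

/* Parse the old-file start line from a unified-diff hunk header such as
 * "@@ -1157,7 +1157,7 @@". Returns 0 on success, -1 if the header is
 * malformed. */
int hunk_old_start(const char *header, unsigned int *start)
{
	unsigned int old_len, new_start, new_len;

	assert(header != NULL && start != NULL);
	if (sscanf(header, "@@ -%u,%u +%u,%u @@",
		   start, &old_len, &new_start, &new_len) != 4)
		return -1;
	return 0;
}
```

Calling it on the two headers of each patch and subtracting the results gives the shift per hunk. An equal shift for every hunk suggests extra lines earlier in the file; it does not prove that the rest of the file matches.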

Thanks!

Hi Tim2016,

Sorry about this. Our engineer picked some changes from the rel-24.2.1 kernel. If it builds, I think it is fine.

Hi Tim2016,

Have you resolved the diff issue? Are there any results you can share?

Thanks

Hello,
We are still testing it. We will let you know the result later.

Thanks!

Hello,
We still have the gk20a error; please refer to the attached file “log_20171228_gk20a.txt”.

Thanks!
log_20171228_gk20a.txt (52.8 KB)

It looks like yet another new error.

Since this bug is not converging, I went through this topic again and have the following questions:

  1. Is your test environment a devkit or a custom board? In a previous comment you said you are on a customized carrier board; could you move back to the devkit and retest? If the issue can be reproduced there, could you share your test method or scripts?

  2. The latest error message, does it cause system to fail?

  3. You already got a result of 1 failure in 700 runs on the second attempt. How many runs are needed to call the test “passed”? I believe TK1 was not put through such a test internally.

  4. Does every TK1 module you have hit this error, or only a specific one?

Hello WayneWWW,

  1. Is your test environment a devkit or a custom board? In a previous comment you said you are on a customized carrier board; could you move back to the devkit and retest? If the issue can be reproduced there, could you share your test method or scripts?
    We are on a customized carrier board.

  2. The latest error message, does it cause system to fail?
    The two monitors connected to our board show nothing, but we can still reach a shell over UART. We can restart the system by pressing the reset key.

  3. You already got a result of 1 failure in 700 runs on the second attempt. How many runs are needed to call the test “passed”? I believe TK1 was not put through such a test internally.
    In the beginning, we only ran 20 “power off → power on” tests and 20 “press hardware reset key” tests.
    But we hit this problem once during those tests, so we are trying to fix it and have increased the number of test runs to make sure the problem is completely resolved.
    It’s not an urgent issue; we just want the best fix if you have a solution.

Besides, so far the patch in comment #18 seems to be the best. Do you recommend that we use it?
(The patch in comment #26 just returns 0 instead of an error number; is it better than the patch in comment #18?)

  4. Does every TK1 module you have hit this error, or only a specific one?
    It happens on more than one of our customized boards.
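
On the question of how many runs count as a “passed” test, one common way to frame it is to ask how many consecutive clean resets would be needed before a bug with a given failure rate would almost certainly have shown up. The sketch below assumes independent resets with a constant failure rate; `runs_needed()` is a hypothetical helper and the 95% level is only an example.

```c
#include <assert.h>

/* Smallest number n of consecutive passing resets such that a bug with
 * per-reset failure probability p_fail would have appeared at least once
 * with the given confidence, i.e. (1 - p_fail)^n <= 1 - confidence. */
unsigned int runs_needed(double p_fail, double confidence)
{
	assert(p_fail > 0.0 && p_fail < 1.0);
	assert(confidence > 0.0 && confidence < 1.0);

	double p_all_pass = 1.0;
	unsigned int n = 0;

	while (p_all_pass > 1.0 - confidence) {
		p_all_pass *= 1.0 - p_fail;
		n++;
	}
	return n;
}
```

For a failure rate of about 1 in 700, `runs_needed(1.0 / 700.0, 0.95)` gives on the order of 2,100 clean resets, far more than the 20 + 20 runs of the initial test plan.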

Thanks!

Hi Tim2016,

It is better to reproduce your issue on the devkit so that our dev team can fully support it.
Please pick the most stable patch you have tried. Thanks.