gk20a error

WayneWWW · December 14, 2017, 3:07am

How many resets does this test run?

Tim2016 · December 14, 2017, 3:43am

Hello WayneWWW,
“tegradc tegradc.0: Display timing doesn’t meet restrictions.” comes from the edp port.
After the tk1 reads the EDID of the edp port monitor, it tries to determine the output timing for that monitor.
It uses check_mode_timings() to check whether the timing is OK.
If the timing is not OK, it prints “tegradc tegradc.0: Display timing doesn’t meet restrictions.” and tries another timing.
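
The fallback described above can be sketched roughly as follows. This is only an illustration, not the tegradc code: the `demo_mode` struct, `demo_mode_ok()`, `demo_pick_mode()` and the pixel-clock limit are all hypothetical stand-ins, and the real check_mode_timings() applies its own restrictions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified mode description (not the real tegradc struct). */
struct demo_mode {
	unsigned int pclk_khz;	/* pixel clock in kHz */
	unsigned int h_active;	/* active pixels per line */
	unsigned int v_active;	/* active lines per frame */
};

/* Stand-in for the driver's timing check; the limit is an assumption. */
static bool demo_mode_ok(const struct demo_mode *m, unsigned int max_pclk_khz)
{
	return m->pclk_khz != 0 && m->pclk_khz <= max_pclk_khz &&
	       m->h_active != 0 && m->v_active != 0;
}

/*
 * Walk the candidate modes in order and return the index of the first one
 * that passes the check, or -1 if none does. *rejected counts the modes
 * skipped along the way; each skip corresponds to one "doesn't meet
 * restrictions" message in the log.
 */
static int demo_pick_mode(const struct demo_mode *modes, size_t count,
			  unsigned int max_pclk_khz, unsigned int *rejected)
{
	size_t i;

	assert(modes != NULL && rejected != NULL);
	*rejected = 0;
	for (i = 0; i < count; i++) {
		if (demo_mode_ok(&modes[i], max_pclk_khz))
			return (int)i;
		(*rejected)++;
	}
	return -1;
}
```

Under this model a rejected timing is harmless as long as a later candidate passes, which would be consistent with the message also appearing on boots where both monitors work.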

We also see the message “tegradc tegradc.0: Display timing doesn’t meet restrictions.” when the system is OK (both monitors display, no gk20a message).

ps. My colleague ran 300 reset tests and the error happened once, on the 265th test. I also ran 300 reset tests and it didn’t happen. The probability is very low.

Thanks!

WayneWWW · December 14, 2017, 3:47am

I am not sure what your system design is (how many displays are there?). If the display timing is invalid, it may work sometimes and fail at other times.

Does this error also happen when no display is connected?

Tim2016 · December 14, 2017, 3:58am

Hello WayneWWW,
We connect two monitors, one to the K1’s hdmi port and one to its edp port.
ps1. If the display timing is invalid, it should only cause the edp port monitor to fail (NG); the hdmi port monitor should still be normal.
ps2. Are our “fifo_gk20a.c” and “gr_gk20a.h” correct?

Thanks!

WayneWWW · December 14, 2017, 4:28am

Yes, your files are the same as mine.

WayneWWW · December 19, 2017, 10:09am

Hi Tim,

Please also add the following patch for this.

---

diff --git a/drivers/gpu/nvgpu/gk20a/pmu_gk20a.c b/drivers/gpu/nvgpu/gk20a/pmu_gk20a.c
index 3abf1b4..c23d83c 100644
--- a/drivers/gpu/nvgpu/gk20a/pmu_gk20a.c
+++ b/drivers/gpu/nvgpu/gk20a/pmu_gk20a.c
@@ -1227,7 +1227,7 @@
 	u32 data, owner, max_retry;
 
 	if (!pmu->initialized)
-		return -EINVAL;
+		return 0;
 
 	BUG_ON(!token);
 	BUG_ON(!PMU_MUTEX_ID_IS_VALID(id));
@@ -1296,7 +1296,7 @@
 	u32 owner, data;
 
 	if (!pmu->initialized)
-		return -EINVAL;
+		return 0;
 
 	BUG_ON(!token);
 	BUG_ON(!PMU_MUTEX_ID_IS_VALID(id));
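
What the patch changes from a caller's point of view can be sketched as below. This is a minimal mock under stated assumptions, not nvgpu code: every `demo_*` name is hypothetical, the "initialized" path is reduced to returning 0 (the real functions go on to take or release a hardware mutex), and the caller is assumed to treat any non-zero return as a failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the PMU state; only the flag matters here. */
struct demo_pmu {
	bool initialized;
};

/* Behavior before the patch: an uninitialized PMU is reported as an error. */
static int demo_acquire_before(const struct demo_pmu *pmu)
{
	assert(pmu != NULL);
	if (!pmu->initialized)
		return -EINVAL;
	return 0;	/* placeholder for the real mutex handshake */
}

/* Behavior after the patch: an uninitialized PMU is reported as success. */
static int demo_acquire_after(const struct demo_pmu *pmu)
{
	assert(pmu != NULL);
	if (!pmu->initialized)
		return 0;
	return 0;	/* placeholder for the real mutex handshake */
}

typedef int (*demo_acquire_fn)(const struct demo_pmu *pmu);

/* Assumed caller pattern: continue only when acquire reports success. */
static bool demo_caller_proceeds(const struct demo_pmu *pmu,
				 demo_acquire_fn acquire)
{
	return acquire(pmu) == 0;
}
```

With `initialized == false`, `demo_caller_proceeds()` returns false for the "before" variant and true for the "after" variant, so under these assumptions the patch turns an early-boot error path into a no-op. How the real callers react depends on the driver version in use.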

Tim2016 · December 20, 2017, 1:03am

Hello WayneWWW,
Thanks! We will try this patch.

Tim2016 · December 20, 2017, 1:46am

Hello WayneWWW,
Our pmu_gk20a.c seems different. Our kernel code is based on Git tag: tegra-l4t-r21.6
Is our pmu_gk20a.c correct?

Thanks!
pmu_gk20a.c (94.4 KB)

WayneWWW · December 20, 2017, 7:21am

Do you have a conflict? Please apply the new patch on top of the previous patch I gave.

Tim2016 · December 20, 2017, 7:53am

Hello WayneWWW,
We can build zImage successfully, but here is our patch:

diff --git a/kernel/drivers/gpu/nvgpu/gk20a/pmu_gk20a.c b/kernel/drivers/gpu/nvgpu/gk20a/pmu_gk20a.c
index 4899319..2691013 100644
--- a/kernel/drivers/gpu/nvgpu/gk20a/pmu_gk20a.c
+++ b/kernel/drivers/gpu/nvgpu/gk20a/pmu_gk20a.c
@@ -1157,7 +1157,7 @@ int pmu_mutex_acquire(struct pmu_gk20a *pmu, u32 id, u32 *token)
 	u32 data, owner, max_retry;
 
 	if (!pmu->initialized)
-		return -EINVAL;
+		return 0;
 
 	BUG_ON(!token);
 	BUG_ON(!PMU_MUTEX_ID_IS_VALID(id));
@@ -1226,7 +1226,7 @@ int pmu_mutex_release(struct pmu_gk20a *pmu, u32 id, u32 *token)
 	u32 owner, data;
 
 	if (!pmu->initialized)
-		return -EINVAL;
+		return 0;
 
 	BUG_ON(!token);
 	BUG_ON(!PMU_MUTEX_ID_IS_VALID(id));
=======================================================================================================
In our patch the hunks are at lines 1157 and 1226, but in your patch they are at lines 1227 and 1296. So I am wondering whether there is anything wrong with our pmu_gk20a.c. Our pmu_gk20a.c is the file attached in comment #28.

Thanks!

WayneWWW · December 20, 2017, 8:06am

Hi Tim2016,

Sorry about this. Our engineer picked some changes from the rel-24.2.1 kernel. If it builds, I think it is fine.

kayccc · December 28, 2017, 3:00am

Hi Tim2016,

Have you resolved the diff issue? Do you have any results you can share?

Thanks

Tim2016 · December 28, 2017, 3:09am

Hello,
We are still testing it. We will let you know the result later.

Thanks!

Tim2016 · December 28, 2017, 7:22am

Hello,
We still have the gk20a error; please refer to the attached file “log_20171228_gk20a.txt”.

Thanks!
log_20171228_gk20a.txt (52.8 KB)

WayneWWW · December 28, 2017, 7:44am

It looks like a new error again.

Since we are not converging on this bug, I went through this topic again and have the following questions:

Is your test environment a devkit or a custom board? In a previous comment you said you are on a customized carrier board; could you move back to the devkit and retest? If it can be reproduced, could you share your test method or scripts?
Does the latest error message cause the system to fail?
You already had a 1/700 test result in the second attempt; how many runs are needed to call it a “passed” test? I believe tk1 did not have such a test internally.
Does every tk1 module you have hit this error, or only a specific one?
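
One way to put a number on the "how many runs" question is sketched below. It assumes every reset is an independent trial with a fixed failure probability `p`, which may well not hold for a timing-dependent hardware issue; the function name `demo_runs_needed()` and the 95% confidence level are illustrative choices, not anything from the test plan in this thread.

```c
#include <assert.h>

/*
 * Smallest number of consecutive clean runs n such that a failure rate of
 * p would have produced at least one failure with the given confidence,
 * i.e. the smallest n with (1 - p)^n <= 1 - confidence.
 * Computed by repeated multiplication so no math library is needed.
 */
static unsigned int demo_runs_needed(double p, double confidence)
{
	double all_pass = 1.0;	/* chance that n runs pass despite rate p */
	unsigned int n = 0;

	assert(p > 0.0 && p < 1.0);
	assert(confidence > 0.0 && confidence < 1.0);

	while (all_pass > 1.0 - confidence) {
		all_pass *= 1.0 - p;
		n++;
	}
	return n;
}
```

For example, `demo_runs_needed(1.0 / 700.0, 0.95)` comes out at roughly 2,100 runs, close to the common "rule of three" estimate of 3/p. In other words, under this model a few hundred clean resets cannot rule out a 1/700 failure rate.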

Tim2016 · December 29, 2017, 3:39am

Hello WayneWWW,

Is your test environment a devkit or a custom board? In a previous comment you said you are on a customized carrier board; could you move back to the devkit and retest? If it can be reproduced, could you share your test method or scripts?
We are on a customized carrier board.
Does the latest error message cause the system to fail?
The two monitors connected to our board don’t display anything, but we can enter the shell through UART. We can restart the system by pressing the reset key.
You already had a 1/700 test result in the second attempt; how many runs are needed to call it a “passed” test? I believe tk1 did not have such a test internally.
In the beginning, we only did 20 tests for “power off → power on” and 20 tests for “press hardware reset key”.
But we encountered this problem once in one of those tests. So we are trying to fix this problem and have increased the number of tests to make sure it is resolved completely.
It’s not an urgent issue; we just want to do the best we can if you have a solution.

Besides, so far the patch in comment #18 seems to be the best. Do you recommend that we use this patch?
(The patch in comment #26 just returns 0 instead of an error number; is it better than the patch in comment #18?)

Does every tk1 module you have hit this error, or only a specific one?
It happens on more than one of our customized boards.

Thanks!

WayneWWW · January 2, 2018, 7:14am

Hi Tim2016,

It would be better to reproduce your issue on the devkit, so that our dev team can fully support it.
Please select the most stable patch you’ve tried. Thanks.
