Hello,
We connected two monitors to the K1's HDMI and eDP ports and ran a hardware reset test.
Normally both monitors display the Ubuntu desktop, but in one run neither monitor displayed anything (this happened once in 300 tests).
The dmesg output for this failure is recorded in the attached file log1116.txt.
It looks like a GPU error is preventing the kernel from booting successfully.
Is this a known issue, and how can we fix it?
Hello,
I have added a print message in gk20a_pmu_enable_elpg().
The kernel now prints that message when it boots successfully.
Is that OK?
Do you want to check whether the kernel also prints the message when the boot fails as in log1116.txt?
Hello,
Both monitors failed to display again after 225 tests.
The output captured over RS-232 is recorded in the attached file “log1124.txt”.
“20171123.gk20a_pmu_enable_elpg()…” is the message I added in gk20a_pmu_enable_elpg().
The last message is “gk20a gk20a.0: gr_gk20a_wait_idle: timeout, ctxsw busy : 0, gr busy : 1”, and no further messages were printed for the next 10 minutes.
Then I pressed a key on the PC keyboard, and the shell prompt “ubuntu@tegra-ubuntu:~$” appeared, as shown at line 883 of “log1124.txt”.
Hi,
Please try the following patch to see if the error still occurs.
diff --git a/drivers/gpu/nvgpu/gk20a/fifo_gk20a.c b/drivers/gpu/nvgpu/gk20a/fifo_gk20a.c
index fe29beb..2d48114 100644
--- a/drivers/gpu/nvgpu/gk20a/fifo_gk20a.c
+++ b/drivers/gpu/nvgpu/gk20a/fifo_gk20a.c
@@ -3,7 +3,7 @@
*
* GK20A Graphics FIFO (gr host)
*
- * Copyright (c) 2011-2015, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2011-2016, NVIDIA CORPORATION. All rights reserved.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms and conditions of the GNU General Public License,
@@ -1072,12 +1072,15 @@
" deferring channel recovery to channel free");
/* clear interrupt */
gk20a_writel(g, fifo_intr_mmu_fault_id_r(), fault_id);
- return verbose;
+ goto exit_enable;
}
/* resetting the engines and clearing the runlists is done in
a separate function to allow deferred reset. */
fifo_gk20a_finish_mmu_fault_handling(g, fault_id);
+
+exit_enable:
+ gk20a_pmu_enable_elpg(g);
return verbose;
}
diff --git a/drivers/gpu/nvgpu/gk20a/gr_gk20a.h b/drivers/gpu/nvgpu/gk20a/gr_gk20a.h
index 526eefb..b973338 100644
--- a/drivers/gpu/nvgpu/gk20a/gr_gk20a.h
+++ b/drivers/gpu/nvgpu/gk20a/gr_gk20a.h
@@ -365,7 +365,10 @@
int err = 0; \
if (support_gk20a_pmu()) \
err = gk20a_pmu_disable_elpg(g); \
- if (err) return err; \
+ if (err) { \
+ gk20a_pmu_enable_elpg(g); \
+ return err; \
+ } \
err = func; \
if (support_gk20a_pmu()) \
gk20a_pmu_enable_elpg(g); \
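The control flow that the second hunk aims for can be sketched outside the kernel as follows. This is only an illustration under stated assumptions: `struct mock_gpu`, `mock_pmu_disable_elpg()`, `mock_pmu_enable_elpg()`, `elpg_protected_call()` and `mock_work()` are hypothetical stand-ins, not nvgpu code, and the mock assumes that a disable request is counted even when the disable call returns an error. The `support_gk20a_pmu()` checks are left out for brevity, and a plain function is used instead of the driver's macro so that it compiles as standard C.

```c
#include <assert.h>

/* Hypothetical stand-in for the driver's struct gk20a; not the real type. */
struct mock_gpu {
	int elpg_disable_depth; /* outstanding ELPG-disable requests */
	int disable_fails;      /* set to 1 to simulate a failing disable */
};

/* Mock: records the request, then optionally reports failure (assumption). */
static int mock_pmu_disable_elpg(struct mock_gpu *g)
{
	g->elpg_disable_depth++;
	return g->disable_fails ? -1 : 0;
}

/* Mock: drops one outstanding disable request. */
static void mock_pmu_enable_elpg(struct mock_gpu *g)
{
	assert(g->elpg_disable_depth > 0);
	g->elpg_disable_depth--;
}

/* Same control flow as the patched gr_gk20a_elpg_protected_call. */
static int elpg_protected_call(struct mock_gpu *g,
			       int (*func)(struct mock_gpu *))
{
	int err = mock_pmu_disable_elpg(g);

	if (err) {
		mock_pmu_enable_elpg(g); /* the call the patch adds */
		return err;
	}
	err = func(g);
	mock_pmu_enable_elpg(g);
	return err;
}

/* Placeholder for the protected work, e.g. an interrupt handler body. */
static int mock_work(struct mock_gpu *g)
{
	(void)g;
	return 0;
}
```

Calling `elpg_protected_call(&gpu, mock_work)` leaves `elpg_disable_depth` at 0 whether or not the mock disable fails. Without the added enable call on the error path, the failing case would return with the depth stuck at 1 in this model.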
Hello WayneWWW,
We applied your patch and still see the gk20a problem (it happened once in 700 tests).
The dmesg output is recorded in the attached file hdminodisplay_reset.txt.
Hello WayneWWW,
I get the following compile errors when I use this patch:
/home/tim/pcpartner/temp/tk1_r21-4/kernel/drivers/gpu/nvgpu/gk20a/gk20a.c: In function ‘gk20a_intr_thread_stall’:
/home/tim/pcpartner/temp/tk1_r21-4/kernel/drivers/gpu/nvgpu/gk20a/gr_gk20a.h:366:7: error: too many arguments to function ‘support_gk20a_pmu’
if (support_gk20a_pmu(g->dev) && g->elpg_enabled) {
^
/home/tim/pcpartner/temp/tk1_r21-4/kernel/drivers/gpu/nvgpu/gk20a/gk20a.c:565:3: note: in expansion of macro ‘gr_gk20a_elpg_protected_call’
gr_gk20a_elpg_protected_call(g, gk20a_gr_isr(g));
^
In file included from /home/tim/pcpartner/temp/tk1_r21-4/kernel/drivers/gpu/nvgpu/gk20a/gk20a.c:51:0:
/home/tim/pcpartner/temp/tk1_r21-4/kernel/drivers/gpu/nvgpu/gk20a/gk20a.h:537:19: note: declared here
static inline int support_gk20a_pmu(void)
^
In file included from /home/tim/pcpartner/temp/tk1_r21-4/kernel/drivers/gpu/nvgpu/gk20a/channel_gk20a.h:36:0,
from /home/tim/pcpartner/temp/tk1_r21-4/kernel/drivers/gpu/nvgpu/gk20a/fifo_gk20a.h:24,
from /home/tim/pcpartner/temp/tk1_r21-4/kernel/drivers/gpu/nvgpu/gk20a/gk20a.h:40,
from /home/tim/pcpartner/temp/tk1_r21-4/kernel/drivers/gpu/nvgpu/gk20a/gk20a.c:51:
/home/tim/pcpartner/temp/tk1_r21-4/kernel/drivers/gpu/nvgpu/gk20a/gr_gk20a.h:374:7: error: too many arguments to function ‘support_gk20a_pmu’
if (support_gk20a_pmu(g->dev) && g->elpg_enabled)
^
/home/tim/pcpartner/temp/tk1_r21-4/kernel/drivers/gpu/nvgpu/gk20a/gk20a.c:565:3: note: in expansion of macro ‘gr_gk20a_elpg_protected_call’
gr_gk20a_elpg_protected_call(g, gk20a_gr_isr(g));
^
In file included from /home/tim/pcpartner/temp/tk1_r21-4/kernel/drivers/gpu/nvgpu/gk20a/gk20a.c:51:0:
/home/tim/pcpartner/temp/tk1_r21-4/kernel/drivers/gpu/nvgpu/gk20a/gk20a.h:537:19: note: declared here
static inline int support_gk20a_pmu(void)
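The errors say that the macro calls `support_gk20a_pmu(g->dev)` while gk20a.h in this tree declares it as `support_gk20a_pmu(void)`. A minimal sketch of the same condition written for the no-argument prototype is shown below. `struct mock_gk20a` and `elpg_call_needed()` are hypothetical names used only for illustration, the `elpg_enabled` field is assumed to exist because the compiler does not complain about it in the output above, and nothing in this thread confirms that simply dropping the argument is the intended fix.

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down stand-in for struct gk20a. */
struct mock_gk20a {
	void *dev;
	bool elpg_enabled; /* assumed field, see the note above */
};

/* Matches the prototype the compiler reports at gk20a.h:537. */
static inline int support_gk20a_pmu(void)
{
	return 1; /* mock value */
}

/* The failing condition, rewritten for the no-argument prototype. */
static inline bool elpg_call_needed(const struct mock_gk20a *g)
{
	return support_gk20a_pmu() && g->elpg_enabled;
}
```

In the real header the equivalent change would be made on the two lines the compiler points at, gr_gk20a.h:366 and gr_gk20a.h:374.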
Hello WayneWWW,
We used the combined patch but still have the same problem as in “hdminodisplay_reset.txt”.
The latest log is in the attached file “log_1214.txt”.
Are our “fifo_gk20a.c” and “gr_gk20a.h” correct?