I can’t answer all of it, but here is a start:
tegradc is the display side of the GPU (Tegra Display Controller). Perhaps the GPU is talking to the CPU?
For the others, take a look at “/proc/interrupts” (the right side column will mention something about the driver servicing that interrupt, which in turn suggests what it is).
I’m not positive, but I think gk20a is something specific to the Tegra System on a Chip (SoC).
Hi, I’m not only focused on tegradc, but also on irq/472-gk20a-s, irq/73-host-sync, and irq/110-b9500.
They are all under a real-time scheduling policy, because PR is < 0. They occupy a lot of CPU.
When the real-time threads occupy more than 200% CPU, the Xavier hangs and we must reboot it.
Someone else will need to answer specifics, but just some general information: Any of those IRQs shown in “/proc/interrupts” are hardware IRQs, meaning they are triggering the scheduler to get attention to service actual hardware. Priority is up to the scheduler (what you noted about PR being less than 0 does say this is “important”), but once the driver is in context it could reach “atomic” sections. If atomic, then the thread cannot be preempted until the atomic section exits. In most cases the hardware itself will determine when the atomic section exits. So a stall in a driver is different from simply taking a lot of time to service the hardware. The former is a bug, the latter is a performance problem.
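If it helps to see which IRQ is actually firing the most, here is a rough sketch (my own, not an NVIDIA tool) that parses the text of “/proc/interrupts” and compares two snapshots. The function names `parse_interrupts`, `busiest`, and `snapshot_delta` are made up for this example, and it assumes the usual layout of a `CPUn` header row followed by one `label: counts... description` row per IRQ.

```python
import time


def parse_interrupts(text):
    """Map each IRQ label to (total count over all CPUs, description)."""
    lines = text.strip("\n").splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    result = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        tokens = rest.split()
        counts = []
        # Per-CPU counters come first; rows such as ERR: have fewer of them.
        while tokens and len(counts) < ncpu and tokens[0].isdigit():
            counts.append(int(tokens.pop(0)))
        # Whatever is left is the right side column (controller, driver name).
        result[label.strip()] = (sum(counts), " ".join(tokens))
    return result


def busiest(before, after, top=5):
    """Return (delta, label, description) for the IRQs that grew the most."""
    deltas = [
        (after[k][0] - before.get(k, (0, ""))[0], k, after[k][1])
        for k in after
    ]
    return sorted(deltas, reverse=True)[:top]


def snapshot_delta(interval=1.0, path="/proc/interrupts"):
    """Read the file twice, `interval` seconds apart, and rank the IRQs."""
    with open(path) as f:
        before = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_interrupts(f.read())
    return busiest(before, after)
```

Running `for row in snapshot_delta(): print(row)` on the Jetson while the load is high should show whether the gk20a, host sync, or some other interrupt is the one climbing fastest. This only counts interrupts; it says nothing about how long each one takes to service.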
We don’t know why it is taking more time. It is highly likely, though, that this hardware must have the high priority. Changing priority could, for example, cause a priority inversion and a complete lockup of the system. The particular data and context of the physical hardware is something I can’t even guess at, so I can’t say if this is just heavy load or priority inversion or something else.
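Rather than changing any priority, it is safer to only look at which threads are under a real-time policy. A minimal read-only sketch, assuming Linux and Python 3: it walks “/proc/<pid>/task/<tid>” and asks the kernel for each thread’s policy with `os.sched_getscheduler`. The helper names `is_realtime` and `list_rt_threads` are my own, and the numeric policy values are the standard Linux ones (SCHED_FIFO is 1, SCHED_RR is 2).

```python
import os

# Linux scheduling policy numbers; hardcoded so the table is easy to read.
POLICY_NAMES = {0: "SCHED_OTHER", 1: "SCHED_FIFO", 2: "SCHED_RR",
                3: "SCHED_BATCH", 5: "SCHED_IDLE", 6: "SCHED_DEADLINE"}
SCHED_RESET_ON_FORK = 0x40000000  # flag bit that may be ORed into the policy


def is_realtime(policy):
    """True for SCHED_FIFO or SCHED_RR, ignoring the reset-on-fork flag."""
    return (policy & ~SCHED_RESET_ON_FORK) in (1, 2)


def list_rt_threads(proc="/proc"):
    """Return (tid, thread name, policy name) for every real-time thread."""
    found = []
    for pid in os.listdir(proc):
        if not pid.isdigit():
            continue
        task_dir = os.path.join(proc, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # the process exited while we were scanning
        for tid in tids:
            try:
                policy = os.sched_getscheduler(int(tid))
                with open(os.path.join(task_dir, tid, "comm")) as f:
                    name = f.read().strip()
            except OSError:
                continue  # thread vanished or access was denied
            if is_realtime(policy):
                base = policy & ~SCHED_RESET_ON_FORK
                found.append((int(tid), name, POLICY_NAMES.get(base, str(base))))
    return found
```

Calling `list_rt_threads()` and printing the result should list the threaded IRQ handlers (the `irq/...` names seen in top) along with anything else that runs under a real-time policy. It only reads state and changes nothing, so it cannot itself trigger the priority inversion mentioned above.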
Incidentally, this is just soft scheduling, not hard scheduling. The Cortex-A is not capable of true realtime scheduling. Whatever is going on in the gk20a IRQ starts from the Cortex-A, so it emphasizes that the problem might need to be separated into what is triggering the gk20a to run, versus what makes the gk20a take time. I don’t know. The gk20a load might not be due to the gk20a itself, and it might just be the symptom.
It would be interesting, though, to get a comment from NVIDIA on the gk20a being under high load. You’d want to specifically state your exact carrier board: whether this is the NVIDIA dev kit or something else. You’d also want to state your exact kernel and its configuration (you could simply state that it is stock and the kernel has never been modified; if modified, then you’d need details).