As you can see, it says “GPU has disappeared from bus!!”.
There are a couple of similar posts on this forum, but they were about desktop GPUs, not Jetson, and they pointed to GPU temperature as the cause. So I double-checked the GPU temperature, which was well below the threshold: 34 degrees Celsius in the most recent case (as reported by tegrastats).
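For anyone who wants to repeat the temperature check from a script, here is a minimal sketch that pulls the thermal zone readings out of one line of tegrastats output. The sample line is hypothetical (the exact fields vary by board and L4T release), and the 80 C limit is only a placeholder, not an official threshold.

```python
import re

# Hypothetical sample line shaped like tegrastats output on a Nano.
# The exact set of fields depends on the board and the L4T release.
SAMPLE = (
    "RAM 1903/3956MB (lfb 2x4MB) CPU [2%@102,1%@102,0%@102,0%@102] "
    "EMC_FREQ 0% GR3D_FREQ 0% PLL@33C CPU@35.5C PMIC@100C GPU@34C "
    "AO@40C thermal@34.75C"
)

# Matches tokens such as "GPU@34C" or "thermal@34.75C".
# CPU load tokens such as "2%@102" are skipped because "%" is not a word character.
ZONE_RE = re.compile(r"(\w+)@(\d+(?:\.\d+)?)C\b")


def parse_temps(line):
    """Return a dict mapping thermal zone name to temperature in Celsius."""
    return {name: float(value) for name, value in ZONE_RE.findall(line)}


def gpu_too_hot(line, limit_c=80.0):
    """Return True if the GPU zone is at or above limit_c (placeholder limit)."""
    temps = parse_temps(line)
    return temps.get("GPU", 0.0) >= limit_c


if __name__ == "__main__":
    print(parse_temps(SAMPLE))
    print("GPU too hot:", gpu_too_hot(SAMPLE))
```

Logging these values with timestamps right up to the moment of the failure would show whether temperature really can be ruled out.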
L4T is r32.3.1, running on a Nano carrier board, with our own LTE module (SIMCOM/Telit).
I would appreciate any pointers on what we should check, especially from a hardware design point of view.
This probably is not the issue, but if the device tree were wrong, then the GPU might disappear. If you’ve made any device tree changes, then you might check for conflicts with the GPU.
FYI, you are correct that the desktop GPU posts do not apply: desktop GPUs are PCI devices, but the Jetson GPU is integrated directly with the memory controller. Had the GPU been PCI, the device tree would not have been required to set up parts of the GPU (PCI allows a device to be queried, but devices that have no way to reply to requests for their details need the device tree).
Yes, we use our own pinmux configuration. But do you think the GPU could disappear only under a particular (unknown) condition? I will check our own pinmux spreadsheet.
You’ll need to add the content mentioned by @WayneWWW, but yes, an error in the device tree can make one device appear but another disappear. It depends on the conflict. Specs are only written for how things work when not in error, so there is no way to define how a device should behave under odd device tree error conditions. Just as a contrived example: if memory regions are reserved, and two devices are not intended to operate in the same memory region, then if both do, there is no telling how one will behave as the other puts all of its init into that memory region (and the device tree might determine this).
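To make the contrived example concrete, here is a minimal sketch that checks a list of (name, base, size) memory regions for overlaps. The device names and addresses are made up for illustration and are not taken from any real Jetson device tree.

```python
from itertools import combinations


def overlaps(a, b):
    """Return True if the half-open ranges [base, base + size) intersect."""
    _, a_base, a_size = a
    _, b_base, b_size = b
    return a_base < b_base + b_size and b_base < a_base + a_size


def find_conflicts(regions):
    """Return a list of (name, name) pairs whose memory regions overlap."""
    return [(a[0], b[0]) for a, b in combinations(regions, 2) if overlaps(a, b)]


# Hypothetical regions: (name, base address, size in bytes).
REGIONS = [
    ("dev_a", 0x10000000, 0x00200000),
    ("dev_b", 0x10100000, 0x00001000),  # falls inside dev_a's region
    ("dev_c", 0x20000000, 0x00000040),
]

if __name__ == "__main__":
    for first, second in find_conflicts(REGIONS):
        print(f"conflict: {first} and {second} share memory")
```

In a real check the regions would come from the `reg` properties of the decompiled device tree rather than a hand-written list.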
No. This does not depend on an application but on a particular device. What all of the affected devices have in common is that they are our LTE models.
No, as said above.
I got confirmation. Our Nano model just uses the default pinmux spreadsheet as-is to produce the dtb files. So, this should not be an issue.
Now I suspect that the radio waves from the LTE module somehow interfere electromagnetically with the signals under a particular condition, for example when the radio power is higher.
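One way to double-check that a generated device tree really matches the default is to decompile both dtb files to text (for example with `dtc -I dtb -O dts`) and diff the results. Here is a minimal sketch of the comparison step, assuming the two DTS texts are already available as strings; the node content in the sample is hypothetical.

```python
import difflib


def changed_lines(default_dts, custom_dts):
    """Return only the added/removed lines between two DTS texts."""
    diff = difflib.unified_diff(
        default_dts.splitlines(),
        custom_dts.splitlines(),
        fromfile="default.dts",
        tofile="custom.dts",
        lineterm="",
        n=0,  # no context lines, only the changes themselves
    )
    return [
        line
        for line in diff
        if line.startswith(("+", "-"))
        and not line.startswith(("+++ ", "--- "))  # skip the file headers
    ]


# Hypothetical fragments, only to show the output shape.
DEFAULT = 'gpu {\n\tstatus = "okay";\n};'
CUSTOM = 'gpu {\n\tstatus = "disabled";\n};'

if __name__ == "__main__":
    for line in changed_lines(DEFAULT, CUSTOM):
        print(line)
```

An empty result means the two trees are textually identical, which would back up the conclusion that the pinmux is not the cause.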
In order to get the required stability, we use a previous (stable) version. So, we have an r32.4.4-based image. I would lose this reproducible environment, but I could try it to see whether the issue gets resolved.
I would test that by wrapping the non-LTE component in grounded foil or other metal. If this is on something like an M.2 mount, then you could perhaps sandwich grounded foil between two very thin cardboard insulators and get at least a partial degree of RF separation.