R32.1 kernel bug fixes

Hi,

I’ve fixed some bugs in the kernel, mostly related to initialization and error handling for multiple MIPI cameras accessed via V4L2. The fixes are based on r32.1 for now (I’m working on moving them to the latest release). Here’s a tarball of patches that apply to r32.1 with git apply: patches-20200821.tar.gz (12.1 KB)
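
For anyone applying these, here is a minimal Python sketch of the process. It assumes the tarball contains files ending in `.patch` whose names sort into the intended order, and that `git` is on the PATH. The function names and the kernel tree path are placeholders for illustration, not something taken from the patches themselves.

```python
import subprocess
import tarfile
from pathlib import Path


def extract_patches(tarball: Path, dest: Path) -> list[Path]:
    """Copy every *.patch file out of the tarball into dest, sorted by name."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tarball) as tar:
        for member in tar.getmembers():
            # Assumption: patches are regular files ending in ".patch".
            if member.isfile() and member.name.endswith(".patch"):
                target = dest / Path(member.name).name
                target.write_bytes(tar.extractfile(member).read())
    # Assumption: sorted filename order is the intended apply order.
    return sorted(dest.glob("*.patch"))


def apply_patches(patches: list[Path], kernel_tree: Path) -> None:
    """Dry-run each patch with --check, then apply it; stop on first failure."""
    for patch in patches:
        for extra in (["--check"], []):
            subprocess.run(
                ["git", "apply", *extra, str(patch.resolve())],
                cwd=kernel_tree,
                check=True,  # raises CalledProcessError if git apply fails
            )
```

A call would look like `apply_patches(extract_patches(Path("patches-20200821.tar.gz"), Path("patches")), Path("path/to/kernel-source"))`, with the last path replaced by wherever the r32.1 kernel sources are checked out. Running `--check` before each real apply means a patch that does not fit is reported before it touches the tree.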

I’ve been using these with code running on a Xavier AGX for a while now. Before the fixes, I saw various reliability problems during startup and shutdown. I then discovered that a stress test which repeatedly power cycles the cameras while capturing reproduced similar failures within a few hours: either crashes or an inability to capture frames. Depending on how lost the hardware got, recovering sometimes required a cold power cycle, not just a kernel-level reboot. With these fixes, the system now passes that stress test, and I haven’t observed similar problems during any other testing.

My TODO comment in “Avoid using TEMP_CHANNEL_ID for transactions with responses” should be easily addressed by somebody more familiar with the rest of that code.

The following patches fix error paths which look like they can’t be hit unless there’s another bug or a hardware failure. I fixed them because they were being exercised due to the other bugs, but the changes are hard to test thoroughly:

  • Fix the error paths in vi_capture_setup and vi_capture_release
  • Make several error paths more useful

These patches only change logging and tracing, so they don’t fix any bugs themselves:

  • Improve debug and error messages
  • Add some tracepoints in the communication with the VI5

@brian.silverman
Could you check against the latest release, r32.4 (aka J4.4)? We need to review the patches based on the latest release.