[CustomKernel] The Grinch 21.3.4 for Jetson TK1 / developed

Hello again,

This is the output after flashing with the stock kernel:

root@tegra-ubuntu:/home/ubuntu# uname -a
Linux tegra-ubuntu 3.10.40-gdacac96 #1 SMP PREEMPT Thu Jun 25 15:25:11 PDT 2015 armv7l armv7l armv7l GNU/Linux

the pcie devices:

root@tegra-ubuntu:/home/ubuntu# lspci
00:00.0 PCI bridge: NVIDIA Corporation Device 0e12 (rev a1)
01:00.0 Network controller: Intel Corporation PRO/Wireless 5100 AGN [Shiloh] Network Connection
02:00.0 PCI bridge: NVIDIA Corporation Device 0e13 (rev a1)
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
root@tegra-ubuntu:/home/ubuntu# lspci -s '01:00.0' -vvv
01:00.0 Network controller: Intel Corporation PRO/Wireless 5100 AGN [Shiloh] Network Connection
        Subsystem: Intel Corporation WiFi Link 5100 AGN
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 130
        Region 0: Memory at 32200000 (64-bit, non-prefetchable) 
        Capabilities: [c8] Power Management version 3
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 00000000ad778000  Data: 0003
        Capabilities: [e0] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 unlimited
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <128ns, L1 <32us
                        ClockPM+ Surprise- LLActRep- BwNot-
                LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr+ BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
        Capabilities: [140 v1] Device Serial Number 00-21-6b-ff-ff-5f-4b-88
        Kernel modules: iwlwifi

other information (might be useful):

root@tegra-ubuntu:/home/ubuntu# iwconfig
rmnetctl  no wireless extensions.

sit0      no wireless extensions.

dummy0    no wireless extensions.

lo        no wireless extensions.

eth0      no wireless extensions.

ip6tnl0   no wireless extensions.

root@tegra-ubuntu:/home/ubuntu# lsmod
Module                  Size  Used by
dm_crypt               13259  0
dm_mod                 73887  1 dm_crypt
rfcomm                 38359  0
bnep                   10469  2
bluetooth             307068  10 bnep,rfcomm
iwlwifi                83371  0
cfg80211              367964  1 iwlwifi
rfkill                 10365  4 cfg80211,bluetooth
nvhost_vi               3064  0

root@tegra-ubuntu:/home/ubuntu# dmesg | grep iwlwifi
[   10.142739] iwlwifi 0000:01:00.0: request for firmware file 'iwlwifi-5000-5.ucode' failed.
[   10.185835] iwlwifi 0000:01:00.0: request for firmware file 'iwlwifi-5000-4.ucode' failed.
[   10.216445] iwlwifi 0000:01:00.0: request for firmware file 'iwlwifi-5000-3.ucode' failed.
[   10.239182] iwlwifi 0000:01:00.0: request for firmware file 'iwlwifi-5000-2.ucode' failed.
[   10.265336] iwlwifi 0000:01:00.0: request for firmware file 'iwlwifi-5000-1.ucode' failed.
[   10.282217] iwlwifi 0000:01:00.0: no suitable firmware found!

root@tegra-ubuntu:/home/ubuntu# ls /lib/firmware/
hp  nvavp_os_0ff00000.bin  nvavp_os_8ff00000.bin  nvavp_os_eff00000.bin  nvavp_os_f7e00000.bin  nvavp_vid_ucode_alt.bin  tegra12x  tegra_xusb_firmware

Would you have any suggestions?

Best regards
R.

Hello,
I would also appreciate it if someone could point me to a guide on how to install a WiFi driver. See my attempt to install the ath10k driver here:

Thanks!

Someone with the actual card would be required to debug this. My thinking is that if the card does not cause boot failure under the stock kernel, but PCIe fails once you add the firmware, then the firmware (or how it is used) is very likely at fault. I do not believe the firmware which is actually downloaded into the WiFi card should be architecture dependent…that particular piece of software should work without PCIe failure regardless of CPU architecture. However, kernel modules which depend on that firmware could fail and cause PCIe errors. Drivers change over time, and I do not know if the firmware you used was designed for that version of the driver…the fix could be as simple as loading different firmware, but it might also be that no matching firmware is available for the ARMv7 version of the driver.
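A quick way to check the version question is to list the firmware files the driver asked for and see which of them exist. This is a minimal sketch, assuming the dmesg lines have the same "request for firmware file '...' failed" shape as in the log earlier in the thread. The function name and the optional directory argument are made up for illustration.

```sh
# Hypothetical helper: read dmesg-style text on stdin, print each firmware
# file name the driver requested, and say whether it exists in the given
# firmware directory (default: /lib/firmware).
check_requested_firmware() {
    fw_dir="${1:-/lib/firmware}"
    # Pull the quoted name out of lines like:
    #   ... request for firmware file 'iwlwifi-5000-5.ucode' failed.
    sed -n "s/.*request for firmware file '\([^']*\)'.*/\1/p" |
    while IFS= read -r name; do
        if [ -e "$fw_dir/$name" ]; then
            echo "present $name"
        else
            echo "missing $name"
        fi
    done
}
```

Typical use would be "dmesg | check_requested_firmware". It only reports file names; it cannot tell whether a file that is present matches the API version the driver expects.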

Hello,

@lazmol, thank you for posting in this context, I will try to follow your steps, and linuxdev’s suggestions in that thread.

@linuxdev: what I understood from your response:

  • My WiFi PCIe card (which was really recycled from an HP HDX 18 - where it worked fine) might not be fit for the Jetson board architecture (the jetson kernel and the software drivers might not be “talking” to the card correctly). It does not mean that the card is faulty - it might be that the firmware was designed for different operating systems in mind and it just will not talk to the Jetson board correctly.

Is my interpretation of your response correct?

BTW:

  • When I tried a previous version from the Linux Wireless iwlwifi page (en:users:drivers:iwlwifi), namely version 5.4.A.11, which extracted as 5000-1.ucode, the iwlwifi driver said something like “Expected api version 5000-5.ucode, got 5000-1.ucode”. That gave me a hint about which version to use.
  • The card was working fine with the HDX laptop (Windows based)
  • I already found a guy that will sell me some old PCIe WiFi cards, including Realtek and Atheros - I will try to work with all of them.
  • Is it possible that I “shorted” the PCIe slot by trying to hotplug the PCIe Intel WiFi card? Maybe the PCIe slot is broken now? I don’t think it should recognize the hardware if it’s broken but maybe there is a deeper level of “talking” on the driver level that comes out after the correct API is loaded?
  • I got some weird errors from the PCIe (they seem to be a little different each time):

[External media: two screenshots of the PCIe errors]

  • those errors only exist when I put the downloaded driver to /lib/firmware
  • I will also take my WiFi Intel PCIe to a different computer and check if it works on a different device.
  • I understand that there is no way I could debug this driver as there is no access to the driver source code?

Cheers and thanks again for answering my questions so far.
R.

Hello.

I have acquired some other PCIe WiFi devices.
I tested them with the default kernel. Most of them were identified properly (except for Intel’s WM3945ABG, which was not enumerated on PCIe according to lspci). However, none of them worked with the default kernel.

After switching to newest Grinch I tested Atheros AR928X:

01:00.0 Network controller: Qualcomm Atheros AR928X Wireless Network Adapter (PCI-Express) (rev 01)

Works like a charm.
It seems that PCIe is working correctly with Grinch (thank you Santyago). I will also test another Atheros I have (AR242x) and retest the other cards (the Intels) on Grinch.

@linuxdev: you wrote that someone with the card would have to debug this. Well, I have the card, but I don’t know the debug process. I might be able to read the code, but I would need some info on how to set up the debug/dev environment for Jetson and how to obtain the necessary source code. Any help in this area would be greatly appreciated.

Best regards
R.

I suspect that what you’d want for debug is the Lauterbach JTAG debugger:
TRACE32®
http://www.electronique-mag.com/article10398.html

You would need to create a debug version of the kernel, and set a break point in the PCIe enumeration, and then step through until the particular PCIe card begins enumeration. I do not have a Lauterbach JTAG debugger (I only have one which works with OpenOCD…not compatible with TegraK1), so I cannot give you specifics. You could experiment with gdb over console, but I suspect some of the limitations of this (and many annoyances) would take a very long time to work around…an actual JTAG debugger would very much speed up the debug process (there are also cases where serial console simply cannot do the job, even if you are willing to work harder at it with additional debug kernel options and learning curve).

Basically you’d install a kernel with the exact configuration (and source code) of the kernel you get the error on. Since this is a PCIe error, you can probably use either the non-Grinch source OR the Grinch source. In both cases, you’d configure the kernel exactly as it is when errors occur, plus add in debugging options. Set a break point at enumeration, and step through it until you see the error. At this point you should know if the WiFi card did something the kernel source does not like, or if the kernel source simply failed to handle something which is valid for PCIe, but sometimes shows up as a corner case. The really difficult thing to account for is the case of the WiFi card doing something wrong, as then you’d need the firmware and probably internal design of the card.
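Before rebuilding, it can help to confirm which debugging options the current config already has. The following is a rough sketch, assuming the config is available as a plain-text file. The option names in the usage note (CONFIG_DEBUG_INFO, CONFIG_KGDB, CONFIG_KGDB_SERIAL_CONSOLE) are only a guess at a reasonable starting set for gdb over console; the post above does not name specific options, and the helper name is made up.

```sh
# Hypothetical helper: report whether each named option is enabled
# (=y or =m) in a plain-text kernel config file.
report_config_options() {
    cfg="$1"
    shift
    for opt in "$@"; do
        if grep -Eq "^${opt}=(y|m)\$" "$cfg"; then
            echo "$opt on"
        else
            echo "$opt off"
        fi
    done
}
```

For example, after "zcat /proc/config.gz > /tmp/running.config", one could run "report_config_options /tmp/running.config CONFIG_DEBUG_INFO CONFIG_KGDB CONFIG_KGDB_SERIAL_CONSOLE".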

Thanks for the info. I have worked with Lauterbach some time ago and it is in fact a nice tool. Although I might have difficulties in finding a supplier at my place (PL). I shall look into it.

Cheers.
R.

Thanks Santyago for all your hard work on the Grinch kernel.

Is there any extra work that needs to be done to setup SPI, other than the one line you specify in your install instructions? I’ve followed your instructions, but

ls /dev

does not show any SPI devices for me.

Once installed, can I do sudo apt-get update/upgrade and upgrade the base files without worrying about something screwing up?

The standard apt mechanisms for update/upgrade are safe. The only issue was with the original version of L4T, which has long since been fixed.

I am trying to install a driver for a mini pci-e capture card. I am getting the errors that the following symbols are missing:

videobuf_to_dma
videobuf_dma_unmap
videobuf_queue_sg_init
videobuf_dma_free

I found that if I enabled Conexant cx25821 (under Device Drivers-> Multimedia Support-> Media PCI Adapters) and rebuilt the kernel, then the device driver would install OK. However, with this change the board somewhat randomly kernel panics on boot. Once it’s booted it’s OK, but roughly every 3rd or 4th boot it panics. Using the serial cable I was able to record the difference between a successful boot and a panic boot. In the panic boot log (at time 5.649642), there is an “Unable to handle kernel NULL pointer dereference at virtual address 0000000c” line where things start to go south.

We could boot reliably before the driver was enabled, so I assume this null pointer is somewhere in the driver. Is there an alternate driver that can be enabled, or a better way to define these symbols? They all appear to be defined in the videobuf-dma-sg.c file. If not, is there a way to fix the kernel panic?

Part of the driver install process actually builds the driver, and it appears I have most of the source for it. Probably a dumb question, but is there a way for me to just add the videobuf-dma-sg.c file to the build process so all the symbols are included?

Normal boot serial output: [OneDrive link]

Panic boot serial output: [OneDrive link]
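When comparing a good and a bad serial capture, it is handy to jump straight to the first line where things go wrong. This is a minimal sketch, assuming the log was saved to a text file and that the failure is announced by one of a few common kernel messages. The helper name and the exact pattern list are illustrative, not exhaustive.

```sh
# Hypothetical helper: print the first line (with its line number) of a
# saved serial log that looks like the start of a kernel oops, BUG or panic.
first_oops_line() {
    grep -n -E 'Unable to handle kernel|kernel BUG at|Kernel panic' "$1" |
        head -n 1
}
```

Running it on both captures shows quickly whether the good boot has no such line at all, and the line number gives a starting point for reading the surrounding messages.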

I’m not an expert (at all) with working on the kernel so your help is appreciated.

Are you using the Grinch kernel? Is there any info you can give on which driver?

Yes, I’m using the Grinch kernel. The driver is for the Magewell Pro Capture Mini HDMI card and can be downloaded from Magewell.

First a comment on the downloadable files. I see a firmware download, which seems it should be agnostic to operating system, but did not see explicit instructions with it. The firmware basically is labeled as Intel format, but the reality is that if the firmware is to be uploaded into the camera then there is no “Intel versus ARM” issue, it would be the camera using that particular firmware. The file naming is for Windows on that firmware, perhaps since it is firmware for the camera and not the operating system file names won’t matter, but I don’t know. Without looking more for instructions though, I’m unsure of details of install or use of those particular firmware files.

Should you have a working kernel module for the driver, you would also be required to have the firmware loaded into the camera. Failure to have the driver and the camera agree (via firmware) on how the driver accesses the hardware could result in system crash. I do not have the camera, so there is a lot I can’t test related to firmware.

For reference, I’m using the 3.10.40-gdacac96 kernel…Grinch will have more features which I do not have, but I don’t believe this is the issue for missing functions in this case.

Regardless of Grinch versus the 3.10.40-gdacac96 kernel, parts of the module compile procedure should be the same. I have kernel source installed on my JTK1 at “/usr/src/kernels/$(uname -r)”. I ran “make mrproper” there, then copied and decompressed “/proc/config.gz” into that kernel folder as “.config”. I then ran “make prepare” in the kernel source. From there I ran “ProcaptureForLinuxTK1_2173/install.sh” using “sudo” in case anything needs root.
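The preparation steps in the previous paragraph can be sketched as a small script. This is only an outline under the assumptions stated there (kernel source under /usr/src/kernels/$(uname -r), running config exposed at /proc/config.gz). The function names and the DRY_RUN switch are my own additions so the sequence can be reviewed before anything is executed.

```sh
# With DRY_RUN=1 the commands are only printed instead of being run.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: clean the kernel tree, reuse the running kernel's
# configuration as .config, then prepare the tree for module builds.
prepare_kernel_tree() {
    src="${1:-/usr/src/kernels/$(uname -r)}"
    run make -C "$src" mrproper
    run sh -c "zcat /proc/config.gz > '$src/.config'"
    run make -C "$src" prepare
}
```

Setting DRY_RUN=1 and calling "prepare_kernel_tree" prints the three commands. Without DRY_RUN it runs them, which will normally need root for a tree under /usr/src.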

Against my current kernel and config (which would differ somewhat from Grinch), compile fails and produces the mwcap_install.log file. Looks like build of the actual program files for ProCaptureForLinux worked. Then kernel module build takes place…

The kernel module build against my non-Grinch kernel fails, citing missing kernel source file “scripts/recordmcount”, and as a side-effect mentions the “mwcap_build/sources/dma/mw-dma-*” files needing this.

Going into the ProCaptureForLinuxTK1 docs directory, the Readme.txt file shows known compatible kernels. Quoting here for convenience:

Compatible Linux Kernel Version
===============================
  3.2.0_23   (Ubuntu 12.04)
  3.13.0_52  (Ubuntu 14.04)
  3.16.0_37  (Ubuntu 14.04.02)
  3.19.0_15  (Ubuntu 15.04)

The kernel used in JTK1 is skipped, but some kernels before and after that version are listed, so it may just mean they tested with those kernels and would not necessarily indicate “incompatible”. Although normally a missing kernel dependency would be a config step in a regular source subdirectory, this is one of the scripts. It can be created by going to the kernel source and running:

make scripts

Starting the install.sh process again, the module build succeeds, but the module load fails. Unfortunately the intermediate build directory gets removed as install.sh ends; it would have been nice to browse some of the temporary build files.

The kernel module itself is saved at the standard module location for a third party module:

/lib/modules/$(uname -r)/extra

The insmod is the reason for the dmesg errors you mentioned about “ProCapture: Unknown symbol videobuf_*”. This would be due to kernel features which are missing. The missing features could be as simple as enabling a disabled feature in the kernel config, or as complicated as needing to both back port a feature and enable it. Considering a kernel version both prior to and after the existing JTK1 kernel version are listed as compatible, odds are high you just need to find out which kernel feature is required (I’d ask the camera people which kernel feature is required for those symbols).
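To get an exact list of what is missing rather than reading dmesg by eye, the symbol names can be pulled out of the log. This is a minimal sketch, assuming the messages have the form “ProCapture: Unknown symbol NAME ...” as quoted above. The helper name is hypothetical.

```sh
# Hypothetical helper: read dmesg-style text on stdin and print the unique
# symbol names taken from "Unknown symbol NAME" messages.
unknown_symbols() {
    sed -n 's/.*Unknown symbol \([A-Za-z0-9_]*\).*/\1/p' | LC_ALL=C sort -u
}
```

Typical use would be "dmesg | unknown_symbols". The resulting names are what to search for in the kernel source when hunting for the feature that provides them.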

As a side note, when you use nice front ends to config, e.g., “make menuconfig”, dependency checking is automated and enabling a feature requiring another is simple. Building a module outside of the kernel tree gives up that convenience. So to make your kernel module load you’ll need to do research on that feature. I suspect it’ll work just fine after that.

FYI, don’t forget about the firmware download and whether it is used or needed under Linux…specifically where it goes…the file naming is for Windows, but some of those files might go in a subdirectory of “/lib/firmware”. Sometimes a device will work with a given driver but using old firmware, other times the API changes and driver plus old firmware will crash the system.

Hi linuxdev, thanks for diving into this. I don’t think the firmware is an issue, because the random kernel panics happen regardless of whether or not the Pro Capture card is physically installed on the board. I downloaded the firmware and it looks like it’s just a couple of executables (which, as you mention, only work on Windows) that load the correct firmware onto the capture card, plus the firmware files for various capture cards. There don’t appear to be any files related to Linux.

I think your comment “odds are high you just need to find out which kernel feature is required” hits the nail on the head and is the crux of what I’m trying to do. I know the symbols I’m looking for and I even know what file they’re defined in (/usr/src/linux-grinch-21.3.4/drivers/media/v4l2-core/videobuf-dma-sg.c). I just don’t know what to select in make menuconfig to get that file included when the kernel is built. Is there a way to figure that out? Including Conexant cx25821 (under Device Drivers-> Multimedia Support-> Media PCI Adapters) will do this, but doing so causes the panics. I am currently trying it with Conexant 2388x activated instead of the cx25821, to see if that will still include the symbols but not cause the panics. It feels kind of hackish though.

Is there a more systematic way to figure out which feature to activate? I have contacted Magewell support, but since they are in China, communication is slow.

The feature configuration of the kernel is known as “kconfig”, and you’ll notice each kernel subdirectory has a subset of kernel features, with the “Kconfig” file listing them for activation or not. Menuconfig will step into each subdirectory where a parent directory has a Kconfig item configured naming that subdirectory. Menuconfig config description display will be similar in the naming presented to what the parent/child directory structure names are, although not exact.

The software to be compiled against the kernel really should document kernel config requirements. Even so, it isn’t too bad in some cases to track it down manually. You’re looking for “drivers/media/v4l2-core/Kconfig”, so in menuconfig note the existence of each of these submenus, one following the other:

Device drivers
  Multimedia support

This is where it gets interesting. There are at this menuconfig level two possibilities:

V4L2 int device (DEPRECATED)
V4L platform devices

The former is not normally selected in L4T; the latter is. This hints at the possibility that the interface being used by the camera software is deprecated, and may not be supported at some future date without modification, but this is all just hints and guessing at this stage.

If you see the menuconfig help topic for V4L2 int device, this is important in tracking what it does:

Defined at drivers/media/v4l2-core/Kconfig:87
Depends on: MEDIA_SUPPORT [=y] && VIDEO_V4L2 [=y]

If you go to “drivers/media/v4l2-core/” and run this you’ll see what is defined in that directory:

egrep -R '(VIDEO_V4L2|MEDIA_SUPPORT)' `find . -name 'Kconfig'`

Since VIDEO_V4L2 shows up there, this is where the functionality of the option is implemented. MEDIA_SUPPORT does not show up unless you go to the parent directory, “drivers/media/”, in which case MEDIA_SUPPORT shows up. Kconfig of “drivers/media/” allows seeing “drivers/media/v4l2-core/”. The actual files which implement something will refer to those in the code, and from that you can figure out symbols.
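Going the other way, from a source file to the option that builds it, usually means reading the Makefile in the same directory, where object files are listed on obj-$(CONFIG_...) lines. The following is a rough sketch, assuming each such line fits on one physical line (continuation lines are not handled). The helper name is hypothetical.

```sh
# Hypothetical helper: given a directory containing a kernel Makefile and a
# source file name such as videobuf-dma-sg.c, print the CONFIG_ symbol(s)
# whose obj-$(CONFIG_...) line lists the matching .o file.
config_for_source() {
    dir="$1"
    obj="${2%.c}.o"
    awk -v obj="$obj" '
        /^obj-\$\(CONFIG_[A-Za-z0-9_]+\)/ {
            for (i = 2; i <= NF; i++) {
                if ($i == obj) {
                    sym = $1
                    sub(/^obj-\$\(/, "", sym)   # strip leading "obj-$("
                    sub(/\).*$/, "", sym)       # strip trailing ")"
                    print sym
                }
            }
        }
    ' "$dir/Makefile"
}
```

For example, "config_for_source drivers/media/v4l2-core videobuf-dma-sg.c", run from the top of the kernel source. The printed name without its CONFIG_ prefix is what to look for after “config” in the Kconfig files. If that Kconfig entry has a bare “tristate” line with no prompt text, it does not appear in menuconfig and is only switched on when some other option selects it.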

Note that you can also search for the help text in the Kconfig files. This works from “drivers/media/”:

egrep -R '(V4L.*platform)' `find . -name 'Kconfig'`

…the result of that search is the help text you see in menuconfig for “V4L platform devices”:

./Kconfig:# V4L platform/mem2mem drivers
./platform/Kconfig:     bool "V4L platform devices"

A similar approach shows the “V4L platform devices” belongs with the Kconfig of “drivers/media/platform/”. This is the default config in L4T. Assuming the non-deprecated code is to be used, you can grep for the symbols in “drivers/media/platform/”:

egrep '(videobuf_to_dma|videobuf_dma_unmap|videobuf_queue_sg_init|videobuf_dma_free)' * 2>/dev/null

…if the deprecated code is to be searched for, you can do the same thing in “drivers/media/v4l2-core/”.
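A plain grep for the symbol names finds both the file that provides a symbol and every file that merely calls it. To separate the two, one can search for the export macro instead. This is a minimal sketch, assuming the symbol is exported with EXPORT_SYMBOL or EXPORT_SYMBOL_GPL written on one line. The helper name is hypothetical.

```sh
# Hypothetical helper: list the files under a directory (default: current
# directory) that export the given symbol to modules.
find_export() {
    sym="$1"
    dir="${2:-.}"
    grep -rlE "EXPORT_SYMBOL(_GPL)?\\(${sym}\\)" "$dir"
}
```

For example, "find_export videobuf_dma_unmap drivers/media" from the top of the kernel source. Files that only use the symbol will not be listed, so whatever is printed is the code that has to be built for the module to load.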

Because of where the symbols are actually used, if we are using the modern “platform” branch a feature which enables compile of omap24xxcam.c or via-camera.c would do the job. If using the deprecated v4l2-core, a feature of Kconfig enabling compile of videobuf-dma-sg.c would do the job. I have not tried to set up the new dependencies to check, but really the manufacturer should provide the required kernel feature config. What I’d do if I had the camera is to go here in menuconfig:

Device drivers
  Multimedia support
    V4L platform devices

…and then look for devices which might be for your camera but are not yet enabled, and see if anything works. If not, go ahead and enable the deprecated option, and try again. I do not have their code downloaded and installed anymore, otherwise it might be simpler to just look at their code and see what it wants. Perhaps their code is looking for something not in this version of the kernel source.

I heard back from Magewell support and their answer was to edit a Kconfig file to make the option appear in menuconfig:

$ cd /usr/src/linux-grinch-21.3.4
$ sudo su
# nano drivers/media/v4l2-core/Kconfig

→ find line that says ‘config VIDEOBUF_DMA_SG’
→ the line below it says ‘tristate’
→ change that line so that it says ‘tristate “VIDEOBUF DMA SG”’
→ Save and exit

# zcat /proc/config.gz > .config
# make menuconfig

→ Navigate to “General setup → Local version” and add something like “-ProCapture” to the version name
→ Go back to the main menu and navigate to “Device Drivers → Multimedia Support → VIDEOBUF DMA SG”
→ Hit ‘y’ to make that a kernel built-in (the < > preceding VIDEOBUF DMA SG changes to <*>)
→ exit all the way back out, saving on exit
→ rebuild kernel
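The Kconfig edit in the first group of steps can also be scripted instead of done by hand in nano. This is a sketch only, assuming the entry starts with a line reading exactly “config VIDEOBUF_DMA_SG” and contains a bare “tristate” line, as Magewell describes. The helper name and the .orig backup suffix are my own choices.

```sh
# Hypothetical helper: give the VIDEOBUF_DMA_SG entry a prompt so that it
# shows up in menuconfig. Rewrites the given Kconfig file and keeps the
# untouched original next to it with a .orig suffix.
add_videobuf_prompt() {
    kconfig="$1"
    cp "$kconfig" "$kconfig.orig"
    awk '
        # Start of the entry we care about.
        /^config[ \t]+VIDEOBUF_DMA_SG[ \t]*$/ { in_entry = 1; print; next }
        # Any other "config" line ends the entry.
        /^config[ \t]/ { in_entry = 0 }
        # First bare "tristate" inside the entry gets the prompt text.
        in_entry && /^[ \t]*tristate[ \t]*$/ {
            sub(/tristate[ \t]*$/, "tristate \"VIDEOBUF DMA SG\"")
            in_entry = 0
        }
        { print }
    ' "$kconfig.orig" > "$kconfig"
}
```

Usage would be "add_videobuf_prompt drivers/media/v4l2-core/Kconfig" from the top of the kernel source, as root. After saving in menuconfig, "grep VIDEOBUF_DMA_SG .config" should show the option set to y.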

Since implementing this, the kernel panics have decreased vastly. However, I still get one every 10 or so boots. In the kernel puke, I found this line:

kernel BUG at drivers/platform/tegra/hier_ictlr/hier_ictlr.c

which led me to this forum post https://devtalk.nvidia.com/default/topic/887568/jetson-tk1-sometimes-fails-to-boot-with-u-boot/

I realized the grinch is one release behind L4T, i.e. it’s at 21.3.4 and L4T is at 21.4. At the risk of sounding ungrateful (I am grateful–really!) @Santyago are you by chance updating the grinch to 21.4?

Followed the steps, but I don’t see anything related to spidev in /dev. Is there something I’m missing?

Also having the same issue.