Flash Jetson Orin Nano WSL2

Hi,

I have been struggling with flashing the Jetson Orin Nano with an NVMe disk through WSL2.
The issue I am facing is that during the flashing procedure the Orin Nano changes from presenting itself as an APX USB device to presenting itself as a Remote NDIS Compatible Device, USB Serial Device.

I made sure that both types of device get auto-attached in WSL2, so that I can rule out a USB passthrough issue from my Windows machine to WSL2.

At the point where the flash process starts trying to connect through this Ethernet over USB device, it fails with timeouts. The reason is that my WSL2 does not seem to detect the USB Ethernet interface (although my USB device is enumerated and visible in WSL2 → lsusb gives Bus 001 Device 002: ID 0955:7035 NVIDIA Corp. Linux for Tegra)
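For anyone who wants to script that check, here is a minimal Python sketch that looks for a vendor ID in lsusb output. It assumes the 0955 vendor ID and the line layout quoted above; the helper name find_usb_ids is made up for illustration.

```python
import re

# Matches lines such as:
#   Bus 001 Device 002: ID 0955:7035 NVIDIA Corp. Linux for Tegra
LSUSB_LINE = re.compile(
    r"^Bus (\d{3}) Device (\d{3}): ID ([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\s*(.*)$"
)


def find_usb_ids(lsusb_text, vendor_id):
    """Return (bus, device, product_id, description) for each device of a vendor."""
    matches = []
    for line in lsusb_text.splitlines():
        m = LSUSB_LINE.match(line.strip())
        if m and m.group(3).lower() == vendor_id.lower():
            matches.append((m.group(1), m.group(2), m.group(4).lower(), m.group(5)))
    return matches


# Sample text copied from this post; in practice, feed in the output of `lsusb`.
sample = "Bus 001 Device 002: ID 0955:7035 NVIDIA Corp. Linux for Tegra"
for bus, dev, pid, desc in find_usb_ids(sample, "0955"):
    print(f"bus {bus} device {dev}: 0955:{pid} {desc}")
```

Seeing the device here only proves that USB enumeration reached WSL2. It says nothing about whether a network driver has claimed it.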

I believe that NVIDIA supports the WSL2 flow with the sdk-manager, so I assume that this should fundamentally work. However, it seems that I am missing a crucial point somewhere, which prevents the Orin Nano from showing up as an Ethernet over USB interface.

With ifconfig -a I would expect to see a usb0 Ethernet adapter (which I do see on a native Linux machine), but it never appears. So, logically, the flash process fails with the timeout while waiting for boot…

I am using the following WSL2 version, running on Windows 10:

WSL version: 1.2.5.0
Kernel version: 5.15.90.1
WSLg version: 1.0.51
MSRDC version: 1.2.3770
Direct3D version: 1.608.2-61064218
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.19045.3324

Anyone here with suggestions?

Hi @henkman,
You would have to rebuild the WSL kernel to add support. I had to do something similar to add joystick and USB mass storage support. Here is the tutorial I followed: https://www.youtube.com/watch?v=iyBfQXmyH4o . The only difference is that you would need to change a different setting in menuconfig. If you can, I also suggest using Windows 11.

Hi,

I believe the kernel version that I use already has these options enabled. It is the latest one from MS.

If I am really missing something then I would ask for someone to show me the exact config settings for the kernel (which I can check with cat /proc/config.gz | gunzip ).
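A rough Python sketch of such a check follows. The option list is an assumption: the thread does not state the exact set that is required, so these are just the standard kernel symbols for usbnet, CDC Ethernet, the RNDIS host driver, USB mass storage, and loopback. The function names are made up for illustration.

```python
import gzip

# Assumed to be relevant; the thread does not list the exact set needed.
ASSUMED_OPTIONS = [
    "CONFIG_USB_USBNET",
    "CONFIG_USB_NET_CDCETHER",
    "CONFIG_USB_NET_RNDIS_HOST",
    "CONFIG_USB_STORAGE",
    "CONFIG_BLK_DEV_LOOP",
]


def parse_kernel_config(text):
    """Map option name -> value ('y', 'm', ...); unset options map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            options[line[2:-len(" is not set")]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def missing_options(text, wanted=ASSUMED_OPTIONS):
    """Return the wanted options that are neither built in (y) nor modules (m)."""
    options = parse_kernel_config(text)
    return [name for name in wanted if options.get(name, "n") not in ("y", "m")]


def read_running_config(path="/proc/config.gz"):
    """Read the running kernel's config (the kernel must expose /proc/config.gz)."""
    with gzip.open(path, "rt") as handle:
        return handle.read()
```

Inside WSL2, missing_options(read_running_config()) would list whatever is absent from the assumed set. An option set to m still needs its module to be present and loaded.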

Hi,
It could also be the privileges of the /dev directory in WSL. You can check with ls -ld /dev.

Hi,

I have the rndis_host working :) . In the end, it took a lot of experimenting to find out which options needed to be changed in the WSL2 kernel.

See this:

[  101.757945] usb 1-1: new high-speed USB device number 2 using vhci_hcd
[  101.937757] usb 1-1: SetAddress Request (2) to port 0
[  101.973360] usb 1-1: New USB device found, idVendor=0955, idProduct=7035, bcdDevice= 0.01
[  101.973364] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[  101.973365] usb 1-1: Product: Linux for Tegra
[  101.973365] usb 1-1: Manufacturer: NVIDIA
[  101.973366] usb 1-1: SerialNumber: 1421523024293
[  101.985902] rndis_host 1-1:1.0 usb0: register 'rndis_host' at usb-vhci_hcd.0-1, RNDIS device, 7a:1c:fe:ba:4c:6f
[  101.987466] usb-storage 1-1:1.2: USB Mass Storage device detected
[  101.987789] scsi host1: usb-storage 1-1:1.2
[  103.028234] scsi 1:0:0:0: Direct-Access     mmc0     0                     PQ: 0 ANSI: 2
[  103.028803] sd 1:0:0:0: Attached scsi generic sg3 type 0
[  103.036585] sd 1:0:0:0: Power-on or device reset occurred
[  103.040551] scsi 1:0:0:1: Direct-Access     mmc0b0   0                     PQ: 0 ANSI: 2
[  103.040897] sd 1:0:0:1: Attached scsi generic sg4 type 0
[  103.045023] sd 1:0:0:0: [sdd] Media removed, stopped polling
[  103.048601] sd 1:0:0:1: Power-on or device reset occurred
[  103.052151] sd 1:0:0:1: [sde] Media removed, stopped polling
[  103.058860] sd 1:0:0:0: [sdd] Attached SCSI removable disk
[  103.063827] scsi 1:0:0:2: Direct-Access     mmc0b1   0                     PQ: 0 ANSI: 2
[  103.063934] sd 1:0:0:2: Attached scsi generic sg5 type 0
[  103.072696] sd 1:0:0:1: [sde] Attached SCSI removable disk
[  103.076021] scsi 1:0:0:3: Direct-Access     ext0     0                     PQ: 0 ANSI: 2
[  103.076130] sd 1:0:0:3: Attached scsi generic sg6 type 0
[  103.085987] sd 1:0:0:3: Power-on or device reset occurred
[  103.087851] sd 1:0:0:3: [sdg] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
[  103.089059] sd 1:0:0:2: Power-on or device reset occurred
[  103.090198] sd 1:0:0:2: [sdf] Media removed, stopped polling
[  103.095152] sd 1:0:0:3: [sdg] Write Protect is off
[  103.095154] sd 1:0:0:3: [sdg] Mode Sense: 0f 00 00 00
[  103.096195] sd 1:0:0:2: [sdf] Attached SCSI removable disk
[  103.100753] sd 1:0:0:3: [sdg] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[  103.111920] TCP: eth0: Driver has suspect GRO implementation, TCP performance may be compromised.
[  103.133559]  sdg: sdg1 sdg2 sdg3 sdg4 sdg5 sdg6 sdg7 sdg8 sdg9 sdg10 sdg11 sdg12 sdg13 sdg14
[  103.144495] sd 1:0:0:3: [sdg] Attached SCSI removable disk

and query with ifconfig -a :

usb0: flags=4098<BROADCAST,MULTICAST>  mtu 1500
        ether 7a:1c:fe:ba:4c:6f  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
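A side note on reading that output: the numeric flags value can be decoded with the interface flag bits from linux/if.h. A small Python sketch, with only a subset of the bits listed (decode_flags is a made-up helper name):

```python
# Interface flag bits as defined in <linux/if.h> (subset).
IFF_FLAGS = {
    0x1: "UP",
    0x2: "BROADCAST",
    0x8: "LOOPBACK",
    0x40: "RUNNING",
    0x1000: "MULTICAST",
}


def decode_flags(value):
    """Return the names of the known flag bits set in an ifconfig flags value."""
    return [name for bit, name in sorted(IFF_FLAGS.items()) if value & bit]


print(decode_flags(4098))  # the usb0 value shown above
```

For 4098 this gives BROADCAST and MULTICAST only, so the interface exists but is not yet UP or RUNNING. Whether the flashing tool brings it up by itself is not something this output shows.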

Regards,
Henk

If he has that, then he could just copy the config.gz and attach that to the forum.

Hi again,

First of all, I want to highlight a curiosity in the official NVIDIA documentation pages, found here:

Windows Subsystem for Linux :: NVIDIA SDK Manager Documentation

Because NVIDIA wrote that page, it suggests that WSL should work without any issues for running the sdk-manager with several devices, such as Jetson.
I have been pulling my hair out trying to work out why it did not work, so I am a bald guy now… unfortunately.
Anyway, maybe NVIDIA should indicate there which WSL2 kernel they used, just as I posted my complete configuration at the start of this thread. WSL2 is really awesome in my opinion, but things are progressing and it is not always clear which features work and which do not.

Besides that, this forum is flooded with posts like “do not try WSL2 and use native host only”, without any pointers towards a solution for guys like me who just want to understand why and just want to fix it.

Now a description of the path that I walked to get it all fixed.

  1. It started with not being able to build the images on my WSL2 at all, which in the end was caused by the qemu-system package not being completely available in binfmts, something which is essential to get chroot working during the build and packaging of the rootfs. My fix for that issue was posted here:
    Orin nano installation fails; chroot: failed to run command ‘dpkg’: Exec format error - Jetson & Embedded Systems / Jetson Orin Nano - NVIDIA Developer Forums

  2. After fixing that hurdle, the flashing process itself crashed in the SDK manager. I switched to the command-line install, but that broke at the same point (although its messages are much clearer than those from the sdk-manager, which seems to hide a lot of detail). I carefully watched the USB device enumeration for WSL2 in a separate PowerShell window, the one where I had initially done a bind --force and an auto-attach of the NVIDIA APX device descriptor that pops up when you put the device in factory recovery mode. During the preparation and flashing process (I believe some of the bootloader stages get activated there, correct me if I am wrong), somewhere in the main flashing stage, the USB port enumerates with a different device descriptor. In fact it enumerates as a “Remote NDIS Compatible Device #2, USB Mass Storage Device”. Of course, usbipd knows nothing about this new device descriptor, and it is not yet bound at this point if you never did a bind --force for it. I suppose that many of us get stuck there. Anyway, just bind that one too, once, and it will auto-attach to WSL2 every time after that.

  3. At this point I got stuck again at the “waiting for boot” message (or something like it) during the flash procedure, simply because the tool expects an RNDIS device to become active on the USB port, along with the USB mass storage (the NVMe disk partition). For those who don’t see why this matters: the flashing tool is waiting for an Ethernet over USB connection (usbnet) so that it can continue flashing over that Ethernet connection (on an IPv6 address). Unfortunately, the kernel that I use and described in my first post in this thread does not have all the options enabled that are needed to get an RNDIS-capable USB connection working. My initial assumption that this was the latest kernel from MS, together with the fact that NVIDIA has posted a nice documentation page suggesting that WSL2 works fine with the SDK manager, caused a lot of confusion. I experimented with more than one kernel config setting, and I can only say that I needed quite a few features before I achieved white smoke.
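To watch for the moment described in step 3 from a second WSL2 terminal, something like the following Python sketch could be used. It simply polls /sys/class/net for the interface name; the usb0 name is taken from this thread, while the timeout values and the helper name wait_for_interface are assumptions for illustration. It does not reproduce what the flashing tool itself does.

```python
import os
import time


def wait_for_interface(name, timeout_s=60.0, poll_s=0.5, sys_net="/sys/class/net"):
    """Poll until a network interface entry appears; return True if it did."""
    deadline = time.monotonic() + timeout_s
    while True:
        # Each network interface known to the kernel has an entry here.
        if os.path.isdir(os.path.join(sys_net, name)):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

Calling wait_for_interface("usb0", timeout_s=120) while the flash runs would tell you whether the kernel ever created the interface. If it returns False although the device shows up in lsusb, the kernel driver side is the likely suspect.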

Please find attached my config.gz for my latest modified 5.15.90.1 WSL2 kernel.

I hope that this post clarifies a lot, and I hope that NVIDIA removes a lot of the clutter messages from this forum.

config.gz (25.3 KB)

Regards
Henk

The main issue is that some flash images are pre-canned binary images, but others must be generated. Most notably, the rootfs is created as a large file, which is then covered with loopback and formatted as ext4. Loopback is a function of the kernel, and the default WSL2 kernel does not have that feature. To enable this, one has to be sufficiently expert at WSL2 to (A) build a new kernel with the additional feature, and (B) install that new kernel. I have not personally tried this on WSL2, so I couldn’t tell you the steps, but it is pretty much a given that it is more difficult in WSL2 than on an actual Linux system.

The SD card models have the option of a pre-canned rootfs image. The QSPI content is already a binary image, and so those don’t need to be built, they are only flashed. If you are using a pre-canned binary image of all software, then it should work.

The other issue, which is an issue of all VMs, is that sometimes USB pass-through needs to be adjusted so that the Jetson will reconnect after a disconnect. That’s a VM setup issue, and differs with every VM.

Incidentally, one would want to be sure that the WSL2 is itself running on an ext4 filesystem type if any image is to be generated. If the content which is to be copied in to the loopback partition is not originally copied to a filesystem type which understands Linux permissions, then the flash will succeed, but boot will end up with an unusable system needing to be flashed again.
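A quick way to check which filesystem type a working directory sits on is to look it up in /proc/mounts. The Python sketch below does that lookup on the text of the mount table. The helper name is made up, the sample paths in the note are hypothetical, and it does not resolve symlinks or handle escaped spaces in mount points.

```python
import os


def filesystem_type(path, mounts_text):
    """Return the fs type of the longest mount point that contains path."""
    path = os.path.abspath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # Later entries win on ties, since they are mounted over earlier ones.
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


def filesystem_type_of(path):
    """Same lookup against the live mount table of the running system."""
    with open("/proc/mounts") as handle:
        return filesystem_type(path, handle.read())
```

With a typical WSL2 mount table, a directory under the Linux home would be expected to report ext4, while one under /mnt/c would report the Windows share type instead (9p on the setups I know of, which is an assumption here). The latter is the case to avoid for Linux_for_Tegra.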

QEMU is interesting as it is used by the flash software itself when preparing parts of the generated rootfs image. Some packaging operations (or creating a default user) use a chroot into the “Linux_for_Tegra/rootfs/” before using the arm64/aarch64 tools (they are native, not cross tools). So if you use WSL2, then WSL2 runs on Windows, and QEMU runs on the Linux running on Windows, and the rootfs edits run on QEMU running on the Linux running on Windows.
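Whether the kernel will hand arm64 binaries to QEMU can be read from the binfmt_misc entries under /proc/sys/fs/binfmt_misc. A minimal Python sketch for parsing one entry follows. The entry name qemu-aarch64 and the interpreter path in the note are assumptions based on how Debian/Ubuntu packages usually register it, not something stated in this thread.

```python
def parse_binfmt_entry(text):
    """Parse the text of one /proc/sys/fs/binfmt_misc/<name> entry."""
    info = {"enabled": False, "interpreter": None, "flags": ""}
    for line in text.splitlines():
        line = line.strip()
        if line == "enabled":
            info["enabled"] = True
        elif line.startswith("interpreter "):
            info["interpreter"] = line.split(" ", 1)[1]
        elif line.startswith("flags:"):
            info["flags"] = line.split(":", 1)[1].strip()
    return info


def read_binfmt_entry(name="qemu-aarch64", root="/proc/sys/fs/binfmt_misc"):
    """Return the parsed entry, or None if nothing is registered under that name."""
    try:
        with open(f"{root}/{name}") as handle:
            return parse_binfmt_entry(handle.read())
    except FileNotFoundError:
        return None
```

If read_binfmt_entry() returns None, or an entry with enabled set to False, a chroot into the arm64 rootfs would be expected to fail with an Exec format error like the one mentioned earlier in the thread.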

From what I’ve seen most VMs can work, but it is hard for any company to support such an environment on all VMs, or without specialty support in each VM. It is usually up to the end user to contact support of the VM rather than NVIDIA being the support. So imagine that you call Microsoft to ask why the Jetson software isn’t flashing, and they do not know about loopback; or if they do know, they have to document how to (A) get the kernel source, (B) configure the kernel for the WSL2 environment, and (C) add the loopback capability. It is a difficult path. The other option, to just use Linux, is far easier to support.

This is all my opinion, though. I must say it is nice to have the config.gz you posted.

Hi @linuxdev

Because the WSL2 kernel is constantly progressing, I believe it is therefore important to mention the versions that are used in all those messages on this forum.

For that reason I opened this thread by clarifying my starting conditions. It remains fuzzy what NVIDIA used for their WSL2 page, where they suggest that version 1.9.3 of the SDK manager works with WSL2. I even saw a post from someone who mentioned successful flashing without ever indicating which kernel version was used. It just stated that WSL runs fine with the sdk manager. The version makes a large difference, and leaving it out leads to questions, especially if things then do not work as expected.

I won’t deny that native Linux is easier to work with; I even used my native Linux machine to debug and work out the solution posted above. That is also why I believed it would be worthwhile for others to have a good description of my solution, including the USB enumeration issues with WSL2.

Given that I want to develop on a company machine (which uses Windows 10 Enterprise), I am currently investigating how mature WSL2 is and whether I can use it for all my development tools, such as FPGA tooling, simulators, Yocto builds (which all run fine on WSL2 now) and recently this NVIDIA stuff. I always used native Linux for all those tools, but sometimes company rules do not make life easier and you need to work around them.

If something does not work, I tend to find solutions and solutions are only possible if we understand the problems. And finally understanding these problems is the point where you learn a lot. And sometimes you need some help from others to speed up this learning curve.

Regarding your loopback issue: isn’t that already solved in WSL2? In the config I see CONFIG_BLK_DEV_LOOP=y
By the way, building a WSL2 kernel is amazingly simple, and so is installing it.

Anyway, for me as a developer it is very important to understand the whole flow, because I need to design specific use cases. Currently I am just playing around with the available setups and tooling; now it is time to find out what more is needed for future integration in real systems, possibly with dedicated Yocto builds (I don’t even understand why this is not the default choice).

Regards
Henk

I think a lot of the history revolves around supporting each individual VM being a lot of work. One would need time to test each platform on each VM, and then predict what failures will occur (e.g., in one setup USB always passes through, but in other cases it only passes through once and disconnect/reconnect is lost; or updating a kernel in each VM has different instructions; NVIDIA would itself have a huge learning curve).

As far as Yocto goes, I agree that this would be very nice. On the other hand, at least there is the separation of (A) sample rootfs into the “Linux_for_Tegra/rootfs/”, (B) addition of NVIDIA drivers to that same location (this is where one really needs more flexibility for other flavors of Linux), and (C) various boot content needed to get to the uniform API in UEFI (which is only true since R34.x+).

Ubuntu has been used for a very long time. I have a Tegra 3 sitting here which was basically Yocto, but everything from TK1 and newer is Ubuntu. Before TK1 only a reference design was given, and third party manufacturers had to build their own boards. There was no publicly available sample board an average person could purchase (all purchases had to come from a larger company before that). The TK1 board is when Ubuntu came to life in the first Jetson. That design was set, and everything since then has inherited it.

I think one reason why such inheritance occurred is because no Jetson has a BIOS. They only have a very custom boot chain, mostly provided in binary images (at least for earlier boot stages…some later stages have source code available). But working with that requires a lot of knowledge. To add another o/s other than Ubuntu would have required knowledge of initial kernel bring-up, and I don’t think (my opinion) that NVIDIA had time for that since Ubuntu worked. The other half of that is that these were all just learning kits, they were not intended for third party resale. Their warranties (for dev kits, even now) do not transfer to third party recipients. So there wasn’t much reason to make the TK1 itself (and most newer releases) more flexible, it wasn’t intended for commercial use.

Xavier and Orin though could benefit from a new flash design. This is because of UEFI. Xavier itself won’t benefit because after the R35.x L4T Xavier goes into maintenance and won’t get new features. The boot chain will now be much easier to work with and use more or less standard methods, so Orin could really benefit from a Yocto release. However, then NVIDIA would have to support this for both learning hardware (the dev kits) and separate eMMC modules on third party carrier boards. I have no idea what NVIDIA’s thoughts are on whether they think this is worth the effort or not. Obviously end users would be happy with it, but I don’t know if it would drive sales or not.

Hi @linuxdev,

Are you aware of the existence of these:

OpenEmbedded for Tegra (github.com)

and

OE4T/meta-tegra: BSP layer for NVIDIA Jetson platforms, based on L4T (github.com)

By the way, you did not respond to my remark about loopback in WSL2.

Regards
Henk

The OpenEmbedded and other efforts are not officially supported, and the point is about official support. Many (all?) of the efforts for other third party releases don’t include working CUDA (or GPU use for OpenGL/GLES). There is driver source code now available publicly which was not available previously, but I don’t know how well it works.

The whole point about WSL2 has never been knowing how to configure a kernel feature for loopback, it has always been about steps to do this for WSL2. I think it goes beyond simple configuration of WSL2’s USB, and this is something which Microsoft would need to support if it is ever to “officially” be added (versus end users installing a new kernel).

Hi @linuxdev

I still do not understand your loopback discussion on WSL2. It has been supported for a while already.
With respect to USB: it works just like you normally see in virtual environments (for example VirtualBox), which means you need to bind and attach.

With respect to OpenEmbedded; is what you write based on your own latest experiences? The OpenEmbedded github pages suggest that it is all derived from the latest release of L4T; Jetson Linux release: R35.3.1 JetPack release: 5.1.1

Regards
Henk

Many people have come here and had a failure on loopback. If this is already supported, I guess it is a question then of knowing which release on which platform loopback becomes supported in. I do not know that answer.

As far as VirtualBox or other VM setup goes, that isn’t something NVIDIA would be obligated to support, it is up to the individual VM users (or VM support) to set that up correctly. All I can personally do is suggest that USB pass-through is what often fails, and that this can work if the user either gets lucky, or if the user knows how to correctly assign USB to always pass through (based on whichever brand of VM is being used, it isn’t specific to WSL2 or VirtualBox or VMware or any other VM, which is the real problem…it varies). A large percentage of people who use a VM here are brand new to VMs, and do not know anything about binding. A native install does not require that knowledge.

My comments are not specific to OpenEmbedded. I am familiar with two things that my comments relate to: (A) the progression of change in boot content over the years, and (B) the Xorg server and how it uses the GPU driver.

Boot change over the years is self-explanatory as to how it makes it difficult for third parties to build their own since (until UEFI) this was entirely custom and a moving target.

The Xorg server itself is less obvious, and goes back to when the GPU driver (in user space) was only released in binary form. If you look at your X server log (usually “/var/log/Xorg.0.log”), then you will find that some components the server uses are modular “plugin” devices. The Xorg server has a dynamic load ability very much like loading a dynamic library. This isn’t just for the GPU, it also includes things like keyboard, mouse, various drawing tablets, so on. To use a plug-in (not Xorg’s terminology, but it is logical) the code has to be compiled. There are two standards to meet: The API, and the ABI. The API is basically the source code, and is not a problem. The problem is the ABI (Application Binary Interface).

Each new Xorg server requires a different ABI. Unless you have the NVIDIA GPU user space driver in that ABI, the server won’t load it. Many of the CUDA type applications require the Xorg server (many people mistake Xorg or X as a graphics application, but it is not; more below). So if you try to reuse that ABI (due to lack of source code and no ability to rebuild it with a new ABI), then X won’t work with hardware rendering (and software rendering won’t assist CUDA). I’m not sure which of the driver releases are now open sourced, some are. This is recent though, and was not available just a short time ago. If you have the source, and if you can adapt it to the API, and if you can recompile it to the correct ABI, then you can migrate the GPU to a new X server. If not, you have software rendering.
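To see which ABI each loaded module reports, the X server log can be scanned for its "ABI class" lines. The Python sketch below assumes the usual layout of /var/log/Xorg.0.log, where a "(II) Module name: vendor=..." line is followed by an "ABI class: ..., version X.Y" line. The regular expressions and the helper name are mine, and the layout may differ between server releases.

```python
import re

MODULE_LINE = re.compile(r"\(II\) Module (\S+): vendor=")
ABI_LINE = re.compile(r"ABI class: (.+?), version (\d+)\.(\d+)")


def module_abis(log_text):
    """Map module name -> (abi_class, major, minor) from Xorg log text."""
    result = {}
    current = None
    for line in log_text.splitlines():
        module = MODULE_LINE.search(line)
        if module:
            current = module.group(1)
            continue
        abi = ABI_LINE.search(line)
        if abi and current is not None:
            result[current] = (abi.group(1), int(abi.group(2)), int(abi.group(3)))
            current = None
    return result


def read_module_abis(path="/var/log/Xorg.0.log"):
    """Run the scan on a log file; the default path is the usual location."""
    with open(path, errors="replace") as handle:
        return module_abis(handle.read())
```

Comparing the video driver ABI reported for the GPU module with the one the server itself announces near the top of the log shows whether a binary driver was built for that server generation.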

It would be interesting if NVIDIA could post what releases of the GPU driver for X plug-in are available in source code. I’m thinking it must be used in combination with kernel code.

I am a fan of the change to UEFI. There is an abstraction layer which means the older issues of the changing boot environment won’t be so bad for people adapting their own system. I want it to be easier to adapt Jetsons, but it is not yet as easy (perhaps a bad choice in wording) as it is with some other ARM based embedded systems.

I’ve not used the OpenEmbedded you speak of, but if it is derived from R35.x, then it won’t apply to any of the older Jetsons, and those are what I’ve worked with for the most part. I do have Xavier and an AGX Orin, but I have only one of each, and don’t really want to remove the official L4T (if I had a second, then I would probably try the OpenEmbedded). I’d like to see how they deal with the Xorg ABI.
