Performance & flashing issues with Jetson Orin NX 16GB

I’m having a hard time understanding how to flash the Jetson Orin NX and would really appreciate some help. Here are the options I’ve tried so far:

  1. SDK Manager (2.2.0-12021_amd64)
    Fails with an error around dpkg -i: chroot: failed to run command ‘dpkg’: Exec format error

  2. Flash using flash.sh (36.4)
    $ sudo ./flash.sh jetson-orin-nano-devkit-nvme internal
    Doesn’t work: it gets stuck at the “retrieving storage info” step and never proceeds. From this forum I learned that flash.sh doesn’t flash to NVMe; however, there is no SD slot on the Orin NX, so the Orin NX is simply not supported by flash.sh (Flash.sh gets stuck at retrieving storage info)

  3. l4t_initrd_flash.sh (36.4)
    No matter which combination of internal/external/nvme0n1 parameters I use, flashing completes successfully, but when the device restarts it fails to boot: it tries to access mmcXXX, which it can’t, since the Orin NX doesn’t have a microSD slot. So the initrd is always flashed in a way that depends on mmc, even when you flash to NVMe. The NVMe itself is flashed properly, though: I can see a partition table and a partition with the rootfs; it is just useless because of the broken initrd.

  4. dd jetson ubuntu 22.04
    So the way I managed to get my Orin NX up and running was by dd-ing the Jetson Ubuntu 22.04 image from USB to NVMe. But this setup is super fragile:
    basically, any apt upgrade breaks the kernel, so CUDA stops working. jetson-containers doesn’t work at all because it fails to symlink /usr/lib/aarch64-linux-gnu/nvidia/nvidia_icd.json. From this forum I learned that jetson-containers breaks after any change to the nvidia-docker setup, but in my case it is broken right from the beginning, since I had to install nvidia-docker manually. And the answer to this was: you need to flash it again, as a freshly flashed rootfs has all of this configured properly. Except none of the standard flashing options works for the Orin NX, since it doesn’t have a microSD slot.
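For reference, recreating that symlink by hand looks roughly like the sketch below. Both paths are assumptions (the real source location may differ on your image), and the defaults point at a scratch directory so the sketch is safe to run anywhere:

```shell
# Sketch: recreate the Vulkan ICD symlink that jetson-containers expects.
# Both paths are assumptions; on a real Jetson they would be something like
#   ICD_SRC=/etc/vulkan/icd.d/nvidia_icd.json
#   ICD_LINK=/usr/lib/aarch64-linux-gnu/nvidia/nvidia_icd.json
# Defaults use a scratch directory so running this cannot touch the system.
SANDBOX="$(mktemp -d)"
ICD_SRC="${ICD_SRC:-$SANDBOX/nvidia_icd.json}"
ICD_LINK="${ICD_LINK:-$SANDBOX/nvidia/nvidia_icd.json}"

echo '{}' > "$ICD_SRC"                 # stand-in for the real ICD file
mkdir -p "$(dirname "$ICD_LINK")"      # make sure the link's directory exists
ln -sf "$ICD_SRC" "$ICD_LINK"          # -f replaces a stale or broken link
readlink "$ICD_LINK"                   # prints the link target
```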

After failing again and again to flash the Orin NX, I feel a bit lost: pretty much whichever way I go I hit issues, and I’m running out of ideas for getting it up and running properly. Moreover, I’m not sure it’s worth the effort. I got the Orin NX for $800 and it gives me only 5 more tokens per second (llama3.2:3b) than an $80 Raspberry Pi. Maybe I had the wrong expectations, but I thought it could do a bit better, given it has 60 more TOPS than the Nano, yet it performs exactly the same. And I definitely didn’t expect it to have almost the same performance as a device with no CUDA at all, if you compare it to the Raspberry Pi. Why is it so expensive and so broken? Did I do something wrong?

Hi,

  • SDK Manager

    • Could you run SDK Manager and EXPORT LOGS when the error occurs, as shown in the image?
  • l4t_initrd_flash.sh (36.4)

    • Have you followed the guidance in the official doc?
    • Please use ${YourCMD} | tee flash_log.txt to attach a flash log for us to review.

Also record the serial console log during the flash process.

Thanks

Hi David,

Thank you for the fast answer. Here is what I got after another day of working with it:

SDK Manager
The GUI seems to be incompatible with Ubuntu 24.10, which I have on my laptop; at least I didn’t manage to get it up and running easily, so I used the Docker image instead:

sudo docker run -it --privileged -v /dev/bus/usb:/dev/bus/usb/ -v /dev:/dev -v /media/$USER:/media/nvidia:slave --name JetPack_Orin_NX_Devkit --network host sdkmanager --cli --action install --login-type devzone --product Jetson --target-os Linux --version 6.1 --target JETSON_ORIN_NX_TARGETS --flash --license accept --stay-logged-in true --collect-usage-data enable --exit-on-finish

Here are the logs I extracted from the container after the flash failed:

NV_L4T_DRIVERS_COMP.log (1.9 KB)
NV_L4T_FILE_SYSTEM_AND_OS_COMP.log (30.7 KB)

As far as I can see, it went a bit further this time and the error is different:

20:41:26.614 - Error: qemu: uncaught target signal 11 (Segmentation fault) - core dumped
20:41:26.708 - Error: Segmentation fault (core dumped)
20:41:26.737 - Error: qemu: uncaught target signal 11 (Segmentation fault) - core dumped
20:41:26.829 - Error: Segmentation fault (core dumped)
20:41:26.831 - Error: dpkg: error processing package libc-bin (--install):
20:41:26.831 - Error:  installed libc-bin package post-installation script subprocess returned error exit status 139
20:41:26.908 - Error: Errors were encountered while processing:
20:41:26.908 - Error:  libc-bin
20:41:26.914 - Error: [exec_command]: /bin/bash -c /home/nvidia/.nvsdkm/replays/scripts/JetPack_6.1_Linux/NV_L4T_FILE_SYSTEM_AND_OS_COMP.sh; [error]:  libc-bin

l4t_initrd_flash.sh (36.4)
I had to edit tools/l4t_flash_prerequisites.sh to change netcat to netcat-traditional, since that seems to be the package name in newer Ubuntu versions (I’m on 24.10).
sudo ./apply_binaries.sh completed successfully without any issues.
Then I executed:

sudo ./tools/kernel_flash/l4t_initrd_flash.sh --external-device nvme0n1p1 -p "-c ./bootloader/generic/cfg/flash_t234_qspi.xml" -c ./tools/kernel_flash/flash_l4t_t234_nvme.xml --showlogs --network usb0 jetson-orin-nano-devkit external | tee flash_log.txt

(this is the default flashing case from the Flashing Support section of the NVIDIA Jetson Linux Developer Guide)

Aaaaand this time it succeeded! Here is the log just in case:
flash_log.txt (315.5 KB)

The difference between yesterday and today is that I edited tools/l4t_flash_prerequisites.sh and fixed the name of the netcat package for Ubuntu 24.10. I’m not sure, but that seems to be the only difference.
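That one-line fix can be sketched as below. The demo file content is illustrative only; on a real BSP checkout PREREQ would be tools/l4t_flash_prerequisites.sh:

```shell
# Demo stand-in for tools/l4t_flash_prerequisites.sh; the real script
# installs host packages (among them "netcat") via apt. The line written
# here is illustrative, not the script's actual content.
PREREQ="$(mktemp)"
echo 'apt install -y abootimg netcat sshpass' > "$PREREQ"

# The actual fix: Ubuntu 24.10 no longer ships a package named plain
# "netcat", so rename it (netcat-openbsd should also satisfy the scripts).
sed -i 's/\bnetcat\b/netcat-traditional/' "$PREREQ"
cat "$PREREQ"
```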

Reliability
Using the initrd flashing method I got a much more reliable system. CUDA keeps working after apt upgrade and I have no issues with jetson-containers.

Performance side-note
After switching to the 25W power mode I get 10 tokens/s for the same llama3.2:3b, which is 2 tokens/s faster than at 15W (8 tokens/s) and only 5 tokens/s faster than the Raspberry Pi (5 tokens/s). I don’t know; I think my expectations were simply wrong. It seems you don’t get N times the speed by spending 10 times more money.

Thanks for the help anyway.

Hi,

Have you executed the following commands during benchmark testing to fully utilize the device’s resources?

sudo nvpmodel -m 0
sudo jetson_clocks 
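If you want these settings to survive a reboot, one common approach is a oneshot systemd unit. This is only a sketch: the unit name is hypothetical and the binary paths are assumptions (verify with which nvpmodel and which jetson_clocks):

```ini
# /etc/systemd/system/jetson-maxperf.service  (hypothetical unit name)
[Unit]
Description=Apply maximum-performance power settings at boot
After=multi-user.target

[Service]
Type=oneshot
# Binary paths are assumptions; confirm them on your system first.
ExecStart=/usr/sbin/nvpmodel -m 0
ExecStart=/usr/bin/jetson_clocks
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
```

Enable it with sudo systemctl enable --now jetson-maxperf.service.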

Thanks

Hi David,

Thanks for the hint. No, I didn’t run these, and they indeed improve performance significantly: now I get 21 tokens/s at 25W, which is 11 tokens/s faster than before. Could you please give a link to where this is documented? I’m curious what this spell actually does and what side effects running it might have. Does it make sense to run it after every boot if the only purpose is to run ollama run llama3.2 as fast as possible?

So this sets the power mode to MAXN, which the documentation does not recommend. How risky is it to use it constantly, instead of the nvpmodel -m 3 (25W) I had before?

Hi

MAXN is an unconstrained power mode that runs each domain at its maximum frequency. You can use this mode, but it may trigger over-current (OC) events to protect the hardware from damage, which can actually decrease performance; that is why I suggest creating a custom power mode for your use case.

To create a custom power mode, you can refer to the example in this topic

For the power mode matrix, you can refer to this
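For illustration, a custom mode appended to /etc/nvpmodel.conf looks roughly like the sketch below. Treat every domain name and value as a placeholder: the safe way is to copy an existing 25W block from your own /etc/nvpmodel.conf, give it an unused ID, and lower only the clocks you want to cap.

```
# Appended to /etc/nvpmodel.conf -- illustrative names and values only
< POWER_MODEL ID=4 NAME=CUSTOM_25W >
CPU_ONLINE CORE_0 1
CPU_ONLINE CORE_1 1
CPU_ONLINE CORE_2 1
CPU_ONLINE CORE_3 1
CPU_A78_0 MIN_FREQ 0
CPU_A78_0 MAX_FREQ 1984000
GPU MIN_FREQ 0
GPU MAX_FREQ 816000000
```

You would then select it with sudo nvpmodel -m 4 and confirm with sudo nvpmodel -q.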

Thanks

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.