Problem with upgrading jetpack 6.0 to jetpack 6.1

nbeser1 · October 10, 2024, 12:37pm

I have come to the conclusion that this upgrade is too buggy to actually use. I realize that is a bit harsh and I need to explain. I upgraded my jetson to jetpack 6.1, After the upgrade, jtop shows the libraries have been upgraded, but it also says that the jetpack is not installed. I have been informed that that is a jtop issue and can be corrected by modifying it source code.

I teach a class with this jetson, and use nsight-compute (previous version 2023.2.2, new version 2024.3.1). This new version Nsight-compute consistely crashed on cuda examples that it had work fine with. I recompiled the examples, and discovered the recompiled code worked fine by itself, but nsight compute ran until the very end and then crashed. It offered to send a report to invidia. It did this for several cuda program that work fine by itself. (perhaps nsight compute also changes and requires some other compilation argument that was different?).

Is there a way to roll back the installation to the previous version? Is there a fix for nsight-compute?

I was fortunate that I as a instructor I have several jetson orin nano systems, and I was able to continue teaching with a unmodified system.

KevinFFF · October 11, 2024, 1:43am

Hi nbeser1,

Are you using the devkit or custom board for Orin Nano?

How did you upgrade it from JP6.0 to JP6.1?

Could you share the result of cat /etc/nv_tegra_release on your board?
Have you tried to run sudo apt install nvidia-jetpack?

Have you tried to use command to flash previous release?

nbeser1 · October 11, 2024, 2:13am

I upgraded JP6.0 to JP6.1 using the method from:

This is a dev kit.
I issued the following commands:
sudo apt update
sudo apt install nvidia-jetpack

Here is the file you asked about:
$ cat nv_tegra_release

R36 (release), REVISION: 4.0, GCID: 37537400, BOARD: generic, EABI: aarch64, DATE: Fri Sep 13 04:36:44 UTC 2024

KERNEL_VARIANT: oot

TARGET_USERSPACE_LIB_DIR=nvidia

TARGET_USERSPACE_LIB_DIR_PATH=usr/lib/aarch64-linux-gnu/nvidia

KevinFFF · October 14, 2024, 8:37am

It seems you have updated the devkit to JP6.1(R36.4.0).

It is not allowed to roll-back to previous release, but you can reflash the board for this use case.

nbeser1 · October 14, 2024, 12:33pm

You did not answer my question. Is there a fix for nsight-compute? Your announcement for jetpack 6.1 did not list a update for nsight compute, but after 6.1 was updated we were running 2024.3.1. If you intend for 2024.3.1 to be in jetpack 6.1 is there a way to make it work without crashing?

AastaLLL · October 25, 2024, 5:33am

Hi,

We tested ncu on JetPack 6.1 and it can work without issue.

The sample we test is a simple vectorAdd CUDA kernel.
Which app do you use? Is there any EGL or special library involved so we can find a similar app to give it a try?

$ cat /etc/nv_tegra_release 
# R36 (release), REVISION: 4.0, GCID: 37537400, BOARD: generic, EABI: aarch64, DATE: Fri Sep 13 04:36:44 UTC 2024
# KERNEL_VARIANT: oot
TARGET_USERSPACE_LIB_DIR=nvidia
TARGET_USERSPACE_LIB_DIR_PATH=usr/lib/aarch64-linux-gnu/nvidia

$ ncu --version
NVIDIA (R) Nsight Compute Command Line Profiler
Copyright (c) 2018-2024 NVIDIA Corporation
Version 2024.3.1.0 (build 34702747) (public-release)

$ sudo /opt/nvidia/nsight-compute/2024.3.1/ncu ./vectorAdd
[Vector addition of 50000 elements]
==PROF== Connected to process 24865 (/home/nvidia/cuda-samples/Samples/0_Introduction/vectorAdd/vectorAdd)
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
==PROF== Profiling "vectorAdd" - 0: 0%....50%....100% - 8 passes
Copy output data from the CUDA device to the host memory
Test PASSED
Done
==PROF== Disconnected from process 24865
[24865] vectorAdd@127.0.0.1
  vectorAdd(const float *, const float *, float *, int) (196, 1, 1)x(256, 1, 1), Context 1, Stream 7, Device 0, CC 8.7
    Section: GPU Speed Of Light Throughput
    ----------------------- ----------- ------------
    Metric Name             Metric Unit Metric Value
    ----------------------- ----------- ------------
    SM Frequency                    Mhz       303.70
    Elapsed Cycles                cycle         7280
    Memory Throughput                 %        37.25
    Duration                         us        23.97
    L1/TEX Cache Throughput           %        13.91
    L2 Cache Throughput               %        37.25
    SM Active Cycles              cycle      3787.56
    Compute (SM) Throughput           %        15.63
    ----------------------- ----------- ------------

    OPT   This kernel exhibits low compute throughput and memory bandwidth utilization relative to the peak performance 
          of this device. Achieved compute throughput and/or memory bandwidth below 60.0% of peak typically indicate    
          latency issues. Look at Scheduler Statistics and Warp State Statistics for potential reasons.                 

    Section: Launch Statistics
    -------------------------------- --------------- ---------------
    Metric Name                          Metric Unit    Metric Value
    -------------------------------- --------------- ---------------
    Block Size                                                   256
    Function Cache Configuration                     CachePreferNone
    Grid Size                                                    196
    Registers Per Thread             register/thread              16
    Shared Memory Configuration Size           Kbyte            8.19
    Driver Shared Memory Per Block       Kbyte/block            1.02
    Dynamic Shared Memory Per Block       byte/block               0
    Static Shared Memory Per Block        byte/block               0
    # SMs                                         SM              16
    Threads                                   thread          5,0176
    Uses Green Context                                             0
    Waves Per SM                                                2.04
    -------------------------------- --------------- ---------------

    OPT   Est. Speedup: 33.33%                                                                                          
          A wave of thread blocks is defined as the maximum number of blocks that can be executed in parallel on the    
          target GPU. The number of blocks in a wave depends on the number of multiprocessors and the theoretical       
          occupancy of the kernel. This kernel launch results in 2 full waves and a partial wave of 3 thread blocks.    
          Under the assumption of a uniform execution duration of all thread blocks, the partial wave may account for   
          up to 33.3% of the total kernel runtime with a lower occupancy of 22.5%. Try launching a grid with no         
          partial wave. The overall impact of this tail effect also lessens with the number of full waves executed for  
          a grid. See the Hardware Model                                                                                
          (https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html#metrics-hw-model) description for more      
          details on launch configurations.                                                                             

    Section: Occupancy
    ------------------------------- ----------- ------------
    Metric Name                     Metric Unit Metric Value
    ------------------------------- ----------- ------------
    Block Limit SM                        block           16
    Block Limit Registers                 block           16
    Block Limit Shared Mem                block            8
    Block Limit Warps                     block            6
    Theoretical Active Warps per SM        warp           48
    Theoretical Occupancy                     %          100
    Achieved Occupancy                        %        77.45
    Achieved Active Warps Per SM           warp        37.18
    ------------------------------- ----------- ------------

    OPT   Est. Local Speedup: 22.55%                                                                                    
          The difference between calculated theoretical (100.0%) and measured achieved occupancy (77.5%) can be the     
          result of warp scheduling overheads or workload imbalances during the kernel execution. Load imbalances can   
          occur between warps within a block as well as across blocks of the same kernel. See the CUDA Best Practices   
          Guide (https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html#occupancy) for more details on     
          optimizing occupancy.                                                                                         

    Section: GPU and Memory Workload Distribution
    -------------------------- ----------- ------------
    Metric Name                Metric Unit Metric Value
    -------------------------- ----------- ------------
    Average L1 Active Cycles         cycle      3787.56
    Total L1 Elapsed Cycles          cycle       9,4552
    Average L2 Active Cycles         cycle      3397.25
    Total L2 Elapsed Cycles          cycle       5,8216
    Average SM Active Cycles         cycle      3787.56
    Total SM Elapsed Cycles          cycle       9,4552
    Average SMSP Active Cycles       cycle      3752.45
    Total SMSP Elapsed Cycles        cycle      37,8208
    -------------------------- ----------- ------------

    WRN   The optional metric dram__cycles_active.avg could not be found. Collecting it as an additional metric could   
          enable the rule to provide more guidance.

Thanks.

system · November 20, 2024, 4:59am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Nsight-compute (ncu-ui) failure on jetpack 6.2 jetson orin nano Nsight Compute jetpack , jetson	15	468	April 26, 2025
How to install and use nsight compute with jetpack 5.0? Jetson AGX Orin nsight	15	1969	November 6, 2023
Correct linux Nsight Compute ui version for Jetpack 6 GA Nsight Compute cudnn	6	811	July 12, 2024
Jetpack 6.1 - docker cuda version mismatch Jetson Orin Nano docker	15	1005	December 19, 2024
Can not find Nsight Compute for JetPack 4.6 Jetson AGX Xavier nsight	4	1986	November 10, 2021
Jetson TX2 (8GB) Dev Board - JetPack 4.6.2 Does not Install NSIGHT Compute Jetson Nano cuda , nsight , profiling	8	1705	November 29, 2022
Jetpack 6.0 to 6.1 Upgrade: 'upgrade-jetpack' vs 'Upgradable Compute Stack'? Jetson AGX Orin ubuntu	6	937	December 25, 2024
JetPack 6.1 Release Announcement Jetson Orin Nano cudnn , jetson , deepstream	11	2926	August 20, 2025
Command 'sudo apt install nvidia-jetpack' is failling after upgrading to JetPack 6.1 Jetson AGX Orin reflash	2	187	October 13, 2024
Upgrade Jetpack from 6.0 to 6.1 on Orin NX Fails Jetson Orin NX apt_upgrade	7	248	January 24, 2025

Problem with upgrading jetpack 6.0 to jetpack 6.1

R36 (release), REVISION: 4.0, GCID: 37537400, BOARD: generic, EABI: aarch64, DATE: Fri Sep 13 04:36:44 UTC 2024

KERNEL_VARIANT: oot

Related topics