Thor: nvpmodel fails after enabling 25G

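After building and updating the two .dtb files as described in the official documentation, ethtool shows all four mgbex_0 ports at 25000Mb/s, but nvpmodel now fails with the following output: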
$ sudo nvpmodel -q
NVPM WARN: power mode is not set!
$ sudo nvpmodel -m 1
NVPM ERROR: Failed to determine the driver of GPU, assuming nvgpu.NVPM ERROR: Error opening /sys/bus/pci/devices/0000:01:00.0/gpu_pg_mask: 2
NVPM ERROR: optMask is 1, no request for power mode
$

Flash command: sudo ./l4t_initrd_flash.sh jetson-agx-thor-devkit internal

Nothing else was configured or installed after flashing.

Reference document:

Are you sure this issue only appeared after you updated the dtb, and was not already present before your modification?

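I tried two ways of flashing the Thor:

1. Flashing with sdkmanager. After flashing this way nvpmodel works normally, but the 25G modification has no effect.

2. Flashing from the command line with sudo ./l4t_initrd_flash.sh jetson-agx-thor-devkit internal. Enabling 25G works, but nvpmodel fails. The commands run before flashing are as follows: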

tar xf Jetson_Linux_R38.2.1_aarch64.tbz2
sudo tar xpf Tegra_Linux_Sample-Root-Filesystem_R38.2.1_aarch64.tbz2 -C Linux_for_Tegra/rootfs/

cd Linux_for_Tegra
sudo ./tools/l4t_flash_prerequisites.sh
sudo ./apply_binaries.sh
cd -

sudo cp tegra264-p4071-0000+p3834-0008-nv.dtb Linux_for_Tegra/kernel/dtb/
sudo cp tegra264-p4071-0000+p3834-0008-nv.dtb Linux_for_Tegra/bootloader/
sudo cp tegra264-p4071-0000+p3834-0008-nv.dtb Linux_for_Tegra/rootfs/boot/

sudo cp tegra264-bpmp-3834-0008-4071-xxxx.dtb Linux_for_Tegra/bootloader/generic/
sudo cp tegra264-bpmp-3834-0008-4071-xxxx.dtb Linux_for_Tegra/bootloader/

cd Linux_for_Tegra
sudo ./tools/l4t_create_default_user.sh -u ictrek -p ictrek -a

What is the result of lspci after you enabled 25G?

Please check the file for full pci info. Thanks

20251211_cli_25G_info_pci.txt (50.5 KB)

Your GPU driver is missing from the lspci output.

Share your full dmesg too.
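A quick way to confirm whether a kernel driver is bound to the GPU is to look at the device's driver symlink in sysfs. The sketch below is only an illustration: the function name pci_bound_driver is hypothetical, and the address 0000:01:00.0 is taken from the nvpmodel error message earlier in the thread. The trailing 2 in that error is consistent with ENOENT (no such file), which is what you would expect if no GPU driver had created gpu_pg_mask.

```shell
#!/bin/sh
# Report which kernel driver, if any, is bound to a PCI device.
# $1: PCI address, e.g. 0000:01:00.0
# $2: sysfs root (optional, defaults to /sys; overridable for testing)
pci_bound_driver() {
    dev="${2:-/sys}/bus/pci/devices/$1"
    if [ ! -e "$dev" ]; then
        # The device does not show up on the PCI bus at all.
        echo "absent"
    elif [ -L "$dev/driver" ]; then
        # A bound device has a "driver" symlink pointing at its driver.
        basename "$(readlink "$dev/driver")"
    else
        # The device is enumerated but no driver has claimed it.
        echo "unbound"
    fi
}

# Address taken from the nvpmodel error message in this thread.
pci_bound_driver 0000:01:00.0
```

"unbound" or "absent" here would match the nvpmodel failure. Running lspci -k -s 0000:01:00.0 should report the same information in its "Kernel driver in use" line.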

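Because of project scheduling I could not get the dmesg from the devkit for now. Instead I saved the dmesg from a custom carrier board + T5000 setup, where the only change was enabling PCIe C3; 25G was not enabled. nvpmodel fails there too, in the same way as on the devkit after enabling 25G. Please see whether it can serve as a reference. Thanks!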

20251223_1610_t5000_dmesg.txt (120.1 KB)

What is the result of lspci on this board?

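I rebuilt the Linux_for_Tegra/ software environment, made no changes other than "sudo ./tools/l4t_create_default_user.sh -u ictrek -p ictrek -a", and then flashed it to the custom carrier board + T5000 setup with "sudo ./l4t_initrd_flash.sh jetson-agx-thor-devkit internal". The same nvpmodel error occurs.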

ictrek@tegra-ubuntu:~$ lspci
0000:00:00.0 PCI bridge: NVIDIA Corporation Device 22e6
0000:01:00.0 3D controller: NVIDIA Corporation Device 2b00 (rev a1)
0001:00:00.0 PCI bridge: NVIDIA Corporation Device 22d8
0001:01:00.0 Non-Volatile memory controller: Sandisk Corp WD PC SN740 NVMe SSD 512GB (DRAM-less) (rev 01)
0002:00:00.0 PCI bridge: NVIDIA Corporation Device 22d8
0002:01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. Device 8127 (rev 05)
0005:00:00.0 PCI bridge: NVIDIA Corporation Device 22d8
0005:01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9C1a
ictrek@tegra-ubuntu:~$

The dmesg output is also attached:

20251223_1704_t5000_nochange_dmesg.txt (115.1 KB)

Hi,hyla_huang1

Could the abnormal behavior mentioned above have been caused by this?

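Thanks for the lead! I will try it and see whether it works.

Could you share a link to that document? I could not find it in either the devkit guide or the developer guide. Thanks!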

3. Apply the binaries based on the platform: Thor

sudo ./apply_binaries.sh --openrm
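Putting this together with the preparation commands posted earlier in the thread, the sequence might look like the sketch below. The helper names run and prepare_thor_bsp are hypothetical, and the script defaults to a dry run (DRY_RUN=1) that only prints each command; it assumes the two R38.2.1 archives are in the current directory.

```shell
#!/bin/sh
# Print each command (DRY_RUN=1, the default) or execute it (DRY_RUN=0).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Preparation steps from this thread, with --openrm added to apply_binaries.sh.
prepare_thor_bsp() {
    run tar xf Jetson_Linux_R38.2.1_aarch64.tbz2
    run sudo tar xpf Tegra_Linux_Sample-Root-Filesystem_R38.2.1_aarch64.tbz2 -C Linux_for_Tegra/rootfs/
    run cd Linux_for_Tegra
    run sudo ./tools/l4t_flash_prerequisites.sh
    # The only difference from the failing sequence is the --openrm flag.
    run sudo ./apply_binaries.sh --openrm
}

prepare_thor_bsp
```

Set DRY_RUN=0 to actually execute the steps. The .dtb copies and l4t_create_default_user.sh call from the earlier post would follow unchanged.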

Thanks very much!

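With this parameter added, flashing from the command line did fix the nvpmodel problem, and gpu_burn can run a stress test.

Thanks for the pointer!

One GPU-related issue remains, though: reading the GPU temperature fails with the error below, while the other temperatures read fine. Has anyone seen this before?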

hh@tegra-ubuntu:/sys/class/thermal$ cat thermal_zone1/type
gpu-thermal
hh@tegra-ubuntu:/sys/class/thermal$ cat thermal_zone1/temp
cat: thermal_zone1/temp: Resource temporarily unavailable
hh@tegra-ubuntu:/sys/class/thermal$

Hi,hyla_huang1

The previous NVIDIA engineer told me about it, and then I found it in the JetPack 7.0 release note document.

Thanks!

thermal-specifications documents

If you want to see all values

grep -r "" /sys/class/thermal/thermal_zone*/* -d skip -I
or
find /sys/class/thermal/thermal_zone*/ -maxdepth 1 -type f -exec grep -H "" {} +

I encountered the same behavior

cat /sys/class/thermal/thermal_zone*/temp
32531
cat: /sys/class/thermal/thermal_zone1/temp: Resource temporarily unavailable
32187
32531
30687
scott@chithor:~$ cat /sys/class/thermal/thermal_zone*/temp
36093
36093
33531
35437
32218
scott@chithor:~$ ll /sys/class/thermal/thermal_zone*/temp
-r--r--r-- 1 root root 4096 Dec 25 23:27 /sys/class/thermal/thermal_zone0/temp
-r--r--r-- 1 root root 4096 Dec 25 23:27 /sys/class/thermal/thermal_zone1/temp
-r--r--r-- 1 root root 4096 Dec 25 23:27 /sys/class/thermal/thermal_zone2/temp
-r--r--r-- 1 root root 4096 Dec 25 23:27 /sys/class/thermal/thermal_zone3/temp
-r--r--r-- 1 root root 4096 Dec 25 23:27 /sys/class/thermal/thermal_zone4/temp
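Since the read of thermal_zone1/temp fails in one run and succeeds in the next, a monitoring script probably should not treat a failed read as fatal. Here is a minimal sketch that prints every zone and shows n/a when a read fails. The function name list_thermal_zones is hypothetical, and the conversion assumes the usual Linux convention that temp is reported in millidegrees Celsius.

```shell
#!/bin/sh
# Print "<zone> <type> <temperature>" for every thermal zone.
# $1: thermal class directory (optional, defaults to /sys/class/thermal)
list_thermal_zones() {
    for zone in "${1:-/sys/class/thermal}"/thermal_zone*; do
        [ -d "$zone" ] || continue
        ztype=$(cat "$zone/type" 2>/dev/null) || ztype="unknown"
        if temp=$(cat "$zone/temp" 2>/dev/null) && [ -n "$temp" ]; then
            # sysfs reports millidegrees Celsius; show whole degrees.
            echo "$(basename "$zone") $ztype $((temp / 1000)) C"
        else
            # The read failed (e.g. "Resource temporarily unavailable").
            echo "$(basename "$zone") $ztype n/a"
        fi
    done
    return 0
}

list_thermal_zones
```

Retrying the failed zone after a short delay is another option, given that the second read in the output above succeeded.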

Thanks !
