CPU:0, Error:cbb-noc errors after secureboot enabled (Xavier NX)

The system was working fine, but then I blew the fuses and flashed a secure bootloader. The system still boots, but 10 seconds after Linux starts I see a flood of these messages, on every boot.

[   26.792980] CPU:0, Error: cbb-noc@2300000, irq=14
[   26.792984] **************************************
[   26.792987] CPU:0, Error:cbb-noc
[   26.792990]  Error Logger            : 1
[   26.792998]  ErrLog0                 : 0x800f0100
[   26.793003]    Transaction Type      : RD  - Read, Incrementing
[   26.793006]    Error Code            : DEC
[   26.793010]    Error Source          : Initiator NIU
[   26.793013]    Error Description     : Address decode error
[   26.793016]    Packet header Lock    : 0
[   26.793020]    Packet header Len1    : 15
[   26.793023]    NOC protocol version  : version >= 2.7
[   26.793026]  ErrLog1                 : 0x12000c
[   26.793030]  ErrLog2                 : 0x0
[   26.793033]    RouteId               : 0x12000c
[   26.793036]    InitFlow              : ape_p2ps/I/ape_p2ps
[   26.793040]    Targflow              : axis_satellite_grout/T/axis_satellite_grout
[   26.793043]    TargSubRange          : 0
[   26.793046]    SeqId                 : 0
[   26.793049]  ErrLog3                 : 0x9e4c60
[   26.793053]  ErrLog4                 : 0x0
[   26.793057]    Address accessed      : 0x9e4c60
[   26.793060]  ErrLog5                 : 0x358fcf0
[   26.793063]    Non-Modify            : 0x1
[   26.793067]    AXI ID                : 0x6
[   26.793070]    Master ID             : APE
[   26.793073]    Security Group(GRPSEC): 0x3f
[   26.793077]    Cache                 : 0x0 -- Device Non-Bufferable
[   26.793081]    Protection            : 0x7 -- Privileged, Non-Secure, Instruction Access
[   26.793084]    FALCONSEC             : 0x0
[   26.793087]    Virtual Queuing Channel(VQC): 0x0
[   26.793092]  **************************************

The above is a paste of the last one, though I have seen them continue until 30 seconds.

The system then calms down and I get both a serial and an SSH login; I have logged in over SSH. The system is running okay, but top shows only 4 cores rather than the normal 6:

top - 17:06:51 up 13 min,  1 user,  load average: 0.04, 0.32, 0.52
Tasks: 258 total,   1 running, 257 sleeping,   0 stopped,   0 zombie
%Cpu0  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  0.0 us,  1.0 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  0.0 us,  0.3 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
MiB Mem :   6845.7 total,   5524.2 free,    790.4 used,    531.1 buff/cache
MiB Swap:   3422.8 total,   3422.8 free,      0.0 used.   5842.8 avail Mem
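
Since top only lists the CPUs that are currently online, the kernel's own view can be cross-checked through the standard sysfs files `/sys/devices/system/cpu/online` and `/sys/devices/system/cpu/offline`. The following is a minimal Python sketch of that check; the helper names `parse_cpu_list` and `read_cpu_list` are made up for illustration.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0-3,5' into [0, 1, 2, 3, 5]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file (no CPUs in this state) yields no entries
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def read_cpu_list(name, root="/sys/devices/system/cpu"):
    """Read 'online' or 'offline' from sysfs; returns [] if the file is missing."""
    path = Path(root) / name
    return parse_cpu_list(path.read_text()) if path.exists() else []


if __name__ == "__main__":
    print("online :", read_cpu_list("online"))
    print("offline:", read_cpu_list("offline"))
```

If the two lists together cover all six cores, the missing cores were taken offline by software rather than never being brought up.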

Both my serial port and dmesg have overflowed, so I can’t easily see what was happening before this. Rebooting and turning off the power allows me to see it. I’ve attached this.

initial-boot.txt (86.1 KB)

We have an HDMI port, but it is connected to an HDMI receiver (and then an FPGA framebuffer) that has no EDID programmed until after boot is complete. I don’t think this would be detectable at this point in the boot. This arrangement worked fine before secureboot was enabled.

What should I be looking at?

Thanks!

hello nick-ratbert,

this error happens when something accesses 0x9e4c60.
could you please examine your device tree settings and check whether something has been turned on accidentally?
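
When the same report repeats many times, it can help to reduce each one to the few fields that identify the initiator and the failing address. Below is a minimal Python sketch, not an NVIDIA tool: `summarize_cbb_report` and the `WANTED` field list are names chosen for illustration, and the line format is assumed to match the dmesg paste above.

```python
import re

# Matches lines such as "[   26.793070]    Master ID             : APE"
FIELD_RE = re.compile(r"^\[\s*\d+\.\d+\]\s+(?P<key>[^:]+?)\s*:\s*(?P<value>.*)$")

# Fields worth keeping from each report (an illustrative selection).
WANTED = ("Error Code", "Master ID", "InitFlow", "Targflow", "Address accessed")


def summarize_cbb_report(lines):
    """Return a dict of the WANTED fields found in one cbb-noc report."""
    summary = {}
    for line in lines:
        m = FIELD_RE.match(line.rstrip())
        if m and m.group("key") in WANTED:
            summary[m.group("key")] = m.group("value").strip()
    return summary


# A few lines copied from the log above.
sample = [
    "[   26.793006]    Error Code            : DEC",
    "[   26.793036]    InitFlow              : ape_p2ps/I/ape_p2ps",
    "[   26.793057]    Address accessed      : 0x9e4c60",
    "[   26.793070]    Master ID             : APE",
]
summary = summarize_cbb_report(sample)
print(summary)
```

For the report in this thread, the summary points at the APE as the master performing a read of 0x9e4c60 that fails address decode.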

I’m not sure that this is the problem as the device tree worked before, however…

I think I’m using the default dtb, not my working dtb. I used to flash our custom dtb using:

sudo ./flash.sh -d <mydtb>.dtb -r -k kernel-dtb <target> mmcblk0p1

With a full flash using:

sudo ./flash.sh -r -u rsa_priv-3k.pem -v sbk.key jetson-xavier-nx-devkit-emmc mmcblk0p1

…can I still use the ‘-d’ option to use mydtb? I tried it, but I still have the wrong model name in the device-tree. How should I go about using my dtb file instead of the default one? Can I sign it individually and then flash it to the kernel-dtb partition like I always did?

Thanks.

hello nick-ratbert,

FYI, you cannot perform a partition update (i.e. -k kernel-dtb) once you’ve enabled secureboot.
please update the dtb binary file under…$OUT/Linux_for_Tegra/bootloader/ and perform a full flash.

This didn’t work as the model name in the device-tree was still wrong.

I found that the flashing process was picking up mydtb in all the correct places (copying from the dtb build I did in $OUT/Linux_for_Tegra/sources). Linux was still using the default though. I corrected this by copying the dtb like this:

/boot/dtb$ sudo cp ~/kernel_tegra194-p3668-0001-p3509-0000.dtb .
/boot/dtb$ ls -l
total 304
-rw-r--r-- 1 root root 309624 Jan 31 12:54 kernel_tegra194-p3668-0001-p3509-0000.dtb

After a reboot, my model name in /proc/device-tree/model is now correct. I need to capture a new system.img with this in place.

However, the cbb-noc error above is still occurring.

I can’t tell where this is coming from in the device tree. I think the cbb-noc@2300000 node in tegra194-soc-cvm.dtsi is a required item in the device tree, and it is that device that is reporting the error (Address accessed : 0x9e4c60). I can’t find which item in the device tree is trying to access that address, though.

Can you tell me how to track the problem down and also if this is likely to be the reason that I only have 4 cores, not 6?

hello nick-ratbert,

you may disassemble the dtb file into a text file to check what’s at 0x9e4c60.
for instance, $ dtc -I dtb -O dts -o temp.txt tegra194-p3668-0001-p3509-0000.dtb
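
Once the dtb has been decompiled, searching the text by hand for a nearby address is tedious. Here is a rough Python sketch that lists the nodes whose unit address sits closest below a target address. It rests on loud assumptions: it only looks at `name@address {` node headers, ignores nesting and any `ranges` translation (unit addresses are relative to the parent bus), and the function name `nearest_nodes_below` and the `dummy-*` nodes in the sample are hypothetical. Treat the result as a hint about where to look, not as proof of ownership of the address.

```python
import re

# Node header such as "cbb-noc@2300000 {", optionally preceded by a "label:".
NODE_RE = re.compile(
    r"^\s*(?:[\w-]+:\s*)?(?P<name>[\w,.+-]+)@(?P<addr>[0-9a-fA-F]+)\s*\{"
)


def nearest_nodes_below(dts_text, target, limit=3):
    """Return up to `limit` node names whose unit address is <= target, closest first."""
    hits = []
    for line in dts_text.splitlines():
        m = NODE_RE.match(line)
        if not m:
            continue
        base = int(m.group("addr"), 16)
        if base <= target:
            hits.append((target - base, f"{m.group('name')}@{m.group('addr')}"))
    hits.sort()
    return [name for _, name in hits[:limit]]


# Hypothetical decompiled fragment; only cbb-noc@2300000 comes from this thread.
sample_dts = """
/ {
    dummy-a@900000 {
    };
    dummy-b@9e0000 {
    };
    cbb-noc@2300000 {
    };
};
"""
print(nearest_nodes_below(sample_dts, 0x9E4C60))
```

On a real system the input would be the contents of temp.txt from the dtc command above. An empty result is also informative: it suggests the address is not a CPU-visible node address at all, so the next place to look is the initiator named in the error report.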

did you mean that your nvpmodel clock configuration leaves only 4 CPU cores online?

Hi @JerryChang,

The dtc dump turned up nothing I could pin these errors on, though inspecting my dtsi files I noticed that there was some audio stuff still in there. We don’t use audio on our board, so I commented out:

//#include "tegra194-audio-p3668.dtsi"
//#include "tegra194-super-module-e2614-p3509.dtsi"

in .../sources/hardware/nvidia/platform/t19x/jakku/kernel-dts/common/tegra194-p3509-0000-a00.dtsi

I’m not sure if I can work without any super-module stuff, but the system boots fine and the cbb-noc errors are gone.

At the end of the boot though, I see:

[   13.665733] IRQ215: set affinity failed(-22).
[   13.666057] CPU4: shutdown
[   13.721453] IRQ215: set affinity failed(-22).
[   13.721705] CPU5: shutdown
[   21.623490] nvidia: loading out-of-tree module taints kernel.

full-boot-log-no-audio.txt (61.3 KB)

In /etc/nvpmodel.conf, I have set:

# mandatory section to configure the default power mode
< PM_CONFIG DEFAULT=2 >

Post boot nvpmodel -q reports model 5, or 10W DESKTOP (4 Core).
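
To rule out a typo or a commented-out line in the file itself, the declared default can be read back independently of what nvpmodel reports at run time. The Python sketch below assumes only the `< PM_CONFIG DEFAULT=n >` line format quoted above; `default_power_mode` is a name chosen for illustration, and the sketch says nothing about how nvpmodel actually picks its mode at boot.

```python
import re

# An active line looks like "< PM_CONFIG DEFAULT=2 >"; lines starting with '#' are comments.
PM_CONFIG_RE = re.compile(r"^\s*<\s*PM_CONFIG\s+DEFAULT=(\d+)\s*>")


def default_power_mode(conf_text):
    """Return the DEFAULT mode ID from the first active PM_CONFIG line, or None."""
    for line in conf_text.splitlines():
        m = PM_CONFIG_RE.match(line)
        if m:
            return int(m.group(1))
    return None


sample_conf = """\
# mandatory section to configure the default power mode
# < PM_CONFIG DEFAULT=5 >
< PM_CONFIG DEFAULT=2 >
"""
print(default_power_mode(sample_conf))
```

If this returns the expected mode while `nvpmodel -q` still reports a different one after boot, the file is being read correctly and the difference comes from somewhere else.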

If I manually run nvpmodel -m 2 for 15W 6 Core, that appears to work fine and top shows 6 cores. Is this selection of model 5 due to the IRQ affinity failures during boot? Or have I put the boot-time model number in the wrong place?

Oh, never mind. I found the answer here: nvpmodel default mode

All good!
