Cloned image causes system crash on another unit (in production)

Hi,
I have been trying to set up production by cloning the Jetson TX2 APP partition.
Both creating the image and restoring it work OK (we use the Jetson TX2 with the Orbitty carrier from Connect Tech). We use their flashing profile (so we end with sudo ./flash.sh cti/tx2/orbitty mmcblk0p1).

But when we boot the board with the restored image, it fails (we made sure we use the same board revision).

It fails in the NVIDIA driver. Any idea why?

[ 15.257602] Unable to handle kernel paging request at virtual address a20600009c
[ 15.265150] Mem abort info:
[ 15.267962] ESR = 0x86000004
[ 15.271258] Exception class = IABT (current EL), IL = 32 bits
[ 15.277223] SET = 0, FnV = 0
[ 15.280300] EA = 0, S1PTW = 0
[ 15.283491] [000000a20600009c] address between user and kernel address ranges
[ 15.290698] Internal error: Oops: 86000004 [#1] PREEMPT SMP
[ 15.296293] Modules linked in: overlay binfmt_misc bcmdhd cfg80211 userspace_alert ip6t_REJECT nvgpu nf_reject_ipv6 nf_log_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_tcpudp xt_addrtype nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack iptable_filter bluedroid_pm ip_tables x_tables
[ 15.337802] CPU: 5 PID: 6303 Comm: nvpmodel Not tainted 4.9.201-tegra #2
[ 15.344538] Hardware name: quill (DT)
[ 15.348221] task: ffffffc1e6a38000 task.stack: ffffffc1e2bec000
[ 15.354175] PC is at 0xa20600009c
[ 15.357940] LR is at gk20a_finalize_poweron+0x80/0x8f0 [nvgpu]
[ 15.363824] pc : [<000000a20600009c>] lr : [] pstate: 40400045
[ 15.371290] sp : ffffffc1e2befaf0
[ 15.374651] x29: ffffffc1e2befaf0 x28: ffffffc1e6a38000
[ 15.380070] x27: ffffff8008f82000 x26: 0000000000000000
[ 15.385523] x25: ffffff8001129008 x24: ffffff8001176478
[ 15.390972] x23: ffffffc1de4d8000 x22: ffffffc1de4d8000
[ 15.396420] x21: ffffff800112da18 x20: 0000000000000001
[ 15.401875] x19: ffffffc1de4d0000 x18: 0000000000001398
[ 15.407379] x17: 0000000000000001 x16: 0000000000000000
[ 15.412885] x15: 0000000000001318 x14: 000000000000c5c8
[ 15.418392] x13: 00000000000012d8 x12: ffffff800105e8d8
[ 15.423896] x11: ffffff80010db038 x10: ffffff80010e0c90
[ 15.429400] x9 : ffffff80010e0cf0 x8 : ffffff80010e2230
[ 15.434905] x7 : ffffff80010ee8a0 x6 : ffffff80010eed48
[ 15.440412] x5 : ffffff80010eed78 x4 : ffffff8001158198
[ 15.445917] x3 : 0000000000000083 x2 : 000000a20600009c
[ 15.451422] x1 : 0000000000000001 x0 : ffffffc1de4d0000
[ 15.456926]
[ 15.458476] Process nvpmodel (pid: 6303, stack limit = 0xffffffc1e2bec000)
[ 15.465525] Call trace:
[ 15.468054] [<000000a20600009c>] 0xa20600009c
[ 15.473614] [] gk20a_pm_finalize_poweron+0xe4/0x400 [nvgpu]
[ 15.481994] [] gk20a_pm_runtime_resume+0x58/0x70 [nvgpu]
[ 15.489062] [] pm_generic_runtime_resume+0x3c/0x58
[ 15.495586] [] __rpm_callback+0x74/0xa0
[ 15.501131] [] rpm_callback+0x34/0x98
[ 15.506497] [] rpm_resume+0x470/0x710
[ 15.511865] [] pm_runtime_forbid+0x64/0x78
[ 15.517680] [] control_store+0xf4/0x118
[ 15.523225] [] dev_attr_store+0x44/0x60
[ 15.528778] [] sysfs_kf_write+0x58/0x80
[ 15.534326] [] kernfs_fop_write+0xfc/0x1e0
[ 15.540141] [] __vfs_write+0x48/0x118
[ 15.545508] [] vfs_write+0xac/0x1b0
[ 15.550698] [] SyS_write+0x5c/0xc8
[ 15.555800] [] el0_svc_naked+0x34/0x38
[ 15.561260] ---[ end trace 4584a98e3ec436bb ]---

The new board's initial setup came from JetPack 4.5 (the same version used for creating the clone image).

I am not quite sure about this comment.

sudo ./flash.sh cti/tx2/orbitty mmcblk0p1

Are you saying that the command above is the last one you use to flash the board?

Hi,

Let me clarify the process (it did work for a previous JetPack, I think 4.3).

Once I get a factory-fresh Jetson TX2 and put it in the Orbitty carrier, I need to apply the BSP to JetPack (full instructions are here: Flashing NVIDIA Jetson TX2 or TX1 Module - YouTube).

In brief:

cd ~/nvidia/nvidia_sdk/JetPack_4.5_Linux_JETSON_TX2/
cd Linux_for_Tegra/
wget https://connecttech.com/ftp/Drivers/CTI-L4T-TX2-32.5-V001.tgz
tar -zxf CTI-L4T-TX2-32.5-V001.tgz
cd CTI-L4T/
sudo ./install.sh
cd ..
sudo ./flash.sh cti/tx2/orbitty mmcblk0p1

This part works well: such images work nicely on all our Jetsons.
Connect Tech did something with the Orbitty such that I am not able to connect to the Jetson itself using unpatched JetPack; it needs the patched image.

But then we reconfigure the system a bit, install Docker (we use ROS 2 from Docker), and then we want to create a golden image.

We do it with:

sudo ./flash.sh -r -k APP -G ~/backup.img cti/tx2/orbitty mmcblk0p1

Then we try to apply the image to a Jetson that just had the base JetPack installed (with the Orbitty BSP):

sudo ./flash.sh -r -k APP --image ~/backup.img cti/tx2/orbitty mmcblk0p1

Unfortunately this causes the crash described before. I can connect to the board using a serial cable, but nvpmodel has crashed and most of the system is not operational.

See below the log from our golden unit and a second one from the other unit, with the crash:

log_ok_golden_jetson.txt (23.7 KB)
log_faulty_cloned_jetson.txt (29.8 KB)
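
For anyone comparing two boot logs like these, here is a minimal sketch of one way to do it: strip the kernel timestamps (they differ on every boot) and list the lines that appear only in the faulty log. The function names are made up for illustration, and the timestamp pattern assumes the `[ 15.257602]` format shown in the trace above.

```shell
#!/bin/sh
# Remove the leading "[   15.257602] " kernel timestamp from each line so
# that identical messages from different boots compare equal.
strip_timestamps() {
    sed -E 's/^\[ *[0-9]+\.[0-9]+\] ?//' "$1"
}

# Print the messages that occur only in the second (faulty) log.
# $1 = log from the working unit, $2 = log from the failing unit.
compare_boot_logs() {
    good=$(mktemp) && bad=$(mktemp) || return 1
    strip_timestamps "$1" | sort -u > "$good"
    strip_timestamps "$2" | sort -u > "$bad"
    comm -13 "$good" "$bad"   # column 3 suppressed: lines unique to $bad
    rm -f "$good" "$bad"
}
```

Usage would be something like `compare_boot_logs log_ok_golden_jetson.txt log_faulty_cloned_jetson.txt`. The output is sorted, not in boot order, so it only points at which messages to look for in the original log.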

Not sure if this is related to their board. You can try contacting the vendor first.

Well, I have asked for their support, let's see.

But I am getting a hard kernel memory crash in the NVIDIA hardware driver. Any advice on how to narrow down the root cause?
The base rootfs partition should not contain any hardware calibrations (like DDR memory adjustments etc.), so it is really strange.

Just to clarify: have you tried cloning the image and flashing it back to the same device?

Or did you always use two devices for the test?

I have actually used three devices: one as the “golden” unit (to create the image).
On the second, I restored the image and it crashed, but I noticed that it was a different revision. So I took a third, identical to our “golden” unit, and it also crashed.

The Orbitty support recommended a different cloning procedure than the one I used. I had used this pair:

sudo ./flash.sh -r -k APP -G ~/backup.img cti/tx2/orbitty mmcblk0p1
sudo ./flash.sh -r -k APP --image ~/backup.img cti/tx2/orbitty mmcblk0p1

Their method (https://connecttech.com/resource-center/kdb-378-cloning-jetson-modules-with-connect-tech-board-support-package/):

./flash.sh -r -k APP -G clone.img jetson-tx2 mmcblk0p1
cp clone.img bootloader/system.img
./flash.sh -r cti/tx2/orbitty mmcblk0p1

So cloning with the NVIDIA profile, but restoring the clone with theirs. Testing it now.

OK, this sequence works.
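
In case it helps someone scripting this for production, a rough sketch of the working sequence as two shell functions, assuming it is run from the Linux_for_Tegra directory with the relevant board in recovery mode. The flash.sh commands are the ones from the vendor's method above; the function names, the CLONE_IMG variable and the empty-file check are my own additions.

```shell
#!/bin/sh
# CLONE_IMG is a hypothetical variable name; defaults to the file name
# used in the vendor's instructions.
CLONE_IMG="${CLONE_IMG:-clone.img}"

# Step 1 (golden unit attached): read back the APP partition using the
# stock NVIDIA profile, not the Orbitty one.
clone_golden() {
    sudo ./flash.sh -r -k APP -G "$CLONE_IMG" jetson-tx2 mmcblk0p1
}

# Step 2 (target unit attached): stage the clone as bootloader/system.img,
# then do a full flash with the Orbitty profile, reusing that image (-r).
restore_clone() {
    # Refuse to flash if the clone is missing or empty.
    if [ ! -s "$CLONE_IMG" ]; then
        echo "clone image '$CLONE_IMG' is missing or empty" >&2
        return 1
    fi
    cp "$CLONE_IMG" bootloader/system.img &&
        sudo ./flash.sh -r cti/tx2/orbitty mmcblk0p1
}
```

Call `clone_golden` once with the golden unit connected, then `restore_clone` for each target unit. Nothing runs until one of the functions is called.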

There must be much more under the hood than just copying the raw APP partition image (otherwise it should not matter).
The sequence uses different kernels and dtb files depending on the context. It also overwrites all partitions when restoring the image (not just APP).
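
Since the kernel and dtb can differ between the two procedures, one way to check what a given unit actually booted with is to print a small "fingerprint" on the golden and the cloned unit and compare the output. This is only a sketch: the function names are made up, and I am assuming that /etc/nv_tegra_release and the nvidia,dtsfilename device-tree property are present on the L4T image (missing files are reported as unavailable, not treated as errors).

```shell
#!/bin/sh
# Print "label: first line of file", or "label: unavailable" if the file
# cannot be read. $1 = label, $2 = path.
show_file() {
    if [ -r "$2" ]; then
        # Device-tree strings are NUL-terminated; drop the NUL for display.
        printf '%s: %s\n' "$1" "$(tr -d '\0' < "$2" | head -n 1)"
    else
        printf '%s: unavailable\n' "$1"
    fi
}

# Summarize the running kernel, L4T release and loaded device tree.
boot_fingerprint() {
    printf 'kernel: %s\n' "$(uname -r)"
    show_file l4t_release /etc/nv_tegra_release
    show_file dtb_model /proc/device-tree/model
    # Assumed L4T-specific property naming the dts the dtb was built from.
    show_file dts_file "/proc/device-tree/nvidia,dtsfilename"
}
```

Running `boot_fingerprint > fingerprint.txt` on each unit (over the serial console if needed) and diffing the two files should show whether the cloned unit runs a different kernel or device tree than the golden one.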

But … it works

Thanks for help and suggestions.

Unless we know what the vendor has done in their own board config, we cannot give any explanation.
