Kernel Warning: clk_prepare_lock Triggered on NVIDIA Orin Nano

Hello,

Summary:
While running kernel 5.10.120-rt70-tegra on an NVIDIA Orin Nano, we hit a kernel WARNING in clk_prepare_lock() raised from the pm_runtime_work workqueue. The call trace points to the runtime suspend path of the SPI driver (tegra_spi_runtime_suspend calling clk_unprepare). Right after the warning, the CAN driver (mttcan) reported lost messages in queue Q0. This may indicate a conflict or mishandling in runtime power management or clock handling.

2024-12-09 10:12:14.873 oksbot69 journald kernel: ------------[ cut here ]------------
2024-12-09 10:12:14.873 oksbot69 journald kernel: WARNING: CPU: 5 PID: 50116 at drivers/clk/clk.c:168 clk_prepare_lock+0x8c/0xa0
2024-12-09 10:12:14.874 oksbot69 journald kernel: Modules linked in: can_raw can tcp_diag inet_diag xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink ip6table_nat ip6table_filter ip6_tables iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c xt_addrtype iptable_filter br_netfilter overlay lzo_rle lzo_compress zram iwlmvm ramoops reed_solomon mac80211 loop tcp_bbrplus sch_fq nvgpu aes_ce_blk crypto_simd cryptd iwlwifi aes_ce_cipher ghash_ce cp210x sha2_ce usbserial sha256_arm64 sha1_ce cfg80211 pwm_fan leds_gpio ina3221 userspace_alert tegra_bpmp_thermal imx290(O) spi_tegra114 binfmt_misc nvmap mttcan can_dev ip_tables x_tables [last unloaded: mtd]
2024-12-09 10:12:14.874 oksbot69 journald kernel: CPU: 5 PID: 50116 Comm: kworker/5:2 Tainted: G           O      5.10.120-rt70-tegra #2
2024-12-09 10:12:14.874 oksbot69 journald kernel: Hardware name: Rapyuta Robotics Jiri (DT)
2024-12-09 10:12:14.874 oksbot69 journald kernel: Workqueue: pm pm_runtime_work
2024-12-09 10:12:14.874 oksbot69 journald kernel: pstate: 60c00009 (nZCv daif +PAN +UAO -TCO BTYPE=--)
2024-12-09 10:12:14.874 oksbot69 journald kernel: pc : clk_prepare_lock+0x8c/0xa0
2024-12-09 10:12:14.874 oksbot69 journald kernel: lr : clk_prepare_lock+0x48/0xa0
2024-12-09 10:12:14.874 oksbot69 journald kernel: sp : ffff80002cdf3c10
2024-12-09 10:12:14.874 oksbot69 journald kernel: x29: ffff80002cdf3c10 x28: ffffd9da09dd9000
2024-12-09 10:12:14.874 oksbot69 journald kernel: x27: ffff80002e813ca8 x26: 0000000000000008
2024-12-09 10:12:14.874 oksbot69 journald kernel: x25: ffff80002cdf3d68 x24: ffffd9da0824f800
2024-12-09 10:12:14.874 oksbot69 journald kernel: x23: 0000000000000001 x22: ffff3ccc00be0110
2024-12-09 10:12:14.874 oksbot69 journald kernel: x21: 000000000000000a x20: ffffd9da0a144758
2024-12-09 10:12:14.874 oksbot69 journald kernel: x19: ffffd9da0a144740 x18: 0000000000000000
2024-12-09 10:12:14.874 oksbot69 journald kernel: x17: 0000000000000000 x16: ffffd9da089e2450
2024-12-09 10:12:14.874 oksbot69 journald kernel: x15: 0000000000000000 x14: 000000000000002f
2024-12-09 10:12:14.874 oksbot69 journald kernel: x13: 0000000000000032 x12: 0000000000000024
2024-12-09 10:12:14.874 oksbot69 journald kernel: x11: 0000000000000040 x10: 0000000000000ae0
2024-12-09 10:12:14.874 oksbot69 journald kernel: x9 : ffff3ccc1f50b16c x8 : fefefefefefefeff
2024-12-09 10:12:14.874 oksbot69 journald kernel: x7 : 0000000000000018 x6 : ffff3ccc00e50ec0
2024-12-09 10:12:14.874 oksbot69 journald kernel: x5 : ffff80002cdf3b58 x4 : ffffd9da0a144770
2024-12-09 10:12:14.875 oksbot69 journald kernel: x3 : 0000000000000000 x2 : 0000000100000000
2024-12-09 10:12:14.875 oksbot69 journald kernel: x1 : 0000000000000000 x0 : ffff3ccbd505c9c0
2024-12-09 10:12:14.875 oksbot69 journald kernel: Call trace:
2024-12-09 10:12:14.875 oksbot69 journald kernel:  clk_prepare_lock+0x8c/0xa0
2024-12-09 10:12:14.875 oksbot69 journald kernel:  clk_unprepare+0x30/0x50
2024-12-09 10:12:14.875 oksbot69 journald kernel:  tegra_spi_runtime_suspend+0x68/0x80 [spi_tegra114]
2024-12-09 10:12:14.875 oksbot69 journald kernel:  pm_generic_runtime_suspend+0x40/0x60
2024-12-09 10:12:14.875 oksbot69 journald kernel:  __rpm_callback+0xac/0x1b0
2024-12-09 10:12:14.875 oksbot69 journald kernel:  rpm_callback+0x38/0xa0
2024-12-09 10:12:14.875 oksbot69 journald kernel:  rpm_suspend+0xe4/0x650
2024-12-09 10:12:14.875 oksbot69 journald kernel:  pm_runtime_work+0xd0/0xf0
2024-12-09 10:12:14.875 oksbot69 journald kernel:  process_one_work+0x1c4/0x4d0
2024-12-09 10:12:14.875 oksbot69 journald kernel:  worker_thread+0x54/0x430
2024-12-09 10:12:14.875 oksbot69 journald kernel:  kthread+0x180/0x1b0
2024-12-09 10:12:14.875 oksbot69 journald kernel:  ret_from_fork+0x10/0x24
2024-12-09 10:12:14.875 oksbot69 journald kernel: ---[ end trace 0000000000000002 ]---
2024-12-09 10:12:14.875 oksbot69 journald kernel: mttcan c310000.mttcan can0: mttcan_poll_ir: some msgs lost on in Q0

Hi bhuvanchandra.dv1,

Are you using the devkit or a custom board for the Orin Nano?
Which JetPack version are you using?

Could you share the detailed steps to reproduce the issue?
And please provide the full dmesg log as a file here for further checking.

Hello,

Are you using the devkit or a custom board for the Orin Nano?

We are using a custom carrier board.

Which JetPack version are you using?

JetPack 5.1.2

Could you share the detailed steps to reproduce the issue?

We do not yet know how reproducible this issue is. We encountered it recently on our production setup.

And please provide the full dmesg log as a file here for further checking.

At the moment, I don’t have the full dmesg logs.

To investigate further, please clarify the exact steps to reproduce the issue and share the full error logs from dmesg.

Please also check whether you can reproduce it on the devkit with the latest JP5.1.4 (R35.6.0), in case this is a known issue that has already been fixed.

Hello Kevin,

To further investigate the issue, please help clarify the exact steps to reproduce and provide the full error logs from dmesg.

Since it is not always feasible to reproduce every bug reported from production, what I am hoping for is some insight into what might be causing this issue. The call trace shows the warning being raised when tegra_spi_runtime_suspend() calls clk_unprepare(), so the problem appears to be in the runtime power management of the SPI driver. If you could confirm the cause or suggest possible reasons, we could start defining actionable steps.
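For reference, the warning fires inside clk_prepare_lock(), which clk_unprepare() calls from tegra_spi_runtime_suspend(). As we read upstream Linux 5.10, that lock is a global mutex made re-entrant by recording an owner task and a reference count, with WARN_ON_ONCE() sanity checks on that state once the mutex has been taken. We have not confirmed that the Tegra 5.10.120-rt70 tree matches upstream, nor which check sits at clk.c:168. The Python sketch below is only a simplified user-space model of that pattern (PrepareLockModel and its fields are hypothetical stand-ins, not kernel APIs), meant to show what kind of inconsistency such a check reports: the mutex being acquired while a stale owner or non-zero count is still recorded.

```python
import threading
from typing import List, Optional


class PrepareLockModel:
    """Hypothetical user-space model of a re-entrant lock built from a plain
    mutex plus an owner field and a reference count. Not kernel code."""

    def __init__(self) -> None:
        self._mutex = threading.Lock()    # models the global prepare mutex
        self.owner: Optional[int] = None  # models the recorded owner task
        self.refcnt = 0                   # models the nesting depth
        self.warnings: List[str] = []     # collected "WARN_ON_ONCE" hits

    def _warn_on(self, condition: bool, message: str) -> None:
        # Record the violated invariant instead of printing a kernel splat.
        if condition:
            self.warnings.append(message)

    def lock(self) -> None:
        me = threading.get_ident()
        if not self._mutex.acquire(blocking=False):
            if self.owner == me:
                # Nested call from the current holder: just bump the count.
                self.refcnt += 1
                return
            self._mutex.acquire()  # another thread holds it: wait for it
        # We now own the mutex, so no owner or count should be left over.
        self._warn_on(self.owner is not None, "stale owner after acquiring mutex")
        self._warn_on(self.refcnt != 0, "non-zero refcount after acquiring mutex")
        self.owner = me
        self.refcnt = 1

    def unlock(self) -> None:
        me = threading.get_ident()
        self._warn_on(self.owner != me, "unlock by a thread that is not the owner")
        self._warn_on(self.refcnt == 0, "unlock with zero refcount")
        self.refcnt -= 1
        if self.refcnt:
            return  # still nested: keep holding the mutex
        self.owner = None
        self._mutex.release()
```

In this model, correctly paired lock()/unlock() calls, including nested calls from the same thread, leave no warnings; only leftover owner or count state triggers one.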

We collect journald logs from our production devices. However, because the board has no RTC (a design limitation), we cannot capture early boot logs from before the NTP time sync. The log snippet above comes from journald at runtime. Could you specify exactly what information from the dmesg logs you need to investigate further?
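Since only journald output is available from the field, a small filter run on the device could at least keep every complete kernel WARNING block, from the "[ cut here ]" marker through "---[ end trace ... ]---", plus a few lines of trailing context such as the mttcan message above. That way the whole splat is preserved even when a full dmesg is not. This is a hypothetical helper (the function name and the default of five context lines are our own choices), and it assumes the input is plain-text kernel log lines.

```python
import re
from typing import List

# Markers the kernel prints around a WARN()/BUG() splat.
CUT_HERE = "------------[ cut here ]------------"
END_TRACE = re.compile(r"---\[ end trace [0-9a-f]+ \]---")


def extract_warning_blocks(lines: List[str], after: int = 5) -> List[List[str]]:
    """Return each '[ cut here ]' ... '[ end trace ]' block plus up to
    `after` trailing lines of context. A block whose end marker is missing
    (e.g. a truncated log) runs to the end of the input."""
    blocks: List[List[str]] = []
    i = 0
    while i < len(lines):
        if CUT_HERE not in lines[i]:
            i += 1
            continue
        start = j = i
        # Scan forward to the matching end-of-trace marker.
        while j < len(lines) and not END_TRACE.search(lines[j]):
            j += 1
        end = min(j + 1 + after, len(lines))
        blocks.append(lines[start:end])
        i = j + 1  # resume scanning after the end marker
    return blocks
```

For example, the lines could come from `subprocess.run(["journalctl", "-k", "--no-pager"], capture_output=True, text=True).stdout.splitlines()`, which returns the kernel messages of the current boot.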

On our system, we use one SPI port, two UARTs, one USB port, two I2C ports, four MIPI cameras, several GPIOs, Wi-Fi over PCIe, and a 120 GB NVMe drive. We run ROS applications without any custom kernel modules.

Please also help to check if you can reproduce it on the devkit with the latest JP5.1.4 (R35.6.0) in case you missed some known issues that have been fixed.

We understand the importance of testing on the devkit, but our production software requires specific sensors and interfaces not present on the devkit, making it impractical to reproduce the issue in that environment. If you believe testing on the devkit with JP5.1.4 might provide additional insights, could you share any specific scenarios or configurations we should replicate?

Regarding JP5.1.4 (R35.6.0), I do not see a fix for any known issue that matches what we are facing. Are there known issues addressed in the latest JetPack that might be relevant? I have reviewed R35.6.0, and apart from camera-related issues, no kernel bug fixes appear to be included.
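To confirm which L4T release a device is actually running before comparing it with R35.6.0, the release string in /etc/nv_tegra_release can be parsed. This sketch assumes the first line of that file follows the R35.x format, e.g. "# R35 (release), REVISION: 4.1, GCID: ..."; the helper names are hypothetical. A JetPack 5.1.2 install should report R35.4.1.

```python
import re
from typing import Optional, Tuple

# Assumed first-line format of /etc/nv_tegra_release on L4T R35.x, e.g.
# "# R35 (release), REVISION: 4.1, GCID: ..., BOARD: ..., EABI: aarch64, DATE: ..."
_RELEASE_RE = re.compile(r"#\s*R(\d+)\s*\(release\),\s*REVISION:\s*(\d+)\.(\d+)")

Version = Tuple[int, int, int]


def parse_l4t_release(line: str) -> Optional[Version]:
    """Return (major, minor, patch), e.g. (35, 4, 1), or None if unrecognized."""
    match = _RELEASE_RE.search(line)
    if match is None:
        return None
    major, minor, patch = (int(g) for g in match.groups())
    return (major, minor, patch)


def read_installed_release(path: str = "/etc/nv_tegra_release") -> Optional[Version]:
    # Only the first line carries the release and revision fields.
    with open(path, encoding="utf-8") as f:
        return parse_l4t_release(f.readline())


def is_older_than(installed: Version, target: Version = (35, 6, 0)) -> bool:
    # Tuples compare element by element, so (35, 4, 1) < (35, 6, 0).
    return installed < target
```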

As you said, from your current log we can only tell that the issue is related to the suspend function in the SPI driver.
That is all we know so far without further error logs and reproduction steps.
We need clear steps to reproduce the issue on the devkit so that we can rule out factors specific to your custom design.

We would like to verify it locally and investigate further, but we need to know how to set up and run the test.
We don’t know the details of your setup or which application you run when the issue occurs.
We always work on the latest release, where known issues have already been resolved.
Sorry that I can’t give you the exact root cause at this moment. However, we would like to investigate this issue further with more specific reproduction steps on the devkit running the latest release (R35.6.0).
