Xavier 4.3: kernel crash in bpmp_trywait -> wait_for_completion_timeout

Hi, I found that the kernel sometimes crashes when it enters the function bpmp_trywait and reaches wait_for_completion_timeout. The crash info is as follows. Can you share an introduction to BPMP and CBB? I can’t find any information about BPMP on the internet.
CPU:0, Error:CBB-NOC@0x2300000,irq=485
[ 4.913850] **************************************
[ 4.913853] * For more Internal Decode Help
[ 4.913858] * http://nv/cbberr
[ 4.913861] * NVIDIA userID is required to access
[ 4.913864] **************************************
[ 4.913868] CPU:0, Error:CBB-NOC
[ 4.913872] Error Logger : 0
[ 4.913886] ErrLog0 : 0x80000000
[ 4.913892] Transaction Type : RD - Read, Incrementing
[ 4.913896] Error Code : SLV
[ 4.913900] Error Source : Target
[ 4.913905] Error Description : Target error detected by CBB slave
[ 4.913918] Packet header Lock : 0
[ 4.913922] Packet header Len1 : 0
[ 4.913928] NOC protocol version : version >= 2.7
[ 4.913932] ErrLog1 : 0x40000
[ 4.913936] ErrLog2 : 0x0
[ 4.913946] RouteId : 0x40000
[ 4.913951] InitFlow : aon_p2ps/I/aon
[ 4.913955] Targflow : gpu_p2pm/T/gpu_p2pm
[ 4.913959] TargSubRange : 0
[ 4.913963] SeqId : 0
[ 4.913967] ErrLog3 : 0x10
[ 4.913971] ErrLog4 : 0x80
[ 4.914006] debug using routeid alone as below address is a joker entry and not-reliable.
[ 4.914007] Address : 0x8017000010 (unknown device)
[ 4.914011] ErrLog5 : 0x0
[ 4.914015] Non-Modify : 0x0
[ 4.914018] AXI ID : 0x0
[ 4.914023] Master ID : (unreadable bytes in the original log)
[ 4.914027] Security Group(GRPSEC): 0x0
[ 4.914031] Cache : 0x0 – Non-cacheable/Non-Bufferable)
[ 4.914038] Protection : 0x0 – Unprivileged, Secure, Data Access
[ 4.914042] FALCONSEC : 0x0
[ 4.914046] Virtual Queuing Channel(VQC): 0x0
[ 4.914050] **************************************
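As a reading aid, here is a minimal sketch of how the raw ErrLog0 value in the log (0x80000000) could map onto the decoded lines printed under it. The bit positions are an assumption based on the packet-header layout described in the upstream Tegra194 CBB driver, not something confirmed in this thread, and the names `errlog0_fields`, `extract_bits`, `decode_errlog0` and `print_errlog0` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Assumed ErrLog0 layout (hypothetical, not confirmed by this thread):
 *   bit  0      lock
 *   bits 4:1    opcode     (0 is assumed to mean "RD")
 *   bits 10:8   error code (0 is assumed to mean "SLV")
 *   bits 27:16  len1
 *   bit  31     format     (1 is assumed to mean "version >= 2.7")
 */
struct errlog0_fields {
    uint32_t lock;
    uint32_t opc;
    uint32_t errcode;
    uint32_t len1;
    uint32_t format;
};

/* Return bits [msb:lsb] of a 32-bit register value. */
uint32_t extract_bits(uint32_t value, unsigned msb, unsigned lsb)
{
    assert(msb < 32 && lsb <= msb);
    unsigned width = msb - lsb + 1;
    uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (value >> lsb) & mask;
}

/* Split ErrLog0 into its fields using the assumed layout above. */
struct errlog0_fields decode_errlog0(uint32_t errlog0)
{
    struct errlog0_fields f;
    f.lock    = extract_bits(errlog0, 0, 0);
    f.opc     = extract_bits(errlog0, 4, 1);
    f.errcode = extract_bits(errlog0, 10, 8);
    f.len1    = extract_bits(errlog0, 27, 16);
    f.format  = extract_bits(errlog0, 31, 31);
    return f;
}

/* Print the decoded fields on one line. */
void print_errlog0(uint32_t errlog0)
{
    struct errlog0_fields f = decode_errlog0(errlog0);
    printf("lock=%u opc=%u errcode=%u len1=%u format=%u\n",
           (unsigned)f.lock, (unsigned)f.opc, (unsigned)f.errcode,
           (unsigned)f.len1, (unsigned)f.format);
}
```

Calling `print_errlog0(0x80000000u)` gives lock=0, opc=0, errcode=0, len1=0, format=1. Under the assumed layout this is consistent with the "RD", "SLV", "Lock : 0", "Len1 : 0" and "version >= 2.7" lines in the log.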

I am more curious about what you are doing now.

What kind of use case needs to call bpmp_trywait on your side?

Hello, the call order is as follows. In the first line below, i = 1.
pclks[i].max_rate = clk_round_rate(pclks[i].clk, UINT_MAX);
ret = clk_core_round_rate_nolock(clk->core, &req);
return core->ops->determine_rate(core->hw, req);
.determine_rate = clk_bpmp_determine_rate,
err = bpmp_send_clk_message(req, sizeof(req_d), reply, sizeof(reply));
err = tegra_bpmp_send_receive(MRQ_CLK, req, size, reply, reply_size);
r = bpmp_trywait(ch, mrq, ob_data, ob_sz);
rt = wait_for_completion_timeout(w, timeout);

Thanks

What driver is that?

Can you share an introduction to BPMP and CBB? I’m looking for one.
Thanks

It is not an open-source component, so there is nothing I can share here.

The point is we don’t know why you are configuring this.

Hello, the driver is tegra_thermal_throttle.c.
The device tree section is as follows.
bthrot_cdev {
compatible = "nvidia,tegra-thermal-throttle";
clocks = <0x4 0x118 0x4 0x119 0x4 0x11a 0x4 0x11b 0x4 0xda>;
clock-names = "cpu0", "cpu1", "cpu2", "cpu3", "gpu";
Thanks

I don’t see that code being called in this driver.

Hi,

I opened my tegra_thermal_throttle.c and searched for “bpmp”, but the keyword does not appear in it.

Can you first state the exact problem you want to ask about here?

The call order is as follows: the driver first calls clk_round_rate, clk_round_rate then calls clk_core_round_rate_nolock, and the chain continues in this order.
pclks[i].max_rate = clk_round_rate(pclks[i].clk, UINT_MAX);
ret = clk_core_round_rate_nolock(clk->core, &req);
return core->ops->determine_rate(core->hw, req);
.determine_rate = clk_bpmp_determine_rate,
err = bpmp_send_clk_message(req, sizeof(req_d), reply, sizeof(reply));
err = tegra_bpmp_send_receive(MRQ_CLK, req, size, reply, reply_size);
r = bpmp_trywait(ch, mrq, ob_data, ob_sz);
rt = wait_for_completion_timeout(w, timeout);
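To illustrate why a driver file that never mentions "bpmp" can still end up in the BPMP path, here is a minimal user-space sketch of the ops-table dispatch in the chain above. It is not the kernel implementation: every `fake_` function, the simplified structs, and the 1377000000 Hz cap are hypothetical stand-ins, and the real BPMP round trip is replaced by a stub that only clamps the requested rate.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Simplified stand-ins for the kernel types (hypothetical). */
struct clk_rate_request {
    unsigned long rate;
};

struct clk_hw;

struct clk_ops {
    int (*determine_rate)(struct clk_hw *hw, struct clk_rate_request *req);
};

struct clk_hw {
    const struct clk_ops *ops;
    unsigned long max_rate;  /* what the fake firmware reports as the cap */
    int bpmp_calls;          /* counts simulated firmware round trips */
};

/* Stub for the message round trip to the firmware. In the real chain this
 * is where the caller would block in wait_for_completion_timeout(). */
static int fake_bpmp_send_receive(struct clk_hw *hw,
                                  struct clk_rate_request *req)
{
    hw->bpmp_calls++;
    if (req->rate > hw->max_rate)
        req->rate = hw->max_rate;
    return 0;
}

/* Plays the role of clk_bpmp_determine_rate in the chain. */
static int fake_clk_bpmp_determine_rate(struct clk_hw *hw,
                                        struct clk_rate_request *req)
{
    return fake_bpmp_send_receive(hw, req);
}

static const struct clk_ops fake_bpmp_ops = {
    .determine_rate = fake_clk_bpmp_determine_rate,
};

/* Plays the role of clk_round_rate: the caller only sees a generic clock,
 * and the provider-specific code is reached through the ops pointer. */
long fake_clk_round_rate(struct clk_hw *hw, unsigned long rate)
{
    struct clk_rate_request req = { .rate = rate };
    int ret = hw->ops->determine_rate(hw, &req);

    return ret ? ret : (long)req.rate;
}
```

With a clock set up as `struct clk_hw gpu = { .ops = &fake_bpmp_ops, .max_rate = 1377000000UL };`, calling `fake_clk_round_rate(&gpu, UINT_MAX)` goes through the ops pointer into the fake BPMP stub exactly once, even though the caller never names it.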

What we want to know is where this code came from… was it added by you?

If so, what is your purpose in adding it?

This is NVIDIA code; I haven’t changed anything.

Can you directly attach the full code file?

The attachment contains the part of the Linux kernel code I used: linux-partial-kernel.zip (35.0 KB)

I just shared some advice through private message. Please post your reply here in English.

Thanks.

OK, I mean the attachment is the code you need: linux-partial-kernel.zip (35.0 KB)

No, they are not what I need… check my private message first…

Our baseboard powers up first, then the Xavier powers up 4 seconds later, and the strapping pins are left floating. This is the difference.

Will the same module with the same software trigger this problem on the NVIDIA devkit?