GPU kernel module reports errors while executing a basic CUDA program

Hi team,
I was able to bring up a customized Ubuntu Linux on the Jetson Orin, and most of the components work well. However, I ran into a problem on the GPU/CUDA side while running a basic CUDA program that works on the native L4T OS. Could experts from NVIDIA or anyone else in the forum shed some light on this problem?
There are multiple errors. The key one seems to be the falcon (flcn) ucode boot failure. Any suggestions would be appreciated. Thanks.

tester@tester-desktop:~/cuda$ ./vector_add
[   31.358869] nvgpu: 17000000.ga10b     nvgpu_acr_wait_for_completion:143  [ERR]  flcn-1: HS ucode boot failed, err 5
[   31.359253] nvgpu: 17000000.ga10b     nvgpu_acr_wait_for_completion:145  [ERR]  flcn-1: Mailbox-1 : 0x0
[   31.359547] nvgpu: 17000000.ga10b nvgpu_cic_mon_report_err_safety_services:92   [ERR]  Error reporting is not supported in this platform
[   31.359920] nvgpu: 17000000.ga10b nvgpu_pmu_report_bar0_pri_err_status:41   [ERR]  PMU falcon bar0 timeout. status(0x0), error_type(0xc)
[   31.360296] nvgpu: 17000000.ga10b            ga10b_bootstrap_hs_acr:72   [ERR]  ACR bootstrap failed
[   31.360583] nvgpu: 17000000.ga10b        nvgpu_acr_bootstrap_hs_acr:85   [ERR]  ACR bootstrap failed
[   31.360848] nvgpu: 17000000.ga10b       nvgpu_acr_construct_execute:108  [ERR]  Bootstrap HS ACR failed
[   31.361113] nvgpu: 17000000.ga10b            nvgpu_finalize_poweron:1010 [ERR]  Failed initialization for: g->ops.acr.acr_construct_execute
[   31.391999] nvgpu: 17000000.ga10b                 gk20a_power_write:127  [ERR]  power_node_write failed at busy
libnvrm_gpu.so: NvRmGpuLibOpen failed, error=14
vector_add.work_on_nvidia_release: vector_add.cu:49: int main(): Assertion `fabs(out[i] - a[i] - b[i]) < MAX_ERR' failed.
Aborted (core dumped)

Hi,

Jetson’s GPU driver is integrated into the L4T OS.
Did you build it from standard Linux or Linux4Tegra?

Thanks.

I built it from the L4T OS. Is there any other info you need to narrow down this problem?
I did some further investigation. All the errors are related to the falcon HS ucode boot failure that is detected in the function nvgpu_acr_wait_for_completion().
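
For anyone triaging a similar log, here is a minimal sketch that pulls the function name, source line, and message out of each nvgpu [ERR] line so the earliest failure is easy to spot. The regular expression is an assumption based only on the line layout shown in this thread; other driver versions may format their messages differently. The names NVGPU_LINE, parse_nvgpu_errors and SAMPLE are made up for this example.

```python
import re

# Assumed layout, taken from the log lines posted in this thread:
# [timestamp] nvgpu: <device> <function>:<line> [ERR] <message>
NVGPU_LINE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+nvgpu:\s+(?P<dev>\S+)\s+"
    r"(?P<func>\w+):(?P<line>\d+)\s+\[ERR\]\s+(?P<msg>.*)"
)


def parse_nvgpu_errors(log_text):
    """Return (timestamp, function, line, message) tuples in log order."""
    errors = []
    for raw in log_text.splitlines():
        match = NVGPU_LINE.search(raw)
        if match:
            errors.append(
                (
                    float(match["ts"]),
                    match["func"],
                    int(match["line"]),
                    match["msg"].strip(),
                )
            )
    return errors


# Three lines copied from the log in the first post.
SAMPLE = """\
[   31.358869] nvgpu: 17000000.ga10b     nvgpu_acr_wait_for_completion:143  [ERR]  flcn-1: HS ucode boot failed, err 5
[   31.360296] nvgpu: 17000000.ga10b            ga10b_bootstrap_hs_acr:72   [ERR]  ACR bootstrap failed
[   31.361113] nvgpu: 17000000.ga10b            nvgpu_finalize_poweron:1010 [ERR]  Failed initialization for: g->ops.acr.acr_construct_execute
"""

if __name__ == "__main__":
    # The first entry is the earliest reported error in the sample.
    ts, func, line, msg = parse_nvgpu_errors(SAMPLE)[0]
    print(f"{ts:.6f} {func}:{line} {msg}")
```

Saved dmesg output can be passed to parse_nvgpu_errors() in place of SAMPLE; the first tuple returned is the earliest error, which in the log above points at nvgpu_acr_wait_for_completion.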

Hi,
Could you try this rootfs on JetPack 5.1.1 (r35.3.1) and check if the issue is present:
Root File System — Jetson Linux Developer Guide documentation

Thanks, DaneLLL. I was actually running this on a customized guest OS. I have also tried a desktop Ubuntu, and the test works well there. I don't think we need to check the minimal flavor (though I do like the idea). This only happens on the customized OS.
Any idea on the cause of the errors in the previous posts? Thanks.

Hi,
If you can run it on the minimal rootfs, you can compare it with the customized guest OS to see what is missing. From the log, it seems that a certain binary is missing, but it is unclear which one.
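
One way to do that comparison is sketched below, assuming both root file systems are mounted or unpacked as ordinary directories on the same machine. The function names list_files and missing_files are made up for this example, and the two directory arguments are placeholders for wherever the trees live. It only reports files that exist in the reference tree but not in the customized one; it does not compare file contents, and symlinked directories are not followed.

```python
import os
import sys


def list_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def missing_files(reference_root, custom_root):
    """Sorted list of files present in reference_root but absent in custom_root."""
    return sorted(list_files(reference_root) - list_files(custom_root))


if __name__ == "__main__":
    # Usage: python3 compare_rootfs.py <reference_rootfs_dir> <custom_rootfs_dir>
    if len(sys.argv) != 3:
        sys.exit("usage: compare_rootfs.py <reference_rootfs_dir> <custom_rootfs_dir>")
    for path in missing_files(sys.argv[1], sys.argv[2]):
        print(path)
```

Pointing it at a subdirectory of each tree (for example only the firmware or library directories) keeps the output short enough to read through by hand.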

BTW, I disabled the SMMU on the guest OS. Can nvgpu work correctly under a guest OS with the SMMU disabled?

From the log and investigation, if the code works correctly, the firmware was loaded and verified, but it did not come up as expected.