I have to say I’m quite disappointed by the lack of support and follow-up from NVIDIA on this thread. This issue has been impacting production systems, and it’s frustrating to have to dig so deeply without any concrete guidance from the development team. I understand that launching new products is important, but it shouldn’t take priority over ensuring that existing ones run reliably. Is the entire NVIDIA team focused only on the next release?
After many months of extensive testing, we discovered that removing the ensemble from Triton Server completely resolves the issue. To work around it, we built a new model using the Python Triton backend that internally loads all the models from our previous ensemble pipeline (ONNX, Python, TensorRT, etc.) and executes them in the same order and logic as before. With this approach, the RCU_PREEMPT issue no longer occurs.
It seems that something in the ensemble handling layer is triggering the kernel-level problem. When we bypass the ensemble and manage the pipeline ourselves inside a single Python backend model, everything runs smoothly.
That said, this solution is not ideal, as we lose some performance due to the lack of concurrency that the ensemble previously provided.
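For anyone who wants to try the same workaround, here is a minimal sketch of the idea (not our actual code): a single Python backend model whose execute() runs the former ensemble stages back-to-back. The model files, tensor names ("IMAGE"/"DETECTIONS"), and the use of onnxruntime for the stages are placeholder assumptions; adapt them to your own pipeline.

```python
# Sketch only: fold a former Triton ensemble into one Python backend model.
# File names, tensor names, and the use of onnxruntime are assumptions.
import numpy as np
import onnxruntime as ort
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # Load every stage of the former ensemble inside this one backend,
        # so Triton itself only ever sees a single model.
        self.pre = ort.InferenceSession("/models/assets/preprocess.onnx",
                                        providers=["CUDAExecutionProvider"])
        self.det = ort.InferenceSession("/models/assets/detector.onnx",
                                        providers=["CUDAExecutionProvider"])

    def execute(self, requests):
        responses = []
        for request in requests:
            image = pb_utils.get_input_tensor_by_name(request, "IMAGE").as_numpy()

            # Run the stages in the same order the ensemble used to.
            pre_out = self.pre.run(None, {"raw_image": image})[0]
            det_out = self.det.run(None, {"input": pre_out})[0]

            out = pb_utils.Tensor("DETECTIONS", det_out.astype(np.float32))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
```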
Hopefully, this helps others encountering the same issue.
I agree the response from NVIDIA has been poor. We have escalated this via our supplier, who have raised the issue with Advantech; Advantech have recreated the problem and are raising it with NVIDIA. I can only hope they have more success in finding a solution.
I have bypassed our ensemble and am running the model directly (so no pre/post-processing). After about 10 days I hit another rcu_preempt error and the system restarted.
We are running a YOLOX-small model at the maximum load the system can handle. As mentioned before, I ran this same test on an identical system running JetPack 5.1.2, and it has been running for over a month with no problems.
Bypassing the ensemble lets us run a higher load before it fails, giving us some breathing room, but this is far from an acceptable solution.
Does “bypass” mean all models are still loaded in the inference server, but you’re just calling the main one directly?
Or does it mean you actually removed all the other models from the server?
We’re asking because our setup has been running stably for 16 days on multiple machines without the issue; however, we removed all other models, so that single model is now the only one in the inference server.
Hello, we are having the same issue on the following:
Jetson Orin NX 16GB
L4T 36.4.3
Oct 30 09:58:33 19-eai131-16gb kernel: [ 1572.262310] tegra-hda 3510000.hda: azx_get_response timeout, switching to polling mode: last cmd=0x004f0900
Oct 30 09:58:49 19-eai131-16gb kernel: [ 1578.422080] ---- syncpts ----
Oct 30 09:58:49 19-eai131-16gb kernel: [ 1578.422093] id 0 (reserved) min 0 max 0 (0 waiters)
Oct 30 09:58:49 19-eai131-16gb kernel: [ 1578.422095] id 1 (1-15340000.vic) min 0 max 0 (0 waiters)
Oct 30 09:58:49 19-eai131-16gb kernel: [ 1578.422097] id 2 (2-15480000.nvdec) min 0 max 0 (0 waiters)
Oct 30 09:58:49 19-eai131-16gb kernel: [ 1578.422099] id 3 (3-154c0000.nvenc) min 0 max 0 (0 waiters)
Oct 30 09:58:49 19-eai131-16gb kernel: [ 1578.422101] id 4 (4-15380000.nvjpg) min 0 max 0 (0 waiters)
Oct 30 09:58:49 19-eai131-16gb kernel: [ 1578.422103] id 5 (5-15540000.nvjpg) min 0 max 0 (0 waiters)
Oct 30 09:58:49 19-eai131-16gb kernel: [ 1578.422105] id 6 (6-15a50000.ofa) min 0 max 0 (0 waiters)
Oct 30 09:58:49 19-eai131-16gb kernel: [ 1578.422107] id 7 (7-pva_syncpt) min 0 max 0 (0 waiters)
Oct 30 09:58:49 19-eai131-16gb kernel: [ 1578.422109] id 8 (8-pva_syncpt) min 0 max 0 (0 waiters)
Oct 30 09:58:49 19-eai131-16gb kernel: [ 1578.422111] id 9 (9-pva_syncpt) min 0 max 0 (0 waiters)
Oct 30 09:58:49 19-eai131-16gb kernel: [ 1578.422114] id 10 (10-pva_syncpt) min 0 max 0 (0 waiters)
Oct 30 09:58:49 19-eai131-16gb kernel: [ 1578.422115] id 11 (11-pva_syncpt) min 0 max 0 (0 waiters)
Oct 30 09:58:49 19-eai131-16gb kernel: [ 1578.422117] id 12 (12-pva_syncpt) min 0 max 0 (0 waiters)
Oct 30 09:58:49 19-eai131-16gb kernel: [ 1578.422119] id 13 (13-pva_syncpt) min 0 max 0 (0 waiters)
Oct 30 09:58:49 19-eai131-16gb kernel: [ 1578.422121] id 14 (14-pva_syncpt) min 0 max 0 (0 waiters)
Oct 30 09:58:49 19-eai131-16gb kernel: [ 1578.422123] id 15 (15-ga10b_511_user) min 2 max 0 (0 waiters)
This happens when we run two parallel processes, each with a DeepStream pipeline running an RF-DETR model through gst-nvinfer.
We do not use Triton at all.
The lockup happens reliably after ~30 minutes of processing.
We could previously only reproduce this with the Triton Inference Server, and it takes hours to reproduce.
Another app that reproduces it will help us figure out the issue.
On our side, we merged the ensemble into a single model using the Python backend. The final setup had only this Python model loaded in the inference server, and it has been running continuously at max capacity for weeks now.
It seems to me that the issue is not the ensemble but how the inference server deals with multiple models, regardless of whether they are used in an ensemble or not.
Hello, unfortunately it is an internal application, which I do not have permission to share with you.
The only thing I can do at the moment to help is share some information with you. This is the last kernel log from our Orin device when the lockup happened:
`Nov 3 21:41:25 19-eai131-16gb kernel: [ 7325.714036] tegra-hda 3510000.hda: azx_get_response timeout, switching to polling mode: last cmd=0x004f0900`
`Nov 3 21:41:41 19-eai131-16gb kernel: [ 7330.833828] ---- syncpts ----`
`Nov 3 21:41:41 19-eai131-16gb kernel: [ 7330.833840] id 0 (reserved) min 0 max 0 (0 waiters)`
`Nov 3 21:41:41 19-eai131-16gb kernel: [ 7330.833844] id 1 (1-pva_syncpt) min 0 max 0 (0 waiters)`
`Nov 3 21:41:41 19-eai131-16gb kernel: [ 7330.833846] id 2 (2-pva_syncpt) min 0 max 0 (0 waiters)`
`Nov 3 21:41:41 19-eai131-16gb kernel: [ 7330.833849] id 3 (3-pva_syncpt) min 0 max 0 (0 waiters)`
`Nov 3 21:41:41 19-eai131-16gb kernel: [ 7330.833852] id 4 (4-pva_syncpt) min 0 max 0 (0 waiters)`
`Nov 3 21:41:41 19-eai131-16gb kernel: [ 7330.833855] id 5 (5-pva_syncpt) min 0 max 0 (0 waiters)`
`Nov 3 21:41:41 19-eai131-16gb kernel: [ 7330.833857] id 6 (6-pva_syncpt) min 0 max 0 (0 waiters)`
`Nov 3 21:41:41 19-eai131-16gb kernel: [ 7330.833860] id 7 (7-pva_syncpt) min 0 max 0 (0 waiters)`
`Nov 3 21:41:41 19-eai131-16gb kernel: [ 7330.833863] id 8 (8-pva_syncpt) min 0 max 0 (0 waiters)`
`Nov 3 21:41:41 19-eai131-16gb kernel: [ 7330.833866] id 9 (9-15340000.vic) min 0 max 0 (0 waiters)`
`Nov 3 21:41:41 19-eai131-16gb kernel: [ 7330.833869] id 10 (10-15480000.nvdec) min 0 max 0 (0 waiters)`
`Nov 3 21:41:41 19-eai131-16gb kernel: [ 7330.833872] id 11 (11-154c0000.nvenc) min 0 max 0 (0 waiters)`
`Nov 3 21:41:41 19-eai131-16gb kernel: [ 7330.833874] id 12 (12-15380000.nvjpg) min 0 max 0 (0 waiters)`
`Nov 3 21:41:41 19-eai131-16gb kernel: [ 7330.833877] id 13 (13-15540000.nvjpg) min 0 max 0 (0 waiters)`
`Nov 3 21:41:41 19-eai131-16gb kernel: [ 7330.833880] id 14 (14-15a50000.ofa) min 0 max 0 (0 waiters)`
`Nov 3 21:41:41 19-eai131-16gb kernel: [ 7330.833883] id 15 (15-ga10b_511_user) min 12875 max 0 (0 waiters)`
`Nov 3 21:41:41 19-eai131-16gb kernel: [ 7330.833885] id 16 (16-ga10b_510_user) min 0 max 0 (0 waiters)`
`Nov 3 21:41:41 19-eai131-16gb kernel: [ 7330.833887] id 17 (17-ga10b_509_user) min 0 max 0 (0 waiters)`
`Nov 3 21:41:41 19-eai131-16gb kernel: [ 7330.833889] id 18 (18-ga10b_508_user) min 0 max 0 (0 waiters)`
`Nov 3 21:41:41 19-eai131-16gb kernel: [ 7330.833891] id 19 (19-ga10b_507_user) min 0 max 0 (0 waiters)`
`Nov 3 21:41:41 19-eai131-16gb kernel: [ 7330.833893] id 20 (20-ga10b_506_user) min 0 max 0 (0 waiters)`
`Nov 3 21:41:41 19-eai131-16gb kernel: [ 7330.833895] id 21 (21-ga10b_505_user) min 106302 max 0 (0 waiters)`
`Nov 3 21:41:41 19-eai131-16gb kernel: [ 7330.833897] id 22 (22-ga10b_504_user) min 48492 max 0 (0 waiters)`
`Nov 3 21:41:41 19-eai131-16gb kernel: [ 7330.833899] id 23 (23-ga10b_503_user) min 0 max 0 (0 waiters)`
`Nov 3 21:41:41 19-eai131-16gb kernel: [ 7330.833900] id 24 (24-ga10b_502_user) min 12879 max 0 (0 waiters)`
`Nov 3 21:41:41 19-eai131-16gb kernel: [ 7330.833902] id 25 (25-ga10b_501_user) min 443892 max 0 (0 waiters)`
`Nov 3 21:41:41 19-eai131-16gb kernel: [ 7330.833904] id 26 (26-ga10b_500_user) min 12875 max 0 (1 waiters)`
`Nov 3 21:41:41 19-eai131-16gb kernel: [ 7341.533367] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:`
`Nov 3 21:41:41 19-eai131-16gb kernel: [ 7341.533372] rcu: 0-...0: (1 GPs behind) idle=881/1/0x4000000000000002 softirq=2039940/2039946 fqs=1050`
`Nov 3 21:41:41 19-eai131-16gb kernel: [ 7341.533379] (detected by 6, t=5256 jiffies, g=2114317, q=53536)`
`Nov 3 21:41:41 19-eai131-16gb kernel: [ 7341.533382] Task dump for CPU 0:`
`Nov 3 21:41:41 19-eai131-16gb kernel: [ 7341.533384] task:cuda-EvtHandlr state:R running task stack: 0 pid: 3294 ppid: 2465 flags:0x0000080e`
`Nov 3 21:41:41 19-eai131-16gb kernel: [ 7341.533391] Call trace:`
`Nov 3 21:41:41 19-eai131-16gb kernel: [ 7341.533392] __switch_to+0x104/0x160`
`Nov 3 21:41:41 19-eai131-16gb kernel: [ 7341.533404] 0x0`
`Nov 3 21:41:53 19-eai131-16gb kernel: [ 7353.872895] nvme nvme0: I/O 862 QID 4 timeout, completion polled`
`Nov 3 21:41:53 19-eai131-16gb kernel: [ 7353.872963] nvme nvme0: I/O 742 QID 7 timeout, completion polled`
`Nov 3 21:41:54 19-eai131-16gb kernel: [ 7354.712891] nvme nvme0: I/O 302 QID 6 timeout, completion polled`
(long run of NUL bytes from the unclean shutdown omitted)
`Nov 4 00:02:20 19-eai131-16gb kernel: [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd421]`
You can see it took more than 2 hours for the device to reboot itself.
Another thing we found is that although SSH is unresponsive in this state, the device can still communicate over UART.
If I manage to create a reproducible example which can be shared publicly, I will be very happy to post it here.
Is there any NMS-related use case in your DeepStream pipeline?
In the original use case, we observe that this issue only happens when feeding the output of one model into another.
We are not using NMS in our DeepStream pipeline; we use cluster-mode=4. However, we are using gst-nvtracker with the NvDCF configuration, which I believe uses the GPU.
(We do NMS ourselves, using CPU-only code while iterating over the NvDsObjectMeta in the buffer; a rough sketch of that approach is below.)
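For context, here is roughly what that CPU-side NMS looks like as a sketch, assuming the standard DeepStream Python bindings (pyds), a greedy IoU-based suppression, and an illustrative threshold; it is not our production code and the probe attachment point is up to you.

```python
# Sketch: CPU-only NMS in a pad-probe callback over NvDsObjectMeta (pyds).
# IOU_THRESHOLD and the probe location are illustrative assumptions.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst
import pyds

IOU_THRESHOLD = 0.5  # assumed value


def _iou(a, b):
    # Boxes are (left, top, width, height) taken from obj_meta.rect_params.
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    ix = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    iy = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms_probe(pad, info, user_data):
    buf = info.get_buffer()
    if not buf:
        return Gst.PadProbeReturn.OK
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(buf))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)

        # Collect all detections of this frame on the CPU.
        dets = []
        l_obj = frame_meta.obj_meta_list
        while l_obj is not None:
            obj = pyds.NvDsObjectMeta.cast(l_obj.data)
            r = obj.rect_params
            dets.append((obj, (r.left, r.top, r.width, r.height), obj.confidence))
            l_obj = l_obj.next

        # Greedy NMS: keep the highest-confidence box, drop overlapping ones.
        dets.sort(key=lambda d: d[2], reverse=True)
        keep, drop = [], []
        for obj, box, _ in dets:
            if any(_iou(box, kbox) > IOU_THRESHOLD for _, kbox in keep):
                drop.append(obj)
            else:
                keep.append((obj, box))

        # Remove suppressed objects after iterating, never during iteration.
        for obj in drop:
            pyds.nvds_remove_obj_meta_from_frame(frame_meta, obj)

        l_frame = l_frame.next
    return Gst.PadProbeReturn.OK
```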
Building on what you and others have said:
It happens when there are multiple models in the triton server
It happens when there are multiple chained nvinfer engines in the pipeline (as you mentioned)
It happens when there is nvinfer and nvtracker chained in the pipeline (our case)
It happens when using nvinfer + NMS (as lukasz mentioned)
I am starting to believe the hypothesis that this problem happens when there is more than one GPU-related worker in a single process (pipeline).
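If that hypothesis is right, something as simple as the following ought to stress the same condition: two independent GPU workers in one process, each running inference in a tight loop. This is purely an illustrative sketch (placeholder ONNX models, onnxruntime on the CUDA provider), not a confirmed reproducer of the rcu_preempt stall.

```python
# Illustrative stress sketch of the hypothesis above: more than one GPU
# worker in a single process, each running inference flat-out. The model
# files, input shape, and the choice of onnxruntime are placeholders.
import threading
import numpy as np
import onnxruntime as ort


def gpu_worker(model_path: str, shape: tuple) -> None:
    # Each worker owns its own CUDA session, mimicking two independent
    # GPU-backed components living in the same process.
    sess = ort.InferenceSession(model_path, providers=["CUDAExecutionProvider"])
    input_name = sess.get_inputs()[0].name
    dummy = np.random.rand(*shape).astype(np.float32)
    while True:
        sess.run(None, {input_name: dummy})


if __name__ == "__main__":
    workers = [
        threading.Thread(target=gpu_worker, args=("model_a.onnx", (1, 3, 640, 640)), daemon=True),
        threading.Thread(target=gpu_worker, args=("model_b.onnx", (1, 3, 640, 640)), daemon=True),
    ]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
```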