Nvargus crashes with unreliable CSI camera connections on JetPack 5.1.2

I have the same issue with an IMX477-based CSI camera (Arducam) on an Orin NX 8GB running JetPack 5.1.2 / L4T r35.4.1.
It happens randomly on some devices, after a few seconds, a few minutes, or longer.
I see the same dmesg error:

[  314.359535] [RCE] ERROR: camera-ip/vi5/vi5.c:745 [vi5_handle_eof] "General error queue is out of sync with frame queue. ts=329522895200 sof_ts=329563896608 gerror_code=2 gerror_data=600062 notify_bits=0"

I tried boosting the clocks, with no success:

echo 1 > /sys/kernel/debug/bpmp/debug/clk/vi/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/isp/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/nvcsi/mrq_rate_locked
echo 1 > /sys/kernel/debug/bpmp/debug/clk/emc/mrq_rate_locked
cat /sys/kernel/debug/bpmp/debug/clk/vi/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/vi/rate
cat /sys/kernel/debug/bpmp/debug/clk/isp/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/isp/rate
cat /sys/kernel/debug/bpmp/debug/clk/nvcsi/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/nvcsi/rate
cat /sys/kernel/debug/bpmp/debug/clk/emc/max_rate | tee /sys/kernel/debug/bpmp/debug/clk/emc/rate
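
For anyone who wants to repeat this on several devices, the eight commands above can be folded into one loop. This is only a sketch: the function name boost_clocks and its optional base-path argument are my own additions, and it assumes the same debugfs nodes (mrq_rate_locked, max_rate, rate) exist for each of the four clocks and that it is run as root. It also prints the rate read back after the write, which may help confirm whether the setting actually took effect.

```shell
#!/bin/sh
# boost_clocks (hypothetical helper): lock each camera-related clock and set
# it to its maximum rate, mirroring the manual commands above.
# $1 (optional) overrides the debugfs base path; the default is the path
# used in this post.
boost_clocks() {
    base="${1:-/sys/kernel/debug/bpmp/debug/clk}"
    for clk in vi isp nvcsi emc; do
        dir="$base/$clk"
        # Report a missing or unreadable node instead of failing silently.
        if [ ! -r "$dir/max_rate" ]; then
            echo "skip $clk: $dir/max_rate not readable" >&2
            continue
        fi
        echo 1 > "$dir/mrq_rate_locked"
        cat "$dir/max_rate" > "$dir/rate"
        # Read the values back so a write that did not stick is visible.
        echo "$clk: rate=$(cat "$dir/rate") max_rate=$(cat "$dir/max_rate")"
    done
}
```

Calling boost_clocks with no argument uses the debugfs path from the commands above.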

Replacing libnvargus.so (as suggested in "Argus pipeline randomly gets error - #4 by JerryChang") didn't help either. I still get these errors in syslog:

Nov 26 14:00:17 baseline-cam nvargus-daemon: Module_id 30 Severity 2 : (fusa) Error: InvalidState Status syncpoint signaled but status value not updated in:/capture/src/fusaViHandler.cpp 817
Nov 26 14:00:17 baseline-cam nvargus-daemon: Module_id 30 Severity 2 : (fusa) Error: InvalidState  propagating from:/capture/src/fusaViHandler.cpp 759
Nov 26 14:00:17 baseline-cam nvargus-daemon[20147]: SCF: Error InvalidState:  Corr Error Received for sensor 1 .. Continuing!
Nov 26 14:00:17 baseline-cam nvargus-daemon[20147]:  (in src/services/capture/FusaCaptureViCsiHw.cpp, function waitCsiFrameEnd(), line 643)
Nov 26 14:00:17 baseline-cam nvargus-daemon: Module_id 30 Severity 2 : (fusa) Error: ResourceAlreadyInUse All captures are already pending, no idle captures available in:/capture/src/fusaViHandler.cpp 633
Nov 26 14:00:17 baseline-cam nvargus-daemon: Module_id 30 Severity 2 : (fusa) Error: ResourceAlreadyInUse  propagating from:/capture/src/fusaViHandler.cpp 475
Nov 26 14:00:17 baseline-cam nvargus-daemon[20147]: SCF: Error ResourceAlreadyInUse:  (propagating from src/services/capture/FusaCaptureViCsiHw.cpp, function startCaptureInternal(), line 866)
Nov 26 14:00:17 baseline-cam nvargus-daemon[20147]: SCF: Error ResourceAlreadyInUse:  (propagating from src/services/capture/CaptureRecord.cpp, function doCSItoMemCapture(), line 536)
Nov 26 14:00:17 baseline-cam nvargus-daemon[20147]: SCF: Error ResourceAlreadyInUse:  (propagating from src/services/capture/CaptureRecord.cpp, function issueCapture(), line 483)
Nov 26 14:00:17 baseline-cam nvargus-daemon[20147]: SCF: Error ResourceAlreadyInUse:  (propagating from src/services/capture/CaptureServiceDevice.cpp, function issueCaptures(), line 1530)
Nov 26 14:00:17 baseline-cam nvargus-daemon[20147]: SCF: Error ResourceAlreadyInUse:  (propagating from src/services/capture/CaptureServiceDevice.cpp, function issueCaptures(), line 1359)
Nov 26 14:00:17 baseline-cam nvargus-daemon[20147]: SCF: Error ResourceAlreadyInUse:  (propagating from src/common/Utils.cpp, function workerThread(), line 114)
Nov 26 14:00:17 baseline-cam nvargus-daemon[20147]: SCF: Error ResourceAlreadyInUse: Worker thread CaptureScheduler frameStart failed (in src/common/Utils.cpp, function workerThread(), line 133)
Nov 26 14:00:17 baseline-cam nvargus-daemon[20147]: SCF: Error Timeout:  (propagating from src/api/Buffer.cpp, function waitForUnlock(), line 644)
Nov 26 14:00:17 baseline-cam nvargus-daemon[20147]: SCF: Error Timeout:  (propagating from src/components/CaptureContainerImpl.cpp, function returnBuffer(), line 426)
Nov 26 14:00:17 baseline-cam nvargus-daemon[20147]: SCF: Error InvalidState: Capture Scheduler not running (in src/services/capture/CaptureServiceDevice.cpp, function addNewItemToSchedule(), line 1004)
Nov 26 14:00:17 baseline-cam nvargus-daemon[20147]: SCF: Error InvalidState:  (propagating from src/services/capture/CaptureService.cpp, function addRequest(), line 411)
Nov 26 14:00:17 baseline-cam nvargus-daemon[20147]: SCF: Error InvalidState:  (propagating from src/components/stages/MemoryToISPCaptureStage.cpp, function doHandleRequest(), line 144)
Nov 26 14:00:17 baseline-cam nvargus-daemon[20147]: SCF: Error InvalidState:  (propagating from src/components/stages/OrderedStage.cpp, function doExecute(), line 158)
Nov 26 14:00:17 baseline-cam nvargus-daemon[20147]: SCF: Error InvalidState: Sending critical error event for Session 0
Nov 26 14:00:17 baseline-cam nvargus-daemon[20147]:  (in src/api/Session.cpp, function sendErrorEvent(), line 1039)
Nov 26 14:00:17 baseline-cam wpa_supplicant[885]: wlan0: CTRL-EVENT-SCAN-FAILED ret=-95 retry=1
Nov 26 14:00:18 baseline-cam nvargus-daemon[20147]: SCF: Error Timeout:  (propagating from src/components/amr/Snapshot.cpp, function waitForNewerSample(), line 91)
Nov 26 14:00:18 baseline-cam nvargus-daemon[20147]: SCF_AutocontrolACSync failed to wait for an earlier frame to complete.
Nov 26 14:00:18 baseline-cam nvargus-daemon[20147]: SCF: Error Timeout:  (propagating from src/components/ac_stages/ACSynchronizeStage.cpp, function doHandleRequest(), line 126)
Nov 26 14:00:18 baseline-cam nvargus-daemon[20147]: SCF: Error Timeout:  (propagating from src/components/stages/OrderedStage.cpp, function doExecute(), line 137)
Nov 26 14:00:18 baseline-cam nvargus-daemon[20147]: SCF: Error Timeout: Sending critical error event for Session 1
Nov 26 14:00:18 baseline-cam nvargus-daemon[20147]:  (in src/api/Session.cpp, function sendErrorEvent(), line 1039)
Nov 26 14:00:18 baseline-cam wpa_supplicant[885]: wlan0: CTRL-EVENT-SCAN-FAILED ret=-95 retry=1
Nov 26 14:00:25 baseline-cam wpa_supplicant[885]: message repeated 7 times: [ wlan0: CTRL-EVENT-SCAN-FAILED ret=-95 retry=1]
Nov 26 14:00:26 baseline-cam nvargus-daemon[20147]: SCF: Error Timeout:  (propagating from src/services/capture/CaptureServiceEvent.cpp, function wait(), line 59)
Nov 26 14:00:26 baseline-cam nvargus-daemon[20147]: Error: Camera HwEvents wait, this may indicate a hardware timeout occured,abort current/incoming cc for sensor guid 0 count -2078883072
Nov 26 14:00:26 baseline-cam nvargus-daemon[20147]: SCF: Error Timeout:  (propagating from src/services/capture/CaptureServiceEvent.cpp, function wait(), line 59)
Nov 26 14:00:26 baseline-cam nvargus-daemon[20147]: Error: Camera HwEvents wait, this may indicate a hardware timeout occured,abort current/incoming cc for sensor guid 1 count -2078883072
Nov 26 14:00:26 baseline-cam wpa_supplicant[885]: wlan0: CTRL-EVENT-SCAN-FAILED ret=-95 retry=1
Nov 26 14:00:37 baseline-cam wpa_supplicant[885]: message repeated 11 times: [ wlan0: CTRL-EVENT-SCAN-FAILED ret=-95 retry=1]
Nov 26 14:00:37 baseline-cam nvargus-daemon[20147]: SCF: Error InvalidState: 6 buffers still pending during EGLStreamProducer destruction (in src/services/gl/EGLStreamProducer.cpp, function freeBuffers(), line 300)
Nov 26 14:00:38 baseline-cam wpa_supplicant[885]: wlan0: CTRL-EVENT-SCAN-FAILED ret=-95 retry=1
Nov 26 14:00:42 baseline-cam wpa_supplicant[885]: message repeated 4 times: [ wlan0: CTRL-EVENT-SCAN-FAILED ret=-95 retry=1]
Nov 26 14:00:42 baseline-cam nvargus-daemon[20147]: waitForIdleLocked remaining request 132672
Nov 26 14:00:42 baseline-cam nvargus-daemon[20147]: waitForIdleLocked remaining request 132669
Nov 26 14:00:42 baseline-cam nvargus-daemon[20147]: waitForIdleLocked remaining request 132668
Nov 26 14:00:42 baseline-cam nvargus-daemon[20147]: waitForIdleLocked remaining request 132667
Nov 26 14:00:42 baseline-cam nvargus-daemon[20147]: waitForIdleLocked remaining request 132666
Nov 26 14:00:42 baseline-cam nvargus-daemon[20147]: waitForIdleLocked remaining request 132665
Nov 26 14:00:42 baseline-cam nvargus-daemon[20147]: SCF: Error Timeout: waitForIdle() timed out (in src/api/Session.cpp, function waitForIdleLocked(), line 969)
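
The wpa_supplicant lines mixed into the log above appear to be Wi-Fi scan messages rather than part of the camera failure. A small filter like the one below keeps only the nvargus-daemon lines, which makes it easier to compare logs between devices. It is a sketch: the helper name nvargus_errors is hypothetical, and the default path /var/log/syslog is an assumption about where the log lives on the device.

```shell
#!/bin/sh
# nvargus_errors (hypothetical helper): print only the nvargus-daemon lines
# from a syslog file.
# $1 (optional) is the log file; /var/log/syslog is an assumed default.
nvargus_errors() {
    log="${1:-/var/log/syslog}"
    # -F matches the tag as a fixed string, with or without a [pid] suffix.
    grep -F 'nvargus-daemon' "$log"
}
```

For example, nvargus_errors /var/log/syslog > nvargus-only.log saves the filtered log for attaching to a post.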

What else can I try?
Thanks,
Yaniv