nvargus-daemon consistently crashes seconds to minutes after startup

Hello, I have an issue where the nvargus daemon crashes anywhere from a few seconds to 1-2 minutes after startup, every time.
Even a very simple GStreamer pipeline shows the problem: it runs fine for the first few seconds after boot, and I am able to start and stop the pipeline, but after 1-2 minutes it crashes, and once it does, only a full reboot solves the issue. We always see the same error message on crash:

airis@ubuntu:~/Desktop/AirisInference/production$ gst-launch-1.0 nvarguscamerasrc ! 'video/x-raw(memory:NVMM),width=1920,height=1080' ! fakesink
Setting pipeline to PAUSED …
Pipeline is live and does not need PREROLL …
Pipeline is PREROLLED …
Setting pipeline to PLAYING …
New clock: GstSystemClock
GST_ARGUS: Creating output stream
CONSUMER: Waiting until producer is connected…
GST_ARGUS: Available Sensor modes :
GST_ARGUS: 3840 x 2160 FR = 29.999999 fps Duration = 33333334 ; Analog Gain range min 1.000000, max 22.250000; Exposure Range min 13000, max 683709000;

GST_ARGUS: 1920 x 1080 FR = 59.999999 fps Duration = 16666667 ; Analog Gain range min 1.000000, max 22.250000; Exposure Range min 13000, max 683709000;

GST_ARGUS: Running with following settings:
Camera index = 0
Camera mode = 1
Output Stream W = 1920 H = 1080
seconds to Run = 0
Frame Rate = 59.999999
GST_ARGUS: Setup Complete, Starting captures for 0 seconds
GST_ARGUS: Starting repeat capture requests.
CONSUMER: Producer has connected; continuing.
nvbuf_utils: dmabuf_fd -1 mapped entry NOT found
Error generated. /dvs/git/dirty/git-master_linux/multimedia/nvgstreamer/gst-nvarguscamera/gstnvarguscamerasrc.cpp, threadExecute:734 NvBufSurfaceFromFd Failed.
Error generated. /dvs/git/dirty/git-master_linux/multimedia/nvgstreamer/gst-nvarguscamera/gstnvarguscamerasrc.cpp, threadFunction:245 (propagating)
Redistribute latency…
Got EOS from element "pipeline0".
Execution ended after 0:00:04.014728767
Setting pipeline to NULL …
GST_ARGUS: Cleaning up
GST_ARGUS: Done Success
Freeing pipeline …
(inference_env) airis@ubuntu:~/Desktop/AirisInference/production$

We are running the latest JetPack version (tegra 36.4.4).

After reading a few of the posts on similar issues, I tried the fix mentioned in "About nvargus-daemon crash issue - #8 by ShaneCCC", but had no luck.

Here is the journalctl output for the daemon after a crash:

(inference_env) airis@ubuntu:~/Desktop/AirisInference/production$ sudo journalctl -u nvargus-daemon -f
[sudo] password for airis:
Jul 24 13:53:53 ubuntu nvargus-daemon[873]: (in src/services/capture/CaptureServiceDeviceSensor.cpp, function setErrorState(), line 100)
Jul 24 13:53:53 ubuntu nvargus-daemon[873]: SCF: Error InvalidState: Timeout!! Skipping requests on sensor GUID 1, capture sequence ID = 1 draining session frameEnd events 2
Jul 24 13:53:53 ubuntu nvargus-daemon[873]: (in src/services/capture/FusaCaptureViCsiHw.cpp, function waitCsiFrameEnd(), line 646)
Jul 24 13:53:53 ubuntu nvargus-daemon[873]: SCF: Error InvalidState: Sensor 1 already in same state
Jul 24 13:53:53 ubuntu nvargus-daemon[873]: (in src/services/capture/CaptureServiceDeviceSensor.cpp, function setErrorState(), line 100)
Jul 24 13:53:53 ubuntu nvargus-daemon[873]: SCF: Error InvalidState: Timeout!! Skipping requests on sensor GUID 1, capture sequence ID = 2 draining session frameEnd events 1
Jul 24 13:53:53 ubuntu nvargus-daemon[873]: (in src/services/capture/FusaCaptureViCsiHw.cpp, function waitCsiFrameEnd(), line 646)
Jul 24 13:53:53 ubuntu nvargus-daemon[873]: SCF: Error Timeout: Sending critical error event for Session 1
Jul 24 13:53:53 ubuntu nvargus-daemon[873]: (in src/api/Session.cpp, function sendErrorEvent(), line 1039)
Jul 24 13:53:53 ubuntu nvargus-daemon[873]: PowerServiceCore:handleRequests: timePassed = 4526

Another important piece of information: when I start any nvargus pipeline and it runs successfully, the Argus daemon keeps working fine for hours; the longest I tested was 3 hours of continuous operation. The issue always happens when the Argus daemon is not in use (either before I ever open a pipeline, or after a few attempts). The only consistent behaviour is that the crash happens at most a few minutes after startup.

Try replacing the CID (control) functions in the sensor driver with dummy implementations, to narrow down whether any of the CID functions is causing the capture failure.
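As a rough illustration of that suggestion, the sketch below shows the general pattern of swapping control handlers for stubs that touch no hardware and always report success. It is user-space C with simplified stand-in types: `struct sensor_dev`, `struct sensor_ctrl_ops`, and the `dummy_*` names are hypothetical and only loosely shaped like a sensor driver's control-ops table. The actual struct and handler signatures in your driver may differ, so treat this as a pattern, not as the real kernel API.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the driver's per-sensor device struct. */
struct sensor_dev {
    const char *name;
};

/* Hypothetical table of control (CID) handlers. A real driver's ops
 * table may have different members and signatures. */
struct sensor_ctrl_ops {
    int (*set_gain)(struct sensor_dev *dev, int64_t val);
    int (*set_exposure)(struct sensor_dev *dev, int64_t val);
    int (*set_frame_rate)(struct sensor_dev *dev, int64_t val);
    int (*set_group_hold)(struct sensor_dev *dev, bool val);
};

/* Dummy handlers: log the request, skip every register write,
 * and return 0 so the caller sees the control as applied. */
static int dummy_set_gain(struct sensor_dev *dev, int64_t val)
{
    printf("%s: set_gain(%lld) stubbed\n", dev->name, (long long)val);
    return 0;
}

static int dummy_set_exposure(struct sensor_dev *dev, int64_t val)
{
    printf("%s: set_exposure(%lld) stubbed\n", dev->name, (long long)val);
    return 0;
}

static int dummy_set_frame_rate(struct sensor_dev *dev, int64_t val)
{
    printf("%s: set_frame_rate(%lld) stubbed\n", dev->name, (long long)val);
    return 0;
}

static int dummy_set_group_hold(struct sensor_dev *dev, bool val)
{
    printf("%s: set_group_hold(%d) stubbed\n", dev->name, (int)val);
    return 0;
}

/* Ops table with every control replaced by its dummy version. */
const struct sensor_ctrl_ops dummy_ctrl_ops = {
    .set_gain       = dummy_set_gain,
    .set_exposure   = dummy_set_exposure,
    .set_frame_rate = dummy_set_frame_rate,
    .set_group_hold = dummy_set_group_hold,
};
```

If the timeout disappears with all handlers stubbed, restoring the real handlers one at a time should point at the control whose register writes disturb the capture. If it persists, the control path is probably not the cause.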