# DeepStream 7.1 — `gst-nvinfer` `cudaErrorIllegalAddress` under sustained inference load on Jetson Orin Nano (JetPack 6.2 / L4T 36.4.x)

**Status**: Open, seeking guidance

**Product context**: commercial AI vision totem currently in pilot deployment, built on Jetson Orin Nano + DeepStream

**Date**: 2026-05-05

## 1. Summary

Under sustained per-frame inference load, our DeepStream pipeline crashes with
`cudaErrorIllegalAddress` (CUDA error 700) inside `gst-nvinfer`'s
`NvDsInferContextImpl::releaseBatchOutput`, followed by an unrecoverable CUDA
context corruption that cascades into `SIGSEGV` and process termination.
Our process supervisor restarts the pipeline automatically, but the bug
re-fires within 30-90 s of resumption.

We have substantially reduced the crash rate by raising the `nvinfer`
`interval` property, but the underlying race / double-release appears to
persist. We would like guidance on whether this matches a known issue in
DeepStream 7.1 and whether a patched build or supported workaround exists.

## 2. Environment

| Item | Value |
|---|---|
| Hardware | NVIDIA Jetson Orin Nano Super (8 GB) Developer Kit |
| Power profile | `MAXN SUPER` (`nvpmodel -m 0`), official 19V/4.74A barrel-jack PSU |
| Storage | NVMe SSD, ext4, 18% utilised, no I/O errors |
| L4T | 36.4.x, currently in a mixed state: most packages at 36.4.4-20250616085344, three packages (`nvidia-l4t-gstreamer`, `nvidia-l4t-jetson-multimedia-api`, `nvidia-l4t-libwayland-egl1`) at 36.4.7-20250918154033. `libnvbufsurftransform` (in `nvidia-l4t-multimedia`) is at 36.4.4 alongside the CUDA runtime. |
| CUDA | 12.6.68 |
| TensorRT | bundled with JetPack 6.2 (10.x) |
| DeepStream | 7.1.0-1 |
| Custom YOLO parser | `libnvdsinfer_custom_impl_Yolo.so` (DeepStream-Yolo, NVIDIA-AI-IOT compatible) |
| Model | YOLOv8 PPE-detector, 8 classes, FP16, 640×640 input, exported to TensorRT engine |
| Camera | Logitech C920 HD Pro (UVC, 1280×720 @ 30 fps MJPEG over USB 2) |
| Tracker | Tested both `config_tracker_IOU.yml` (currently in use) and `config_tracker_NvDCF_perf.yml` (worse; see §6) |

## 3. Pipeline topology

```
v4l2src (1280x720, MJPEG)
-> capsfilter (image/jpeg, 1280x720, 30/1)
-> jpegdec
-> videoconvert
-> tee
   ├── queue (leaky=2, max-size-buffers=1)
   │     -> nvvideoconvert
   │     -> capsfilter (video/x-raw(memory:NVMM), format=NV12, width=640, height=360)
   │     -> nvstreammux (batch-size=1, live-source=1, width=640, height=360,
   │                     batched-push-timeout=1000)
   │     -> nvinfer (interval=2, FP16, custom YOLO parser, cluster-mode=2,
   │                  maintain-aspect-ratio=1, symmetric-padding=1)
   │     -> nvtracker (IOU)
   │     -> fakesink
   └── queue (leaky=2, max-size-buffers=1)
         -> capsfilter (video/x-raw, I420)
         -> jpegenc (preview branch, system memory only)
         -> appsink (max-buffers=1, drop=true, sync=false)
```

The MJPEG preview branch is deliberately kept in system memory (no
`nvvideoconvert`) so that NVMM access is single-consumer. The capsfilter
between `nvvideoconvert` and `nvstreammux` pins width/height to the
streammux profile so streammux does not have to invoke its internal
`nvbufsurftransform` resize on every buffer.
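For readers who want to reproduce the topology outside our application, the diagram above can be sketched as a `Gst.parse_launch`-style description string. This is only an illustration: the device path, the `nvinfer` config file name, and the tracker library path are placeholders we have assumed here, not values taken from our deployment, and our application does not necessarily build the pipeline this way.

```python
# Sketch only: builds a gst-launch / Gst.parse_launch style description of the
# topology above. DEVICE, INFER_CONFIG, TRACKER_LIB and TRACKER_CONFIG are
# assumed placeholders, not values from the real deployment.
DEVICE = "/dev/video0"
INFER_CONFIG = "config_infer_primary_yolo.txt"
TRACKER_LIB = "/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so"
TRACKER_CONFIG = "config_tracker_IOU.yml"


def build_pipeline_description(width: int = 640, height: int = 360) -> str:
    """Return the two-branch pipeline as a single description string."""
    source = (
        f"v4l2src device={DEVICE} ! "
        "image/jpeg,width=1280,height=720,framerate=30/1 ! "
        "jpegdec ! videoconvert ! tee name=t"
    )
    # Inference branch: the NVMM caps pin width/height to the streammux
    # resolution so streammux has no reason to rescale each buffer.
    infer_branch = (
        "t. ! queue leaky=2 max-size-buffers=1 ! nvvideoconvert ! "
        f"video/x-raw(memory:NVMM),format=NV12,width={width},height={height} ! "
        "mux.sink_0 "
        f"nvstreammux name=mux batch-size=1 live-source=1 "
        f"width={width} height={height} batched-push-timeout=1000 ! "
        f"nvinfer config-file-path={INFER_CONFIG} interval=2 ! "
        f"nvtracker ll-lib-file={TRACKER_LIB} ll-config-file={TRACKER_CONFIG} ! "
        "fakesink"
    )
    # Preview branch: stays in system memory, no NVMM elements at all.
    preview_branch = (
        "t. ! queue leaky=2 max-size-buffers=1 ! "
        "video/x-raw,format=I420 ! jpegenc ! "
        "appsink max-buffers=1 drop=true sync=false"
    )
    return " ".join([source, infer_branch, preview_branch])


if __name__ == "__main__":
    print(build_pipeline_description())
```

When the string is pasted into a shell for `gst-launch-1.0`, the caps containing `(memory:NVMM)` need quoting.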

## 4. Crash signature (full sequence)

A representative crash from `aegis-error.log`, reproduced verbatim. The same
sequence has been captured >50 times across dozens of pipeline restarts.

```
ERROR: [TRT]: [cudaDriverHelpers.cpp::operator()::106] Error Code 1: Cuda Driver
   (an illegal memory access was encountered)
ERROR: cudaStreamDestroy failed, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: cudaStreamDestroy failed, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: cudaEventDestroy failed, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: cudaEventDestroy failed, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: cudaFree failed, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: cudaFreeHost failed, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: [TRT]: createInferRuntime: Error Code 6: API Usage Error
   (CUDA initialization failure with error: 700)
ERROR: [TRT]: [checkMacros.cpp::catchCudaError::212] Error Code 1: Cuda Runtime
   (an illegal memory access was encountered)
[process exits via SIGSEGV]
```

In a small number of cases we have also captured the warmup-period variant:

```
WARN: nvinfer gstnvinfer.cpp:2461 gst_nvinfer_output_loop:
  error: Failed to dequeue output from inferencing.
  NvDsInferContext error: NVDSINFER_CUDA_ERROR
WARN: nvinfer gstnvinfer.cpp:681 NvDsInferContext[UID 1]:
  Warning from NvDsInferContextImpl::releaseBatchOutput()
  <nvdsinfer_context_impl.cpp:1990> [UID = 1]:
  Tried to release an outputBatchID which is already with the context
ERROR: nvinfer gstnvinfer.cpp:1267 get_converted_buffer:
   cudaMemset2DAsync failed with error cudaErrorIllegalAddress
   while converting buffer
WARN: nvinfer gstnvinfer.cpp:1576 gst_nvinfer_process_full_frame:
  error: Buffer conversion failed
/dvs/git/dirty/git-master_linux/nvutils/nvbufsurftransform/nvbufsurftransform_copy.cpp:341:
   => Failed in mem copy
ERROR: [TRT]: IExecutionContext::enqueueV3: Error Code 1:
   Cask (Cask convolution execution)
```

The `Tried to release an outputBatchID which is already with the context`
line is what we believe to be the proximate cause: a double-release inside
`NvDsInferContextImpl::releaseBatchOutput`. Once that fires, every subsequent
CUDA call in the process returns `cudaErrorIllegalAddress` until the process
exits.
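To tell the two signatures apart when triaging many crash windows, a small log classifier along the following lines could be used. This is a hedged sketch rather than our production tooling: the category labels (`full-cascade`, `warmup-variant`, and so on) are names invented for this example, and the patterns are taken only from the log excerpts above.

```python
import re

# Patterns copied from the log excerpts above.
DOUBLE_RELEASE = re.compile(
    r"Tried to release an outputBatchID which is already with the context"
)
ILLEGAL_ADDRESS = re.compile(
    r"cudaErrorIllegalAddress|an illegal memory access was encountered"
)
# Resource teardown calls that fail once the CUDA context is corrupted.
TEARDOWN_FAILURE = re.compile(
    r"cuda(StreamDestroy|EventDestroy|Free|FreeHost) failed"
)


def classify_crash(log_text: str) -> str:
    """Label a crash window; the labels are hypothetical, for triage only."""
    has_double_release = bool(DOUBLE_RELEASE.search(log_text))
    has_illegal = bool(ILLEGAL_ADDRESS.search(log_text))
    has_teardown = bool(TEARDOWN_FAILURE.search(log_text))

    if has_illegal and has_teardown:
        return "full-cascade"
    if has_double_release:
        return "warmup-variant"
    if has_illegal:
        return "illegal-address-other"
    return "no-match"
```

Feeding each crash window's log text through `classify_crash` gives a rough count of how often the double-release warning is actually visible before the cascade.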

## 5. Reproduction

1. Boot the Jetson into MAXN SUPER, connect the C920, and start the pipeline above.
2. Stand a person in front of the camera so the YOLO model produces sustained
   detections (≥ 1 object per frame).
3. Within 30-90 s, the crash sequence in §4 fires and the process exits.

We have reproduced this with:

* PIXY USB camera at 30 fps (where it fired less often, roughly once per
  13 hours overnight, because the camera frequently dropped frames and so
  inadvertently protected the buffer pool).
* Logitech C920 at 30 fps (where reliable frame delivery exposes the bug
  within ~50-90 s under active person load).
* Both `interval=0` (every frame; crashes within ~14 s) and `interval=1`
  (every other frame; crashes within ~50-90 s under load).
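For reference, the relationship between `interval` and the effective inference rate used throughout this report can be written down as a one-line calculation. The sketch assumes a steady 30 fps source and the documented meaning of `interval` (the number of consecutive batches skipped between inferences); the function name is ours.

```python
def inference_rate_hz(source_fps: float, interval: int) -> float:
    """Effective inference rate when `interval` batches are skipped each time."""
    if interval < 0:
        raise ValueError("interval must be >= 0")
    # One batch is inferred, then `interval` batches are skipped.
    return source_fps / (interval + 1)


if __name__ == "__main__":
    for interval in (0, 1, 2):
        print(interval, inference_rate_hz(30.0, interval))
```

At 30 fps this gives 30 Hz for `interval=0`, 15 Hz for `interval=1`, and 10 Hz for `interval=2`.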

## 6. Mitigations attempted

| Change | Effect |
|---|---|
| Pin `width`/`height` in the NVMM caps before streammux so streammux does not invoke its internal resize | Significantly reduced crash rate but did not eliminate. Crash site moved from streammux’s resize to `gst-nvinfer`'s internal converter. |
| `nvinfer interval=2` (10 Hz inference) instead of `interval=0` | Reduced rate from “within 14 s” to “within 50-90 s under load” (~30× improvement). Required to keep system usable. |
| Replace IOU tracker with `NvDCF_perf` | **Made it worse.** NvDCF crashed within ~50 s with `gstnvtracker: Low-level tracker lib returned error 1` and on restart `gstnvtracker: Failed to create cuda stream for buffer conversion: cudaErrorIllegalAddress`. The CUDA context was unrecoverable until full process restart. |
| Process-supervisor restart on SIGSEGV (pm2 fork mode, `min_uptime=10s`, `restart_delay=2000`) | Restores service in ~5 s but is not an acceptable production posture for our use case (totem drops video for 5-10 s every restart). |
| Multi-layer in-process watchdog (15 s no-frames → in-process pipeline restart, 2 attempts → escalate to process restart) | Recovers from the milder warmup-period variant. Does not recover from the full `cudaErrorIllegalAddress` cascade because the CUDA context cannot be re-initialised in-process. |
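The escalation policy in the watchdog row can be illustrated with a small state machine. This is a simplified sketch of the policy as described in the table (15 s without frames triggers an in-process pipeline restart, and after 2 such attempts the next stall escalates to a process restart), not our actual implementation; the class name, method names, and returned action strings are invented for the example.

```python
import time
from typing import Callable


class StallWatchdog:
    """Decides which recovery action to take when frames stop arriving."""

    def __init__(
        self,
        stall_timeout_s: float = 15.0,
        max_soft_restarts: int = 2,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.stall_timeout_s = stall_timeout_s
        self.max_soft_restarts = max_soft_restarts
        self._clock = clock
        self._last_frame = clock()
        self._soft_restarts = 0

    def on_frame(self) -> None:
        """Call whenever the pipeline delivers a frame."""
        self._last_frame = self._clock()
        self._soft_restarts = 0

    def poll(self) -> str:
        """Return 'ok', 'restart-pipeline' or 'restart-process'."""
        if self._clock() - self._last_frame < self.stall_timeout_s:
            return "ok"
        # Give the next recovery attempt a full timeout window.
        self._last_frame = self._clock()
        if self._soft_restarts < self.max_soft_restarts:
            self._soft_restarts += 1
            return "restart-pipeline"
        return "restart-process"
```

A caller would invoke `poll()` periodically and act on the returned string. As noted in the table, the `restart-pipeline` path cannot help once the CUDA context itself is corrupted, so only the `restart-process` path recovers from the full cascade.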

## 7. What we are asking

In priority order:

1. **Is this a known issue in DeepStream 7.1’s `gst-nvinfer`?** Specifically, a
   double-release race in `NvDsInferContextImpl::releaseBatchOutput` under
   sustained 1-source, 1-batch inference on Jetson Tegra. Our error
   signatures are reproducible and identical across runs.
2. **Is there a patched DeepStream build available** (DS 7.1.x maintenance
   release, DS 7.2 preview, internal patch) that fixes the race?
3. **Is there a supported pipeline configuration that avoids the bug?** For
   example: an `nvbuf-memory-type` setting, an alternative buffer-pool size, an
   `nvstreammux` config we have not tried, or a recommendation against
   `tee` with `nvinfer` on the same source on Tegra.
4. **Should we move to a newer JetPack 6.x release** (e.g. r36.4.7 across the
   board instead of our current mixed 36.4.4 / 36.4.7 state) before further
   investigation? We have left runtime libs (`libnvbufsurftransform`,
   `libcuda`, CUDA 12.6 runtime) at 36.4.4 to maintain ABI consistency.
5. **As a last resort**: is there a recommended path to bypass `gst-nvinfer`
   entirely and call TensorRT directly from a custom appsrc/appsink loop on
   Jetson, with an example we can study?
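Regarding question 4, the mixed package state can be summarised by grouping the installed `nvidia-l4t-*` packages by version. The sketch below assumes its input is the text produced by `dpkg-query -W -f='${Package} ${Version}\n' 'nvidia-l4t-*'`; the function name is ours, and the sample rows use only package names and versions already listed in §2.

```python
from collections import defaultdict
from typing import Dict, List


def group_packages_by_version(dpkg_output: str) -> Dict[str, List[str]]:
    """Map each version string to the nvidia-l4t-* packages installed at it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in dpkg_output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        package, version = parts
        if package.startswith("nvidia-l4t-"):
            groups[version].append(package)
    return dict(groups)


if __name__ == "__main__":
    sample = (
        "nvidia-l4t-multimedia 36.4.4-20250616085344\n"
        "nvidia-l4t-gstreamer 36.4.7-20250918154033\n"
        "nvidia-l4t-jetson-multimedia-api 36.4.7-20250918154033\n"
        "nvidia-l4t-libwayland-egl1 36.4.7-20250918154033\n"
    )
    for version, packages in group_packages_by_version(sample).items():
        print(version, sorted(packages))
```

More than one key in the result indicates a mixed L4T state like the one described in §2.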

## 8. Artefacts available on request

* Full pm2 / aegis-error / aegis-out logs from a known crash window (~90 MB compressed).

* `nvinfer` config file we generate at runtime.

* GStreamer pipeline `.dot` graph captured at `PLAYING` state.

* `tegrastats` capture across a crash window.

* `dmesg` from boot through a representative crash.

* SQLite dump of our `pipeline_stall_events` and `process_restarts` audit
  tables (~280 stall snapshots and ~120 restart records to date).

1. For `v4l2src` devices that output JPEG, we recommend using `nvv4l2decoder`/`nvjpegdec` for hardware decoding. Please refer to this FAQ.

2. Based on your description, this is most likely an `nvvideoconvert` issue caused by an incompatibility between JP-6.2 and DS-7.1 (DS-7.1 is compatible only with JP-6.1). Please use the workaround provided in the FAQ.

After fixing the two issues above, if you still encounter problems, please provide the logs: `GST_DEBUG=3 ./your_app > log.log 2>&1`