VIC Engine Limitation: Performance Cliff at 9+ Concurrent nvvidconv Pipelines on Jetson AGX Thor

Summary

I discovered a significant performance cliff when running 9+ concurrent GStreamer pipelines that use nvvidconv with default settings (VIC) on Jetson AGX Thor (JetPack 7.1, L4T R38.4). Per-pipeline performance drops by ~35% when going from 8 to 9 pipelines, and degrades further beyond that. Using compute-hw=GPU instead of VIC eliminates this issue.

Use Case

I need to decode 20 concurrent 1080p H.264 video streams and resize each to 768x416 to feed a neural network for inference. The video processing should not be the bottleneck.
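To make the target workload concrete, here is a sketch of how I would compose all 20 streams (whether as one gst-launch process or 20 separate ones is an implementation detail). The test_N.mp4 paths are placeholders and fakesink stands in for the real inference sink; this only builds and prints the command:

```shell
#!/bin/bash
# Sketch: compose one gst-launch-1.0 description that decodes N streams
# and scales each to 768x416 in NVMM memory. The /tmp/vic_test/test_N.mp4
# paths are placeholders; fakesink stands in for the inference element.
N=20
CMD="gst-launch-1.0 -q"
for i in $(seq 1 "$N"); do
  CMD="$CMD filesrc location=/tmp/vic_test/test_$i.mp4 ! qtdemux ! h264parse"
  CMD="$CMD ! nvv4l2decoder ! nvvidconv"
  # Backslashes keep the caps parentheses intact when the command is pasted
  # into a shell.
  CMD="$CMD ! video/x-raw\(memory:NVMM\),width=768,height=416 ! fakesink"
done
echo "$CMD"
```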

Environment

  • Platform: NVIDIA Jetson AGX Thor Developer Kit
  • JetPack: 7.1 (L4T R38.4)
  • Ubuntu: 24.04 LTS (Noble)
  • Container: nvcr.io/nvidia/deepstream:8.0-samples-multiarch
  • GStreamer: 1.24.x
  • Kernel: 6.8.12-tegra

Issue Description

When running concurrent pipelines with nvvidconv (which uses VIC by default on Jetson), performance collapses once 9 or more pipelines run concurrently:

Test Pipeline (per stream)

filesrc ! qtdemux ! h264parse ! nvv4l2decoder ! tee name=t \
  t. ! queue ! fakesink \
  t. ! queue ! nvvidconv ! video/x-raw(memory:NVMM),width=768,height=416 ! fakesink

Results with VIC (default)

| Pipelines | FPS per pipeline | Total throughput |
|---|---|---|
| 8 | ~73 FPS | 584 FPS |
| 9 | ~47 FPS | 423 FPS ⚠️ |
| 10 | ~26 FPS | 260 FPS ⚠️ |
| 15 | ~12 FPS | 180 FPS |

A ~35% per-pipeline drop (73 → 47 FPS) going from 8 to 9 pipelines!

Results with GPU mode (compute-hw=GPU)

| Pipelines | FPS per pipeline | Total throughput |
|---|---|---|
| 8 | ~73 FPS | 584 FPS |
| 9 | ~68 FPS | 612 FPS ✓ |
| 10 | ~65 FPS | 650 FPS ✓ |
| 15 | ~50 FPS | 750 FPS ✓ |

No cliff! Smooth scaling.
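To quantify "smooth scaling": taking the ~73 FPS per pipeline measured at 8 pipelines as the baseline, scaling efficiency is total throughput divided by N × 73. Computed from the two tables above:

```shell
#!/bin/bash
# Scaling efficiency vs. the ~73 FPS/pipeline baseline measured at 8
# pipelines, using the throughput numbers from the tables above.
baseline=73
eff() {
  awk -v n="$1" -v t="$2" -v b="$baseline" \
    'BEGIN { printf "%d pipelines: %.0f%%\n", n, 100 * t / (n * b) }'
}
echo "VIC:"; eff 9 423; eff 10 260; eff 15 180
echo "GPU:"; eff 9 612; eff 10 650; eff 15 750
```

VIC efficiency falls off a cliff (64% → 36% → 16%) while GPU mode degrades gracefully (93% → 89% → 68%).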

Diagnostic Observations

  1. NVDEC is NOT the bottleneck - decode-only pipelines scale past 15 concurrent streams without issue
  2. VIC appears limited to ~8 concurrent operations - the cliff appears at 9 pipelines regardless of how many nvvidconv elements each pipeline contains
  3. GPU mode works correctly - nvvidconv compute-hw=GPU eliminates the cliff
  4. nvidia-smi shows erratic utilization during the VIC cliff - GPU/decoder usage becomes unstable at 9+ VIC pipelines
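Alongside nvidia-smi I also watched VIC load with tegrastats. The field format varies across JetPack releases, so the sample line below (VIC_FREQ 45%@729) is an assumption; adjust the pattern to whatever your tegrastats actually prints:

```shell
#!/bin/bash
# Sketch: extract the VIC load percentage from a tegrastats line.
# The sample line format (VIC_FREQ 45%@729) is an assumption; field
# names differ across JetPack releases.
sample="RAM 12000/65536MB ... VIC_FREQ 45%@729 ..."
vic_load=$(grep -o 'VIC[_A-Z]* [0-9]*%' <<<"$sample" | grep -o '[0-9]*%')
echo "VIC load: $vic_load"
# Live use on the Jetson (outside the container):
#   tegrastats --interval 500 | grep -o 'VIC[_A-Z]* [0-9]*%@[0-9]*'
```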

Reproduction Steps

# Start DeepStream container
docker run -it --rm --runtime nvidia -e NVIDIA_DRIVER_CAPABILITIES=all \
  -v /tmp:/tmp nvcr.io/nvidia/deepstream:8.0-samples-multiarch bash

# Inside container, create and run the benchmark script below

Reproduction Script

#!/bin/bash
# vic_benchmark.sh - Test VIC vs GPU for nvvidconv scaling
# Run inside nvcr.io/nvidia/deepstream:8.0-samples-multiarch container

FRAMES=500
VIDEO_DIR="/tmp/vic_test"
mkdir -p "$VIDEO_DIR"

# Generate test videos
echo "Generating test videos..."
for i in $(seq 1 15); do
  [ -f "$VIDEO_DIR/test_$i.mp4" ] || \
  gst-launch-1.0 -q videotestsrc num-buffers=$FRAMES pattern=$((i % 18)) \
    ! "video/x-raw,width=1920,height=1080,format=NV12,framerate=30/1" \
    ! nvvidconv ! "video/x-raw(memory:NVMM),format=NV12" \
    ! nvv4l2h264enc ! h264parse ! mp4mux \
    ! filesink location="$VIDEO_DIR/test_$i.mp4" 2>/dev/null
done

run_test() {
  local NUM=$1
  local HW=$2
  local NVVC="nvvidconv $HW"
  
  echo "=== $NUM pipelines, ${HW:-VIC} ==="
  for i in $(seq 1 $NUM); do
    VIDEO="$VIDEO_DIR/test_$((((i-1) % 15) + 1)).mp4"
    PIPELINE="filesrc location=$VIDEO ! qtdemux ! h264parse ! nvv4l2decoder \
      ! tee name=t t. ! queue ! fakesink \
      t. ! queue ! $NVVC ! video/x-raw\(memory:NVMM\),width=768,height=416 ! fakesink"
    (
      start=$(date +%s.%N)
      gst-launch-1.0 -q $PIPELINE 2>/dev/null
      end=$(date +%s.%N)
      awk -v s="$start" -v e="$end" -v f="$FRAMES" 'BEGIN { printf "%.1f FPS\n", f/(e-s) }'
    ) &
  done
  wait
  echo ""
}

# Run tests
run_test 8 ""                    # VIC, 8 pipelines
run_test 9 ""                    # VIC, 9 pipelines (expect cliff)
run_test 9 "compute-hw=GPU"      # GPU, 9 pipelines (no cliff)
run_test 15 "compute-hw=GPU"     # GPU, 15 pipelines

Questions

  1. Is there a documented limit on concurrent VIC operations? The cliff at 8→9 suggests a hard limit of ~8 concurrent VIC contexts.

  2. Is using compute-hw=GPU the recommended workaround? It works, but I want to ensure this is the correct approach and won’t cause other issues.

  3. Will this limitation be addressed in future JetPack releases? For multi-stream video analytics, VIC’s ~8 pipeline limit is quite restrictive.

  4. Are there any performance implications of using GPU instead of VIC? In my tests, GPU mode actually provides higher total throughput.

Workaround

Add compute-hw=GPU to all nvvidconv elements when running 9 or more concurrent pipelines:

nvvidconv compute-hw=GPU ! video/x-raw(memory:NVMM),width=768,height=416

Thank you for any insights into this VIC limitation!

Thanks for sharing! ‘nvvidconv’ is not a DeepStream element; please use ‘nvvideoconvert’ together with the other DeepStream elements instead. Please refer to the related topics: topic1, topic2.