Summary
I discovered a significant performance cliff when running 9+ concurrent GStreamer pipelines that use nvvidconv with its default compute engine (VIC) on Jetson AGX Thor (JetPack 7.1, L4T R38.4). Per-pipeline throughput drops by ~35% when going from 8 to 9 pipelines and keeps degrading beyond that. Setting compute-hw=GPU on nvvidconv eliminates the issue.
Use Case
I need to decode 20 concurrent 1080p H.264 video streams and resize each to 768x416 to feed a neural network for inference. The video processing should not be the bottleneck.
Environment
- Platform: NVIDIA Jetson AGX Thor Developer Kit
- JetPack: 7.1 (L4T R38.4)
- Ubuntu: 24.04 LTS (Noble)
- Container: nvcr.io/nvidia/deepstream:8.0-samples-multiarch
- GStreamer: 1.24.x
- Kernel: 6.8.12-tegra
Issue Description
When running concurrent pipelines with nvvidconv (which uses VIC by default on Jetson), performance collapses at 9 and 10 pipelines:
Test Pipeline (per stream)
```
filesrc ! qtdemux ! h264parse ! nvv4l2decoder ! tee name=t \
    t. ! queue ! fakesink \
    t. ! queue ! nvvidconv ! video/x-raw(memory:NVMM),width=768,height=416 ! fakesink
```
Results with VIC (default)
| Pipelines | FPS per pipeline | Total throughput |
|---|---|---|
| 8 | ~73 FPS | 584 FPS |
| 9 | ~47 FPS | 423 FPS ⚠️ |
| 10 | ~26 FPS | 260 FPS ⚠️ |
| 15 | ~12 FPS | 180 FPS |
A ~35% drop in per-pipeline FPS from 8→9 pipelines (73 → 47), degrading further with each additional pipeline!
Results with GPU mode (compute-hw=GPU)
| Pipelines | FPS per pipeline | Total throughput |
|---|---|---|
| 8 | ~73 FPS | 584 FPS |
| 9 | ~68 FPS | 612 FPS ✓ |
| 10 | ~65 FPS | 650 FPS ✓ |
| 15 | ~50 FPS | 750 FPS ✓ |
No cliff! Smooth scaling.
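To put numbers on "smooth scaling": the GPU-mode totals can be converted into parallel efficiency against the ~73 FPS per-stream rate measured at 8 pipelines (a quick awk sketch; both the baseline and the totals are taken from the table above):

```shell
# Parallel efficiency of GPU mode: total_fps(N) / (N * baseline),
# with baseline = ~73 FPS per stream (the 8-pipeline figure above).
awk 'BEGIN {
  base = 73
  n[1] = 8;  t[1] = 584
  n[2] = 9;  t[2] = 612
  n[3] = 10; t[3] = 650
  n[4] = 15; t[4] = 750
  for (i = 1; i <= 4; i++)
    printf "%2d pipelines: %3.0f%% efficiency\n", n[i], 100 * t[i] / (n[i] * base)
}'
```

So GPU mode still retains ~68% efficiency at 15 pipelines, versus the outright collapse seen with VIC.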
Diagnostic Observations
- NVDEC is NOT the bottleneck - Decode-only pipelines scale to 15+ without issues
- VIC appears limited to ~8 concurrent operations - The cliff appears at 9 pipelines regardless of how many nvvidconv elements each pipeline contains
- GPU mode works correctly - Using nvvidconv compute-hw=GPU eliminates the cliff
- nvidia-smi shows erratic utilization during the VIC cliff - GPU/decoder usage becomes unstable at 9+ VIC pipelines
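For the VIC-side view, tegrastats is the usual tool on Jetson for watching VIC load alongside GPU/NVDEC. The field name and format vary across JetPack releases, so the sample line and grep pattern below are an illustrative assumption, not a documented format:

```shell
# Pull a VIC load/clock field out of a tegrastats-style status line.
# NOTE: the "VIC 99%@1024" format is an illustrative assumption; check
# your release's actual tegrastats output and adjust the pattern.
sample='RAM 11830/65536MB SWAP 0/32768MB GR3D_FREQ 45% VIC 99%@1024'
echo "$sample" | grep -o 'VIC [0-9]*%@[0-9]*'

# Live monitoring on the device would look like (Jetson only):
#   sudo tegrastats --interval 500 | grep -o 'VIC [0-9]*%@[0-9]*'
```

Locking clocks with `sudo jetson_clocks` before the benchmark may also help rule out DVFS as a confounding factor.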
Reproduction Steps
```bash
# Start DeepStream container
docker run -it --rm --runtime nvidia -e NVIDIA_DRIVER_CAPABILITIES=all \
    -v /tmp:/tmp nvcr.io/nvidia/deepstream:8.0-samples-multiarch bash

# Inside container, create and run the benchmark script below
```
Reproduction Script
```bash
#!/bin/bash
# vic_benchmark.sh - Test VIC vs GPU for nvvidconv scaling
# Run inside nvcr.io/nvidia/deepstream:8.0-samples-multiarch container

FRAMES=500
VIDEO_DIR="/tmp/vic_test"
mkdir -p "$VIDEO_DIR"

# Generate test videos
echo "Generating test videos..."
for i in $(seq 1 15); do
    [ -f "$VIDEO_DIR/test_$i.mp4" ] || \
    gst-launch-1.0 -q videotestsrc num-buffers=$FRAMES pattern=$((i % 18)) \
        ! "video/x-raw,width=1920,height=1080,format=NV12,framerate=30/1" \
        ! nvvidconv ! "video/x-raw(memory:NVMM),format=NV12" \
        ! nvv4l2h264enc ! h264parse ! mp4mux \
        ! filesink location="$VIDEO_DIR/test_$i.mp4" 2>/dev/null
done

run_test() {
    local NUM=$1
    local HW=$2
    local NVVC="nvvidconv $HW"
    echo "=== $NUM pipelines, ${HW:-VIC} ==="
    for i in $(seq 1 "$NUM"); do
        VIDEO="$VIDEO_DIR/test_$((((i-1) % 15) + 1)).mp4"
        PIPELINE="filesrc location=$VIDEO ! qtdemux ! h264parse ! nvv4l2decoder \
            ! tee name=t t. ! queue ! fakesink \
            t. ! queue ! $NVVC ! video/x-raw\(memory:NVMM\),width=768,height=416 ! fakesink"
        (
            start=$(date +%s.%N)
            gst-launch-1.0 -q $PIPELINE 2>/dev/null
            end=$(date +%s.%N)
            awk -v s="$start" -v e="$end" -v f="$FRAMES" \
                'BEGIN { printf "%.1f FPS\n", f/(e-s) }'
        ) &
    done
    wait
    echo ""
}

# Run tests
run_test 8  ""               # VIC, 8 pipelines
run_test 9  ""               # VIC, 9 pipelines (expect cliff)
run_test 9  "compute-hw=GPU" # GPU, 9 pipelines (no cliff)
run_test 15 "compute-hw=GPU" # GPU, 15 pipelines
```
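Each run_test invocation prints one FPS line per pipeline; those can be summed into the total-throughput column used in the tables above with a small awk sketch, assuming lines of the form "73.2 FPS" (the sample values here stand in for captured output):

```shell
# Sum per-pipeline "<fps> FPS" lines into total and average throughput.
printf '%s\n' '73.2 FPS' '72.8 FPS' '74.0 FPS' \
  | awk '{ total += $1; n++ }
         END { printf "%d pipelines: %.1f FPS total, %.1f FPS avg\n", n, total, total/n }'
```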
Questions
- Is there a documented limit on concurrent VIC operations? The cliff at 8→9 suggests a hard limit of ~8 concurrent VIC contexts.
- Is compute-hw=GPU the recommended workaround? It works, but I want to confirm it is the correct approach and won't cause other issues.
- Will this limitation be addressed in future JetPack releases? For multi-stream video analytics, VIC's ~8-pipeline limit is quite restrictive.
- Are there performance implications of using the GPU instead of VIC? In my tests, GPU mode actually delivers higher total throughput.
Workaround
Add compute-hw=GPU to all nvvidconv elements when running 8+ concurrent pipelines:
```
nvvidconv compute-hw=GPU ! video/x-raw(memory:NVMM),width=768,height=416
```
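Since VIC mode is still fine at low concurrency (and offloads work from the GPU), one option is to pick the engine from the stream count. A hedged shell sketch using the 8-pipeline threshold observed above; treat the threshold as platform-specific, not a documented limit:

```shell
# Select the nvvidconv compute engine based on concurrent pipeline count.
# The threshold of 8 comes from the cliff observed on AGX Thor; it may
# differ on other platforms or JetPack versions.
NUM_PIPELINES=12
if [ "$NUM_PIPELINES" -gt 8 ]; then
  NVVC='nvvidconv compute-hw=GPU'
else
  NVVC='nvvidconv'
fi
echo "$NVVC ! video/x-raw(memory:NVMM),width=768,height=416"
```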
Thank you for any insights into this VIC limitation!