• Hardware Platform (Jetson / GPU): Jetson NX
• DeepStream Version: 6.0
• JetPack Version (valid for Jetson only): 4.6
• TensorRT Version: 8.0.1
• Issue Type (questions, new requirements, bugs): Bugs
• How to reproduce the issue? (This is for bugs. Include which sample app is used, the configuration file contents, the command line used and other details for reproducing.)
Hi, we’re looking to set up a pipeline for model training and deployment from NVIDIA TAO to DeepStream, using TAO (preferably version 5.0) and DeepStream 6.0 (fixed). I’m having issues getting results in DeepStream at the moment though:
Also, in general, are there any differences in the deployment process from TAO 5.0? We’ll be using that in the future for YOLOv4 models (see the rough export sketch after the requirement details below).
• Requirement details (This is for new requirements. Include the module name, i.e. which plugin or which sample application, and the function description.)
Preferably a tutorial and a downloadable YOLOv4 model.
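On the TAO 5.0 question, my current understanding (which I’d be happy to have corrected) is that export now goes through the tao model sub-command and produces an ONNX file rather than an .etlt. A rough sketch of what I expect that step to look like - the sub-command layout, paths, spec file and key below are all assumptions on my part:

tao model yolo_v4 export \
    -m /workspace/results/weights/yolov4_resnet18_080.tlt \
    -e /workspace/specs/yolo_v4_train_resnet18_kitti.txt \
    -k $KEY \
    -o /workspace/export/yolov4_resnet18.onnx

If the DeepStream side also needs to change to consume the ONNX model, that’s the part I’m least sure about.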
It runs, although when I test the .engine with a batch size greater than 1:
/usr/src/tensorrt/bin/trtexec --loadEngine=/data/models/yolov4_resnet18_default.engine --batch=4 --iterations=100 --avgRuns=10 --dumpProfile --dumpOutput --useCudaGraph
I get:
Error Code 3: Internal Error (Parameter check failed at: runtime/api/executionContext.cpp::enqueue::276, condition: batchSize > 0 && batchSize <= mEngine.getMaxBatchSize(). Note: Batch size was: 4, but engine max batch size was: 1
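From the error, the engine appears to have been serialized with a max batch size of 1, so my plan is to regenerate it with a larger max batch size via tao-converter, roughly as sketched below. The key, input dimensions, output node name and .etlt path are assumptions taken from the deepstream_tao_apps YOLOv4 examples; please correct anything that looks wrong:

./tao-converter \
    -k $KEY \
    -d 3,384,1248 \
    -o BatchedNMS \
    -m 4 \
    -t fp16 \
    -e /data/models/yolov4_resnet18_default.engine \
    /data/models/yolov4_resnet18_default.etlt

(I also believe nvinfer can rebuild the engine itself if its config points at the .etlt with batch-size=4 and the cached engine doesn’t match, but I haven’t confirmed that.)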
With enable-perf-measurement=1, the pipeline runs at roughly 14 fps (the input video is 30 fps).
Using export NVDS_ENABLE_LATENCY_MEASUREMENT=1, we see:
PERF: 13.80 (13.92) 13.80 (13.88) 13.80 (13.92) 13.80 (13.92)
BATCH-NUM = 2
Batch meta not found for buffer 0x7eb40097c0
BATCH-NUM = 3
Batch meta not found for buffer 0x7ec4048b40
I’m setting batched-push-timeout to 1/max_fps (1 s / 30 fps ≈ 33333 µs).
Height and width in streammux are set to the input video’s height and width
Looking at jtop, the GPU usage appears to sit at >99% the majority of the time, and drops down once every few seconds
Setting qos=0 in sink0 appears to make no difference
I can now also see bounding boxes drawn on an unrelated part of the screen - I’m not sure if this is related?
For a bit more info, this is my config file:
[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5
#gie-kitti-output-dir=streamscl
# output display details
[tiled-display]
enable=1
rows=2
columns=2
width=1280
height=720
gpu-id=0
#(0): nvbuf-mem-default - Default memory allocated, specific to particular platform
#(1): nvbuf-mem-cuda-pinned - Allocate Pinned/Host cuda memory, applicable for Tesla
#(2): nvbuf-mem-cuda-device - Allocate Device cuda memory, applicable for Tesla
#(3): nvbuf-mem-cuda-unified - Allocate Unified cuda memory, applicable for Tesla
#(4): nvbuf-mem-surface-array - Allocate Surface Array memory, applicable for Jetson
nvbuf-memory-type=0
# stream mux - forms batches of frames from multiple input sources
[streammux]
gpu-id=0
##Boolean property to inform muxer that sources are live
live-source=0
batch-size=4
##time out in usec, to wait after the first buffer is available
##to push the batch even if the complete batch is not formed
batched-push-timeout=33333
## Set muxer output width and height
width=1280
height=720
#enable to maintain aspect ratio wrt source, and allow black borders, works
##along with width, height properties
enable-padding=0
nvbuf-memory-type=0
## If set to TRUE, system timestamp will be attached as ntp timestamp
## If set to FALSE, ntp timestamp from rtspsrc, if available, will be attached
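The [primary-gie] group isn’t shown above; for context, this is roughly the shape I have in mind, with assumed file names - the main point being that its batch-size matches the streammux batch-size of 4 and model-engine-file points at the regenerated engine:

[primary-gie]
enable=1
gpu-id=0
# should match [streammux] batch-size
batch-size=4
gie-unique-id=1
nvbuf-memory-type=0
# assumed names - the nvinfer config is the one from deepstream_tao_apps
config-file=pgie_yolov4_tao_config.txt
model-engine-file=/data/models/yolov4_resnet18_default.engine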
I’m running in Docker using the nvcr.io/nvidia/deepstream-l4t:6.0-samples image on a Jetson NX (L4T r32.6.1),
and the setup of the deepstream_tlt_apps library in the Dockerfile looks like this: