TensorRT has insufficient memory in container

Model: Jetson AGX Orin
JetPack: 6.1
CUDA: 12.6.68
TensorRT: 10.3.0.30
DeepStream version: 7.1
Container: deepstream:7.1-triton-multiarch

docker run command: docker run -it --rm --runtime=nvidia --network=host --gpus all --privileged -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix -v /etc/X11:/etc/X11 <container image name>

I am trying to containerize the deepstream-test3 sample pipeline, modified to use multiple RTSP streams. The modified pipeline works on the host with 30+ RTSP inputs, but when running in the container, TensorRT fails to build the engine file with even 10 RTSP streams.

WARNING: [TRT]: Tactic Device request: 79MB Available: 38MB. Device memory is insufficient to use tactic
WARNING: [TRT]: UNSUPPORTED_STATE: Skipping tactic 66 due to insufficient memory on requested size of 83558400 detected for tactic 0x79a4e52543793dbe.
ERROR: [TRT]: IBuilder::buildSerializedNetwork: Error Code 10: Internal Error (Could not find any implementation for node block_1a_conv_shortcut/BiasAdd + block_1a_bn_shortcut/batchnorm/mul__22 + block_1a_bn_shortcut/batchnorm/mul_1 + block_1a_bn_shortcut/batchnorm/sub__23 + block_1a_bn_shortcut/batchnorm/add_1 + add_1/add + block_1a_relu/Relu.)
Segmentation fault (core dumped)

Looking at the host’s jtop output, the container’s process climbs to about 430MB of shared GPU RAM, which is close to TensorRT’s default 450MB workspace size. I have explored increasing the workspace size as described in this post and others, but nothing changes. I did notice that running the codecs script described in the container info pushes the container’s shared GPU memory towards 490MB, whereas when running the same pipeline directly on the device the same process gets up to 750MB. Even so, on the device the pipeline builds the .engine file without any increase to the workspace size. Lastly, I have successfully created the .engine file by calling trtexec directly on the model’s ONNX file, so I suspect this is not a shared-memory capacity issue but something internal to the DeepStream container.
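For reference, the workspace override was attempted roughly like this (a minimal sketch, not my exact values; workspace-size is in MB and goes in the pgie config used by the sample, dstest3_pgie_config.txt or its .yml equivalent):

    # [property] section of dstest3_pgie_config.txt (or the property: section of the .yml variant)
    workspace-size=2000

And the standalone engine build that does succeed inside the container was essentially the following (the model path is a placeholder for the sample's ONNX file as installed on my device):

    /usr/src/tensorrt/bin/trtexec --onnx=resnet18_trafficcamnet_pruned.onnx --saveEngine=trafficcamnet.engine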

Which model did you use inside the container? Can you share the model and configurations for us to reproduce the issue?

Hi Fiona, thank you for the help!
I am using resnet18_trafficcamnet_pruned, which is the model built into sample test application 3.
Everything is configured as in the original example (in /opt/nvidia/deepstream/deepstream-7.1/sources/apps/sample_apps/deepstream-test3) except for some modifications to the source-list and nvstreammux settings in dstest3_config.yml. In particular, the input was changed to a list of RTSP streams (made accessible to the container by sharing the host network), batched-push-timeout was changed to 66000 (~15 fps), and batch-size was changed to match the number of RTSP streams; a rough sketch of the modified sections is at the end of this post.
Another thing worth specifying: I am using the original configuration, which uses the nvinfer plugin (nvinferserver is an option as well, but not the default).
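For completeness, the modified sections of dstest3_config.yml look roughly like this (a sketch from memory of the sample's keys; the RTSP URIs and stream count are placeholders):

    source-list:
      list: rtsp://<camera-1>;rtsp://<camera-2>;...;rtsp://<camera-10>

    streammux:
      batch-size: 10
      batched-push-timeout: 66000

    primary-gie:
      plugin-type: 0    # 0 = nvinfer (default), 1 = nvinferserver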

What are your RTSP streams’ resolution and format (H264, H265, …)?

FRAME_WIDTH = 2208;
FRAME_HEIGHT = 1242;
The encoding is H264.
I do need to adjust the default configuration for this resolution, but it worked fine without that modification when run on the device.

Please refer to the limitations section of the DeepStream SDK 7.1 for NVIDIA dGPU/X86 and Jetson document, page 12.