How to reduce TRT memory use?

Hi,
I have tried to use TRT for face detection. It should support 4 cameras, so now my solution is starting 4 processes. In every process, I have new a TRT model, So every instance may occupy 950MB. Theoretically, 4 processes will occupy 950 x 4, and about 4G. But TX2 have 8G memory. When I use ./tegrastats and top to see the memory’s occupation. It seems not enough memory. You can see the picture in attachment.

Hi,

It’s recommended to handle all the pipelines within a single application.

To run TensorRT, the relevant cuDNN libraries must be loaded into memory.
If you run each camera in an independent application, every process loads its own copy of these essential libraries, which takes more memory.
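As a rough sketch of the single-process approach: the cameras can be handled as worker threads sharing one detector instance, instead of four processes that each load TensorRT and cuDNN. The `FaceDetector` class and the camera/frame names below are hypothetical placeholders, not part of TensorRT's API:

```python
import threading
import queue

class FaceDetector:
    """Placeholder for a TensorRT engine wrapper.

    In a real application this would deserialize one engine (and could
    create one execution context per worker); here it just echoes the
    frame name so the sketch stays self-contained and runnable.
    """
    def infer(self, frame):
        return f"faces in {frame}"

def camera_worker(cam_id, detector, results):
    # Each camera thread reuses the shared detector, so TensorRT,
    # cuDNN and CUDA libraries are loaded into memory only once.
    for frame_idx in range(3):
        frame = f"cam{cam_id}-frame{frame_idx}"
        results.put((cam_id, detector.infer(frame)))

detector = FaceDetector()  # one model instance serving all cameras
results = queue.Queue()
threads = [threading.Thread(target=camera_worker, args=(i, detector, results))
           for i in range(4)]  # 4 cameras -> 4 threads, 1 process
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results.qsize())  # 12: 4 cameras x 3 frames
```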

By the way, it’s also recommended to try our fp16 mode, which can cut memory usage in half:
http://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#googlenet_sample
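The saving comes from fp16 weights and activations taking two bytes instead of four. A minimal, TensorRT-independent illustration using NumPy (assuming NumPy is available; in TensorRT itself, fp16 is enabled via a builder flag such as `setFp16Mode`, depending on the version):

```python
import numpy as np

# A fake weight tensor the size of a small 64x64 3x3 conv layer.
weights_fp32 = np.random.randn(64, 64, 3, 3).astype(np.float32)
weights_fp16 = weights_fp32.astype(np.float16)  # half-precision copy

print(weights_fp32.nbytes)  # 147456 bytes
print(weights_fp16.nbytes)  # 73728 bytes -- exactly half
```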

Thanks.

Hi,
I have used fp16 mode, but we use a custom plugin, and TensorRT doesn’t support custom layers with fp16. Can it still cut memory in half?

Thanks.

Hi,

You can still build the rest of the network in half-precision mode (fp16).
This causes a slight performance degradation due to the format conversions, but it reduces memory.

[fp16] ► TensorRT ► [fp16 to fp32] ► Plugin ► [fp32 to fp16] ► TensorRT ► [fp16]
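A sketch of the mixed-precision flow above, with NumPy arrays standing in for the TensorRT tensors and a hypothetical `custom_plugin` that only accepts fp32 (the function name and shapes are assumptions for illustration):

```python
import numpy as np

def custom_plugin(x):
    # Hypothetical custom layer that only supports fp32 input.
    assert x.dtype == np.float32
    return x * 2.0

# The TensorRT portion runs in fp16 ...
activations = np.ones(8, dtype=np.float16)

# ... is converted to fp32 just for the plugin ...
plugin_out = custom_plugin(activations.astype(np.float32))

# ... then converted back to fp16 for the remaining TensorRT layers.
activations = plugin_out.astype(np.float16)
print(activations.dtype)  # float16
```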

Thanks.