Why does my pruned and retrained TAO YOLOv3 model use more GPU memory per pipeline than the original unpruned model?

While performance testing two YOLOv3 models on an Ubuntu 18.04 machine, one pruned and the other unpruned, both trained with TAO Toolkit 4.0, I found that the unpruned model takes less GPU memory per pipeline in DeepStream 6.0.

The unpruned model was trained for a total of 80 epochs. The second model is its pruned version, generated with the tao yolov3 prune command and then retrained for 80 epochs with the same configuration file. Both are INT8 models exported to .etlt format using tao yolov3 export, and the resulting .etlt files and their .bin files were used to deploy the models in DeepStream 6.0.

The memory usage on an NVIDIA GeForce GTX 1050 Ti GPU for the same video file input is as follows:
unpruned model - 379 MB pipeline size in GPU for 1 video file input
pruned model - 449 MB pipeline size in GPU for 1 video file input
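For anyone who wants to reproduce these per-pipeline figures, here is a minimal Python sketch that reads per-process GPU memory from nvidia-smi. It assumes nvidia-smi is on the PATH and that each pipeline runs as its own process; the function names are made up for illustration. nvidia-smi reports these values in MiB, and some setups report "[N/A]" instead of a number, which the parser skips.

```python
import subprocess

# Real nvidia-smi options: query per-process memory as plain CSV without header or units.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Map PID -> used GPU memory (MiB) from the CSV text printed by QUERY."""
    usage = {}
    for line in csv_text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        # Skip blank lines and rows whose memory field is not numeric, e.g. "[N/A]".
        if len(parts) != 2 or not parts[0].isdigit() or not parts[1].isdigit():
            continue
        usage[int(parts[0])] = int(parts[1])
    return usage


def query_compute_apps():
    """Run nvidia-smi and return {pid: MiB} for processes currently using the GPU."""
    result = subprocess.run(QUERY, check=True, capture_output=True, text=True)
    return parse_compute_apps(result.stdout)
```

Calling query_compute_apps() while one pipeline is running, once per model, gives the two numbers to compare.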

These results surprise me: one would expect the pruned model to use less memory than the unpruned model because of its smaller size and lower number of weights. Why does this happen, and is there any way to reduce the memory consumption of the pruned model?

The DeepStream config files for the pruned and unpruned models are attached.

tao_yolov3_pruned.conf (3.7 KB)
tao_yolov3_unpruned.conf (3.7 KB)

Hi @adithya.ajith ,
These two memory usage data are from ‘nvidia-smi’, right?

Can you capture the log with the steps below:

  1. remove the generated TensorRT engine
  2. run “export NVDSINFER_LOG_LEVEL=3” before deepstream-app
  3. run “deepstream-app …” and share the output logs of these two cases
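The three steps above can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and the command to launch is left as a parameter because it depends on how the application is started. The only thing taken from the steps is deleting the engine file and setting NVDSINFER_LOG_LEVEL=3.

```python
import os
import subprocess


def prepare_run(engine_path, base_env=None):
    """Steps 1 and 2: delete the cached engine and return an env with verbose nvinfer logging."""
    env = dict(os.environ if base_env is None else base_env)
    try:
        os.remove(engine_path)  # forces the engine to be rebuilt on the next run
    except FileNotFoundError:
        pass  # nothing cached yet, so nothing to remove
    env["NVDSINFER_LOG_LEVEL"] = "3"
    return env


def run_and_log(cmd, env, log_path):
    """Step 3: run the application and capture stdout and stderr in one log file."""
    with open(log_path, "w") as log:
        completed = subprocess.run(cmd, env=env, stdout=log, stderr=subprocess.STDOUT)
    return completed.returncode
```

Run it once per model with a different log_path so the two build logs can be compared side by side.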

The memory usage data are from the nvidia-smi output.

unpruned engine file name - “dec30_yolov3_resnet18_epoch_080.etlt_b1_gpu0_int8.engine”

pruned engine file name - “yolov3_resnet18_epoch_080-pruned.etlt_b1_gpu0_int8.engine”

The output logs for the unpruned and pruned models are attached in the following .txt files:

unpruned-model_log.txt (1.8 MB)
pruned-model_log.txt (2.1 MB)

Hi @adithya.ajith ,
Thanks for the log! From the TensorRT build log, the pruned network consumes ~13 MB less TensorRT memory than the unpruned network, as shown in the screenshot below. We need more info to debug this.

Can you share the DeepStream config file in addition to the gie config files you shared above?

And 379 MB/449 MB are the memory usage when running with a pre-built TensorRT engine, right?

There is no DeepStream config file; DeepStream is running with the default config.

The TensorRT engines are generated by DeepStream, not pre-built using the TAO Toolkit.

What is the memory usage when running with a pre-built TensorRT engine, even if that engine was generated by DeepStream?
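To tell whether a given run will load an existing engine or build one at startup, one option is to check the model-engine-file key in the [property] group of the gie (nvinfer) config. The sketch below is a hypothetical helper, not part of DeepStream. It assumes the config parses as an INI-style file and that relative paths are resolved against the directory of the config file.

```python
import configparser
from pathlib import Path


def engine_status(gie_config_path):
    """Return (engine_path, exists); engine_path is None if model-engine-file is not set."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    with open(gie_config_path) as f:
        parser.read_file(f)
    value = parser.get("property", "model-engine-file", fallback=None)
    if not value:
        return None, False
    engine = Path(value)
    if not engine.is_absolute():
        # Assumption: relative paths are relative to the config file's directory.
        engine = Path(gie_config_path).parent / engine
    return engine, engine.is_file()
```

If the file is reported as missing, the memory numbers from that run include an engine build and are not comparable with a run that loads a pre-built engine.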

There is no deepstream config file,

A DeepStream config file is required to run deepstream-app; it is passed to deepstream-app as
$ deepstream-app -c deepstream_config_file

We are generating the engine files at DeepStream runtime. I will get back to you with the memory usage data for pre-built engine files.

We run DeepStream using a custom Python app based on the “multistream python app” that can be found here, and there is no configuration file attached to it.

Thanks for your patience. The memory usage data for the pre-built TensorRT engine files were delayed because of a mismatch in TensorRT versions between the TAO Toolkit used to generate the engine files and the DeepStream installation where they were deployed; debugging this took time.

The memory usage data from nvidia-smi shared previously came from a machine that was not used for training. The figures below are from the machine where the models were trained, the TensorRT engine files were generated, and the engines were later deployed to DeepStream 6.0. For the unpruned TensorRT engine file, two pipelines were created, of 479 MB and 645 MB. For the pruned TensorRT engine file, two pipelines were also created, of 479 MB and 619 MB. The GPU in this machine is an NVIDIA GeForce RTX 2060.

Hi @adithya.ajith ,
Thanks for the update!
So the issue is clarified, right?

The memory usage with the pre-built TensorRT engines makes sense, but I do not understand why the DeepStream pipeline memory usage for the pruned model is higher when the engine file is generated by DeepStream.

Hi @adithya.ajith
Does “two pipelines” mean two applications?

And with one application/pipeline, the pruned and unpruned networks consume the same memory, 479 MB,
while with the other application/pipeline, using the pruned network consumes more memory than using the unpruned network, i.e. 645 MB vs 619 MB?

Can you refer to DeepStream SDK FAQ - #10 by mchi to dump the pipeline diagram of these two?
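For a Python app, the pipeline graph can be dumped with GStreamer's standard dot-file mechanism, which is presumably what the FAQ entry describes. The sketch below is a minimal example with hypothetical helper names. GST_DEBUG_DUMP_DOT_DIR and Gst.debug_bin_to_dot_file are real GStreamer features, and the environment variable has to be set before Gst.init() is called.

```python
import os


def enable_dot_dump(dump_dir):
    """Create dump_dir and point GStreamer at it; call this before Gst.init()."""
    os.makedirs(dump_dir, exist_ok=True)
    os.environ["GST_DEBUG_DUMP_DOT_DIR"] = dump_dir
    return dump_dir


def dump_pipeline(pipeline, name):
    """Write <name>.dot for an existing Gst.Pipeline into the dump directory."""
    # Imported here so enable_dot_dump() can run before GStreamer is loaded.
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst

    Gst.debug_bin_to_dot_file(pipeline, Gst.DebugGraphDetails.ALL, name)
```

Calling dump_pipeline(pipeline, "pruned") once the pipeline is in the PLAYING state, and again in the run with the unpruned model under a different name, produces two .dot files that Graphviz can render, for example with dot -Tpng pruned.dot -o pruned.png.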

Yes, a pipeline is the application.

Please find attached the GST pipeline graphs for the applications running with the pruned and unpruned engine files.