Why does my pruned and retrained tao yolov3 model have more GPU memory usage per pipeline than the original unpruned model?

adithya.ajith · January 6, 2023, 10:04am

While performance-testing two YOLOv3 models on an Ubuntu 18.04 machine, one pruned and one unpruned, both trained with TAO Toolkit 4.0, I found that the unpruned model takes less GPU memory per pipeline in DeepStream 6.0.

The unpruned model was trained for a total of 80 epochs. The second model is its pruned version, generated with the tao yolov3 prune command and then retrained for 80 epochs with the same configuration file. Both are INT8 models exported to .etlt format using tao yolov3 export, and the resulting .etlt files and their .bin files were used to deploy the models in DeepStream 6.0.

The memory usage on an NVIDIA GeForce GTX 1050 Ti GPU for the same video file input is as follows:
unpruned model - 379 MB pipeline size in GPU for the 1 video file input
pruned model - 449 MB pipeline size in GPU for the 1 video file input

These results surprise me: one would expect the pruned model to use less memory than the unpruned model, since it is smaller and has fewer weights. Why does this happen, and is there any way to reduce the memory consumption of the pruned model?

The DeepStream config files for the pruned and unpruned models are attached.

tao_yolov3_pruned.conf (3.7 KB)
tao_yolov3_unpruned.conf (3.7 KB)

mchi · January 8, 2023, 7:52am

Hi @adithya.ajith ,
These two memory usage figures are from ‘nvidia-smi’, right?

Can you capture the log with steps below:

remove the generated TensorRT engine
run “export NVDSINFER_LOG_LEVEL=3” before deepstream-app
run “deepstream-app …” and share the output logs of these two cases
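Since the application in this thread is a custom Python app rather than deepstream-app, the three steps above can be sketched as a small Python helper. This is only an illustration: the function name `capture_nvinfer_log` and the file names in the usage comment are hypothetical, and the only detail taken from the thread is the `NVDSINFER_LOG_LEVEL=3` environment variable.

```python
import os
import subprocess
from pathlib import Path


def capture_nvinfer_log(command, engine_path, log_path):
    """Delete a cached engine, then run `command` with verbose nvinfer logging.

    `command` is the argument list that launches the DeepStream application.
    Returns the exit code of the launched process.
    """
    engine = Path(engine_path)
    if engine.exists():
        # Removing the cached engine forces a rebuild, so the TensorRT
        # build messages show up in the log.
        engine.unlink()

    # Same effect as `export NVDSINFER_LOG_LEVEL=3`, but only for the child.
    env = dict(os.environ, NVDSINFER_LOG_LEVEL="3")

    # Send stdout and stderr to one file so the log can be shared as-is.
    with open(log_path, "w") as log:
        result = subprocess.run(command, env=env, stdout=log,
                                stderr=subprocess.STDOUT)
    return result.returncode


# Hypothetical usage (script and file names are placeholders):
# capture_nvinfer_log(["python3", "my_app.py"], "model.engine", "pruned.log")
```

Running it once per model, with a different log path each time, gives the two logs requested.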

adithya.ajith · January 9, 2023, 8:34am

The memory usage data are from the nvidia-smi output.
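For anyone reproducing these numbers, here is a minimal sketch of reading per-process GPU memory from nvidia-smi in a script instead of from the interactive table. It assumes the `--query-compute-apps` CSV output of nvidia-smi; the helper names are illustrative, and rows whose memory field is not numeric (some platforms report `[N/A]`) are skipped.

```python
import subprocess

# Per-process query; with "nounits" the used_memory column is plain MiB.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Turn nvidia-smi CSV rows into {pid: (process_name, used_mib)}."""
    rows = {}
    for line in csv_text.strip().splitlines():
        # pid is the first field and memory the last; the name sits between.
        pid, rest = line.split(",", 1)
        name, used = rest.rsplit(",", 1)
        used = used.strip()
        if not used.isdigit():
            continue  # e.g. "[N/A]" when per-process memory is unavailable
        rows[int(pid)] = (name.strip(), int(used))
    return rows


def query_gpu_memory():
    """Run nvidia-smi and return the parsed per-process memory table."""
    out = subprocess.run(QUERY, check=True, capture_output=True, text=True)
    return parse_compute_apps(out.stdout)
```

Sampling `query_gpu_memory()` after each pipeline has reached steady state gives figures that are comparable between the two models.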

unpruned engine file name - “dec30_yolov3_resnet18_epoch_080.etlt_b1_gpu0_int8.engine”

pruned engine file name - “yolov3_resnet18_epoch_080-pruned.etlt_b1_gpu0_int8.engine”

The output logs for the unpruned and pruned models are attached in the following .txt files:

unpruned-model_log.txt (1.8 MB)
pruned-model_log.txt (2.1 MB)

mchi · January 11, 2023, 3:09pm

Hi @adithya.ajith ,
Thanks for the log! From the TensorRT build log, the pruned network consumes ~13 MB less TensorRT memory than the unpruned network, as shown in the screenshot below. We need more info to debug this.

Can you share the DeepStream config file in addition to the GIE config files you shared above?

And 379 MB/449 MB are the memory figures when running with a pre-built TensorRT engine, right?

adithya.ajith · January 12, 2023, 11:34am

There is no DeepStream config file; DeepStream is running with the default config.

The TensorRT engines are generated by DeepStream, not pre-built with the TAO Toolkit.

mchi · January 13, 2023, 3:18am

What is the memory usage when running with a pre-built TensorRT engine, even if that engine was generated by DeepStream?

There is no deepstream config file,

A DeepStream config file is required to run deepstream-app; it is passed to deepstream-app as
$ deepstream-app -c deepstream_config_file

adithya.ajith · January 13, 2023, 4:58am

We are generating the engine files at DeepStream runtime. I will get back to you with the memory usage data for pre-built engine files.

We run DeepStream using a custom Python app based on the “multistream python app” that can be found here, and there is no configuration file attached to it.

adithya.ajith · January 16, 2023, 10:38am

Thanks for your patience. The memory usage data for the pre-built TensorRT engine files was delayed because of a mismatch between the TensorRT version in the TAO Toolkit used to generate the engine files and the one in the DeepStream installation where they were deployed; debugging this took time.

The nvidia-smi memory usage data shared previously was from a machine that was not used for training. The data below is from the machine where the models were trained, the TensorRT engine files were generated, and the engines were later deployed to DeepStream 6.0. For the unpruned TensorRT engine file, two pipelines were created, of size 479 MB and 645 MB. For the pruned TensorRT engine file, two pipelines were also created, of size 479 MB and 619 MB respectively. The GPU in this machine is an NVIDIA GeForce RTX 2060.
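To make the comparison explicit, the short sketch below computes the pruned-minus-unpruned difference for every pair of figures reported in this thread. The labels "pipeline A" and "pipeline B" are only placeholders for the two pipelines mentioned above; the numbers are copied from the posts.

```python
def delta_mb(unpruned_mb, pruned_mb):
    """Return (difference in MB, difference as a percentage of unpruned)."""
    diff = pruned_mb - unpruned_mb
    return diff, round(100.0 * diff / unpruned_mb, 1)


# (unpruned MB, pruned MB) as reported in the thread.
measurements = {
    "GTX 1050 Ti, engine built by DeepStream": (379, 449),
    "RTX 2060, pre-built engine, pipeline A": (479, 479),
    "RTX 2060, pre-built engine, pipeline B": (645, 619),
}

for label, (unpruned, pruned) in measurements.items():
    diff, pct = delta_mb(unpruned, pruned)
    # A positive value means the pruned model used more GPU memory.
    print(f"{label}: {diff:+d} MB ({pct:+.1f}%)")
```

Only the GTX 1050 Ti run, where DeepStream built the engine, shows the pruned model using more memory (70 MB more); on the RTX 2060 the pruned model uses the same amount or 26 MB less.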

mchi · January 17, 2023, 1:24am

Hi @adithya.ajith ,
Thanks for the update!
So the issue is clarified, right?

adithya.ajith · January 17, 2023, 6:01am

The memory usage with the pre-built TensorRT engines makes sense, but I do not understand why the DeepStream pipeline memory usage is higher for the pruned model when the engine file is generated by DeepStream.

mchi · January 19, 2023, 4:19am

Hi @adithya.ajith
Does “two pipelines” mean two applications?

And with one application/pipeline, the pruned and unpruned networks consume the same memory, 479 MB,
while with the other application/pipeline, the pruned network consumes more memory than the unpruned network, i.e. 645 MB vs 619 MB?

Can you refer to DeepStream SDK FAQ - #10 by mchi to dump the pipeline diagram of these two?
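For a Python GStreamer application, one common way to dump the pipeline diagram is sketched below. It is an assumption that this matches what the FAQ entry describes; the helper names are illustrative. It relies on the standard GStreamer `GST_DEBUG_DUMP_DOT_DIR` environment variable, which has to be set before `Gst.init()` is called, and on `Gst.debug_bin_to_dot_file` from the PyGObject bindings.

```python
import os


def enable_dot_dump(dump_dir):
    """Create dump_dir and point GStreamer at it.

    Call this before Gst.init(), because GStreamer reads the variable
    during initialization.
    """
    os.makedirs(dump_dir, exist_ok=True)
    os.environ["GST_DEBUG_DUMP_DOT_DIR"] = dump_dir
    return dump_dir


def dump_pipeline(pipeline, name):
    """Write <dump_dir>/<name>.dot for an existing Gst.Pipeline."""
    # Imported lazily so enable_dot_dump() works without GStreamer installed.
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst

    Gst.debug_bin_to_dot_file(pipeline, Gst.DebugGraphDetails.ALL, name)
```

Calling `dump_pipeline` once the pipeline is in the PLAYING state, for both the pruned and the unpruned run, yields two .dot files that Graphviz can render (for example with `dot -Tpng`) and that can then be compared element by element.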

adithya.ajith · January 24, 2023, 8:38am

Yes, the pipeline is the application.

Please find attached the GStreamer pipeline graphs for the applications running with the pruned and unpruned engine files.

mchi · February 9, 2023, 4:00am

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks

Hi @adithya.ajith ,
I checked the two pipeline diagrams; they look the same.
Why did you say there are two applications?

And is it possible to share the repo with us?

system · March 7, 2023, 6:08am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
