While conducting performance testing of two YOLOv3 models on an Ubuntu 18.04 machine, one pruned and one unpruned, both trained using TAO Toolkit 4.0, I found that the unpruned model takes less GPU memory per pipeline in DeepStream 6.0.
The unpruned model was trained for a total of 80 epochs; the second model is the pruned version of it, generated with the tao yolov3 prune command and then retrained for 80 epochs with the same configuration file. Both models are INT8 models and were exported to .etlt format using tao yolov3 export, and the resulting .etlt files and their calibration .bin files were used to deploy the models in DeepStream 6.0.
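For reference, the prune-retrain-export workflow described above roughly corresponds to the following TAO 4.0 commands (a sketch only; the model paths, spec file, pruning threshold, and encryption key are placeholders, not the values actually used here):

```shell
# Prune the trained model (the -pth threshold value is illustrative)
tao yolov3 prune \
    -m /workspace/yolov3_unpruned.tlt \
    -o /workspace/yolov3_pruned.tlt \
    -e /workspace/specs/yolov3_train.txt \
    -pth 0.1 \
    -k $KEY

# After retraining the pruned model for 80 epochs with the same spec,
# export to an INT8 .etlt with a calibration cache (.bin)
tao yolov3 export \
    -m /workspace/yolov3_pruned_retrained.tlt \
    -o /workspace/yolov3_pruned.etlt \
    -e /workspace/specs/yolov3_train.txt \
    -k $KEY \
    --data_type int8 \
    --cal_cache_file /workspace/cal.bin
```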
The memory usage on an NVIDIA GeForce GTX 1050 Ti GPU for the same video file input is as follows:
unpruned model - 379 MB pipeline size in GPU for one video file input
pruned model - 449 MB pipeline size in GPU for one video file input
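Per-pipeline figures like the ones above can be cross-checked with nvidia-smi's per-process query (a generic sketch; the reported process names depend on how the DeepStream app is launched):

```shell
# Report GPU memory used by each compute process, one row per process
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
```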
These results are surprising, since one would expect the pruned model to use less memory than the unpruned model, given its smaller model size and lower number of weights. Why does this happen, and is there any way to reduce the pruned model's memory consumption?
The DeepStream config files for the pruned and unpruned models are attached.
Hi @adithya.ajith ,
Thanks for the log! From the TensorRT build log, the pruned network consumes ~13 MB less TensorRT memory than the unpruned network, as shown in the screenshot below. We need more info to debug this.
Can you share the DeepStream config file in addition to the gie config files you shared above?
And, 379 MB / 449 MB are the memory usage when running with a pre-built TensorRT engine, right?
We are generating the engine files at DeepStream runtime. I will get back to you with the memory usage data for pre-built engine files.
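In case it is useful when testing with pre-built engines: in the nvinfer (gie) config, a serialized TensorRT engine can be loaded directly via the model-engine-file property, which skips engine generation at DeepStream runtime (a fragment with a placeholder path, not the actual config used here):

```
[property]
# Load a pre-built TensorRT engine instead of rebuilding from the .etlt
model-engine-file=/opt/models/yolov3_pruned.engine
```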
We run DeepStream using a custom Python app based on the “multistream python app” that can be found here; there is no configuration file attached to it.
Thanks for your patience. The delayed response on the memory usage data for the pre-built TensorRT engine files was due to a mismatch between the TensorRT version used by the TAO Toolkit to generate the engine files and the TensorRT version in the DeepStream environment where they were deployed; debugging this issue took time.
The previous nvidia-smi memory usage data I shared was from a machine that was not used for training. On the machine where the models were trained, where the TensorRT engine files were subsequently generated, and where they were later deployed to DeepStream 6.0, the memory usage is as follows: for the unpruned TensorRT engine file, two pipelines were created, of size 479 MB and 645 MB; the pruned TensorRT engine file also had two pipelines created, of 479 MB and 619 MB respectively. The GPU in this machine is an NVIDIA GeForce RTX 2060.
The memory usage of the pre-built TensorRT engines makes sense, but I do not understand why the DeepStream pipeline memory usage for the pruned model is higher when the engine file is generated by DeepStream.
Hi @adithya.ajith
Does “two pipelines” mean two applications?
And, with one application/pipeline, the pruned and unpruned networks consume the same memory, 479 MB,
while with the other application/pipeline, the pruned network consumes less memory than the unpruned network, i.e. 619 MB vs 645 MB?
There has been no update from you for a period, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.
Hi @adithya.ajith ,
I checked the two pipeline diagrams; they look the same.
Why did you say there are two applications?