Please provide the following information when requesting support.
• Hardware: RTX 3090
• Network Type: UNet (vgg16 backbone)
• TLT Version:
Configuration of the TAO Toolkit Instance
dockers: ['nvidia/tao/tao-toolkit-tf', 'nvidia/tao/tao-toolkit-pyt', 'nvidia/tao/tao-toolkit-lm']
format_version: 2.0
toolkit_version: 3.22.02
published_date: 02/28/2022
• Training spec file
unet_retrain_vgg_6S1100.txt (1.5 KB)
• How to reproduce the issue ?
Using the UNet example notebook with my custom images and masks, I trained a model, pruned and retrained it, then exported it. When I run evaluate on the exported engine, I get this error:
[TensorRT] INTERNAL ERROR: [defaultAllocator.cpp::allocate::63] Error Code 1: Cuda Runtime (out of memory)
[TensorRT] ERROR: Requested amount of GPU memory (16292249600 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
nvidia-smi reports only 2 GB of the 24 GB in use before running the evaluate:
Mon Aug 15 16:54:52 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.85.02    Driver Version: 510.85.02    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:65:00.0  On |                  N/A |
| 30%   36C    P8    32W / 350W |   2102MiB / 24576MiB |     13%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1564      G   /usr/lib/xorg/Xorg                754MiB |
|    0   N/A  N/A      1948      G   /usr/bin/gnome-shell              313MiB |
|    0   N/A  N/A      4692      G   /usr/lib/firefox/firefox         1031MiB |
+-----------------------------------------------------------------------------+
Yet the error reports a request for roughly 16 GB.
I've tried a batch size of 1 in the spec file, and I get the same out-of-memory error.
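For reference, here is my quick sanity check of the numbers from the error message and the nvidia-smi output above; the request is actually smaller than the free memory, so in principle it should fit:

```python
# Sanity-check the numbers from the TensorRT error and nvidia-smi output.
requested_bytes = 16_292_249_600   # from the TensorRT error message
total_mib = 24_576                 # GPU memory total reported by nvidia-smi
used_mib = 2_102                   # memory in use before running evaluate

requested_gib = requested_bytes / 1024**3
free_gib = (total_mib - used_mib) / 1024

print(f"requested: {requested_gib:.2f} GiB")   # ~15.17 GiB
print(f"free:      {free_gib:.2f} GiB")        # ~21.95 GiB
```

So the ~15.2 GiB request is well under the ~22 GiB free, yet the allocation still fails.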
However, tao evaluate completes with no issues when run on the .tlt model that was exported:
!tao unet evaluate --gpu_index=$GPU_INDEX -e $SPECS_DIR/unet_retrain_vgg_6S1100.txt \
-m $USER_EXPERIMENT_DIR/retrain/weights/model_retrained.tlt \
-o $USER_EXPERIMENT_DIR/retrain/ \
-k $KEY
produces this output:
evaluateAfterRetraining.log (15.7 KB)
Working backwards (last command first):
The evaluate command that produces the error is:
!tao unet evaluate --gpu_index=$GPU_INDEX -e $SPECS_DIR/unet_retrain_vgg_6S1100.txt \
-m $USER_EXPERIMENT_DIR/export/tao.fp32_6s03.engine \
-o $USER_EXPERIMENT_DIR/export/ \
-k $KEY
The detailed output of that is:
evaluateEngine.log (15.7 KB)
The engine tao.fp32_6s03.engine was created running:
!tao unet export --gpu_index=$GPU_INDEX -m $USER_EXPERIMENT_DIR/retrain/weights/model_retrained.tlt \
-k $KEY \
-e $SPECS_DIR/unet_retrain_vgg_6S1100.txt \
-o $USER_EXPERIMENT_DIR/export/tao.fp32_6s03.etlt \
--data_type fp32 \
--engine_file $USER_EXPERIMENT_DIR/export/tao.fp32_6s03.engine \
--max_batch_size 3
And the detailed output of the export is:
export.log (15.0 KB)
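One guess on my part (assuming the engine pre-allocates activation memory for its maximum batch size): the requested allocation divided by the max_batch_size of 3 used at export works out to about 5 GiB per batch item, which might explain why evaluating with batch size 1 still fails:

```python
# Hypothesis (my assumption): the engine allocates for max_batch_size,
# regardless of the batch size used at evaluate time.
requested_bytes = 16_292_249_600   # from the TensorRT error
max_batch_size = 3                 # value passed to tao unet export

per_item_gib = requested_bytes / max_batch_size / 1024**3
print(f"~{per_item_gib:.2f} GiB per batch item")   # ~5.06 GiB
```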
Continuing backwards, the model model_retrained.tlt was created by retraining after pruning. The retrain command is:
!tao unet train --gpus=1 --gpu_index=$GPU_INDEX \
-e $SPECS_DIR/unet_retrain_vgg_6S1100.txt \
-r $USER_EXPERIMENT_DIR/retrain \
-m $USER_EXPERIMENT_DIR/pruned/model_pruned.tlt \
-n model_retrained \
-k $KEY
The output of that is here:
retrain.log (108.3 KB)
The prune command is:
!tao unet prune -e $SPECS_DIR/unet_train_vgg_6S900.txt \
-m $USER_EXPERIMENT_DIR/unpruned/weights/model.tlt \
-o $USER_EXPERIMENT_DIR/pruned/model_pruned.tlt \
-eq union \
-pth 0.6 \
-k $KEY
And the prune output:
prune.log (41.4 KB)
The train command is:
!tao unet train --gpus=1 --gpu_index=$GPU_INDEX \
-e $SPECS_DIR/unet_train_vgg_6S900.txt \
-r $USER_EXPERIMENT_DIR/unpruned \
-m $USER_EXPERIMENT_DIR/pretrained_vgg16/vgg_16.hdf5 \
-n model \
-k $KEY
The training specs:
unet_train_vgg_6S900.txt (1.5 KB)
The output of the train command:
train.log (171.8 KB)
Thanks!!