Out of memory running tao evaluate on exported model

Please provide the following information when requesting support.

• Hardware RTX3090
• Network Type unet vgg16
• TLT Version

Configuration of the TAO Toolkit Instance
dockers: ['nvidia/tao/tao-toolkit-tf', 'nvidia/tao/tao-toolkit-pyt', 'nvidia/tao/tao-toolkit-lm']
format_version: 2.0
toolkit_version: 3.22.02
published_date: 02/28/2022

• Training spec file
unet_retrain_vgg_6S1100.txt (1.5 KB)

• How to reproduce the issue ?

Using the unet example notebook with my custom images and masks, I trained a model, pruned and retrained it, and then exported it. When I run evaluate on the exported engine I get the error:

[TensorRT] INTERNAL ERROR: [defaultAllocator.cpp::allocate::63] Error Code 1: Cuda Runtime (out of memory)
[TensorRT] ERROR: Requested amount of GPU memory (16292249600 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.

nvidia-smi reports only about 2 GB of the 24 GB used before running the evaluate:

Mon Aug 15 16:54:52 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.85.02    Driver Version: 510.85.02    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:65:00.0  On |                  N/A |
| 30%   36C    P8    32W / 350W |   2102MiB / 24576MiB |     13%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                   |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1564      G   /usr/lib/xorg/Xorg                754MiB |
|    0   N/A  N/A      1948      G   /usr/bin/gnome-shell              313MiB |
|    0   N/A  N/A      4692      G   /usr/lib/firefox/firefox         1031MiB |
+-----------------------------------------------------------------------------+

And yet the error reports a request for about 16 GB.
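As a side note, the free memory right before the run can also be checked with nvidia-smi's query options (these are standard nvidia-smi flags, nothing TAO-specific), which on this machine shows well over 20 GB free:

nvidia-smi --query-gpu=memory.total,memory.used,memory.free --format=csv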

I’ve tried with a batch size of 1 in the spec file, and I get the same memory problem.

However, tao evaluate completes with no issues on the .tlt model that the engine was exported from:

!tao unet evaluate --gpu_index=$GPU_INDEX -e $SPECS_DIR/unet_retrain_vgg_6S1100.txt \
                 -m $USER_EXPERIMENT_DIR/retrain/weights/model_retrained.tlt \
                 -o $USER_EXPERIMENT_DIR/retrain/ \
                 -k $KEY

produces this output:
evaluateAfterRetraining.log (15.7 KB)

Working backwards (last command first):

The evaluate command that produces the error is:

!tao unet evaluate --gpu_index=$GPU_INDEX -e $SPECS_DIR/unet_retrain_vgg_6S1100.txt  \
                 -m $USER_EXPERIMENT_DIR/export/tao.fp32_6s03.engine  \
                 -o $USER_EXPERIMENT_DIR/export/ \
                 -k $KEY

The detailed output of that is:
evaluateEngine.log (15.7 KB)

The engine tao.fp32_6s03.engine was created by running:

!tao unet export --gpu_index=$GPU_INDEX -m $USER_EXPERIMENT_DIR/retrain/weights/model_retrained.tlt \
               -k $KEY \
               -e $SPECS_DIR/unet_retrain_vgg_6S1100.txt  \
               -o $USER_EXPERIMENT_DIR/export/tao.fp32_6s03.etlt \
               --data_type fp32 \
               --engine_file $USER_EXPERIMENT_DIR/export/tao.fp32_6s03.engine \
               --max_batch_size 3 

And the detailed output of the export is:
export.log (15.0 KB)

Continuing backwards, the model model_retrained.tlt was created with the retrain command after pruning:

The retrain command is:

!tao unet train --gpus=1 --gpu_index=$GPU_INDEX \
              -e $SPECS_DIR/unet_retrain_vgg_6S1100.txt \
              -r $USER_EXPERIMENT_DIR/retrain \
              -m $USER_EXPERIMENT_DIR/pruned/model_pruned.tlt \
              -n model_retrained \
              -k $KEY

The output of that is here:
retrain.log (108.3 KB)

The prune command is:

!tao unet prune   -e $SPECS_DIR/unet_train_vgg_6S900.txt \
                  -m $USER_EXPERIMENT_DIR/unpruned/weights/model.tlt \
                  -o $USER_EXPERIMENT_DIR/pruned/model_pruned.tlt \
                  -eq union \
                  -pth 0.6 \
                  -k $KEY

And the prune output:
prune.log (41.4 KB)

The train command is:

!tao unet train --gpus=1 --gpu_index=$GPU_INDEX \
              -e $SPECS_DIR/unet_train_vgg_6S900.txt \
              -r $USER_EXPERIMENT_DIR/unpruned \
              -m $USER_EXPERIMENT_DIR/pretrained_vgg16/vgg_16.hdf5  \
              -n model \
              -k $KEY 

The training specs:
unet_train_vgg_6S900.txt (1.5 KB)

The output of the train command:
train.log (171.8 KB)

Thanks!!

After some research I added the undocumented option

--batch_size 1

and that solved it.
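For reference, the evaluate call with that flag appended would look something like this (the same command as above with --batch_size added; the exact placement of the flag is my reconstruction):

!tao unet evaluate --gpu_index=$GPU_INDEX -e $SPECS_DIR/unet_retrain_vgg_6S1100.txt \
                 -m $USER_EXPERIMENT_DIR/export/tao.fp32_6s03.engine \
                 -o $USER_EXPERIMENT_DIR/export/ \
                 -k $KEY \
                 --batch_size 1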
