[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)

I am following the detectnet_v2 notebook to train a single-class object detection model. I am able to:

  1. train the initial pre-trained model,
  2. validate the model,
  3. prune the model,
  4. retrain the pruned model, and
  5. validate the retrained model,

all without any memory issues.

But when I run the deploy step:

!mkdir -p $USER_EXPERIMENT_DIR/experiment_dir_final

# Remove a pre-existing copy of the etlt, if there is one.
import os
output_file = os.path.join(os.environ['USER_EXPERIMENT_DIR'],
                           "experiment_dir_final/resnet18_detector.etlt")
if os.path.exists(output_file):
    os.system("rm {}".format(output_file))

!tlt-export detectnet_v2 \
            -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt \
            -o $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
            -k $KEY

I get the following error:

2020-11-19 06:28:45.551169: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2020-11-19 06:28:45.551175: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2020-11-19 06:28:45.551245: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-19 06:28:45.551585: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-19 06:28:45.551894: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4492 MB memory) → physical GPU (device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
NOTE: UFF has been tested with TensorFlow 1.14.0.
WARNING: The version of TensorFlow installed on this system is not guaranteed to work with UFF.
DEBUG [/usr/local/lib/python3.6/dist-packages/uff/converters/tensorflow/converter.py:96] Marking ['output_cov/Sigmoid', 'output_bbox/BiasAdd'] as outputs
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (GPU memory allocation failed during allocation of workspace. Try decreasing batch size.)
2020-11-19 06:29:30,029 [ERROR] modulus.export._tensorrt: Failed to create engine
Traceback (most recent call last):
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/core/build_wheel.runfiles/ai_infra/moduluspy/modulus/export/_tensorrt.py", line 521, in __init__
AssertionError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/tlt-export", line 8, in <module>
    sys.exit(main())
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/export/app.py", line 185, in main
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/export/app.py", line 263, in run_export
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/export/base_exporter.py", line 505, in export
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/core/build_wheel.runfiles/ai_infra/moduluspy/modulus/export/_tensorrt.py", line 676, in __init__
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/core/build_wheel.runfiles/ai_infra/moduluspy/modulus/export/_tensorrt.py", line 529, in __init__
AssertionError: Parsing failed on line 521 in statement

• Hardware Platform: GPU (GeForce GTX 1660 Ti)
• TLT container: 2.0
• NVIDIA GPU Driver Version: 440.1
• Issue Type: Question

Can you run "tlt-export detectnet_v2 -h" and set a larger max_workspace_size?
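
For example, something like this (a sketch only; I am assuming the value is given in bytes, as is conventional for TensorRT workspace sizes, and 1073741824 = 1 GiB is just an illustrative starting point to adjust against the free memory on your GPU):

# Rerun the export with an explicit, larger TensorRT workspace size
# (value in bytes; 1073741824 = 1 GiB is an assumed starting point).
!tlt-export detectnet_v2 \
            -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt \
            -o $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
            -k $KEY \
            --max_workspace_size 1073741824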

root@063734c4e074:/workspace# tlt-export detectnet_v2 -h
Using TensorFlow backend.
2020-11-19 09:58:05.672464: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
usage: tlt-export [-h] -m MODEL -k KEY [-o OUTPUT_FILE] [--force_ptq]
                  [--cal_data_file CAL_DATA_FILE]
                  [--cal_image_dir CAL_IMAGE_DIR]
                  [--data_type {fp32,fp16,int8}] [-s]
                  [--cal_cache_file CAL_CACHE_FILE] [--batches BATCHES]
                  [--max_workspace_size MAX_WORKSPACE_SIZE]
                  [--max_batch_size MAX_BATCH_SIZE] [--batch_size BATCH_SIZE]
                  [-e EXPERIMENT_SPEC] [--engine_file ENGINE_FILE] [-v]
                  {classification,detectnet_v2,ssd,dssd,faster_rcnn,yolo,retinanet,mask_rcnn}

Export a trained TLT model and save an int8 calibration file.

positional arguments:
  {classification,detectnet_v2,ssd,dssd,faster_rcnn,yolo,retinanet,mask_rcnn}
                        Module being exported.

optional arguments:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
                        Path to the model file.
  -k KEY, --key KEY     Key to load the model.
  -o OUTPUT_FILE, --output_file OUTPUT_FILE
                        Output file (defaults to $(input_filename).etlt)
  --force_ptq           Flag to force post training quantization for QAT
                        models.
  --cal_data_file CAL_DATA_FILE
                        Tensorfile to run calibration for int8 optimization.
  --cal_image_dir CAL_IMAGE_DIR
                        Directory of images to run int8 calibration if data
                        file is unavailable
  --data_type {fp32,fp16,int8}
                        Data type for the TensorRT export.
  -s, --strict_type_constraints
                        Apply TensorRT strict_type_constraints or not for
                        INT8 mode.
  --cal_cache_file CAL_CACHE_FILE
                        Calibration cache file to write to.
  --batches BATCHES     Number of batches to calibrate over.
  --max_workspace_size MAX_WORKSPACE_SIZE
                        Max size of workspace to be set for TensorRT engine
                        builder.
  --max_batch_size MAX_BATCH_SIZE
                        Max batch size for TensorRT engine builder.
  --batch_size BATCH_SIZE
                        Number of images per batch.
  -e EXPERIMENT_SPEC, --experiment_spec EXPERIMENT_SPEC
                        Path to the experiment spec file.
  --engine_file ENGINE_FILE
                        Path to the exported TRT engine.
  -v, --verbose         Verbosity of the logger.
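
Per this help text, --max_workspace_size is the flag suggested above. And since the OOM message itself says "Try decreasing batch size", lowering --max_batch_size (the batch size used by the TensorRT engine builder, per the help text) is another option if a larger workspace alone does not resolve the error.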
