BodyPoseNet TAO - Exporting Model Out of Memory

Please provide the following information when requesting support.

Hardware (Ubuntu 18.04 PC with NVIDIA Quadro K610M)
Network Type (BodyPoseNet)
TLT Version (3.22.02)

How to reproduce the issue?
While following the BodyPoseNet TAO Jupyter notebook, I ran into a problem when exporting the model in tfonnx format (section 9.2 of the notebook).

After running:

!tao bpnet export -m $USER_EXPERIMENT_DIR/models/exp_m1_retrain/$RETRAIN_MODEL_CHECKPOINT \
                  -e $SPECS_DIR/bpnet_retrain_m1_coco.yaml \
                  -o $USER_EXPERIMENT_DIR/models/exp_m1_final/bpnet_model.etlt \
                  -k $KEY \
                  -t tfonnx

I got this:

INFO:tensorflow:Restoring parameters from /tmp/tmp_pzary68.ckpt
2022-05-24 12:04:55,419 [INFO] tensorflow: Restoring parameters from /tmp/tmp_pzary68.ckpt
INFO:tensorflow:Froze 107 variables.
2022-05-24 12:04:55,664 [INFO] tensorflow: Froze 107 variables.
INFO:tensorflow:Converted 107 variables to const ops.
2022-05-24 12:04:55,711 [INFO] tensorflow: Converted 107 variables to const ops.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/common/utilities/tlt_utils.py:503: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

2022-05-24 12:04:56,056 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/common/utilities/tlt_utils.py:503: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

2022-05-24 12:04:56,952 [INFO] tf2onnx.tf_utils: Computed 85 values for constant folding
2022-05-24 12:04:57,619 [INFO] tf2onnx.optimizer: Optimizing ONNX model
2022-05-24 12:04:58,301 [INFO] tf2onnx.optimizer: After optimization: Add -37 (37->0), Const -13 (87->74), Identity -2 (2->0), Mul -11 (11->0), Transpose -71 (74->3)
2022-05-24 12:04:58,675 [INFO] driveix.common.export.base_exporter: Output Tensors: ['paf_out/BiasAdd:0', 'heatmap_out/BiasAdd:0']
2022-05-24 12:04:58,676 [INFO] driveix.common.export.base_exporter: Input Tensors: input_1:0 of shape: (None, None, None, 3)
2022-05-24 12:04:59,461 [INFO] numba.cuda.cudadrv.driver: init
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::79] Error Code 2: OutOfMemory (no further information)
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] ERROR: Requested amount of GPU memory (536870912 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] ERROR: Requested amount of GPU memory (1073741824 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] ERROR: Requested amount of GPU memory (1073741824 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] ERROR: Requested amount of GPU memory (1073741824 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] ERROR: Requested amount of GPU memory (1073741824 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] ERROR: Requested amount of GPU memory (1073741824 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] ERROR: Requested amount of GPU memory (1073741824 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] ERROR: Requested amount of GPU memory (1073741824 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] ERROR: Requested amount of GPU memory (1073741824 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] ERROR: Requested amount of GPU memory (536870912 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] ERROR: Requested amount of GPU memory (536870912 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
2022-05-24 14:05:16,267 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

It happens because my GPU doesn't have enough memory: nvidia-smi showed me that "Volatile GPU-Util" peaked at 100%. So I tried to run the command with these additional parameters:

              --batch_size 1 \
              --max_batch_size 1 \
              --max_workspace_size 10000000000 \
              --static_batch_size 1

Yet I still get the same out-of-memory error, and I don't know how to reduce the memory usage any further.
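For reference, this is roughly how I watched the GPU while the export ran (plain nvidia-smi polling, nothing TAO-specific; the query fields are just the ones I find useful):

# Print memory use and utilization once per second while the export runs
nvidia-smi --query-gpu=timestamp,memory.used,memory.total,utilization.gpu --format=csv -l 1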
If anyone has an idea…

Best regards,
Nicolas

You can export in FP16 or INT8 mode.

With INT8:

!tao bpnet export -m $USER_EXPERIMENT_DIR/models/exp_m1_retrain/$RETRAIN_MODEL_CHECKPOINT \
                  -e $SPECS_DIR/bpnet_retrain_m1_coco.yaml \
                  -o $USER_EXPERIMENT_DIR/models/exp_m1_final/bpnet_model.etlt \
                  -k $KEY \
                  -t tfonnx \
                  --data_type int8

2022-05-24 15:33:37,262 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/common/utilities/tlt_utils.py:503: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

2022-05-24 15:33:38,333 [INFO] tf2onnx.tf_utils: Computed 85 values for constant folding
2022-05-24 15:33:38,977 [INFO] tf2onnx.optimizer: Optimizing ONNX model
2022-05-24 15:33:39,735 [INFO] tf2onnx.optimizer: After optimization: Add -37 (37->0), Const -13 (87->74), Identity -2 (2->0), Mul -11 (11->0), Transpose -71 (74->3)
2022-05-24 15:33:40,217 [INFO] driveix.common.export.base_exporter: Output Tensors: ['paf_out/BiasAdd:0', 'heatmap_out/BiasAdd:0']
2022-05-24 15:33:40,218 [INFO] driveix.common.export.base_exporter: Input Tensors: input_1:0 of shape: (None, None, None, 3)
2022-05-24 15:33:41,074 [INFO] numba.cuda.cudadrv.driver: init
2022-05-24 15:33:41,099 [INFO] driveix.common.export.base_exporter: Generating a tensorfile with random tensor images. This may work well as a profiling tool, however, it may result in inaccurate results at inference. Please generate a tensorfile using the tlt-int8-tensorfile, or provide a custom directory of images for best performance.
Traceback (most recent call last):
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/scripts/export.py", line 236, in
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/scripts/export.py", line 232, in main
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/scripts/export.py", line 226, in run_export
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/exporter/bpnet_exporter.py", line 218, in export
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/common/export/base_exporter.py", line 204, in get_calibrator
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/common/export/base_exporter.py", line 329, in generate_tensor_file
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/common/export/base_exporter.py", line 378, in generate_random_tensorfile
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/export/tensorfile.py", line 54, in __init__
File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py", line 312, in __init__
fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py", line 148, in make_fid
fid = h5f.create(name, h5f.ACC_TRUNC, fapl=fapl, fcpl=fcpl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5f.pyx", line 98, in h5py.h5f.create
ValueError: Invalid file name (invalid file name)
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/pycuda/autoinit.py", line 14, in _finish_up
context.pop()
pycuda._driver.LogicError: cuCtxPopCurrent failed: invalid device context

PyCUDA ERROR: The context stack was not empty upon module cleanup.

A context was still active when the context stack was being
cleaned up. At this point in our execution, CUDA may already
have been deinitialized, so there is no way we can finish
cleanly. The program will be aborted now.
Use Context.pop() to avoid this problem.

Aborted (core dumped)
Traceback (most recent call last):
File "/usr/local/bin/bpnet", line 8, in
sys.exit(main())
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/entrypoint/bpnet.py", line 12, in main
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/common/entrypoint/entrypoint.py", line 300, in launch_job
AssertionError: Process run failed.
2022-05-24 17:33:42,695 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

With FP16:

!tao bpnet export -m $USER_EXPERIMENT_DIR/models/exp_m1_retrain/$RETRAIN_MODEL_CHECKPOINT \
                  -e $SPECS_DIR/bpnet_retrain_m1_coco.yaml \
                  -o $USER_EXPERIMENT_DIR/models/exp_m1_final/bpnet_model.etlt \
                  -k $KEY \
                  -t tfonnx \
                  --data_type fp16

2022-05-24 15:35:53,946 [ERROR] modulus.export._tensorrt: Specified FP16 but not supported on platform.
Traceback (most recent call last):
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/scripts/export.py", line 236, in
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/scripts/export.py", line 232, in main
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/scripts/export.py", line 226, in run_export
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/exporter/bpnet_exporter.py", line 251, in export
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/core/build_wheel.runfiles/ai_infra/moduluspy/modulus/export/_tensorrt.py", line 781, in __init__
AttributeError: Specified FP16 but not supported on platform.
Traceback (most recent call last):
File "/usr/local/bin/bpnet", line 8, in
sys.exit(main())
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/entrypoint/bpnet.py", line 12, in main
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/common/entrypoint/entrypoint.py", line 300, in launch_job
AssertionError: Process run failed.
2022-05-24 17:35:55,363 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

FP16 doesn't seem to be supported on my GPU, and INT8 fails with an invalid file name even though I verified the weights path, the coco_spec.json path, the images path, and the keypoints JSON path.

Sorry to bother you,
Nicolas

Please try to follow the Jupyter notebook for exporting the INT8 model.
https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/resources/cv_samples/version/v1.3.0/files/bpnet/bpnet.ipynb#head-10-4
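From your INT8 log, the exporter fell back to generating a random calibration tensorfile and then failed to create that file; the notebook avoids this by passing calibration inputs explicitly. Roughly, the cell looks like this (the calibration flags follow the general TAO export interface and the calibration paths below are placeholders, so please verify both against the notebook cell before running):

!tao bpnet export -m $USER_EXPERIMENT_DIR/models/exp_m1_retrain/$RETRAIN_MODEL_CHECKPOINT \
                  -e $SPECS_DIR/bpnet_retrain_m1_coco.yaml \
                  -o $USER_EXPERIMENT_DIR/models/exp_m1_final/bpnet_model.int8.etlt \
                  -k $KEY \
                  -t tfonnx \
                  --data_type int8 \
                  --cal_image_dir <directory of calibration images> \
                  --cal_data_file <path to write the calibration tensorfile> \
                  --cal_cache_file <path to write the calibration cache> \
                  --batch_size 1 \
                  --batches 10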

The problem is that my GPU is way below the hardware requirements.

Thanks for the help

Please try to use another machine. The Quadro K610M does not have enough GPU memory.