BodyPoseNet TAO - Exporting Model Out of Memory

Please provide the following information when requesting support.

Hardware (Ubuntu 18.04 PC with NVIDIA Quadro K610M)
Network Type (BodyPoseNet)
TLT Version (3.22.02)

How to reproduce the issue?
While following the BodyPoseNet TAO Jupyter notebook, I ran into a problem when exporting the model in tfonnx format (section 9.2 of the notebook).

After running:

!tao bpnet export -m $USER_EXPERIMENT_DIR/models/exp_m1_retrain/$RETRAIN_MODEL_CHECKPOINT \
                  -e $SPECS_DIR/bpnet_retrain_m1_coco.yaml \
                  -o $USER_EXPERIMENT_DIR/models/exp_m1_final/bpnet_model.etlt \
                  -k $KEY \
                  -t tfonnx

I got this:

INFO:tensorflow:Restoring parameters from /tmp/tmp_pzary68.ckpt
2022-05-24 12:04:55,419 [INFO] tensorflow: Restoring parameters from /tmp/tmp_pzary68.ckpt
INFO:tensorflow:Froze 107 variables.
2022-05-24 12:04:55,664 [INFO] tensorflow: Froze 107 variables.
INFO:tensorflow:Converted 107 variables to const ops.
2022-05-24 12:04:55,711 [INFO] tensorflow: Converted 107 variables to const ops.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/common/utilities/tlt_utils.py:503: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

2022-05-24 12:04:56,056 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/common/utilities/tlt_utils.py:503: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

2022-05-24 12:04:56,952 [INFO] tf2onnx.tf_utils: Computed 85 values for constant folding
2022-05-24 12:04:57,619 [INFO] tf2onnx.optimizer: Optimizing ONNX model
2022-05-24 12:04:58,301 [INFO] tf2onnx.optimizer: After optimization: Add -37 (37->0), Const -13 (87->74), Identity -2 (2->0), Mul -11 (11->0), Transpose -71 (74->3)
2022-05-24 12:04:58,675 [INFO] driveix.common.export.base_exporter: Output Tensors: ['paf_out/BiasAdd:0', 'heatmap_out/BiasAdd:0']
2022-05-24 12:04:58,676 [INFO] driveix.common.export.base_exporter: Input Tensors: input_1:0 of shape: (None, None, None, 3)
2022-05-24 12:04:59,461 [INFO] numba.cuda.cudadrv.driver: init
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::79] Error Code 2: OutOfMemory (no further information)
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] ERROR: Requested amount of GPU memory (536870912 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] ERROR: Requested amount of GPU memory (1073741824 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] ERROR: Requested amount of GPU memory (1073741824 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] ERROR: Requested amount of GPU memory (1073741824 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] ERROR: Requested amount of GPU memory (1073741824 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] ERROR: Requested amount of GPU memory (1073741824 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] ERROR: Requested amount of GPU memory (1073741824 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] ERROR: Requested amount of GPU memory (1073741824 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] ERROR: Requested amount of GPU memory (1073741824 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] ERROR: Requested amount of GPU memory (536870912 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] ERROR: Requested amount of GPU memory (536870912 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
2022-05-24 14:05:16,267 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

It happens because my GPU doesn't have enough memory: nvidia-smi showed me that "Volatile GPU-Util" peaked at 100%. So I tried to run the command with these additional parameters:

              --batch_size 1 \
              --max_batch_size 1 \
              --max_workspace_size 10000000000 \
              --static_batch_size 1

Yet I still get the same out-of-memory error, and I don't know how to reduce the memory usage any further.
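For reference, this is roughly how I watched the GPU while the export ran (plain nvidia-smi polling, nothing TAO-specific; the query fields are just the ones I find useful):

# Print memory use and utilization once per second while the export runs
nvidia-smi --query-gpu=timestamp,memory.used,memory.total,utilization.gpu --format=csv -l 1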
If anyone has an idea…

Best regards,
Nicolas

You can export in FP16 or INT8 mode.

With INT8:

!tao bpnet export -m $USER_EXPERIMENT_DIR/models/exp_m1_retrain/$RETRAIN_MODEL_CHECKPOINT \
                  -e $SPECS_DIR/bpnet_retrain_m1_coco.yaml \
                  -o $USER_EXPERIMENT_DIR/models/exp_m1_final/bpnet_model.etlt \
                  -k $KEY \
                  -t tfonnx \
                  --data_type int8

2022-05-24 15:33:37,262 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/common/utilities/tlt_utils.py:503: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

2022-05-24 15:33:38,333 [INFO] tf2onnx.tf_utils: Computed 85 values for constant folding
2022-05-24 15:33:38,977 [INFO] tf2onnx.optimizer: Optimizing ONNX model
2022-05-24 15:33:39,735 [INFO] tf2onnx.optimizer: After optimization: Add -37 (37->0), Const -13 (87->74), Identity -2 (2->0), Mul -11 (11->0), Transpose -71 (74->3)
2022-05-24 15:33:40,217 [INFO] driveix.common.export.base_exporter: Output Tensors: ['paf_out/BiasAdd:0', 'heatmap_out/BiasAdd:0']
2022-05-24 15:33:40,218 [INFO] driveix.common.export.base_exporter: Input Tensors: input_1:0 of shape: (None, None, None, 3)
2022-05-24 15:33:41,074 [INFO] numba.cuda.cudadrv.driver: init
2022-05-24 15:33:41,099 [INFO] driveix.common.export.base_exporter: Generating a tensorfile with random tensor images. This may work well as a profiling tool, however, it may result in inaccurate results at inference. Please generate a tensorfile using the tlt-int8-tensorfile, or provide a custom directory of images for best performance.
Traceback (most recent call last):
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/scripts/export.py", line 236, in
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/scripts/export.py", line 232, in main
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/scripts/export.py", line 226, in run_export
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/exporter/bpnet_exporter.py", line 218, in export
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/common/export/base_exporter.py", line 204, in get_calibrator
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/common/export/base_exporter.py", line 329, in generate_tensor_file
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/common/export/base_exporter.py", line 378, in generate_random_tensorfile
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/export/tensorfile.py", line 54, in __init__
File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py", line 312, in __init__
fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py", line 148, in make_fid
fid = h5f.create(name, h5f.ACC_TRUNC, fapl=fapl, fcpl=fcpl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5f.pyx", line 98, in h5py.h5f.create
ValueError: Invalid file name (invalid file name)
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/pycuda/autoinit.py", line 14, in _finish_up
context.pop()
pycuda._driver.LogicError: cuCtxPopCurrent failed: invalid device context

PyCUDA ERROR: The context stack was not empty upon module cleanup.

A context was still active when the context stack was being
cleaned up. At this point in our execution, CUDA may already
have been deinitialized, so there is no way we can finish
cleanly. The program will be aborted now.
Use Context.pop() to avoid this problem.

Aborted (core dumped)
Traceback (most recent call last):
File "/usr/local/bin/bpnet", line 8, in
sys.exit(main())
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/entrypoint/bpnet.py", line 12, in main
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/common/entrypoint/entrypoint.py", line 300, in launch_job
AssertionError: Process run failed.
2022-05-24 17:33:42,695 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

With FP16:

!tao bpnet export -m $USER_EXPERIMENT_DIR/models/exp_m1_retrain/$RETRAIN_MODEL_CHECKPOINT \
                  -e $SPECS_DIR/bpnet_retrain_m1_coco.yaml \
                  -o $USER_EXPERIMENT_DIR/models/exp_m1_final/bpnet_model.etlt \
                  -k $KEY \
                  -t tfonnx \
                  --data_type fp16

2022-05-24 15:35:53,946 [ERROR] modulus.export._tensorrt: Specified FP16 but not supported on platform.
Traceback (most recent call last):
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/scripts/export.py", line 236, in
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/scripts/export.py", line 232, in main
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/scripts/export.py", line 226, in run_export
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/exporter/bpnet_exporter.py", line 251, in export
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/core/build_wheel.runfiles/ai_infra/moduluspy/modulus/export/_tensorrt.py", line 781, in __init__
AttributeError: Specified FP16 but not supported on platform.
Traceback (most recent call last):
File "/usr/local/bin/bpnet", line 8, in
sys.exit(main())
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/entrypoint/bpnet.py", line 12, in main
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/common/entrypoint/entrypoint.py", line 300, in launch_job
AssertionError: Process run failed.
2022-05-24 17:35:55,363 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

FP16 doesn't seem to be supported on my GPU, and INT8 fails with an invalid file name even though I verified the weights path, the coco_spec.json path, the images path, and the keypoints JSON path.

Sorry to bother you,
Nicolas

Please try to follow the Jupyter notebook for exporting the INT8 model.
https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/resources/cv_samples/version/v1.3.0/files/bpnet/bpnet.ipynb#head-10-4
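From your INT8 log, the exporter fell back to generating a random calibration tensorfile and then failed to create that file; the notebook avoids this by passing calibration inputs explicitly. Roughly, the cell looks like this (the calibration flags follow the general TAO export interface and the calibration paths below are placeholders, so please verify both against the notebook cell before running):

!tao bpnet export -m $USER_EXPERIMENT_DIR/models/exp_m1_retrain/$RETRAIN_MODEL_CHECKPOINT \
                  -e $SPECS_DIR/bpnet_retrain_m1_coco.yaml \
                  -o $USER_EXPERIMENT_DIR/models/exp_m1_final/bpnet_model.int8.etlt \
                  -k $KEY \
                  -t tfonnx \
                  --data_type int8 \
                  --cal_image_dir <directory of calibration images> \
                  --cal_data_file <path to write the calibration tensorfile> \
                  --cal_cache_file <path to write the calibration cache> \
                  --batch_size 1 \
                  --batches 10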

The problem is that my GPU is way below the hardware requirements.

Thanks for the help

Please try to use another machine. The Quadro K610M does not have enough GPU memory.