Please provide the following information when requesting support.
• Hardware (Ubuntu 18.04 PC with NVIDIA Quadro K610M)
• Network Type (BodyPoseNet)
• TLT Version (3.22.02)
• How to reproduce the issue?
While following the BodyPoseNet TAO Jupyter notebook, I ran into a problem when exporting the model to ONNX with the `tfonnx` backend (part 9.2 of the notebook).
After running:
!tao bpnet export -m $USER_EXPERIMENT_DIR/models/exp_m1_retrain/$RETRAIN_MODEL_CHECKPOINT \
                  -e $SPECS_DIR/bpnet_retrain_m1_coco.yaml \
                  -o $USER_EXPERIMENT_DIR/models/exp_m1_final/bpnet_model.etlt \
                  -k $KEY \
                  -t tfonnx
I got this:
INFO:tensorflow:Restoring parameters from /tmp/tmp_pzary68.ckpt
2022-05-24 12:04:55,419 [INFO] tensorflow: Restoring parameters from /tmp/tmp_pzary68.ckpt
INFO:tensorflow:Froze 107 variables.
2022-05-24 12:04:55,664 [INFO] tensorflow: Froze 107 variables.
INFO:tensorflow:Converted 107 variables to const ops.
2022-05-24 12:04:55,711 [INFO] tensorflow: Converted 107 variables to const ops.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/common/utilities/tlt_utils.py:503: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.2022-05-24 12:04:56,056 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/common/utilities/tlt_utils.py:503: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.
2022-05-24 12:04:56,952 [INFO] tf2onnx.tf_utils: Computed 85 values for constant folding
2022-05-24 12:04:57,619 [INFO] tf2onnx.optimizer: Optimizing ONNX model
2022-05-24 12:04:58,301 [INFO] tf2onnx.optimizer: After optimization: Add -37 (37->0), Const -13 (87->74), Identity -2 (2->0), Mul -11 (11->0), Transpose -71 (74->3)
2022-05-24 12:04:58,675 [INFO] driveix.common.export.base_exporter: Output Tensors: ['paf_out/BiasAdd:0', 'heatmap_out/BiasAdd:0']
2022-05-24 12:04:58,676 [INFO] driveix.common.export.base_exporter: Input Tensors: input_1:0 of shape: (None, None, None, 3)
2022-05-24 12:04:59,461 [INFO] numba.cuda.cudadrv.driver: init
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::79] Error Code 2: OutOfMemory (no further information)
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] ERROR: Requested amount of GPU memory (536870912 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] INTERNAL ERROR: [virtualMemoryBuffer.cpp::resizePhysical::65] Error Code 2: OutOfMemory (no further information)
[TensorRT] ERROR: Requested amount of GPU memory (1073741824 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
(the 1073741824-byte triplet above repeats seven more times, followed by two more triplets requesting 536870912 bytes)
2022-05-24 14:05:16,267 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
It happens because my GPU doesn't have enough memory: nvidia-smi showed that "Volatile GPU-Util" peaked at 100%. So I tried to run the command with these parameters:
--batch_size 1 \
--max_batch_size 1 \
--max_workspace_size 10000000000 \
--static_batch_size 1
Yet I still got the same error, and I don't know how to reduce the memory usage any further.
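For scale, here is a quick (purely illustrative) conversion of the byte counts TensorRT reports, plus the `--max_workspace_size` value I passed, into GiB; note that the workspace I requested is far larger than what the errors were asking for:

```python
# Illustrative only: convert the allocation sizes from the TensorRT
# error messages, and my --max_workspace_size value, into GiB.
for n in (536870912, 1073741824, 10000000000):
    print(f"{n:>11} bytes = {n / 2**30:.2f} GiB")
# 536870912 bytes = 0.50 GiB
# 1073741824 bytes = 1.00 GiB
# 10000000000 bytes = 9.31 GiB
```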
If anyone has an idea…
Best regards,
Nicolas