TAO UNet inference stops at around 80% of the process with no output result

Hello,

Below is my command:

!tao unet inference --gpu_index=$GPU_INDEX -e $SPECS_DIR/unet_peoplesemsegnet_vanilla_unet_dynamic.txt \
                  -m $USER_EXPERIMENT_DIR/export/trt.fp32.tlt.peoplesemsegnet.engine \
                  -o $USER_EXPERIMENT_DIR/itri_experiment_peoplesemsegnet_1108/ \
                  -k tlt_encode

There is no error, but the inference process stops at around 80%, and the output folders (vis_overlay_tlt & mask_labels_tlt) are empty.

Here is the log:

2022-12-08 10:08:20,262 [INFO] root: Registry: ['nvcr.io']
2022-12-08 10:08:20,349 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.22.05-tf1.15.5-py3
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-vep4ztzs because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
/usr/local/lib/python3.6/dist-packages/requests/init.py:91: RequestsDependencyWarning: urllib3 (1.26.5) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)
Using TensorFlow backend.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/inference.py:51: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.
2022-12-08 02:08:28,960 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/inference.py:51: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/inference.py:51: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.
2022-12-08 02:08:28,961 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/inference.py:51: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.
2022-12-08 02:08:28,962 [INFO] iva.common.logging.logging: Log file already exists at /workspace/tao-experiments/unet_tao/itri_experiment_peoplesemsegnet_1108/status.json
2022-12-08 02:08:28,962 [INFO] root: Starting UNet Inference
2022-12-08 02:08:28,962 [INFO] main: Loading experiment spec at /workspace/tao-experiments/unet/specs/unet_peoplesemsegnet_vanilla_unet_dynamic.txt.
2022-12-08 02:08:28,962 [INFO] iva.unet.spec_handler.spec_loader: Merging specification from /workspace/tao-experiments/unet/specs/unet_peoplesemsegnet_vanilla_unet_dynamic.txt
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.
2022-12-08 02:08:28,964 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.
2022-12-08 02:08:28,966 [INFO] iva.unet.model.utilities: Label Id 1: Train Id 1
2022-12-08 02:08:28,966 [INFO] iva.unet.model.utilities: Label Id 0: Train Id 0
Phase test: Total 8500 files.
[12/08/2022-02:08:29] [TRT] [W] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[12/08/2022-02:08:31] [TRT] [W] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
0%| | 0/2834 [00:00<?, ?it/s]WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/utils/data_loader.py:477: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.
2022-12-08 02:08:31,848 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/utils/data_loader.py:477: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.
81%|█████████████████████████████▊ | 2288/2834 [3:58:09<56:49, 6.25s/it]2022-12-08 14:07:28,696 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Is it 100% reproducible? Could you check whether there is enough memory? Also, could you retry with fewer images?
Additionally, for debugging, can you open a terminal and run inference inside the docker?
$ tao unet run /bin/bash
Then, inside the docker, run the command. (Note: please launch the container with "tao".)
i.e.,
# unet inference <arguments>
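For the memory question above: a generic way to watch host RAM, disk, and GPU memory from a second terminal while inference runs. These are standard Linux/nvidia-smi commands, not TAO-specific, just a sketch:

```shell
# Generic monitoring commands to run in a second terminal while
# inference is in progress (not TAO-specific; adjust as needed).
free -h          # host RAM and swap usage
df -h /tmp       # free disk space, in case temporary files pile up
# GPU memory, if nvidia-smi is available on this machine:
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=memory.used,memory.total --format=csv
fi
```

If host RAM or GPU memory climbs steadily toward the limit as the progress bar advances, that points to a memory problem rather than a data issue.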

It succeeds when I try with fewer images.
But wouldn't there be a warning message if a memory problem occurred?

Could you please follow the steps below to check the log? Thanks.

Open a terminal and run inference inside the docker:
$ tao unet run /bin/bash
Then, inside the docker, run the command. (Note: please launch the container with "tao".)
i.e.,
# unet inference <arguments>

After I ran "tao unet run /bin/bash" to enter the docker, I got the following error when I ran "unet inference --gpu_index=1 -e $SPECS_DIR/unet_peoplesemsegnet_vanilla_unet_dynamic.txt -m $USER_EXPERIMENT_DIR/export/trt.fp32.tlt.peoplesemsegnet.engine -o $USER_EXPERIMENT_DIR/itri_experiment_peoplesemsegnet_1108/ -k tlt_encode":

Matplotlib created a temporary config/cache directory at /tmp/matplotlib-2ws4n99_ because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
/usr/local/lib/python3.6/dist-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.5) or chardet (3.0.4) doesn't match a supported version!
  RequestsDependencyWarning)
Using TensorFlow backend.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/inference.py:51: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

2022-12-14 01:42:21,988 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/inference.py:51: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/inference.py:51: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.

2022-12-14 01:42:21,988 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/inference.py:51: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.

Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/inference.py", line 476, in <module>
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/inference.py", line 471, in main
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/inference.py", line 351, in run_experiment
  File "/usr/lib/python3.6/os.py", line 220, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/itri_experiment_peoplesemsegnet_1108/'
-------------------------------------------------------------------
PyCUDA ERROR: The context stack was not empty upon module cleanup.
-------------------------------------------------------------------
A context was still active when the context stack was being
cleaned up. At this point in our execution, CUDA may already
have been deinitialized, so there is no way we can finish
cleanly. The program will be aborted now.
Use Context.pop() to avoid this problem.
-------------------------------------------------------------------

Please modify the command to set the correct explicit paths of
export/trt.fp32.tlt.peoplesemsegnet.engine
and
itri_experiment_peoplesemsegnet_1108

I modified my command to "unet inference --gpu_index=1 -e /home/justin927/NVIDIA_TLT/cv_samples_vv1.4.1/unet/tao_itri/specs/unet_peoplesemsegnet_vanilla_unet_dynamic.txt -m /home/justin927/NVIDIA_TLT/unet_itri/unet_tao/export/trt.fp32.tlt.peoplesemsegnet.engine -o /home/justin927/NVIDIA_TLT/unet_itri/unet_tao/itri_experiment_peoplesemsegnet_1108/ -k tlt_encode"

And I got a similar error:

Matplotlib created a temporary config/cache directory at /tmp/matplotlib-1n8u5xx3 because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
/usr/local/lib/python3.6/dist-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.5) or chardet (3.0.4) doesn't match a supported version!
  RequestsDependencyWarning)
Using TensorFlow backend.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/inference.py:51: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

2022-12-14 01:52:37,187 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/inference.py:51: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/inference.py:51: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.

2022-12-14 01:52:37,187 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/inference.py:51: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.

Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/inference.py", line 476, in <module>
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/inference.py", line 471, in main
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/unet/scripts/inference.py", line 351, in run_experiment
  File "/usr/lib/python3.6/os.py", line 210, in makedirs
    makedirs(head, mode, exist_ok)
  File "/usr/lib/python3.6/os.py", line 210, in makedirs
    makedirs(head, mode, exist_ok)
  File "/usr/lib/python3.6/os.py", line 210, in makedirs
    makedirs(head, mode, exist_ok)
  [Previous line repeated 1 more time]
  File "/usr/lib/python3.6/os.py", line 220, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/home/justin927'
-------------------------------------------------------------------
PyCUDA ERROR: The context stack was not empty upon module cleanup.
-------------------------------------------------------------------
A context was still active when the context stack was being
cleaned up. At this point in our execution, CUDA may already
have been deinitialized, so there is no way we can finish
cleanly. The program will be aborted now.
Use Context.pop() to avoid this problem.
-------------------------------------------------------------------

Please share and check your ~/.tao_mounts.json file.
It is the path-mapping file for the TAO container.
The command needs to use the paths inside the TAO container, not the host paths.
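For reference, ~/.tao_mounts.json maps host directories ("source") to container paths ("destination"). A minimal sketch, using host paths from this thread as placeholders; the actual entries must match your own setup:

```json
{
    "Mounts": [
        {
            "source": "/home/justin927/NVIDIA_TLT/unet_itri/unet_tao",
            "destination": "/workspace/tao-experiments/unet_tao"
        },
        {
            "source": "/home/justin927/NVIDIA_TLT/cv_samples_vv1.4.1/unet/tao_itri/specs",
            "destination": "/workspace/tao-experiments/unet/specs"
        }
    ]
}
```

Inside the container, the command should then use the "destination" paths (e.g. /workspace/tao-experiments/...), which is why the /home/justin927 host paths failed with a permission error.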

OK, I successfully ran the command!
I think we don't need to pursue the question about the memory warning message, because I found that the number of my test files was simply too large.
Thanks for your help!!
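As a hedged aside on that workaround: one simple way to handle a test set that is too large for a single run is to split it into smaller chunk folders and run `tao unet inference` on each chunk in turn. The `split_into_chunks` helper below is hypothetical, just a sketch:

```shell
# Hypothetical helper: copy image files from $1 into chunk_0, chunk_1, ...
# subfolders of $2, at most $3 images per chunk, so each chunk folder can
# be pointed at by the inference spec and run separately.
split_into_chunks() {
  src=$1; out=$2; chunk=$3
  i=0; n=0
  mkdir -p "$out/chunk_$i"
  for f in "$src"/*; do
    [ -f "$f" ] || continue
    cp "$f" "$out/chunk_$i/"
    n=$((n + 1))
    if [ "$n" -ge "$chunk" ]; then
      n=0; i=$((i + 1)); mkdir -p "$out/chunk_$i"
    fi
  done
}
```

For example, `split_into_chunks /data/test_images /data/test_chunks 500` (paths hypothetical) would turn an 8,500-image folder into 17 chunks of 500.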


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.