CLI update

Hello,

Recently, a problem appeared in the NVIDIA CLI.
When starting a training run, I get the message “Stopping container”, although the same notebook (Faster R-CNN) worked fine in the past.

Some posts on the forum explain that there is a problem in a recent nvidia CLI update.
I still have the problem, and the proposed workarounds are not working for me.

My question is: do you have any idea when the problem will be solved? (Part of the notebook updates the CLI, so I suppose I will get the corrected version once it is available.)

Thank you for your help

The workarounds should be working. Please double check.

Thank you for your help.

I am not sure I followed the right procedure. I did the following:

  • update docker_handler.py as requested. Note that the name of the folder is “tlt” and not “tao” in the path “lib/python3.8/site-packages/tlt/components/docker_handler”

  • run the notebook as usual

export PATH=$PATH:/home/denis/.local/bin
export WORKON_HOME=~/Envs
export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3
source $HOME/.local/bin/virtualenvwrapper.sh
workon launcher
jupyter notebook --ip 0.0.0.0 --port 8888 --allow-root

  • update the tao_mounts.json generation in the notebook

drive_map = {
    "Mounts": [
        # Mapping the data directory
        {
            "source": os.environ["LOCAL_PROJECT_DIR"],
            "destination": "/workspace/tao-experiments"
        },
        # Mapping the specs directory.
        {
            "source": os.environ["LOCAL_SPECS_DIR"],
            "destination": os.environ["SPECS_DIR"]
        },
    ],
    "DockerOptions": [
        {
            "user": "{}:{}".format(os.getuid(), os.getgid())
        },
        {
            "entrypoint": ""
        },
    ],
    # set gpu index for tao-converter
    "Envs": [
        {"variable": "CUDA_VISIBLE_DEVICES", "value": os.getenv("GPU_INDEX")},
    ]
}

  • run training from the notebook

I get the following error:

2022-06-21 14:06:58,118 [INFO] root: Registry: ['nvcr.io']
2022-06-21 14:06:58,178 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.4-py3
Traceback (most recent call last):
  File "/home/denis/Envs/launcher/bin/tao", line 8, in <module>
    sys.exit(main())
  File "/home/denis/Envs/launcher/lib/python3.8/site-packages/tlt/entrypoint/entrypoint.py", line 113, in main
    local_instance.launch_command(
  File "/home/denis/Envs/launcher/lib/python3.8/site-packages/tlt/components/instance_handler/local_instance.py", line 319, in launch_command
    docker_handler.run_container(command)
  File "/home/denis/Envs/launcher/lib/python3.8/site-packages/tlt/components/docker_handler/docker_handler.py", line 289, in run_container
    self.start_container(volumes, env_variables, docker_options)
  File "/home/denis/Envs/launcher/lib/python3.8/site-packages/tlt/components/docker_handler/docker_handler.py", line 248, in start_container
    docker_args = self.get_docker_option_args(docker_options)
  File "/home/denis/Envs/launcher/lib/python3.8/site-packages/tlt/components/docker_handler/docker_handler.py", line 228, in get_docker_option_args
    for key, value in docker_options.items():
AttributeError: 'list' object has no attribute 'items'

Thank you for your help.

I suggest modifying ~/.tao_mounts.json directly in a terminal.
Please set “DockerOptions” correctly, as described in workaround 2.
Pay attention to the placement of the braces “{ }”, etc.
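
For readers hitting the same AttributeError: the traceback above shows the launcher calling docker_options.items(), so “DockerOptions” has to be a single JSON object (a dict) rather than a list of one-key objects. Below is a minimal sketch of the mounts generation under that assumption. The helper names build_drive_map and write_drive_map and the example paths are hypothetical, and whether the “entrypoint” key is needed at all depends on the launcher version and the workaround being applied.

```python
import json
import os


def build_drive_map(project_dir, local_specs_dir, specs_dir, gpu_index="0"):
    """Return a mounts configuration with DockerOptions as ONE dict."""
    return {
        "Mounts": [
            # Mapping the data directory
            {"source": project_dir, "destination": "/workspace/tao-experiments"},
            # Mapping the specs directory
            {"source": local_specs_dir, "destination": specs_dir},
        ],
        # A single dict (not a list), so that docker_options.items() works.
        "DockerOptions": {
            "user": "{}:{}".format(os.getuid(), os.getgid()),
            "entrypoint": "",
        },
        "Envs": [
            {"variable": "CUDA_VISIBLE_DEVICES", "value": gpu_index},
        ],
    }


def write_drive_map(drive_map, path):
    """Serialize the configuration as JSON to the given path."""
    with open(path, "w") as f:
        json.dump(drive_map, f, indent=4)


# Hypothetical example values; in the notebook these come from os.environ.
drive_map = build_drive_map(
    "/home/user/tao-experiments",
    "/home/user/specs",
    "/workspace/tao-experiments/faster_rcnn/specs",
)
```

To apply it, one would call write_drive_map(drive_map, os.path.expanduser("~/.tao_mounts.json")) and then rerun the training cell.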

Also, the new version of the wheel has already been released to PyPI:
nvidia-tao==0.1.24
You can install it now.
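
After installing, it can help to confirm which launcher version the notebook kernel actually sees, since a notebook may run in a different environment than the terminal. A small sketch using the standard library; the helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints None if the package is not installed in the current environment.
print("nvidia-tao:", installed_version("nvidia-tao"))
```

If this prints something other than 0.1.24 after the install cell has run, the kernel is probably using another Python environment.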

Thank you for your help.

I updated the NVIDIA CLI by running the following in the notebook:

# Skip this step if you have already installed the TAO launcher.

!pip3 install nvidia-pyindex
!pip3 install nvidia-tao==0.1.24

and I get the following error at the training stage:

2022-06-21 15:44:56.122142: F ./tensorflow/core/kernels/random_op_gpu.h:225] Non-OK-status: GpuLaunchKernel(FillPhiloxRandomKernelLaunch, num_blocks, block_size, 0, d.stream(), gen, data, size, dist) status: Internal: the provided PTX was compiled with an unsupported toolchain.
[c6e7e29f8671:00056] *** Process received signal ***
[c6e7e29f8671:00056] Signal: Aborted (6)
[c6e7e29f8671:00056] Signal code: (-6)
[c6e7e29f8671:00056] [ 0] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x46210)[0x7f610ba96210]
[c6e7e29f8671:00056] [ 1] /usr/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f610ba9618b]
[c6e7e29f8671:00056] [ 2] /usr/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f610ba75859]
[c6e7e29f8671:00056] [ 3] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so(+0xc1b1788)[0x7f60af824788]
[c6e7e29f8671:00056] [ 4] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so(ZN10tensorflow7functor16FillPhiloxRandomIN5Eigen9GpuDeviceENS_6random19UniformDistributionINS4_12PhiloxRandomEfEEEclEPNS_15OpKernelContextERKS3_S6_PfxS7+0x209)[0x7f60ac4ba529]
[c6e7e29f8671:00056] [ 5] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so(+0x8e4401e)[0x7f60ac4b701e]
[c6e7e29f8671:00056] [ 6] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/…/libtensorflow_framework.so.1(_ZN10tensorflow13BaseGPUDevice7ComputeEPNS_8OpKernelEPNS_15OpKernelContextE+0x3d3)[0x7f60a2973333]
[c6e7e29f8671:00056] [ 7] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/…/libtensorflow_framework.so.1(+0x11500b7)[0x7f60a29d10b7]
[c6e7e29f8671:00056] [ 8] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/…/libtensorflow_framework.so.1(+0x1150723)[0x7f60a29d1723]
[c6e7e29f8671:00056] [ 9] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/…/libtensorflow_framework.so.1(_ZN5Eigen15ThreadPoolTemplIN10tensorflow6thread16EigenEnvironmentEE10WorkerLoopEi+0x28d)[0x7f60a2a86e6d]
[c6e7e29f8671:00056] [10] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/…/libtensorflow_framework.so.1(_ZNSt17_Function_handlerIFvvEZN10tensorflow6thread16EigenEnvironment12CreateThreadESt8functionIS0_EEUlvE_E9_M_invokeERKSt9_Any_data+0x4c)[0x7f60a2a8397c]
[c6e7e29f8671:00056] [11] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6de4)[0x7f610adb9de4]
[c6e7e29f8671:00056] [12] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x9609)[0x7f610ba36609]
[c6e7e29f8671:00056] [13] /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f610bb72293]
[c6e7e29f8671:00056] *** End of error message ***

2022-06-21 17:44:56,683 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Any idea?

Thanks

Could you please share full command and full log?

I use the Faster R-CNN notebook.

!tao info

Configuration of the TAO Toolkit Instance
dockers: ['nvidia/tao/tao-toolkit-tf', 'nvidia/tao/tao-toolkit-pyt', 'nvidia/tao/tao-toolkit-lm']
format_version: 2.0
toolkit_version: 3.22.05
published_date: 05/25/2022

!pip3 install nvidia-pyindex
!pip3 install nvidia-tao==0.1.24

%env CLI=ngccli_cat_linux.zip
!mkdir -p $PROJECT_DIR/ngccli
!rm -rf $PROJECT_DIR/ngccli/*
!wget "NVIDIA NGC" -P $PROJECT_DIR/ngccli
!unzip -u "$PROJECT_DIR/ngccli/$CLI" -d $PROJECT_DIR/ngccli/
!rm $PROJECT_DIR/ngccli/*.zip
os.environ["PATH"]="{}/ngccli:{}".format(os.getenv("PROJECT_DIR", ""), os.getenv("PATH", ""))

The error mentioned in the last post is caused by the following command:
!tao faster_rcnn train --gpu_index $GPU_INDEX -e $SPECS_DIR/default_spec_resnet18.txt

2022-06-21 17:44:40,146 [INFO] root: Registry: ['nvcr.io']
2022-06-21 17:44:40,210 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.22.05-tf1.15.5-py3
2022-06-21 17:44:40,223 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/pryntec/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
/usr/local/lib/python3.6/dist-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.5) or chardet (3.0.4) doesn't match a supported version!
  RequestsDependencyWarning)
Using TensorFlow backend.
2022-06-21 15:44:45,803 [INFO] iva.faster_rcnn.spec_loader.spec_loader: Loading experiment spec at /workspace/tao-experiments/faster_rcnn/specs/default_spec_resnet18.txt.
2022-06-21 15:44:46,100 [INFO] iva.common.logging.logging: Log file already exists at /workspace/tao-experiments/exp/tlt/status.json
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/faster_rcnn/scripts/train.py:69: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
2022-06-21 15:44:46,102 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/faster_rcnn/scripts/train.py:69: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/faster_rcnn/scripts/train.py:78: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
2022-06-21 15:44:46,102 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/faster_rcnn/scripts/train.py:78: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.
2022-06-21 15:44:48,334 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/faster_rcnn/utils/utils.py:407: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.
2022-06-21 15:44:48,335 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/faster_rcnn/utils/utils.py:407: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.
2022-06-21 15:44:48,460 [INFO] root: Sampling mode of the dataloader was set to user_defined.
2022-06-21 15:44:48,462 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Serial augmentation enabled = False
2022-06-21 15:44:48,462 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Pseudo sharding enabled = False
2022-06-21 15:44:48,462 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Max Image Dimensions (all sources): (0, 0)
2022-06-21 15:44:48,462 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: number of cpus: 4, io threads: 8, compute threads: 4, buffered batches: 4
2022-06-21 15:44:48,462 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: total dataset size 392, number of sources: 1, batch size per gpu: 8, steps: 49
WARNING:tensorflow:Entity <bound method DriveNetTFRecordsParser.call of <iva.detectnet_v2.dataloader.drivenet_dataloader.DriveNetTFRecordsParser object at 0x7f605dbae2b0>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: Unable to locate the source code of <bound method DriveNetTFRecordsParser.call of <iva.detectnet_v2.dataloader.drivenet_dataloader.DriveNetTFRecordsParser object at 0x7f605dbae2b0>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2022-06-21 15:44:48,543 [WARNING] tensorflow: Entity <bound method DriveNetTFRecordsParser.call of <iva.detectnet_v2.dataloader.drivenet_dataloader.DriveNetTFRecordsParser object at 0x7f605dbae2b0>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: Unable to locate the source code of <bound method DriveNetTFRecordsParser.call of <iva.detectnet_v2.dataloader.drivenet_dataloader.DriveNetTFRecordsParser object at 0x7f605dbae2b0>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2022-06-21 15:44:48,560 [INFO] iva.detectnet_v2.dataloader.default_dataloader: Bounding box coordinates were detected in the input specification! Bboxes will be automatically converted to polygon coordinates.
2022-06-21 15:44:48,790 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: shuffle: True - shard 0 of 1
2022-06-21 15:44:48,796 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: sampling 1 datasets with weights:
2022-06-21 15:44:48,796 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: source: 0 weight: 1.000000
WARNING:tensorflow:Entity <bound method Processor.call of <modulus.blocks.data_loaders.multi_source_loader.processors.asset_loader.AssetLoader object at 0x7f603407bdd8>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: Unable to locate the source code of <bound method Processor.call of <modulus.blocks.data_loaders.multi_source_loader.processors.asset_loader.AssetLoader object at 0x7f603407bdd8>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2022-06-21 15:44:48,808 [WARNING] tensorflow: Entity <bound method Processor.call of <modulus.blocks.data_loaders.multi_source_loader.processors.asset_loader.AssetLoader object at 0x7f603407bdd8>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: Unable to locate the source code of <bound method Processor.call of <modulus.blocks.data_loaders.multi_source_loader.processors.asset_loader.AssetLoader object at 0x7f603407bdd8>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/core/build_wheel.runfiles/ai_infra/moduluspy/modulus/blocks/data_loaders/multi_source_loader/types/images2d_reference.py:427: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.
2022-06-21 15:44:48,831 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/core/build_wheel.runfiles/ai_infra/moduluspy/modulus/blocks/data_loaders/multi_source_loader/types/images2d_reference.py:427: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/faster_rcnn/data_loader/inputs_loader.py:230: The name tf.debugging.assert_less_equal is deprecated. Please use tf.compat.v1.debugging.assert_less_equal instead.
2022-06-21 15:44:49,506 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/faster_rcnn/data_loader/inputs_loader.py:230: The name tf.debugging.assert_less_equal is deprecated. Please use tf.compat.v1.debugging.assert_less_equal instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.
2022-06-21 15:44:49,874 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/faster_rcnn/layers/utils.py:76: The name tf.debugging.assert_less is deprecated. Please use tf.compat.v1.debugging.assert_less instead.
2022-06-21 15:44:50,314 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/faster_rcnn/layers/utils.py:76: The name tf.debugging.assert_less is deprecated. Please use tf.compat.v1.debugging.assert_less instead.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/faster_rcnn/layers/utils.py:389: The name tf.random_shuffle is deprecated. Please use tf.random.shuffle instead.
2022-06-21 15:44:51,512 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/faster_rcnn/layers/utils.py:389: The name tf.random_shuffle is deprecated. Please use tf.random.shuffle instead.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/faster_rcnn/layers/utils.py:262: The name tf.log is deprecated. Please use tf.math.log instead.
2022-06-21 15:44:51,770 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/faster_rcnn/layers/utils.py:262: The name tf.log is deprecated. Please use tf.math.log instead.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/faster_rcnn/layers/CropAndResize.py:79: The name tf.floor_div is deprecated. Please use tf.math.floordiv instead.
2022-06-21 15:44:54,570 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/faster_rcnn/layers/CropAndResize.py:79: The name tf.floor_div is deprecated. Please use tf.math.floordiv instead.
WARNING:tensorflow:From /opt/nvidia/third_party/keras/tensorflow_backend.py:187: The name tf.nn.avg_pool is deprecated. Please use tf.nn.avg_pool2d instead.
2022-06-21 15:44:54,709 [WARNING] tensorflow: From /opt/nvidia/third_party/keras/tensorflow_backend.py:187: The name tf.nn.avg_pool is deprecated. Please use tf.nn.avg_pool2d instead.
2022-06-21 15:44:54,740 [INFO] __main__: Loading pretrained weights from /workspace/tao-experiments/faster_rcnn/resnet_18.hdf5
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.
2022-06-21 15:44:54,740 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:190: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.
2022-06-21 15:44:54,740 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:190: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:199: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.
2022-06-21 15:44:54,740 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:199: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:206: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.
2022-06-21 15:44:55,414 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:206: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.
2022-06-21 15:44:56.122142: F ./tensorflow/core/kernels/random_op_gpu.h:225] Non-OK-status: GpuLaunchKernel(FillPhiloxRandomKernelLaunch, num_blocks, block_size, 0, d.stream(), gen, data, size, dist) status: Internal: the provided PTX was compiled with an unsupported toolchain.
[c6e7e29f8671:00056] *** Process received signal ***
[c6e7e29f8671:00056] Signal: Aborted (6)
[c6e7e29f8671:00056] Signal code: (-6)
[c6e7e29f8671:00056] [ 0] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x46210)[0x7f610ba96210]
[c6e7e29f8671:00056] [ 1] /usr/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f610ba9618b]
[c6e7e29f8671:00056] [ 2] /usr/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f610ba75859]
[c6e7e29f8671:00056] [ 3] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so(+0xc1b1788)[0x7f60af824788]
[c6e7e29f8671:00056] [ 4] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so(ZN10tensorflow7functor16FillPhiloxRandomIN5Eigen9GpuDeviceENS_6random19UniformDistributionINS4_12PhiloxRandomEfEEEclEPNS_15OpKernelContextERKS3_S6_PfxS7+0x209)[0x7f60ac4ba529]
[c6e7e29f8671:00056] [ 5] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so(+0x8e4401e)[0x7f60ac4b701e]
[c6e7e29f8671:00056] [ 6] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/…/libtensorflow_framework.so.1(_ZN10tensorflow13BaseGPUDevice7ComputeEPNS_8OpKernelEPNS_15OpKernelContextE+0x3d3)[0x7f60a2973333]
[c6e7e29f8671:00056] [ 7] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/…/libtensorflow_framework.so.1(+0x11500b7)[0x7f60a29d10b7]
[c6e7e29f8671:00056] [ 8] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/…/libtensorflow_framework.so.1(+0x1150723)[0x7f60a29d1723]
[c6e7e29f8671:00056] [ 9] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/…/libtensorflow_framework.so.1(_ZN5Eigen15ThreadPoolTemplIN10tensorflow6thread16EigenEnvironmentEE10WorkerLoopEi+0x28d)[0x7f60a2a86e6d]
[c6e7e29f8671:00056] [10] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/…/libtensorflow_framework.so.1(_ZNSt17_Function_handlerIFvvEZN10tensorflow6thread16EigenEnvironment12CreateThreadESt8functionIS0_EEUlvE_E9_M_invokeERKSt9_Any_data+0x4c)[0x7f60a2a8397c]
[c6e7e29f8671:00056] [11] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6de4)[0x7f610adb9de4]
[c6e7e29f8671:00056] [12] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x9609)[0x7f610ba36609]
[c6e7e29f8671:00056] [13] /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f610bb72293]
[c6e7e29f8671:00056] *** End of error message ***
2022-06-21 17:44:56,683 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

!ngc registry model list nvidia/tao/pretrained_object_detection*

/usr/bin/sh: 1: ngc: not found
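
The “ngc: not found” message means the shell could not find an ngc executable on its PATH, which would happen if the download or unzip step above did not produce one where expected. A quick way to check from a notebook cell, as a sketch: the helper name find_cli is hypothetical, and the ngccli directory is assumed to sit under PROJECT_DIR as in the cell above.

```python
import os
import shutil


def find_cli(name, extra_dirs=()):
    """Return the full path of an executable, searching extra_dirs and then PATH."""
    search_path = os.pathsep.join(list(extra_dirs) + [os.environ.get("PATH", "")])
    return shutil.which(name, path=search_path)


# Directory the notebook is assumed to unzip the NGC CLI into.
ngc_dir = os.path.join(os.getenv("PROJECT_DIR", ""), "ngccli")
print("ngc found at:", find_cli("ngc", extra_dirs=[ngc_dir]))
```

A result of None indicates that the executable is in neither location, so the archive contents are worth inspecting before calling ngc.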

Hi,
Could you please open a terminal and run the command below?
$ tao info --verbose

Configuration of the TAO Toolkit Instance

dockers:
    nvidia/tao/tao-toolkit-tf:
        v3.22.05-tf1.15.5-py3:
            docker_registry: nvcr.io
            tasks:
                1. augment
                2. bpnet
                3. classification
                4. dssd
                5. faster_rcnn
                6. emotionnet
                7. efficientdet
                8. fpenet
                9. gazenet
                10. gesturenet
                11. heartratenet
                12. lprnet
                13. mask_rcnn
                14. multitask_classification
                15. retinanet
                16. ssd
                17. unet
                18. yolo_v3
                19. yolo_v4
                20. yolo_v4_tiny
                21. converter
        v3.22.05-tf1.15.4-py3:
            docker_registry: nvcr.io
            tasks:
                1. detectnet_v2
    nvidia/tao/tao-toolkit-pyt:
        v3.22.05-py3:
            docker_registry: nvcr.io
            tasks:
                1. speech_to_text
                2. speech_to_text_citrinet
                3. speech_to_text_conformer
                4. action_recognition
                5. pointpillars
                6. pose_classification
                7. spectro_gen
                8. vocoder
        v3.21.11-py3:
            docker_registry: nvcr.io
            tasks:
                1. text_classification
                2. question_answering
                3. token_classification
                4. intent_slot_classification
                5. punctuation_and_capitalization
    nvidia/tao/tao-toolkit-lm:
        v3.22.05-py3:
            docker_registry: nvcr.io
            tasks:
                1. n_gram
format_version: 2.0
toolkit_version: 3.22.05
published_date: 05/25/2022

Could you share the result of
$ nvidia-smi
and
$ nvidia-smi -L

Wed Jun 22 17:42:37 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.129.06   Driver Version: 470.129.06   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce …    Off  | 00000000:01:00.0  On |                  N/A |
|  0%   35C    P8    10W / 170W |    265MiB / 12051MiB |      4%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       896      G   /usr/lib/xorg/Xorg                118MiB |
|    0   N/A  N/A      1207      G   /usr/bin/gnome-shell               26MiB |
|    0   N/A  N/A      2844      G   /usr/lib/firefox/firefox          118MiB |
+-----------------------------------------------------------------------------+

GPU 0: NVIDIA GeForce RTX 3060 (UUID: GPU-27eb2639-0090-059e-074c-d4ecde9054a4)

Please update the NVIDIA driver to version 510.
Refer to the TAO Toolkit Quick Start Guide in the TAO Toolkit 3.22.05 documentation.
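
The “PTX was compiled with an unsupported toolchain” failure in the training log is consistent with the host driver (470.129.06 in the nvidia-smi output above) being older than what the 3.22.05 containers expect. Here is a small sketch for comparing a driver version against the 510 minimum mentioned in this reply. The helper names are hypothetical, and the nvidia-smi query only runs if query_driver_version() is called explicitly.

```python
import subprocess

MIN_DRIVER_MAJOR = 510  # minimum suggested in this thread for TAO Toolkit 3.22.05


def driver_major(version):
    """Extract the major number from a driver string such as '470.129.06'."""
    return int(version.strip().split(".")[0])


def driver_is_recent_enough(version, minimum=MIN_DRIVER_MAJOR):
    """Return True if the driver's major version is at least `minimum`."""
    return driver_major(version) >= minimum


def query_driver_version():
    """Ask nvidia-smi for the driver version reported for the first GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()[0].strip()


print(driver_is_recent_enough("470.129.06"))  # False
print(driver_is_recent_enough("510.47.03"))   # True
```

On a machine with a GPU, driver_is_recent_enough(query_driver_version()) gives the same check against the live driver.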

I will try this tomorrow, I will keep you updated. Thank you

The update of the driver solved the problem, thank you for your help :)
