I am using the Faster R-CNN notebook.
!tao info
Configuration of the TAO Toolkit Instance
dockers: ['nvidia/tao/tao-toolkit-tf', 'nvidia/tao/tao-toolkit-pyt', 'nvidia/tao/tao-toolkit-lm']
format_version: 2.0
toolkit_version: 3.22.05
published_date: 05/25/2022
!pip3 install nvidia-pyindex
!pip3 install nvidia-tao==0.1.24
%env CLI=ngccli_cat_linux.zip
!mkdir -p $PROJECT_DIR/ngccli
!rm -rf $PROJECT_DIR/ngccli/*
!wget “NVIDIA NGC” -P $PROJECT_DIR/ngccli
!unzip -u "$PROJECT_DIR/ngccli/$CLI" -d $PROJECT_DIR/ngccli/
!rm $PROJECT_DIR/ngccli/*.zip
os.environ["PATH"]="{}/ngccli:{}".format(os.getenv("PROJECT_DIR", ""), os.getenv("PATH", ""))
The error mentioned in my last post is caused by the following command:
!tao faster_rcnn train --gpu_index $GPU_INDEX -e $SPECS_DIR/default_spec_resnet18.txt
2022-06-21 17:44:40,146 [INFO] root: Registry: ['nvcr.io']
2022-06-21 17:44:40,210 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.22.05-tf1.15.5-py3
2022-06-21 17:44:40,223 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/pryntec/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
/usr/local/lib/python3.6/dist-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.5) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)
Using TensorFlow backend.
2022-06-21 15:44:45,803 [INFO] iva.faster_rcnn.spec_loader.spec_loader: Loading experiment spec at /workspace/tao-experiments/faster_rcnn/specs/default_spec_resnet18.txt.
2022-06-21 15:44:46,100 [INFO] iva.common.logging.logging: Log file already exists at /workspace/tao-experiments/exp/tlt/status.json
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/faster_rcnn/scripts/train.py:69: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
2022-06-21 15:44:46,102 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/faster_rcnn/scripts/train.py:69: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/faster_rcnn/scripts/train.py:78: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
2022-06-21 15:44:46,102 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/faster_rcnn/scripts/train.py:78: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.
2022-06-21 15:44:48,334 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/faster_rcnn/utils/utils.py:407: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.
2022-06-21 15:44:48,335 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/faster_rcnn/utils/utils.py:407: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.
2022-06-21 15:44:48,460 [INFO] root: Sampling mode of the dataloader was set to user_defined.
2022-06-21 15:44:48,462 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Serial augmentation enabled = False
2022-06-21 15:44:48,462 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Pseudo sharding enabled = False
2022-06-21 15:44:48,462 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Max Image Dimensions (all sources): (0, 0)
2022-06-21 15:44:48,462 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: number of cpus: 4, io threads: 8, compute threads: 4, buffered batches: 4
2022-06-21 15:44:48,462 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: total dataset size 392, number of sources: 1, batch size per gpu: 8, steps: 49
WARNING:tensorflow:Entity <bound method DriveNetTFRecordsParser.__call__ of <iva.detectnet_v2.dataloader.drivenet_dataloader.DriveNetTFRecordsParser object at 0x7f605dbae2b0>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: Unable to locate the source code of <bound method DriveNetTFRecordsParser.__call__ of <iva.detectnet_v2.dataloader.drivenet_dataloader.DriveNetTFRecordsParser object at 0x7f605dbae2b0>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2022-06-21 15:44:48,543 [WARNING] tensorflow: Entity <bound method DriveNetTFRecordsParser.__call__ of <iva.detectnet_v2.dataloader.drivenet_dataloader.DriveNetTFRecordsParser object at 0x7f605dbae2b0>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: Unable to locate the source code of <bound method DriveNetTFRecordsParser.__call__ of <iva.detectnet_v2.dataloader.drivenet_dataloader.DriveNetTFRecordsParser object at 0x7f605dbae2b0>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2022-06-21 15:44:48,560 [INFO] iva.detectnet_v2.dataloader.default_dataloader: Bounding box coordinates were detected in the input specification! Bboxes will be automatically converted to polygon coordinates.
2022-06-21 15:44:48,790 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: shuffle: True - shard 0 of 1
2022-06-21 15:44:48,796 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: sampling 1 datasets with weights:
2022-06-21 15:44:48,796 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: source: 0 weight: 1.000000
WARNING:tensorflow:Entity <bound method Processor.__call__ of <modulus.blocks.data_loaders.multi_source_loader.processors.asset_loader.AssetLoader object at 0x7f603407bdd8>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: Unable to locate the source code of <bound method Processor.__call__ of <modulus.blocks.data_loaders.multi_source_loader.processors.asset_loader.AssetLoader object at 0x7f603407bdd8>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
2022-06-21 15:44:48,808 [WARNING] tensorflow: Entity <bound method Processor.__call__ of <modulus.blocks.data_loaders.multi_source_loader.processors.asset_loader.AssetLoader object at 0x7f603407bdd8>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: Unable to locate the source code of <bound method Processor.__call__ of <modulus.blocks.data_loaders.multi_source_loader.processors.asset_loader.AssetLoader object at 0x7f603407bdd8>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/core/build_wheel.runfiles/ai_infra/moduluspy/modulus/blocks/data_loaders/multi_source_loader/types/images2d_reference.py:427: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.
2022-06-21 15:44:48,831 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/core/build_wheel.runfiles/ai_infra/moduluspy/modulus/blocks/data_loaders/multi_source_loader/types/images2d_reference.py:427: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/faster_rcnn/data_loader/inputs_loader.py:230: The name tf.debugging.assert_less_equal is deprecated. Please use tf.compat.v1.debugging.assert_less_equal instead.
2022-06-21 15:44:49,506 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/faster_rcnn/data_loader/inputs_loader.py:230: The name tf.debugging.assert_less_equal is deprecated. Please use tf.compat.v1.debugging.assert_less_equal instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.
2022-06-21 15:44:49,874 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/faster_rcnn/layers/utils.py:76: The name tf.debugging.assert_less is deprecated. Please use tf.compat.v1.debugging.assert_less instead.
2022-06-21 15:44:50,314 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/faster_rcnn/layers/utils.py:76: The name tf.debugging.assert_less is deprecated. Please use tf.compat.v1.debugging.assert_less instead.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/faster_rcnn/layers/utils.py:389: The name tf.random_shuffle is deprecated. Please use tf.random.shuffle instead.
2022-06-21 15:44:51,512 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/faster_rcnn/layers/utils.py:389: The name tf.random_shuffle is deprecated. Please use tf.random.shuffle instead.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/faster_rcnn/layers/utils.py:262: The name tf.log is deprecated. Please use tf.math.log instead.
2022-06-21 15:44:51,770 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/faster_rcnn/layers/utils.py:262: The name tf.log is deprecated. Please use tf.math.log instead.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/faster_rcnn/layers/CropAndResize.py:79: The name tf.floor_div is deprecated. Please use tf.math.floordiv instead.
2022-06-21 15:44:54,570 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/faster_rcnn/layers/CropAndResize.py:79: The name tf.floor_div is deprecated. Please use tf.math.floordiv instead.
WARNING:tensorflow:From /opt/nvidia/third_party/keras/tensorflow_backend.py:187: The name tf.nn.avg_pool is deprecated. Please use tf.nn.avg_pool2d instead.
2022-06-21 15:44:54,709 [WARNING] tensorflow: From /opt/nvidia/third_party/keras/tensorflow_backend.py:187: The name tf.nn.avg_pool is deprecated. Please use tf.nn.avg_pool2d instead.
2022-06-21 15:44:54,740 [INFO] __main__: Loading pretrained weights from /workspace/tao-experiments/faster_rcnn/resnet_18.hdf5
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.
2022-06-21 15:44:54,740 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:190: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.
2022-06-21 15:44:54,740 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:190: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:199: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.
2022-06-21 15:44:54,740 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:199: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:206: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.
2022-06-21 15:44:55,414 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:206: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.
2022-06-21 15:44:56.122142: F ./tensorflow/core/kernels/random_op_gpu.h:225] Non-OK-status: GpuLaunchKernel(FillPhiloxRandomKernelLaunch, num_blocks, block_size, 0, d.stream(), gen, data, size, dist) status: Internal: the provided PTX was compiled with an unsupported toolchain.
[c6e7e29f8671:00056] *** Process received signal ***
[c6e7e29f8671:00056] Signal: Aborted (6)
[c6e7e29f8671:00056] Signal code: (-6)
[c6e7e29f8671:00056] [ 0] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x46210)[0x7f610ba96210]
[c6e7e29f8671:00056] [ 1] /usr/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f610ba9618b]
[c6e7e29f8671:00056] [ 2] /usr/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f610ba75859]
[c6e7e29f8671:00056] [ 3] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so(+0xc1b1788)[0x7f60af824788]
[c6e7e29f8671:00056] [ 4] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so(ZN10tensorflow7functor16FillPhiloxRandomIN5Eigen9GpuDeviceENS_6random19UniformDistributionINS4_12PhiloxRandomEfEEEclEPNS_15OpKernelContextERKS3_S6_PfxS7+0x209)[0x7f60ac4ba529]
[c6e7e29f8671:00056] [ 5] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so(+0x8e4401e)[0x7f60ac4b701e]
[c6e7e29f8671:00056] [ 6] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/…/libtensorflow_framework.so.1(_ZN10tensorflow13BaseGPUDevice7ComputeEPNS_8OpKernelEPNS_15OpKernelContextE+0x3d3)[0x7f60a2973333]
[c6e7e29f8671:00056] [ 7] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/…/libtensorflow_framework.so.1(+0x11500b7)[0x7f60a29d10b7]
[c6e7e29f8671:00056] [ 8] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/…/libtensorflow_framework.so.1(+0x1150723)[0x7f60a29d1723]
[c6e7e29f8671:00056] [ 9] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/…/libtensorflow_framework.so.1(_ZN5Eigen15ThreadPoolTemplIN10tensorflow6thread16EigenEnvironmentEE10WorkerLoopEi+0x28d)[0x7f60a2a86e6d]
[c6e7e29f8671:00056] [10] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/…/libtensorflow_framework.so.1(_ZNSt17_Function_handlerIFvvEZN10tensorflow6thread16EigenEnvironment12CreateThreadESt8functionIS0_EEUlvE_E9_M_invokeERKSt9_Any_data+0x4c)[0x7f60a2a8397c]
[c6e7e29f8671:00056] [11] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6de4)[0x7f610adb9de4]
[c6e7e29f8671:00056] [12] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x9609)[0x7f610ba36609]
[c6e7e29f8671:00056] [13] /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f610bb72293]
[c6e7e29f8671:00056] *** End of error message ***
2022-06-21 17:44:56,683 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
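
The fatal line in the log above ("the provided PTX was compiled with an unsupported toolchain") is often associated with a host NVIDIA driver that is older than the CUDA version used inside the container, so the host driver version seems worth checking. The following is only a sketch under that assumption, not a confirmed diagnosis. It assumes `nvidia-smi` is available on the host; the helper names `parse_driver_info` and `query_driver_info` are hypothetical and not part of TAO.

```python
import shutil
import subprocess


def parse_driver_info(csv_text):
    """Parse 'driver_version, name' CSV rows (no header) into tuples."""
    rows = []
    for line in csv_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first comma only; GPU names may contain spaces.
        version, _, name = line.partition(",")
        rows.append((version.strip(), name.strip()))
    return rows


def query_driver_info():
    """Return [(driver_version, gpu_name), ...], or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version,name", "--format=csv,noheader"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode != 0:
        return []
    return parse_driver_info(result.stdout)


if __name__ == "__main__":
    for version, name in query_driver_info():
        print("GPU: {}  driver: {}".format(name, version))
```

The reported driver version can then be compared with the minimum driver listed in the TAO Toolkit release notes for this container version.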
!ngc registry model list nvidia/tao/pretrained_object_detection*
/usr/bin/sh: 1: ngc: not found
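
The `ngc: not found` message suggests that the shell could not resolve `ngc` on `PATH`. A minimal sketch for checking this from a notebook cell, assuming the same `PROJECT_DIR` environment variable and `ngccli` directory as in the setup cell above. `find_ngc` is a hypothetical helper name, and the idea that the archive may have unpacked the binary into a subdirectory is an assumption to verify, not something the log confirms.

```python
import os
import shutil


def find_ngc(project_dir):
    """Return a path to an executable named 'ngc', or None.

    Checks PATH first, then walks <project_dir>/ngccli in case the archive
    unpacked the binary into a subdirectory (an assumption, not confirmed).
    """
    on_path = shutil.which("ngc")
    if on_path:
        return on_path
    root = os.path.join(project_dir, "ngccli")
    for dirpath, _dirnames, filenames in os.walk(root):
        if "ngc" in filenames:
            candidate = os.path.join(dirpath, "ngc")
            if os.access(candidate, os.X_OK):
                return candidate
    return None


ngc_path = find_ngc(os.getenv("PROJECT_DIR", ""))
if ngc_path is None:
    print("ngc not found on PATH or under $PROJECT_DIR/ngccli")
else:
    # Prepend the containing directory so that `!ngc ...` cells can resolve it.
    os.environ["PATH"] = "{}:{}".format(os.path.dirname(ngc_path), os.getenv("PATH", ""))
    print("using ngc at", ngc_path)
```

If nothing is found, the download and unzip steps in the setup cell probably did not produce the binary, and their output is the next thing to inspect.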