Slow GPU workaround for NHWC error when training

Hi, I'm trying to follow the tutorial for facemask/no-mask TLT for deployment on the new Nano for a review I'm writing, but I've gotten myself into a pickle. I set up an old laptop to boot Ubuntu so I could use the docker container, etc. Its GPU is only a 750M, so apparently the training script switches to the CPU and I get the error about a custom layer needing NHWC. Is there a way I can force it to use the GPU (performance doesn't matter, I can run it for as long as needed), or tweak the data processing to give it what it wants? I'm using DetectNet_v2 with ResNet34, which seems to be the current download, although obviously I could switch back to ResNet18 if needed. Thanks for any thoughts. Command, output, and error below:

root@2b89b661a973:/workspace# tlt-train detectnet_v2 -k tlt_encode \
    -e /home/david/masks/face-mask-detection/tlt_specs/detectnet_v2_train_resnet34_kitti.txt \
    -n resnet34_detector --gpus 1 \
    -r /home/david/masks/models/experiment_dir_unpruned
Using TensorFlow backend.
2020-10-27 20:09:02.855926: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0


The library attempted to open the following supporting CUDA libraries,
but each of them failed. CUDA-aware support is disabled.
libcuda.so.1: cannot open shared object file: No such file or directory
libcuda.dylib: cannot open shared object file: No such file or directory
/usr/lib64/libcuda.so.1: cannot open shared object file: No such file or directory
/usr/lib64/libcuda.dylib: cannot open shared object file: No such file or directory
If you are not interested in CUDA-aware support, then run with
--mca mpi_cuda_support 0 to suppress this message. If you are interested
in CUDA-aware support, then try setting LD_LIBRARY_PATH to the location
of libcuda.so.1 to get passed this issue.

2020-10-27 20:09:06.194523: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library ‘libcuda.so.1’; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64:/usr/local/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-10-27 20:09:06.194562: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303)
2020-10-27 20:09:06.194604: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] no NVIDIA GPU device is present: /dev/nvidia0 does not exist
2020-10-27 20:09:06,195 [INFO] iva.detectnet_v2.scripts.train: Loading experiment spec at /home/david/masks/face-mask-detection/tlt_specs/detectnet_v2_train_resnet34_kitti.txt.
2020-10-27 20:09:06,197 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /home/david/masks/face-mask-detection/tlt_specs/detectnet_v2_train_resnet34_kitti.txt
2020-10-27 20:09:06,367 [INFO] iva.detectnet_v2.scripts.train: Cannot iterate over exactly 598 samples with a batch size of 24; each epoch will therefore take one extra step.


Layer (type) Output Shape Param # Connected to

input_1 (InputLayer) (None, 3, 544, 960) 0


conv1 (Conv2D) (None, 64, 272, 480) 9472 input_1[0][0]


bn_conv1 (BatchNormalization) (None, 64, 272, 480) 256 conv1[0][0]


activation_1 (Activation) (None, 64, 272, 480) 0 bn_conv1[0][0]


block_1a_conv_1 (Conv2D) (None, 64, 136, 240) 36928 activation_1[0][0]


block_1a_bn_1 (BatchNormalizati (None, 64, 136, 240) 256 block_1a_conv_1[0][0]


block_1a_relu_1 (Activation) (None, 64, 136, 240) 0 block_1a_bn_1[0][0]


block_1a_conv_2 (Conv2D) (None, 64, 136, 240) 36928 block_1a_relu_1[0][0]


block_1a_conv_shortcut (Conv2D) (None, 64, 136, 240) 4160 activation_1[0][0]


block_1a_bn_2 (BatchNormalizati (None, 64, 136, 240) 256 block_1a_conv_2[0][0]


block_1a_bn_shortcut (BatchNorm (None, 64, 136, 240) 256 block_1a_conv_shortcut[0][0]


add_1 (Add) (None, 64, 136, 240) 0 block_1a_bn_2[0][0]
block_1a_bn_shortcut[0][0]


block_1a_relu (Activation) (None, 64, 136, 240) 0 add_1[0][0]


block_1b_conv_1 (Conv2D) (None, 64, 136, 240) 36928 block_1a_relu[0][0]


block_1b_bn_1 (BatchNormalizati (None, 64, 136, 240) 256 block_1b_conv_1[0][0]


block_1b_relu_1 (Activation) (None, 64, 136, 240) 0 block_1b_bn_1[0][0]


block_1b_conv_2 (Conv2D) (None, 64, 136, 240) 36928 block_1b_relu_1[0][0]


block_1b_bn_2 (BatchNormalizati (None, 64, 136, 240) 256 block_1b_conv_2[0][0]


add_2 (Add) (None, 64, 136, 240) 0 block_1b_bn_2[0][0]
block_1a_relu[0][0]


block_1b_relu (Activation) (None, 64, 136, 240) 0 add_2[0][0]


block_1c_conv_1 (Conv2D) (None, 64, 136, 240) 36928 block_1b_relu[0][0]


block_1c_bn_1 (BatchNormalizati (None, 64, 136, 240) 256 block_1c_conv_1[0][0]


block_1c_relu_1 (Activation) (None, 64, 136, 240) 0 block_1c_bn_1[0][0]


block_1c_conv_2 (Conv2D) (None, 64, 136, 240) 36928 block_1c_relu_1[0][0]


block_1c_bn_2 (BatchNormalizati (None, 64, 136, 240) 256 block_1c_conv_2[0][0]


add_3 (Add) (None, 64, 136, 240) 0 block_1c_bn_2[0][0]
block_1b_relu[0][0]


block_1c_relu (Activation) (None, 64, 136, 240) 0 add_3[0][0]


block_2a_conv_1 (Conv2D) (None, 128, 68, 120) 73856 block_1c_relu[0][0]


block_2a_bn_1 (BatchNormalizati (None, 128, 68, 120) 512 block_2a_conv_1[0][0]


block_2a_relu_1 (Activation) (None, 128, 68, 120) 0 block_2a_bn_1[0][0]


block_2a_conv_2 (Conv2D) (None, 128, 68, 120) 147584 block_2a_relu_1[0][0]


block_2a_conv_shortcut (Conv2D) (None, 128, 68, 120) 8320 block_1c_relu[0][0]


block_2a_bn_2 (BatchNormalizati (None, 128, 68, 120) 512 block_2a_conv_2[0][0]


block_2a_bn_shortcut (BatchNorm (None, 128, 68, 120) 512 block_2a_conv_shortcut[0][0]


add_4 (Add) (None, 128, 68, 120) 0 block_2a_bn_2[0][0]
block_2a_bn_shortcut[0][0]


block_2a_relu (Activation) (None, 128, 68, 120) 0 add_4[0][0]


block_2b_conv_1 (Conv2D) (None, 128, 68, 120) 147584 block_2a_relu[0][0]


block_2b_bn_1 (BatchNormalizati (None, 128, 68, 120) 512 block_2b_conv_1[0][0]


block_2b_relu_1 (Activation) (None, 128, 68, 120) 0 block_2b_bn_1[0][0]


block_2b_conv_2 (Conv2D) (None, 128, 68, 120) 147584 block_2b_relu_1[0][0]


block_2b_bn_2 (BatchNormalizati (None, 128, 68, 120) 512 block_2b_conv_2[0][0]


add_5 (Add) (None, 128, 68, 120) 0 block_2b_bn_2[0][0]
block_2a_relu[0][0]


block_2b_relu (Activation) (None, 128, 68, 120) 0 add_5[0][0]


block_2c_conv_1 (Conv2D) (None, 128, 68, 120) 147584 block_2b_relu[0][0]


block_2c_bn_1 (BatchNormalizati (None, 128, 68, 120) 512 block_2c_conv_1[0][0]


block_2c_relu_1 (Activation) (None, 128, 68, 120) 0 block_2c_bn_1[0][0]


block_2c_conv_2 (Conv2D) (None, 128, 68, 120) 147584 block_2c_relu_1[0][0]


block_2c_bn_2 (BatchNormalizati (None, 128, 68, 120) 512 block_2c_conv_2[0][0]


add_6 (Add) (None, 128, 68, 120) 0 block_2c_bn_2[0][0]
block_2b_relu[0][0]


block_2c_relu (Activation) (None, 128, 68, 120) 0 add_6[0][0]


block_2d_conv_1 (Conv2D) (None, 128, 68, 120) 147584 block_2c_relu[0][0]


block_2d_bn_1 (BatchNormalizati (None, 128, 68, 120) 512 block_2d_conv_1[0][0]


block_2d_relu_1 (Activation) (None, 128, 68, 120) 0 block_2d_bn_1[0][0]


block_2d_conv_2 (Conv2D) (None, 128, 68, 120) 147584 block_2d_relu_1[0][0]


block_2d_bn_2 (BatchNormalizati (None, 128, 68, 120) 512 block_2d_conv_2[0][0]


add_7 (Add) (None, 128, 68, 120) 0 block_2d_bn_2[0][0]
block_2c_relu[0][0]


block_2d_relu (Activation) (None, 128, 68, 120) 0 add_7[0][0]


block_3a_conv_1 (Conv2D) (None, 256, 34, 60) 295168 block_2d_relu[0][0]


block_3a_bn_1 (BatchNormalizati (None, 256, 34, 60) 1024 block_3a_conv_1[0][0]


block_3a_relu_1 (Activation) (None, 256, 34, 60) 0 block_3a_bn_1[0][0]


block_3a_conv_2 (Conv2D) (None, 256, 34, 60) 590080 block_3a_relu_1[0][0]


block_3a_conv_shortcut (Conv2D) (None, 256, 34, 60) 33024 block_2d_relu[0][0]


block_3a_bn_2 (BatchNormalizati (None, 256, 34, 60) 1024 block_3a_conv_2[0][0]


block_3a_bn_shortcut (BatchNorm (None, 256, 34, 60) 1024 block_3a_conv_shortcut[0][0]


add_8 (Add) (None, 256, 34, 60) 0 block_3a_bn_2[0][0]
block_3a_bn_shortcut[0][0]


block_3a_relu (Activation) (None, 256, 34, 60) 0 add_8[0][0]


block_3b_conv_1 (Conv2D) (None, 256, 34, 60) 590080 block_3a_relu[0][0]


block_3b_bn_1 (BatchNormalizati (None, 256, 34, 60) 1024 block_3b_conv_1[0][0]


block_3b_relu_1 (Activation) (None, 256, 34, 60) 0 block_3b_bn_1[0][0]


block_3b_conv_2 (Conv2D) (None, 256, 34, 60) 590080 block_3b_relu_1[0][0]


block_3b_bn_2 (BatchNormalizati (None, 256, 34, 60) 1024 block_3b_conv_2[0][0]


add_9 (Add) (None, 256, 34, 60) 0 block_3b_bn_2[0][0]
block_3a_relu[0][0]


block_3b_relu (Activation) (None, 256, 34, 60) 0 add_9[0][0]


block_3c_conv_1 (Conv2D) (None, 256, 34, 60) 590080 block_3b_relu[0][0]


block_3c_bn_1 (BatchNormalizati (None, 256, 34, 60) 1024 block_3c_conv_1[0][0]


block_3c_relu_1 (Activation) (None, 256, 34, 60) 0 block_3c_bn_1[0][0]


block_3c_conv_2 (Conv2D) (None, 256, 34, 60) 590080 block_3c_relu_1[0][0]


block_3c_bn_2 (BatchNormalizati (None, 256, 34, 60) 1024 block_3c_conv_2[0][0]


add_10 (Add) (None, 256, 34, 60) 0 block_3c_bn_2[0][0]
block_3b_relu[0][0]


block_3c_relu (Activation) (None, 256, 34, 60) 0 add_10[0][0]


block_3d_conv_1 (Conv2D) (None, 256, 34, 60) 590080 block_3c_relu[0][0]


block_3d_bn_1 (BatchNormalizati (None, 256, 34, 60) 1024 block_3d_conv_1[0][0]


block_3d_relu_1 (Activation) (None, 256, 34, 60) 0 block_3d_bn_1[0][0]


block_3d_conv_2 (Conv2D) (None, 256, 34, 60) 590080 block_3d_relu_1[0][0]


block_3d_bn_2 (BatchNormalizati (None, 256, 34, 60) 1024 block_3d_conv_2[0][0]


add_11 (Add) (None, 256, 34, 60) 0 block_3d_bn_2[0][0]
block_3c_relu[0][0]


block_3d_relu (Activation) (None, 256, 34, 60) 0 add_11[0][0]


block_3e_conv_1 (Conv2D) (None, 256, 34, 60) 590080 block_3d_relu[0][0]


block_3e_bn_1 (BatchNormalizati (None, 256, 34, 60) 1024 block_3e_conv_1[0][0]


block_3e_relu_1 (Activation) (None, 256, 34, 60) 0 block_3e_bn_1[0][0]


block_3e_conv_2 (Conv2D) (None, 256, 34, 60) 590080 block_3e_relu_1[0][0]


block_3e_bn_2 (BatchNormalizati (None, 256, 34, 60) 1024 block_3e_conv_2[0][0]


add_12 (Add) (None, 256, 34, 60) 0 block_3e_bn_2[0][0]
block_3d_relu[0][0]


block_3e_relu (Activation) (None, 256, 34, 60) 0 add_12[0][0]


block_3f_conv_1 (Conv2D) (None, 256, 34, 60) 590080 block_3e_relu[0][0]


block_3f_bn_1 (BatchNormalizati (None, 256, 34, 60) 1024 block_3f_conv_1[0][0]


block_3f_relu_1 (Activation) (None, 256, 34, 60) 0 block_3f_bn_1[0][0]


block_3f_conv_2 (Conv2D) (None, 256, 34, 60) 590080 block_3f_relu_1[0][0]


block_3f_bn_2 (BatchNormalizati (None, 256, 34, 60) 1024 block_3f_conv_2[0][0]


add_13 (Add) (None, 256, 34, 60) 0 block_3f_bn_2[0][0]
block_3e_relu[0][0]


block_3f_relu (Activation) (None, 256, 34, 60) 0 add_13[0][0]


block_4a_conv_1 (Conv2D) (None, 512, 34, 60) 1180160 block_3f_relu[0][0]


block_4a_bn_1 (BatchNormalizati (None, 512, 34, 60) 2048 block_4a_conv_1[0][0]


block_4a_relu_1 (Activation) (None, 512, 34, 60) 0 block_4a_bn_1[0][0]


block_4a_conv_2 (Conv2D) (None, 512, 34, 60) 2359808 block_4a_relu_1[0][0]


block_4a_conv_shortcut (Conv2D) (None, 512, 34, 60) 131584 block_3f_relu[0][0]


block_4a_bn_2 (BatchNormalizati (None, 512, 34, 60) 2048 block_4a_conv_2[0][0]


block_4a_bn_shortcut (BatchNorm (None, 512, 34, 60) 2048 block_4a_conv_shortcut[0][0]


add_14 (Add) (None, 512, 34, 60) 0 block_4a_bn_2[0][0]
block_4a_bn_shortcut[0][0]


block_4a_relu (Activation) (None, 512, 34, 60) 0 add_14[0][0]


block_4b_conv_1 (Conv2D) (None, 512, 34, 60) 2359808 block_4a_relu[0][0]


block_4b_bn_1 (BatchNormalizati (None, 512, 34, 60) 2048 block_4b_conv_1[0][0]


block_4b_relu_1 (Activation) (None, 512, 34, 60) 0 block_4b_bn_1[0][0]


block_4b_conv_2 (Conv2D) (None, 512, 34, 60) 2359808 block_4b_relu_1[0][0]


block_4b_bn_2 (BatchNormalizati (None, 512, 34, 60) 2048 block_4b_conv_2[0][0]


add_15 (Add) (None, 512, 34, 60) 0 block_4b_bn_2[0][0]
block_4a_relu[0][0]


block_4b_relu (Activation) (None, 512, 34, 60) 0 add_15[0][0]


block_4c_conv_1 (Conv2D) (None, 512, 34, 60) 2359808 block_4b_relu[0][0]


block_4c_bn_1 (BatchNormalizati (None, 512, 34, 60) 2048 block_4c_conv_1[0][0]


block_4c_relu_1 (Activation) (None, 512, 34, 60) 0 block_4c_bn_1[0][0]


block_4c_conv_2 (Conv2D) (None, 512, 34, 60) 2359808 block_4c_relu_1[0][0]


block_4c_bn_2 (BatchNormalizati (None, 512, 34, 60) 2048 block_4c_conv_2[0][0]


add_16 (Add) (None, 512, 34, 60) 0 block_4c_bn_2[0][0]
block_4b_relu[0][0]


block_4c_relu (Activation) (None, 512, 34, 60) 0 add_16[0][0]


output_bbox (Conv2D) (None, 8, 34, 60) 4104 block_4c_relu[0][0]


output_cov (Conv2D) (None, 2, 34, 60) 1026 block_4c_relu[0][0]

Total params: 21,319,754
Trainable params: 21,302,602
Non-trainable params: 17,152


2020-10-27 20:09:24,341 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Serial augmentation enabled = False
2020-10-27 20:09:24,342 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Pseudo sharding enabled = False
2020-10-27 20:09:24,342 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Max Image Dimensions (all sources): (0, 0)
2020-10-27 20:09:24,342 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: number of cpus: 8, io threads: 16, compute threads: 8, buffered batches: 4
2020-10-27 20:09:24,342 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: total dataset size 598, number of sources: 1, batch size per gpu: 24, steps: 25
2020-10-27 20:09:24,510 [INFO] iva.detectnet_v2.dataloader.default_dataloader: Bounding box coordinates were detected in the input specification! Bboxes will be automatically converted to polygon coordinates.
2020-10-27 20:09:24,844 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: shuffle: True - shard 0 of 1
2020-10-27 20:09:24,856 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: sampling 1 datasets with weights:
2020-10-27 20:09:24,857 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: source: 0 weight: 1.000000
2020-10-27 20:09:25.254545: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-10-27 20:09:25,524 [INFO] iva.detectnet_v2.scripts.train: Found 598 samples in training set
2020-10-27 20:09:29,885 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Serial augmentation enabled = False
2020-10-27 20:09:29,885 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Pseudo sharding enabled = False
2020-10-27 20:09:29,885 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Max Image Dimensions (all sources): (0, 0)
2020-10-27 20:09:29,885 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: number of cpus: 8, io threads: 16, compute threads: 8, buffered batches: 4
2020-10-27 20:09:29,885 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: total dataset size 149, number of sources: 1, batch size per gpu: 24, steps: 7
2020-10-27 20:09:29,929 [INFO] iva.detectnet_v2.dataloader.default_dataloader: Bounding box coordinates were detected in the input specification! Bboxes will be automatically converted to polygon coordinates.
2020-10-27 20:09:30,247 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: shuffle: False - shard 0 of 1
2020-10-27 20:09:30,256 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: sampling 1 datasets with weights:
2020-10-27 20:09:30,257 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: source: 0 weight: 1.000000
2020-10-27 20:09:30,691 [INFO] iva.detectnet_v2.scripts.train: Found 149 samples in validation set
2020-10-27 20:10:42.066938: E tensorflow/core/common_runtime/executor.cc:648] Executor failed to create kernel. Invalid argument: Conv2DCustomBackpropInputOp only supports NHWC.
[[{{node gradients/resnet34_nopool_bn_detectnet_v2/output_bbox/convolution_grad/Conv2DBackpropInput}}]]
Traceback (most recent call last):
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py”, line 1365, in _do_call
return fn(*args)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py”, line 1350, in _run_fn
target_list, run_metadata)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py”, line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Conv2DCustomBackpropInputOp only supports NHWC.
[[{{node gradients/resnet34_nopool_bn_detectnet_v2/output_bbox/convolution_grad/Conv2DBackpropInput}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “/usr/local/bin/tlt-train-g1”, line 8, in
sys.exit(main())
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/magnet_train.py”, line 55, in main
File “”, line 2, in main
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/utilities/timer.py”, line 46, in wrapped_fn
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 773, in main
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 691, in run_experiment
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 624, in train_gridbox
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 149, in run_training_loop
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py”, line 754, in run
run_metadata=run_metadata)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py”, line 1360, in run
raise six.reraise(*original_exc_info)
File “/usr/local/lib/python3.6/dist-packages/six.py”, line 693, in reraise
raise value
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py”, line 1345, in run
return self._sess.run(*args, **kwargs)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py”, line 1418, in run
run_metadata=run_metadata)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py”, line 1176, in run
return self._sess.run(*args, **kwargs)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py”, line 956, in run
run_metadata_ptr)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py”, line 1180, in _run
feed_dict_tensor, options, run_metadata)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py”, line 1359, in _do_run
run_metadata)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py”, line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Conv2DCustomBackpropInputOp only supports NHWC.
[[node gradients/resnet34_nopool_bn_detectnet_v2/output_bbox/convolution_grad/Conv2DBackpropInput (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]

Original stack trace for ‘gradients/resnet34_nopool_bn_detectnet_v2/output_bbox/convolution_grad/Conv2DBackpropInput’:
File “/usr/local/bin/tlt-train-g1”, line 8, in
sys.exit(main())
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/magnet_train.py”, line 55, in main
File “”, line 2, in main
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/utilities/timer.py”, line 46, in wrapped_fn
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 773, in main
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 691, in run_experiment
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 599, in train_gridbox
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 454, in build_training_graph
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/model/detectnet_model.py”, line 583, in build_training_graph
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/training/train_op_generator.py”, line 59, in get_train_op
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/training/train_op_generator.py”, line 74, in _get_train_op_without_cost_scaling
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/optimizer.py”, line 419, in minimize
grad_loss=grad_loss)
File “/usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py”, line 253, in compute_gradients
gradients = self._optimizer.compute_gradients(*args, **kwargs)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/optimizer.py”, line 537, in compute_gradients
colocate_gradients_with_ops=colocate_gradients_with_ops)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gradients_impl.py”, line 158, in gradients
unconnected_gradients)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gradients_util.py”, line 703, in _GradientsHelper
lambda: grad_fn(op, *out_grads))
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gradients_util.py”, line 362, in _MaybeCompile
return grad_fn() # Exit early
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gradients_util.py”, line 703, in
lambda: grad_fn(op, *out_grads))
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/nn_grad.py”, line 596, in _Conv2DGrad
data_format=data_format),
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gen_nn_ops.py”, line 1407, in conv2d_backprop_input
name=name)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/op_def_library.py”, line 794, in _apply_op_helper
op_def=op_def)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py”, line 507, in new_func
return func(*args, **kwargs)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py”, line 3357, in create_op
attrs, op_def, compute_device)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py”, line 3426, in _create_op_internal
op_def=op_def)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py”, line 1748, in __init__
self._traceback = tf_stack.extract_stack()

…which was originally created as op ‘resnet34_nopool_bn_detectnet_v2/output_bbox/convolution’, defined at:
File “/usr/local/bin/tlt-train-g1”, line 8, in
sys.exit(main())
[elided 6 identical lines from previous traceback]
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py”, line 454, in build_training_graph
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/model/detectnet_model.py”, line 557, in build_training_graph
File “/usr/local/lib/python3.6/dist-packages/keras/engine/base_layer.py”, line 457, in call
output = self.call(inputs, **kwargs)
File “/usr/local/lib/python3.6/dist-packages/keras/engine/network.py”, line 564, in call
output_tensors, _, _ = self.run_internal_graph(inputs, masks)
File “/usr/local/lib/python3.6/dist-packages/keras/engine/network.py”, line 721, in run_internal_graph
layer.call(computed_tensor, **kwargs))
File “/usr/local/lib/python3.6/dist-packages/keras/layers/convolutional.py”, line 171, in call
dilation_rate=self.dilation_rate)
File “/opt/nvidia/third_party/keras/tensorflow_backend.py”, line 102, in conv2d
data_format=tf_data_format,
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/nn_ops.py”, line 921, in convolution
name=name)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/nn_ops.py”, line 1032, in convolution_internal
name=name)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gen_nn_ops.py”, line 1071, in conv2d
data_format=data_format, dilations=dilations, name=name)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/op_def_library.py”, line 794, in _apply_op_helper
op_def=op_def)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py”, line 507, in new_func
return func(*args, **kwargs)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py”, line 3357, in create_op
attrs, op_def, compute_device)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py”, line 3426, in _create_op_internal
op_def=op_def)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py”, line 1748, in __init__
self._traceback = tf_stack.extract_stack()

root@2b89b661a973:/workspace#

Hi @digitaldave
Please check that your hardware and software meet the requirements listed in the TLT user guide: https://docs.nvidia.com/metropolis/TLT/tlt-getting-started-guide/text/requirements_and_installation.html#hardware-requirements

For the error, refer to


Morgan – Thanks for your reply, but I don't think it addresses my issue. My system meets all the requirements in the link you sent. And the two other solutions refer to using a faster GPU (which is NOT specifically required in the baseline specs). What I need to do is either force the tutorial to use my GPU (even though its compute capability is below 5.2), or figure out how to modify the scripts to prepare the data so it can be processed by the CPU. Thanks! – David
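For readers unfamiliar with the layout names in the error: the model summary above lists the input as (None, 3, 544, 960), which is channels-first (NCHW), while the CPU kernel named in the traceback only accepts channels-last (NHWC). The following is a minimal sketch of how the two axis orders relate, using plain Python tuples. The helper name `nchw_to_nhwc_shape` is made up for illustration and is not part of TLT. Nothing in this thread shows that TLT exposes a setting for the data format, and reordering the input data alone would not change the layout the model graph was built with.

```python
# Axis order used by each layout: N = batch, C = channels, H = height, W = width.
# To go from NCHW to NHWC, keep N, then take H and W, then move C to the end.
NCHW_TO_NHWC = (0, 2, 3, 1)


def nchw_to_nhwc_shape(shape):
    """Reorder a 4-D NCHW shape tuple into NHWC order (hypothetical helper)."""
    if len(shape) != 4:
        raise ValueError("expected a 4-D shape, got %d dimensions" % len(shape))
    return tuple(shape[axis] for axis in NCHW_TO_NHWC)


# Input layer shape taken from the model summary in this thread (channels first).
input_nchw = (None, 3, 544, 960)
print(nchw_to_nhwc_shape(input_nchw))  # (None, 544, 960, 3)
```

With NumPy arrays the same permutation would be `np.transpose(x, (0, 2, 3, 1))`.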

Could you run the command below first? Your log shows errors loading the CUDA library.
$ nvidia-smi

Morgan – Certainly:

david@nano-dev:~$ nvidia-smi
Tue Oct 27 20:41:08 2020
+------------------------------------------------------+
| NVIDIA-SMI 340.108    Driver Version: 340.108        |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 750M     Off  | 0000:02:00.0     N/A |                  N/A |
| N/A   56C    P0    N/A /  N/A |    831MiB /  2047MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0            Not Supported                                               |
+-----------------------------------------------------------------------------+
david@nano-dev:~$ ^

I am afraid your NVIDIA GPU driver (340.108) does not meet the requirement.

TLT has the following software requirements:

Morgan – Ah, okay. I think there is a newer one available I can try. – David
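To compare an installed driver against whatever minimum the TLT guide lists, the version can be read out of the nvidia-smi banner. This is a minimal sketch. The function names are made up for illustration, and `PLACEHOLDER_MINIMUM` is an arbitrary stand-in value, not the actual TLT requirement (the requirement list is not reproduced in this thread), so substitute the number from the guide.

```python
import re


def parse_driver_version(smi_output):
    """Return the driver version found in nvidia-smi text as a tuple of ints, or None."""
    match = re.search(r"Driver Version:\s*([0-9]+(?:\.[0-9]+)*)", smi_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def driver_is_at_least(smi_output, minimum):
    """True if the parsed driver version is >= minimum (a tuple such as (455, 0))."""
    version = parse_driver_version(smi_output)
    return version is not None and version >= minimum


# Banner line from the nvidia-smi output posted in this thread.
banner = "| NVIDIA-SMI 340.108 Driver Version: 340.108 |"

# ASSUMPTION: placeholder only; look up the real minimum in the TLT requirements page.
PLACEHOLDER_MINIMUM = (455, 0)

print(parse_driver_version(banner))                     # (340, 108)
print(driver_is_at_least(banner, PLACEHOLDER_MINIMUM))  # False
```

Comparing tuples of integers avoids the pitfalls of comparing version strings directly, where "340.108" would sort after "1000.0".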