Using TensorFlow backend. 2021-02-15 17:37:51.401519: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0 WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them. 2021-02-15 17:37:53.737693: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1 2021-02-15 17:37:53.743033: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2021-02-15 17:37:53.743356: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties: name: GeForce RTX 2060 major: 7 minor: 5 memoryClockRate(GHz): 1.2 pciBusID: 0000:01:00.0 2021-02-15 17:37:53.743373: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0 2021-02-15 17:37:53.743435: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.11 2021-02-15 17:37:53.744256: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10 2021-02-15 17:37:53.744434: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10 2021-02-15 17:37:53.746402: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10 2021-02-15 17:37:53.746862: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.11 2021-02-15 17:37:53.746895: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8 2021-02-15 17:37:53.746954: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2021-02-15 17:37:53.747286: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2021-02-15 17:37:53.747651: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0 2021-02-15 17:37:53.747675: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0 2021-02-15 17:37:54.124044: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix: 2021-02-15 17:37:54.124077: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186] 0 2021-02-15 17:37:54.124085: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0: N 2021-02-15 17:37:54.124250: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2021-02-15 17:37:54.124585: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2021-02-15 17:37:54.124874: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2021-02-15 17:37:54.125185: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4394 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5) 2021-02-15 17:37:54,125 [INFO] iva.detectnet_v2.scripts.train: Loading experiment spec at tlt_specs/detectnet_v2_train_resnet18_kitti_WORK.txt. 2021-02-15 17:37:54,127 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from tlt_specs/detectnet_v2_train_resnet18_kitti_WORK.txt __________________________________________________________________________________________________ Layer (type) Output Shape Param # Connected to ================================================================================================== input_1 (InputLayer) (None, 3, 544, 960) 0 __________________________________________________________________________________________________ conv1 (Conv2D) (None, 64, 272, 480) 9472 input_1[0][0] __________________________________________________________________________________________________ bn_conv1 (BatchNormalization) (None, 64, 272, 480) 256 conv1[0][0] __________________________________________________________________________________________________ activation_1 (Activation) (None, 64, 272, 480) 0 bn_conv1[0][0] __________________________________________________________________________________________________ block_1a_conv_1 (Conv2D) (None, 64, 136, 240) 36928 activation_1[0][0] __________________________________________________________________________________________________ block_1a_bn_1 (BatchNormalizati (None, 64, 136, 240) 256 block_1a_conv_1[0][0] __________________________________________________________________________________________________ block_1a_relu_1 (Activation) (None, 64, 136, 240) 0 block_1a_bn_1[0][0] __________________________________________________________________________________________________ block_1a_conv_2 (Conv2D) (None, 64, 136, 240) 36928 block_1a_relu_1[0][0] __________________________________________________________________________________________________ block_1a_conv_shortcut (Conv2D) (None, 64, 136, 240) 4160 activation_1[0][0] __________________________________________________________________________________________________ block_1a_bn_2 (BatchNormalizati (None, 64, 136, 240) 256 block_1a_conv_2[0][0] __________________________________________________________________________________________________ block_1a_bn_shortcut (BatchNorm (None, 64, 136, 240) 256 block_1a_conv_shortcut[0][0] __________________________________________________________________________________________________ add_1 (Add) (None, 64, 136, 240) 0 block_1a_bn_2[0][0] block_1a_bn_shortcut[0][0] __________________________________________________________________________________________________ block_1a_relu (Activation) (None, 64, 136, 240) 0 add_1[0][0] __________________________________________________________________________________________________ block_1b_conv_1 (Conv2D) (None, 64, 136, 240) 36928 block_1a_relu[0][0] __________________________________________________________________________________________________ block_1b_bn_1 (BatchNormalizati (None, 64, 136, 240) 256 block_1b_conv_1[0][0] __________________________________________________________________________________________________ block_1b_relu_1 (Activation) (None, 64, 136, 240) 0 block_1b_bn_1[0][0] __________________________________________________________________________________________________ block_1b_conv_2 (Conv2D) (None, 64, 136, 240) 36928 block_1b_relu_1[0][0] __________________________________________________________________________________________________ block_1b_bn_2 (BatchNormalizati (None, 64, 136, 240) 256 block_1b_conv_2[0][0] __________________________________________________________________________________________________ add_2 (Add) (None, 64, 136, 240) 0 block_1b_bn_2[0][0] block_1a_relu[0][0] __________________________________________________________________________________________________ block_1b_relu (Activation) (None, 64, 136, 240) 0 add_2[0][0] __________________________________________________________________________________________________ block_2a_conv_1 (Conv2D) (None, 128, 68, 120) 73856 block_1b_relu[0][0] __________________________________________________________________________________________________ block_2a_bn_1 (BatchNormalizati (None, 128, 68, 120) 512 block_2a_conv_1[0][0] __________________________________________________________________________________________________ block_2a_relu_1 (Activation) (None, 128, 68, 120) 0 block_2a_bn_1[0][0] __________________________________________________________________________________________________ block_2a_conv_2 (Conv2D) (None, 128, 68, 120) 147584 block_2a_relu_1[0][0] __________________________________________________________________________________________________ block_2a_conv_shortcut (Conv2D) (None, 128, 68, 120) 8320 block_1b_relu[0][0] __________________________________________________________________________________________________ block_2a_bn_2 (BatchNormalizati (None, 128, 68, 120) 512 block_2a_conv_2[0][0] __________________________________________________________________________________________________ block_2a_bn_shortcut (BatchNorm (None, 128, 68, 120) 512 block_2a_conv_shortcut[0][0] __________________________________________________________________________________________________ add_3 (Add) (None, 128, 68, 120) 0 block_2a_bn_2[0][0] block_2a_bn_shortcut[0][0] __________________________________________________________________________________________________ block_2a_relu (Activation) (None, 128, 68, 120) 0 add_3[0][0] __________________________________________________________________________________________________ block_2b_conv_1 (Conv2D) (None, 128, 68, 120) 147584 block_2a_relu[0][0] __________________________________________________________________________________________________ block_2b_bn_1 (BatchNormalizati (None, 128, 68, 120) 512 block_2b_conv_1[0][0] __________________________________________________________________________________________________ block_2b_relu_1 (Activation) (None, 128, 68, 120) 0 block_2b_bn_1[0][0] __________________________________________________________________________________________________ block_2b_conv_2 (Conv2D) (None, 128, 68, 120) 147584 block_2b_relu_1[0][0] __________________________________________________________________________________________________ block_2b_bn_2 (BatchNormalizati (None, 128, 68, 120) 512 block_2b_conv_2[0][0] __________________________________________________________________________________________________ add_4 (Add) (None, 128, 68, 120) 0 block_2b_bn_2[0][0] block_2a_relu[0][0] __________________________________________________________________________________________________ block_2b_relu (Activation) (None, 128, 68, 120) 0 add_4[0][0] __________________________________________________________________________________________________ block_3a_conv_1 (Conv2D) (None, 256, 34, 60) 295168 block_2b_relu[0][0] __________________________________________________________________________________________________ block_3a_bn_1 (BatchNormalizati (None, 256, 34, 60) 1024 block_3a_conv_1[0][0] __________________________________________________________________________________________________ block_3a_relu_1 (Activation) (None, 256, 34, 60) 0 block_3a_bn_1[0][0] __________________________________________________________________________________________________ block_3a_conv_2 (Conv2D) (None, 256, 34, 60) 590080 block_3a_relu_1[0][0] __________________________________________________________________________________________________ block_3a_conv_shortcut (Conv2D) (None, 256, 34, 60) 33024 block_2b_relu[0][0] __________________________________________________________________________________________________ block_3a_bn_2 (BatchNormalizati (None, 256, 34, 60) 1024 block_3a_conv_2[0][0] __________________________________________________________________________________________________ block_3a_bn_shortcut (BatchNorm (None, 256, 34, 60) 1024 block_3a_conv_shortcut[0][0] __________________________________________________________________________________________________ add_5 (Add) (None, 256, 34, 60) 0 block_3a_bn_2[0][0] block_3a_bn_shortcut[0][0] __________________________________________________________________________________________________ block_3a_relu (Activation) (None, 256, 34, 60) 0 add_5[0][0] __________________________________________________________________________________________________ block_3b_conv_1 (Conv2D) (None, 256, 34, 60) 590080 block_3a_relu[0][0] __________________________________________________________________________________________________ block_3b_bn_1 (BatchNormalizati (None, 256, 34, 60) 1024 block_3b_conv_1[0][0] __________________________________________________________________________________________________ block_3b_relu_1 (Activation) (None, 256, 34, 60) 0 block_3b_bn_1[0][0] __________________________________________________________________________________________________ block_3b_conv_2 (Conv2D) (None, 256, 34, 60) 590080 block_3b_relu_1[0][0] __________________________________________________________________________________________________ block_3b_bn_2 (BatchNormalizati (None, 256, 34, 60) 1024 block_3b_conv_2[0][0] __________________________________________________________________________________________________ add_6 (Add) (None, 256, 34, 60) 0 block_3b_bn_2[0][0] block_3a_relu[0][0] __________________________________________________________________________________________________ block_3b_relu (Activation) (None, 256, 34, 60) 0 add_6[0][0] __________________________________________________________________________________________________ block_4a_conv_1 (Conv2D) (None, 512, 34, 60) 1180160 block_3b_relu[0][0] __________________________________________________________________________________________________ block_4a_bn_1 (BatchNormalizati (None, 512, 34, 60) 2048 block_4a_conv_1[0][0] __________________________________________________________________________________________________ block_4a_relu_1 (Activation) (None, 512, 34, 60) 0 block_4a_bn_1[0][0] __________________________________________________________________________________________________ block_4a_conv_2 (Conv2D) (None, 512, 34, 60) 2359808 block_4a_relu_1[0][0] __________________________________________________________________________________________________ block_4a_conv_shortcut (Conv2D) (None, 512, 34, 60) 131584 block_3b_relu[0][0] __________________________________________________________________________________________________ block_4a_bn_2 (BatchNormalizati (None, 512, 34, 60) 2048 block_4a_conv_2[0][0] __________________________________________________________________________________________________ block_4a_bn_shortcut (BatchNorm (None, 512, 34, 60) 2048 block_4a_conv_shortcut[0][0] __________________________________________________________________________________________________ add_7 (Add) (None, 512, 34, 60) 0 block_4a_bn_2[0][0] block_4a_bn_shortcut[0][0] __________________________________________________________________________________________________ block_4a_relu (Activation) (None, 512, 34, 60) 0 add_7[0][0] __________________________________________________________________________________________________ block_4b_conv_1 (Conv2D) (None, 512, 34, 60) 2359808 block_4a_relu[0][0] __________________________________________________________________________________________________ block_4b_bn_1 (BatchNormalizati (None, 512, 34, 60) 2048 block_4b_conv_1[0][0] __________________________________________________________________________________________________ block_4b_relu_1 (Activation) (None, 512, 34, 60) 0 block_4b_bn_1[0][0] __________________________________________________________________________________________________ block_4b_conv_2 (Conv2D) (None, 512, 34, 60) 2359808 block_4b_relu_1[0][0] __________________________________________________________________________________________________ block_4b_bn_2 (BatchNormalizati (None, 512, 34, 60) 2048 block_4b_conv_2[0][0] __________________________________________________________________________________________________ add_8 (Add) (None, 512, 34, 60) 0 block_4b_bn_2[0][0] block_4a_relu[0][0] __________________________________________________________________________________________________ block_4b_relu (Activation) (None, 512, 34, 60) 0 add_8[0][0] __________________________________________________________________________________________________ output_bbox (Conv2D) (None, 12, 34, 60) 6156 block_4b_relu[0][0] __________________________________________________________________________________________________ output_cov (Conv2D) (None, 3, 34, 60) 1539 block_4b_relu[0][0] ================================================================================================== Total params: 11,203,023 Trainable params: 8,408,591 Non-trainable params: 2,794,432 __________________________________________________________________________________________________ 2021-02-15 17:38:02,089 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Serial augmentation enabled = False 2021-02-15 17:38:02,089 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Pseudo sharding enabled = False 2021-02-15 17:38:02,089 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Max Image Dimensions (all sources): (0, 0) 2021-02-15 17:38:02,090 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: number of cpus: 12, io threads: 24, compute threads: 12, buffered batches: 4 2021-02-15 17:38:02,090 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: total dataset size 5705, number of sources: 1, batch size per gpu: 1, steps: 5705 2021-02-15 17:38:02,191 [INFO] iva.detectnet_v2.dataloader.default_dataloader: Bounding box coordinates were detected in the input specification! Bboxes will be automatically converted to polygon coordinates. 2021-02-15 17:38:02.219648: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2021-02-15 17:38:02.220042: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties: name: GeForce RTX 2060 major: 7 minor: 5 memoryClockRate(GHz): 1.2 pciBusID: 0000:01:00.0 2021-02-15 17:38:02.220070: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0 2021-02-15 17:38:02.220104: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.11 2021-02-15 17:38:02.220129: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10 2021-02-15 17:38:02.220149: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10 2021-02-15 17:38:02.220168: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10 2021-02-15 17:38:02.220187: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.11 2021-02-15 17:38:02.220300: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8 2021-02-15 17:38:02.220370: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2021-02-15 17:38:02.220828: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2021-02-15 17:38:02.221137: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0 2021-02-15 17:38:02,426 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: shuffle: True - shard 0 of 1 2021-02-15 17:38:02,432 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: sampling 1 datasets with weights: 2021-02-15 17:38:02,432 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: source: 0 weight: 1.000000 2021-02-15 17:38:02,790 [INFO] iva.detectnet_v2.scripts.train: Found 5705 samples in training set 2021-02-15 17:38:04,221 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Serial augmentation enabled = False 2021-02-15 17:38:04,221 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Pseudo sharding enabled = False 2021-02-15 17:38:04,221 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Max Image Dimensions (all sources): (0, 0) 2021-02-15 17:38:04,221 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: number of cpus: 12, io threads: 24, compute threads: 12, buffered batches: 4 2021-02-15 17:38:04,222 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: total dataset size 1426, number of sources: 1, batch size per gpu: 1, steps: 1426 2021-02-15 17:38:04,248 [INFO] iva.detectnet_v2.dataloader.default_dataloader: Bounding box coordinates were detected in the input specification! Bboxes will be automatically converted to polygon coordinates. 2021-02-15 17:38:04,474 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: shuffle: False - shard 0 of 1 2021-02-15 17:38:04,479 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: sampling 1 datasets with weights: 2021-02-15 17:38:04,479 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: source: 0 weight: 1.000000 2021-02-15 17:38:04,789 [INFO] iva.detectnet_v2.scripts.train: Found 1426 samples in validation set 2021-02-15 17:38:08.505414: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2021-02-15 17:38:08.505888: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties: name: GeForce RTX 2060 major: 7 minor: 5 memoryClockRate(GHz): 1.2 pciBusID: 0000:01:00.0 2021-02-15 17:38:08.505945: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.11.0 2021-02-15 17:38:08.505985: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.11 2021-02-15 17:38:08.506041: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10 2021-02-15 17:38:08.506061: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10 2021-02-15 17:38:08.506078: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10 2021-02-15 17:38:08.506097: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.11 2021-02-15 17:38:08.506113: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8 2021-02-15 17:38:08.506168: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2021-02-15 17:38:08.506448: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2021-02-15 17:38:08.506693: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0 2021-02-15 17:38:08.507578: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix: 2021-02-15 17:38:08.507591: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186] 0 2021-02-15 17:38:08.507598: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0: N 2021-02-15 17:38:08.507669: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2021-02-15 17:38:08.507963: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2021-02-15 17:38:08.508220: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4394 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5) 2021-02-15 17:38:26.619581: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.11 2021-02-15 17:38:27.111446: I tensorflow/core/kernels/cuda_solvers.cc:159] Creating CudaSolver handles for stream 0x19a88160 2021-02-15 17:38:27.111560: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10 2021-02-15 17:38:27.648418: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.11 2021-02-15 17:38:27.649675: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8 2021-02-15 17:38:29.782748: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at conv_grad_filter_ops.cc:1038 : Not found: No algorithm worked! Traceback (most recent call last): File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call return fn(*args) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn target_list, run_metadata) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun run_metadata) tensorflow.python.framework.errors_impl.NotFoundError: No algorithm worked! [[{{node gradients/resnet18_nopool_bn_detectnet_v2/block_4b_conv_2/convolution_grad/Conv2DBackpropFilter}}]] During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/local/bin/tlt-train-g1", line 8, in sys.exit(main()) File "/home/obaba/.cache/dazel/_dazel_obaba/e56ee0dba0ec09ac4333617b53ded644/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/magnet_train.py", line 56, in main File "", line 2, in main File "/home/obaba/.cache/dazel/_dazel_obaba/e56ee0dba0ec09ac4333617b53ded644/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/utilities/timer.py", line 46, in wrapped_fn File "/home/obaba/.cache/dazel/_dazel_obaba/e56ee0dba0ec09ac4333617b53ded644/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py", line 790, in main File "/home/obaba/.cache/dazel/_dazel_obaba/e56ee0dba0ec09ac4333617b53ded644/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py", line 691, in run_experiment File "/home/obaba/.cache/dazel/_dazel_obaba/e56ee0dba0ec09ac4333617b53ded644/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py", line 624, in train_gridbox File "/home/obaba/.cache/dazel/_dazel_obaba/e56ee0dba0ec09ac4333617b53ded644/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py", line 149, in run_training_loop File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 754, in run run_metadata=run_metadata) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1360, in run raise six.reraise(*original_exc_info) File "/usr/local/lib/python3.6/dist-packages/six.py", line 696, in reraise raise value File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1345, in run return self._sess.run(*args, **kwargs) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1418, in run run_metadata=run_metadata) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1176, in run return self._sess.run(*args, **kwargs) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 956, in run run_metadata_ptr) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1180, in _run feed_dict_tensor, options, run_metadata) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run run_metadata) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call raise type(e)(node_def, op, message) tensorflow.python.framework.errors_impl.NotFoundError: No algorithm worked! [[node gradients/resnet18_nopool_bn_detectnet_v2/block_4b_conv_2/convolution_grad/Conv2DBackpropFilter (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]] Original stack trace for 'gradients/resnet18_nopool_bn_detectnet_v2/block_4b_conv_2/convolution_grad/Conv2DBackpropFilter': File "/usr/local/bin/tlt-train-g1", line 8, in sys.exit(main()) File "/home/obaba/.cache/dazel/_dazel_obaba/e56ee0dba0ec09ac4333617b53ded644/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/magnet_train.py", line 56, in main File "", line 2, in main File "/home/obaba/.cache/dazel/_dazel_obaba/e56ee0dba0ec09ac4333617b53ded644/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/utilities/timer.py", line 46, in wrapped_fn File "/home/obaba/.cache/dazel/_dazel_obaba/e56ee0dba0ec09ac4333617b53ded644/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py", line 790, in main File "/home/obaba/.cache/dazel/_dazel_obaba/e56ee0dba0ec09ac4333617b53ded644/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py", line 691, in run_experiment File "/home/obaba/.cache/dazel/_dazel_obaba/e56ee0dba0ec09ac4333617b53ded644/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py", line 599, in train_gridbox File "/home/obaba/.cache/dazel/_dazel_obaba/e56ee0dba0ec09ac4333617b53ded644/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py", line 454, in build_training_graph File "/home/obaba/.cache/dazel/_dazel_obaba/e56ee0dba0ec09ac4333617b53ded644/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/model/detectnet_model.py", line 585, in build_training_graph File "/home/obaba/.cache/dazel/_dazel_obaba/e56ee0dba0ec09ac4333617b53ded644/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/training/train_op_generator.py", line 59, in get_train_op File "/home/obaba/.cache/dazel/_dazel_obaba/e56ee0dba0ec09ac4333617b53ded644/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/training/train_op_generator.py", line 74, in _get_train_op_without_cost_scaling File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/optimizer.py", line 419, in minimize grad_loss=grad_loss) File "/usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py", line 253, in compute_gradients gradients = self._optimizer.compute_gradients(*args, **kwargs) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/optimizer.py", line 537, in compute_gradients colocate_gradients_with_ops=colocate_gradients_with_ops) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gradients_impl.py", line 158, in gradients unconnected_gradients) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gradients_util.py", line 703, in _GradientsHelper lambda: grad_fn(op, *out_grads)) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gradients_util.py", line 362, in _MaybeCompile return grad_fn() # Exit early File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gradients_util.py", line 703, in lambda: grad_fn(op, *out_grads)) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/nn_grad.py", line 606, in _Conv2DGrad data_format=data_format) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gen_nn_ops.py", line 1238, in conv2d_backprop_filter name=name) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper op_def=op_def) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py", line 513, in new_func return func(*args, **kwargs) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op attrs, op_def, compute_device) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal op_def=op_def) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__ self._traceback = tf_stack.extract_stack() ...which was originally created as op 'resnet18_nopool_bn_detectnet_v2/block_4b_conv_2/convolution', defined at: File "/usr/local/bin/tlt-train-g1", line 8, in sys.exit(main()) [elided 6 identical lines from previous traceback] File "/home/obaba/.cache/dazel/_dazel_obaba/e56ee0dba0ec09ac4333617b53ded644/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py", line 454, in build_training_graph File "/home/obaba/.cache/dazel/_dazel_obaba/e56ee0dba0ec09ac4333617b53ded644/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/model/detectnet_model.py", line 559, in build_training_graph File "/usr/local/lib/python3.6/dist-packages/keras/engine/base_layer.py", line 457, in __call__ output = self.call(inputs, **kwargs) File "/usr/local/lib/python3.6/dist-packages/keras/engine/network.py", line 564, in call output_tensors, _, _ = self.run_internal_graph(inputs, masks) File "/usr/local/lib/python3.6/dist-packages/keras/engine/network.py", line 721, in run_internal_graph layer.call(computed_tensor, **kwargs)) File "/usr/local/lib/python3.6/dist-packages/keras/layers/convolutional.py", line 171, in call dilation_rate=self.dilation_rate) File "/opt/nvidia/third_party/keras/tensorflow_backend.py", line 113, in conv2d data_format=tf_data_format, File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/nn_ops.py", line 921, in convolution name=name) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/nn_ops.py", line 1032, in convolution_internal name=name) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gen_nn_ops.py", line 1071, in conv2d data_format=data_format, dilations=dilations, name=name) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper op_def=op_def) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py", line 513, in new_func return func(*args, **kwargs) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op attrs, op_def, compute_device) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal op_def=op_def) File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__ self._traceback = tf_stack.extract_stack()