The error remains the same:
Using TensorFlow backend.
2020-03-12 16:13:47,534 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /workspace/spec_files/train.txt
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2020-03-12 16:13:48,077 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2020-03-12 16:13:48.729739: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-03-12 16:13:48.831875: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-12 16:13:48.832559: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x7664140 executing computations on platform CUDA. Devices:
2020-03-12 16:13:48.832594: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce RTX 2070, Compute Capability 7.5
2020-03-12 16:13:48.858352: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3408000000 Hz
2020-03-12 16:13:48.859118: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x76cce90 executing computations on platform Host. Devices:
2020-03-12 16:13:48.859139: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
2020-03-12 16:13:48.859275: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.86
pciBusID: 0000:01:00.0
totalMemory: 7.76GiB freeMemory: 6.95GiB
2020-03-12 16:13:48.859292: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-03-12 16:13:48.860084: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-12 16:13:48.860097: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2020-03-12 16:13:48.860105: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2020-03-12 16:13:48.860186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6764 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
/usr/local/lib/python2.7/dist-packages/keras/engine/saving.py:292: UserWarning: No training configuration found in save file: the model was *not* compiled. Compile it manually.
warnings.warn('No training configuration found in save file: '
WARNING:tensorflow:From ./detectnet_v2/dataloader/utilities.py:114: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
2020-03-12 16:13:49,968 [WARNING] tensorflow: From ./detectnet_v2/dataloader/utilities.py:114: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
2020-03-12 16:13:56,039 [INFO] /usr/local/lib/python2.7/dist-packages/iva/detectnet_v2/evaluation/build_evaluator.pyc: Found 1122 samples in validation set
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            (None, 3, 384, 1248) 0
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, 64, 192, 624) 9472        input_1[0][0]
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization)   (None, 64, 192, 624) 256         conv1[0][0]
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 64, 192, 624) 0           bn_conv1[0][0]
__________________________________________________________________________________________________
block_1a_conv_1 (Conv2D)        (None, 64, 96, 312)  36928       activation_1[0][0]
__________________________________________________________________________________________________
block_1a_bn_1 (BatchNormalizati (None, 64, 96, 312)  256         block_1a_conv_1[0][0]
__________________________________________________________________________________________________
activation_2 (Activation)       (None, 64, 96, 312)  0           block_1a_bn_1[0][0]
__________________________________________________________________________________________________
block_1a_conv_2 (Conv2D)        (None, 64, 96, 312)  36928       activation_2[0][0]
__________________________________________________________________________________________________
block_1a_conv_shortcut (Conv2D) (None, 64, 96, 312)  4160        activation_1[0][0]
__________________________________________________________________________________________________
block_1a_bn_2 (BatchNormalizati (None, 64, 96, 312)  256         block_1a_conv_2[0][0]
__________________________________________________________________________________________________
block_1a_bn_shortcut (BatchNorm (None, 64, 96, 312)  256         block_1a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_1 (Add)                     (None, 64, 96, 312)  0           block_1a_bn_2[0][0]
                                                                 block_1a_bn_shortcut[0][0]
__________________________________________________________________________________________________
activation_3 (Activation)       (None, 64, 96, 312)  0           add_1[0][0]
__________________________________________________________________________________________________
block_2a_conv_1 (Conv2D)        (None, 128, 48, 156) 73856       activation_3[0][0]
__________________________________________________________________________________________________
block_2a_bn_1 (BatchNormalizati (None, 128, 48, 156) 512         block_2a_conv_1[0][0]
__________________________________________________________________________________________________
activation_4 (Activation)       (None, 128, 48, 156) 0           block_2a_bn_1[0][0]
__________________________________________________________________________________________________
block_2a_conv_2 (Conv2D)        (None, 128, 48, 156) 147584      activation_4[0][0]
__________________________________________________________________________________________________
block_2a_conv_shortcut (Conv2D) (None, 128, 48, 156) 8320        activation_3[0][0]
__________________________________________________________________________________________________
block_2a_bn_2 (BatchNormalizati (None, 128, 48, 156) 512         block_2a_conv_2[0][0]
__________________________________________________________________________________________________
block_2a_bn_shortcut (BatchNorm (None, 128, 48, 156) 512         block_2a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_2 (Add)                     (None, 128, 48, 156) 0           block_2a_bn_2[0][0]
                                                                 block_2a_bn_shortcut[0][0]
__________________________________________________________________________________________________
activation_5 (Activation)       (None, 128, 48, 156) 0           add_2[0][0]
__________________________________________________________________________________________________
block_3a_conv_1 (Conv2D)        (None, 256, 24, 78)  295168      activation_5[0][0]
__________________________________________________________________________________________________
block_3a_bn_1 (BatchNormalizati (None, 256, 24, 78)  1024        block_3a_conv_1[0][0]
__________________________________________________________________________________________________
activation_6 (Activation)       (None, 256, 24, 78)  0           block_3a_bn_1[0][0]
__________________________________________________________________________________________________
block_3a_conv_2 (Conv2D)        (None, 256, 24, 78)  590080      activation_6[0][0]
__________________________________________________________________________________________________
block_3a_conv_shortcut (Conv2D) (None, 256, 24, 78)  33024       activation_5[0][0]
__________________________________________________________________________________________________
block_3a_bn_2 (BatchNormalizati (None, 256, 24, 78)  1024        block_3a_conv_2[0][0]
__________________________________________________________________________________________________
block_3a_bn_shortcut (BatchNorm (None, 256, 24, 78)  1024        block_3a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_3 (Add)                     (None, 256, 24, 78)  0           block_3a_bn_2[0][0]
                                                                 block_3a_bn_shortcut[0][0]
__________________________________________________________________________________________________
activation_7 (Activation)       (None, 256, 24, 78)  0           add_3[0][0]
__________________________________________________________________________________________________
block_4a_conv_1 (Conv2D)        (None, 512, 24, 78)  1180160     activation_7[0][0]
__________________________________________________________________________________________________
block_4a_bn_1 (BatchNormalizati (None, 512, 24, 78)  2048        block_4a_conv_1[0][0]
__________________________________________________________________________________________________
activation_8 (Activation)       (None, 512, 24, 78)  0           block_4a_bn_1[0][0]
__________________________________________________________________________________________________
block_4a_conv_2 (Conv2D)        (None, 512, 24, 78)  2359808     activation_8[0][0]
__________________________________________________________________________________________________
block_4a_conv_shortcut (Conv2D) (None, 512, 24, 78)  131584      activation_7[0][0]
__________________________________________________________________________________________________
block_4a_bn_2 (BatchNormalizati (None, 512, 24, 78)  2048        block_4a_conv_2[0][0]
__________________________________________________________________________________________________
block_4a_bn_shortcut (BatchNorm (None, 512, 24, 78)  2048        block_4a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_4 (Add)                     (None, 512, 24, 78)  0           block_4a_bn_2[0][0]
                                                                 block_4a_bn_shortcut[0][0]
__________________________________________________________________________________________________
activation_9 (Activation)       (None, 512, 24, 78)  0           add_4[0][0]
__________________________________________________________________________________________________
output_bbox (Conv2D)            (None, 12, 24, 78)   6156        activation_9[0][0]
__________________________________________________________________________________________________
output_cov (Conv2D)             (None, 3, 24, 78)    1539        activation_9[0][0]
==================================================================================================
Total params: 4,926,543
Trainable params: 4,911,183
Non-trainable params: 15,360
__________________________________________________________________________________________________
INFO:tensorflow:Graph was finalized.
2020-03-12 16:14:03,516 [INFO] tensorflow: Graph was finalized.
2020-03-12 16:14:03.516861: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-03-12 16:14:03.516900: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-12 16:14:03.516909: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2020-03-12 16:14:03.516916: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2020-03-12 16:14:03.516994: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6764 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
INFO:tensorflow:Running local_init_op.
2020-03-12 16:14:04,576 [INFO] tensorflow: Running local_init_op.
INFO:tensorflow:Done running local_init_op.
2020-03-12 16:14:04,847 [INFO] tensorflow: Done running local_init_op.
2020-03-12 16:14:06,510 [INFO] iva.detectnet_v2.evaluation.evaluation: step 0 / 71, 0.00s/step
2020-03-12 16:14:10.638651: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2020-03-12 16:14:11.514059: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-03-12 16:14:11.540502: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
  File "/usr/local/bin/tlt-evaluate", line 10, in <module>
    sys.exit(main())
  File "./common/magnet_evaluate.py", line 38, in main
  File "</usr/local/lib/python2.7/dist-packages/decorator.pyc:decorator-gen-2>", line 2, in main
  File "./detectnet_v2/utilities/timer.py", line 46, in wrapped_fn
  File "./detectnet_v2/scripts/evaluate.py", line 126, in main
  File "./detectnet_v2/evaluation/evaluation.py", line 156, in evaluate
  File "./detectnet_v2/evaluation/evaluation.py", line 116, in _get_validation_iterator
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node resnet10_nopool_bn_detectnet_v2/conv1/convolution (defined at /opt/nvidia/third_party/keras/tensorflow_backend.py:93) ]]
[[node strided_slice_355 (defined at ./detectnet_v2/model/utilities.py:53) ]]
Caused by op u'resnet10_nopool_bn_detectnet_v2/conv1/convolution', defined at:
  File "/usr/local/bin/tlt-evaluate", line 10, in <module>
    sys.exit(main())
  File "./common/magnet_evaluate.py", line 38, in main
  File "</usr/local/lib/python2.7/dist-packages/decorator.pyc:decorator-gen-2>", line 2, in main
  File "./detectnet_v2/utilities/timer.py", line 46, in wrapped_fn
  File "./detectnet_v2/scripts/evaluate.py", line 119, in main
  File "./detectnet_v2/evaluation/build_evaluator.py", line 124, in build_evaluator_for_trained_gridbox
  File "./detectnet_v2/model/utilities.py", line 26, in _fn_wrapper
  File "./detectnet_v2/model/detectnet_model.py", line 617, in build_validation_graph
  File "./detectnet_v2/model/utilities.py", line 26, in _fn_wrapper
  File "./detectnet_v2/model/detectnet_model.py", line 577, in build_inference_graph
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/base_layer.py", line 457, in __call__
    output = self.call(inputs, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/network.py", line 564, in call
    output_tensors, _, _ = self.run_internal_graph(inputs, masks)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/network.py", line 721, in run_internal_graph
    layer.call(computed_tensor, **kwargs))
  File "/usr/local/lib/python2.7/dist-packages/keras/layers/convolutional.py", line 171, in call
    dilation_rate=self.dilation_rate)
  File "/opt/nvidia/third_party/keras/tensorflow_backend.py", line 93, in conv2d
    data_format=tf_data_format)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 851, in convolution
    return op(input, filter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 966, in __call__
    return self.conv_op(inp, filter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 591, in __call__
    return self.call(inp, filter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 208, in __call__
    name=self.name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 1026, in conv2d
    data_format=data_format, dilations=dilations, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()
UnknownError (see above for traceback): Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node resnet10_nopool_bn_detectnet_v2/conv1/convolution (defined at /opt/nvidia/third_party/keras/tensorflow_backend.py:93) ]]
[[node strided_slice_355 (defined at ./detectnet_v2/model/utilities.py:53) ]]