pcp@pcp-All-Series:~$ sudo su
[sudo] password for pcp:
root@pcp-All-Series:/home/pcp# nvidia-smi
Mon Aug 12 10:43:32 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.40       Driver Version: 430.40       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:06:00.0  On |                  N/A |
|  0%   49C    P0    37W / 250W |    450MiB /  7979MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1125      G   /usr/lib/xorg/Xorg                            40MiB |
|    0      1156      G   /usr/bin/gnome-shell                          50MiB |
|    0      1347      G   /usr/lib/xorg/Xorg                           240MiB |
|    0      1474      G   /usr/bin/gnome-shell                         117MiB |
+-----------------------------------------------------------------------------+
root@pcp-All-Series:/home/pcp# source activate tensorflow
(tensorflow) root@pcp-All-Series:/home/pcp# cd ~/benchmarks/scripts/tf_cnn_benchmarks/
(tensorflow) root@pcp-All-Series:~/benchmarks/scripts/tf_cnn_benchmarks# python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50 --variable_update=independent --local_parameter_device=gpu
2019-08-12 10:44:07.329277: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-08-12 10:44:07.363553: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3598265000 Hz
2019-08-12 10:44:07.364970: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55a856b7f0f0 executing computations on platform Host. Devices:
2019-08-12 10:44:07.365046: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): ,
2019-08-12 10:44:07.554376: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce RTX 2080 SUPER major: 7 minor: 5 memoryClockRate(GHz): 1.875
pciBusID: 0000:06:00.0
totalMemory: 7.79GiB freeMemory: 7.23GiB
2019-08-12 10:44:07.554412: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-08-12 10:44:07.555615: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-08-12 10:44:07.555630: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-08-12 10:44:07.555639: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-08-12 10:44:07.555724: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7029 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 SUPER, pci bus id: 0000:06:00.0, compute capability: 7.5)
2019-08-12 10:44:07.557749: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55a855d5a0b0 executing computations on platform CUDA. Devices:
2019-08-12 10:44:07.557766: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce RTX 2080 SUPER, Compute Capability 7.5
TensorFlow:  1.13
Model:       resnet50
Dataset:     imagenet (synthetic)
Mode:        training
SingleSess:  False
Batch size:  64 global
             64 per device
Num batches: 100
Num epochs:  0.00
Devices:     ['/gpu:0']
NUMA bind:   False
Data format: NCHW
Optimizer:   sgd
Variables:   independent
==========
Generating training model
W0812 10:44:07.564668 140464772257600 deprecation.py:323] From /root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
W0812 10:44:07.577384 140464772257600 deprecation.py:323] From /root/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:129: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.conv2d instead.
W0812 10:44:07.605700 140464772257600 deprecation.py:323] From /root/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:261: max_pooling2d (from tensorflow.python.layers.pooling) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.max_pooling2d instead.
W0812 10:44:09.051364 140464772257600 deprecation.py:323] From /root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/losses/losses_impl.py:209: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
W0812 10:44:09.157744 140464772257600 deprecation.py:323] From /root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Initializing graph
W0812 10:44:09.881873 140464772257600 deprecation.py:323] From /root/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2252: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2019-08-12 10:44:10.167405: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-08-12 10:44:10.167470: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-08-12 10:44:10.167478: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-08-12 10:44:10.167485: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-08-12 10:44:10.167557: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7029 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 SUPER, pci bus id: 0000:06:00.0, compute capability: 7.5)
I0812 10:44:11.651131 140464772257600 session_manager.py:491] Running local_init_op.
I0812 10:44:11.697105 140464772257600 session_manager.py:493] Done running local_init_op.
Running warm up
2019-08-12 10:44:12.584643: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-08-12 10:44:13.869459: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-08-12 10:44:13.888221: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
I0812 10:44:13.902160 140464772257600 coordinator.py:224] Error reported to Coordinator: , Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
    [[node tower_0/v0/cg/conv0/conv2d/Conv2D (defined at /root/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:129) ]]
    [[node average_loss/Mean (defined at /root/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2915) ]]

Caused by op 'tower_0/v0/cg/conv0/conv2d/Conv2D', defined at:
  File "tf_cnn_benchmarks.py", line 72, in
    app.run(main) # Raises error on invalid flags, unlike tf.app.run()
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "tf_cnn_benchmarks.py", line 68, in main
    bench.run()
  File "/root/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 1865, in run
    return self._benchmark_train()
  File "/root/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 2061, in _benchmark_train
    build_result = self._build_graph()
  File "/root/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 2095, in _build_graph
    (input_producer_op, enqueue_ops, fetches) = self._build_model()
  File "/root/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 2807, in _build_model
    gpu_compute_stage_ops, gpu_grad_stage_ops)
  File "/root/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 3324, in add_forward_pass_and_gradients
    outputs = maybe_compile(forward_pass_and_gradients, self.params)
  File "/root/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 3521, in maybe_compile
    return computation()
  File "/root/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 3178, in forward_pass_and_gradients
    input_list, phase_train, nclass)
  File "/root/benchmarks/scripts/tf_cnn_benchmarks/models/model.py", line 285, in build_network
    self.add_inference(network)
  File "/root/benchmarks/scripts/tf_cnn_benchmarks/models/resnet_model.py", line 308, in add_inference
    cnn.conv(64, 7, 7, 2, 2, mode='SAME_RESNET', use_batch_norm=True)
  File "/root/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py", line 204, in conv
    kernel_initializer=kernel_initializer)
  File "/root/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py", line 129, in _conv2d_impl
    use_bias=False)
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/layers/convolutional.py", line 424, in conv2d
    return layer.apply(inputs)
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1227, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 530, in __call__
    outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 554, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/layers/convolutional.py", line 194, in call
    outputs = self._convolution_op(inputs, self.kernel)
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 966, in __call__
    return self.conv_op(inp, filter)
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 591, in __call__
    return self.call(inp, filter)
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 208, in __call__
    name=self.name)
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 1026, in conv2d
    data_format=data_format, dilations=dilations, name=name)
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

UnknownError (see above for traceback): Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
    [[node tower_0/v0/cg/conv0/conv2d/Conv2D (defined at /root/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:129) ]]
    [[node average_loss/Mean (defined at /root/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2915) ]]

Traceback (most recent call last):
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
    [[{{node tower_0/v0/cg/conv0/conv2d/Conv2D}}]]
    [[{{node average_loss/Mean}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tf_cnn_benchmarks.py", line 72, in
    app.run(main) # Raises error on invalid flags, unlike tf.app.run()
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "tf_cnn_benchmarks.py", line 68, in main
    bench.run()
  File "/root/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 1865, in run
    return self._benchmark_train()
  File "/root/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 2070, in _benchmark_train
    return self._benchmark_graph(result_to_benchmark, eval_build_results)
  File "/root/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 2279, in _benchmark_graph
    is_chief, summary_writer, profiler)
  File "/root/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 2414, in benchmark_with_session
    collective_graph_key=collective_graph_key)
  File "/root/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 854, in benchmark_one_step
    results = sess.run(fetches, options=run_options, run_metadata=run_metadata)
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
    [[node tower_0/v0/cg/conv0/conv2d/Conv2D (defined at /root/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:129) ]]
    [[node average_loss/Mean (defined at /root/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2915) ]]

Caused by op 'tower_0/v0/cg/conv0/conv2d/Conv2D', defined at:
  File "tf_cnn_benchmarks.py", line 72, in
    app.run(main) # Raises error on invalid flags, unlike tf.app.run()
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "tf_cnn_benchmarks.py", line 68, in main
    bench.run()
  File "/root/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 1865, in run
    return self._benchmark_train()
  File "/root/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 2061, in _benchmark_train
    build_result = self._build_graph()
  File "/root/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 2095, in _build_graph
    (input_producer_op, enqueue_ops, fetches) = self._build_model()
  File "/root/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 2807, in _build_model
    gpu_compute_stage_ops, gpu_grad_stage_ops)
  File "/root/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 3324, in add_forward_pass_and_gradients
    outputs = maybe_compile(forward_pass_and_gradients, self.params)
  File "/root/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 3521, in maybe_compile
    return computation()
  File "/root/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 3178, in forward_pass_and_gradients
    input_list, phase_train, nclass)
  File "/root/benchmarks/scripts/tf_cnn_benchmarks/models/model.py", line 285, in build_network
    self.add_inference(network)
  File "/root/benchmarks/scripts/tf_cnn_benchmarks/models/resnet_model.py", line 308, in add_inference
    cnn.conv(64, 7, 7, 2, 2, mode='SAME_RESNET', use_batch_norm=True)
  File "/root/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py", line 204, in conv
    kernel_initializer=kernel_initializer)
  File "/root/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py", line 129, in _conv2d_impl
    use_bias=False)
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/layers/convolutional.py", line 424, in conv2d
    return layer.apply(inputs)
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1227, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 530, in __call__
    outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 554, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/keras/layers/convolutional.py", line 194, in call
    outputs = self._convolution_op(inputs, self.kernel)
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 966, in __call__
    return self.conv_op(inp, filter)
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 591, in __call__
    return self.call(inp, filter)
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 208, in __call__
    name=self.name)
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 1026, in conv2d
    data_format=data_format, dilations=dilations, name=name)
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

UnknownError (see above for traceback): Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
    [[node tower_0/v0/cg/conv0/conv2d/Conv2D (defined at /root/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:129) ]]
    [[node average_loss/Mean (defined at /root/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2915) ]]
(tensorflow) root@pcp-All-Series:~/benchmarks/scripts/tf_cnn_benchmarks#