I am running NVIDIA Modulus (bare-metal installation in a conda environment) on an NVIDIA A100-PCIE-40GB GPU.
When I run the ldc_2d.py example, it fails at the start of training with this error: failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
I have no idea how to fix this.
Here is the full output:
[s.1915438@scs2042 ldc]$ python ldc_2d.py
/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/controller.py:8: UserWarning: horovod was not imported. This will make multi-gpu runs impossible
warnings.warn("horovod was not imported. This will make multi-gpu runs impossible")
WARNING:tensorflow:From /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/optimizer.py:353: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.
WARNING:tensorflow:From /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/optimizer.py:361: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.
CONFIGS: FullyConnectedArch, /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/architecture/fully_connected.py
activation_fn: swish
layer_size: 512
nr_layers: 6
skip_connections: False
weight_norm: True
adaptive_activations: False
CONFIGS: ExponentialDecayLR, /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/learning_rate.py
start_lr: 0.001
end_lr: 0.0
decay_steps: 4000
decay_rate: 0.95
CONFIGS: AdamOptimizer, /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/optimizer.py
beta1: 0.9
beta2: 0.999
epsilon: 1e-08
amp: False
WARNING:tensorflow:From /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/arch.py:36: The name tf.make_template is deprecated. Please use tf.compat.v1.make_template instead.
CONFIGS: LDCSolver, ldc_2d.py
network_dir: ./network_checkpoint_ldc_2d
initialize_network_dir:
added_config_dir:
rec_results: True
rec_results_cpu: False
rec_results_freq: 1000
max_steps: 400000
save_filetypes: vtk,np
xla: False
inner_norm: 2
outer_norm: 2
save_network_freq: 1000
print_stats_freq: 100
tf_summary_freq: 500
optimizer_params_index: None
initialize_network_params: None
seq_train_domain: [<class '__main__.LDCTrain'>]
config: {'config': ModulusConfig(activation_fn='swish', adaptive_activations=False, added_config_dir='', amp=False, beta1=0.9, beta2=0.999, decay_rate=0.95, decay_steps=4000, end_lr=0.0, epsilon=1e-08, initialize_network_dir='', inner_norm=2, layer_size=512, max_steps=400000, network_dir='./network_checkpoint_ldc_2d', nr_layers=6, outer_norm=2, rec_results=True, rec_results_cpu=False, rec_results_freq=1000, run_mode='solve', save_filetypes='vtk,np', skip_connections=False, start_lr=0.001, weight_norm=True, xla=False)}
arch: <modulus.architecture.fully_connected.FullyConnectedArch object at 0x7f93e0020ef0>
lr: <modulus.learning_rate.ExponentialDecayLR object at 0x7f93e1005e10>
optimizer: <modulus.optimizer.AdamOptimizer object at 0x7f93e1005cf8>
equations: [<modulus.node.Node object at 0x7f93e1015160>, <modulus.node.Node object at 0x7f93defb9390>, <modulus.node.Node object at 0x7f93e0020358>]
nets: [<modulus.node.Node object at 0x7f93df01b128>]
diff_nodes: []
WARNING:tensorflow:From /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/solver.py:224: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
WARNING:tensorflow:From /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/solver.py:236: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
2022-04-08 09:26:08.773805: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2022-04-08 09:26:08.848041: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: NVIDIA A100-PCIE-40GB major: 8 minor: 0 memoryClockRate(GHz): 1.41
pciBusID: 0000:27:00.0
2022-04-08 09:26:08.854994: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2022-04-08 09:26:08.888436: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2022-04-08 09:26:08.912782: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2022-04-08 09:26:08.936875: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2022-04-08 09:26:08.966060: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2022-04-08 09:26:08.989298: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2022-04-08 09:26:09.070889: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2022-04-08 09:26:09.075647: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2022-04-08 09:26:09.085358: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2022-04-08 09:26:09.464066: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2350065000 Hz
2022-04-08 09:26:09.472379: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5579824ebc40 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-04-08 09:26:09.472406: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2022-04-08 09:26:09.708881: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5579825039c0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2022-04-08 09:26:09.708955: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA A100-PCIE-40GB, Compute Capability 8.0
2022-04-08 09:26:09.713053: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: NVIDIA A100-PCIE-40GB major: 8 minor: 0 memoryClockRate(GHz): 1.41
pciBusID: 0000:27:00.0
2022-04-08 09:26:09.713109: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2022-04-08 09:26:09.713140: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2022-04-08 09:26:09.713160: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2022-04-08 09:26:09.713181: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2022-04-08 09:26:09.713200: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2022-04-08 09:26:09.713219: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2022-04-08 09:26:09.713238: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2022-04-08 09:26:09.717462: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2022-04-08 09:26:09.717508: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2022-04-08 09:26:09.720791: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-04-08 09:26:09.720829: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2022-04-08 09:26:09.720853: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2022-04-08 09:26:09.726211: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 37943 MB memory) -> physical GPU (device: 0, name: NVIDIA A100-PCIE-40GB, pci bus id: 0000:27:00.0, compute capability: 8.0)
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
* https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
* https://github.com/tensorflow/addons
* https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.
WARNING:tensorflow:From /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/solver.py:175: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.get_or_create_global_step
WARNING:tensorflow:From /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/variables.py:241: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
UNROLLING GRAPH:
TopWall
WARNING:tensorflow:From /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/tf_utils/layers.py:34: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.
WARNING:tensorflow:From /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/tf_utils/layers.py:34: The name tf.AUTO_REUSE is deprecated. Please use tf.compat.v1.AUTO_REUSE instead.
WARNING:tensorflow:From /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/tf_utils/layers.py:307: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.
NoSlip
Interior
grad calls: 2
calculated: [v__x, u__x, p__x, v__y, u__y, p__y]
grad calls: 2
calculated: [v__y, u__y, u__y__y, v__y__y, v__x, u__x, v__x__x, u__x__x]
WARNING:tensorflow:From /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/variables.py:218: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead.
WARNING:tensorflow:From /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/learning_rate.py:65: The name tf.train.exponential_decay is deprecated. Please use tf.compat.v1.train.exponential_decay instead.
WARNING:tensorflow:From /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/ops/math_grad.py:1375: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
UNROLLING GRAPH:
Val
WARNING:tensorflow:From /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/solver.py:480: The name tf.summary.merge_all is deprecated. Please use tf.compat.v1.summary.merge_all instead.
WARNING:tensorflow:From /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/solver.py:241: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.
WARNING:tensorflow:From /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/solver.py:262: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.
WARNING:tensorflow:From /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/solver.py:262: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.
WARNING:tensorflow:From /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/solver.py:520: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.
Solving for Domain iteration 0
2022-04-08 09:33:55.831349: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2022-04-08 09:36:01.040438: E tensorflow/stream_executor/cuda/cuda_blas.cc:428] failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Blas GEMM launch failed : a.shape=(1000, 2), b.shape=(2, 512), m=1000, n=512, k=2
     [[{{node flow_net/fc0/MatMul}}]]
     [[Sum_7/_41]]
  (1) Internal: Blas GEMM launch failed : a.shape=(1000, 2), b.shape=(2, 512), m=1000, n=512, k=2
     [[{{node flow_net/fc0/MatMul}}]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "ldc_2d.py", line 91, in <module>
    ctr.run()
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/controller.py", line 91, in run
    self.solver.solve()
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/solver.py", line 527, in solve
    train_stats = seq_train_step[domain_index](train_np_var)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/variables.py", line 510, in np_function
    np_outvar_list = sess.run(outvar_placeholders, feed_dict)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Blas GEMM launch failed : a.shape=(1000, 2), b.shape=(2, 512), m=1000, n=512, k=2
     [[node flow_net/fc0/MatMul (defined at /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
     [[Sum_7/_41]]
  (1) Internal: Blas GEMM launch failed : a.shape=(1000, 2), b.shape=(2, 512), m=1000, n=512, k=2
     [[node flow_net/fc0/MatMul (defined at /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'flow_net/fc0/MatMul':
  File "ldc_2d.py", line 91, in <module>
    ctr.run()
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/controller.py", line 91, in run
    self.solver.solve()
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/solver.py", line 389, in solve
    train_pred_domain_outvar = unroll_graph_on_dict(self.nets+self.equations, train_domain_invar, train_true_domain_outvar, diff_nodes=self.diff_nodes)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/graph.py", line 128, in unroll_graph_on_dict
    outvar_dict[key] = unroll_graph(nodes, invar_with_global, req_outvar_names, diff_nodes)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/graph.py", line 67, in unroll_graph
    outvar.update(node.evaluate(input_variables))
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/ops/template.py", line 393, in __call__
    return self._call_func(args, kwargs)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/ops/template.py", line 355, in _call_func
    result = self._func(*args, **kwargs)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/arch.py", line 36, in <lambda>
    network_template = tf.make_template(name, lambda x: self._network_template(x, output_keys=Key.convert_list(outputs)))
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/architecture/fully_connected.py", line 73, in _network_template
    activation_par = activation_par)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/tf_utils/layers.py", line 48, in fc_layer
    outputs = tf.add(tf.matmul(inputs, weights), biases, name=name)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/util/dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/ops/math_ops.py", line 2754, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/ops/gen_math_ops.py", line 6136, in mat_mul
    name=name)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()
[s.1915438@scs2042 ldc]$
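One thing the log does show is which versions are involved: TensorFlow loads the CUDA 10.0 libraries (libcudart.so.10.0, libcublas.so.10.0) on a GPU reporting compute capability 8.0. Below is a minimal sketch for pulling those two values out of a saved log and comparing them. The script name check_cuda_log.py and the functions parse_log and check_compat are hypothetical, made up for this example. The lookup table has a single entry and encodes one assumption: that CUDA 11.0 is the first release able to target compute capability 8.0 (Ampere). It only compares version numbers; it does not prove that a mismatch is what triggers the cuBLAS failure.

```python
import re
import sys

# Minimum CUDA (major, minor) needed to target a given GPU compute capability.
# Assumption for this sketch: only the A100's capability (8.0) is listed,
# with CUDA 11.0 as the first release supporting it.
MIN_CUDA_FOR_CAPABILITY = {(8, 0): (11, 0)}


def parse_log(text):
    """Return (cuda_runtime, compute_capability) as (major, minor) tuples, or None."""
    # TensorFlow logs e.g. "Successfully opened dynamic library libcudart.so.10.0".
    cudart = re.search(r"libcudart\.so\.(\d+)\.(\d+)", text)
    # Matches both "Compute Capability 8.0" and "compute capability: 8.0".
    capability = re.search(r"compute capability:? (\d+)\.(\d+)", text, re.IGNORECASE)
    runtime = (int(cudart.group(1)), int(cudart.group(2))) if cudart else None
    cc = (int(capability.group(1)), int(capability.group(2))) if capability else None
    return runtime, cc


def check_compat(text):
    """Compare the CUDA runtime found in the log with the GPU's compute capability."""
    runtime, cc = parse_log(text)
    if runtime is None or cc is None:
        return "could not find the CUDA runtime or compute capability in the log"
    required = MIN_CUDA_FOR_CAPABILITY.get(cc)
    if required is None:
        return "no minimum CUDA version recorded for compute capability %d.%d" % cc
    if runtime < required:
        return ("CUDA runtime %d.%d is older than %d.%d, the first release "
                "supporting compute capability %d.%d" % (runtime + required + cc))
    return "CUDA runtime %d.%d supports compute capability %d.%d" % (runtime + cc)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_cuda_log.py ldc_2d.log
    with open(sys.argv[1]) as log_file:
        print(check_compat(log_file.read()))
```

The log could be captured with python ldc_2d.py 2>&1 | tee ldc_2d.log, since TensorFlow writes these messages to stderr. On the output above, the first matches are libcudart.so.10.0 and "Compute Capability 8.0", so the sketch would report the CUDA 10.0 runtime as older than 11.0.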