Nvidia Modulus: failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED

I am running NVIDIA Modulus from a bare-metal installation in a conda environment.

When I run the ldc_2d.py example, I get the error "failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED", and I have no idea how to fix it.

Here is the full output:

[s.1915438@scs2042 ldc]$ python ldc_2d.py
/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/controller.py:8: UserWarning: horovod was not imported. This will make multi-gpu runs impossible
  warnings.warn("horovod was not imported. This will make multi-gpu runs impossible")
WARNING:tensorflow:From /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/optimizer.py:353: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.

WARNING:tensorflow:From /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/optimizer.py:361: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

CONFIGS: FullyConnectedArch, /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/architecture/fully_connected.py
  activation_fn: swish
  layer_size: 512
  nr_layers: 6
  skip_connections: False
  weight_norm: True
  adaptive_activations: False
CONFIGS: ExponentialDecayLR, /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/learning_rate.py
  start_lr: 0.001
  end_lr: 0.0
  decay_steps: 4000
  decay_rate: 0.95
CONFIGS: AdamOptimizer, /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/optimizer.py
  beta1: 0.9
  beta2: 0.999
  epsilon: 1e-08
  amp: False
WARNING:tensorflow:From /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/arch.py:36: The name tf.make_template is deprecated. Please use tf.compat.v1.make_template instead.

CONFIGS: LDCSolver, ldc_2d.py
  network_dir: ./network_checkpoint_ldc_2d
  initialize_network_dir: 
  added_config_dir: 
  rec_results: True
  rec_results_cpu: False
  rec_results_freq: 1000
  max_steps: 400000
  save_filetypes: vtk,np
  xla: False
  inner_norm: 2
  outer_norm: 2
  save_network_freq: 1000
  print_stats_freq: 100
  tf_summary_freq: 500
  optimizer_params_index: None
  initialize_network_params: None
  seq_train_domain: [<class '__main__.LDCTrain'>]
  config: {'config': ModulusConfig(activation_fn='swish', adaptive_activations=False, added_config_dir='', amp=False, beta1=0.9, beta2=0.999, decay_rate=0.95, decay_steps=4000, end_lr=0.0, epsilon=1e-08, initialize_network_dir='', inner_norm=2, layer_size=512, max_steps=400000, network_dir='./network_checkpoint_ldc_2d', nr_layers=6, outer_norm=2, rec_results=True, rec_results_cpu=False, rec_results_freq=1000, run_mode='solve', save_filetypes='vtk,np', skip_connections=False, start_lr=0.001, weight_norm=True, xla=False)}
  arch: <modulus.architecture.fully_connected.FullyConnectedArch object at 0x7f93e0020ef0>
  lr: <modulus.learning_rate.ExponentialDecayLR object at 0x7f93e1005e10>
  optimizer: <modulus.optimizer.AdamOptimizer object at 0x7f93e1005cf8>
  equations: [<modulus.node.Node object at 0x7f93e1015160>, <modulus.node.Node object at 0x7f93defb9390>, <modulus.node.Node object at 0x7f93e0020358>]
  nets: [<modulus.node.Node object at 0x7f93df01b128>]
  diff_nodes: []
WARNING:tensorflow:From /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/solver.py:224: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/solver.py:236: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2022-04-08 09:26:08.773805: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2022-04-08 09:26:08.848041: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: NVIDIA A100-PCIE-40GB major: 8 minor: 0 memoryClockRate(GHz): 1.41
pciBusID: 0000:27:00.0
2022-04-08 09:26:08.854994: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2022-04-08 09:26:08.888436: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2022-04-08 09:26:08.912782: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2022-04-08 09:26:08.936875: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2022-04-08 09:26:08.966060: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2022-04-08 09:26:08.989298: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2022-04-08 09:26:09.070889: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2022-04-08 09:26:09.075647: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2022-04-08 09:26:09.085358: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2022-04-08 09:26:09.464066: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2350065000 Hz
2022-04-08 09:26:09.472379: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5579824ebc40 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-04-08 09:26:09.472406: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2022-04-08 09:26:09.708881: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5579825039c0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2022-04-08 09:26:09.708955: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): NVIDIA A100-PCIE-40GB, Compute Capability 8.0
2022-04-08 09:26:09.713053: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: NVIDIA A100-PCIE-40GB major: 8 minor: 0 memoryClockRate(GHz): 1.41
pciBusID: 0000:27:00.0
2022-04-08 09:26:09.713109: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2022-04-08 09:26:09.713140: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2022-04-08 09:26:09.713160: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2022-04-08 09:26:09.713181: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2022-04-08 09:26:09.713200: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2022-04-08 09:26:09.713219: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2022-04-08 09:26:09.713238: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2022-04-08 09:26:09.717462: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2022-04-08 09:26:09.717508: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2022-04-08 09:26:09.720791: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-04-08 09:26:09.720829: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
2022-04-08 09:26:09.720853: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
2022-04-08 09:26:09.726211: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 37943 MB memory) -> physical GPU (device: 0, name: NVIDIA A100-PCIE-40GB, pci bus id: 0000:27:00.0, compute capability: 8.0)
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

WARNING:tensorflow:From /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/solver.py:175: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.get_or_create_global_step
WARNING:tensorflow:From /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/variables.py:241: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

UNROLLING GRAPH: 
    TopWall
WARNING:tensorflow:From /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/tf_utils/layers.py:34: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

WARNING:tensorflow:From /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/tf_utils/layers.py:34: The name tf.AUTO_REUSE is deprecated. Please use tf.compat.v1.AUTO_REUSE instead.

WARNING:tensorflow:From /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/tf_utils/layers.py:307: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.

    NoSlip
    Interior
grad calls: 2
calculated: [v__x, u__x, p__x, v__y, u__y, p__y]
grad calls: 2
calculated: [v__y, u__y, u__y__y, v__y__y, v__x, u__x, v__x__x, u__x__x]
WARNING:tensorflow:From /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/variables.py:218: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead.

WARNING:tensorflow:From /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/learning_rate.py:65: The name tf.train.exponential_decay is deprecated. Please use tf.compat.v1.train.exponential_decay instead.

WARNING:tensorflow:From /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/ops/math_grad.py:1375: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
UNROLLING GRAPH: 
    Val
WARNING:tensorflow:From /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/solver.py:480: The name tf.summary.merge_all is deprecated. Please use tf.compat.v1.summary.merge_all instead.

WARNING:tensorflow:From /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/solver.py:241: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.

WARNING:tensorflow:From /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/solver.py:262: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.

WARNING:tensorflow:From /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/solver.py:262: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

WARNING:tensorflow:From /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/solver.py:520: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.

Solving for Domain  iteration 0
2022-04-08 09:33:55.831349: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2022-04-08 09:36:01.040438: E tensorflow/stream_executor/cuda/cuda_blas.cc:428] failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Blas GEMM launch failed : a.shape=(1000, 2), b.shape=(2, 512), m=1000, n=512, k=2
         [[{{node flow_net/fc0/MatMul}}]]
         [[Sum_7/_41]]
  (1) Internal: Blas GEMM launch failed : a.shape=(1000, 2), b.shape=(2, 512), m=1000, n=512, k=2
         [[{{node flow_net/fc0/MatMul}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "ldc_2d.py", line 91, in <module>
    ctr.run()
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/controller.py", line 91, in run
    self.solver.solve()
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/solver.py", line 527, in solve
    train_stats = seq_train_step[domain_index](train_np_var)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/variables.py", line 510, in np_function
    np_outvar_list = sess.run(outvar_placeholders, feed_dict)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Blas GEMM launch failed : a.shape=(1000, 2), b.shape=(2, 512), m=1000, n=512, k=2
         [[node flow_net/fc0/MatMul (defined at /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
         [[Sum_7/_41]]
  (1) Internal: Blas GEMM launch failed : a.shape=(1000, 2), b.shape=(2, 512), m=1000, n=512, k=2
         [[node flow_net/fc0/MatMul (defined at /home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'flow_net/fc0/MatMul':
  File "ldc_2d.py", line 91, in <module>
    ctr.run()
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/controller.py", line 91, in run
    self.solver.solve()
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/solver.py", line 389, in solve
    train_pred_domain_outvar = unroll_graph_on_dict(self.nets+self.equations, train_domain_invar, train_true_domain_outvar, diff_nodes=self.diff_nodes)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/graph.py", line 128, in unroll_graph_on_dict
    outvar_dict[key] = unroll_graph(nodes, invar_with_global, req_outvar_names, diff_nodes)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/graph.py", line 67, in unroll_graph
    outvar.update(node.evaluate(input_variables))
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/ops/template.py", line 393, in __call__
    return self._call_func(args, kwargs)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/ops/template.py", line 355, in _call_func
    result = self._func(*args, **kwargs)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/arch.py", line 36, in <lambda>
    network_template = tf.make_template(name, lambda x: self._network_template(x, output_keys=Key.convert_list(outputs)))
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/architecture/fully_connected.py", line 73, in _network_template
    activation_par = activation_par)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/modulus-21.6-py3.6.egg/modulus/tf_utils/layers.py", line 48, in fc_layer
    outputs = tf.add(tf.matmul(inputs, weights), biases, name=name)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/util/dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/ops/math_ops.py", line 2754, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/ops/gen_math_ops.py", line 6136, in mat_mul
    name=name)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/home/s.1915438/.conda/envs/modulus/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

[s.1915438@scs2042 ldc]$ 

Hello, we have released a new version of Modulus that uses PyTorch, so this problem should be resolved now.
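
For anyone still on the TensorFlow-based release: the log in the question reports an NVIDIA A100 (compute capability 8.0) while TensorFlow loads CUDA 10.0 libraries (libcublas.so.10.0). Ampere GPUs require CUDA 11.0 or newer, so this mismatch is a plausible cause of the failed GEMM launch, although nothing in this thread confirms it. Below is a minimal sketch that scans a saved log for that combination. The function name `find_cuda_mismatch` and the lookup table are hypothetical helpers written for illustration and are not part of Modulus or TensorFlow.

```python
import re

# Assumption: only the Ampere rule is encoded here.
# Compute capability 8.x needs CUDA 11.0 or newer.
MIN_CUDA_FOR_CC = {8: 11}  # major compute capability -> minimum CUDA major version


def find_cuda_mismatch(log_text):
    """Return a warning string if the log shows a GPU that is too new
    for the CUDA libraries that were loaded, otherwise None."""
    # Matches both "compute capability: 8.0" and "Compute Capability 8.0".
    cc = re.search(r"compute capability:?\s*(\d+)\.(\d+)", log_text, re.IGNORECASE)
    # The cuBLAS soname carries the CUDA major version, e.g. libcublas.so.10.0.
    lib = re.search(r"libcublas\.so\.(\d+)", log_text)
    if not cc or not lib:
        return None  # not enough information in the log

    cc_major = int(cc.group(1))
    cuda_major = int(lib.group(1))
    needed = MIN_CUDA_FOR_CC.get(cc_major)
    if needed is not None and cuda_major < needed:
        return (
            f"GPU compute capability {cc.group(1)}.{cc.group(2)} needs CUDA "
            f"{needed}+ but the log shows libcublas from CUDA {cuda_major}"
        )
    return None


if __name__ == "__main__":
    # Two lines copied from the log in the question.
    sample = (
        "Successfully opened dynamic library libcublas.so.10.0\n"
        "name: NVIDIA A100-PCIE-40GB, pci bus id: 0000:27:00.0, compute capability: 8.0)\n"
    )
    print(find_cuda_mismatch(sample))
```

If the check reports a mismatch, the fix is an environment built against CUDA 11 or later, which is what moving to the newer PyTorch-based Modulus release would give you.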
