Tlt train error: Value 'sm_86' is not defined for option 'gpu-name'

Hello, I was following the documents provided using nvidia tlt to create a classification model to be distributed on tx2. I use rtx3080 and it is nvidia-docker2, ubuntu 18.04 development environment.

I think the tlt-train says it can’t find gpu, what should I do?


!tlt-train classification -e $SPECS_DIR/classification_spec.cfg -r $USER_EXPERIMENT_DIR/output -k $KEY

Using TensorFlow backend.
2020-11-25 09:35:28.105800: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-11-25 09:35:30.683060: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-11-25 09:35:30.699311: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-25 09:35:30.699669: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce RTX 3080 major: 8 minor: 6 memoryClockRate(GHz): 1.74
pciBusID: 0000:01:00.0
2020-11-25 09:35:30.699687: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-11-25 09:35:30.719943: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-11-25 09:35:30.728906: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-11-25 09:35:30.731359: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-11-25 09:35:30.753132: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-11-25 09:35:30.765369: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-11-25 09:35:30.806369: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-11-25 09:35:30.806558: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-25 09:35:30.807037: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-25 09:35:30.807380: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-11-25 09:35:30.807660: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-11-25 09:38:48.320544: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-25 09:38:48.320565: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2020-11-25 09:38:48.320571: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2020-11-25 09:38:48.320701: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-25 09:38:48.321114: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-25 09:38:48.321458: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-25 09:38:48.321782: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8588 MB memory) → physical GPU (device: 0, name: GeForce RTX 3080, pci bus id: 0000:01:00.0, compute capability: 8.6)
2020-11-25 09:38:48,325 [INFO] iva.makenet.scripts.train: Loading experiment spec at /workspace/examples/classification/specs/classification_spec.cfg.
Found 11667 images belonging to 20 classes.
2020-11-25 09:38:48,548 [INFO] iva.makenet.scripts.train: Processing dataset (train): /workspace/tlt-experiments/data/split/train
Found 1670 images belonging to 20 classes.
2020-11-25 09:38:48,662 [INFO] iva.makenet.scripts.train: Processing dataset (validation): /workspace/tlt-experiments/data/split/val


Layer (type) Output Shape Param # Connected to

input_1 (InputLayer) (None, 3, 224, 224) 0


conv1 (Conv2D) (None, 64, 112, 112) 9408 input_1[0][0]


bn_conv1 (BatchNormalization) (None, 64, 112, 112) 256 conv1[0][0]


activation_1 (Activation) (None, 64, 112, 112) 0 bn_conv1[0][0]


block_1a_conv_1 (Conv2D) (None, 64, 56, 56) 36864 activation_1[0][0]


block_1a_bn_1 (BatchNormalizati (None, 64, 56, 56) 256 block_1a_conv_1[0][0]


block_1a_relu_1 (Activation) (None, 64, 56, 56) 0 block_1a_bn_1[0][0]


block_1a_conv_2 (Conv2D) (None, 64, 56, 56) 36864 block_1a_relu_1[0][0]


block_1a_conv_shortcut (Conv2D) (None, 64, 56, 56) 4096 activation_1[0][0]


block_1a_bn_2 (BatchNormalizati (None, 64, 56, 56) 256 block_1a_conv_2[0][0]


block_1a_bn_shortcut (BatchNorm (None, 64, 56, 56) 256 block_1a_conv_shortcut[0][0]


add_1 (Add) (None, 64, 56, 56) 0 block_1a_bn_2[0][0]
block_1a_bn_shortcut[0][0]


block_1a_relu (Activation) (None, 64, 56, 56) 0 add_1[0][0]


block_1b_conv_1 (Conv2D) (None, 64, 56, 56) 36864 block_1a_relu[0][0]


block_1b_bn_1 (BatchNormalizati (None, 64, 56, 56) 256 block_1b_conv_1[0][0]


block_1b_relu_1 (Activation) (None, 64, 56, 56) 0 block_1b_bn_1[0][0]


block_1b_conv_2 (Conv2D) (None, 64, 56, 56) 36864 block_1b_relu_1[0][0]


block_1b_conv_shortcut (Conv2D) (None, 64, 56, 56) 4096 block_1a_relu[0][0]


block_1b_bn_2 (BatchNormalizati (None, 64, 56, 56) 256 block_1b_conv_2[0][0]


block_1b_bn_shortcut (BatchNorm (None, 64, 56, 56) 256 block_1b_conv_shortcut[0][0]


add_2 (Add) (None, 64, 56, 56) 0 block_1b_bn_2[0][0]
block_1b_bn_shortcut[0][0]


block_1b_relu (Activation) (None, 64, 56, 56) 0 add_2[0][0]


block_2a_conv_1 (Conv2D) (None, 128, 28, 28) 73728 block_1b_relu[0][0]


block_2a_bn_1 (BatchNormalizati (None, 128, 28, 28) 512 block_2a_conv_1[0][0]


block_2a_relu_1 (Activation) (None, 128, 28, 28) 0 block_2a_bn_1[0][0]


block_2a_conv_2 (Conv2D) (None, 128, 28, 28) 147456 block_2a_relu_1[0][0]


block_2a_conv_shortcut (Conv2D) (None, 128, 28, 28) 8192 block_1b_relu[0][0]


block_2a_bn_2 (BatchNormalizati (None, 128, 28, 28) 512 block_2a_conv_2[0][0]


block_2a_bn_shortcut (BatchNorm (None, 128, 28, 28) 512 block_2a_conv_shortcut[0][0]


add_3 (Add) (None, 128, 28, 28) 0 block_2a_bn_2[0][0]
block_2a_bn_shortcut[0][0]


block_2a_relu (Activation) (None, 128, 28, 28) 0 add_3[0][0]


block_2b_conv_1 (Conv2D) (None, 128, 28, 28) 147456 block_2a_relu[0][0]


block_2b_bn_1 (BatchNormalizati (None, 128, 28, 28) 512 block_2b_conv_1[0][0]


block_2b_relu_1 (Activation) (None, 128, 28, 28) 0 block_2b_bn_1[0][0]


block_2b_conv_2 (Conv2D) (None, 128, 28, 28) 147456 block_2b_relu_1[0][0]


block_2b_conv_shortcut (Conv2D) (None, 128, 28, 28) 16384 block_2a_relu[0][0]


block_2b_bn_2 (BatchNormalizati (None, 128, 28, 28) 512 block_2b_conv_2[0][0]


block_2b_bn_shortcut (BatchNorm (None, 128, 28, 28) 512 block_2b_conv_shortcut[0][0]


add_4 (Add) (None, 128, 28, 28) 0 block_2b_bn_2[0][0]
block_2b_bn_shortcut[0][0]


block_2b_relu (Activation) (None, 128, 28, 28) 0 add_4[0][0]


block_3a_conv_1 (Conv2D) (None, 256, 14, 14) 294912 block_2b_relu[0][0]


block_3a_bn_1 (BatchNormalizati (None, 256, 14, 14) 1024 block_3a_conv_1[0][0]


block_3a_relu_1 (Activation) (None, 256, 14, 14) 0 block_3a_bn_1[0][0]


block_3a_conv_2 (Conv2D) (None, 256, 14, 14) 589824 block_3a_relu_1[0][0]


block_3a_conv_shortcut (Conv2D) (None, 256, 14, 14) 32768 block_2b_relu[0][0]


block_3a_bn_2 (BatchNormalizati (None, 256, 14, 14) 1024 block_3a_conv_2[0][0]


block_3a_bn_shortcut (BatchNorm (None, 256, 14, 14) 1024 block_3a_conv_shortcut[0][0]


add_5 (Add) (None, 256, 14, 14) 0 block_3a_bn_2[0][0]
block_3a_bn_shortcut[0][0]


block_3a_relu (Activation) (None, 256, 14, 14) 0 add_5[0][0]


block_3b_conv_1 (Conv2D) (None, 256, 14, 14) 589824 block_3a_relu[0][0]


block_3b_bn_1 (BatchNormalizati (None, 256, 14, 14) 1024 block_3b_conv_1[0][0]


block_3b_relu_1 (Activation) (None, 256, 14, 14) 0 block_3b_bn_1[0][0]


block_3b_conv_2 (Conv2D) (None, 256, 14, 14) 589824 block_3b_relu_1[0][0]


block_3b_conv_shortcut (Conv2D) (None, 256, 14, 14) 65536 block_3a_relu[0][0]


block_3b_bn_2 (BatchNormalizati (None, 256, 14, 14) 1024 block_3b_conv_2[0][0]


block_3b_bn_shortcut (BatchNorm (None, 256, 14, 14) 1024 block_3b_conv_shortcut[0][0]


add_6 (Add) (None, 256, 14, 14) 0 block_3b_bn_2[0][0]
block_3b_bn_shortcut[0][0]


block_3b_relu (Activation) (None, 256, 14, 14) 0 add_6[0][0]


block_4a_conv_1 (Conv2D) (None, 512, 14, 14) 1179648 block_3b_relu[0][0]


block_4a_bn_1 (BatchNormalizati (None, 512, 14, 14) 2048 block_4a_conv_1[0][0]


block_4a_relu_1 (Activation) (None, 512, 14, 14) 0 block_4a_bn_1[0][0]


block_4a_conv_2 (Conv2D) (None, 512, 14, 14) 2359296 block_4a_relu_1[0][0]


block_4a_conv_shortcut (Conv2D) (None, 512, 14, 14) 131072 block_3b_relu[0][0]


block_4a_bn_2 (BatchNormalizati (None, 512, 14, 14) 2048 block_4a_conv_2[0][0]


block_4a_bn_shortcut (BatchNorm (None, 512, 14, 14) 2048 block_4a_conv_shortcut[0][0]


add_7 (Add) (None, 512, 14, 14) 0 block_4a_bn_2[0][0]
block_4a_bn_shortcut[0][0]


block_4a_relu (Activation) (None, 512, 14, 14) 0 add_7[0][0]


block_4b_conv_1 (Conv2D) (None, 512, 14, 14) 2359296 block_4a_relu[0][0]


block_4b_bn_1 (BatchNormalizati (None, 512, 14, 14) 2048 block_4b_conv_1[0][0]


block_4b_relu_1 (Activation) (None, 512, 14, 14) 0 block_4b_bn_1[0][0]


block_4b_conv_2 (Conv2D) (None, 512, 14, 14) 2359296 block_4b_relu_1[0][0]


block_4b_conv_shortcut (Conv2D) (None, 512, 14, 14) 262144 block_4a_relu[0][0]


block_4b_bn_2 (BatchNormalizati (None, 512, 14, 14) 2048 block_4b_conv_2[0][0]


block_4b_bn_shortcut (BatchNorm (None, 512, 14, 14) 2048 block_4b_conv_shortcut[0][0]


add_8 (Add) (None, 512, 14, 14) 0 block_4b_bn_2[0][0]
block_4b_bn_shortcut[0][0]


block_4b_relu (Activation) (None, 512, 14, 14) 0 add_8[0][0]


avg_pool (AveragePooling2D) (None, 512, 1, 1) 0 block_4b_relu[0][0]


flatten (Flatten) (None, 512) 0 avg_pool[0][0]


predictions (Dense) (None, 20) 10260 flatten[0][0]

Total params: 11,552,724
Trainable params: 11,376,020
Non-trainable params: 176,704


Epoch 1/80
2020-11-25 09:39:04.404465: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-11-25 09:40:08.268602: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-11-25 09:52:57.561648: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: ptxas exited with non-zero error code 65280, output: ptxas fatal : Value ‘sm_86’ is not defined for option ‘gpu-name’

Relying on driver to perform ptx compilation. This message will be only logged once.
2020-11-25 09:53:10.261065: E tensorflow/stream_executor/cuda/cuda_blas.cc:429] failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
File “/usr/local/bin/tlt-train-g1”, line 8, in
sys.exit(main())
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/magnet_train.py”, line 38, in main
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/makenet/scripts/train.py”, line 497, in main
File “/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/makenet/scripts/train.py”, line 471, in run_experiment
File “/usr/local/lib/python3.6/dist-packages/keras/legacy/interfaces.py”, line 91, in wrapper
return func(*args, **kwargs)
File “/usr/local/lib/python3.6/dist-packages/keras/engine/training.py”, line 1418, in fit_generator
initial_epoch=initial_epoch)
File “/usr/local/lib/python3.6/dist-packages/keras/engine/training_generator.py”, line 217, in fit_generator
class_weight=class_weight)
File “/usr/local/lib/python3.6/dist-packages/keras/engine/training.py”, line 1217, in train_on_batch
outputs = self.train_function(ins)
File “/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py”, line 2715, in call
return self._call(inputs)
File “/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py”, line 2675, in _call
fetched = self._callable_fn(*array_vals)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py”, line 1472, in call
run_metadata_ptr)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: Blas GEMM launch failed : a.shape=(64, 512), b.shape=(512, 20), m=64, n=20, k=512
[[{{node predictions_1/MatMul}}]]
[[loss/add_25/_3247]]
(1) Internal: Blas GEMM launch failed : a.shape=(64, 512), b.shape=(512, 20), m=64, n=20, k=512
[[{{node predictions_1/MatMul}}]]
0 successful operations.
0 derived errors ignored

Current version of tlt docker does not support RTX3080.
More info: RTX 3090 Cuda 10.1 Kernal image error · Issue #43718 · tensorflow/tensorflow · GitHub