I tried to train a YOLOv3 model using TLT and got the following error:
failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
System setup:
- 2 x GeForce RTX 3090
- Driver Version: 455.38
- CUDA Version: 11.1
- tlt-streamanalytics:v2.0_py3
- cuda:11.0-base
- Output of nvidia-smi (within container):
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.38       Driver Version: 455.38       CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3090    Off  | 00000000:24:00.0 Off |                  N/A |
|  0%   32C    P8    22W / 350W |     10MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 3090    Off  | 00000000:2D:00.0  On |                  N/A |
|  0%   31C    P8    30W / 350W |    289MiB / 24265MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
- Output of nvcc --version (within container):
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
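As a rough cross-check of the versions listed above, the sketch below pulls the toolkit release out of the nvcc --version text and the compute capability out of TensorFlow's "major: X minor: Y" device line (as printed in the log below), then compares them against a small table of minimum toolkit versions. Assumptions: the table is written from memory of NVIDIA's CUDA release notes, not taken from this post, and the helper names are hypothetical. A toolkit older than the listed minimum can sometimes still run on a newer GPU through PTX JIT compilation by the driver, so a False result is a red flag rather than proof of the cause.

```python
import re

# Assumed from CUDA release notes (not from this post): first toolkit release
# that can generate native code for a given compute capability. Partial table.
MIN_TOOLKIT_FOR_CC = {
    (7, 0): (9, 0),   # Volta, e.g. V100
    (7, 5): (10, 0),  # Turing, e.g. RTX 2080
    (8, 0): (11, 0),  # Ampere A100
    (8, 6): (11, 1),  # Ampere GA10x, e.g. GeForce RTX 3090
}


def toolkit_release(nvcc_output):
    """Return (major, minor) from the 'release X.Y' text printed by nvcc --version."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def compute_capability(tf_device_line):
    """Return (major, minor) from a TensorFlow 'major: X minor: Y' device line."""
    match = re.search(r"major: (\d+) minor: (\d+)", tf_device_line)
    if match is None:
        raise ValueError("no 'major: X minor: Y' found in log line")
    return int(match.group(1)), int(match.group(2))


def toolkit_supports(release, cc):
    """True if the toolkit release is at least the minimum listed for cc."""
    if cc not in MIN_TOOLKIT_FOR_CC:
        raise KeyError("compute capability %d.%d not in table" % cc)
    # Tuples compare element-wise, so (10, 0) < (11, 1).
    return release >= MIN_TOOLKIT_FOR_CC[cc]


if __name__ == "__main__":
    nvcc = "Cuda compilation tools, release 10.0, V10.0.130"
    gpu = "name: GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.695"
    release, cc = toolkit_release(nvcc), compute_capability(gpu)
    ok = toolkit_supports(release, cc)
    # prints: toolkit 10.0, compute capability 8.6, native support: False
    print(f"toolkit {release[0]}.{release[1]}, compute capability {cc[0]}.{cc[1]}, native support: {ok}")
```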
- The command that caused the error:
!tlt-train yolo -e $SPECS_DIR/yolo_train_resnet18_kitti.txt \
-r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
-k $KEY \
-m $USER_EXPERIMENT_DIR/pretrained_resnet18/tlt_pretrained_object_detection_vresnet10/resnet_10.hdf5 \
--gpus 2
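The data-loader lines in the error log below report a per-epoch steps count next to the dataset size and the batch size per GPU. The numbers are consistent with dividing and rounding up, so a final partial batch still counts as one step. A minimal check, where steps_per_epoch is a hypothetical helper and not part of TLT:

```python
import math


def steps_per_epoch(dataset_size, batch_size_per_gpu):
    """Batches needed to cover the dataset once, counting a final partial batch."""
    return math.ceil(dataset_size / batch_size_per_gpu)


if __name__ == "__main__":
    print(steps_per_epoch(3868, 5))   # training loader lines, prints 774
    print(steps_per_epoch(682, 16))   # validation loader lines, prints 43
```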
- Error log:
2020-12-21 22:25:56.433979: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-12-21 22:25:56.433972: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-12-21 22:25:58.263005: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-12-21 22:25:58.263816: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-12-21 22:25:58.278119: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-21 22:25:58.279960: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.695
pciBusID: 0000:24:00.0
2020-12-21 22:25:58.279982: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-12-21 22:25:58.280777: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-21 22:25:58.280969: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-12-21 22:25:58.281865: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.695
pciBusID: 0000:2d:00.0
2020-12-21 22:25:58.281888: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-12-21 22:25:58.281897: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-12-21 22:25:58.282155: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-12-21 22:25:58.282889: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-12-21 22:25:58.283329: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-12-21 22:25:58.283847: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-12-21 22:25:58.284106: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-12-21 22:25:58.284255: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-12-21 22:25:58.285234: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-12-21 22:25:58.286123: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-12-21 22:25:58.286353: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-12-21 22:25:58.286483: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-21 22:25:58.287331: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-21 22:25:58.288087: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-12-21 22:25:58.288111: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-12-21 22:25:58.288794: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-12-21 22:25:58.288914: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-21 22:25:58.290581: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-21 22:25:58.291544: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 1
2020-12-21 22:25:58.291566: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-12-21 22:25:58.931030: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-12-21 22:25:58.931065: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2020-12-21 22:25:58.931070: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2020-12-21 22:25:58.931327: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-21 22:25:58.932122: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-21 22:25:58.932886: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-21 22:25:58.933609: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22128 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3090, pci bus id: 0000:24:00.0, compute capability: 8.6)
Using TensorFlow backend.
2020-12-21 22:25:58,934 [INFO] iva.yolo.scripts.train: Loading experiment spec at /workspace/tlt-experiments/code/yolo/specs/yolo_train_resnet18_kitti.txt.
2020-12-21 22:25:58,935 [INFO] /usr/local/lib/python3.6/dist-packages/iva/yolo/utils/spec_loader.pyc: Merging specification from /workspace/tlt-experiments/code/yolo/specs/yolo_train_resnet18_kitti.txt
2020-12-21 22:25:58.936614: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-12-21 22:25:58.936645: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 1
2020-12-21 22:25:58.936651: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 1: N
2020-12-21 22:25:58.936860: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-21 22:25:58.937640: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-21 22:25:58.938393: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-21 22:25:58.939310: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 21845 MB memory) -> physical GPU (device: 1, name: GeForce RTX 3090, pci bus id: 0000:2d:00.0, compute capability: 8.6)
Using TensorFlow backend.
2020-12-21 22:25:58,940 [INFO] iva.yolo.scripts.train: Loading experiment spec at /workspace/tlt-experiments/code/yolo/specs/yolo_train_resnet18_kitti.txt.
2020-12-21 22:25:58,941 [INFO] /usr/local/lib/python3.6/dist-packages/iva/yolo/utils/spec_loader.pyc: Merging specification from /workspace/tlt-experiments/code/yolo/specs/yolo_train_resnet18_kitti.txt
2020-12-21 22:25:58,945 [INFO] iva.yolo.scripts.train: Loading pretrained weights. This may take a while...
2020-12-21 22:25:58,958 [INFO] iva.yolo.scripts.train: Loading pretrained weights. This may take a while...
2020-12-21 22:25:59,212 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Serial augmentation enabled = False
2020-12-21 22:25:59,212 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Pseudo sharding enabled = False
2020-12-21 22:25:59,212 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Max Image Dimensions (all sources): (0, 0)
2020-12-21 22:25:59,212 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: number of cpus: 24, io threads: 48, compute threads: 24, buffered batches: 4
2020-12-21 22:25:59,212 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: total dataset size 3868, number of sources: 1, batch size per gpu: 5, steps: 774
2020-12-21 22:25:59,216 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Serial augmentation enabled = False
2020-12-21 22:25:59,216 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Pseudo sharding enabled = False
2020-12-21 22:25:59,217 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Max Image Dimensions (all sources): (0, 0)
2020-12-21 22:25:59,217 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: number of cpus: 24, io threads: 48, compute threads: 24, buffered batches: 4
2020-12-21 22:25:59,217 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: total dataset size 3868, number of sources: 1, batch size per gpu: 5, steps: 774
2020-12-21 22:25:59,288 [INFO] iva.detectnet_v2.dataloader.default_dataloader: Bounding box coordinates were detected in the input specification! Bboxes will be automatically converted to polygon coordinates.
2020-12-21 22:25:59,292 [INFO] iva.detectnet_v2.dataloader.default_dataloader: Bounding box coordinates were detected in the input specification! Bboxes will be automatically converted to polygon coordinates.
2020-12-21 22:25:59.311274: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-21 22:25:59.312086: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.695
pciBusID: 0000:24:00.0
2020-12-21 22:25:59.312155: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-21 22:25:59.312865: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties:
name: GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.695
pciBusID: 0000:2d:00.0
2020-12-21 22:25:59.312884: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-12-21 22:25:59.312922: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-12-21 22:25:59.312935: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-12-21 22:25:59.312946: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-12-21 22:25:59.312957: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-12-21 22:25:59.312967: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-12-21 22:25:59.312978: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-12-21 22:25:59.313036: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-21 22:25:59.313831: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-21 22:25:59.314545: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-21 22:25:59.314611: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-21 22:25:59.316322: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.695
pciBusID: 0000:24:00.0
2020-12-21 22:25:59.316392: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-21 22:25:59.316397: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-21 22:25:59.317877: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0, 1
2020-12-21 22:25:59.317900: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties:
name: GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.695
pciBusID: 0000:2d:00.0
2020-12-21 22:25:59.317919: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-12-21 22:25:59.317948: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-12-21 22:25:59.317962: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-12-21 22:25:59.317972: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-12-21 22:25:59.317983: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-12-21 22:25:59.317993: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-12-21 22:25:59.318004: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-12-21 22:25:59.318062: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-21 22:25:59.318829: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-21 22:25:59.319586: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-21 22:25:59.320344: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-12-21 22:25:59.321052: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0, 1
2020-12-21 22:25:59,468 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: shuffle: True - shard 0 of 1
2020-12-21 22:25:59,470 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: shuffle: True - shard 0 of 1
2020-12-21 22:25:59,473 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: sampling 1 datasets with weights:
2020-12-21 22:25:59,473 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: source: 0 weight: 1.000000
2020-12-21 22:25:59,474 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: sampling 1 datasets with weights:
2020-12-21 22:25:59,474 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: source: 0 weight: 1.000000
Weights for those layers can not be loaded: ['expand_conv1', 'expand_conv1_bn', 'expand_conv1_lrelu']
STOP trainig now and check the pre-train model if this is not expected!
2020-12-21 22:26:16,348 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Serial augmentation enabled = False
2020-12-21 22:26:16,348 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Pseudo sharding enabled = False
2020-12-21 22:26:16,348 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Max Image Dimensions (all sources): (0, 0)
2020-12-21 22:26:16,349 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: number of cpus: 24, io threads: 48, compute threads: 24, buffered batches: 4
2020-12-21 22:26:16,349 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: total dataset size 682, number of sources: 1, batch size per gpu: 16, steps: 43
2020-12-21 22:26:16,371 [INFO] iva.detectnet_v2.dataloader.default_dataloader: Bounding box coordinates were detected in the input specification! Bboxes will be automatically converted to polygon coordinates.
Weights for those layers can not be loaded: ['expand_conv1', 'expand_conv1_bn', 'expand_conv1_lrelu']
STOP trainig now and check the pre-train model if this is not expected!
2020-12-21 22:26:16,535 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: shuffle: False - shard 0 of 1
2020-12-21 22:26:16,538 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: sampling 1 datasets with weights:
2020-12-21 22:26:16,538 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: source: 0 weight: 1.000000
2020-12-21 22:26:16,681 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Serial augmentation enabled = False
2020-12-21 22:26:16,681 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Pseudo sharding enabled = False
2020-12-21 22:26:16,681 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: Max Image Dimensions (all sources): (0, 0)
2020-12-21 22:26:16,681 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: number of cpus: 24, io threads: 48, compute threads: 24, buffered batches: 4
2020-12-21 22:26:16,681 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: total dataset size 682, number of sources: 1, batch size per gpu: 16, steps: 43
2020-12-21 22:26:16,703 [INFO] iva.detectnet_v2.dataloader.default_dataloader: Bounding box coordinates were detected in the input specification! Bboxes will be automatically converted to polygon coordinates.
2020-12-21 22:26:16,869 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: shuffle: False - shard 0 of 1
2020-12-21 22:26:16,873 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: sampling 1 datasets with weights:
2020-12-21 22:26:16,873 [INFO] modulus.blocks.data_loaders.multi_source_loader.data_loader: source: 0 weight: 1.000000
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
Input (InputLayer) (5, 3, 1152, 1440) 0
__________________________________________________________________________________________________
conv1 (Conv2D) (5, 64, 576, 720) 9408 Input[0][0]
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization) (5, 64, 576, 720) 256 conv1[0][0]
__________________________________________________________________________________________________
activation_1 (Activation) (5, 64, 576, 720) 0 bn_conv1[0][0]
__________________________________________________________________________________________________
block_1a_conv_1 (Conv2D) (5, 64, 288, 360) 36864 activation_1[0][0]
__________________________________________________________________________________________________
block_1a_bn_1 (BatchNormalizati (5, 64, 288, 360) 256 block_1a_conv_1[0][0]
__________________________________________________________________________________________________
block_1a_relu_1 (Activation) (5, 64, 288, 360) 0 block_1a_bn_1[0][0]
__________________________________________________________________________________________________
block_1a_conv_2 (Conv2D) (5, 64, 288, 360) 36864 block_1a_relu_1[0][0]
__________________________________________________________________________________________________
block_1a_conv_shortcut (Conv2D) (5, 64, 288, 360) 4096 activation_1[0][0]
__________________________________________________________________________________________________
block_1a_bn_2 (BatchNormalizati (5, 64, 288, 360) 256 block_1a_conv_2[0][0]
__________________________________________________________________________________________________
block_1a_bn_shortcut (BatchNorm (5, 64, 288, 360) 256 block_1a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_1 (Add) (5, 64, 288, 360) 0 block_1a_bn_2[0][0]
block_1a_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_1a_relu (Activation) (5, 64, 288, 360) 0 add_1[0][0]
__________________________________________________________________________________________________
block_2a_conv_1 (Conv2D) (5, 128, 144, 180) 73728 block_1a_relu[0][0]
__________________________________________________________________________________________________
block_2a_bn_1 (BatchNormalizati (5, 128, 144, 180) 512 block_2a_conv_1[0][0]
__________________________________________________________________________________________________
block_2a_relu_1 (Activation) (5, 128, 144, 180) 0 block_2a_bn_1[0][0]
__________________________________________________________________________________________________
block_2a_conv_2 (Conv2D) (5, 128, 144, 180) 147456 block_2a_relu_1[0][0]
__________________________________________________________________________________________________
block_2a_conv_shortcut (Conv2D) (5, 128, 144, 180) 8192 block_1a_relu[0][0]
__________________________________________________________________________________________________
block_2a_bn_2 (BatchNormalizati (5, 128, 144, 180) 512 block_2a_conv_2[0][0]
__________________________________________________________________________________________________
block_2a_bn_shortcut (BatchNorm (5, 128, 144, 180) 512 block_2a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_2 (Add) (5, 128, 144, 180) 0 block_2a_bn_2[0][0]
block_2a_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_2a_relu (Activation) (5, 128, 144, 180) 0 add_2[0][0]
__________________________________________________________________________________________________
block_3a_conv_1 (Conv2D) (5, 256, 72, 90) 294912 block_2a_relu[0][0]
__________________________________________________________________________________________________
block_3a_bn_1 (BatchNormalizati (5, 256, 72, 90) 1024 block_3a_conv_1[0][0]
__________________________________________________________________________________________________
block_3a_relu_1 (Activation) (5, 256, 72, 90) 0 block_3a_bn_1[0][0]
__________________________________________________________________________________________________
block_3a_conv_2 (Conv2D) (5, 256, 72, 90) 589824 block_3a_relu_1[0][0]
__________________________________________________________________________________________________
block_3a_conv_shortcut (Conv2D) (5, 256, 72, 90) 32768 block_2a_relu[0][0]
__________________________________________________________________________________________________
block_3a_bn_2 (BatchNormalizati (5, 256, 72, 90) 1024 block_3a_conv_2[0][0]
__________________________________________________________________________________________________
block_3a_bn_shortcut (BatchNorm (5, 256, 72, 90) 1024 block_3a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_3 (Add) (5, 256, 72, 90) 0 block_3a_bn_2[0][0]
block_3a_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_3a_relu (Activation) (5, 256, 72, 90) 0 add_3[0][0]
__________________________________________________________________________________________________
block_4a_conv_1 (Conv2D) (5, 512, 72, 90) 1179648 block_3a_relu[0][0]
__________________________________________________________________________________________________
block_4a_bn_1 (BatchNormalizati (5, 512, 72, 90) 2048 block_4a_conv_1[0][0]
__________________________________________________________________________________________________
block_4a_relu_1 (Activation) (5, 512, 72, 90) 0 block_4a_bn_1[0][0]
__________________________________________________________________________________________________
block_4a_conv_2 (Conv2D) (5, 512, 72, 90) 2359296 block_4a_relu_1[0][0]
__________________________________________________________________________________________________
block_4a_conv_shortcut (Conv2D) (5, 512, 72, 90) 131072 block_3a_relu[0][0]
__________________________________________________________________________________________________
block_4a_bn_2 (BatchNormalizati (5, 512, 72, 90) 2048 block_4a_conv_2[0][0]
__________________________________________________________________________________________________
block_4a_bn_shortcut (BatchNorm (5, 512, 72, 90) 2048 block_4a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_4 (Add) (5, 512, 72, 90) 0 block_4a_bn_2[0][0]
block_4a_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_4a_relu (Activation) (5, 512, 72, 90) 0 add_4[0][0]
__________________________________________________________________________________________________
expand_conv1 (Conv2D) (5, 512, 36, 45) 2359296 block_4a_relu[0][0]
__________________________________________________________________________________________________
expand_conv1_bn (BatchNormaliza (5, 512, 36, 45) 2048 expand_conv1[0][0]
__________________________________________________________________________________________________
expand_conv1_lrelu (LeakyReLU) (5, 512, 36, 45) 0 expand_conv1_bn[0][0]
__________________________________________________________________________________________________
yolo_conv1_1 (Conv2D) (5, 256, 36, 45) 131072 expand_conv1_lrelu[0][0]
__________________________________________________________________________________________________
yolo_conv1_1_bn (BatchNormaliza (5, 256, 36, 45) 1024 yolo_conv1_1[0][0]
__________________________________________________________________________________________________
yolo_conv1_1_lrelu (LeakyReLU) (5, 256, 36, 45) 0 yolo_conv1_1_bn[0][0]
__________________________________________________________________________________________________
yolo_conv1_2 (Conv2D) (5, 512, 36, 45) 1179648 yolo_conv1_1_lrelu[0][0]
__________________________________________________________________________________________________
yolo_conv1_2_bn (BatchNormaliza (5, 512, 36, 45) 2048 yolo_conv1_2[0][0]
__________________________________________________________________________________________________
yolo_conv1_2_lrelu (LeakyReLU) (5, 512, 36, 45) 0 yolo_conv1_2_bn[0][0]
__________________________________________________________________________________________________
yolo_conv1_3 (Conv2D) (5, 256, 36, 45) 131072 yolo_conv1_2_lrelu[0][0]
__________________________________________________________________________________________________
yolo_conv1_3_bn (BatchNormaliza (5, 256, 36, 45) 1024 yolo_conv1_3[0][0]
__________________________________________________________________________________________________
yolo_conv1_3_lrelu (LeakyReLU) (5, 256, 36, 45) 0 yolo_conv1_3_bn[0][0]
__________________________________________________________________________________________________
yolo_conv1_4 (Conv2D) (5, 512, 36, 45) 1179648 yolo_conv1_3_lrelu[0][0]
__________________________________________________________________________________________________
yolo_conv1_4_bn (BatchNormaliza (5, 512, 36, 45) 2048 yolo_conv1_4[0][0]
__________________________________________________________________________________________________
yolo_conv1_4_lrelu (LeakyReLU) (5, 512, 36, 45) 0 yolo_conv1_4_bn[0][0]
__________________________________________________________________________________________________
yolo_conv1_5 (Conv2D) (5, 256, 36, 45) 131072 yolo_conv1_4_lrelu[0][0]
__________________________________________________________________________________________________
yolo_conv1_5_bn (BatchNormaliza (5, 256, 36, 45) 1024 yolo_conv1_5[0][0]
__________________________________________________________________________________________________
yolo_conv1_5_lrelu (LeakyReLU) (5, 256, 36, 45) 0 yolo_conv1_5_bn[0][0]
__________________________________________________________________________________________________
yolo_conv2 (Conv2D) (5, 128, 36, 45) 32768 yolo_conv1_5_lrelu[0][0]
__________________________________________________________________________________________________
yolo_conv2_bn (BatchNormalizati (5, 128, 36, 45) 512 yolo_conv2[0][0]
__________________________________________________________________________________________________
yolo_conv2_lrelu (LeakyReLU) (5, 128, 36, 45) 0 yolo_conv2_bn[0][0]
__________________________________________________________________________________________________
upsample0 (UpSampling2D) (5, 128, 72, 90) 0 yolo_conv2_lrelu[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate) (5, 384, 72, 90) 0 upsample0[0][0]
block_3a_relu[0][0]
__________________________________________________________________________________________________
yolo_conv3_1 (Conv2D) (5, 128, 72, 90) 49152 concatenate_1[0][0]
__________________________________________________________________________________________________
yolo_conv3_1_bn (BatchNormaliza (5, 128, 72, 90) 512 yolo_conv3_1[0][0]
__________________________________________________________________________________________________
yolo_conv3_1_lrelu (LeakyReLU) (5, 128, 72, 90) 0 yolo_conv3_1_bn[0][0]
__________________________________________________________________________________________________
yolo_conv3_2 (Conv2D) (5, 256, 72, 90) 294912 yolo_conv3_1_lrelu[0][0]
__________________________________________________________________________________________________
yolo_conv3_2_bn (BatchNormaliza (5, 256, 72, 90) 1024 yolo_conv3_2[0][0]
__________________________________________________________________________________________________
yolo_conv3_2_lrelu (LeakyReLU) (5, 256, 72, 90) 0 yolo_conv3_2_bn[0][0]
__________________________________________________________________________________________________
yolo_conv3_3 (Conv2D) (5, 128, 72, 90) 32768 yolo_conv3_2_lrelu[0][0]
__________________________________________________________________________________________________
yolo_conv3_3_bn (BatchNormaliza (5, 128, 72, 90) 512 yolo_conv3_3[0][0]
__________________________________________________________________________________________________
yolo_conv3_3_lrelu (LeakyReLU) (5, 128, 72, 90) 0 yolo_conv3_3_bn[0][0]
__________________________________________________________________________________________________
yolo_conv3_4 (Conv2D) (5, 256, 72, 90) 294912 yolo_conv3_3_lrelu[0][0]
__________________________________________________________________________________________________
yolo_conv3_4_bn (BatchNormaliza (5, 256, 72, 90) 1024 yolo_conv3_4[0][0]
__________________________________________________________________________________________________
yolo_conv3_4_lrelu (LeakyReLU) (5, 256, 72, 90) 0 yolo_conv3_4_bn[0][0]
__________________________________________________________________________________________________
yolo_conv3_5 (Conv2D) (5, 128, 72, 90) 32768 yolo_conv3_4_lrelu[0][0]
__________________________________________________________________________________________________
yolo_conv3_5_bn (BatchNormaliza (5, 128, 72, 90) 512 yolo_conv3_5[0][0]
__________________________________________________________________________________________________
yolo_conv3_5_lrelu (LeakyReLU) (5, 128, 72, 90) 0 yolo_conv3_5_bn[0][0]
__________________________________________________________________________________________________
yolo_conv4 (Conv2D) (5, 64, 72, 90) 8192 yolo_conv3_5_lrelu[0][0]
__________________________________________________________________________________________________
yolo_conv4_bn (BatchNormalizati (5, 64, 72, 90) 256 yolo_conv4[0][0]
__________________________________________________________________________________________________
yolo_conv4_lrelu (LeakyReLU) (5, 64, 72, 90) 0 yolo_conv4_bn[0][0]
__________________________________________________________________________________________________
upsample1 (UpSampling2D) (5, 64, 144, 180) 0 yolo_conv4_lrelu[0][0]
__________________________________________________________________________________________________
concatenate_2 (Concatenate) (5, 192, 144, 180) 0 upsample1[0][0]
block_2a_relu[0][0]
__________________________________________________________________________________________________
yolo_conv5_1 (Conv2D) (5, 64, 144, 180) 12288 concatenate_2[0][0]
__________________________________________________________________________________________________
yolo_conv5_1_bn (BatchNormaliza (5, 64, 144, 180) 256 yolo_conv5_1[0][0]
__________________________________________________________________________________________________
yolo_conv5_1_lrelu (LeakyReLU) (5, 64, 144, 180) 0 yolo_conv5_1_bn[0][0]
__________________________________________________________________________________________________
yolo_conv5_2 (Conv2D) (5, 128, 144, 180) 73728 yolo_conv5_1_lrelu[0][0]
__________________________________________________________________________________________________
yolo_conv5_2_bn (BatchNormaliza (5, 128, 144, 180) 512 yolo_conv5_2[0][0]
__________________________________________________________________________________________________
yolo_conv5_2_lrelu (LeakyReLU) (5, 128, 144, 180) 0 yolo_conv5_2_bn[0][0]
__________________________________________________________________________________________________
yolo_conv5_3 (Conv2D) (5, 64, 144, 180) 8192 yolo_conv5_2_lrelu[0][0]
__________________________________________________________________________________________________
yolo_conv5_3_bn (BatchNormaliza (5, 64, 144, 180) 256 yolo_conv5_3[0][0]
__________________________________________________________________________________________________
yolo_conv5_3_lrelu (LeakyReLU) (5, 64, 144, 180) 0 yolo_conv5_3_bn[0][0]
__________________________________________________________________________________________________
yolo_conv5_4 (Conv2D) (5, 128, 144, 180) 73728 yolo_conv5_3_lrelu[0][0]
__________________________________________________________________________________________________
yolo_conv5_4_bn (BatchNormaliza (5, 128, 144, 180) 512 yolo_conv5_4[0][0]
__________________________________________________________________________________________________
yolo_conv5_4_lrelu (LeakyReLU) (5, 128, 144, 180) 0 yolo_conv5_4_bn[0][0]
__________________________________________________________________________________________________
yolo_conv5_5 (Conv2D) (5, 64, 144, 180) 8192 yolo_conv5_4_lrelu[0][0]
__________________________________________________________________________________________________
yolo_conv5_5_bn (BatchNormaliza (5, 64, 144, 180) 256 yolo_conv5_5[0][0]
__________________________________________________________________________________________________
yolo_conv5_5_lrelu (LeakyReLU) (5, 64, 144, 180) 0 yolo_conv5_5_bn[0][0]
__________________________________________________________________________________________________
yolo_conv1_6 (Conv2D) (5, 512, 36, 45) 1179648 yolo_conv1_5_lrelu[0][0]
__________________________________________________________________________________________________
yolo_conv3_6 (Conv2D) (5, 256, 72, 90) 294912 yolo_conv3_5_lrelu[0][0]
__________________________________________________________________________________________________
yolo_conv5_6 (Conv2D) (5, 128, 144, 180) 73728 yolo_conv5_5_lrelu[0][0]
__________________________________________________________________________________________________
yolo_conv1_6_bn (BatchNormaliza (5, 512, 36, 45) 2048 yolo_conv1_6[0][0]
__________________________________________________________________________________________________
yolo_conv3_6_bn (BatchNormaliza (5, 256, 72, 90) 1024 yolo_conv3_6[0][0]
__________________________________________________________________________________________________
yolo_conv5_6_bn (BatchNormaliza (5, 128, 144, 180) 512 yolo_conv5_6[0][0]
__________________________________________________________________________________________________
yolo_conv1_6_lrelu (LeakyReLU) (5, 512, 36, 45) 0 yolo_conv1_6_bn[0][0]
__________________________________________________________________________________________________
yolo_conv3_6_lrelu (LeakyReLU) (5, 256, 72, 90) 0 yolo_conv3_6_bn[0][0]
__________________________________________________________________________________________________
yolo_conv5_6_lrelu (LeakyReLU) (5, 128, 144, 180) 0 yolo_conv5_6_bn[0][0]
__________________________________________________________________________________________________
conv_big_object (Conv2D) (5, 21, 36, 45) 10773 yolo_conv1_6_lrelu[0][0]
__________________________________________________________________________________________________
conv_mid_object (Conv2D) (5, 21, 72, 90) 5397 yolo_conv3_6_lrelu[0][0]
__________________________________________________________________________________________________
conv_sm_object (Conv2D) (5, 21, 144, 180) 2709 yolo_conv5_6_lrelu[0][0]
__________________________________________________________________________________________________
bg_permute (Permute) (5, 36, 45, 21) 0 conv_big_object[0][0]
__________________________________________________________________________________________________
md_permute (Permute) (5, 72, 90, 21) 0 conv_mid_object[0][0]
__________________________________________________________________________________________________
sm_permute (Permute) (5, 144, 180, 21) 0 conv_sm_object[0][0]
__________________________________________________________________________________________________
bg_anchor (YOLOAnchorBox) (5, 1, 4860, 6) 0 conv_big_object[0][0]
__________________________________________________________________________________________________
bg_reshape (Reshape) (5, 1, 4860, 7) 0 bg_permute[0][0]
__________________________________________________________________________________________________
md_anchor (YOLOAnchorBox) (5, 1, 19440, 6) 0 conv_mid_object[0][0]
__________________________________________________________________________________________________
md_reshape (Reshape) (5, 1, 19440, 7) 0 md_permute[0][0]
__________________________________________________________________________________________________
sm_anchor (YOLOAnchorBox) (5, 1, 77760, 6) 0 conv_sm_object[0][0]
__________________________________________________________________________________________________
sm_reshape (Reshape) (5, 1, 77760, 7) 0 sm_permute[0][0]
__________________________________________________________________________________________________
encoded_bg (Concatenate) (5, 1, 4860, 13) 0 bg_anchor[0][0]
bg_reshape[0][0]
__________________________________________________________________________________________________
encoded_md (Concatenate) (5, 1, 19440, 13) 0 md_anchor[0][0]
md_reshape[0][0]
__________________________________________________________________________________________________
encoded_sm (Concatenate) (5, 1, 77760, 13) 0 sm_anchor[0][0]
sm_reshape[0][0]
__________________________________________________________________________________________________
encoded_detections (Concatenate (5, 1, 102060, 13) 0 encoded_bg[0][0]
encoded_md[0][0]
encoded_sm[0][0]
==================================================================================================
Total params: 12,535,423
Trainable params: 12,510,655
Non-trainable params: 24,768
__________________________________________________________________________________________________
2020-12-21 22:26:19,539 [INFO] iva.yolo.scripts.train: Number of images in the training dataset: 3868
Epoch 1/120
2020-12-21 22:26:26.687360: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-12-21 22:26:27.256256: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-12-21 22:27:08.895203: E tensorflow/stream_executor/cuda/cuda_blas.cc:429] failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
2020-12-21 22:27:08.895235: E tensorflow/stream_executor/cuda/cuda_blas.cc:2437] Internal: failed BLAS call, see log for details
Traceback (most recent call last):
File "/usr/local/bin/tlt-train-g1", line 8, in <module>
sys.exit(main())
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/magnet_train.py", line 51, in main
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo/scripts/train.py", line 239, in main
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo/scripts/train.py", line 183, in run_experiment
File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1039, in fit
validation_steps=validation_steps)
File "/usr/local/lib/python3.6/dist-packages/keras/engine/training_arrays.py", line 154, in fit_loop
outs = f(ins)
File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
return self._call(inputs)
File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
fetched = self._callable_fn(*array_vals)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1472, in __call__
run_metadata_ptr)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: Blas xGEMMBatched launch failed : a.shape=[5,3,3], b.shape=[5,3,3], m=3, n=3, k=3, batch_size=5
[[{{node CompositeTransform_6/CompositeTransform_5/CompositeTransform_4/CompositeTransform_3/CompositeTransform_2/CompositeTransform_1/CompositeTransform/RandomFlip/MatMul}}]]
[[cond_6/cond/SliceReplace/ListDiff/Switch/_3151]]
(1) Internal: Blas xGEMMBatched launch failed : a.shape=[5,3,3], b.shape=[5,3,3], m=3, n=3, k=3, batch_size=5
[[{{node CompositeTransform_6/CompositeTransform_5/CompositeTransform_4/CompositeTransform_3/CompositeTransform_2/CompositeTransform_1/CompositeTransform/RandomFlip/MatMul}}]]
0 successful operations.
0 derived errors ignored.
2020-12-21 22:27:09.439647: E tensorflow/stream_executor/cuda/cuda_blas.cc:429] failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
2020-12-21 22:27:09.439680: E tensorflow/stream_executor/cuda/cuda_blas.cc:2437] Internal: failed BLAS call, see log for details
Traceback (most recent call last):
File "/usr/local/bin/tlt-train-g1", line 8, in <module>
sys.exit(main())
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/magnet_train.py", line 51, in main
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo/scripts/train.py", line 239, in main
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo/scripts/train.py", line 183, in run_experiment
File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1039, in fit
validation_steps=validation_steps)
File "/usr/local/lib/python3.6/dist-packages/keras/engine/training_arrays.py", line 154, in fit_loop
outs = f(ins)
File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
return self._call(inputs)
File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
fetched = self._callable_fn(*array_vals)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1472, in __call__
run_metadata_ptr)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: Blas xGEMMBatched launch failed : a.shape=[5,3,3], b.shape=[5,3,3], m=3, n=3, k=3, batch_size=5
[[{{node CompositeTransform_6/CompositeTransform_5/CompositeTransform_4/CompositeTransform_3/CompositeTransform_2/CompositeTransform_1/CompositeTransform/RandomFlip/MatMul}}]]
[[cond_6/cond/SliceReplace/ListDiff/Switch/_3151]]
(1) Internal: Blas xGEMMBatched launch failed : a.shape=[5,3,3], b.shape=[5,3,3], m=3, n=3, k=3, batch_size=5
[[{{node CompositeTransform_6/CompositeTransform_5/CompositeTransform_4/CompositeTransform_3/CompositeTransform_2/CompositeTransform_1/CompositeTransform/RandomFlip/MatMul}}]]
0 successful operations.
0 derived errors ignored.
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun.real detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[44497,1],0]
Exit code: 1
--------------------------------------------------------------------------
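For reference, both MPI ranks fail on the same node, `CompositeTransform/.../RandomFlip/MatMul`. Judging by the name, it is part of the data-augmentation graph rather than the network itself, and the error describes it as a batched GEMM with `a.shape=[5,3,3]`, `b.shape=[5,3,3]`, `m=3, n=3, k=3, batch_size=5`. The pure-Python sketch below only illustrates what that call computes. It assumes the 3x3 matrices are per-image 2D homogeneous transforms being composed, which the node names suggest but the log does not confirm. `horizontal_flip` and the 100-pixel width are hypothetical and not taken from TLT.

```python
from typing import List, Tuple

Matrix = List[List[float]]

# Assumption: 3x3 matrices are 2D homogeneous transforms (last row [0, 0, 1]).
IDENTITY: Matrix = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def matmul3(a: Matrix, b: Matrix) -> Matrix:
    """3x3 matrix product: out[i][j] = sum over k of a[i][k] * b[k][j]."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def batched_matmul(a_batch: List[Matrix], b_batch: List[Matrix]) -> List[Matrix]:
    """One independent 3x3 product per batch item (m = n = k = 3)."""
    if len(a_batch) != len(b_batch):
        raise ValueError("batch sizes differ")
    return [matmul3(a, b) for a, b in zip(a_batch, b_batch)]


def horizontal_flip(width: float) -> Matrix:
    """Hypothetical homogeneous flip matrix: (x, y) -> (width - x, y)."""
    return [[-1.0, 0.0, width],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]


def apply(m: Matrix, x: float, y: float) -> Tuple[float, float]:
    """Map a 2D point through a homogeneous 3x3 matrix."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


if __name__ == "__main__":
    batch_size = 5  # matches batch_size=5 in the error message
    flips = [horizontal_flip(100.0)] * batch_size  # hypothetical 100-px width
    composed = batched_matmul(flips, [IDENTITY] * batch_size)
    print(apply(composed[0], 10.0, 20.0))  # → (90.0, 20.0)
    # Total work in the failing call: batch_size * m * n * k multiply-adds
    print("multiply-adds:", batch_size * 3 * 3 * 3)  # → multiply-adds: 135
```

The point of the sketch is scale: the whole failing call is 135 multiply-adds, so it is very unlikely to be a problem of the operation's size.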
What I have tried:
I ran the following in the first code block:
import tensorflow as tf
# Allocate GPU memory on demand instead of reserving most of it when the session is created
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
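One caveat about this attempt: a `tf.ConfigProto` passed to `tf.Session` applies only to sessions created in the notebook's own Python process. `!tlt-train` starts a separate process, and with `--gpus 2` it launches further processes through `mpirun` (as the log above shows), so none of them sees that session. Environment variables, by contrast, are inherited by child processes. The sketch below only builds the command from the post together with an environment that sets `TF_FORCE_GPU_ALLOW_GROWTH=true`. It assumes that the TensorFlow 1.x build in the container honors that variable and that the local MPI ranks inherit the launching environment. `build_train_invocation` is a hypothetical helper, and nothing here establishes whether memory growth has any bearing on the cuBLAS error.

```python
import os
import shlex
from typing import Dict, List, Tuple


def build_train_invocation(specs_dir: str, experiment_dir: str, key: str,
                           num_gpus: int = 2) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) mirroring the tlt-train command from the post.

    env is a copy of the current environment plus TF_FORCE_GPU_ALLOW_GROWTH
    (assumption: the container's TensorFlow reads this variable).
    """
    pretrained = os.path.join(
        experiment_dir, "pretrained_resnet18",
        "tlt_pretrained_object_detection_vresnet10", "resnet_10.hdf5")
    argv = [
        "tlt-train", "yolo",
        "-e", os.path.join(specs_dir, "yolo_train_resnet18_kitti.txt"),
        "-r", os.path.join(experiment_dir, "experiment_dir_unpruned"),
        "-k", key,
        "-m", pretrained,
        "--gpus", str(num_gpus),
    ]
    env = dict(os.environ)  # child processes receive these variables
    env["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
    return argv, env


if __name__ == "__main__":
    argv, env = build_train_invocation(
        os.environ.get("SPECS_DIR", "specs"),
        os.environ.get("USER_EXPERIMENT_DIR", "experiment"),
        os.environ.get("KEY", "<key>"),
    )
    # Print instead of launching; subprocess.run(argv, env=env, check=True) would run it.
    print("TF_FORCE_GPU_ALLOW_GROWTH=true " + " ".join(shlex.quote(a) for a in argv))
```

In a notebook, you could get the same effect by prefixing the `!tlt-train` line with `TF_FORCE_GPU_ALLOW_GROWTH=true`, since `!` hands the line to a shell.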