Internal error: CASK: all shaders must have unique names

Hi, I am running my application on a Google Cloud VM with 4 NVIDIA P100 GPUs, using this container: https://docs.nvidia.com/deeplearning/dgx/tensorflow-release-notes/rel_18.11.html

Now I am getting the following error when loading the optimized graph (optimized with TensorRT) in TensorFlow:

python: cask/shaderlist_impl.h:50: void cask::ShaderList<ShaderType, OperationType>::sortHandles() const [with ShaderType = cask::ConvolutionShader; OperationType = cask::Convolution]: Assertion `((*i)->handle != (*prevI)->handle) && "Internal error: CASK: all shaders must have unique names"' failed.

The graph is loaded with:
tf.import_graph_def(classifier_graph_def, input_map=input_map, return_elements=return_elements)

How can I solve this?

Hello,

This is usually due to dependency issues among different libraries. Was your graph optimized with a different version of TF or TRT than the ones in the 18.11 container?

No, the graph optimization was also done in the same container.

The log output when doing the optimization:

INFO:tensorflow:Running against TensorRT version 5.0.2
I1129 09:40:24.167615 28 tf_logging.py:115] Running against TensorRT version 5.0.2
2018-11-29 09:40:24.651538: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 4
2018-11-29 09:40:24.654939: I tensorflow/core/grappler/clusters/single_machine.cc:359] Starting new session
2018-11-29 09:40:24.655574: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1, 2, 3
2018-11-29 09:40:24.655797: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-29 09:40:24.655842: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0 1 2 3
2018-11-29 09:40:24.655851: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N Y N N
2018-11-29 09:40:24.655868: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1: Y N N N
2018-11-29 09:40:24.655884: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 2: N N N Y
2018-11-29 09:40:24.655890: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 3: N N Y N
2018-11-29 09:40:24.656901: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15030 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0)
2018-11-29 09:40:24.657268: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 15108 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:05.0, compute capability: 6.0)
2018-11-29 09:40:24.657619: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 15108 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:06.0, compute capability: 6.0)
2018-11-29 09:40:24.657827: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 15108 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:07.0, compute capability: 6.0)
2018-11-29 09:40:25.191757: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:853] MULTIPLE tensorrt candidate conversion: 13
2018-11-29 09:40:25.192324: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope ‘’, converted to graph
2018-11-29 09:40:25.192363: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can’t find a device placement for the op!
2018-11-29 09:40:25.198213: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope ‘’, converted to graph
2018-11-29 09:40:25.198264: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can’t find a device placement for the op!
2018-11-29 09:40:25.199297: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope ‘’, converted to graph
2018-11-29 09:40:25.199332: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can’t find a device placement for the op!
2018-11-29 09:40:25.199710: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:891] Failed to register segment graphdef as a function 2: Invalid argument: Input 0 of node TensorRTOutputPH_0 was passed float from TopKV2:0 incompatible with expected int32.
2018-11-29 09:40:25.200140: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope ‘’, converted to graph
2018-11-29 09:40:25.200172: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can’t find a device placement for the op!
2018-11-29 09:40:25.200506: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:891] Failed to register segment graphdef as a function 3: Invalid argument: Input 0 of node TensorRTOutputPH_0 was passed float from TopKV2_1:0 incompatible with expected int32.
2018-11-29 09:40:25.200745: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope ‘model/output_4_108_56dim/’, converted to graph
2018-11-29 09:40:25.200777: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can’t find a device placement for the op!
2018-11-29 09:40:25.204282: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope ‘model/wide-resnet/’, converted to graph
2018-11-29 09:40:25.204326: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can’t find a device placement for the op!
2018-11-29 09:40:25.214711: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope ‘model_1/output_4_108_56dim/’, converted to graph
2018-11-29 09:40:25.214763: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can’t find a device placement for the op!
2018-11-29 09:40:25.219756: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope ‘model_1/wide-resnet/’, converted to graph
2018-11-29 09:40:25.219795: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can’t find a device placement for the op!
2018-11-29 09:40:25.233015: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope ‘model_2/’, converted to graph
2018-11-29 09:40:25.233052: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can’t find a device placement for the op!
2018-11-29 09:40:25.242917: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope ‘model_2/wide-resnet/’, converted to graph
2018-11-29 09:40:25.242963: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can’t find a device placement for the op!
2018-11-29 09:40:25.257458: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope ‘model_3/output_8_108_56dim/’, converted to graph
2018-11-29 09:40:25.257501: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can’t find a device placement for the op!
2018-11-29 09:40:25.272904: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope ‘model_3/wide-resnet/’, converted to graph
2018-11-29 09:40:25.272980: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can’t find a device placement for the op!
2018-11-29 09:40:25.289882: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope ‘’, converted to graph
2018-11-29 09:40:25.289936: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can’t find a device placement for the op!
2018-11-29 09:40:40.418073: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine my_trt_op_0 creation for segment 0, composed of 2 nodes succeeded.
2018-11-29 09:40:40.450207: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine my_trt_op_1 creation for segment 1, composed of 2 nodes succeeded.
2018-11-29 09:40:40.641688: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine model/output_4_108_56dim/my_trt_op_4 creation for segment 2, composed of 4 nodes succeeded.
2018-11-29 09:41:06.770987: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine model/wide-resnet/my_trt_op_5 creation for segment 3, composed of 223 nodes succeeded.
2018-11-29 09:41:06.943108: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine model_1/output_4_108_56dim/my_trt_op_6 creation for segment 4, composed of 4 nodes succeeded.
2018-11-29 09:41:32.902693: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine model_1/wide-resnet/my_trt_op_7 creation for segment 5, composed of 223 nodes succeeded.
2018-11-29 09:41:33.474359: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine model_2/my_trt_op_8 creation for segment 6, composed of 17 nodes succeeded.
2018-11-29 09:41:59.413435: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine model_2/wide-resnet/my_trt_op_9 creation for segment 7, composed of 223 nodes succeeded.
2018-11-29 09:41:59.587183: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine model_3/output_8_108_56dim/my_trt_op_10 creation for segment 8, composed of 4 nodes succeeded.
2018-11-29 09:42:25.531645: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine model_3/wide-resnet/my_trt_op_11 creation for segment 9, composed of 223 nodes succeeded.
2018-11-29 09:42:25.653696: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine my_trt_op_12 creation for segment 10, composed of 2 nodes succeeded.
2018-11-29 09:42:25.769440: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope ‘’, converted to graph
2018-11-29 09:42:25.769521: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can’t find a device placement for the op!
2018-11-29 09:42:25.787068: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:891] Failed to register segment graphdef as a function 0: Invalid argument: Input 0 of node TensorRTOutputPH_0 was passed float from TopKV2:0 incompatible with expected int32.
2018-11-29 09:42:25.893006: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2018-11-29 09:42:25.929001: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2018-11-29 09:42:26.008819: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2018-11-29 09:42:26.045368: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2018-11-29 09:42:26.127372: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2018-11-29 09:42:26.164375: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2018-11-29 09:42:26.282139: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2018-11-29 09:42:26.318606: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2018-11-29 09:42:26.440117: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2018-11-29 09:42:26.494539: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2018-11-29 09:42:26.614278: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2018-11-29 09:42:26.668181: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2018-11-29 09:42:26.787353: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2018-11-29 09:42:26.841824: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2018-11-29 09:42:26.962737: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2018-11-29 09:42:27.016635: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2018-11-29 09:42:27.044637: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: tf_graph
2018-11-29 09:42:27.044703: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 1014 nodes (-422), 1060 edges (-545), time = 195.78ms.
2018-11-29 09:42:27.044713: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] layout: Graph size after: 1024 nodes (10), 1076 edges (16), time = 62.572ms.
2018-11-29 09:42:27.044719: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 108 nodes (-916), 122 edges (-954), time = 120603.203ms.
2018-11-29 09:42:27.044749: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 108 nodes (0), 122 edges (0), time = 43.359ms.
2018-11-29 09:42:27.044766: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 108 nodes (0), 122 edges (0), time = 83.243ms.
2018-11-29 09:42:27.044774: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: model_2/my_trt_op_8_native_segment
2018-11-29 09:42:27.044780: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 18 nodes (0), 19 edges (0), time = 36.011ms.
2018-11-29 09:42:27.044788: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] layout: Graph size after: 18 nodes (0), 19 edges (0), time = 16.487ms.
2018-11-29 09:42:27.044818: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 18 nodes (0), 19 edges (0), time = 4.844ms.
2018-11-29 09:42:27.044826: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 18 nodes (0), 19 edges (0), time = 31.016ms.
2018-11-29 09:42:27.044839: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 18 nodes (0), 19 edges (0), time = 5.544ms.
2018-11-29 09:42:27.044853: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: model_3/output_8_108_56dim/my_trt_op_10_native_segment
2018-11-29 09:42:27.044863: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 5 nodes (0), 4 edges (0), time = 34.587ms.
2018-11-29 09:42:27.044892: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] layout: Graph size after: 5 nodes (0), 4 edges (0), time = 17.526ms.
2018-11-29 09:42:27.044903: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 5 nodes (0), 4 edges (0), time = 4.659ms.
2018-11-29 09:42:27.044921: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 5 nodes (0), 4 edges (0), time = 31.75ms.
2018-11-29 09:42:27.044931: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 5 nodes (0), 4 edges (0), time = 4.961ms.
2018-11-29 09:42:27.044959: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: model_1/output_4_108_56dim/my_trt_op_6_native_segment
2018-11-29 09:42:27.044993: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 5 nodes (0), 4 edges (0), time = 36.605ms.
2018-11-29 09:42:27.045009: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] layout: Graph size after: 5 nodes (0), 4 edges (0), time = 17.15ms.
2018-11-29 09:42:27.045015: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 5 nodes (0), 4 edges (0), time = 4.381ms.
2018-11-29 09:42:27.045048: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 5 nodes (0), 4 edges (0), time = 32.507ms.
2018-11-29 09:42:27.045055: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 5 nodes (0), 4 edges (0), time = 4.846ms.
2018-11-29 09:42:27.045061: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: model/output_4_108_56dim/my_trt_op_4_native_segment
2018-11-29 09:42:27.045068: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 5 nodes (0), 4 edges (0), time = 36.148ms.
2018-11-29 09:42:27.045074: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] layout: Graph size after: 5 nodes (0), 4 edges (0), time = 18.145ms.
2018-11-29 09:42:27.045089: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 5 nodes (0), 4 edges (0), time = 4.467ms.
2018-11-29 09:42:27.045100: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 5 nodes (0), 4 edges (0), time = 31.955ms.
2018-11-29 09:42:27.045111: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 5 nodes (0), 4 edges (0), time = 4.977ms.
2018-11-29 09:42:27.045122: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: model_1/wide-resnet/my_trt_op_7_native_segment
2018-11-29 09:42:27.045134: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 224 nodes (0), 232 edges (0), time = 55.085ms.
2018-11-29 09:42:27.045144: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] layout: Graph size after: 224 nodes (0), 232 edges (0), time = 32.213ms.
2018-11-29 09:42:27.045164: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 224 nodes (0), 232 edges (0), time = 7.749ms.
2018-11-29 09:42:27.045180: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 224 nodes (0), 232 edges (0), time = 46.68ms.
2018-11-29 09:42:27.045192: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 224 nodes (0), 232 edges (0), time = 8.96ms.
2018-11-29 09:42:27.045203: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: model_2/wide-resnet/my_trt_op_9_native_segment
2018-11-29 09:42:27.045211: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 224 nodes (0), 232 edges (0), time = 47.917ms.
2018-11-29 09:42:27.045228: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] layout: Graph size after: 224 nodes (0), 232 edges (0), time = 29.938ms.
2018-11-29 09:42:27.045237: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 224 nodes (0), 232 edges (0), time = 8.003ms.
2018-11-29 09:42:27.045248: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 224 nodes (0), 232 edges (0), time = 45.782ms.
2018-11-29 09:42:27.045261: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 224 nodes (0), 232 edges (0), time = 8.429ms.
2018-11-29 09:42:27.045272: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: model/wide-resnet/my_trt_op_5_native_segment
2018-11-29 09:42:27.045283: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 224 nodes (0), 232 edges (0), time = 48.493ms.
2018-11-29 09:42:27.045293: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] layout: Graph size after: 224 nodes (0), 232 edges (0), time = 29.765ms.
2018-11-29 09:42:27.045303: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 224 nodes (0), 232 edges (0), time = 7.656ms.
2018-11-29 09:42:27.045313: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 224 nodes (0), 232 edges (0), time = 46.6ms.
2018-11-29 09:42:27.045323: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 224 nodes (0), 232 edges (0), time = 8.956ms.
2018-11-29 09:42:27.045333: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: model_3/wide-resnet/my_trt_op_11_native_segment
2018-11-29 09:42:27.045343: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 224 nodes (0), 232 edges (0), time = 48.345ms.
2018-11-29 09:42:27.045358: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] layout: Graph size after: 224 nodes (0), 232 edges (0), time = 30.467ms.
2018-11-29 09:42:27.045369: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 224 nodes (0), 232 edges (0), time = 7.688ms.
2018-11-29 09:42:27.045382: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 224 nodes (0), 232 edges (0), time = 46.09ms.
2018-11-29 09:42:27.045392: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 224 nodes (0), 232 edges (0), time = 8.902ms.

Is there anything special in these logs?

The weird thing is that when I start the application again with exactly the same graph, it works. The first time it fails, but the second time it runs normally.

So sometimes it works, but sometimes not:

The error message is the same, but sometimes it appears 4 times:

python: cask/shaderlist_impl.h:50: void cask::ShaderList<ShaderType, OperationType>::sortHandles() const [with ShaderType = cask::ConvolutionShader; OperationType = cask::Convolution]: Assertion `((*i)->handle != (*prevI)->handle) && "Internal error: CASK: all shaders must have unique names"' failed.
python: cask/shaderlist_impl.h:50: void cask::ShaderList<ShaderType, OperationType>::sortHandles() const [with ShaderType = cask::ConvolutionShader; OperationType = cask::Convolution]: Assertion `((*i)->handle != (*prevI)->handle) && "Internal error: CASK: all shaders must have unique names"' failed.
python: cask/shaderlist_impl.h:50: void cask::ShaderList<ShaderType, OperationType>::sortHandles() const [with ShaderType = cask::ConvolutionShader; OperationType = cask::Convolution]: Assertion `((*i)->handle != (*prevI)->handle) && "Internal error: CASK: all shaders must have unique names"' failed.
python: cask/shaderlist_impl.h:50: void cask::ShaderList<ShaderType, OperationType>::sortHandles() const [with ShaderType = cask::ConvolutionShader; OperationType = cask::Convolution]: Assertion `((*i)->handle != (*prevI)->handle) && "Internal error: CASK: all shaders must have unique names"' failed.
Aborted (core dumped)

To give some more information: when I start the Python script 20 times, there is a problem only once (the first time). The runs after that do not give any problems.

The optimized graph is loaded on a variable number of GPUs with:

num_devices = len(self.params['cuda_visible_devices'].split(','))
splits = tf.split(images, num_devices)

features = []
for i in range(num_devices):
    # On each GPU the optimized inference graph is loaded
    with tf.device(tf.DeviceSpec(device_type='GPU', device_index=i)):
        input_map = {
            'input_images:0': splits[i],
            'input_centroids:0': centroids
        }

        [features_split] = tf.import_graph_def(classifier_graph_def, input_map=input_map, return_elements=return_elements)

        features.append(features_split)

# Results from all the GPUs will be concatenated again
features = tf.concat(features, axis=0)
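The split/concatenate pattern above can be illustrated in plain NumPy (with a hypothetical small batch size and a toy stand-in for the per-device graph, just to show that splitting the batch across devices and concatenating the per-device results preserves the original batch order):

```python
import numpy as np

num_devices = 4
images_per_batch = 16  # hypothetical small batch for illustration
batch = np.random.rand(images_per_batch, 108, 56, 3).astype(np.float32)

# Divide the batch into one equal split per device (axis 0), like tf.split
splits = np.split(batch, num_devices, axis=0)

# Toy stand-in for the per-device inference graph
def fake_features(x):
    return x.mean(axis=(1, 2, 3))

# Each "device" processes its own split
features = [fake_features(s) for s in splits]

# Concatenate the per-device results, restoring the original batch order
features = np.concatenate(features, axis=0)
```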

Interesting. Are you using DeepStream? If so, which version?

  • Can you share the TRT code you used to optimize the graph?
  • Do you experience the error at this line?
[features_split] = tf.import_graph_def(classifier_graph_def, input_map=input_map, return_elements=return_elements)

No, I do not use DeepStream.

  • The error occurs when calling sess.run() for the first time, so after loading the graph.

  • The TRT code:

import os
import glog as log
import tensorflow as tf
from tensorflow.contrib import tensorrt as trt

pre, ext = os.path.splitext(params['track_ckpt_path'])
json_model_path = pre + '.json'
with open(json_model_path, 'r') as f:
    tracking_model = tf.keras.models.model_from_json(f.read())
    tracking_model.load_weights(params['track_ckpt_path'])

num_devices = len(params['cuda_visible_devices'].split(','))

# Create input placeholders
images = tf.placeholder(tf.float32, shape=(int(params['images_per_batch'] / num_devices),) + params['input_shape'], name='input_images')

# Build the inference graph

tracking_features = tracking_model(images)

# Use tf.identity to give names

tracking_features = tf.identity(tracking_features, name='output_tracking_features')

# Specify node names for inputs and outputs
input_node_names = [
    'input_images'
]

output_node_names = [
    'output_tracking_features'
]

# Create a frozen graph
with tf.keras.backend.get_session() as sess:
    output_graph_def = tf.graph_util.convert_variables_to_constants(
        sess, # The session is used to retrieve the weights
        tf.get_default_graph().as_graph_def(), # The graph_def is used to retrieve the nodes 
        output_node_names # The output node names are used to select the useful nodes
    )

for node in output_graph_def.node:
    print(node.name)

trt_graph_def = trt.create_inference_graph(
    input_graph_def=output_graph_def,
    minimum_segment_size=2,
    outputs=input_node_names+output_node_names,
    max_batch_size=int(params['images_per_batch']/num_devices),
    max_workspace_size_bytes=params['workspace_size_bytes'],
    precision_mode=params['precision_mode'])

#trt_graph_def=trt.calib_graph_to_infer_graph(trt_graph_def) # For only 'INT8'
log.info('Generated TensorRT graph def')

with tf.gfile.GFile(params['output_path'], 'wb') as f:
    f.write(trt_graph_def.SerializeToString())
log.info('%d ops in the final graph.' % len(trt_graph_def.node))

So there are no errors when creating the TRT graph. The error occurs the first time after the Google VM is started; after that it runs fine.

I have all the code to fully reproduce the error. The code runs on a VM in the Google cloud with 4 P100 GPUs. Driver version is 410.73. If needed, I can provide the Keras model file.

Code to create the optimized graph:

import os
import glog as log
import tensorflow as tf
from tensorflow.contrib import tensorrt as trt

tf.keras.backend.set_learning_phase(0)

params = {}
params['track_ckpt_path'] = '/models/track.h5'
params['output_path'] = '/models/optimized_model.pb'

params['images_per_batch'] = 4096
params['cuda_visible_devices'] = '0,1,2,3'
params['input_shape'] = (108,56,3)
params['workspace_size_bytes'] = 1 << 30
params['precision_mode'] = 'FP16' # use 'FP32' for K80

pre, ext = os.path.splitext(params['track_ckpt_path'])
json_model_path = pre + '.json'
with open(json_model_path, 'r') as f:
    tracking_model = tf.keras.models.model_from_json(f.read())
    tracking_model.load_weights(params['track_ckpt_path'])

num_devices = len(params['cuda_visible_devices'].split(','))

# Create input placeholders
images = tf.placeholder(tf.float32, shape=(int(params['images_per_batch'] / num_devices),) + params['input_shape'], name='input_images')

# Build the inference graph

tracking_features = tracking_model(images)

# Use tf.identity to give names
tracking_features = tf.identity(tracking_features, name='output_tracking_features')

# Specify node names for inputs and outputs
input_node_names = [
    'input_images'
]

output_node_names = [
    'output_tracking_features'
]

# Create a frozen graph
with tf.keras.backend.get_session() as sess:
    output_graph_def = tf.graph_util.convert_variables_to_constants(
        sess, # The session is used to retrieve the weights
        tf.get_default_graph().as_graph_def(), # The graph_def is used to retrieve the nodes 
        output_node_names # The output node names are used to select the useful nodes
    )

for node in output_graph_def.node:
    print(node.name)

trt_graph_def = trt.create_inference_graph(
    input_graph_def=output_graph_def,
    minimum_segment_size=2,
    outputs=input_node_names+output_node_names,
    max_batch_size=int(params['images_per_batch']/num_devices),
    max_workspace_size_bytes=params['workspace_size_bytes'],
    precision_mode=params['precision_mode'])

#trt_graph_def=trt.calib_graph_to_infer_graph(trt_graph_def) # For only 'INT8'
log.info('Generated TensorRT graph def')

with tf.gfile.GFile(params['output_path'], 'wb') as f:
    f.write(trt_graph_def.SerializeToString())
log.info('%d ops in the final graph.' % len(trt_graph_def.node))

And for inference:

import numpy as np
import os
import glog as log
import tensorflow as tf
from tensorflow.contrib import tensorrt as trt

params = {}
params['model_file'] = '/models/optimized_model.pb'
params['images_per_batch'] = 4096
params['per_process_gpu_memory_fraction'] = 0.6
params['cuda_visible_devices'] = '0,1,2,3'

# Load the computation graph
classifier_model_file = params['model_file']
classifier_graph_def = tf.GraphDef()
with tf.gfile.Open(classifier_model_file, 'rb') as f:
    data = f.read()
    classifier_graph_def.ParseFromString(data)
    log.info('Loaded classifier graph definition')

trt_gpu_ops = tf.GPUOptions(per_process_gpu_memory_fraction = params['per_process_gpu_memory_fraction'])

return_elements = [
    'output_tracking_features:0'
]

# On the CPU all the preprocessing of batches is done
with tf.device('/cpu:0'):
    def gen():
        while True:
            yield np.random.rand(params['images_per_batch'], 108, 56, 3)

    dataset = tf.data.Dataset.from_generator(gen, (tf.float32)) 
    iterator = dataset.make_one_shot_iterator()

    images = iterator.get_next()

    # The batch is divided into one split for each GPU
    num_devices = len(params['cuda_visible_devices'].split(','))
    splits = tf.split(images, num_devices)
    
    tracking_features = []

for i in range(num_devices):
    # On each GPU the optimized inference graph is loaded
    with tf.device(tf.DeviceSpec(device_type='GPU', device_index=i)):
        input_map = {
            'input_images:0': splits[i]
        }

        [tracking_features_split] = tf.import_graph_def(classifier_graph_def, input_map=input_map, return_elements=return_elements)

        tracking_features.append(tracking_features_split)

# Results from all the GPUs will be concatenated again
tracking_features = tf.concat(tracking_features, axis=0)

# Not all GPU memory is used for Tensorflow, also a part for TensorRT engines
with tf.Session(config=tf.ConfigProto(gpu_options=trt_gpu_ops)) as sess:
    while True:
        try:
            result = sess.run([tracking_features])
            print('No error!')

        except tf.errors.OutOfRangeError:
            break

Is this possibly a bug in TensorRT, or is there a way I can solve this?

Hello,

We are triaging this issue. It is possibly a TF-TRT issue. We will keep you updated.

@NVES any updates on this issue?

@d.mus, this seems to be a TensorFlow-TensorRT integration issue; we are triaging this appropriately.

@NVES Same problem here. It seems to only happen on multi-GPU nodes.

@NVES Happy new year! We still have the same problem, even with TensorRT. It sometimes happens at the start of our application. When do you expect to have an update on this?

Do we have any updates on this?

Same problem here. Any idea when we can expect a fix or a workaround?

This is being discussed. No updates yet.

@NVES any idea when we can expect an update?

I realized that for me the problem occurred in the following scenario:
I wanted to use multiple GPUs for inference, with each GPU managed by a different CPU thread.
When all CPU threads called deserializeCudaEngine and createExecutionContext at the same time, I saw the error.

If I call deserializeCudaEngine and createExecutionContext in a critical section using a mutex lock, the error goes away.
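Roughly, the workaround can be sketched like this (shown with Python threading for illustration; `load_engine` is a hypothetical stand-in for the deserializeCudaEngine + createExecutionContext calls, which appear not to be safe to run concurrently):

```python
import threading

engine_init_lock = threading.Lock()

def load_engine(device_index):
    # Hypothetical stand-in for deserializeCudaEngine + createExecutionContext.
    # The real calls are TensorRT API calls; only the locking pattern matters here.
    return {'device': device_index}

def worker(device_index, results):
    # Critical section: serialize engine/context creation across threads.
    # Inference itself can still run concurrently once each thread
    # has its own context.
    with engine_init_lock:
        context = load_engine(device_index)
    results[device_index] = context

results = {}
threads = [threading.Thread(target=worker, args=(i, results)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```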

Hope it helps.

ankit11398, I have the same problem as you. Did you find out why creating engines from two threads causes the problem?

Have you solved the problem?