Internal error: CASK: all shaders must have unique names

Hi, I am running my application on a Google Cloud VM with 4 NVIDIA P100 GPUs, using this container: https://docs.nvidia.com/deeplearning/dgx/tensorflow-release-notes/rel_18.11.html

Now I am getting the following error when loading the optimized graph (optimized with TensorRT) in TensorFlow:

python: cask/shaderlist_impl.h:50: void cask::ShaderList<ShaderType, OperationType>::sortHandles() const [with ShaderType = cask::ConvolutionShader; OperationType = cask::Convolution]: Assertion `((*i)->handle != (*prevI)->handle) && "Internal error: CASK: all shaders must have unique names"' failed.

The graph is loaded with:
tf.import_graph_def(classifier_graph_def, input_map=input_map, return_elements=return_elements)

How can I solve this?

Hello,

This is usually due to dependency issues among different libraries. Was your graph optimized with a different version of TF or TRT than the ones in the 18.11 container?

No, the graph optimization was also done in the same container.

The log output when doing the optimization:

INFO:tensorflow:Running against TensorRT version 5.0.2
I1129 09:40:24.167615 28 tf_logging.py:115] Running against TensorRT version 5.0.2
2018-11-29 09:40:24.651538: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 4
2018-11-29 09:40:24.654939: I tensorflow/core/grappler/clusters/single_machine.cc:359] Starting new session
2018-11-29 09:40:24.655574: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1, 2, 3
2018-11-29 09:40:24.655797: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-29 09:40:24.655842: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0 1 2 3
2018-11-29 09:40:24.655851: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N Y N N
2018-11-29 09:40:24.655868: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1: Y N N N
2018-11-29 09:40:24.655884: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 2: N N N Y
2018-11-29 09:40:24.655890: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 3: N N Y N
2018-11-29 09:40:24.656901: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15030 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:04.0, compute capability: 6.0)
2018-11-29 09:40:24.657268: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 15108 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:05.0, compute capability: 6.0)
2018-11-29 09:40:24.657619: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 15108 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:06.0, compute capability: 6.0)
2018-11-29 09:40:24.657827: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 15108 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:07.0, compute capability: 6.0)
2018-11-29 09:40:25.191757: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:853] MULTIPLE tensorrt candidate conversion: 13
2018-11-29 09:40:25.192324: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope ‘’, converted to graph
2018-11-29 09:40:25.192363: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can’t find a device placement for the op!
2018-11-29 09:40:25.198213: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope ‘’, converted to graph
2018-11-29 09:40:25.198264: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can’t find a device placement for the op!
2018-11-29 09:40:25.199297: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope ‘’, converted to graph
2018-11-29 09:40:25.199332: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can’t find a device placement for the op!
2018-11-29 09:40:25.199710: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:891] Failed to register segment graphdef as a function 2: Invalid argument: Input 0 of node TensorRTOutputPH_0 was passed float from TopKV2:0 incompatible with expected int32.
2018-11-29 09:40:25.200140: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope ‘’, converted to graph
2018-11-29 09:40:25.200172: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can’t find a device placement for the op!
2018-11-29 09:40:25.200506: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:891] Failed to register segment graphdef as a function 3: Invalid argument: Input 0 of node TensorRTOutputPH_0 was passed float from TopKV2_1:0 incompatible with expected int32.
2018-11-29 09:40:25.200745: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope ‘model/output_4_108_56dim/’, converted to graph
2018-11-29 09:40:25.200777: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can’t find a device placement for the op!
2018-11-29 09:40:25.204282: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope ‘model/wide-resnet/’, converted to graph
2018-11-29 09:40:25.204326: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can’t find a device placement for the op!
2018-11-29 09:40:25.214711: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope ‘model_1/output_4_108_56dim/’, converted to graph
2018-11-29 09:40:25.214763: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can’t find a device placement for the op!
2018-11-29 09:40:25.219756: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope ‘model_1/wide-resnet/’, converted to graph
2018-11-29 09:40:25.219795: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can’t find a device placement for the op!
2018-11-29 09:40:25.233015: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope ‘model_2/’, converted to graph
2018-11-29 09:40:25.233052: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can’t find a device placement for the op!
2018-11-29 09:40:25.242917: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope ‘model_2/wide-resnet/’, converted to graph
2018-11-29 09:40:25.242963: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can’t find a device placement for the op!
2018-11-29 09:40:25.257458: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope ‘model_3/output_8_108_56dim/’, converted to graph
2018-11-29 09:40:25.257501: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can’t find a device placement for the op!
2018-11-29 09:40:25.272904: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope ‘model_3/wide-resnet/’, converted to graph
2018-11-29 09:40:25.272980: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can’t find a device placement for the op!
2018-11-29 09:40:25.289882: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope ‘’, converted to graph
2018-11-29 09:40:25.289936: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can’t find a device placement for the op!
2018-11-29 09:40:40.418073: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine my_trt_op_0 creation for segment 0, composed of 2 nodes succeeded.
2018-11-29 09:40:40.450207: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine my_trt_op_1 creation for segment 1, composed of 2 nodes succeeded.
2018-11-29 09:40:40.641688: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine model/output_4_108_56dim/my_trt_op_4 creation for segment 2, composed of 4 nodes succeeded.
2018-11-29 09:41:06.770987: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine model/wide-resnet/my_trt_op_5 creation for segment 3, composed of 223 nodes succeeded.
2018-11-29 09:41:06.943108: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine model_1/output_4_108_56dim/my_trt_op_6 creation for segment 4, composed of 4 nodes succeeded.
2018-11-29 09:41:32.902693: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine model_1/wide-resnet/my_trt_op_7 creation for segment 5, composed of 223 nodes succeeded.
2018-11-29 09:41:33.474359: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine model_2/my_trt_op_8 creation for segment 6, composed of 17 nodes succeeded.
2018-11-29 09:41:59.413435: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine model_2/wide-resnet/my_trt_op_9 creation for segment 7, composed of 223 nodes succeeded.
2018-11-29 09:41:59.587183: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine model_3/output_8_108_56dim/my_trt_op_10 creation for segment 8, composed of 4 nodes succeeded.
2018-11-29 09:42:25.531645: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine model_3/wide-resnet/my_trt_op_11 creation for segment 9, composed of 223 nodes succeeded.
2018-11-29 09:42:25.653696: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine my_trt_op_12 creation for segment 10, composed of 2 nodes succeeded.
2018-11-29 09:42:25.769440: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope ‘’, converted to graph
2018-11-29 09:42:25.769521: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can’t find a device placement for the op!
2018-11-29 09:42:25.787068: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:891] Failed to register segment graphdef as a function 0: Invalid argument: Input 0 of node TensorRTOutputPH_0 was passed float from TopKV2:0 incompatible with expected int32.
2018-11-29 09:42:25.893006: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2018-11-29 09:42:25.929001: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2018-11-29 09:42:26.008819: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2018-11-29 09:42:26.045368: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2018-11-29 09:42:26.127372: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2018-11-29 09:42:26.164375: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2018-11-29 09:42:26.282139: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2018-11-29 09:42:26.318606: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2018-11-29 09:42:26.440117: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2018-11-29 09:42:26.494539: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2018-11-29 09:42:26.614278: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2018-11-29 09:42:26.668181: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2018-11-29 09:42:26.787353: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2018-11-29 09:42:26.841824: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2018-11-29 09:42:26.962737: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2018-11-29 09:42:27.016635: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must NOT be called on function objects.
2018-11-29 09:42:27.044637: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: tf_graph
2018-11-29 09:42:27.044703: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 1014 nodes (-422), 1060 edges (-545), time = 195.78ms.
2018-11-29 09:42:27.044713: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] layout: Graph size after: 1024 nodes (10), 1076 edges (16), time = 62.572ms.
2018-11-29 09:42:27.044719: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 108 nodes (-916), 122 edges (-954), time = 120603.203ms.
2018-11-29 09:42:27.044749: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 108 nodes (0), 122 edges (0), time = 43.359ms.
2018-11-29 09:42:27.044766: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 108 nodes (0), 122 edges (0), time = 83.243ms.
2018-11-29 09:42:27.044774: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: model_2/my_trt_op_8_native_segment
2018-11-29 09:42:27.044780: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 18 nodes (0), 19 edges (0), time = 36.011ms.
2018-11-29 09:42:27.044788: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] layout: Graph size after: 18 nodes (0), 19 edges (0), time = 16.487ms.
2018-11-29 09:42:27.044818: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 18 nodes (0), 19 edges (0), time = 4.844ms.
2018-11-29 09:42:27.044826: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 18 nodes (0), 19 edges (0), time = 31.016ms.
2018-11-29 09:42:27.044839: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 18 nodes (0), 19 edges (0), time = 5.544ms.
2018-11-29 09:42:27.044853: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: model_3/output_8_108_56dim/my_trt_op_10_native_segment
2018-11-29 09:42:27.044863: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 5 nodes (0), 4 edges (0), time = 34.587ms.
2018-11-29 09:42:27.044892: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] layout: Graph size after: 5 nodes (0), 4 edges (0), time = 17.526ms.
2018-11-29 09:42:27.044903: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 5 nodes (0), 4 edges (0), time = 4.659ms.
2018-11-29 09:42:27.044921: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 5 nodes (0), 4 edges (0), time = 31.75ms.
2018-11-29 09:42:27.044931: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 5 nodes (0), 4 edges (0), time = 4.961ms.
2018-11-29 09:42:27.044959: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: model_1/output_4_108_56dim/my_trt_op_6_native_segment
2018-11-29 09:42:27.044993: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 5 nodes (0), 4 edges (0), time = 36.605ms.
2018-11-29 09:42:27.045009: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] layout: Graph size after: 5 nodes (0), 4 edges (0), time = 17.15ms.
2018-11-29 09:42:27.045015: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 5 nodes (0), 4 edges (0), time = 4.381ms.
2018-11-29 09:42:27.045048: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 5 nodes (0), 4 edges (0), time = 32.507ms.
2018-11-29 09:42:27.045055: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 5 nodes (0), 4 edges (0), time = 4.846ms.
2018-11-29 09:42:27.045061: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: model/output_4_108_56dim/my_trt_op_4_native_segment
2018-11-29 09:42:27.045068: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 5 nodes (0), 4 edges (0), time = 36.148ms.
2018-11-29 09:42:27.045074: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] layout: Graph size after: 5 nodes (0), 4 edges (0), time = 18.145ms.
2018-11-29 09:42:27.045089: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 5 nodes (0), 4 edges (0), time = 4.467ms.
2018-11-29 09:42:27.045100: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 5 nodes (0), 4 edges (0), time = 31.955ms.
2018-11-29 09:42:27.045111: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 5 nodes (0), 4 edges (0), time = 4.977ms.
2018-11-29 09:42:27.045122: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: model_1/wide-resnet/my_trt_op_7_native_segment
2018-11-29 09:42:27.045134: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 224 nodes (0), 232 edges (0), time = 55.085ms.
2018-11-29 09:42:27.045144: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] layout: Graph size after: 224 nodes (0), 232 edges (0), time = 32.213ms.
2018-11-29 09:42:27.045164: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 224 nodes (0), 232 edges (0), time = 7.749ms.
2018-11-29 09:42:27.045180: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 224 nodes (0), 232 edges (0), time = 46.68ms.
2018-11-29 09:42:27.045192: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 224 nodes (0), 232 edges (0), time = 8.96ms.
2018-11-29 09:42:27.045203: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: model_2/wide-resnet/my_trt_op_9_native_segment
2018-11-29 09:42:27.045211: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 224 nodes (0), 232 edges (0), time = 47.917ms.
2018-11-29 09:42:27.045228: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] layout: Graph size after: 224 nodes (0), 232 edges (0), time = 29.938ms.
2018-11-29 09:42:27.045237: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 224 nodes (0), 232 edges (0), time = 8.003ms.
2018-11-29 09:42:27.045248: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 224 nodes (0), 232 edges (0), time = 45.782ms.
2018-11-29 09:42:27.045261: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 224 nodes (0), 232 edges (0), time = 8.429ms.
2018-11-29 09:42:27.045272: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: model/wide-resnet/my_trt_op_5_native_segment
2018-11-29 09:42:27.045283: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 224 nodes (0), 232 edges (0), time = 48.493ms.
2018-11-29 09:42:27.045293: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] layout: Graph size after: 224 nodes (0), 232 edges (0), time = 29.765ms.
2018-11-29 09:42:27.045303: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 224 nodes (0), 232 edges (0), time = 7.656ms.
2018-11-29 09:42:27.045313: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 224 nodes (0), 232 edges (0), time = 46.6ms.
2018-11-29 09:42:27.045323: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 224 nodes (0), 232 edges (0), time = 8.956ms.
2018-11-29 09:42:27.045333: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: model_3/wide-resnet/my_trt_op_11_native_segment
2018-11-29 09:42:27.045343: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 224 nodes (0), 232 edges (0), time = 48.345ms.
2018-11-29 09:42:27.045358: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] layout: Graph size after: 224 nodes (0), 232 edges (0), time = 30.467ms.
2018-11-29 09:42:27.045369: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 224 nodes (0), 232 edges (0), time = 7.688ms.
2018-11-29 09:42:27.045382: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 224 nodes (0), 232 edges (0), time = 46.09ms.
2018-11-29 09:42:27.045392: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 224 nodes (0), 232 edges (0), time = 8.902ms.

Is there anything special in these logs?

The weird thing is that when I start the application again with exactly the same graph, it works. The first time it fails, but the second time it runs normally.

So sometimes it works, but sometimes not:

The error message is the same, but sometimes it appears 4 times:

python: cask/shaderlist_impl.h:50: void cask::ShaderList<ShaderType, OperationType>::sortHandles() const [with ShaderType = cask::ConvolutionShader; OperationType = cask::Convolution]: Assertion `((*i)->handle != (*prevI)->handle) && "Internal error: CASK: all shaders must have unique names"' failed.
python: cask/shaderlist_impl.h:50: void cask::ShaderList<ShaderType, OperationType>::sortHandles() const [with ShaderType = cask::ConvolutionShader; OperationType = cask::Convolution]: Assertion `((*i)->handle != (*prevI)->handle) && "Internal error: CASK: all shaders must have unique names"' failed.
python: cask/shaderlist_impl.h:50: void cask::ShaderList<ShaderType, OperationType>::sortHandles() const [with ShaderType = cask::ConvolutionShader; OperationType = cask::Convolution]: Assertion `((*i)->handle != (*prevI)->handle) && "Internal error: CASK: all shaders must have unique names"' failed.
python: cask/shaderlist_impl.h:50: void cask::ShaderList<ShaderType, OperationType>::sortHandles() const [with ShaderType = cask::ConvolutionShader; OperationType = cask::Convolution]: Assertion `((*i)->handle != (*prevI)->handle) && "Internal error: CASK: all shaders must have unique names"' failed.
Aborted (core dumped)

To give some more information: when I start the Python script 20 times, there is a problem only once (the first time). The runs after that do not give any problems.

The optimized graph is loaded on a variable number of GPUs with:

num_devices = len(self.params['cuda_visible_devices'].split(','))
splits = tf.split(images, num_devices)

features = []
for i in range(num_devices):
    # On each GPU the optimized inference graph is loaded
    with tf.device(tf.DeviceSpec(device_type='GPU', device_index=i)):
        input_map = {
            'input_images:0': splits[i],
            'input_centroids:0': centroids
        }

        [features_split] = tf.import_graph_def(classifier_graph_def, input_map=input_map, return_elements=return_elements)

        features.append(features_split)

# Results from all the GPUs will be concatenated again
features = tf.concat(features, axis=0)
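The split/concatenate pattern above can be illustrated in plain NumPy (with a hypothetical small batch size and a toy stand-in for the per-device graph, just to show that splitting the batch across devices and concatenating the per-device results preserves the original batch order):

```python
import numpy as np

num_devices = 4
images_per_batch = 16  # hypothetical small batch for illustration
batch = np.random.rand(images_per_batch, 108, 56, 3).astype(np.float32)

# Divide the batch into one equal split per device (axis 0), like tf.split
splits = np.split(batch, num_devices, axis=0)

# Toy stand-in for the per-device inference graph
def fake_features(x):
    return x.mean(axis=(1, 2, 3))

# Each "device" processes its own split
features = [fake_features(s) for s in splits]

# Concatenate the per-device results, restoring the original batch order
features = np.concatenate(features, axis=0)
```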

Interesting. Are you using DeepStream? If so, which version?

  • Can you share the TRT code you used to optimize the graph?
  • Do you experience the error at this line?
[features_split] = tf.import_graph_def(classifier_graph_def, input_map=input_map, return_elements=return_elements)

No, I do not use DeepStream.

  • The error occurs when calling sess.run() for the first time, so after loading the graph.

  • The TRT code:

import os
import glog as log
import tensorflow as tf
from tensorflow.contrib import tensorrt as trt

pre, ext = os.path.splitext(params['track_ckpt_path'])
json_model_path = pre + '.json'
with open(json_model_path, 'r') as f:
    tracking_model = tf.keras.models.model_from_json(f.read())
    tracking_model.load_weights(params['track_ckpt_path'])

num_devices = len(params['cuda_visible_devices'].split(','))

# Create input placeholders
images = tf.placeholder(tf.float32, shape=(int(params['images_per_batch'] / num_devices),) + params['input_shape'], name='input_images')

# Build the inference graph

tracking_features = tracking_model(images)

# Use tf.identity to give names

tracking_features = tf.identity(tracking_features, name='output_tracking_features')

# Specify node names for inputs and outputs
input_node_names = [
    'input_images'
]

output_node_names = [
    'output_tracking_features'
]

# Create a frozen graph
with tf.keras.backend.get_session() as sess:
    output_graph_def = tf.graph_util.convert_variables_to_constants(
        sess, # The session is used to retrieve the weights
        tf.get_default_graph().as_graph_def(), # The graph_def is used to retrieve the nodes 
        output_node_names # The output node names are used to select the useful nodes
    )

for node in output_graph_def.node:
    print(node.name)

trt_graph_def = trt.create_inference_graph(
    input_graph_def=output_graph_def,
    minimum_segment_size=2,
    outputs=input_node_names+output_node_names,
    max_batch_size=int(params['images_per_batch']/num_devices),
    max_workspace_size_bytes=params['workspace_size_bytes'],
    precision_mode=params['precision_mode'])

#trt_graph_def=trt.calib_graph_to_infer_graph(trt_graph_def) # For only 'INT8'
log.info('Generated TensorRT graph def')

with tf.gfile.GFile(params['output_path'], 'wb') as f:
    f.write(trt_graph_def.SerializeToString())
log.info('%d ops in the final graph.' % len(trt_graph_def.node))

So there are no errors when creating the TRT graph. The error occurs the first time after the Google VM is started; after that it runs fine.

I have all the code to fully reproduce the error. The code runs on a VM in the Google cloud with 4 P100 GPUs. Driver version is 410.73. If needed, I can provide the Keras model file.

Code to create the optimized graph:

import os
import glog as log
import tensorflow as tf
from tensorflow.contrib import tensorrt as trt

tf.keras.backend.set_learning_phase(0)

params = {}
params['track_ckpt_path'] = '/models/track.h5'
params['output_path'] = '/models/optimized_model.pb'

params['images_per_batch'] = 4096
params['cuda_visible_devices'] = '0,1,2,3'
params['input_shape'] = (108,56,3)
params['workspace_size_bytes'] = 1 << 30
params['precision_mode'] = 'FP16' # use 'FP32' for K80

pre, ext = os.path.splitext(params['track_ckpt_path'])
json_model_path = pre + '.json'
with open(json_model_path, 'r') as f:
    tracking_model = tf.keras.models.model_from_json(f.read())
    tracking_model.load_weights(params['track_ckpt_path'])

num_devices = len(params['cuda_visible_devices'].split(','))

# Create input placeholders
images = tf.placeholder(tf.float32, shape=(int(params['images_per_batch'] / num_devices),) + params['input_shape'], name='input_images')

# Build the inference graph

tracking_features = tracking_model(images)

# Use tf.identity to give names
tracking_features = tf.identity(tracking_features, name='output_tracking_features')

# Specify node names for inputs and outputs
input_node_names = [
    'input_images'
]

output_node_names = [
    'output_tracking_features'
]

# Create a frozen graph
with tf.keras.backend.get_session() as sess:
    output_graph_def = tf.graph_util.convert_variables_to_constants(
        sess, # The session is used to retrieve the weights
        tf.get_default_graph().as_graph_def(), # The graph_def is used to retrieve the nodes 
        output_node_names # The output node names are used to select the useful nodes
    )

for node in output_graph_def.node:
    print(node.name)

trt_graph_def = trt.create_inference_graph(
    input_graph_def=output_graph_def,
    minimum_segment_size=2,
    outputs=input_node_names+output_node_names,
    max_batch_size=int(params['images_per_batch']/num_devices),
    max_workspace_size_bytes=params['workspace_size_bytes'],
    precision_mode=params['precision_mode'])

#trt_graph_def=trt.calib_graph_to_infer_graph(trt_graph_def) # For only 'INT8'
log.info('Generated TensorRT graph def')

with tf.gfile.GFile(params['output_path'], 'wb') as f:
    f.write(trt_graph_def.SerializeToString())
log.info('%d ops in the final graph.' % len(trt_graph_def.node))

And for inference:

import numpy as np
import os
import glog as log
import tensorflow as tf
from tensorflow.contrib import tensorrt as trt

params = {}
params['model_file'] = '/models/optimized_model.pb'
params['images_per_batch'] = 4096
params['per_process_gpu_memory_fraction'] = 0.6
params['cuda_visible_devices'] = '0,1,2,3'

# Load the computation graph
classifier_model_file = params['model_file']
classifier_graph_def = tf.GraphDef()
with tf.gfile.Open(classifier_model_file, 'rb') as f:
    data = f.read()
    classifier_graph_def.ParseFromString(data)
    log.info('Loaded classifier graph definition')

trt_gpu_ops = tf.GPUOptions(per_process_gpu_memory_fraction = params['per_process_gpu_memory_fraction'])

return_elements = [
    'output_tracking_features:0'
]

# On the CPU all the preprocessing of batches is done
with tf.device('/cpu:0'):
    def gen():
        while True:
            yield np.random.rand(params['images_per_batch'], 108, 56, 3)

    dataset = tf.data.Dataset.from_generator(gen, (tf.float32)) 
    iterator = dataset.make_one_shot_iterator()

    images = iterator.get_next()

    # The batch is divided into one split for each GPU
    num_devices = len(params['cuda_visible_devices'].split(','))
    splits = tf.split(images, num_devices)
    
    tracking_features = []

for i in range(num_devices):
    # On each GPU the optimized inference graph is loaded
    with tf.device(tf.DeviceSpec(device_type='GPU', device_index=i)):
        input_map = {
            'input_images:0': splits[i]
        }

        [tracking_features_split] = tf.import_graph_def(classifier_graph_def, input_map=input_map, return_elements=return_elements)

        tracking_features.append(tracking_features_split)

# Results from all the GPUs will be concatenated again
tracking_features = tf.concat(tracking_features, axis=0)

# Not all GPU memory is used for Tensorflow, also a part for TensorRT engines
with tf.Session(config=tf.ConfigProto(gpu_options=trt_gpu_ops)) as sess:
    while True:
        try:
            result = sess.run([tracking_features])
            print('No error!')

        except tf.errors.OutOfRangeError:
            break

Is this possibly a bug in TensorRT, or is there a way I can solve this?

Hello,

We are triaging this issue. It is possibly a TF-TRT issue. We will keep you updated.

@NVES any updates on this issue?

@d.mus, this seems to be a TensorFlow-TensorRT integration issue; we are triaging this appropriately.

@NVES Same problem here. It seems to only happen on multi-GPU nodes.

@NVES Happy new year! We still have the same problem, even with TensorRT. It sometimes happens at the start of our application. When do you expect to have an update on this?

Do we have any updates on this?

Same problem here. Any idea when we can expect a fix or a workaround?

This is being discussed. No updates yet.

@NVES any idea when we can expect an update?

I realized that for me the problem occurred in the following scenario:
I wanted to use multiple GPUs for inference, with each GPU managed by a different CPU thread.
When all CPU threads called deserializeCudaEngine and createExecutionContext at the same time, I saw the error.

If I call deserializeCudaEngine and createExecutionContext in a critical section using a mutex lock, the error goes away.
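Roughly, the workaround can be sketched like this (shown with Python threading for illustration; `load_engine` is a hypothetical stand-in for the deserializeCudaEngine + createExecutionContext calls, which appear not to be safe to run concurrently):

```python
import threading

engine_init_lock = threading.Lock()

def load_engine(device_index):
    # Hypothetical stand-in for deserializeCudaEngine + createExecutionContext.
    # The real calls are TensorRT API calls; only the locking pattern matters here.
    return {'device': device_index}

def worker(device_index, results):
    # Critical section: serialize engine/context creation across threads.
    # Inference itself can still run concurrently once each thread
    # has its own context.
    with engine_init_lock:
        context = load_engine(device_index)
    results[device_index] = context

results = {}
threads = [threading.Thread(target=worker, args=(i, results)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```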

Hope it helps.

ankit11398, I have the same problem as you. Did you find out why creating engines from two threads causes the problem?

Have you solved the problem?