Need help: Non-OK-status: CudaLaunchKernel( SwapDimension1And2InTensor3UsingTiles

puj · June 18, 2019, 7:39am

When I try to run inference with the TensorFlow C++ API (v1.14.0), I get the following error:

2019-06-18 03:18:48.253160: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-06-18 03:18:48.276641: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library nvcuda.dll
2019-06-18 03:18:48.471959: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: TITAN Xp major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:b3:00.0
2019-06-18 03:18:48.475413: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-06-18 03:18:48.498737: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-06-18 03:21:30.304837: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-18 03:21:30.307133: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-06-18 03:21:30.308538: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2019-06-18 03:21:30.320300: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9578 MB memory) → physical GPU (device: 0, name: TITAN Xp, pci bus id: 0000:b3:00.0, compute capability: 6.1)
Successfully setup the session and load the graph…(b)
loading test_image…
loading the image is done
2019-06-18 03:21:37.759369: F .\tensorflow/core/kernels/conv_2d_gpu.h:935] Non-OK-status: CudaLaunchKernel( SwapDimension1And2InTensor3UsingTiles<T, kNumThreads, kTileSize, kTileSize, conjugate>, total_tiles_count, kNumThreads, 0, d.stream(), input, input_dims, output) status: Internal: invalid configuration argument

Can anybody provide some help?
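
A note on the message itself: in general, CUDA reports "invalid configuration argument" when a kernel launch asks for a grid size, block size, or shared-memory amount that the device does not accept. In the failing call the grid size is total_tiles_count, the block size is kNumThreads, and the shared-memory argument is 0. The following is only a sketch of that general rule, assuming the documented launch limits for compute capability 6.1 (the TITAN Xp in the log). The helper name launch_config_problems is hypothetical, and nothing in this thread confirms which value, if any, was out of range.

```python
# Launch limits for compute capability 6.1, as listed in the CUDA
# programming guide's compute-capability table.
MAX_GRID_DIM_X = 2**31 - 1
MAX_THREADS_PER_BLOCK = 1024
MAX_SHARED_MEM_PER_BLOCK = 48 * 1024  # bytes


def launch_config_problems(grid_x, block_x, shared_mem_bytes=0):
    """Return the reasons a 1-D launch would be rejected (hypothetical helper)."""
    problems = []
    if not 1 <= grid_x <= MAX_GRID_DIM_X:
        problems.append(f"grid size {grid_x} outside [1, {MAX_GRID_DIM_X}]")
    if not 1 <= block_x <= MAX_THREADS_PER_BLOCK:
        problems.append(f"block size {block_x} outside [1, {MAX_THREADS_PER_BLOCK}]")
    if shared_mem_bytes > MAX_SHARED_MEM_PER_BLOCK:
        problems.append(
            f"shared memory {shared_mem_bytes} B exceeds {MAX_SHARED_MEM_PER_BLOCK} B"
        )
    return problems


# A tile count of zero would be rejected as an invalid grid size.
print(launch_config_problems(grid_x=0, block_x=256))
# A non-empty launch within the limits reports no problems.
print(launch_config_problems(grid_x=4096, block_x=256))
```

If this applies, logging the shape of the tensor that reaches the transpose would show whether the tile count could be zero or very large.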

Summer_2020 · July 17, 2020, 9:54am

Hi puj, have you resolved this issue? I ran into the same error with tensorflow-gpu 1.14 and CUDA 10.0. I would appreciate any clue.

jazeel.jk · October 8, 2021, 8:02pm

I get the same error when I try to build a TensorRT engine from a TF SavedModel:

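import tensorflow as tf

input_saved_model_dir = "tf_saved_model"
output_saved_model_dir = "tf_trt_engine"
# Convert the SavedModel, build the TensorRT engines, then save the result.
converter = tf.experimental.tensorrt.Converter(input_saved_model_dir=input_saved_model_dir)
converter.convert()
# my_input_fn (not shown in the post) is a generator that yields example inputs.
converter.build(input_fn=my_input_fn)
converter.save(output_saved_model_dir)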
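
The post does not show my_input_fn. For readers trying to reproduce this, here is a minimal sketch of what such a generator might look like. It assumes the model takes a single float32 input of shape [1, 512, 512, 3], which is the shape in the engine-retrieval warning in this post. The dtype, the random data, and the num_batches parameter are assumptions, not details from the original poster.

```python
import numpy as np

# Assumed input signature: one float32 image batch of shape [1, 512, 512, 3].
INPUT_SHAPE = (1, 512, 512, 3)


def my_input_fn(num_batches=1):
    """Yield one tuple of input arrays per batch used for engine building."""
    for _ in range(num_batches):
        # Random data stands in for real images; only shape and dtype matter here.
        batch = np.random.normal(size=INPUT_SHAPE).astype(np.float32)
        yield (batch,)
```

Each yielded tuple holds one array per model input, so a model with several inputs would yield a longer tuple.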

When converter.build is executed, this error appears:

2021-10-09 00:01:11.416251: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger safeContext.cpp (124) - Cudnn Error in initializeCommonContext: 4 (Could not initialize cudnn, please check cudnn installation.)
2021-10-09 00:01:11.416570: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger safeContext.cpp (124) - Cudnn Error in initializeCommonContext: 4 (Could not initialize cudnn, please check cudnn installation.)
2021-10-09 00:01:11.416841: W tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:758] TF-TRT Warning: Engine creation for TRTEngineOp_0_1 failed. The native segment will be used instead. Reason: Internal: Failed to build TensorRT engine
2021-10-09 00:01:11.416861: W tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:628] TF-TRT Warning: Engine retrieval for input shapes: [[1,512,512,3]] failed. Running native segment for TRTEngineOp_0_1
2021-10-09 00:01:11.469584: F ./tensorflow/core/kernels/conv_2d_gpu.h:708] Non-OK-status: GpuLaunchKernel( SwapDimension1And2InTensor3UsingTiles<T, NumThreads, TileLongSide, TileShortSide>, total_tiles_count, NumThreads, 0, d.stream(), input, input_dims, output) status: Internal: out of memory
Aborted (core dumped)
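
The log above ends with "out of memory", and it follows a cuDNN initialization failure. One thing that may be worth trying is letting TensorFlow allocate GPU memory on demand instead of reserving most of it at startup. The sketch below sets the TF_FORCE_GPU_ALLOW_GROWTH environment variable for that purpose. It is an assumption that memory pressure is the cause here; the thread does not establish whether this resolves the error.

```python
import os

# Set this before TensorFlow is imported so it takes effect when the GPU
# is first initialized.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

# import tensorflow as tf  # import TensorFlow only after the variable is set
```

Checking with nvidia-smi that no other process is holding the GPU's memory is a complementary step.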