Need help: Non-OK-status: CudaLaunchKernel( SwapDimension1And2InTensor3UsingTiles

When I try to run inference with the TensorFlow C++ API (v1.14.0), the program aborts with the fatal error shown at the end of this log:

2019-06-18 03:18:48.253160: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-06-18 03:18:48.276641: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library nvcuda.dll
2019-06-18 03:18:48.471959: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: TITAN Xp major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:b3:00.0
2019-06-18 03:18:48.475413: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-06-18 03:18:48.498737: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-06-18 03:21:30.304837: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-18 03:21:30.307133: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-06-18 03:21:30.308538: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2019-06-18 03:21:30.320300: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9578 MB memory) -> physical GPU (device: 0, name: TITAN Xp, pci bus id: 0000:b3:00.0, compute capability: 6.1)
Successfully setup the session and load the graph…(b)
loading test_image…
loading the image is done
2019-06-18 03:21:37.759369: F .\tensorflow/core/kernels/conv_2d_gpu.h:935] Non-OK-status: CudaLaunchKernel( SwapDimension1And2InTensor3UsingTiles<T, kNumThreads, kTileSize, kTileSize, conjugate>, total_tiles_count, kNumThreads, 0, d.stream(), input, input_dims, output) status: Internal: invalid configuration argument

Can anybody help me figure out what causes this?

Hi puj, have you resolved this issue? I ran into the same problem with tensorflow-gpu 1.14 and CUDA 10.0. I would appreciate any clues.