Compute-sanitizer hangs indefinitely when instrumenting a cuDNN function

I am trying to use compute-sanitizer to instrument the following TensorFlow program.

import tensorflow as tf
from keras import layers
import os
os.environ["TF_DISABLE_RZ_CHECK"] = "1"
os.environ["TF_GPU_ALLOCATOR"] = "cuda_malloc_async"
tf.keras.backend.set_image_data_format('channels_first')
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
tf.config.run_functions_eagerly(True)

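# Input in channels_first layout: batch 1, 2 channels, length 859043.
tensor = tf.zeros([1, 2, 859043])
# Grouped convolution with a very large kernel: 2 groups, one input channel and one filter per group.
model = layers.Conv1D(filters=2, kernel_size=524287, strides=1, groups=2)
model(tensor)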

print("DONE")

With compute-sanitizer the program hangs: it was still stuck after an hour, whereas without compute-sanitizer it finishes in seconds.
I also instrumented the program with NVBit, and it hangs there as well (I presume this is because both tools rely on dynamic instrumentation).
The last kernel launched is the following.

MEMTRACE: CTX 0x00000000050f8db0 - LAUNCH - Kernel pc 0x00007ff9a038f900 - Kernel name sm80_xmma_fprop_implicit_gemm_indexed_tf32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize64x64x64_stage4_warpsize1x4x1_g16_tensor16x8x8_execute_kernel__5x_cudnn - grid launch id 12 - grid size 1,5231,1 - block size 128,1,1 - nregs 166 - shmem 132096 - cuda stream id 1276264096

which seems to be a cuDNN kernel.
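
As a sanity check that this launch belongs to the Conv1D call in the script, the grid size can be compared with the expected output length. This is only a rough sketch: it assumes the kernel tiles output positions in blocks of 64 (my reading of "tilesize64x64x64" in the kernel name, not something I can confirm from cuDNN), and it relies on Conv1D using its default padding="valid".

```python
import math

# Values taken from the repro script above.
input_length = 859043
kernel_size = 524287
stride = 1

# With padding="valid" (the Conv1D default), the output length is:
output_length = (input_length - kernel_size) // stride + 1

# Assumption: output positions are tiled in blocks of 64,
# guessed from "tilesize64x64x64" in the kernel name.
tile = 64
blocks = math.ceil(output_length / tile)

print(output_length, blocks)  # prints: 334757 5231
```

The result, 5231 blocks, matches the "grid size 1,5231,1" in the trace, so the stuck kernel does appear to be the convolution from the script.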

Since both compute-sanitizer and cuDNN are closed source, I don't know how to debug this any further.

Thank you!