load_matrix_sync reports Warp Misaligned Address under

The following error is reported only when the code is compiled with -g -G and C is of the half data type:
"CUDA_EXCEPTION_6, Warp Misaligned Address".
cuda-gdb shows that the addresses of a, b, and C are all 512-byte aligned. The SDK version is 9.1.85.
One possible cause might be the hash function that is used to implement load_matrix_sync.
The following is the cuda-gdb output:
18 wmma::load_matrix_sync(a_, a, 16);
(cuda-gdb) print a
$3 = (@generic half * @parameter) 0x7fffed600000
(cuda-gdb) print b
$4 = (@generic half * @parameter) 0x7fffed600200
(cuda-gdb) print c
$5 = (@generic half * @parameter) 0x7fffed600400
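The alignment claim can be double-checked from the three addresses printed above. The snippet below is only a sketch of that arithmetic: is_aligned is a hypothetical helper, not part of the WMMA API, and the 32-byte figure is my reading of the WMMA documentation, which asks for a 256-bit aligned matrix pointer. It is host-only code, so it compiles with nvcc (C++11 or later) without a GPU.

```cuda
#include <cstdint>

// Hypothetical helper: true when addr is a multiple of `bytes`.
// `bytes` must be a power of two.
constexpr bool is_aligned(std::uint64_t addr, std::uint64_t bytes) {
    return (addr & (bytes - 1)) == 0;
}

// Addresses copied from the cuda-gdb session above.
constexpr std::uint64_t addr_a = 0x7fffed600000ULL;
constexpr std::uint64_t addr_b = 0x7fffed600200ULL;
constexpr std::uint64_t addr_c = 0x7fffed600400ULL;

// All three pointers are 512-byte aligned, as reported.
static_assert(is_aligned(addr_a, 512), "a is not 512-byte aligned");
static_assert(is_aligned(addr_b, 512), "b is not 512-byte aligned");
static_assert(is_aligned(addr_c, 512), "c is not 512-byte aligned");

// 512-byte alignment implies the 32-byte (256-bit) alignment
// that the WMMA documentation asks for (assumption: see lead-in).
static_assert(is_aligned(addr_a, 32) && is_aligned(addr_b, 32) && is_aligned(addr_c, 32),
              "a pointer is not 32-byte aligned");

// The pointers are 0x200 = 512 bytes apart, i.e. one 16x16 matrix of
// 2-byte half values each.
static_assert(addr_b - addr_a == 16 * 16 * 2, "unexpected spacing between a and b");
static_assert(addr_c - addr_b == 16 * 16 * 2, "unexpected spacing between b and c");
```

If these checks pass at compile time, the pointers themselves are unlikely to be the source of the exception, which is consistent with the error appearing only in -g -G builds.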
The gcc configuration is:
Configured with: …/src/configure -v --with-pkgversion=‘Ubuntu 5.4.0-6ubuntu1~16.04.5’ --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-5 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu

If you're attempting to report a bug or ask a question, you should probably provide a short, complete example.
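For reference, a short, complete example for this kind of report might look roughly like the sketch below. It is not the original poster's code. The kernel name wmma_half, the fragment layouts, the single shared buffer, and the zero-filled inputs are all assumptions, chosen so that a, b, and c end up 512 bytes apart as in the cuda-gdb output. It needs a GPU with tensor cores (compute capability 7.0 or later).

```cuda
// Build (assumed command line): nvcc -arch=sm_70 -g -G repro.cu -o repro
#include <cstdio>
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <mma.h>

using namespace nvcuda;

// One warp computes a single 16x16x16 tile, c = a * b + c, with a half accumulator.
__global__ void wmma_half(half* a, half* b, half* c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_;
    wmma::fragment<wmma::accumulator, 16, 16, 16, half> c_;

    wmma::load_matrix_sync(a_, a, 16);
    wmma::load_matrix_sync(b_, b, 16);
    wmma::load_matrix_sync(c_, c, 16, wmma::mem_row_major);
    wmma::mma_sync(c_, a_, b_, c_);
    wmma::store_matrix_sync(c, c_, 16, wmma::mem_row_major);
}

int main() {
    const size_t tile = 16 * 16;                    // elements per 16x16 matrix
    const size_t bytes = 3 * tile * sizeof(half);   // a, b and c back to back

    half* buf = nullptr;
    cudaError_t err = cudaMalloc(reinterpret_cast<void**>(&buf), bytes);
    if (err != cudaSuccess) {
        std::printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemset(buf, 0, bytes);                      // all-zero bits are 0.0 in half

    half* a = buf;                                  // offset 0x000
    half* b = buf + tile;                           // offset 0x200
    half* c = buf + 2 * tile;                       // offset 0x400

    wmma_half<<<1, 32>>>(a, b, c);                  // exactly one warp
    err = cudaDeviceSynchronize();
    std::printf("kernel status: %s\n", cudaGetErrorString(err));

    cudaFree(buf);
    return err == cudaSuccess ? 0 : 1;
}
```

Building it once with -g -G and once without, then running both under cuda-gdb, would show whether the exception depends only on the debug flags, and posting the file with the exact nvcc command line lets others try to reproduce it.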