Template kernel in CUDA

I have run into a strange problem. When I declare a template function in a .h file like below:

template<int32_t BLOCK_SIZE> __global__ void test_shabal_with_share_mem(shabal_info);

and implement it in a .cu file:

template<int32_t BLOCK_SIZE> __global__ void shabal256::test_shabal_with_share_mem(shabal256::shabal_info info){
    ...
}

the build fails because the linker cannot find the symbol:

void __cdecl shabal256::test_shabal_with_share_mem<8>(struct shabal256::shabal256_chunk_data)" (??$test_shabal_with_share_mem@$07@shabal256@@YAXUshabal256_chunk_data@0@@Z)

However, if I define the function directly in the .h file:

template<int32_t BLOCK_SIZE> __global__ void test_shabal_with_share_mem(shabal_info info){
    ...
}

the symbol is found and the build succeeds. Why?
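The usual explanation is that a function template, `__global__` or not, is only compiled for a particular argument such as `BLOCK_SIZE = 8` in a translation unit that sees both its definition and a request for that argument. With the definition only in the .cu file and the call in another file, no translation unit emits `test_shabal_with_share_mem<8>`, so the linker has nothing to resolve; with the definition in the header, the calling file instantiates it itself. Besides moving the definition into the header, the definition can stay in the .cu file if that file explicitly instantiates every block size used elsewhere. The sketch below shows that pattern as host-only C++ so it compiles without a GPU; the fields of `shabal_info` and the loop body are hypothetical placeholders, since the question elides the real struct and kernel.

```cpp
#include <cassert>
#include <cstdint>

namespace shabal256 {

// Hypothetical stand-in for the question's shabal_info; the real fields are not shown.
struct shabal_info {
    const int32_t* in;
    int32_t* out;
    int32_t n;
};

// What the .h file holds: a declaration only. The __global__ qualifier is
// dropped here so the sketch builds as plain host C++.
template <int32_t BLOCK_SIZE>
void test_shabal_with_share_mem(shabal_info info);

}  // namespace shabal256

// What the .cu file holds: the definition, visible only in this translation unit.
template <int32_t BLOCK_SIZE>
void shabal256::test_shabal_with_share_mem(shabal256::shabal_info info) {
    assert(info.in != nullptr && info.out != nullptr);
    // Placeholder body standing in for the elided kernel code.
    for (int32_t i = 0; i < info.n; ++i) {
        info.out[i] = info.in[i] * BLOCK_SIZE;
    }
}

// Explicit instantiation: forces this translation unit to emit the <8>
// specialization, so files that only see the declaration can link against it.
template void shabal256::test_shabal_with_share_mem<8>(shabal256::shabal_info);
```

In the actual CUDA code, the explicit instantiation would keep the `__global__` qualifier, for example `template __global__ void shabal256::test_shabal_with_share_mem<8>(shabal256::shabal_info);` placed after the kernel definition in the .cu file, with one such line per `BLOCK_SIZE` that other files launch.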