It seems that the handle returned by cublasLtCreate() is static. So is it recommended to use the cublasLt handle without calling cublasLtDestroy(), and just let it be released automatically when the program terminates?
According to the docs, cublasLtDestroy() implicitly calls cudaDeviceSynchronize():
“Because cublasLtCreate() allocates some internal resources and the release of those resources by calling cublasLtDestroy() will implicitly call cudaDeviceSynchronize(), it is recommended to minimize the number of times these functions are called.”