I would like to call batched GEMM fusion kernels (with semantics like cublasGemmBatchedEx). How can I achieve this using the Backend API?
I cannot find any example of batched GEMM with pointer arrays in the latest cudnn-frontend, and there is no description of it in the cuDNN documentation. Could you shed some light on this usage?
Thanks
Gino
Hi,
The batch GEMM operation is described in our API reference documentation here.
A sample matmul example is here:
[link preview: truncated code excerpt from the cudnn-frontend runtime fusion samples]
One can modify the batch size by changing the example in accordance with the rules specified in the API reference.
Thank you.
Hello, I noticed the example already has the ability to achieve the same semantics as the cuBLAS API cublasGemmStridedBatchedEx, but my point is how the Backend API could support the same semantics as the cuBLAS API cublasGemmBatchedEx. The main issue is how the user can configure the pointer arrays for A and B.
Thanks
Gino