Is it possible to fuse gather-gemm-scatter using cuDNN?

My program gathers the input first, then performs a GEMM, and finally scatters the GEMM result to produce the output. I want to fuse these three kernels into one, and I found that CUTLASS provides such a fusion. Is it possible to implement this fusion using cuDNN?
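To make the three steps concrete, here is a minimal pure-Python sketch of the unfused pipeline. It assumes row-wise indexing: the gather picks rows of `a` by `gather_idx`, the GEMM multiplies the gathered matrix by `b`, and the scatter adds each result row into `out[scatter_idx[i]]`. Scatter-add into a zero-initialized output is an assumption (it reduces to a plain overwrite when the scatter indices are unique). All function names here are hypothetical, for illustration only. Note the two intermediate buffers; avoiding them is the point of fusing the kernels.

```python
from typing import List

Matrix = List[List[float]]


def gather_rows(a: Matrix, idx: List[int]) -> Matrix:
    """Step 1 (gather): build a new matrix whose i-th row is a[idx[i]]."""
    return [list(a[r]) for r in idx]


def gemm(x: Matrix, b: Matrix) -> Matrix:
    """Step 2 (GEMM): plain triple-loop matrix multiply x @ b."""
    k_dim = len(b)
    n_dim = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(k_dim)) for j in range(n_dim)]
        for row in x
    ]


def scatter_add_rows(out: Matrix, y: Matrix, idx: List[int]) -> None:
    """Step 3 (scatter): add row i of y into out[idx[i]], in place."""
    for i, r in enumerate(idx):
        for j, v in enumerate(y[i]):
            out[r][j] += v


def gather_gemm_scatter_unfused(
    a: Matrix, b: Matrix, gather_idx: List[int], scatter_idx: List[int], num_out_rows: int
) -> Matrix:
    gathered = gather_rows(a, gather_idx)  # intermediate buffer #1
    product = gemm(gathered, b)            # intermediate buffer #2
    out = [[0.0] * len(b[0]) for _ in range(num_out_rows)]
    scatter_add_rows(out, product, scatter_idx)
    return out


a = [[1, 2], [3, 4], [5, 6]]
b = [[1, 1], [1, -1]]
print(gather_gemm_scatter_unfused(a, b, [2, 0], [1, 0], 3))
```

Here rows 2 and 0 of `a` are gathered, multiplied by `b`, and written to output rows 1 and 0; output row 2 is never targeted and stays zero.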

Thanks for the request! This is not supported today, but it’s under consideration for our future roadmap.

Do you have any context that you’d be able to share about your use case? For example, what workload and what framework are you using?


Excuse me, does cuDNN support gather-GEMM-scatter fusion now?

I don’t think so. You can look at the cutlass example https://github.com/NVIDIA/cutlass/tree/main/examples/36_gather_scatter_fusion if you want to implement your own.
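If you write your own kernel, the core idea of the fusion is that the GEMM reads its A rows through the gather indices and writes its output rows through the scatter indices, so the gathered matrix and the intermediate product are never materialized. The pure-Python sketch below shows only that indexing pattern. It is not CUTLASS code and does not reflect how the linked example is organized. A real GPU kernel would apply the indexed loads and stores per tile inside the GEMM main loop and epilogue. The function name is hypothetical, and scatter-add semantics are an assumption.

```python
from typing import List

Matrix = List[List[float]]


def gather_gemm_scatter_fused(
    a: Matrix, b: Matrix, gather_idx: List[int], scatter_idx: List[int], num_out_rows: int
) -> Matrix:
    """Single pass: indexed load from `a`, multiply by `b`, indexed accumulate into `out`.

    No gathered copy of `a` and no intermediate product matrix are allocated.
    Scatter-add is assumed, so duplicate scatter indices accumulate.
    """
    k_dim = len(b)
    n_dim = len(b[0])
    out = [[0.0] * n_dim for _ in range(num_out_rows)]
    for i in range(len(gather_idx)):
        src = a[gather_idx[i]]    # gather: read the source row through the index
        dst = out[scatter_idx[i]]  # scatter: destination row chosen through the index
        for j in range(n_dim):
            acc = 0.0
            for k in range(k_dim):
                acc += src[k] * b[k][j]
            dst[j] += acc          # write the result directly to the scattered row
    return out


a = [[1, 2], [3, 4], [5, 6]]
b = [[1, 1], [1, -1]]
print(gather_gemm_scatter_fused(a, b, [2, 0], [1, 0], 3))
```

The result matches a separate gather, GEMM, and scatter pipeline, but there is only one loop nest. This is the property a fused kernel exploits to save the global-memory round trips of the two intermediate buffers.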