Fuse Operators

Hi, all.
I’m a beginner cuDNN developer and I’m trying to fuse operators for inference.
When going through the documentation (Developer Guide :: NVIDIA Deep Learning cuDNN Documentation), I learned that this might be done by using the backend API. (Please correct me if I’m wrong; I’m not 100% sure because the seemingly relevant functions are in the training library.)
Although this post provides a good high-level overview, I’m still not sure how to actually implement operator fusion.
For example, it seems like the following functions might be necessary.

  • cudnnCreateFusedOpsPlan()
  • cudnnCreateFusedOpsVariantParamPack()
  • cudnnCreateFusedOpsConstParamPack()
  • cudnnMakeFusedOpsPlan()
  • cudnnFusedOpsExecute()

Are these sufficient to get the right results?
Also, I’m not sure how to pass the input/output tensor information and detailed parameters for convolution, relu, etc.
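
To make the question concrete, here is the rough call sequence I’ve pieced together from the API reference. It is completely untested; the attribute labels are my guesses from cudnn.h, and all descriptor creation and error checking are omitted, so please correct anything that’s off.

```cpp
#include <cudnn.h>

// Guessed fused conv + scale/bias + activation flow for inference. I suspect
// some required attributes (e.g. the *_PLACEHOLDER hints) are still missing.
cudnnStatus_t fused_conv_act(cudnnHandle_t handle,
                             cudnnTensorDescriptor_t xDesc, void* x,
                             cudnnFilterDescriptor_t wDesc, void* w,
                             cudnnTensorDescriptor_t yDesc, void* y,
                             cudnnConvolutionDescriptor_t convDesc,
                             cudnnActivationDescriptor_t actDesc,
                             void* workspace, size_t workspaceSize) {
    const cudnnFusedOps_t op = CUDNN_FUSED_CONV_SCALE_BIAS_ADD_ACTIVATION;

    // 1) Const param pack: descriptors that stay fixed across executions.
    cudnnFusedOpsConstParamPack_t constPack;
    cudnnCreateFusedOpsConstParamPack(&constPack, op);
    cudnnSetFusedOpsConstParamPackAttribute(constPack, CUDNN_PARAM_XDESC, xDesc);
    cudnnSetFusedOpsConstParamPackAttribute(constPack, CUDNN_PARAM_WDESC, wDesc);
    cudnnSetFusedOpsConstParamPackAttribute(constPack, CUDNN_PARAM_YDESC, yDesc);
    cudnnSetFusedOpsConstParamPackAttribute(constPack, CUDNN_PARAM_CONV_DESC, convDesc);
    cudnnSetFusedOpsConstParamPackAttribute(constPack, CUDNN_PARAM_ACTIVATION_DESC, actDesc);

    // 2) Plan: finalized against the const params; reports the workspace it needs.
    cudnnFusedOpsPlan_t plan;
    size_t planWorkspaceSize = 0;  // in a real program, size `workspace` from this
    cudnnCreateFusedOpsPlan(&plan, op);
    cudnnMakeFusedOpsPlan(handle, plan, constPack, &planWorkspaceSize);

    // 3) Variant param pack: the actual device pointers, which may change per call.
    cudnnFusedOpsVariantParamPack_t varPack;
    cudnnCreateFusedOpsVariantParamPack(&varPack, op);
    cudnnSetFusedOpsVariantParamPackAttribute(varPack, CUDNN_PTR_XDATA, x);
    cudnnSetFusedOpsVariantParamPackAttribute(varPack, CUDNN_PTR_WDATA, w);
    cudnnSetFusedOpsVariantParamPackAttribute(varPack, CUDNN_PTR_YDATA, y);
    cudnnSetFusedOpsVariantParamPackAttribute(varPack, CUDNN_PTR_WORKSPACE, workspace);
    cudnnSetFusedOpsVariantParamPackAttribute(
        varPack, CUDNN_SCALAR_SIZE_T_WORKSPACE_SIZE_IN_BYTES, &workspaceSize);

    // 4) Run the fused operation, then clean up.
    cudnnStatus_t status = cudnnFusedOpsExecute(handle, plan, varPack);
    cudnnDestroyFusedOpsVariantParamPack(varPack);
    cudnnDestroyFusedOpsConstParamPack(constPack);
    cudnnDestroyFusedOpsPlan(plan);
    return status;
}
```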

I went through the cuDNN sample code to find an example, but unfortunately I had no luck.
If anyone can provide a hint, feedback, an idea, or a code sample, it would be greatly appreciated.

Thank you in advance!

Hi Sunggg, cuDNN engineer here! What you have listed are our older fusion APIs (from the v7 era). If you are just getting started, we suggest starting with the new v8 graph API via the frontend C++ wrapper for better support and convenience.
You can find the wrapper and fusion code samples here

(search for run_conv_bias_add_activation in cudnn-frontend/conv_sample.cpp at main · NVIDIA/cudnn-frontend · GitHub)
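
To give a feel for the shape of the code, here is a trimmed-down sketch of a conv + bias + activation graph (tensor and convolution descriptor construction omitted, and some builder details simplified; the run_conv_bias_add_activation sample above is the complete, authoritative version):

```cpp
#include <array>
#include <cudnn_frontend.h>

// Simplified outline: build one operation node per op and stitch them into a
// single operation graph via the intermediate (virtual) tensors.
void conv_bias_act_graph(cudnnHandle_t handle,
                         cudnn_frontend::Tensor& xTensor,    // input
                         cudnn_frontend::Tensor& wTensor,    // filter
                         cudnn_frontend::Tensor& bTensor,    // bias
                         cudnn_frontend::Tensor& afterConv,  // virtual intermediate
                         cudnn_frontend::Tensor& afterBias,  // virtual intermediate
                         cudnn_frontend::Tensor& yTensor,    // output
                         cudnn_frontend::ConvDesc& convDesc) {
    // Pointwise descriptors for the bias add and the ReLU.
    auto addDesc = cudnn_frontend::PointWiseDescBuilder()
                       .setMode(CUDNN_POINTWISE_ADD)
                       .setMathPrecision(CUDNN_DATA_FLOAT)
                       .build();
    auto reluDesc = cudnn_frontend::PointWiseDescBuilder()
                        .setMode(CUDNN_POINTWISE_RELU_FWD)
                        .setMathPrecision(CUDNN_DATA_FLOAT)
                        .build();

    // Convolution, then bias add, then ReLU, chained through virtual tensors.
    auto convOp = cudnn_frontend::OperationBuilder(
                      CUDNN_BACKEND_OPERATION_CONVOLUTION_FORWARD_DESCRIPTOR)
                      .setxDesc(xTensor).setwDesc(wTensor).setyDesc(afterConv)
                      .setcDesc(convDesc).setAlpha(1.0f).setBeta(0.0f)
                      .build();
    auto biasOp = cudnn_frontend::OperationBuilder(
                      CUDNN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR)
                      .setxDesc(afterConv).setbDesc(bTensor).setyDesc(afterBias)
                      .setpwDesc(addDesc)
                      .build();
    auto reluOp = cudnn_frontend::OperationBuilder(
                      CUDNN_BACKEND_OPERATION_POINTWISE_DESCRIPTOR)
                      .setxDesc(afterBias).setyDesc(yTensor)
                      .setpwDesc(reluDesc)
                      .build();

    // Build the operation graph. From here the flow is the same as in the
    // sample: query heuristics for an engine config, finalize an execution
    // plan, and execute it with a variant pack holding the device pointers.
    std::array<cudnn_frontend::Operation const*, 3> ops = {&convOp, &biasOp, &reluOp};
    auto opGraph = cudnn_frontend::OperationGraphBuilder()
                       .setHandle(handle)
                       .setOperationGraph(ops.size(), ops.data())
                       .build();
    (void)opGraph;
}
```

The intermediate tensors (afterConv, afterBias above) are created with the virtual flag set, which tells cuDNN they never need to be materialized in global memory.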

Let us know if you have more questions!

Finally :) Thank you so much!
This is exactly what I’ve been looking for.

One quick question.
Say we provide the operations we want to fuse.
Is there a guarantee that all of them will be fused, or does the engine perform some kind of pattern matching and try its best?
For example, if we provide (conv, batchnorm, relu, add),
I’m wondering whether the engine would fuse only conv+batchnorm+relu and leave add alone, since it might only support the fused combination conv+batchnorm+relu.

Appreciate your help.

Hi Sunggg,

Because batchnorm involves several reductions and broadcasts across multiple CUDA blocks, it currently cannot be fused in its entirety onto the end of a convolution.

For more details on supported fusion combinations in v8.1.0, refer to our release notes:
https://docs.nvidia.com/deeplearning/cudnn/release-notes/rel_8.html#rel-810
If a pattern is within the supported set, we’ll fuse everything into one kernel.
Note that we’ll keep adding support for more patterns as we develop, so we may support a different way of fusing BN into conv in a future release.

Thank you for the reference.

While checking the reference, I found this:

With runtime op fusion, the engine can generate and compile fused tensor-core kernels on the fly for the specified operation graph during the execution plan finalization stage. Some of the operation graph patterns supported in this preview are: convolution or matrix multiplication operation with arbitrary combination of one or more pointwise operations, and reduction operations fused onto the output tensor.

Although this gives a high-level idea of the supported patterns, I’d like a stronger guarantee that all the operations I provide are fused.
Is there any way to check whether all operations were successfully fused?
If not, is there a complete list of supported patterns?

Thank you so much for your help. It is a great help!

Our engineers have been working on the documentation. In the meantime, you can refer to the recent release notes for fusion support:
https://docs.nvidia.com/deeplearning/cudnn/release-notes/rel_8.html#rel-811
We have also released a new version of the C++ frontend that includes a number of fusion samples; we recommend you try it out.
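
In the meantime, one practical probe (rough sketch below, trimmed from what the frontend samples do, with a helper name of our own): build the operation graph and check whether any engine config finalizes into an execution plan. If none does on your GPU and cuDNN version, that exact pattern isn’t currently supported as a single fused graph. Note that even a successful plan may map to more than one kernel depending on the chosen engine.

```cpp
#include <cudnn_frontend.h>

// Returns true if at least one engine config can finalize an execution plan
// for the whole operation graph on this handle's device.
bool whole_graph_has_a_plan(cudnnHandle_t handle,
                            cudnn_frontend::OperationGraph& opGraph) {
    // Ask the heuristics for candidate engine configs for this graph.
    auto heuristics = cudnn_frontend::EngineHeuristicsBuilder()
                          .setOperationGraph(opGraph)
                          .setHeurMode(CUDNN_HEUR_MODE_INSTANT)
                          .build();
    auto& configs = heuristics.getEngineConfig(heuristics.getEngineConfigCount());

    for (auto& cfg : configs) {
        try {
            auto plan = cudnn_frontend::ExecutionPlanBuilder()
                            .setHandle(handle)
                            .setEngineConfig(cfg)
                            .build();
            if (plan.get_status() == CUDNN_STATUS_SUCCESS) {
                return true;  // at least one engine accepts the whole graph
            }
        } catch (cudnn_frontend::cudnnException&) {
            // This engine config cannot run the graph; try the next candidate.
        }
    }
    return false;  // pattern not supported as a single fused graph here
}
```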