cuDNN backend API for a fused op


I'm trying to create a fused conv+bias operation using the cuDNN backend API.

But it fails with the following error:

fuseOpDemo.cpp(476): Error code: 9, reason: CUDNN_STATUS_NOT_SUPPORTED

I have also tried a single convolution op and a single bias (pointwise add) op on their own. Only the convolution returns the correct result; the bias op throws the same error.
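To tell a descriptor setup problem apart from a numerical one, it helps to have a CPU reference to compare GPU output against. The sketch below is not from the original demo; the function name convBiasRefNHWC is hypothetical, and it assumes NHWC input, a KRSC (channels-last) filter, cross-correlation as with CUDNN_CROSS_CORRELATION, stride 1, symmetric zero padding and no dilation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for y = conv(x, w) + b, used only to validate GPU results.
// Assumed layouts: x is NHWC, w is KRSC (channels last), y is N x P x Q x K (NHWC).
// Cross-correlation (no filter flip), stride 1, symmetric zero padding, no dilation.
std::vector<float> convBiasRefNHWC(const std::vector<float>& x, int N, int H, int W, int C,
                                   const std::vector<float>& w, int K, int R, int S,
                                   const std::vector<float>& b, int pad) {
    assert(x.size() == static_cast<std::size_t>(N) * H * W * C);
    assert(w.size() == static_cast<std::size_t>(K) * R * S * C);
    assert(b.size() == static_cast<std::size_t>(K));
    const int P = H + 2 * pad - R + 1;  // output height for stride 1
    const int Q = W + 2 * pad - S + 1;  // output width for stride 1
    std::vector<float> y(static_cast<std::size_t>(N) * P * Q * K);
    for (int n = 0; n < N; ++n)
        for (int p = 0; p < P; ++p)
            for (int q = 0; q < Q; ++q)
                for (int k = 0; k < K; ++k) {
                    float acc = b[k];  // per-channel bias, added once per output element
                    for (int r = 0; r < R; ++r)
                        for (int s = 0; s < S; ++s) {
                            const int ih = p + r - pad;
                            const int iw = q + s - pad;
                            if (ih < 0 || ih >= H || iw < 0 || iw >= W) continue;  // padded zero
                            for (int c = 0; c < C; ++c)
                                acc += x[((static_cast<std::size_t>(n) * H + ih) * W + iw) * C + c] *
                                       w[((static_cast<std::size_t>(k) * R + r) * S + s) * C + c];
                        }
                    y[((static_cast<std::size_t>(n) * P + p) * Q + q) * K + k] = acc;
                }
    return y;
}
```

When comparing against GPU output, a small tolerance is safer than exact equality, since the GPU may accumulate in a different order or use TF32.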

I have also tried the cuDNN frontend (cudnn_frontend). It also throws some errors; the log is attached below.


fuseOpDemo.cpp (18.5 KB)
log.txt (24.6 KB)

Hi wo5028928,

Thanks for your interest in trying out cuDNN fusion! There may be several issues here:

  1. Can you install CUDA 11.2u1 or later and make sure it is visible in your LD_LIBRARY_PATH? Also make sure you use cuDNN 8.1.1 or later, compiled against CUDA 11.2u1 or later.
  2. Since we generate fusion kernels targeting tensor cores, the input/output conv channels need to be a multiple of 8 if you use fp16 tensors, or a multiple of 4 if you use fp32 tensors.
  3. I see in your example that you are using fp32 tensors; this is currently only supported on Ampere GPUs (through TF32 tensor cores). These hardware units are not available on Turing GPUs.
  4. I also see that you are using the NCHW layout (judging from the way you compute strides); however, the NHWC (i.e. channels-last) layout is needed to utilize tensor cores.
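Point (4) comes down to how the strides are computed. As far as I know, in the backend API the values passed as CUDNN_ATTR_TENSOR_DIMENSIONS stay in logical {N, C, H, W} order for both layouts, and only CUDNN_ATTR_TENSOR_STRIDES changes. A minimal sketch of both stride computations; the helper names are hypothetical:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Dims4 = std::array<int64_t, 4>;  // logical order {N, C, H, W}

// NCHW: W is contiguous, then H, then C, then N.
Dims4 stridesNCHW(const Dims4& d) {
    return {d[1] * d[2] * d[3], d[2] * d[3], d[3], 1};
}

// NHWC (channels last): C is contiguous, then W, then H, then N.
Dims4 stridesNHWC(const Dims4& d) {
    return {d[1] * d[2] * d[3], 1, d[3] * d[1], d[1]};
}

// Linear element offset of coordinate {n, c, h, w} under the given strides.
int64_t linearOffset(const Dims4& dims, const Dims4& strides, const Dims4& idx) {
    int64_t off = 0;
    for (int i = 0; i < 4; ++i) {
        assert(idx[i] >= 0 && idx[i] < dims[i]);  // coordinate must be in range
        off += idx[i] * strides[i];
    }
    return off;
}
```

For a 1x8x8x8 tensor, NCHW gives strides {512, 64, 8, 1} while NHWC gives {512, 1, 64, 8}; the dimension array is the same in both cases.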

Once (1) is in place, you should be able to run the fusion samples without issue. For (2) to (4), you can follow the examples in the fusion samples.

Let us know how things go for you!


Thank you for your advice!

I have run some tests, but the results are not good.

1. I'm now using an A100 with CUDA 11.2.2 and cuDNN 8.1.1, and LD_LIBRARY_PATH is set.

2. The cudnn_frontend tests all pass. That's great!

3. I have changed the tensor format to NHWC, and the tensor shape is 1x8x8x8. But it still fails at the same place with the same error code.
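The constraints from the earlier reply can be turned into a quick pre-flight check. This is a hypothetical helper, not part of cuDNN. It treats compute capability major version 8 or higher as TF32-capable (A100 is 8.0, Turing is 7.5), which is an assumption based on the Ampere requirement above, and applying it to this setup assumes the filter's output channel count K is also 8, since the post does not say.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

enum class DType { kHalf, kFloat };  // hypothetical stand-in for the tensor data type

// Returns an empty string if the config meets constraints (2) to (4), else a reason.
std::string checkFusionPrereqs(DType dt, int64_t inChannels, int64_t outChannels,
                               int computeCapabilityMajor, bool channelsLast) {
    assert(inChannels > 0 && outChannels > 0);
    const int64_t multiple = (dt == DType::kHalf) ? 8 : 4;  // (2) channel multiple
    if (inChannels % multiple != 0 || outChannels % multiple != 0)
        return "channel counts must be multiples of " + std::to_string(multiple);
    if (dt == DType::kFloat && computeCapabilityMajor < 8)  // (3) assumed TF32 threshold
        return "fp32 fusion needs TF32 tensor cores (Ampere)";
    if (!channelsLast)  // (4) layout
        return "use NHWC (channels-last) strides";
    return "";
}
```

Under those assumptions, an fp32, C = K = 8, NHWC config on an A100 passes all three checks, which suggests the remaining NOT_SUPPORTED comes from elsewhere in the descriptor setup rather than from these constraints.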

Could you please provide a demo based on the backend API? I know the frontend and backend are nearly equivalent, but I want to understand why and how it works.
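As a starting point for the "why and how", here is a plain C++ model of the tensors a conv+bias graph involves. It is not the cuDNN API: the struct and functions are hypothetical, and each field only corresponds to a backend tensor attribute (CUDNN_ATTR_TENSOR_UNIQUE_ID, CUDNN_ATTR_TENSOR_DIMENSIONS, CUDNN_ATTR_TENSOR_IS_VIRTUAL). It assumes a stride-1 convolution with symmetric padding, and that the intermediate conv output is marked virtual so the add can consume it directly, which is how I understand the fusion samples to set it up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of one tensor in the graph (not a cuDNN type).
struct TensorSpec {
    std::string name;
    int64_t uid;                // ~ CUDNN_ATTR_TENSOR_UNIQUE_ID
    std::vector<int64_t> dims;  // ~ CUDNN_ATTR_TENSOR_DIMENSIONS, order {N, C, H, W}
    bool isVirtual;             // ~ CUDNN_ATTR_TENSOR_IS_VIRTUAL
};

// Tensors for y = conv(x, w) + b with stride 1 and symmetric zero padding `pad`.
std::vector<TensorSpec> convBiasGraph(int64_t n, int64_t c, int64_t h, int64_t w,
                                      int64_t k, int64_t r, int64_t s, int64_t pad) {
    const int64_t p = h + 2 * pad - r + 1;  // output height
    const int64_t q = w + 2 * pad - s + 1;  // output width
    assert(p > 0 && q > 0);
    return {
        {"x", 1, {n, c, h, w}, false},
        {"w", 2, {k, c, r, s}, false},
        {"conv_out", 3, {n, k, p, q}, true},  // intermediate: conv result fed to the add
        {"b", 4, {1, k, 1, 1}, false},         // per-channel bias, broadcast over N, H, W
        {"y", 5, {n, k, p, q}, false},
    };
}

// True if every bias dim is 1 or equals the matching output dim.
bool biasBroadcastsTo(const std::vector<int64_t>& b, const std::vector<int64_t>& y) {
    if (b.size() != y.size()) return false;
    for (std::size_t i = 0; i < b.size(); ++i)
        if (b[i] != 1 && b[i] != y[i]) return false;
    return true;
}
```

The UIDs are arbitrary but must be unique within the graph; as I understand it, the variant pack later binds device pointers by UID, and a virtual tensor gets no pointer.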

Thank you again!

Hi @wo5028928, can you post your latest code and the API log?
(follow the API logging instructions in the NVIDIA cuDNN Developer Guide)
We can then take a look at what changes are needed to get it running.

Hi @yanxu

fuseOpDemo.cpp (14.6 KB)

Sorry, I forgot to upload it.