cuDNN backend API for fused op

Hi,

I'm trying to create a conv+bias fused operation with the cuDNN backend API.

But it throws an error:

fuseOpDemo.cpp(476): Error code: 9, reason: CUDNN_STATUS_NOT_SUPPORTED

I have also tried a single convolution op and a single bias (pointwise add) op. The convolution alone returns the correct result, while the bias op throws the same error.

I have also tried [cudnn-frontend](https://github.com/NVIDIA/cudnn-frontend). It also throws some errors; the log is attached below.

Thanks!

Code:
fuseOpDemo.cpp (18.5 KB)
Log:
log.txt (24.6 KB)
ENV:
RTX 2070 + CUDA 11.1 + cuDNN 8.1.1

Hi wo5028928,

Thanks for your interest in trying out cuDNN fusion! There might be several issues here:

  1. Can you install CUDA 11.2u1 or later, and make sure libnvrtc.so is visible in your LD_LIBRARY_PATH? Also make sure you use cuDNN 8.1.1 or later, compiled against CUDA 11.2u1 or later.
  2. Since we generate fusion kernels targeting tensor cores, the input/output conv channel counts need to be a multiple of 8 for fp16 tensors, or a multiple of 4 for fp32 tensors.
  3. I see in your example that you are using fp32 tensors; this is currently only supported on Ampere GPUs (through TF32 tensor cores). These hardware units are not available on Turing GPUs.
  4. I see in your example that you are using the NCHW layout (judging from the way you compute strides); however, the NHWC (i.e. channels-last) layout is needed to utilize tensor cores.

If you take care of (1), you should be able to run the fusion samples without issue. For (2) - (4) you can follow the examples in the fusion sample.

Let us know how things go for you!

@yanxu

Thank you for your advice!

I have finished some tests, but the results still aren't good.

1. I'm using an A100 + CUDA 11.2.2 + cuDNN 8.1.1 now, with LD_LIBRARY_PATH set.

2. The cudnn_frontend tests all pass. That's great!

3. I have changed the tensor format to NHWC, and the tensors' shape is 1x8x8x8. But it fails at the same place with the same error code.

Could you please provide a demo based on the backend API? I know the frontend and backend are almost the same, but I want to figure out why and how.

Thank you again!

Hi @wo5028928, can you post your latest code and the API log?
(Follow the instructions here: Developer Guide :: NVIDIA Deep Learning cuDNN Documentation.)
We can take a look at what changes are needed to get it running.

Hi @yanxu

fuseOpDemo.cpp (14.6 KB)

Sorry, I forgot to upload it.

@yanxu hi. Any updates?

Hi @wo5028928, sorry for the delay. We have filed an internal bug and asked an engineer to take a look at what's going on. Will get back to you soon!

Hi @wo5028928 ,

I went through your updated code file. There are still some things that need to be corrected to run a convBias fusion. The main ones are listed below:

  1. The bias dimensions should be checkCUDNN(createTensor(&bDesc, 1, k_och, 1, 1, 'b')); (the commented-out statement).
  2. The convolution mode should be cudnnConvolutionMode_t mode = CUDNN_CROSS_CORRELATION;
  3. The alignment should be 16 for each tensor.
  4. Add the cuDNN handle to the execution plan too.
  5. The device pointers and uids are incorrect (you can refer to the provided implementation).
  6. The workspace needs to be allocated and provided to the variant pack.
  7. I assume you want the outputData tensor to be bound to Y, which should be the final output of convBias. I have modified the implementation to reflect that.
    fuseOpDemo.cpp (20.7 KB)

I'm also attaching a working code snippet that I created by modifying the initial fuseOpDemo.cpp. I have marked all the changes in the code with comments beginning with "CUDNN : ". Let us know how using that code goes for you.

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.