I have tried a single convolution op and a single bias (pointwise add) op. Only the convolution returns the correct result; the bias op throws the same error.
I have also tried [cudnn-frontend](https://github.com/NVIDIA/cudnn-frontend). It also throws some errors; the report is attached below.
Thanks for your interest in trying out cuDNN fusion! There might be several issues here:
1. Can you install CUDA 11.2u1 or later, and make sure libnvrtc.so is visible in your LD_LIBRARY_PATH? Also make sure you use cuDNN 8.1.1 or later, compiled against CUDA 11.2u1 or later.
2. Since we generate fusion kernels targeting tensor cores, the input/output conv channel counts need to be a multiple of 8 if you use fp16 tensors, or a multiple of 4 if you use fp32 tensors.
3. I see in your example you are using fp32 tensors. This is currently only supported on Ampere GPUs (through TF32 tensor cores); these hardware units are not available on Turing GPUs.
4. I see in your example you are using NCHW layout (judging from the way you compute strides); however, NHWC (i.e. channels-last) layout is needed to utilize tensor cores.

If you make sure of (1), you should be able to run the fusion samples without issue. For (2)-(4), you can follow the examples in the fusion sample.
The devptrs and uids are incorrect (you can refer to the provided implementation).
The workspace needs to be allocated and provided to the variant pack.
I assume you want the outputData tensor to be bound to Y, which should be the final output of convBias. I have modified the implementation to reflect that. fuseOpDemo.cpp (20.7 KB)
I'm also attaching a working code snippet that I created by modifying the initial fusedOpDemo.cpp. I have marked all the changes in the code with comments beginning with "CUDNN : ". Let us know how that code works for you.