Cudnn fused conv+bias

lxq2.t · October 19, 2021, 3:13pm

Hi!
I’m trying to implement a working fused conv+bias operation via the backend API, using the example from another topic (Cudnn backend api for fused op - #8 by gautamj), but finalizing the execution plan always returns CUDNN_STATUS_UNSUPPORTED. In our production code I work around this with a conv+add+add graph (with alpha2 set to zero on the first add), but with just two operations (conv + add) we get the same error.

Can you suggest what might be wrong?
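
For readers following along, here is a rough CPU sketch of why the conv+add+add workaround should give the same numbers as a plain conv+bias. It assumes that a scaled pointwise add computes y = alpha1 * x + alpha2 * b, and it treats the convolution output and the bias as already broadcast to the same flat shape. The function names are made up for illustration and are not part of cuDNN.

```cpp
#include <cstddef>
#include <vector>

// Assumed semantics of a scaled pointwise add: y[i] = alpha1 * x[i] + alpha2 * b[i].
std::vector<float> scaledAdd(const std::vector<float>& x,
                             const std::vector<float>& b,
                             float alpha1, float alpha2) {
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = alpha1 * x[i] + alpha2 * b[i];
    }
    return y;
}

// Target graph: convolution output followed by one bias add.
std::vector<float> convPlusBias(const std::vector<float>& convOut,
                                const std::vector<float>& bias) {
    return scaledAdd(convOut, bias, 1.0f, 1.0f);
}

// Workaround graph: the first add has alpha2 = 0, so its second operand
// contributes nothing (as long as it holds finite values); the second add
// applies the bias.
std::vector<float> convAddAdd(const std::vector<float>& convOut,
                              const std::vector<float>& dummy,
                              const std::vector<float>& bias) {
    std::vector<float> passthrough = scaledAdd(convOut, dummy, 1.0f, 0.0f);
    return scaledAdd(passthrough, bias, 1.0f, 1.0f);
}
```

Calling `convAddAdd(convOut, dummy, bias)` and `convPlusBias(convOut, bias)` on the same finite inputs should return identical vectors, which makes this a cheap reference when checking fused GPU results on the host.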

Tested on:
card - Tesla T4/GTX 1080ti
cudnn - 8.2.2/8.2.4
cuda - 11.1/11.4 (with LD_PRELOAD for libnvrtc.so)

fuseOpDemo.cpp (20.7 KB)

Also I have a few questions:

1. We actively use inference with CUDA streams, and very often, when backendExecute is called with the same plan from multiple streams, we get wrong convolution results unless backendExecute is protected with a mutex. Is there a requirement not to call backendExecute with the same plan from multiple streams in parallel?
2. Where can I find descriptions of the engine knobs (what each knob does)? Some engines have knobs like “EDGE”; is there documentation for specific engines?
3. Is there any information on the input/output tensor data types and formats each engine requires (for example, that engine_5 will not work with the NCHW format or with the float data type)? With some engines we see a loss of precision even though the numerical notes of the execution plan were taken into account; maybe there is some other hint?
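
The mutex workaround from the first question can be sketched roughly as below. This is only an illustration of the pattern, not our production code: `SerializedPlan` and `demoSerializedRuns` are hypothetical names, and the `std::function` stands in for the real call to cudnnBackendExecute so that the sketch runs without a GPU. Note that the lock serializes only the host-side call.

```cpp
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical wrapper around one shared plan: every launch goes through
// the same mutex, so two threads never enter the execute call at once.
class SerializedPlan {
public:
    explicit SerializedPlan(std::function<void(int)> execute)
        : execute_(std::move(execute)) {}

    void run(int streamId) {
        std::lock_guard<std::mutex> lock(mutex_);
        execute_(streamId);  // real code would call cudnnBackendExecute here
    }

private:
    std::mutex mutex_;
    std::function<void(int)> execute_;
};

// Launches the shared plan from several threads and returns the launch count.
int demoSerializedRuns(int numThreads, int runsPerThread) {
    int launches = 0;  // deliberately not atomic: only the mutex protects it
    SerializedPlan plan([&launches](int /*streamId*/) { ++launches; });

    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        workers.emplace_back([&plan, t, runsPerThread] {
            for (int i = 0; i < runsPerThread; ++i) {
                plan.run(t);
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return launches;
}
```

The obvious cost is that launches from different streams can no longer overlap on the host, which is why we would like to know whether the lock is really required.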

Thank you in advance.

gautamj · October 20, 2021, 12:55am

Hi @lxq2.t,

Thank you for using our API and posting here. The runtime fusion engine that I used in the forum topic Cudnn backend api for fused op - #8 by gautamj is only supported on Volta and later GPUs. The GTX 1080 Ti is too old. The issue with the T4 is that only the half data type is supported there, while the file uses the float data type for all the tensors.

The corrected file is attached: fuseOpDemo_turing.cpp (20.7 KB). This should work on T4.
You can also look at the other fusion samples in our public repository and try them:
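
A quick way to guard against the first problem is to check the compute capability before building the fused graph. The sketch below assumes that "Volta and later" means compute capability 7.0 or higher (the GTX 1080 Ti reports 6.1, the T4 reports 7.5). The struct and function names are hypothetical; in a real program the two numbers would come from the `major` and `minor` fields that cudaGetDeviceProperties fills in.

```cpp
// Compute capability of a device, e.g. {7, 5} for a Tesla T4.
struct ComputeCapability {
    int ccMajor;
    int ccMinor;
};

// Assumption: the runtime fusion engine needs compute capability >= 7.0.
// This mirrors the statement in this reply and is not an official cuDNN
// support query; finalizing the plan remains the authoritative check.
inline bool isVoltaOrLater(ComputeCapability cc) {
    return cc.ccMajor >= 7;
}
```

For example, `isVoltaOrLater(ComputeCapability{6, 1})` is false for the GTX 1080 Ti, so the application could fall back to unfused operations there instead of waiting for CUDNN_STATUS_UNSUPPORTED.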

github.com/NVIDIA/cudnn-frontend/blob/main/samples/fusion_sample.cpp

For your questions:
We are working on 1 and 2 and will provide an update.
A new feature in the upcoming 8.3.0 release is error reporting, which will give much more informative errors, such as data type and format issues, and might help you with 3.

lxq2.t · October 29, 2021, 12:38pm

@gautamj,
Thank you for the example!
We investigated the problem with multithreaded calls of backendExecute further and found that it seems to be required to create the execution plan on separate handles, i.e. each cudaStream needs its own execution plan and we can’t reuse a plan that was created once. When we use a plan created once (e.g. on one cuDNN handle), the results mismatch when the plan is executed from multiple threads.
Can you tell me whether we really need to create a separate cudnnHandle with its own execution plan for each stream if the plan is executed in parallel? I did not find information about this in the documentation.
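
To make the setup concrete, the arrangement that currently works for us looks roughly like the sketch below: one plan per stream, created lazily and never shared. This is a simplified illustration, not our actual code. `PerStreamPlans` and `FakePlan` are hypothetical names; in the real application the factory would create a cudnnHandle, bind it to the stream with cudnnSetStream, and build and finalize the execution plan on that handle.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <utility>

// Stand-in for "cudnnHandle + finalized execution plan" owned by one stream.
struct FakePlan {
    int streamId;
    int launches;
};

// Hypothetical registry: hands out one plan per stream id and builds it on
// first use. The mutex guards only the map, not the execution itself.
template <typename Plan>
class PerStreamPlans {
public:
    using Factory = std::function<std::unique_ptr<Plan>(int)>;

    explicit PerStreamPlans(Factory factory) : factory_(std::move(factory)) {}

    Plan& get(int streamId) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = plans_.find(streamId);
        if (it == plans_.end()) {
            it = plans_.emplace(streamId, factory_(streamId)).first;
        }
        return *it->second;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return plans_.size();
    }

private:
    mutable std::mutex mutex_;
    Factory factory_;
    std::map<int, std::unique_ptr<Plan>> plans_;
};
```

Each worker thread calls `get` with its own stream id and then executes only the plan it received, so no plan is ever executed from two threads at once. The price is one handle and one plan build per stream, which is what we would like to avoid if it is not actually required.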

yanxu · December 9, 2021, 6:46am

Hi @lxq2.t, can you try the latest cuDNN 8.3.1 release? We have fixed an issue that we suspect caused the mismatches you observed. However, we are still developing a better testing method to check whether there are other remaining multi-thread issues. We will know with more certainty soon.
