cuDNN v8 backend API for Convolution

The GTC presentation on cuDNN v8 hinted at an open-source C++ API for cuDNN. Where can I find it?

Is there a convolution sample that uses the new backend API? I can’t find any in the cudnn_v8_samples directory. The documentation isn’t detailed enough to guess my way through either.

  1. The GTC cuDNN 8 slide 29 uses INT64 type for UID. The developer guide uses text as UID. Can you please elaborate on what type UID is?

  2. Can UIDs be reused in two completely different operation graphs? Is a UID local to each operation graph, or is it shared across all tensors in all operation graphs?

  3. The GTC slide 31 used CUDNN_TYPE_OPERATION but it isn’t there anymore. What should be used instead?

  4. What is CUDNN_ATTR_CONVOLUTION_SPATIAL_DIMS? Is it supposed to be an array of spatial dimensions (HW or DHW) or the number of spatial dimensions (2 or 3 respectively)?

  5. How do I set the number of groups for convolution?

  6. I have been trying to guess my way through. cuDNN 8.0.2 is throwing CUDNN_STATUS_BAD_PARAM in line 105 while calling cudnnBackendFinalize on a convolution forward descriptor. I am unable to diagnose the problem. Can you please look into it?

Code: cuda_common.hpp · GitHub

Hi @YashasSamaga,
I have noted your query and am checking on this. Please allow me some time.
Meanwhile just wanted to check if you are referring to the same link

Thanks!

GTC Slides: http://developer.download.nvidia.com/video/gputechconf/gtc/2020/presentations/s21685-cuDNN-v8-New-Advances-in-Deep-Learning-Acceleration-APIs-Optimizations-and-How-to-Tackle-the-Future-Challenges-in-Hardware-and-Software.pdf

Developer Guide: Documentation Archives :: NVIDIA Deep Learning cuDNN Documentation

Doesn’t look like the same link but is probably the same.

A few more questions:

  1. Will cuDNN always fuse bias addition step with convolution if asked to?

  2. Is it possible to check what operations have been fused in a selected engine?

  3. Frameworks have their own fused kernels for bias, eltwise addition and activations. Prior to cuDNN 8, OpenCV used to use cuDNN’s fused convolution path if available. Otherwise the convolution would be done by cuDNN followed by a single fused kernel that would do bias addition, elementwise operations and activation.

    So now if cuDNN 8 chooses an engine where bias addition is not fused with convolution, there would be three operations: cuDNN conv, cuDNN bias addition and end-user’s fused eltwise activation kernel. A faster solution would be: cuDNN conv and fused bias eltwise activation kernel.

    How do I decide when to use cuDNN to fuse the operations and when to use the end-user’s fused kernels?

@AakankshaS has there been any progress on this? cuDNN 8 with the v7 API is considerably slower than cuDNN 7. For now I have blamed it on the v7 API, as the release notes explicitly state that the v7 API doesn’t take care of fused convolutions. I am trying to implement this with the new backend API but have been struggling to resolve errors. The new API isn’t very developer friendly (it is very difficult to debug).

Hi @YashasSamaga,
Apologies for the delayed response.
Here are the answers to your queries.

  1. The actual type should be int64; the text is just for easier illustration.

  2. Yes, you can reuse UIDs. All operation graphs are independent of each other. We don’t cache UIDs globally.

  3. The recommendation is to call it this way: cudnnBackendSetAttribute(opGraph, CUDNN_ATTR_OPERATIONGRAPH_OPS, CUDNN_TYPE_BACKEND_DESCRIPTOR, numOps, ops);

  4. It’s an int64_t value describing the number of spatial dims (for example, 2 for HW or 3 for DHW).

  5. Refer to the v8 conv sample. There is a new group dim in the tensor descriptor, so instead of the old [N, C, H, W] we now have:
    X: [N, G, C, (D), H, W], with D being optional
    W: [G, K, C, (T), R, S], with T being optional
    Y: [N, G, K, (O), P, Q], with O being optional

  6. One guess is that the group dim is missing from the tensor descriptors; see above.
    If the tensors are set up as described above and you still see the issue, try adding the following code to set alpha/beta:

    if (computeType == CUDNN_DATA_DOUBLE) {
        CHECK_CUDNN(cudnnBackendSetAttribute(opDesc, CUDNN_ATTR_OPERATION_CONVOLUTION_FORWARD_ALPHA, CUDNN_TYPE_DOUBLE, 1, &alpha));
        CHECK_CUDNN(cudnnBackendSetAttribute(opDesc, CUDNN_ATTR_OPERATION_CONVOLUTION_FORWARD_BETA, CUDNN_TYPE_DOUBLE, 1, &beta));
    } else {
        float alphaf = float(alpha);
        float betaf = float(beta);
        CHECK_CUDNN(cudnnBackendSetAttribute(opDesc, CUDNN_ATTR_OPERATION_CONVOLUTION_FORWARD_ALPHA, CUDNN_TYPE_FLOAT, 1, &alphaf));
        CHECK_CUDNN(cudnnBackendSetAttribute(opDesc, CUDNN_ATTR_OPERATION_CONVOLUTION_FORWARD_BETA, CUDNN_TYPE_FLOAT, 1, &betaf));
    }
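
For illustration only, here is a minimal C++ sketch of how the grouped dimension arrays from answer 5 might be assembled for the 2D case before being passed to the tensor descriptors. It does not call cuDNN. The names GroupedConv2d, xDims, wDims, yDims, packedStrides and spatialDimCount are hypothetical helpers, not part of the cuDNN API. It assumes that C and K are per-group channel counts and that the tensors are fully packed in the listed dimension order; neither assumption is confirmed in this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Dims = std::vector<int64_t>;

// Hypothetical description of a 2D grouped convolution (not a cuDNN type).
// Assumption: C and K are channel counts per group.
struct GroupedConv2d {
    int64_t N, G, C, K;             // batch, groups, in/out channels per group
    int64_t H, W;                   // input height and width
    int64_t R, S;                   // filter height and width
    int64_t pad, stride, dilation;  // shared by both spatial axes for brevity
};

// Standard convolution output-size formula for one spatial axis.
int64_t outputExtent(int64_t in, int64_t filter, int64_t pad,
                     int64_t stride, int64_t dilation) {
    const int64_t effectiveFilter = (filter - 1) * dilation + 1;
    return (in + 2 * pad - effectiveFilter) / stride + 1;
}

// X: [N, G, C, H, W]
Dims xDims(const GroupedConv2d& c) { return {c.N, c.G, c.C, c.H, c.W}; }

// W: [G, K, C, R, S]
Dims wDims(const GroupedConv2d& c) { return {c.G, c.K, c.C, c.R, c.S}; }

// Y: [N, G, K, P, Q]
Dims yDims(const GroupedConv2d& c) {
    return {c.N, c.G, c.K,
            outputExtent(c.H, c.R, c.pad, c.stride, c.dilation),
            outputExtent(c.W, c.S, c.pad, c.stride, c.dilation)};
}

// Strides for a fully packed tensor whose last dimension varies fastest.
Dims packedStrides(const Dims& dims) {
    Dims strides(dims.size(), 1);
    for (std::size_t i = dims.size(); i-- > 1;) {
        strides[i - 1] = strides[i] * dims[i];
    }
    return strides;
}

// Number of spatial dims for a grouped layout: rank minus N, G and C (or K).
int64_t spatialDimCount(const Dims& dims) {
    return static_cast<int64_t>(dims.size()) - 3;
}
```

With N=1, G=2, C=4, K=8, a 32x32 input, a 3x3 filter, pad 1, stride 1 and dilation 1, xDims gives [1, 2, 4, 32, 32], yDims gives [1, 2, 8, 32, 32], and spatialDimCount returns 2, which matches the "number of spatial dims" reading in answer 4.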

Thanks!

Thank you for the answers. I downloaded the latest libcudnn8-doc_8.0.2.39-1+cuda11.0_amd64.deb from https://developer.nvidia.com/rdp/cudnn-download and installed it. I am not able to find the v8 conv sample in the cudnn_samples_v8 directory. There is no v8 conv sample, but there is a sample that uses the v7 API. I ran grep -lr "Backend" in that directory and got no results.

Is the v8 conv sample part of the next release, or are the debs at the aforementioned link outdated?

Hi @YashasSamaga,
Are you following the link below?

Thanks!

Yes. I have also inspected the package contents with dpkg -x, without installing:

  1. Downloaded libcudnn8-doc_8.0.2.39-1+cuda11.0_amd64.deb
  2. dpkg -x libcudnn8-doc_8.0.2.39-1+cuda11.0_amd64.deb tempdir
  3. cd tempdir/src/cudnn_samples_v8

There is no v8 sample, but there is a v7 sample. It’s not an installation mistake; the correct files aren’t even in the package.

You can verify this systematically by extracting the package contents and executing grep -lr "Backend". There is not a single instance of that word, which implies that there is no v8 sample.

I was directly in touch with cudnn@nvidia.com. They said that the v8 conv sample is packaged with the latest cuDNN release. But I am not able to find it.

Hi @YashasSamaga,
I think you have not downloaded the package with samples.


Can you please check that.
Thanks!

There are three packages:

  • runtime library
  • developer library
  • code samples and user guide

I have installed all three for the Ubuntu x86_64 target.

I also downloaded the PPC package out of curiosity and checked its contents. It doesn’t have the v8 sample.

Hi @YashasSamaga,
The issue has been reported and a fix will be available in a future release.
Please stay tuned.

Thanks!