Tensor packing and cryptic errors

Hi ezbDoubleZero, thanks for bringing this to our attention! Let me try to help you with your use cases:

Please refer to the fusion examples in our C++ frontend.

High level suggestions for your use cases:

  1. If you need to use the runtime fusion engine, tensors need to be in a fully packed NHWC layout, since this is the native tensor core layout. You can compute the strides with code like the following (see also the fuller sketch below):
    int64_t xDim[] = { n, c, h, w };
    int64_t xStr[] = { h * w * c, 1, w * c, c };
    Regarding your comment that channels need to be 1: that is not a requirement. For the float tensor type you are using, input and output channels must be a multiple of 4 on Volta/Turing GPUs; on Ampere GPUs with the latest cuDNN 8.4.0 they can be any number.

  2. For convolutions, use CUDNN_CROSS_CORRELATION mode if you can - the other mode, CUDNN_CONVOLUTION, is not supported in the runtime fusion engine right now.

With 1 and 2, conv → activation should work.
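
To make points 1 and 2 concrete, here is a minimal sketch of the stride computation. The helper packedNhwcStrides is ours, purely for illustration; for the actual tensor and convolution descriptor builder calls, please follow the fusion samples in the C++ frontend.

    #include <array>
    #include <cstdint>

    // Illustrative helper (not part of cuDNN): computes fully packed NHWC
    // strides for a 4-D tensor whose dimensions are given in {n, c, h, w} order.
    // The batch size n is not needed, since the N stride is just h * w * c.
    std::array<int64_t, 4> packedNhwcStrides(int64_t c, int64_t h, int64_t w) {
        return { h * w * c,  // N stride: one full image
                 1,          // C stride: channels are innermost in NHWC
                 w * c,      // H stride: one row of pixels
                 c };        // W stride: one pixel
    }

    // Example: for n = 16, c = 32, h = 64, w = 64
    //   xDim = {16, 32, 64, 64}
    //   xStr = packedNhwcStrides(32, 64, 64) = {131072, 1, 2048, 32}
    // These dim/stride arrays are what you pass when building the X/W/Y tensors
    // in the fusion samples, and the convolution descriptor there should be
    // built with CUDNN_CROSS_CORRELATION (point 2 above).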

  1. We have been working on improving the documentation:
    Developer Guide :: NVIDIA Deep Learning cuDNN Documentation
    Note the limitation there: “The input tensor to a Resample operation should not be produced by another operation within this graph, but should come from global memory.” This means it is currently not possible to fuse a resample directly at the output of a convolution, because the spatially neighboring pixels are not always available with the implicit-GEMM convolution algorithm being used.

  2. We are working on adding pooling examples to the frontend.

  3. Thanks for catching the documentation issues; our engineers will fix them ASAP.

Let us know if you have any other issues.