We are working on migrating our convolutional layer implementation from the legacy cuDNN API (e.g. cudnnConvolutionBiasActivationForward) to the cuDNN Frontend API.
However, we are experiencing severe problems with the new API.
Internally our tensors are stored in NCHW layout, so we use this layout with cuDNN as well. We never encountered any problems with the legacy API and this layout. With the new API, however, some configurations simply fail with NCHW layout, reporting "no engine configuration found". A simple example is a stand-alone sigmoid activation on a tensor of shape [N=2, C=1, H=1, W=2]. We could only work around this by changing the shape of the tensor (which is acceptable here, given the element-wise nature of the activation operation).
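For reference, this is roughly how we build the failing stand-alone sigmoid graph (a minimal sketch using the cuDNN Frontend v1.x graph API, with assumed FP32 data types and the handle/execute boilerplate omitted, not our exact code):

```cpp
#include <cudnn_frontend.h>
namespace fe = cudnn_frontend;

// Minimal sketch: stand-alone sigmoid on an NCHW tensor [N=2, C=1, H=1, W=2].
fe::graph::Graph graph;
graph.set_io_data_type(fe::DataType_t::FLOAT)
     .set_intermediate_data_type(fe::DataType_t::FLOAT)
     .set_compute_data_type(fe::DataType_t::FLOAT);

int64_t n = 2, c = 1, h = 1, w = 2;
auto X = graph.tensor(fe::graph::Tensor_attributes()
                          .set_name("X")
                          .set_dim({n, c, h, w})
                          .set_stride({c * h * w, h * w, w, 1}));  // packed NCHW strides

auto Y = graph.pointwise(X, fe::graph::Pointwise_attributes()
                                .set_mode(fe::PointwiseMode_t::SIGMOID_FWD));
Y->set_output(true);

// For these dims/strides the plan-building stage fails with
// "no engine configuration found" unless we reshape the tensor first.
// (validate() / build_operation_graph(handle) / create_execution_plans({fe::HeurMode_t::A})
//  / check_support(handle) / build_plans(handle) follow here.)
```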
The same problem occurs with the slice operation, where unfortunately changing the shape of the tensor is not an option.
Most of these problems vanish if we use NHWC layout with cuDNN Frontend (we tried that because tensor cores are optimized for this layout), though not all of them; for example, the activation only seems to reliably find engines for 3-dimensional tensors. Still, this is a big issue for us, as a move to NHWC would imply a major rewrite of our codebase.
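As an aside, in case it matters: since the graph API expresses layout purely through strides (dims stay in NCHW order), switching a tensor to NHWC on the cuDNN side only means changing its strides. A small sketch of the two variants we compared (the helper functions are our own, assuming fully packed tensors):

```cpp
#include <cstdint>
#include <vector>

// Dims are always passed to cuDNN in NCHW order; the physical layout is encoded
// in the strides. These helpers are our own, not part of cuDNN Frontend.
std::vector<int64_t> nchw_strides(int64_t n, int64_t c, int64_t h, int64_t w) {
    return {c * h * w, h * w, w, 1};   // packed NCHW
}

std::vector<int64_t> nhwc_strides(int64_t n, int64_t c, int64_t h, int64_t w) {
    return {h * w * c, 1, w * c, c};   // packed NHWC (dims still ordered N, C, H, W)
}
```

The stride change itself is trivial; the rewrite effort comes from the rest of our codebase assuming NCHW memory order, so the data would have to be transposed at the boundaries or all layers converted.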
Another problem is performance. With NCHW layout, essentially all operations are slower with cuDNN Frontend than with the legacy API, even though we implemented autotuning on top of the Frontend. For example, we have a convolution fused with a sigmoid activation that is up to a factor of 20 slower than the legacy API. We can work around this by splitting it into two separate graphs, but that introduces unnecessary additional memory usage.
The configuration for this graph is the following:
Input: N=1, C=256, H=80, W=128
Number of filters: 90
Kernel Size: 3x3
Padding: 1
Activation: Sigmoid
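For completeness, here is roughly how we set up that fused graph (again a sketch with the v1.x graph API; data types and the plan-building/autotuning sequence are simplified assumptions, not our exact production code):

```cpp
#include <cudnn_frontend.h>
namespace fe = cudnn_frontend;

// Sketch: conv (256 -> 90 filters, 3x3, padding 1) fused with sigmoid, NCHW strides.
fe::graph::Graph graph;
graph.set_io_data_type(fe::DataType_t::FLOAT)
     .set_intermediate_data_type(fe::DataType_t::FLOAT)
     .set_compute_data_type(fe::DataType_t::FLOAT);

int64_t n = 1, c = 256, h = 80, w = 128, k = 90, r = 3, s = 3;

auto X = graph.tensor(fe::graph::Tensor_attributes()
                          .set_name("X")
                          .set_dim({n, c, h, w})
                          .set_stride({c * h * w, h * w, w, 1}));   // NCHW
auto W = graph.tensor(fe::graph::Tensor_attributes()
                          .set_name("W")
                          .set_dim({k, c, r, s})
                          .set_stride({c * r * s, r * s, s, 1}));   // KCRS

auto conv_out = graph.conv_fprop(X, W,
                                 fe::graph::Conv_fprop_attributes()
                                     .set_padding({1, 1})
                                     .set_stride({1, 1})
                                     .set_dilation({1, 1}));

auto Y = graph.pointwise(conv_out, fe::graph::Pointwise_attributes()
                                       .set_mode(fe::PointwiseMode_t::SIGMOID_FWD));
Y->set_output(true);

// We then validate(), build_operation_graph(handle),
// create_execution_plans({fe::HeurMode_t::A, fe::HeurMode_t::FALLBACK}),
// check_support(handle), build_plans(handle), and time the resulting plans
// to pick the fastest one.
```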
What is your advice on how to proceed? Will the situation regarding NCHW layout improve in cuDNN Frontend (or rather in the underlying new cuDNN graph backend), or is a rewrite towards NHWC our only option?