Hi everyone. I have two beginner-level questions about how to use the cuDNN graph API:
1. How big should individual graphs be?
The graph API is organized around building a graph of operations, finding an execution engine that can handle them, and running that. How big should a graph be? I gather it's not supposed to be the entire neural network (e.g. not something ResNet-50 sized), but it also seems like it should be more than a single conv+activation. How do I decide how many operations to include in one graph? For example, how would I go about answering questions like these (a sketch of the granularity I have in mind follows the list):
- Should a ResNet block (e.g. norm, conv, activation, conv, dropout, skip connection) be all one cuDNN graph?
- Should a transformer encoder block (e.g. norm, SDPA, skip connection, norm, MLP, skip connection) be all one cuDNN graph?
- Or, continuing with the transformer encoder thought, should the SDPA be its own graph, the MLP its own graph, and so on?
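For concreteness, here's the smallest unit I've been picturing: a single fused conv + bias + ReLU as one graph, sketched with the cudnn-frontend v1 C++ API. All shapes, strides, and tensor names below are placeholders I made up, and I may well be holding the API wrong:

```cpp
#include <cudnn_frontend.h>
namespace fe = cudnn_frontend;

// Sketch of one candidate "graph-sized" unit: conv -> bias add -> ReLU.
// Shapes/strides are arbitrary placeholders (NHWC-packed, fp16 I/O).
fe::graph::Graph make_conv_bias_relu_graph() {
    fe::graph::Graph graph;
    graph.set_io_data_type(fe::DataType_t::HALF)
         .set_intermediate_data_type(fe::DataType_t::FLOAT)
         .set_compute_data_type(fe::DataType_t::FLOAT);

    auto X = graph.tensor(fe::graph::Tensor_attributes()
                              .set_name("input")
                              .set_dim({8, 64, 56, 56})
                              .set_stride({56 * 56 * 64, 1, 56 * 64, 64}));
    auto W = graph.tensor(fe::graph::Tensor_attributes()
                              .set_name("filter")
                              .set_dim({64, 64, 3, 3})
                              .set_stride({3 * 3 * 64, 1, 3 * 64, 64}));
    auto B = graph.tensor(fe::graph::Tensor_attributes()
                              .set_name("bias")
                              .set_dim({1, 64, 1, 1})
                              .set_stride({64, 1, 64, 64}));

    // Convolution, then two pointwise ops that I hope get fused with it.
    auto conv_out = graph.conv_fprop(
        X, W,
        fe::graph::Conv_fprop_attributes()
            .set_padding({1, 1}).set_stride({1, 1}).set_dilation({1, 1}));
    auto bias_out = graph.pointwise(
        conv_out, B,
        fe::graph::Pointwise_attributes().set_mode(fe::PointwiseMode_t::ADD));
    auto Y = graph.pointwise(
        bias_out,
        fe::graph::Pointwise_attributes().set_mode(fe::PointwiseMode_t::RELU_FWD));
    Y->set_output(true);

    return graph;
}
```

My question is whether the right unit is something like this, the whole residual/encoder block, or somewhere in between, and how I'd figure that out for myself.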
2. I don’t understand engine heuristic modes A and B; can someone explain?
I wasn’t able to follow the documentation here: cudnn_graph Library — NVIDIA cuDNN v9.4.0 documentation. Could someone please explain what each mode does? Specifically, I’m not clear on what “inference time on the CPU” means. Is a neural network involved in selecting the execution plan, and does that neural network run on the CPU? If so, the time cost of this inference could be amortized over many subsequent plan executions, making mode B generally better whenever you’re going to run the graph more than once… right? Or have I completely misinterpreted this?
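For reference, here is where I believe the choice between the modes shows up in the cudnn-frontend v1 API (reusing the hypothetical `make_conv_bias_relu_graph()` helper from my first question; this is just my reading of the build sequence, so corrections welcome):

```cpp
// Sketch: as far as I can tell, the heuristic mode is picked at the point
// where execution plans are created. Here I ask for mode B's ranking
// first, then mode A's, which I *think* acts as a fallback list.
void build_with_heuristics(fe::graph::Graph& graph, cudnnHandle_t handle) {
    if (graph.validate().is_bad()) return;
    if (graph.build_operation_graph(handle).is_bad()) return;
    if (graph.create_execution_plans({fe::HeurMode_t::B,
                                      fe::HeurMode_t::A}).is_bad()) return;
    if (graph.check_support(handle).is_bad()) return;
    if (graph.build_plans(handle).is_bad()) return;
    // ...then graph.execute(handle, variant_pack, workspace) many times.
}
```

If mode B’s extra cost is a one-time CPU inference at plan-creation time in `create_execution_plans`, then it seems like it would pay for itself whenever `execute` is called repeatedly, which is the intuition I’m trying to confirm.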
Thanks!