Cannot understand this error log

Hi everyone, I’m working on a cuDNN wrapper, but unfortunately I cannot wrap my head around this error.

Below are both the code (the wrapper is written in Rust) and the log.

Driver Version: 455.32.00 - CUDA Version: 11.1 - cuDNN Version: 8.3.2

Code:

    let ctx = CudnnContext::new().unwrap();

    let x = TensorBuilder::default()
        .set_id(0)
        .set_data_type::<f32>()
        .set_dimensions(&[1, 1, 1, 1])
        .set_strides(&[1, 1, 1, 1])
        .build()
        .unwrap();
    let y = TensorBuilder::default()
        .set_id(1)
        .set_data_type::<f32>()
        .set_dimensions(&[1, 1, 1, 1])
        .set_strides(&[1, 1, 1, 1])
        .build()
        .unwrap();
    let z = TensorBuilder::default()
        .set_id(2)
        .set_data_type::<f32>()
        .set_dimensions(&[1, 1, 1, 1])
        .set_strides(&[1, 1, 1, 1])
        .build()
        .unwrap();

    let cfg = PointwiseCfgBuilder::default()
        .set_mode(PointwiseMode::Add)
        .build()
        .unwrap();

    let op = PointwiseBuilder::default()
        .set_cfg(cfg)
        .set_x(x)
        .set_y(y)
        .set_b(z)
        .build()
        .unwrap();

    let ops = vec![op];

    let graph = GraphBuilder::default()
        .set_context(ctx)
        .set_operations(ops)
        .build()
        .unwrap();

    let engine = EngineBuilder::default()
        .set_graph(graph)
        .set_global_index(0)
        .build()
        .unwrap();

    let engine_cfg = EngineCfgBuilder::default()
        .set_engine(engine)
        .build()
        .unwrap();

    let execution_plan = ExecutionPlanBuilder::default()
        .set_engine_cfg(engine_cfg)
        .build()
        .unwrap();

Log: cudnn_log.txt (63.6 KB)

I think the following warning most likely points to the solution, but I cannot understand it.

W! CuDNN (v8302) function cudnnBackendFinalize() called:
w!     Info: Traceback contains 4 message(s)
w!         Warning: CUDNN_STATUS_NOT_SUPPORTED; Reason: process_knob_choices()
w!         Warning: CUDNN_STATUS_NOT_SUPPORTED; Reason: ptr.isSupported()
w!         Warning: CUDNN_STATUS_NOT_SUPPORTED; Reason: engine_post_checks(handle, *ebuf.get(), engine.getPerfKnobs(), req_size)
w!         Warning: CUDNN_STATUS_NOT_SUPPORTED; Reason: finalize_internal()
w! Time: 2022-02-23T11:32:17.900087 (0d+0h+0m+1s since start)
w! Process=2907403; Thread=2907404; GPU=NULL; Handle=NULL; StreamId=NULL.

EDIT: as of now, cuDNN only supports certain operation fusion patterns, and this is not one of them… am I correct?

Hi @frjnn, this basically means that this engine (together with the knobs you are using) cannot support the problem you passed in.
Specifically, for the problem/engine you passed in, if you change the problem size so that the C and K dimensions are multiples of 4 for FP32 (and make sure the tensors are in NHWC layout), it should be supported. This is because the kernel currently expects to be able to do wide vectorized loads of multiple elements at a time. We are working on relaxing this limitation, targeting a future release in a month or two.

    let x = TensorBuilder::default()
        .set_id(0)
        .set_data_type::<f32>()
        .set_dimensions(&[1, 4, 1, 1])
        .set_strides(&[4, 1, 4, 4])
        .build()
        .unwrap();
    let y = TensorBuilder::default()
        .set_id(1)
        .set_data_type::<f32>()
        .set_dimensions(&[1, 4, 1, 1])
        .set_strides(&[4, 1, 4, 4])
        .build()
        .unwrap();
    let z = TensorBuilder::default()
        .set_id(2)
        .set_data_type::<f32>()
        .set_dimensions(&[1, 4, 1, 1])
        .set_strides(&[4, 1, 4, 4])
        .build()
        .unwrap();
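
For reference, the strides `[4, 1, 4, 4]` in the snippet above are what an NHWC (channels-last) layout gives for dimensions `[1, 4, 1, 1]` listed in N, C, H, W order. Here is a minimal sketch of that arithmetic in plain Rust. `nhwc_strides` and `channels_are_multiple_of` are hypothetical helper names, not part of the wrapper or of cuDNN, and the divisibility check only mirrors the FP32 restriction described in this reply.

```rust
/// Hypothetical helper: strides for a tensor whose dimensions are listed
/// in N, C, H, W order but whose memory layout is NHWC (channels-last).
/// In NHWC memory the channel index varies fastest, then W, then H, then N.
fn nhwc_strides(dims: [i64; 4]) -> [i64; 4] {
    let [_n, c, h, w] = dims;
    [
        h * w * c, // stride of N: one full image
        1,         // stride of C: channels are contiguous
        w * c,     // stride of H: one full row
        c,         // stride of W: one pixel, i.e. all of its channels
    ]
}

/// Hypothetical helper: checks that the channel dimension (index 1 in
/// N, C, H, W order) is a multiple of `width`. The reply mentions 4 for
/// FP32; this is an assumption taken from that reply, not a cuDNN API.
fn channels_are_multiple_of(dims: [i64; 4], width: i64) -> bool {
    dims[1] % width == 0
}
```

With the original dimensions `[1, 1, 1, 1]`, `channels_are_multiple_of(dims, 4)` is false because C is 1. With `[1, 4, 1, 1]` it is true, and `nhwc_strides` returns `[4, 1, 4, 4]`, matching the values passed to `set_strides`.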

Hi, thank you for the response. Could you elaborate a bit on this warning from the log I posted above?


W! CuDNN (v8302) function cudnnBackendFinalize() called:
w!     Info: Traceback contains 2 message(s)
w!         Error: CUDNN_STATUS_NOT_INITIALIZED; Reason: LinearPatternMatcher::matchPattern(userGraph, doOpBinding)
w!         Warning: CUDNN_STATUS_NOT_SUPPORTED; Reason: (userGraph->getAllNodes().size() != 4) && (userGraph->getAllNodes().size() != 8)
w! Time: 2022-02-23T11:42:54.480390 (0d+0h+0m+2s since start)
w! Process=2910375; Thread=2910376; GPU=NULL; Handle=NULL; StreamId=NULL.

Also, should I expect the backend API to eventually replace the legacy one, or are they meant to interoperate? Thank you in advance.