cuDNN Backend Resample examples

Are there any simple examples of setting up a backend graph to do maxpooling and upsampling?

I did get maxpooling to work (downsampling a 64x64 image to 32x32), but not the way the docs describe it. I had to set the padding mode (CUDNN_ATTR_RESAMPLE_PADDING_MODE) to CUDNN_NEG_INF_PAD, which the docs don’t mention at all. Also, I didn’t need to set the PRE/POST_PADDING attributes, even though the docs say they’re required.
Here’s some code for maxpooling:

void create_maxpool_descriptor(cudnnBackendDescriptor_t& desc, int64_t h, int64_t w)
{
	std::cout << "Creating Maxpool Descriptor..." << std::endl;
	
	cudnnBackendCreateDescriptor(CUDNN_BACKEND_RESAMPLE_DESCRIPTOR, &desc);
	
	// this is set by default to nearest...change to maxpool
	int64_t resample_mode = CUDNN_RESAMPLE_MAXPOOL;
	cudnnBackendSetAttribute(desc, CUDNN_ATTR_RESAMPLE_MODE, CUDNN_TYPE_RESAMPLE_MODE, 1, &resample_mode);
	
	// Specifies the number of spatial dimensions to perform the resampling over.
	int64_t spatial_dims = 2;
	cudnnBackendSetAttribute(desc, CUDNN_ATTR_RESAMPLE_SPATIAL_DIMS, CUDNN_TYPE_INT64, 1, &spatial_dims);
	
	// Spatial dimensions of filter.
	int64_t window_dims[] = { h, w };
	cudnnBackendSetAttribute(desc, CUDNN_ATTR_RESAMPLE_WINDOW_DIMS, CUDNN_TYPE_INT64, 2, window_dims);
	
	// Stride in each dimension for the kernel/filter.
	int64_t resample_strides[] = { h, w };
	cudnnBackendSetAttribute(desc, CUDNN_ATTR_RESAMPLE_STRIDES, CUDNN_TYPE_INT64, 2, resample_strides);
	
	// Why does maxpool only work with this padding?  AVGPOOL doesn't need this.
	int64_t padding_mode = CUDNN_NEG_INF_PAD;
	cudnnBackendSetAttribute(desc, CUDNN_ATTR_RESAMPLE_PADDING_MODE, CUDNN_TYPE_PADDING_MODE, 1, &padding_mode);
/*
	// The docs say PRE/POST paddings are required, but finalize succeeds without them:
	int64_t resample_paddings[] = { 0, 0 };
	cudnnBackendSetAttribute(desc, CUDNN_ATTR_RESAMPLE_PRE_PADDINGS, CUDNN_TYPE_INT64, 2, resample_paddings);
	cudnnBackendSetAttribute(desc, CUDNN_ATTR_RESAMPLE_POST_PADDINGS, CUDNN_TYPE_INT64, 2, resample_paddings);
*/

	cudnnBackendFinalize(desc);
}

// set up maxpool descriptor
cudnnBackendDescriptor_t maxpool_desc;
create_maxpool_descriptor(maxpool_desc, 2, 2);

std::cout << "Creating Maxpool Resample Fwd Operation..." << std::endl;
cudnnBackendDescriptor_t maxpool_op_desc;
cudnnBackendCreateDescriptor(CUDNN_BACKEND_OPERATION_RESAMPLE_FWD_DESCRIPTOR, &maxpool_op_desc);
cudnnBackendSetAttribute(maxpool_op_desc, CUDNN_ATTR_OPERATION_RESAMPLE_FWD_DESC, CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &maxpool_desc);
cudnnBackendSetAttribute(maxpool_op_desc, CUDNN_ATTR_OPERATION_RESAMPLE_FWD_XDESC, CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &image_desc);
cudnnBackendSetAttribute(maxpool_op_desc, CUDNN_ATTR_OPERATION_RESAMPLE_FWD_YDESC, CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &resampled_desc);
cudnnBackendFinalize(maxpool_op_desc);

That seems to work, even though the graph creation step still throws warnings.
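In case it helps, here’s roughly how I wire that op into a graph, engine, and plan and then execute it, with error checks stripped. Treat it as a sketch: handle, the device pointers, the workspace, and engine index 0 are placeholders (a real app would pick the engine through the heuristics descriptor), and the UIDs have to match whatever you gave the tensor descriptors.

// Rough sketch: op graph -> engine -> engine config -> plan -> variant pack -> execute.
cudnnBackendDescriptor_t op_graph;
cudnnBackendCreateDescriptor(CUDNN_BACKEND_OPERATIONGRAPH_DESCRIPTOR, &op_graph);
cudnnBackendSetAttribute(op_graph, CUDNN_ATTR_OPERATIONGRAPH_OPS, CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &maxpool_op_desc);
cudnnBackendSetAttribute(op_graph, CUDNN_ATTR_OPERATIONGRAPH_HANDLE, CUDNN_TYPE_HANDLE, 1, &handle);
cudnnBackendFinalize(op_graph);

cudnnBackendDescriptor_t engine;
cudnnBackendCreateDescriptor(CUDNN_BACKEND_ENGINE_DESCRIPTOR, &engine);
cudnnBackendSetAttribute(engine, CUDNN_ATTR_ENGINE_OPERATION_GRAPH, CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &op_graph);
int64_t gidx = 0;	// just take the first engine; heuristics would normally pick this
cudnnBackendSetAttribute(engine, CUDNN_ATTR_ENGINE_GLOBAL_INDEX, CUDNN_TYPE_INT64, 1, &gidx);
cudnnBackendFinalize(engine);

cudnnBackendDescriptor_t engcfg;
cudnnBackendCreateDescriptor(CUDNN_BACKEND_ENGINECFG_DESCRIPTOR, &engcfg);
cudnnBackendSetAttribute(engcfg, CUDNN_ATTR_ENGINECFG_ENGINE, CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &engine);
cudnnBackendFinalize(engcfg);

cudnnBackendDescriptor_t plan;
cudnnBackendCreateDescriptor(CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR, &plan);
cudnnBackendSetAttribute(plan, CUDNN_ATTR_EXECUTION_PLAN_HANDLE, CUDNN_TYPE_HANDLE, 1, &handle);
cudnnBackendSetAttribute(plan, CUDNN_ATTR_EXECUTION_PLAN_ENGINE_CONFIG, CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &engcfg);
cudnnBackendFinalize(plan);

// Query the workspace size the plan needs and allocate it.
int64_t ws_size = 0, count = 0;
cudnnBackendGetAttribute(plan, CUDNN_ATTR_EXECUTION_PLAN_WORKSPACE_SIZE, CUDNN_TYPE_INT64, 1, &count, &ws_size);
void* workspace = nullptr;
if (ws_size > 0) cudaMalloc(&workspace, ws_size);

// Bind the tensor UIDs to device pointers (UIDs must match the ones given to
// image_desc / resampled_desc) and run the plan.
int64_t uids[] = { 1, 2 };		// placeholder UIDs: input image, pooled output
void* dev_ptrs[] = { x_gpu, y_gpu };	// placeholder device buffers
cudnnBackendDescriptor_t varpack;
cudnnBackendCreateDescriptor(CUDNN_BACKEND_VARIANT_PACK_DESCRIPTOR, &varpack);
cudnnBackendSetAttribute(varpack, CUDNN_ATTR_VARIANT_PACK_UNIQUE_IDS, CUDNN_TYPE_INT64, 2, uids);
cudnnBackendSetAttribute(varpack, CUDNN_ATTR_VARIANT_PACK_DATA_POINTERS, CUDNN_TYPE_VOID_PTR, 2, dev_ptrs);
cudnnBackendSetAttribute(varpack, CUDNN_ATTR_VARIANT_PACK_WORKSPACE, CUDNN_TYPE_VOID_PTR, 1, &workspace);
cudnnBackendFinalize(varpack);

cudnnBackendExecute(handle, plan, varpack);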

As for upsampling… nothing I try works. Here’s the code:

void create_upsample_descriptor(cudnnBackendDescriptor_t& desc, int64_t h, int64_t w)
{
	std::cout << "Creating Upsample Descriptor..." << std::endl;

	cudnnBackendCreateDescriptor(CUDNN_BACKEND_RESAMPLE_DESCRIPTOR, &desc);

	// this is set by default to nearest...
	int64_t resample_mode = CUDNN_RESAMPLE_NEAREST;
	cudnnBackendSetAttribute(desc, CUDNN_ATTR_RESAMPLE_MODE, CUDNN_TYPE_RESAMPLE_MODE, 1, &resample_mode);

	// Specifies the number of spatial dimensions to perform the resampling over.
	int64_t spatial_dims = 2;
	cudnnBackendSetAttribute(desc, CUDNN_ATTR_RESAMPLE_SPATIAL_DIMS, CUDNN_TYPE_INT64, 1, &spatial_dims);

	// Spatial dimensions of filter.
	int64_t window_dims[] = { h, w };
	cudnnBackendSetAttribute(desc, CUDNN_ATTR_RESAMPLE_WINDOW_DIMS, CUDNN_TYPE_INT64, 2, window_dims);

	// Stride in each dimension for the kernel/filter.
	int64_t resample_strides[] = { 2, 2 };
	cudnnBackendSetAttribute(desc, CUDNN_ATTR_RESAMPLE_STRIDES, CUDNN_TYPE_INT64, 2, resample_strides);

	/*int64_t padding_mode = CUDNN_EDGE_VAL_PAD;
	cudnnBackendSetAttribute(desc, CUDNN_ATTR_RESAMPLE_PADDING_MODE, CUDNN_TYPE_PADDING_MODE, 1, &padding_mode);*/
	
	/*int64_t resample_paddings[] = { 0, 0 };
	cudnnBackendSetAttribute(desc, CUDNN_ATTR_RESAMPLE_PRE_PADDINGS, CUDNN_TYPE_INT64, 2, resample_paddings);
	cudnnBackendSetAttribute(desc, CUDNN_ATTR_RESAMPLE_POST_PADDINGS, CUDNN_TYPE_INT64, 2, resample_paddings);*/

	cudnnBackendFinalize(desc);
}

// set up upsample descriptor
cudnnBackendDescriptor_t upsample_desc;
create_upsample_descriptor(upsample_desc, 2, 2);

std::cout << "Creating Upsample Resample Fwd Operation..." << std::endl;
cudnnBackendDescriptor_t upsample_op_desc;
cudnnBackendCreateDescriptor(CUDNN_BACKEND_OPERATION_RESAMPLE_FWD_DESCRIPTOR, &upsample_op_desc);
cudnnBackendSetAttribute(upsample_op_desc, CUDNN_ATTR_OPERATION_RESAMPLE_FWD_DESC, CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &upsample_desc);
cudnnBackendSetAttribute(upsample_op_desc, CUDNN_ATTR_OPERATION_RESAMPLE_FWD_XDESC, CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &resampled_desc);
cudnnBackendSetAttribute(upsample_op_desc, CUDNN_ATTR_OPERATION_RESAMPLE_FWD_YDESC, CUDNN_TYPE_BACKEND_DESCRIPTOR, 1, &upsampled_desc);
cudnnBackendFinalize(upsample_op_desc);

Are there any examples of upsampling? Or better docs? Or at least examples that don’t have errors or skip steps? Or at least better error messages? Because errors like this don’t help:

Error: CUDNN_STATUS_BAD_PARAM; Reason: finalize_internal()

Thanks,
-Chris

EDIT: And just in case, here’s how I build the tensor descriptors:

void create_tensor_descriptor(cudnnBackendDescriptor_t& desc, int64_t n, int64_t c, int64_t h, int64_t w, int64_t uid)
{
	std::cout << "Creating Tensor Descriptor..." << std::endl;
	cudnnBackendCreateDescriptor(CUDNN_BACKEND_TENSOR_DESCRIPTOR, &desc);

	cudnnDataType_t dtype = CUDNN_DATA_FLOAT;
	int64_t alignment = 4;
	cudnnBackendSetAttribute(desc, CUDNN_ATTR_TENSOR_DATA_TYPE, CUDNN_TYPE_DATA_TYPE, 1, &dtype);
	cudnnBackendSetAttribute(desc, CUDNN_ATTR_TENSOR_BYTE_ALIGNMENT, CUDNN_TYPE_INT64, 1, &alignment);

	int64_t xDim[] = { n, c, h, w };
	int64_t xStr[] = { c * h * w, h * w, w, 1 };
	cudnnBackendSetAttribute(desc, CUDNN_ATTR_TENSOR_DIMENSIONS, CUDNN_TYPE_INT64, 4, xDim);
	cudnnBackendSetAttribute(desc, CUDNN_ATTR_TENSOR_STRIDES, CUDNN_TYPE_INT64, 4, xStr);
	cudnnBackendSetAttribute(desc, CUDNN_ATTR_TENSOR_UNIQUE_ID, CUDNN_TYPE_INT64, 1, &uid);
	cudnnBackendFinalize(desc);
}
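For completeness, the tensor descriptors referenced above (image_desc, resampled_desc, upsampled_desc) get created with that helper roughly like this; the batch and channel counts are placeholders, and the UIDs just have to be unique and match what goes into the variant pack:

cudnnBackendDescriptor_t image_desc, resampled_desc, upsampled_desc;
create_tensor_descriptor(image_desc,     1, 1, 64, 64, 1);	// 64x64 input image
create_tensor_descriptor(resampled_desc, 1, 1, 32, 32, 2);	// 32x32 after the 2x2 maxpool
create_tensor_descriptor(upsampled_desc, 1, 1, 64, 64, 3);	// 64x64 again after upsampling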

Also, why is there no upsample in the cudnn_ops_infer.so library? It has pooling…

Here’s the best I can get. I have no idea why it doesn’t like upsampling. Maybe it’s not supported, but nothing says so, other than maybe one line in the warning output from plan creation.

Found 1 GPUs.
Compute capability: 8.6
Created cuDNN handle

Creating Tensor Descriptor...
Creating Tensor Descriptor...
Creating Upsample Descriptor...
Creating Upsample Resample Fwd Operation...

E! CuDNN (v8401) function cudnnBackendFinalize() called:
e!         Error: CUDNN_STATUS_BAD_PARAM; Reason: finalize_internal()
e! Time: 2022-06-05T17:43:51.334843 (0d+0h+0m+2s since start)
e! Process=51472; Thread=53404; GPU=NULL; Handle=NULL; StreamId=NULL.

Creating Graph...

W! CuDNN (v8401) function cudnnBackendFinalize() called:
w!         Error: CUDNN_STATUS_NOT_INITIALIZED; Reason: LinearPatternMatcher::matchPattern(userGraph, doOpBinding)
w!         Warning: CUDNN_STATUS_NOT_SUPPORTED; Reason: (userGraph->getAllNodes().size() != 4) && (userGraph->getAllNodes().size() != 8)
w! Time: 2022-06-05T17:43:51.335842 (0d+0h+0m+2s since start)
w! Process=51472; Thread=53404; GPU=NULL; Handle=NULL; StreamId=NULL.

Creating Engine...
Creating Plan...

W! CuDNN (v8401) function cudnnBackendFinalize() called:
w!         Warning: CUDNN_STATUS_NOT_SUPPORTED; Reason: check_resample_fwd_support_fort(node)
w!         Warning: CUDNN_STATUS_NOT_SUPPORTED; Reason: check_node_support_fort(node_ptr)
w!         Warning: CUDNN_STATUS_NOT_SUPPORTED; Reason: check_for_support()
w!         Warning: CUDNN_STATUS_NOT_SUPPORTED; Reason: ptr.isSupported()
w!         Warning: CUDNN_STATUS_NOT_SUPPORTED; Reason: engine_post_checks(handle, *ebuf.get(), engine.getPerfKnobs(), req_size)
w!         Warning: CUDNN_STATUS_NOT_SUPPORTED; Reason: finalize_internal()
w! Time: 2022-06-05T17:43:51.336844 (0d+0h+0m+2s since start)
w! Process=51472; Thread=53404; GPU=NULL; Handle=NULL; StreamId=NULL.


Cleanup...

Something is off with CUDNN_BACKEND_OPERATION_RESAMPLE_FWD_DESCRIPTOR, because no matter what I do to the CUDNN_BACKEND_RESAMPLE_DESCRIPTOR, it always fails with finalize_internal()… as if I’d know what that is. A hint for the library developers: don’t report internal function names as errors when the user can’t see that code. :)

-Chris

I got bilinear upsampling working… The error output actually listed the values expected for the stride and pre/post padding attributes. Here’s the code:

void create_upsample_descriptor(cudnnBackendDescriptor_t& desc, int64_t h, int64_t w)
{
	std::cout << "Creating Upsample Descriptor..." << std::endl;

	cudnnBackendCreateDescriptor(CUDNN_BACKEND_RESAMPLE_DESCRIPTOR, &desc);

	// this is set by default to nearest...
	int64_t resample_mode = CUDNN_RESAMPLE_BILINEAR;
	cudnnBackendSetAttribute(desc, CUDNN_ATTR_RESAMPLE_MODE, CUDNN_TYPE_RESAMPLE_MODE, 1, &resample_mode);

	// Specifies the number of spatial dimensions to perform the resampling over.
	int64_t spatial_dims = 2;
	cudnnBackendSetAttribute(desc, CUDNN_ATTR_RESAMPLE_SPATIAL_DIMS, CUDNN_TYPE_INT64, 1, &spatial_dims);

	// Spatial dimensions of filter.  (has to be 2 with NEAREST)
	int64_t window_dims[] = { 2, 2 };
	cudnnBackendSetAttribute(desc, CUDNN_ATTR_RESAMPLE_WINDOW_DIMS, CUDNN_TYPE_INT64, 2, window_dims);

	// Stride in each dimension; for this 2x upsample the error output wanted 0.5, passed as doubles.
	double resample_strides[] = { 0.5, 0.5 };
	cudnnBackendSetAttribute(desc, CUDNN_ATTR_RESAMPLE_STRIDES, CUDNN_TYPE_DOUBLE, 2, resample_strides);

	double resample_pre_paddings[] = { 0.5, 0.5 };
	double resample_post_padding[] = { 1.0, 1.0 };
	cudnnBackendSetAttribute(desc, CUDNN_ATTR_RESAMPLE_PRE_PADDINGS, CUDNN_TYPE_DOUBLE, 2, resample_pre_paddings);
	cudnnBackendSetAttribute(desc, CUDNN_ATTR_RESAMPLE_POST_PADDINGS, CUDNN_TYPE_DOUBLE, 2, resample_post_padding);

	cudnnBackendFinalize(desc);
}

Output:

Found 1 GPUs.
Compute capability: 8.6
cuDNN version: 8401

Creating handle...
Creating Tensor Descriptor...
Creating Tensor Descriptor...
Creating Upsample Descriptor...
Creating Upsample Resample Fwd Operation...
Creating Graph...

W! CuDNN (v8401) function cudnnBackendFinalize() called:
w!         Error: CUDNN_STATUS_NOT_INITIALIZED; Reason: LinearPatternMatcher::matchPattern(userGraph, doOpBinding)
w!         Warning: CUDNN_STATUS_NOT_SUPPORTED; Reason: (userGraph->getAllNodes().size() != 4) && (userGraph->getAllNodes().size() != 8)
w! Time: 2022-06-05T20:50:09.891818 (0d+0h+0m+1s since start)
w! Process=33228; Thread=48064; GPU=NULL; Handle=NULL; StreamId=NULL.

Creating Engine...
Creating Plan...

Cleanup...

I still haven’t gotten NEAREST to work. I’m guessing it’s the stride/padding values again, except this time there’s no dump of what the operator wants.

-Chris

Digging around, I think I figured out why NEAREST doesn’t work. I believe the backend is built on top of the existing libraries, with the graph machinery layered on. If that’s the case, then backend resample is really just cudnnCreatePoolingDescriptor/cudnnSetPooling2dDescriptor (or the Nd variant) for MAXPOOL, and cudnnCreateSpatialTransformerDescriptor/cudnnSetSpatialTransformerNdDescriptor for BILINEAR.

I don’t think NEAREST is implemented in the library:

typedef enum {
    CUDNN_SAMPLER_BILINEAR = 0,
} cudnnSamplerType_t;

Only bilinear is there. Is nearest-neighbor going to be implemented soon? Could you at least update the docs so people don’t spend days trying to get something working that doesn’t exist?
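If that mapping is right, the legacy-API equivalents of the two modes I have working would look roughly like this (just a sketch; poolDesc, stDesc, and the shapes are placeholders):

// Legacy equivalent of the MAXPOOL resample: 2x2 window, stride 2, no padding.
cudnnPoolingDescriptor_t poolDesc;
cudnnCreatePoolingDescriptor(&poolDesc);
cudnnSetPooling2dDescriptor(poolDesc, CUDNN_POOLING_MAX, CUDNN_NOT_PROPAGATE_NAN,
	2, 2,	// window h, w
	0, 0,	// padding h, w
	2, 2);	// stride h, w

// Legacy equivalent of the BILINEAR resample: a spatial transformer sampler,
// where dimA is the output tensor shape (n, c, h, w).
int dimA[4] = { 1, 1, 64, 64 };	// placeholder output shape
cudnnSpatialTransformerDescriptor_t stDesc;
cudnnCreateSpatialTransformerDescriptor(&stDesc);
cudnnSetSpatialTransformerNdDescriptor(stDesc, CUDNN_SAMPLER_BILINEAR, CUDNN_DATA_FLOAT, 4, dimA);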

Thanks,
-Chris

Hi,

We hope your queries are addressed in Tensor packing and cryptic errors - #6 by yanxu; please let us know if you still need assistance.

Thank you.