cuDNN custom convolution

jakub.mitura14 · June 28, 2021, 6:38pm

Hello, I would like to take a 3D medical image and calculate the mean and standard deviation of each voxel's neighborhood. That is, I would like a kernel that operates on a cube of data centered on each voxel in the image. Can I use cuDNN to achieve this? The pseudocode would look something like this:

I = 900x900x900  // image data
dim = 5  // edge length of the convolution filter; I will tune it by trial and error
// kernel taking the cube of data around each voxel, where xb-xa = yb-ya = zb-za = dim,
// and returning a vector with the mean and standard deviation of the neighborhood
__global__ getLocalDat(data = I(xa:xb, ya:yb, za:zb)) -> [mean, std] {
  flattened = flatten(data)  // all data from the cube covered by the convolution filter
  return [mean(flattened), std(flattened)]
}

Of course, I have omitted memory allocation, resolving bank conflicts (since the same data is accessed by multiple overlapping filter windows at once), loading data into each block's shared memory, and so on. I assume those tasks are already implemented, since they are common to all convolutions. Am I right?

I would be very grateful for any help with this problem.
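
For reference, the pseudocode above can be written as a brute-force CPU routine that is handy for checking any GPU version against. This is only a sketch under assumptions the post does not state: the image is a flattened float array with x as the fastest-varying index, dim is odd, voxels outside the volume are skipped, and the standard deviation is the population one. The names LocalStats and localStats are made up for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct LocalStats {
    double mean;
    double stddev;
};

// Mean and population standard deviation of the dim^3 cube centered on
// voxel (x, y, z). Voxels that fall outside the volume are skipped, which
// is an assumed border policy.
LocalStats localStats(const std::vector<float>& img, int nx, int ny, int nz,
                      int x, int y, int z, int dim) {
    const int r = dim / 2;  // half edge length of the cube (dim assumed odd)
    double sum = 0.0, sumSq = 0.0;
    long count = 0;
    for (int k = std::max(z - r, 0); k <= std::min(z + r, nz - 1); ++k)
        for (int j = std::max(y - r, 0); j <= std::min(y + r, ny - 1); ++j)
            for (int i = std::max(x - r, 0); i <= std::min(x + r, nx - 1); ++i) {
                // Flattened 3D index: x fastest, then y, then z.
                const double v = img[(static_cast<std::size_t>(k) * ny + j) * nx + i];
                sum += v;
                sumSq += v * v;
                ++count;
            }
    const double mean = sum / count;
    // Clamp at zero so rounding error cannot produce a negative variance.
    const double var = std::max(sumSq / count - mean * mean, 0.0);
    return {mean, std::sqrt(var)};
}
```

Calling localStats for every voxel costs dim^3 reads per output, so it is far too slow for a 900x900x900 image, but it gives a ground truth for small test volumes.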

AakankshaS · July 1, 2021, 6:13pm

Hi @jakub.mitura14 ,

The answer to this would be no, as cuDNN has no pre-compiled mean or standard deviation kernel.
However, you may try using convolution and a sequence of operations to achieve the same result.

Thanks!

jakub.mitura14 · July 2, 2021, 8:57am

Thanks! OK, and is there a state-of-the-art 3D CUDA convolution example with an explanation? I have found 2D examples, but 3D indexing remains tricky.

Also, I am wondering whether my idea is correct and how to achieve some of its points:

Divide the image into 3D blocks with some excess: padding equal to half the edge length of the kernel. Push those cubes of data into the shared memory of each block. I suppose bigger blocks, with the maximum number of threads, may be better, as I would waste less memory on this padding.
Iterate over the data in a block in such a way that, in each pass, each thread analyzes only non-overlapping parts of the data, in order to prevent bank conflicts. This requires a number of passes equal to the number of voxels in the kernel. To keep it safe from bank conflicts, I planned to sync the threads in the block after each pass. Does this make sense?
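
The trade-off between block size and padding described above can be put into numbers with a small host-side helper. This is a rough sketch, assuming float voxels, one output voxel per thread, and a cubic tile; TilePlan and planTile are hypothetical names, not part of CUDA or cuDNN.

```cpp
#include <cstddef>

// Geometry of one shared-memory tile: a thread block computes a
// tileEdge^3 cube of outputs and must load a halo of filterDim / 2
// voxels on every side of it.
struct TilePlan {
    int paddedEdge;           // edge length of the cube loaded into shared memory
    std::size_t sharedBytes;  // shared memory needed, assuming float voxels
    double haloFraction;      // share of the loaded voxels that are halo only
};

TilePlan planTile(int tileEdge, int filterDim) {
    const int radius = filterDim / 2;
    const int paddedEdge = tileEdge + 2 * radius;
    const std::size_t loaded =
        static_cast<std::size_t>(paddedEdge) * paddedEdge * paddedEdge;
    const std::size_t interior =
        static_cast<std::size_t>(tileEdge) * tileEdge * tileEdge;
    const double haloFraction =
        1.0 - static_cast<double>(interior) / static_cast<double>(loaded);
    return {paddedEdge, loaded * sizeof(float), haloFraction};
}
```

With filterDim = 5, planTile(4, 5) loads an 8x8x8 cube of which 87.5% is halo, while planTile(10, 5) loads a 14x14x14 cube (10976 bytes) of which roughly 64% is halo, and its 1000 threads still fit within the 1024-threads-per-block limit. This supports the intuition that larger blocks waste less on padding. On bank conflicts, note that threads of a warp reading the same shared-memory word are served by a broadcast rather than serialized, so overlapping reads alone may not require the multi-pass scheme.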

yanxu · July 8, 2021, 6:58am

There is no existing kernel that does this. Depending on your performance requirements, something like the following may or may not work for you:

If you do a single-channel convolution with an NxNxN filter in which every element is 1/N^3, the result is essentially the NxNxN neighborhood mean.
Similarly, you can launch a pointwise multiply to square the input tensor, then get the neighborhood mean of the squares in the same way.
Finally, you can use the identity std = sqrt(E[x^2] - (E[x])^2) to find the standard deviation.

See Standard deviation - Wikipedia
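
The three steps of this recipe can be checked on the CPU before mapping them to cuDNN calls. The code below is a minimal sketch, not cuDNN code: Volume, boxMean, and localStd are illustrative names, the filter size n is assumed odd, and the border is zero-padded the way a zero-padded convolution would see it.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Volume {
    int nx, ny, nz;
    std::vector<float> v;  // flattened, x fastest
    // Zero padding: reads outside the volume return 0.
    float at(int x, int y, int z) const {
        if (x < 0 || y < 0 || z < 0 || x >= nx || y >= ny || z >= nz) return 0.0f;
        return v[(static_cast<std::size_t>(z) * ny + y) * nx + x];
    }
};

// Step 1: convolution with an n x n x n filter whose weights are all 1/n^3.
Volume boxMean(const Volume& in, int n) {
    const int r = n / 2;
    const float w = 1.0f / static_cast<float>(n * n * n);
    Volume out{in.nx, in.ny, in.nz, std::vector<float>(in.v.size())};
    for (int z = 0; z < in.nz; ++z)
        for (int y = 0; y < in.ny; ++y)
            for (int x = 0; x < in.nx; ++x) {
                float acc = 0.0f;
                for (int dz = -r; dz <= r; ++dz)
                    for (int dy = -r; dy <= r; ++dy)
                        for (int dx = -r; dx <= r; ++dx)
                            acc += w * in.at(x + dx, y + dy, z + dz);
                out.v[(static_cast<std::size_t>(z) * in.ny + y) * in.nx + x] = acc;
            }
    return out;
}

// Steps 2 and 3: square pointwise, filter again, then apply
// std = sqrt(E[x^2] - (E[x])^2).
Volume localStd(const Volume& in, int n) {
    Volume sq = in;
    for (float& x : sq.v) x *= x;  // pointwise square of the input
    const Volume mean = boxMean(in, n);
    const Volume meanSq = boxMean(sq, n);
    Volume out = mean;
    for (std::size_t i = 0; i < out.v.size(); ++i) {
        // Clamp at zero: rounding can make the difference slightly negative.
        const float var = std::max(meanSq.v[i] - mean.v[i] * mean.v[i], 0.0f);
        out.v[i] = std::sqrt(var);
    }
    return out;
}
```

One caveat with this identity is that subtracting two nearly equal numbers loses precision in single precision, so for images with large intensity values it may help to subtract a global mean from the input first.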

jakub.mitura14 · July 8, 2021, 7:08am

Thank you!

system · September 6, 2021, 7:08am

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.
