# Spatially separable 3D convolution

I am struggling to implement something that seems like it should be very basic.

I need to apply a 3D convolution to a 4D tensor of shape (32x128x128x128) using a kernel k of shape (32,32,6,6,6). However, my convolution kernel is spatially separable (but not separable over the channel dimension), in the sense that k[c_in,c_out,i,j,k] = k_x[c_in, c_out,i] * k_y[c_in, c_out,j] * k_z[c_in, c_out,k].

In principle, the fact that the kernel is separable should lead to a significant speedup, since:
L[c_out, x,y,z] = \sum_{c_in} \sum_{i} \sum_{j} \sum_{k} M[c_in, x+i, y+j, z+k] * k_x[c_in, c_out,i] * k_y[c_in, c_out,j] * k_z[c_in, c_out,k]

This entails that:
L[c_out, x,y,z] = \sum_{c_in} \sum_{i} k_x[c_in, c_out,i] \sum_{j} k_y[c_in, c_out,j] \sum_{k} M[c_in, x+i, y+j, z+k] k_z[c_in, c_out,k]

L[c_out, x,y,z] = \sum_{c_in} \sum_{i} k_x[c_in, c_out,i] \sum_{j} k_y[c_in, c_out,j] Lz[c_out, c_in, x+i, y+j, z]

L[c_out, x,y,z] = \sum_{c_in} \sum_{i} k_x[c_in, c_out,i] Ly[c_out, c_in, x+i, y, z]

with

Lz[c_out, c_in, x+i, y+j, z] = \sum_{k} M[c_in, x+i, y+j, z+k] k_z[c_in, c_out,k]
Ly[c_out, c_in, x+i, y, z] = \sum_{j} k_y[c_in, c_out,j] Lz[c_out, c_in, x+i, y+j, z]
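To sanity-check the factorization above, here is a minimal pure-Python sketch that compares the direct four-fold sum with the three 1D passes on a tiny random problem. It assumes no padding (only output positions where the whole kernel fits) and uses the same index convention as the formulas. The function names, the tiny sizes and the random data are illustrative assumptions, not part of the original problem. It processes one (c_in, c_out) pair at a time, so only a single-volume Lz and Ly are ever held in memory.

```python
import itertools
import random

def zeros(c_out, o):
    """Output volume L[c_out][x][y][z] filled with zeros."""
    return [[[[0.0] * o for _ in range(o)] for _ in range(o)] for _ in range(c_out)]

def direct_conv(M, kx, ky, kz, c_in, c_out, n, k_size):
    """Direct evaluation of the four-fold sum with the full product kernel."""
    o = n - k_size + 1  # output extent without padding
    L = zeros(c_out, o)
    for co, x, y, z in itertools.product(range(c_out), range(o), range(o), range(o)):
        acc = 0.0
        for ci, i, j, k in itertools.product(range(c_in), *[range(k_size)] * 3):
            acc += (M[ci][x + i][y + j][z + k]
                    * kx[ci][co][i] * ky[ci][co][j] * kz[ci][co][k])
        L[co][x][y][z] = acc
    return L

def separable_conv(M, kx, ky, kz, c_in, c_out, n, k_size):
    """Three 1D passes (z, then y, then x) per (c_in, c_out) pair."""
    o = n - k_size + 1
    K = range(k_size)
    L = zeros(c_out, o)
    for co, ci in itertools.product(range(c_out), range(c_in)):
        # Lz[x][y][z]: 1D pass along z for this channel pair
        Lz = [[[sum(M[ci][x][y][z + k] * kz[ci][co][k] for k in K)
                for z in range(o)] for y in range(n)] for x in range(n)]
        # Ly[x][y][z]: 1D pass along y on top of Lz
        Ly = [[[sum(Lz[x][y + j][z] * ky[ci][co][j] for j in K)
                for z in range(o)] for y in range(o)] for x in range(n)]
        # final 1D pass along x, accumulated over c_in into L
        for x, y, z in itertools.product(range(o), repeat=3):
            L[co][x][y][z] += sum(Ly[x + i][y][z] * kx[ci][co][i] for i in K)
    return L

# Tiny illustrative sizes (assumption); the real problem is 32, 32, 128, 6.
C_IN, C_OUT, N, K_SIZE = 2, 3, 5, 2
random.seed(0)
rnd = lambda: random.uniform(-1.0, 1.0)
M = [[[[rnd() for _ in range(N)] for _ in range(N)] for _ in range(N)]
     for _ in range(C_IN)]
kx, ky, kz = ([[[rnd() for _ in range(K_SIZE)] for _ in range(C_OUT)]
               for _ in range(C_IN)] for _ in range(3))

L_direct = direct_conv(M, kx, ky, kz, C_IN, C_OUT, N, K_SIZE)
L_sep = separable_conv(M, kx, ky, kz, C_IN, C_OUT, N, K_SIZE)

O = N - K_SIZE + 1
max_err = max(abs(L_direct[co][x][y][z] - L_sep[co][x][y][z])
              for co, x, y, z in itertools.product(range(C_OUT), *[range(O)] * 3))
print("max abs difference:", max_err)
```

The two results should agree up to floating-point rounding. The loop order here is chosen for readability only; it says nothing about how the passes should be mapped onto GPU threads or memory.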

I cannot construct Ly in memory because it would be too large (32x32x128x128x128). To minimize global memory reads, I convolve each input channel M[c_in] with k_z and k_y and then accumulate the result into L at the appropriate locations, but this leads to significant overhead in global memory writes.
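To put rough numbers on the sizes and the expected speedup, here is a small back-of-the-envelope calculation. It assumes 4-byte (float32) values, which the post does not state, and it ignores boundary effects, so the counts are approximate per-output-voxel figures rather than exact totals.

```python
C_in, C_out, N, K = 32, 32, 128, 6
bytes_per_value = 4  # assumption: float32 storage

# Full intermediate Ly with shape (C_out, C_in, N, N, N)
ly_elements = C_out * C_in * N**3
ly_gib = ly_elements * bytes_per_value / 2**30

# Scratch needed if only one (c_in, c_out) pair is processed at a time
scratch_mib = N**3 * bytes_per_value / 2**20

# Multiply-accumulates per output voxel, summed over all channel pairs
macs_direct = C_in * C_out * K**3        # full 6x6x6 kernel
macs_separable = C_in * C_out * 3 * K    # three 1D passes of length 6

print(ly_gib)                        # 8.0
print(scratch_mib)                   # 8.0
print(macs_direct, macs_separable)   # 221184 18432
print(macs_direct / macs_separable)  # 12.0
```

Under these assumptions the full Ly would take about 8 GiB, while a single-pair scratch volume is about 8 MiB, and the separable form needs roughly 12 times fewer multiply-accumulates. Whether that arithmetic saving translates into wall-clock time depends on the memory traffic described above.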

This seems so basic that I believe (or at least hope) it must already have been implemented somewhere. Has anyone achieved this?

cuDNN can do 3D convolutions on a 4D tensor. However, I wouldn’t be able to give you a roadmap, and I’m not saying it takes the spatially separable character of the kernel into account, which seems to be the crux of your question.

There is a separate forum for cuDNN in case you are interested.

A simple Google search turns up items like this, and there is a separable convolution CUDA sample code, but it is not designed for the dimensionality you describe.