2D convolution utilizing tensor cores

I’m doing 2D template matching between two 8-bit images. I’m looking for a template of size, say, 231x231 in a window of size 256x256. I was wondering whether there is an example implementation that utilizes tensor cores (ideally with 8-bit input) to do the most basic 2D convolution (correlation). Note that for this specific problem, FFT-based convolution is not helpful.

Did you manage to find an answer?

2D convolution can be done as a matrix multiplication with cublasSgemm, or maybe with cuDNN; it can be set to use the tensor cores. This should increase performance. But I am not sure that this is the right answer to your question.
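
To make the matrix-multiplication idea a bit more concrete, here is a minimal sketch of the usual "im2col" lowering, assuming plain correlation with no padding and stride 1. It is pure Python on the CPU and does not touch tensor cores; the `matmul` helper only stands in for whatever GEMM call would run on the GPU, and all function names (`im2col`, `matmul`, `correlate_gemm`) are made up for this illustration. Every template-sized patch of the window becomes one row of a matrix, every flattened template becomes one column of a second matrix, and their product holds one correlation score per (position, template) pair.

```python
def im2col(window, th, tw):
    """Flatten every th x tw patch of `window` into one row (row-major order)."""
    h, w = len(window), len(window[0])
    rows = []
    for y in range(h - th + 1):
        for x in range(w - tw + 1):
            rows.append([window[y + dy][x + dx]
                         for dy in range(th) for dx in range(tw)])
    return rows


def matmul(a, b):
    """Plain integer matrix product; a stand-in for the GPU GEMM call."""
    cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols]
            for row in a]


def correlate_gemm(window, templates):
    """Correlate several equally sized templates with one window via one GEMM.

    Returns a P x T matrix: P patch positions (row-major), T templates.
    """
    th, tw = len(templates[0]), len(templates[0][0])
    patches = im2col(window, th, tw)                      # P x K
    flat = [[v for row in t for v in row] for t in templates]  # T x K
    b = [list(col) for col in zip(*flat)]                 # K x T
    return matmul(patches, b)                             # P x T


if __name__ == "__main__":
    window = [[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]]
    diag = [[1, 0],
            [0, 1]]
    ones = [[1, 1],
            [1, 1]]
    # One row per patch position, one column per template.
    print(correlate_gemm(window, [diag, ones]))
```

For the sizes in the question, a 231x231 template in a 256x256 window gives 26x26 = 676 positions, so the patch matrix would have 676 rows of 231*231 = 53361 values each. With a single template the product is really a matrix-vector multiply, so batching several templates or windows into one call is probably what makes a GEMM worthwhile. Whether the GEMM then actually runs on tensor cores depends on the data types and the library settings.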