No, I mean the mathematical matrix size per MMA instruction.
Each single MMA instruction (not combining several) computes the matrix product A x B; A and B each have a 2D shape, since they are 2D matrices.
If you do a 1D convolution, then you have a set of 1D input vectors (one dimension is the convolution dimension = K; the other dimension of the 2D matrix is the number of independent input vectors). And you have a 1D convolution filter, which you have to repeat in a transposed fashion, see below.
[My other answer seemingly got lost, a technical forum problem.]
Either
a) your input data or your filter naturally has many zeroes and is sparse,
b) or, for a convolution kernel, if the convolution dimension of the overall problem is not >> K (K being the convolution size covered by a single mma instruction), in other words for a relatively small filter, then you can use the sparse matrix operations to your advantage for the zeroes around the kernel:
(In the following, the K dimension is shown horizontally. Mathematically, B would have the K dimension vertically instead.)
Input Data matrix:
I0 I1 I2 I3 I4 ... IN
...
[other input data sets]
Filter matrix:
F0 F1 F2 0 ...
0 F0 F1 F2 0 ...
0 0 F0 F1 F2 0 ...
...
The convolution filter matrix repeats the same filter, shifted by one element for each row (or column, if transposed).
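As an illustration, here is a minimal host-side sketch in plain C++ (the helper name buildFilterMatrix is made up for this example) that builds such a shifted filter matrix from a length-3 filter F0 F1 F2:

```cpp
#include <cstdio>
#include <vector>

// Hypothetical helper: build the shifted (Toeplitz-like) filter matrix.
// Row r holds the filter shifted right by r positions; all other entries are 0.
std::vector<std::vector<float>> buildFilterMatrix(const std::vector<float>& filter,
                                                  int rows, int K) {
    std::vector<std::vector<float>> M(rows, std::vector<float>(K, 0.0f));
    for (int r = 0; r < rows; ++r)
        for (int i = 0; i < (int)filter.size(); ++i)
            if (r + i < K)
                M[r][r + i] = filter[i];
    return M;
}

int main() {
    std::vector<float> filter = {1.0f, 2.0f, 3.0f};  // stands for F0 F1 F2
    auto M = buildFilterMatrix(filter, 4, 8);        // 4 output positions, K = 8
    for (const auto& row : M) {
        for (float v : row) std::printf("%4.1f ", v);
        std::printf("\n");
    }
}
```

Each row of this matrix, multiplied along the K dimension with an input vector, yields one output element of the convolution.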
The requirement for CUDA sparse matrices (depending on data type) is that within each row, every group of 4 consecutive columns may contain at most 2 non-zero elements (the zero elements are not stored or computed). Here, however, the 0 elements appear as blocks on the left and right side, so we have to reorder the K dimension. The reordering has to happen in the same way for both the input matrix and the filter matrix, even if only the filter matrix is sparse.
Example for reordering for K==16:
0 8 1 9 2 10 3 11 4 12 5 13 6 14 7 15
The first half of the original indices lands on the even positions, the second half on the odd positions, each half keeping its original order.
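A small sketch of that reordering in plain C++ (interleavePermutation and applyPermutation are made-up names, and K is assumed to be even). The same permutation has to be applied to the input rows and the filter rows:

```cpp
#include <cstdio>
#include <vector>

// Interleave the two halves of the K dimension:
// new position 2*i   holds original index i          (first half -> even slots)
// new position 2*i+1 holds original index i + K/2    (second half -> odd slots)
std::vector<int> interleavePermutation(int K) {
    std::vector<int> perm(K);
    for (int i = 0; i < K / 2; ++i) {
        perm[2 * i]     = i;
        perm[2 * i + 1] = i + K / 2;
    }
    return perm;
}

// Apply the permutation to one row (must be done identically for the
// input matrix rows and the filter matrix rows).
std::vector<float> applyPermutation(const std::vector<float>& row,
                                    const std::vector<int>& perm) {
    std::vector<float> out(row.size());
    for (size_t p = 0; p < perm.size(); ++p) out[p] = row[perm[p]];
    return out;
}

int main() {
    auto perm = interleavePermutation(16);
    for (int idx : perm) std::printf("%d ", idx);  // prints: 0 8 1 9 2 10 ... 7 15
    std::printf("\n");
}
```

Since the reordering is only a permutation of the summation order along K, the mathematical result of each dot product stays unchanged.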
Now consider the case that the leftmost original elements are all zero, e.g. elements 0…9.
Then every even element (counting from 0) is zero.
If all the rightmost elements are zero, e.g. elements 6…15, then every odd element is zero.
If the filter sits in the middle, e.g. elements 0…4 and 11…15 are zero, then after reordering, for some pairs of elements (the first pair has original indices 0 and 8, the second pair has indices 1 and 9, …) the even element is zero, and for the other pairs the odd element is zero.
So it is compatible with the CUDA sparse matrix conditions.
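To make that concrete, here is a small self-contained check in C++ (values chosen to match the example above: nonzero filter taps only at original indices 5…10) that every group of 4 consecutive reordered elements contains at most 2 nonzeros:

```cpp
#include <cstdio>
#include <vector>

int main() {
    const int K = 16;
    // Filter row with zeros at original indices 0..4 and 11..15 ("filter in the middle").
    std::vector<float> row(K, 0.0f);
    for (int i = 5; i <= 10; ++i) row[i] = 1.0f;

    // Same interleaving permutation as above: 0 8 1 9 2 10 ... 7 15.
    std::vector<float> reordered(K);
    for (int i = 0; i < K / 2; ++i) {
        reordered[2 * i]     = row[i];
        reordered[2 * i + 1] = row[i + K / 2];
    }

    // 2-out-of-4 sparsity check: at most 2 nonzeros per group of 4 consecutive elements.
    for (int g = 0; g < K; g += 4) {
        int nonzeros = 0;
        for (int j = 0; j < 4; ++j)
            if (reordered[g + j] != 0.0f) ++nonzeros;
        std::printf("group %d: %d nonzeros %s\n", g / 4, nonzeros,
                    nonzeros <= 2 ? "(ok)" : "(violates the constraint)");
    }
}
```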
c) If the convolution size is large compared to K, then you probably can also use sparse matrices to an advantage, for the overlapping of multiple mma instructions.