I am trying to implement sparse convolution (sparse matrix, dense kernel) in TensorRT, but there seems to be a fundamental limitation:
Dense 1d matrix:
00001110000
Sparse 1d matrix (features, indices):
1 4
1 5
1 6
Dense 1d kernel:
111
Dense result:
001232100
Sparse result:
1 2
2 3
3 4
2 5
1 6
Dense 1d matrix:
00010101000
Sparse 1d matrix:
1 3
1 5
1 7
Dense 1d kernel:
111
Dense result:
011212110
Sparse result:
1 1
1 2
2 3
1 4
2 5
1 6
1 7
So the output matrix's shape depends not only on the shape of the input matrices, but also on the data in them (specifically, on the data in the input indices matrix).
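A minimal sketch (plain C++, not TensorRT code) of the sparse 1D convolution over (features, indices) pairs, assuming "valid" convolution as in the examples above; it shows why the number of output non-zeros is data-dependent:

```cpp
#include <cstddef>
#include <map>
#include <vector>

struct SparseTensor {
    std::vector<float> features;
    std::vector<int>   indices;
};

SparseTensor sparseConv1d(const SparseTensor& in, const std::vector<float>& kernel,
                          int inputLength) {
    const int k = static_cast<int>(kernel.size());
    std::map<int, float> acc;  // output index -> accumulated value
    for (std::size_t n = 0; n < in.indices.size(); ++n) {
        // An input non-zero at index i touches outputs i-k+1 .. i;
        // out-of-range output positions are dropped (valid convolution).
        for (int t = 0; t < k; ++t) {
            const int out = in.indices[n] - t;
            if (out >= 0 && out <= inputLength - k)
                acc[out] += in.features[n] * kernel[t];
        }
    }
    SparseTensor result;
    for (const auto& [idx, val] : acc) {  // std::map keeps indices sorted
        result.features.push_back(val);
        result.indices.push_back(idx);
    }
    return result;  // 5 non-zeros for the first example, 7 for the second
}
```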
The only solution to this problem is to pre-calculate the maximum output tensor shape and pad the extra elements with -1 (a sketch of this padding step follows the two samples below):
Sample 1 (features, indices):
1 4
1 5
1 6
-1 -1
-1 -1
-1 -1
-1 -1
-1 -1
-1 -1
Sample 2 (features, indices):
1 1
1 2
2 3
1 4
2 5
1 6
1 7
-1 -1
-1 -1
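For kernel size K, each input non-zero can touch at most K distinct output positions, so a safe upper bound for the output size is numInputNonZeros * K (3 * 3 = 9 in both samples above, which is the assumption this sketch makes). Reusing the SparseTensor struct from the first sketch:

```cpp
// Pad a sparse result up to the data-independent worst case so that the
// output shape can be fixed ahead of time. Padded entries are marked -1,
// matching the samples above.
void padToMaxSize(SparseTensor& t, std::size_t numInputNonZeros, std::size_t kernelSize) {
    const std::size_t maxSize = numInputNonZeros * kernelSize;  // 3 * 3 = 9 here
    t.features.resize(maxSize, -1.0f);
    t.indices.resize(maxSize, -1);
}
```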
But now every consumer of the sparse output of the sparse convolution must be aware of these padded elements, i.e. it must be able to slice these tensors back to their real size (a sketch of that slicing follows the samples):
Sample 1 (features, indices):
1 4
1 5
1 6
Sample 2 (features, indices):
1 1
1 2
2 3
1 4
2 5
1 6
1 7
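For illustration, this is roughly what every such consumer would have to do with the padded representation: find the first padded entry and shrink both arrays to that length (again reusing the SparseTensor struct from the first sketch):

```cpp
// Slice a padded sparse tensor back to its real size by locating the
// first padded entry (index == -1).
void sliceToRealSize(SparseTensor& t) {
    std::size_t realSize = t.indices.size();
    for (std::size_t n = 0; n < t.indices.size(); ++n) {
        if (t.indices[n] == -1) { realSize = n; break; }
    }
    t.features.resize(realSize);  // 3 for Sample 1, 7 for Sample 2
    t.indices.resize(realSize);
}
```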
What I request: let the shapes of the arguments of the enqueue(...) function be the sizes of the SLICED tensors (3 and 7 in this case), while the sizes returned by getOutputDimensions(...) remain the "maximum padded" tensor shapes (9 in this case). This way the static TensorRT memory allocator still has all the information it needs to allocate memory before engine execution starts, but consumers of the sparse convolution output receive tensors without any padding.
An example of an API change that would allow returning sliced tensor shapes: in enqueue(...), change const nvinfer1::PluginTensorDesc* outputDesc to nvinfer1::PluginTensorDesc* outputDesc.
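To make the request concrete, here is a hypothetical sketch of how a plugin could use such an API. kKernelSize is an assumed constant for this sketch, and the non-const outputDesc is exactly the proposed change: the current IPluginV2DynamicExt::enqueue declares it const, so this does not compile against today's TensorRT headers.

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>

constexpr int kKernelSize = 3;  // assumed constant for this sketch

// getOutputDimensions still reports the data-independent worst case
// (input non-zeros * kernel size, i.e. 9 above), so the static TensorRT
// allocator can reserve enough memory before execution.
nvinfer1::DimsExprs getOutputDimensions(
    int outputIndex, const nvinfer1::DimsExprs* inputs, int nbInputs,
    nvinfer1::IExprBuilder& exprBuilder)
{
    nvinfer1::DimsExprs out = inputs[0];
    out.d[0] = exprBuilder.operation(nvinfer1::DimensionOperation::kPROD,
                                     *inputs[0].d[0],
                                     *exprBuilder.constant(kKernelSize));
    return out;
}

// enqueue with a NON-const outputDesc (the requested change): once the
// kernel knows the actual number of output non-zeros, the plugin reports
// the sliced size so downstream layers never see the padding.
int enqueue(const nvinfer1::PluginTensorDesc* inputDesc,
            nvinfer1::PluginTensorDesc* outputDesc,  // was: const nvinfer1::PluginTensorDesc*
            const void* const* inputs, void* const* outputs,
            void* workspace, cudaStream_t stream)
{
    int realCount = 0;
    // ... launch the sparse convolution kernel here; it fills outputs[0]
    //     and computes the actual non-zero count into realCount ...
    outputDesc[0].dims.d[0] = realCount;  // e.g. 3 or 7 instead of 9
    return 0;
}
```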
This is highly desirable because I am trying to convert a large existing PyTorch model to TensorRT, and it is hard to make every layer work properly with padded tensors as inputs.