I would like more explanation of what you mean by these DLA sparsity limitations:

Only convolutions whose quantized INT8 weights are at most 256K can benefit from SS; in practice, the limitation may be more restrictive.

Only convolutions with K % 64 in {0, 1, 2, 4, 8, 16, 32}, where K is the number of kernels (corresponding to the number of output channels), can benefit from SS in this release.
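To make the two quoted constraints concrete, here is a minimal sketch of an eligibility check. The function name and shape parameters are illustrative (not a TensorRT API); it assumes a plain 2D convolution with INT8 weights, where each weight occupies one byte:

```python
# Hypothetical check for the two documented DLA structured-sparsity (SS)
# constraints quoted above. Not a TensorRT API; names are illustrative.

ALLOWED_K_REMAINDERS = {0, 1, 2, 4, 8, 16, 32}
MAX_INT8_WEIGHT_BYTES = 256 * 1024  # "at most 256K" quantized INT8 weights


def may_benefit_from_ss(k: int, c: int, r: int, s: int) -> bool:
    """k: number of kernels (output channels), c: input channels,
    r, s: kernel height and width. INT8 -> 1 byte per weight."""
    weight_bytes = k * c * r * s
    if weight_bytes > MAX_INT8_WEIGHT_BYTES:
        return False  # weights exceed the 256K limit
    return (k % 64) in ALLOWED_K_REMAINDERS


# Example: 3x3 conv, 128 input channels, 64 output channels
# 64 * 128 * 3 * 3 = 73728 bytes <= 256K, and 64 % 64 == 0
print(may_benefit_from_ss(64, 128, 3, 3))   # True

# Example: 3x3 conv, 512 input channels, 64 output channels
# 64 * 512 * 3 * 3 = 294912 bytes > 256K
print(may_benefit_from_ss(64, 512, 3, 3))   # False
```

Note this only covers the two constraints quoted here; the docs warn the size limit "may be more restrictive" in practice, so passing this check does not guarantee SS kicks in.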

I have the same question. For a 2D conv, does this mean I can only use INT8 weights with 2:4 sparsity? I would be grateful if you could provide some examples.