8-bit quantization in cuDNN without TensorRT?

Hi everyone,

I have been wondering whether it is possible to implement 8-bit quantization in cuDNN without using TensorRT. I could not find any useful information in the documentation or in the code I saw on GitHub. Is it true that cuDNN offers no support for 8-bit quantization without TensorRT?

Thank you so much))

Hi @yes,
As far as I understand, quantization should be done at the framework level.
Hence, 8-bit quantization in cuDNN itself, without TensorRT, will not be possible.
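
To illustrate what "at the framework level" could mean here, below is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is only an illustration of the arithmetic a framework would typically perform before handing integer data to a backend; it does not call cuDNN or TensorRT, and the names `quantize_symmetric` and `dequantize` are hypothetical helpers, not part of any library.

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers using one scale for the whole tensor.

    Assumption: symmetric quantization with the range [-qmax, qmax],
    where qmax = 127 for 8 bits. Real frameworks may use other schemes.
    """
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max((abs(v) for v in values), default=0.0)
    # Avoid division by zero when the tensor is empty or all zeros.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp into the representable range.
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integers and the scale."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_symmetric(weights)
restored = dequantize(q, scale)
print(q, scale, restored)
```

The framework would keep `scale` alongside the integer tensor so that results can be rescaled to floating point afterwards. How the scale is chosen (for example, through calibration on sample data) is a separate design decision that this sketch leaves out.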

Thanks!