Winograd in NVDLA

I am evaluating the performance of NVDLA and am familiar with using Winograd convolution to accelerate 3x3 filters.
My questions are:

  1. Does the Winograd transform work in 16-bit integer format? Does it work in 8-bit integer format?
  2. Are the Winograd convolutions bit-exact with regular 3x3 convolutions? If not, what are the accuracy implications of using Winograd transforms?
  3. Can we train the network with Winograd convolutions in the loop to minimize the accuracy impact? If so, please outline a procedure for doing that in TensorFlow, Keras, TensorRT, or PyTorch.
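
To make question 2 concrete, here is a minimal sketch of the kind of comparison I have in mind, using the 1D F(2,3) Winograd algorithm in pure Python. It is not a description of NVDLA's datapath: the function names, the sample values, and the `round_weights` flag (which rounds the transformed filter to integers) are my own assumptions, chosen only to show one way a fixed-point implementation could diverge from direct convolution.

```python
from fractions import Fraction

def direct_f23(d, g):
    """Direct 1D convolution: 4 inputs, 3 filter taps, 2 outputs."""
    return [d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
            d[1] * g[0] + d[2] * g[1] + d[3] * g[2]]

def winograd_f23(d, g, round_weights=False):
    """1D Winograd F(2,3): 4 multiplications instead of 6."""
    # Filter transform; note the division by 2 in the middle terms.
    u = [Fraction(g[0]),
         Fraction(g[0] + g[1] + g[2], 2),
         Fraction(g[0] - g[1] + g[2], 2),
         Fraction(g[2])]
    if round_weights:
        # Hypothetical: store transformed weights as integers.
        u = [Fraction(round(x)) for x in u]
    # Input transform uses only additions and subtractions.
    v = [d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]]
    # Element-wise products, then the output transform.
    m = [ui * vi for ui, vi in zip(u, v)]
    return [m[0] + m[1] + m[2], m[1] - m[2] - m[3]]

d = [1, 2, 3, 4]   # sample integer activations
g = [1, 2, 4]      # sample integer weights

print(winograd_f23(d, g) == direct_f23(d, g))                      # exact arithmetic
print(winograd_f23(d, g, round_weights=True) == direct_f23(d, g))  # rounded weights
```

With exact rational arithmetic the two results agree, while rounding the transformed weights changes the output for these values. What I would like to know is which of these cases, if either, reflects NVDLA's behavior in its 8-bit and 16-bit integer modes.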