FP32 with TF32 precision

I’m using PyTorch with a V100 GPU. Since this GPU doesn’t support TF32 operations, I’m adjusting my x (input to the prediction model) and y (ground truth) tensors, which are in FP32, to keep roughly the 10 bits of mantissa precision that TF32 uses, by doing, for example, x = torch.round(x, decimals=4) (I’m using 4 decimal places following the TF32 section of this article: FP64, FP32, FP16, BFLOAT16, TF32, and other members of the ZOO | by Grigory Sapunov | Medium).

Would this rounding be enough to make my FP32 values very close to what TF32 would be? And assuming that approach is correct, should I also reduce the model’s precision with model.half()?

I’m making these adjustments because, for some reason, my model converges well on an Ampere GPU (RTX A4000) but not on a Volta GPU (V100). My guess is that this is because TF32 is no longer used in the operations.
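For context on what I mean: TF32 keeps a 10-bit *binary* mantissa (with FP32’s 8-bit exponent), not a fixed number of decimal places, so a closer emulation than decimal rounding might be to zero out the low 13 of FP32’s 23 mantissa bits. A minimal sketch of that idea in plain Python (my own illustration, not taken from the article, and it truncates rather than rounds the way real TF32 hardware does):

```python
import struct

def tf32_truncate(x: float) -> float:
    """Approximate TF32 by keeping only 10 mantissa bits of an FP32 value."""
    # Reinterpret the value as its 32-bit FP32 pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Clear the low 13 mantissa bits (23 FP32 bits - 10 TF32 bits),
    # leaving sign, exponent, and the top 10 mantissa bits intact.
    bits &= 0xFFFFE000
    # Reinterpret the masked pattern as a float again.
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

For tensors, I suppose the same masking could be done by viewing the FP32 tensor as int32 and applying the bitwise AND, but I’m not sure how faithful either variant is to the actual Tensor Core behavior.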
Thanks in advance.