Trained model gives slightly different values when tested on P100 and V100. Is there a way to make it consistent?

yuvaramsingh94 · April 28, 2021, 5:56pm

Hi all,
I trained a model on a V100 GPU, and my inference device uses a P100. When I run the model on the test dataset, the two GPUs give slightly different values, which ends up reducing my overall score on the P100. I tried taking the difference between the two sets of values and using its mean to adjust my P100 predictions. This helped a little, but it does not solve the problem.
I am using PyTorch 1.8.0 for this work. Is there a better way to address a difference in results caused by different hardware in the training and inference environments?

thanks
yuvaram
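
For readers following along, the mean-offset adjustment described in the post above might look roughly like the sketch below. This is only one reading of the description (a constant additive shift), not the poster's actual code. The function names `mean_offset` and `apply_offset` and the sample values are hypothetical, and it assumes predictions for the same calibration inputs are available from both GPUs.

```python
from statistics import mean


def mean_offset(v100_preds, p100_preds):
    """Average per-sample difference (V100 minus P100) on the same inputs."""
    if len(v100_preds) != len(p100_preds):
        raise ValueError("prediction lists must have the same length")
    return mean(v - p for v, p in zip(v100_preds, p100_preds))


def apply_offset(p100_preds, offset):
    """Shift every P100 prediction by the same constant offset."""
    return [p + offset for p in p100_preds]


# Hypothetical predictions for the same three inputs on each GPU.
v100 = [0.50, 0.75, 0.25]
p100 = [0.25, 0.50, 0.00]

offset = mean_offset(v100, p100)
corrected = apply_offset(p100, offset)
```

A constant shift like this can only remove a systematic bias between the two devices. Differences that vary from sample to sample are left untouched, which would be consistent with the adjustment helping only a little.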

SunilJB · April 29, 2021, 7:25am

Hi @yuvaramsingh94,

The generated plan files are not portable across platforms or TensorRT versions. Plans are specific to the exact GPU model they were built on (in addition to the platforms and the TensorRT version) and must be re-targeted to the specific GPU in case you want to run them on a different GPU.

Thanks

yuvaramsingh94 · April 29, 2021, 2:00pm

Thanks for the reply. I am not currently using TensorRT; I am just using native PyTorch for inference. The hardware is the main difference between my training and inference setups.

SunilJB · April 29, 2021, 4:11pm

Hi @yuvaramsingh94,

In that case, I would request that you raise your query in the forum below.

Thanks

yuvaramsingh94 · April 29, 2021, 4:17pm

Thanks, I will do this.
