Correctness Problem Using Tensorflow with RTX 4090

chydavy · April 4, 2023, 3:21pm

I am writing to report a correctness issue I encountered while using the RTX 4090 GPU with Tensorflow.

I have a Tensorflow model which gives very close results when running on my RTX 3090 GPU and CPU. However, when I run it on RTX 4090 with Nvidia Tensorflow (version: nv22.11), it gives significantly different results when compared to running on the CPU.

Configurations:

OS: Ubuntu 20.04.5 LTS
CPU: Intel(R) Core™ i9-13900K
GPU: NVIDIA GeForce RTX 4090
Docker image: nvidia tensorflow, tag: 22.11-tf1-py3

My model is a regression U-Net. Switching to the RTX 4090 with this version of Tensorflow changes the results enough to affect the model's performance. The output of the model ranges from about -3 to 3, and the difference between results on different devices can be up to 0.08. When running on the RTX 3090 with older versions of CUDA (11.1) and cuDNN (8.0), the difference is < 0.01.

Difference of results between RTX 4090 and CPU:

Maximum difference of output channel 2: 0.042948246002197266
Values of different runs at pixel (160, 219): 2.384153366088867, 2.4271016120910645

Maximum difference of output channel 3: 0.0736396312713623
Values of different runs at pixel (164, 175): 1.8166956901550293, 1.8903353214263916

Maximum difference of output channel 4: 0.05873298645019531
Values of different runs at pixel (137, 225): 2.382530927658081, 2.4412639141082764

Maximum difference of output channel 5: 0.07898902893066406
Values of different runs at pixel (170, 234): 2.7947018146514893, 2.715712785720825
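
For reference, per-channel figures like the ones above can be computed with a comparison along the following lines. This is only a minimal sketch and not the code in the attached program: the function name `max_diff_per_channel`, the (height, width, channels) layout, and the synthetic data in the demo are all assumptions made for illustration.

```python
import numpy as np


def max_diff_per_channel(gpu_out, cpu_out):
    """For each channel, find the largest absolute difference and its pixel.

    Assumes both outputs have shape (height, width, channels).
    """
    gpu_out = np.asarray(gpu_out, dtype=np.float64)
    cpu_out = np.asarray(cpu_out, dtype=np.float64)
    if gpu_out.shape != cpu_out.shape:
        raise ValueError("outputs must have the same shape")

    report = []
    for c in range(gpu_out.shape[-1]):
        diff = np.abs(gpu_out[..., c] - cpu_out[..., c])
        # argmax works on the flattened array, so convert back to (row, col)
        row, col = np.unravel_index(np.argmax(diff), diff.shape)
        report.append({
            "channel": c,
            "max_diff": float(diff[row, col]),
            "pixel": (int(row), int(col)),
            "values": (float(gpu_out[row, col, c]), float(cpu_out[row, col, c])),
        })
    return report


if __name__ == "__main__":
    # Synthetic stand-in data (not real model output): perturb one pixel.
    rng = np.random.default_rng(0)
    cpu = rng.uniform(-3.0, 3.0, size=(8, 8, 2))
    gpu = cpu.copy()
    gpu[1, 2, 0] += 0.05
    for entry in max_diff_per_channel(gpu, cpu):
        print("Maximum difference of output channel", entry["channel"], ":", entry["max_diff"])
        print("Values of different runs at pixel", entry["pixel"], ":", entry["values"])
```

Reporting the pixel and both values alongside the maximum makes it easier to judge the difference relative to the local output magnitude, not just in absolute terms.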

sample_program.zip (1.9 MB)
I have attached a sample program to reproduce the problem. The script run.sh runs inference on a GPU and on the CPU and compares the results.

I will soon upgrade my GPU cards to newer models with new libraries, and I am afraid that the performance will degrade. I would like to know how I can get consistent results with the new GPU card. I look forward to hearing back from you soon.

AakankshaS · April 29, 2023, 5:00pm

Hi @chydavy ,
Apologies for the delay.
Let me check on this with the engineering team and get back to you.
Thanks

chydavy · May 15, 2023, 5:30pm

Thank you Aakanksha! I look forward to hearing from your team.
