Why 4090 training slower than P100 even writing same piece of code?


I’m asking this question because I saw benchmarks where 4090 is beating out Tesla p100 gpu. But when am doing same work in live class, I notice their seconds/iteration is 5(let say) and mine is 6.

Why is such difference even if am having better machine?

Is it because they are in Google Colab and am on local machine?

Hi there @himgos,

Benchmarks cited on the Internet are rarely a good source to assess real-live scenarios. Plus it always depends on the system as a whole. I am quite sure that the HPC cloud farm running Google Colab with a P100 has much more RAM and more CPU performance than your local machine. It also dedicates all compute power to the deep learning process where your local system might not. Temperatures, throttling, VRAM, memory bandwidth, those are more factors affecting throughput.

The P100 is a dedicated Compute GPU which was specifically made for things like Deep Learning.

The 4090 is a gaming consumer GPU. It is optimized for gaming workloads which are different than pure compute workloads.

I hope this helps clarify the situation.