I’m asking this question because I saw benchmarks where the RTX 4090 beats the Tesla P100. But when I do the same work in a live class, I notice the instructor's time is, say, 5 seconds/iteration while mine is 6.
Why is there such a difference even though I have the better machine?
Is it because they are on Google Colab and I am on a local machine?
Benchmarks cited on the Internet are rarely a good way to assess real-life scenarios, and throughput always depends on the system as a whole. I am quite sure that the cloud machine backing Google Colab's P100 instance has much more RAM and CPU performance than your local machine, and it dedicates all of its compute to the deep learning process, while your local system might not. Temperature, throttling, VRAM and memory bandwidth are further factors affecting throughput.
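If you want to compare your machine against Colab fairly, make sure you are timing the GPU work itself rather than Python overhead. Here is a minimal sketch using CUDA events with a warm-up; the `Linear` layer, batch size, and iteration count are placeholders for your actual model:

```python
import torch

device = "cuda"
model = torch.nn.Linear(1024, 1024).to(device)  # placeholder workload
x = torch.randn(256, 1024, device=device)

# Warm-up so one-time CUDA initialization doesn't skew the measurement.
for _ in range(5):
    model(x)

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

start.record()
for _ in range(100):
    model(x)
end.record()
torch.cuda.synchronize()  # wait for all queued kernels to finish

print(f"{start.elapsed_time(end) / 100:.2f} ms/iteration")
```

Watching temperatures and clock speeds in `nvidia-smi` while this runs will also tell you whether your card is throttling.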
The P100 is a dedicated compute GPU which was specifically made for things like deep learning.
The 4090 is a consumer gaming GPU. It is optimized for gaming workloads, which differ from pure compute workloads.
It is two years later, but this could help someone else. It is probably about your data: if you are using double precision, i.e. float64, the P100 is a lot faster than an RTX 4090 or even an RTX 5090, but if you use single precision, it is the other way around. Those benchmarks are for marketing; if you need FP64 compute, newer consumer cards are getting worse and worse per dollar.
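You can check the precision effect on your own card with a quick matmul timing sketch like the one below (the matrix size and iteration count are arbitrary; shrink `n` if you run out of VRAM):

```python
import time
import torch

def bench(dtype, n=4096, iters=10):
    # Time n x n matrix multiplications at the given precision.
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    torch.cuda.synchronize()  # flush warm-up/allocation work
    t0 = time.perf_counter()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()  # make sure the kernels actually finished
    return (time.perf_counter() - t0) / iters

for dt in (torch.float32, torch.float64):
    print(dt, f"{bench(dt):.4f} s/iter")
```

On a consumer card like the 4090 you should see float64 fall far behind float32, while on a P100 the gap is much smaller.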