Laptop gpu choice

ken1866 · May 2, 2023, 6:26pm

I’m considering getting a laptop with a CUDA enabled GPU to use as a training tool for myself and could use some advice on choosing the GPU. Not long ago I understood key gpu factors were # of cores and memory width. Is this still the case? I’d think that every generation would be “better” than the previous, but I’m confused by the specs I see. For example, RTX 3070 has 5120 cores and 256-bit memory width, but the newer generation RTX 4070 has only 4608 cores and 128-bit memory width. Yet, a gpu user benchmark site shows the 4070 outperforming the 3070 (in gaming). Does gaming performance correlate with CUDA computational performance, or would I expect the older 3070 to outperform the 4070 for CUDA computations?

mobile 3070 vs 4070

njuffa · May 2, 2023, 7:21pm

I would say that’s a misunderstanding, because it leaves out clock frequency and architecture. If you want to just compare based on a couple of performance metrics, you would want to look at FP32 TFLOPS and memory bandwidth. Citing from the TechPowerUp database (these are non-mobile numbers, I would think):

RTX 3070: 20.31 FP32 TFLOPS, 448.0 GB/sec
RTX 4070: 29.15 FP32 TFLOPS, 504.2 GB/sec

So depending on whether the code in question is compute bound or memory bound, one might expect a speed-up of 1.1x to 1.5x when moving from an RTX 3070 to an RTX 4070. Keep in mind that this is more of a trend indicator than an exact prediction. To make sure the code performs as desired, you would want to benchmark it on prospective target platforms.
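As a rough illustration of where that range comes from, the two ratios can be computed directly from the figures quoted above. This is only a sketch: the helper name `speedup_bounds` is made up for this example, and it assumes a purely memory-bound kernel scales with bandwidth and a purely compute-bound kernel scales with FP32 throughput, which real code rarely does exactly.

```python
# Hypothetical helper: ratio of peak numbers between two GPUs.
# Each GPU is given as (FP32 TFLOPS, memory bandwidth in GB/s).
def speedup_bounds(old, new):
    compute_ratio = new[0] / old[0]   # scaling if fully compute bound
    memory_ratio = new[1] / old[1]    # scaling if fully memory bound
    return memory_ratio, compute_ratio

# Desktop figures quoted from the TechPowerUp database in this post.
rtx3070 = (20.31, 448.0)
rtx4070 = (29.15, 504.2)

memory_ratio, compute_ratio = speedup_bounds(rtx3070, rtx4070)
print(f"memory bound:  ~{memory_ratio:.2f}x")
print(f"compute bound: ~{compute_ratio:.2f}x")
```

Real applications usually land somewhere between the two ratios, or outside them when clocks, caches, or architecture differences dominate.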

Different games have different performance characteristics and therefore scale by different factors when moving between GPUs, and the same applies when comparing the scaling of games vs compute apps. Games may also make use of features like ray tracing that are irrelevant to compute applications, and vice versa. I would say for a very rough idea of relative performance between GPUs (with very large uncertainty bands to either side), looking at game performance might work if there is no other data available.

ken1866 · May 3, 2023, 4:29pm

Thanks for your thoughtful reply! I didn’t realize the TechPowerUp database had TFLOP information. Interestingly, the 3070 vs 4070 comparison looks much different using the mobile (laptop) versions:

RTX 3070 Mobile: 15.97 FP32 TFLOPS, 448.0 GB/sec, 160 tensor cores
RTX 4070 Mobile: 15.62 FP32 TFLOPS, 256.0 GB/sec, 144 tensor cores

Unless the data are wrong, it looks like the 3070 is the better choice, especially for memory-bound computations.
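One way to make the "especially for memory bound" part concrete is a simplified roofline-style estimate: attainable throughput is the smaller of the peak FP32 rate and bandwidth multiplied by the arithmetic intensity of the code (FLOPs per byte moved). The sketch below is an assumption-laden toy, not a benchmark; the function name and the sample intensities are invented for illustration, and it ignores caches, clocks under laptop power limits, and everything else that matters in practice.

```python
# Toy roofline-style estimate (illustrative only, not a measurement).
def attainable_tflops(peak_tflops, bandwidth_gb_s, flops_per_byte):
    # GB/s * FLOP/byte = GFLOP/s; divide by 1000 to get TFLOP/s.
    memory_limit_tflops = bandwidth_gb_s * flops_per_byte / 1000.0
    return min(peak_tflops, memory_limit_tflops)

# Mobile figures quoted above: (FP32 TFLOPS, GB/s).
mobile = {
    "RTX 3070 Mobile": (15.97, 448.0),
    "RTX 4070 Mobile": (15.62, 256.0),
}

# Sample arithmetic intensities, chosen arbitrarily for illustration.
for flops_per_byte in (1, 10, 100):
    for name, (peak, bw) in mobile.items():
        est = attainable_tflops(peak, bw, flops_per_byte)
        print(f"{name} at {flops_per_byte} FLOP/byte: ~{est:.2f} TFLOPS")
```

Under this model the two parts are nearly tied for code with very high arithmetic intensity, while at low intensity the estimate tracks the bandwidth gap.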

njuffa · May 3, 2023, 7:36pm

From what I understand, the TechPowerUp database is crowd-based and a volunteer effort like Wikipedia. Errors are possible. I never deal with laptop GPUs, but I have found the TechPowerUp database largely accurate for desktop GPUs (consumer and professional).

In this case, NVIDIA’s “spec” pages are particularly unhelpful for non-gamers:

Maybe the marketing department did not have their thinking caps on when they created those pages, or the pages were written this way on purpose to impress potential buyers with “bling” instead of “speeds & feeds”. You decide.

The best approach is always to benchmark the applications whose performance you care about. Maybe friends or colleagues can help you with getting access to relevant systems. I am not aware of laptop configurations being available in the cloud, but I am not a cloud expert.

[Later:]

The memory throughput data in the TechPowerUp database is consistent with the other memory specifications, so there does not seem to be an obvious mistake:

RTX 3070 mobile
-------------------------
Memory Clock    1750 MHz 
Memory Type     GDDR6
Memory Bus      256 bit 
Bandwidth       448.0 GB/s 

RTX 4070 mobile
-------------------------
Memory Clock    2000 MHz 
Memory Type     GDDR6
Memory Bus      128 bit 
Bandwidth       256.0 GB/s
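The consistency check can be reproduced with a few lines of arithmetic. This sketch assumes the listed GDDR6 "Memory Clock" corresponds to an effective per-pin data rate of 8x that clock (1750 MHz giving 14 Gbps, 2000 MHz giving 16 Gbps), which is how the database entries appear to be expressed; the function name and the `transfers_per_clock` parameter are invented for this example.

```python
# Sketch: recompute bandwidth from memory clock and bus width.
# Assumption: effective per-pin data rate = listed clock * 8 for GDDR6.
def gddr6_bandwidth_gb_s(memory_clock_mhz, bus_width_bits, transfers_per_clock=8):
    data_rate_gbps = memory_clock_mhz * transfers_per_clock / 1000  # Gbit/s per pin
    return data_rate_gbps * bus_width_bits / 8                      # bits -> bytes

print(gddr6_bandwidth_gb_s(1750, 256))  # RTX 3070 mobile entry
print(gddr6_bandwidth_gb_s(2000, 128))  # RTX 4070 mobile entry
```

Under that assumption both results match the listed bandwidth figures (448.0 and 256.0 GB/s), so the narrower 128-bit bus outweighs the higher memory clock on the 4070 mobile.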

I also see complaints floating around the internet such as “RTX 4000 mobile series bottlenecked by poor memory bandwidth”, which would seem consistent with the above data.
