I don’t know much about this, but here is some information I found about the GeForce RTX 4090:
|Memory Speed|21000 effective = 1313 MHz|
|Memory Bus Width|384 Bit|
|Max. Amount of Memory|24 GB|
|Memory Bandwidth|1008 GB/s|
Source: NVIDIA GeForce RTX 4090 GPU - Benchmarks and Specs - NotebookCheck.net Tech
Memory speed: 21 Gbps effective
Source: NVIDIA GeForce RTX 4090 Specs | TechPowerUp GPU Database
I don’t know how long it would take to fill the GeForce RTX 4090’s 24 GB of memory with data from the computer’s RAM (the OS side). Does anybody have any information?
Also, I think it takes about 0.75 seconds to fill the 24 GB. Is that correct? (PCIe 4.0 x16 runs at 32 GB/s.)
Also, do you know if the 4090 Ti is coming soon? And if it is, which PCIe slots will it occupy, and at what speed?
thank you :)
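As a rough check of the 0.75-second estimate in the question, here is a minimal Python sketch. It assumes GB means 10^9 bytes, uses the nominal 32 GB/s figure from the post, and also derives the theoretical one-direction PCIe 4.0 x16 rate from 16 GT/s per lane with 128b/130b encoding. The function names are made up for illustration, and a real transfer will typically be slower than either theoretical figure.

```python
def pcie_bandwidth_gb_s(gt_per_s, lanes):
    """Theoretical one-direction PCIe bandwidth in GB/s.

    Assumes 128b/130b line encoding, which applies to PCIe Gen 3, 4 and 5.
    """
    return gt_per_s * lanes * (128 / 130) / 8


def transfer_time_s(size_gb, bandwidth_gb_s):
    """Seconds needed to move size_gb at a sustained bandwidth_gb_s."""
    return size_gb / bandwidth_gb_s


if __name__ == "__main__":
    vram_gb = 24.0  # assumption: 1 GB = 10^9 bytes, as in the thread
    gen4_x16 = pcie_bandwidth_gb_s(16.0, 16)  # 16 GT/s per lane, 16 lanes
    print(f"nominal 32 GB/s: {transfer_time_s(vram_gb, 32.0):.2f} s")
    print(f"theoretical Gen 4 x16 ({gen4_x16:.1f} GB/s): "
          f"{transfer_time_s(vram_gb, gen4_x16):.2f} s")
```

With the nominal figure this gives 0.75 s, and about 0.76 s with the encoding overhead included, so the estimate holds as a theoretical lower bound on the fill time.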
Hi there @rownasser,
You could also have found that information on the NVIDIA product pages. Just check the Specs section and click “View Full Specs” to see all the details of the 4090.
Why would you be interested in how fast you can saturate the video memory? It is not a realistic test since normal workloads do not wait until VRAM is full before starting to do computations. And the memory content also does not stay static. So whatever you calculate here is highly theoretical.
I hope you will understand that we can only share information about GPUs that are publicly available.
Hi… thanks for the reply.
It matters for workloads where I might need more than 24 GB of memory, for example :)
In that case I have to divide the data and load each section into the GPU in turn, as with neural networks, for example.
I might need a large neural network, etc.
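The idea of dividing the data and sending it to the GPU section by section can be sketched as a simple chunk plan. This is only an illustration: the 100 GB dataset, the 20 GB chunk size (leaving some of the 24 GB free for the model and working buffers), and the helper names are all hypothetical, and the 32 GB/s figure is the nominal PCIe 4.0 x16 rate mentioned in the thread, not a measured value.

```python
import math


def plan_chunks(total_gb, chunk_gb):
    """Return the sizes of the pieces needed to move total_gb
    in chunks of at most chunk_gb each."""
    if total_gb <= 0 or chunk_gb <= 0:
        raise ValueError("sizes must be positive")
    n = math.ceil(total_gb / chunk_gb)
    last = total_gb - chunk_gb * (n - 1)  # remainder goes in the final chunk
    return [chunk_gb] * (n - 1) + [last]


def total_transfer_time_s(sizes_gb, bandwidth_gb_s):
    """Lower bound on the time to send every chunk once over the link."""
    return sum(sizes_gb) / bandwidth_gb_s


if __name__ == "__main__":
    sizes = plan_chunks(100, 20)  # hypothetical dataset and chunk size
    print(f"{len(sizes)} chunks: {sizes}")
    print(f"at least {total_transfer_time_s(sizes, 32.0):.2f} s of transfer per pass")
```

Chunking does not reduce the total transfer time; it only bounds how much device memory is needed at once. Whether transfers can overlap with computation depends on how the application is written.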
Also, the 4090 web page on nvidia.com doesn’t have all the figures.
Also, is the Memory Bus Width related to the PCIe connection, or is it within the GPU? I don’t understand the figures, you know, and I can’t find any datasheet or anything on the GDDR6X website either, or much online in the little time I spent searching ;)
You are right, the memory details cannot be found on the NVIDIA pages. This might be related to the fact that OEM cards might introduce even more variance here than they do for GPU core clock.
A good summary, which also compares against other GPU versions, can usually be found here. It is not officially maintained content, but in general it comes very close to the technical specs.
The memory bus speed is GPU-internal, not PCIe-related. For the GDDR6X memory specs I can’t help you, though.
The workload sizes I understand of course. I was only wondering why it would be an interesting piece of information to know how fast you run out of memory.
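To show how the memory bus width relates to the quoted bandwidth rather than to PCIe, the spec figures from the first post can be combined in a short Python sketch: bus width in bits times the effective per-pin data rate, divided by 8 bits per byte. The function name is hypothetical; the inputs are the 384-bit and 21 Gbps values listed above.

```python
def memory_bandwidth_gb_s(bus_width_bits, data_rate_gbps_per_pin):
    """Peak theoretical device-memory bandwidth in GB/s.

    bus_width_bits: width of the GPU's memory bus (GPU-internal).
    data_rate_gbps_per_pin: effective data rate per pin in Gbit/s.
    """
    return bus_width_bits * data_rate_gbps_per_pin / 8  # 8 bits per byte


if __name__ == "__main__":
    # Values from the spec table quoted in the thread.
    print(memory_bandwidth_gb_s(384, 21), "GB/s")
```

This reproduces the 1008 GB/s from the spec table, which is the bandwidth between the GPU and its own memory.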
OK, thank you… I just bought the Gigabyte 4090 and I’m going to enjoy it soon :)
Do you mean that the bandwidth (1008 GB/s) is the PCIe speed? I mean, the bottleneck is at the PCIe link, isn’t it? I think that bandwidth is just the bandwidth of the memory space the CUDA kernels use (internal to the GPU, as mentioned above) :) I heard PCIe can be overclocked, but I hardly believe 32 GB/s is enough :) especially since my 4090 sits in a PCIe Gen 5 slot :) From what I heard, the card runs as PCIe Gen 4 x16, so it would get just 32 GB/s even though it sits in the Gen 5 slot.
Note: the CUDA docs mention that the bottleneck is at the PCIe link, in CUDA Toolkit Documentation 12.1 Update 1:
Best Practices Guide => Memory Optimizations…
CUDA C++ Best Practices (nvidia.com)
“…The peak theoretical bandwidth between the device memory and the GPU is much higher (898 GB/s on the NVIDIA Tesla V100, for example) than the peak theoretical bandwidth between host memory and device memory (16 GB/s on the PCIe x16 Gen3). Hence, for best overall application performance, it is important to minimize data transfer between the host and the device, even if that means running kernels on the GPU that do not demonstrate any speedup compared with running them on the host CPU…”
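To put the quoted figures side by side with the 4090 numbers from this thread, here is a small Python sketch of the ratio between device-memory bandwidth and host-link bandwidth. The function name is made up for illustration, all inputs are peak theoretical values (898 GB/s and 16 GB/s from the quote, 1008 GB/s and the nominal 32 GB/s from the thread), and sustained rates in practice will be lower.

```python
def bandwidth_ratio(device_gb_s, host_link_gb_s):
    """How many times faster device memory is than the host link."""
    return device_gb_s / host_link_gb_s


if __name__ == "__main__":
    # Peak theoretical figures only.
    v100 = bandwidth_ratio(898.0, 16.0)      # Tesla V100 on PCIe Gen 3 x16
    rtx4090 = bandwidth_ratio(1008.0, 32.0)  # RTX 4090 on PCIe Gen 4 x16
    print(f"V100: device memory is about {v100:.0f}x faster than the host link")
    print(f"RTX 4090: device memory is about {rtx4090:.1f}x faster than the host link")
```

Under these assumptions the gap is roughly 56x for the V100 example and 31.5x for the 4090, which is why the guide recommends minimizing host-to-device transfers.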
I’m thinking we could overclock the PCIe link in case we need more. Is there any particular information about that, maybe from NVIDIA?