Theoretical and actual values of CUDA memory transfer rate

lf98mail · September 7, 2020, 3:35am

Recently, I ran into some problems while learning CUDA and would like to ask a few questions. As we know, CUDA-Z can measure the actual transfer rate between host and device and from device to device. But how should the theoretical device-to-device rate be calculated? In addition, the theoretical rate of traffic between global memory and the GPU chip should be given by the memory bandwidth. The shared memory bandwidth inside the GPU is obviously much higher than the global memory bandwidth. But is there any way to measure the actual transfer rate between global memory and shared memory? I very much hope that you can help me answer these questions or point me to some information on this. Thank you very much.

njuffa · September 7, 2020, 5:49am

The theoretical transfer rate is the product of the memory interface width, the memory interface clock, and a memory-type specific multiplier (a power of two, e.g. 2 for DDR3). Note that the memory clock for a given GPU generally fluctuates based on power management state, so one would have to find the maximum memory clock by running a memory-intensive GPU task and observing the clock rate. I have not seen the memory clock being influenced by a clock boosting mechanism, but I cannot exclude the possibility of a boost-able memory clock on some GPU.

Since the theoretical transfer rate is not achievable in practice, what is usually of interest (e.g. for a roofline model) is the maximum bandwidth that can be achieved using the most favorable access pattern. For the memory subsystem of modern CPUs and GPUs that is typically on the order of 80% of theoretical.
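The calculation described in the two paragraphs above can be sketched in a few lines of Python. This is only an illustration: the function names, parameters, and example figures (a 128-bit interface at 900 MHz with a multiplier of 2) are assumptions made up for the example, not the specification of any particular GPU. The division by 8 converts the interface width from bits to bytes, and the 0.8 factor is just the rough "on the order of 80%" estimate, not a measured value.

```python
def theoretical_bandwidth_gb_s(bus_width_bits, memory_clock_mhz, multiplier):
    """Theoretical memory bandwidth in GB/s (1 GB = 1e9 bytes).

    bus_width_bits   -- memory interface width in bits
    memory_clock_mhz -- maximum memory interface clock in MHz
    multiplier       -- memory-type specific multiplier (e.g. 2 for DDR3)
    """
    bytes_per_clock = bus_width_bits / 8           # bits -> bytes
    clock_hz = memory_clock_mhz * 1e6              # MHz -> Hz
    return bytes_per_clock * clock_hz * multiplier / 1e9


def achievable_bandwidth_gb_s(theoretical_gb_s, efficiency=0.8):
    """Rough estimate of the best achievable bandwidth.

    efficiency is an assumed fraction of theoretical (about 0.8 per the
    rule of thumb above); the real value must be measured on the device.
    """
    return theoretical_gb_s * efficiency


if __name__ == "__main__":
    # Hypothetical example values, not taken from a real data sheet.
    peak = theoretical_bandwidth_gb_s(128, 900.0, 2)
    print(f"theoretical: {peak:.2f} GB/s")                           # 28.80
    print(f"achievable:  {achievable_bandwidth_gb_s(peak):.2f} GB/s")  # 23.04
```

To use this for a real GPU, substitute the interface width and the maximum observed memory clock for that device, and the multiplier for its memory type.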

I am not sure CUDA-Z is a particularly reliable way of determining the maximum achievable device-to-device throughput. In the machine I am typing on right now there is a tiny GPU, a Quadro K420. Two different versions of CUDA-Z report device-to-device memory copy speed as 10026 MiB/sec and 10031 MiB/sec; however, with my own program I measure a memory throughput of 28.51 GB/sec during copying (each byte copied is both read and written, so the copy itself transfers data at half that rate, 14.25 GB/sec).
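To compare figures like these, it helps to keep two things apart: the unit (MiB/sec versus GB/sec) and whether a number counts only the bytes copied or the total memory traffic, that is, reads plus writes. The following Python sketch shows that accounting. The function names and the sample byte count and time are hypothetical, and the sketch assumes the elapsed time of the copy has already been measured by some other means (for example with a GPU timer); it does not perform any measurement itself.

```python
def copy_rates_gb_s(bytes_copied, seconds):
    """Return (copy_rate, memory_throughput) in GB/s (1 GB = 1e9 bytes).

    copy_rate counts each byte once; memory_throughput counts it twice,
    because a device-to-device copy reads and then writes every byte.
    """
    copy_rate = bytes_copied / seconds / 1e9
    memory_throughput = 2 * copy_rate
    return copy_rate, memory_throughput


def mib_s_to_gb_s(mib_per_s):
    """Convert MiB/s (2**20 bytes) to GB/s (1e9 bytes)."""
    return mib_per_s * 2**20 / 1e9


if __name__ == "__main__":
    # Hypothetical measurement: 1e9 bytes copied in 0.1 s.
    copy, total = copy_rates_gb_s(1_000_000_000, 0.1)
    print(f"copy rate: {copy:.2f} GB/s, memory throughput: {total:.2f} GB/s")

    # The CUDA-Z figure quoted above, expressed in GB/s for comparison.
    print(f"10026 MiB/s = {mib_s_to_gb_s(10026):.2f} GB/s")
```

Converted this way, the CUDA-Z figure of 10026 MiB/sec is about 10.51 GB/sec, which is the number to compare against the 14.25 GB/sec copy rate rather than against the 28.51 GB/sec total throughput.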

lf98mail · September 7, 2020, 9:41am

First of all, I want to thank you for your patience.

Question 1: Based on your answer above, can it be considered that the theoretical transfer rate of device-to-device is the video memory bandwidth?

Question 2: I think the video memory bandwidth is the theoretical upper limit of the transfer rate from the global memory to the GPU chip (for example, shared memory). Is this correct?

Question 3: According to your answer, I still don’t know how to measure the transfer rate from global memory to shared memory. Do you have any method?

I sincerely hope to have further communication with you, thank you very much!

lf98mail · September 7, 2020, 11:21am

Are you there?

rs277 · September 7, 2020, 9:55pm

While not directly addressing your questions, you may find useful related information here and the bibliography may offer direction on how to perform the testing:

lf98mail · September 10, 2020, 11:08am

Thank you very much for the document; I got some useful information from it. Can you also tell me where you got it, or whether you have similar documents? Looking forward to your reply, thank you very much.

rs277 · September 10, 2020, 10:15pm

I got it from the same place as you. There is another paper with the same name, but “Volta” instead of “Turing”, which is almost the same.
