How to know where the bottleneck is?

Kubrick · February 29, 2008, 10:12am

How can I know whether my kernels spend most of their time accessing global memory or performing computations? I have tried the CUDA Profiler, but it only reports the time spent by each kernel…

AndreiB · February 29, 2008, 10:28am

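Do some maths: compute your kernel's effective bandwidth (bytes read plus bytes written, divided by running time). If your reads are coalesced you can achieve 70 GiB/sec; if your kernel shows a figure close to that, memory is the bottleneck.
Alternatively, you can add some computations and check whether the running time of your kernel changes. If it stays roughly the same, the extra arithmetic is hidden behind the memory accesses and the kernel is memory-bound.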
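Both suggestions can be tried with one small harness. The sketch below is illustrative only: `readComputeWrite`, `timeKernelMs` and `effectiveBandwidthGBs` are hypothetical names, the kernel does one coalesced read and one coalesced write per thread, and `extraWork` adds an adjustable amount of dummy arithmetic in between. Bandwidth is reported in decimal GB/s (1 GB = 1e9 bytes).

```cuda
#include <cuda_runtime.h>

// Hypothetical test kernel: one coalesced read and one coalesced write per
// thread, with `extraWork` dependent multiply-adds in between.
__global__ void readComputeWrite(const float* in, float* out, int n, int extraWork)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = in[i];                     // coalesced read
        for (int k = 0; k < extraWork; ++k)  // artificial arithmetic load
            v = v * 0.999f + 0.001f;
        out[i] = v;                          // coalesced write
    }
}

// Times a single launch (after one warm-up launch) with CUDA events; returns ms.
float timeKernelMs(const float* dIn, float* dOut, int n, int extraWork)
{
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    readComputeWrite<<<blocks, threads>>>(dIn, dOut, n, extraWork);  // warm-up
    cudaEventRecord(start);
    readComputeWrite<<<blocks, threads>>>(dIn, dOut, n, extraWork);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

// Effective bandwidth in GB/s (1 GB = 1e9 bytes): bytes moved / elapsed time.
double effectiveBandwidthGBs(double bytesRead, double bytesWritten, double ms)
{
    return (bytesRead + bytesWritten) / (ms * 1.0e-3) / 1.0e9;
}
```

For example, with n = 1 << 22 floats, time the kernel at extraWork = 0, 16 and 64, and pass 4.0 * n for both byte counts to `effectiveBandwidthGBs`. If the extraWork = 0 figure is near your card's peak and the time barely moves as extraWork grows, the kernel is memory-bound. With 256 threads per block, n = 1 << 22 gives 16,384 blocks, which stays within the 65,535-block grid limit of older GPUs.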

Kubrick · February 29, 2008, 11:07am

OK, so let's say, for example, that my kernel does the following:

- Read X1 MB of data.
- Process X1 MB of data.
- Write X1 MB of data.
- Read X2 MB of data.
- Process X2 MB of data.
- Write X2 MB of data.

If all my accesses to global memory are coalesced and my card provides a memory bandwidth of BW (in MB per second), I guess the time each kernel invocation spends accessing memory could be estimated with the following formula:

T = (2·X1 + 2·X2)/BW
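To see where the factor of 2 comes from, here is a minimal sketch assuming the data lives in two float buffers that are processed in place; `process`, `twoPhaseKernel` and `estimateMemoryTimeSec` are hypothetical names, not from any library. Each element is read once and written once, so each buffer contributes twice its size to the traffic.

```cuda
#include <cuda_runtime.h>

// Hypothetical device function standing in for "process the data".
__device__ float process(float v) { return v * 2.0f + 1.0f; }

// Kernel shaped like the steps above: every element of the X1 buffer is read
// once and written once, then the same for the X2 buffer (grid-stride loops).
__global__ void twoPhaseKernel(float* x1, int n1, float* x2, int n2)
{
    int stride = gridDim.x * blockDim.x;
    for (int j = blockIdx.x * blockDim.x + threadIdx.x; j < n1; j += stride)
        x1[j] = process(x1[j]);
    for (int j = blockIdx.x * blockDim.x + threadIdx.x; j < n2; j += stride)
        x2[j] = process(x2[j]);
}

// Estimated memory time in seconds: 2*X1 + 2*X2 MB moved at bwMBs MB/s,
// assuming every access is coalesced and runs at the full bandwidth.
double estimateMemoryTimeSec(double x1MB, double x2MB, double bwMBs)
{
    return (2.0 * x1MB + 2.0 * x2MB) / bwMBs;
}
```

With X1 = 64 MB, X2 = 32 MB and BW = 22,400 MB/s, T = 192 / 22,400 s, roughly 8.6 ms. If the measured kernel time is close to T, memory is the bottleneck; if it is much larger, computation (or latency) dominates.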

Am I right? Also, are constant memory accesses faster if they are coalesced?
I suppose that 70 GB/sec figure is device-dependent, right? I have a GeForce 8600M GT which, as far as I know (Wikipedia), can deliver up to 12.8 or 22.4 GB/sec, but not 70 GB/sec.
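The two Wikipedia figures are consistent with the usual peak-bandwidth arithmetic (2 transfers per clock × memory clock × bus width in bytes), assuming a 128-bit bus and memory clocks of 400 MHz (DDR2) and 700 MHz (GDDR3); those clock values are assumptions for illustration. The sketch below applies that formula and, on newer CUDA releases, reads the inputs with `cudaDeviceGetAttribute`; `peakBandwidthGBs` and `queryPeakBandwidthGBs` are hypothetical helper names.

```cuda
#include <cuda_runtime.h>

// Theoretical peak bandwidth in GB/s for double-data-rate memory:
// 2 transfers per clock * clock (kHz -> Hz) * bus width in bytes.
double peakBandwidthGBs(double memClockKHz, int busWidthBits)
{
    return 2.0 * memClockKHz * 1.0e3 * (busWidthBits / 8.0) / 1.0e9;
}

// Queries the device's memory clock (kHz) and bus width (bits) and applies
// the formula above. Returns a negative value if either query fails.
double queryPeakBandwidthGBs(int device)
{
    int clockKHz = 0, busBits = 0;
    if (cudaDeviceGetAttribute(&clockKHz, cudaDevAttrMemoryClockRate, device) != cudaSuccess ||
        cudaDeviceGetAttribute(&busBits, cudaDevAttrGlobalMemoryBusWidth, device) != cudaSuccess)
        return -1.0;
    return peakBandwidthGBs((double)clockKHz, busBits);
}
```

Under those assumptions, 2 × 400 MHz × 16 bytes gives 12.8 GB/s and 2 × 700 MHz × 16 bytes gives 22.4 GB/s. This is a theoretical ceiling; measured throughput is lower.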

AndreiB · February 29, 2008, 12:22pm

Constant memory access is fast only if all threads in a warp access the same memory location; otherwise the reads are serialized. In that case, using textures may be a better option.
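A minimal sketch of the two access patterns, assuming a small coefficient table in constant memory; `c_coeffs`, `scaleUniform`, `scalePerThread` and `runScale` are hypothetical names. In the first kernel every thread of a warp reads the same address, which the constant cache can broadcast; in the second, neighbouring threads read different addresses, which is the slow case.

```cuda
#include <cuda_runtime.h>

#define NUM_COEFFS 64

__constant__ float c_coeffs[NUM_COEFFS];

// Uniform access: all threads read c_coeffs[k] (same address -> broadcast).
__global__ void scaleUniform(const float* in, float* out, int n, int k)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * c_coeffs[k];
}

// Divergent access: threads in a warp read different addresses, so the
// constant-memory reads are serialized. Textures may suit this pattern better.
__global__ void scalePerThread(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * c_coeffs[i % NUM_COEFFS];
}

// Host helper: uploads coefficients, runs one kernel, copies the result back.
// Returns false on any CUDA error.
bool runScale(const float* hIn, float* hOut, int n,
              const float* hCoeffs, bool perThread, int k)
{
    const size_t bytes = (size_t)n * sizeof(float);
    float *dIn = nullptr, *dOut = nullptr;
    bool ok = cudaMemcpyToSymbol(c_coeffs, hCoeffs, NUM_COEFFS * sizeof(float)) == cudaSuccess
           && cudaMalloc(&dIn, bytes) == cudaSuccess
           && cudaMalloc(&dOut, bytes) == cudaSuccess
           && cudaMemcpy(dIn, hIn, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        if (perThread) scalePerThread<<<blocks, threads>>>(dIn, dOut, n);
        else           scaleUniform<<<blocks, threads>>>(dIn, dOut, n, k);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(hOut, dOut, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dIn);   // cudaFree(nullptr) is a no-op
    cudaFree(dOut);
    return ok;
}
```

Timing the two kernels on the same input (for example with CUDA events) should show the uniform version running faster, since it needs only one constant-cache access per warp.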

Yes, the 8600M has much less bandwidth. Take a look at the bandwidthTest SDK sample, which will show you the actual bandwidth of your device.
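bandwidthTest is the tool to run. As a rough stand-in for a practical BW value to plug into the formula above, the sketch below times repeated device-to-device `cudaMemcpy` calls with CUDA events; `measureDeviceToDeviceGBs` is a hypothetical name. It counts each copied byte twice (one read, one write), which is an assumption chosen so the result matches the 2·X accounting in T.

```cuda
#include <cuda_runtime.h>

// Rough device-to-device bandwidth check. Each byte is read once and written
// once, so 2 * bytes are counted per copy. Returns GB/s, or -1.0 on error.
double measureDeviceToDeviceGBs(size_t bytes, int iterations)
{
    void *src = nullptr, *dst = nullptr;
    if (cudaMalloc(&src, bytes) != cudaSuccess) return -1.0;
    if (cudaMalloc(&dst, bytes) != cudaSuccess) { cudaFree(src); return -1.0; }
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToDevice);  // warm-up
    cudaEventRecord(start);
    for (int i = 0; i < iterations; ++i)
        cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // copies and events share the default stream
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(src);
    cudaFree(dst);
    if (ms <= 0.0f) return -1.0;
    return 2.0 * (double)bytes * iterations / (ms * 1.0e-3) / 1.0e9;
}
```

On an 8600M GT the result should land well below the theoretical peak quoted on Wikipedia; that measured figure is a more realistic BW for estimating T.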
