Understand Performance Difference

Greetings all!

I am running PageRank on a GTX 480 and a Titan X. When I profile it with nvprof, I get the following numbers on the GTX 480:

Time(%)      Time  Calls       Avg       Min       Max  Name
 59.74%  1.49669s    116  12.903ms  12.443ms  13.977ms  spmv(int*, float*, int*, int, float*, float*, bool*, bool*, __int64*, __int64*, Lock)
 26.40%  661.27ms    362  1.8267ms     672ns  193.40ms  [CUDA memcpy HtoD]
 12.11%  303.29ms    328  924.68us  1.3760us  17.357ms  [CUDA memcpy DtoH]
  1.75%  43.878ms    636  68.990us  28.448us  109.89us  [CUDA memcpy DtoD]
  0.00%  26.624us      2  13.312us  12.576us  14.048us  [CUDA memset]

When I run it on the Titan X, I get the following numbers:

Time(%)      Time  Calls       Avg       Min       Max  Name
 85.08%  6.68180s    146  45.766ms  43.759ms  45.990ms  spmv(int*, float*, int*, int, float*, float*, bool*, bool*, __int64*, __int64*, Lock)
 10.39%  815.93ms    452  1.8052ms     864ns  229.08ms  [CUDA memcpy HtoD]
  3.76%  295.50ms    418  706.94us  2.1760us  3.9042ms  [CUDA memcpy DtoH]
  0.76%  59.879ms    816  73.381us  31.137us  116.23us  [CUDA memcpy DtoD]
  0.00%  28.129us      2  14.064us  13.281us  14.848us  [CUDA memset]

I expected the number of calls to each function (spmv, for example) to be the same on both devices. Why does it differ? spmv is called 116 times on the GTX 480 but 146 times on the Titan X. I even tried running the same PageRank on both GPUs, and I still see the same difference. Can anyone help me?


If you are using PageRank from the nvGRAPH library in CUDA 8, then it's entirely possible that the library is querying the device type in use and adjusting its code path based on the detected device.

This is common practice in many NVIDIA CUDA libraries, as well as probably other libraries.

For example, a GTX 480 is a compute capability 2.0 device. One ramification is that the grid x dimension for kernel launches is limited to 65535, whereas on devices of compute capability 3.0 and later this limit increases to 2^31-1. Many libraries account for this difference with modified kernel calls, to take advantage of the differing capabilities of these GPU types.

Hi there!

Thanks for the prompt reply. I have in fact written my own PageRank for CUDA, and I don't do any fancy device query; it is just simple CUDA code. If needed, I can post the code.