cudaGetDeviceProperties executes very slow on GTX 980

TrammyKelevra · October 22, 2014, 4:14pm

Hello all,

I’ve just received a new GTX 980, and while working with it I noticed that cudaGetDeviceProperties is painfully slow. Right now the function needs roughly 0.25 ms to execute, while on the old system it needs only 0.025 ms. Can anyone explain why it is this much slower?

Here is some information about the systems and the code.

Specs of the current system
Dell Precision T7600 Workstation
CPU: Intel Xeon E5-2630 @ 2.30 GHz
GPU: NVIDIA GeForce GTX 980
RAM: 4x8 GB 1333 MHz ECC RDIMM in 4 channel mode
HDD: 255 GB SSD
OS: Ubuntu 14.04
Driver Version: 343.22
CUDA Version: 6.5 with support for GTX 9xx GPUs

Specs of the old system
CPU: Intel Core i7-2600S @ 2.8 GHz
GPU: NVIDIA GeForce GTX 680
RAM: 4x4 GB 1333 MHz in 2 channel mode
HDD: 255 GB SSD
OS: Ubuntu 12.04
Driver Version: 331.62
CUDA Version: 6.0
(The CPU and memory of the old system are overclocked, though I don’t know any details since I just work with these workstations)

Host code

#include <stdio.h>
#include <stdlib.h>

// host main function
int main(void) {

	// define number of runs
	int runs = 50;

	// create events
	cudaEvent_t start, stop;
	cudaEventCreate(&start);
	cudaEventCreate(&stop);
	float time;

	// get device props
	cudaEventRecord(start, 0);

	for (int run = 0; run < runs; run++)
	{
		cudaDeviceProp prop;
		int device;
		cudaGetDevice(&device);
		cudaGetDeviceProperties(&prop, device);
	}
	cudaEventRecord(stop, 0);
	cudaEventSynchronize(stop);
	cudaEventElapsedTime(&time, start, stop);
	printf("Time to get device properties: %5.5f ms.\n", time/runs);

	return 0;
}

Compiled with: nvcc -v -O2 GetDeviceProp.cu -o getProps

Can the difference in execution time be explained simply by the overclocked components of the old system, or is there something else to consider?

Thanks for your time. Any help is much appreciated.

Best regards,

Trammy

njuffa · October 22, 2014, 5:21pm

A few questions for clarification:

(1) What is the operating system on the two respective systems?
(2) Are both systems using the same CUDA toolkit and CUDA driver versions?
(3) How is the (sub-millisecond) execution time of cudaGetDeviceProperties() relevant to application performance, given that this is a function usually called once at the start of an application? The execution time for cudaGetDeviceProperties() is probably well below the CUDA context initialization time.

TrammyKelevra · October 23, 2014, 8:04am

Thanks for your reply. To answer your questions:

(1) The current system uses Ubuntu 14.04, the old system Ubuntu 12.04

(2) Both systems are using the CUDA Toolkit 6.5. On the current system the version with the support for the GTX 9xx GPUs is installed.
The old system runs with driver version 331.62 and the current one with driver version 343.22.

(3) Well, I was using the function getNumBlocksAndThreads from the CUDA reduction sample in my code, and it was called a couple of times. It uses prop.maxGridSize and prop.maxThreadsPerBlock, and the device properties are fetched on every call. I never gave it much thought, and I now realize that it is not very smart to call cudaGetDeviceProperties repeatedly in the first place. My program needs to execute in <30 ms, so the cumulative time spent in this function did matter. I’ve written a workaround and everything is fine, so my question is more out of curiosity.
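A workaround of this kind could look roughly like the sketch below. It is not the code actually used here: getCachedDeviceProps() is a hypothetical helper name, and the sketch assumes a single host thread and an active device that does not change during the run. The properties are queried once and every later call is served from the cached copy.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper (not from the reduction sample): queries the current
// device's properties on the first call and reuses the result afterwards.
// Assumes one host thread and that the active device never changes.
cudaError_t getCachedDeviceProps(cudaDeviceProp *out)
{
	static cudaDeviceProp cached;
	static cudaError_t status = cudaSuccess;
	static bool queried = false;

	if (!queried) {
		int device = 0;
		status = cudaGetDevice(&device);
		if (status == cudaSuccess)
			status = cudaGetDeviceProperties(&cached, device);
		queried = true;
	}
	// Hand out the cached copy only if the one-time query succeeded.
	if (status == cudaSuccess)
		*out = cached;
	return status;
}
```

A function like getNumBlocksAndThreads could then call getCachedDeviceProps(&prop) and read prop.maxGridSize and prop.maxThreadsPerBlock, so the runtime query is paid for only once.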

Cheers,

Trammy

njuffa · October 23, 2014, 1:11pm

Your assumption seems to be that the type of GPU leads to a different execution time. But given that different OS versions and different driver versions are involved, one cannot conclude that. One would have to do controlled experiments, in which only a single variable is changed at a time. I would think that even CPU performance is one of those variables, as the driver code executes on the CPU.

You may also want to rethink the measurement methodology. Often the first execution of a piece of code incurs additional “penalties”, anything from cache misses while the code is brought into the CPU’s instruction cache to one-time initialization overhead for device hardware. A better methodology is usually to report the best time out of N runs, where N >= 2. This is the methodology used by the well-known STREAM benchmark, for example, which uses N=10 by default.
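As a rough illustration of the best-of-N idea, here is a minimal sketch. bestTimeMs() is a hypothetical helper, and the sketch uses a host-side std::chrono timer instead of CUDA events on the assumption that the call being measured does its work on the CPU; it needs C++11 (for example nvcc -std=c++11). Each call is timed separately and only the fastest run is kept, so one-time warm-up costs do not inflate the result.

```cuda
#include <chrono>
#include <cuda_runtime.h>

// Hypothetical helper: times n separate calls to cudaGetDeviceProperties()
// and returns the fastest one in milliseconds.
// Returns a negative value if n <= 0 or if any call reports an error.
double bestTimeMs(int device, int n)
{
	double best = -1.0;
	for (int i = 0; i < n; i++) {
		cudaDeviceProp prop;
		auto t0 = std::chrono::steady_clock::now();
		cudaError_t err = cudaGetDeviceProperties(&prop, device);
		auto t1 = std::chrono::steady_clock::now();
		if (err != cudaSuccess)
			return -1.0;	// a failed call must not count as a fast run
		double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
		if (best < 0.0 || ms < best)
			best = ms;
	}
	return best;
}
```

Calling bestTimeMs(device, 10) would mirror the N=10 default mentioned above. Comparing the result with the average over the same runs gives a feel for how much the first call costs.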

Robert_Crovella · October 23, 2014, 1:56pm

CUDA 6.5 is not compatible with driver 331.62
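A mismatch like this can be detected at run time. The sketch below uses cudaDriverGetVersion() and cudaRuntimeGetVersion(), which report versions encoded as 1000 * major + 10 * minor (6050 for CUDA 6.5, 6000 for CUDA 6.0). The helper names driverSupportsRuntime() and reportCudaVersions() are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: the driver must support at least the CUDA version
// of the runtime the program was built against.
bool driverSupportsRuntime(int driverVersion, int runtimeVersion)
{
	return driverVersion >= runtimeVersion;
}

// Hypothetical helper: prints both versions and flags a mismatch.
void reportCudaVersions()
{
	int driverVersion = 0, runtimeVersion = 0;
	cudaDriverGetVersion(&driverVersion);	// stays 0 if no driver is installed
	cudaRuntimeGetVersion(&runtimeVersion);
	printf("driver supports CUDA %d.%d, runtime is CUDA %d.%d\n",
	       driverVersion / 1000, (driverVersion % 100) / 10,
	       runtimeVersion / 1000, (runtimeVersion % 100) / 10);
	if (!driverSupportsRuntime(driverVersion, runtimeVersion))
		printf("driver is too old for this runtime\n");
}
```

Running reportCudaVersions() at program start on both machines would show directly whether each installed driver is new enough for the toolkit in use.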

TrammyKelevra · October 23, 2014, 2:35pm

Thanks again for your replies.

You are right, I checked and CUDA 6.0 is installed on the old system. My mistake.

I guess it won’t be possible for me to find out what exactly triggered the high execution times since I can’t change the hardware of the workstations. But thanks for your input. Luckily it was not hard to rewrite the code.

njuffa · October 23, 2014, 3:14pm

txbob raises an important point. If the driver version 331.62 is insufficient to run CUDA 6.5, it stands to reason that the execution time of cudaGetDeviceProperties() was lower on the older system because the API call failed and returned right away. A best practice is to check the return status of all API calls.
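One way to follow that practice is sketched below. checkCuda() and queryCurrentDeviceProps() are hypothetical helper names, not part of the CUDA runtime; only cudaGetDevice(), cudaGetDeviceProperties(), and cudaGetErrorString() are real API calls. With a check like this in the timing loop, a failing call would show up as an error message instead of as a suspiciously short execution time.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: reports a failed CUDA runtime call on stderr.
// Returns true if the call succeeded.
bool checkCuda(cudaError_t err, const char *what)
{
	if (err != cudaSuccess) {
		fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
		return false;
	}
	return true;
}

// Example use: the query from the timing loop, now with status checks.
// The second call is skipped if the first one fails.
bool queryCurrentDeviceProps(cudaDeviceProp *prop)
{
	int device = 0;
	return checkCuda(cudaGetDevice(&device), "cudaGetDevice") &&
	       checkCuda(cudaGetDeviceProperties(prop, device),
	                 "cudaGetDeviceProperties");
}
```

The same wrapper can be applied to the cudaEventCreate, cudaEventRecord, cudaEventSynchronize, and cudaEventElapsedTime calls in the original program.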
