Slow compile times with CUDA 7.5 and MS VS 2013

Hi, I just started programming with CUDA… it took a day to get everything working and to try a small test program I found on YouTube. Is it normal that compiling takes about 20 seconds for such simple code? It takes the same time in Release mode…

#include <iostream>
#include "device_launch_parameters.h"
#include <cuda_runtime_api.h>
#include <stdlib.h>
#include <ctime>

using namespace std;

__global__ void add(int *a, int *b, int *c, int count)
{
	// one thread per element; the guard skips the surplus threads in the last block
	int id = blockIdx.x * blockDim.x + threadIdx.x;
	if (id < count)
		c[id] = a[id] + b[id];
}

int main()
{
	srand((unsigned int)time(NULL));
	int count = 1000;


	int *h_a = new int[count];
	int *h_b = new int[count];
	int *h_c = new int[count];

	for (int i = 0; i < count; i++)
	{
		h_a[i] = rand() % 1000;
		h_b[i] = rand() % 1000;
		h_c[i] = 0;
	}


	int* d_a;
	int* d_b;
	int* d_c;

	if (cudaMalloc((void**)&d_a, sizeof(int)*count) != cudaSuccess || 
		cudaMalloc((void**)&d_b, sizeof(int)*count) != cudaSuccess ||
		cudaMalloc((void**)&d_c, sizeof(int)*count) != cudaSuccess)
	{
		cout << "false alloc" << endl;
		cudaFree(d_a);
		cudaFree(d_b);
		cudaFree(d_c);

		delete[] h_a;
		delete[] h_b;
		delete[] h_c;
		return 0;
	}

	if (cudaMemcpy(d_a, h_a, sizeof(int)*count, cudaMemcpyHostToDevice) != cudaSuccess || 
		cudaMemcpy(d_b, h_b, sizeof(int)*count, cudaMemcpyHostToDevice) != cudaSuccess)
	{
		cout << "false cpy" << endl;
		cudaFree(d_a);
		cudaFree(d_b);
		cudaFree(d_c);

		delete[] h_a;
		delete[] h_b;
		delete[] h_c;
		return 0;
	}
	
	// launch enough 256-thread blocks to cover all count elements
	add<<<count / 256 + 1, 256>>>(d_a, d_b, d_c, count);

	if (cudaMemcpy(h_c, d_c, sizeof(int)*count, cudaMemcpyDeviceToHost) != cudaSuccess)
	{
		cout << "false cpy back" << endl;
		cudaFree(d_a);
		cudaFree(d_b);
		cudaFree(d_c);

		delete[] h_a;
		delete[] h_b;
		delete[] h_c;
		return 0;
	}

	for (int i = 0; i < 100; i++)
		cout << h_a[i] << "+" << h_b[i]  << "=" << h_c[i] << endl;

	cudaFree(d_a);
	cudaFree(d_b);
	cudaFree(d_c);

	delete[] h_a;
	delete[] h_b;
	delete[] h_c;


	return 0;
}

Or do I have to change some options in my Visual Studio 2013 Community? What can I do? And please keep the explanations simple; I’m just starting with CUDA.

The build output looks like this:

1>  Compiling CUDA source file main.cu...
1>  
1>  d:\dokumente\visual studio 2013\Projects\Cudatest3\Cudatest3>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\bin\nvcc.exe" -gencode=arch=compute_20,code=\"sm_20,compute_20\" --use-local-env --cl-version 2013 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin"  -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include"  -G   --keep-dir Debug -maxrregcount=0  --machine 32 --compile -cudart static  -g   -DWIN32 -D_DEBUG -D_CONSOLE -D_LIB -D_UNICODE -DUNICODE -Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MDd " -o Debug\main.cu.obj "d:\dokumente\visual studio 2013\Projects\Cudatest3\Cudatest3\main.cu" 
1>  main.cu
1>  Cudatest3.vcxproj -> d:\dokumente\visual studio 2013\Projects\Cudatest3\Debug\Cudatest3.exe

The long second line (the nvcc invocation) takes most of the time…
(I’m using an i7-2600K, 16 GB RAM, and a GTX 970.)
Thanks in advance =)
cat

PS: By the way, why do I have to use

#include "device_launch_parameters.h"
#include <cuda_runtime_api.h>

instead of

#include <cuda.h>

like I saw in many tutorials?

Regarding the different CUDA header files, see header files - Difference between cuda.h, cuda_runtime.h, cuda_runtime_api.h - Stack Overflow

In my experience, a ~10 second compile time even for very simple CUDA codes in Visual Studio is common. nvcc is a compiler driver: it invokes a sequence of tools (preprocessor, device compiler, assembler, host compiler) to build a CUDA code. Each of those steps might only take a second or two, but when you stack them up, you get 10 seconds, even for a simple code. nvcc supports a --verbose (-v) option to list all the steps it is performing, although I’ve never used that option on Windows.
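If you want to watch them, you can run nvcc by hand with the verbose flag from a Visual Studio command prompt (so nvcc can find cl.exe). A minimal sketch; the output file name and architecture flag here are just examples:

nvcc -v -arch=sm_52 -o main.exe main.cu

Each tool invocation is printed before it runs, so you can see where the time goes.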

If you build one of the CUDA sample codes, you will discover those projects are set up to target multiple different GPU architectures. If you set up your own projects this way, the compile times will be longer, because the device code is compiled once per architecture. You can reduce the compile time by only building binaries for the GPU you intend to run the code on.

cuda.h is the header file you would use if you were programming to the CUDA driver API.
cuda_runtime_api.h is the header file you would use if you were programming to the CUDA runtime API.
Programs with kernel launches like the one you have shown:

add<<<count / 256 + 1, 256>>>(d_a, d_b, d_c, count);

are using the CUDA runtime API to access the GPU.
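By contrast, a driver API program sets everything up explicitly through cu* calls instead of a <<<...>>> launch. A minimal (untested) sketch, just to show the flavor:

#include <cuda.h>

int main()
{
	CUdevice dev;
	CUcontext ctx;

	cuInit(0);                  // initialize the driver API
	cuDeviceGet(&dev, 0);       // take the first GPU
	cuCtxCreate(&ctx, 0, dev);  // create a context on it

	// ...load a module with cuModuleLoad() and launch with cuLaunchKernel()...

	cuCtxDestroy(ctx);
	return 0;
}

Including cuda.h in a runtime API program like yours is harmless, which is probably why some tutorials do it, but it is not what declares the runtime API functions.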

You don’t have to use device_launch_parameters.h. People include it because it helps with IntelliSense (it declares threadIdx, blockIdx, blockDim, and so on, so the editor recognizes them). But it’s not necessary to actually get a CUDA code to compile; the nvcc compiler includes it automatically as part of the compilation process.
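For example, a file like this compiles with nvcc (nvcc -c minimal.cu) without any includes at all, because nvcc pre-includes the runtime headers itself; a minimal sketch:

// minimal.cu: nvcc supplies the declarations for __global__ and threadIdx on its own
__global__ void copy(int *dst, const int *src)
{
	dst[threadIdx.x] = src[threadIdx.x];
}

Adding device_launch_parameters.h just lets the Visual Studio editor resolve those built-in names too.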

Okay, thanks for your help =)
How do I “only build binaries for the GPUs you intend to run the code on”? Can you explain how to do that?

It’s in the “code generation” option described here:

http://docs.nvidia.com/nsight-visual-studio-edition/3.2/Content/CUDA_Properties_Config.htm

Pick a code generation setting that matches the GPU you intend to use.

For your GTX 970 (a compute capability 5.2 device), that means picking a code generation setting of compute_52,sm_52.

You can have multiple code generation options selected (take a look at one of the sample projects), in which case compilation will take correspondingly longer.
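After you change it, check the nvcc line in your build output: the -gencode flag should name your architecture instead of the compute_20 one you have now, roughly like this (the rest of the command line stays the same):

-gencode=arch=compute_52,code=\"sm_52,compute_52\"

With several architectures selected you will see one -gencode flag per architecture, and each one adds a device compilation pass, which is where the extra build time goes.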