I'm having trouble with long compile times for my CUDA code. I noticed that compiling some functions takes much more time than others. Why is that, and how can I fix it?
Same here. I have eight entry functions. The --ptxas-options=-v output shows that the time needed increases from function to function. While the first one takes much less than a second, I have to wait about 2 minutes until the last function has been processed. The functions are quite small and all the same size.
I have already written kernels without that effect. The only thing that is different now: in THIS kernel I declared a static 48 kB array in constant memory.
Does anybody have an idea?
I solved my problem. The compiler indeed took that much time because of the constant data. At first I had something like this:
__constant__ double data[4321] = { .................many entries....... }; // causes the compiler to allocate the memory and fill it with values
I changed my code so that the array is not initialized statically. Instead, I copy the data into constant memory explicitly:
__constant__ double data[4321]; //just allocate the memory
double cpudata[4321] = { ..........many entries...........}; // initialize a second array with the constants in CPU memory
...
cudaMemcpyToSymbol(data, cpudata, sizeof(double)*4321, 0, cudaMemcpyHostToDevice); // before the first kernel use of the constant array: copy the data to constant memory
That shrank my compilation time from 3 minutes to 3 seconds :)
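For reference, here is a minimal, self-contained sketch of the pattern described above. The kernel name, the fill values, and the launch configuration are placeholders I made up for illustration; only the declaration/copy pattern and cudaMemcpyToSymbol come from the posts above.

```cuda
#include <cuda_runtime.h>

// Uninitialized __constant__ array: the compiler only reserves the
// space, so no large initializer has to be embedded and processed
// at compile time.
__constant__ double data[4321];

// Hypothetical kernel that reads the constant array.
__global__ void useData(double *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0 * data[i]; // placeholder computation
}

int main() {
    const int n = 4321;

    // Host-side copy of the constants, filled at runtime instead of
    // via a static initializer (placeholder values here).
    static double cpudata[n];
    for (int i = 0; i < n; ++i) cpudata[i] = (double)i;

    // Copy the data into constant memory before the first kernel launch.
    cudaMemcpyToSymbol(data, cpudata, sizeof(double) * n, 0,
                       cudaMemcpyHostToDevice);

    double *out;
    cudaMalloc(&out, sizeof(double) * n);
    useData<<<(n + 255) / 256, 256>>>(out, n);
    cudaDeviceSynchronize();
    cudaFree(out);
    return 0;
}
```

The trade-off is that the values are no longer baked into the module, so the host must perform the copy once before any kernel that reads the array is launched.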
Thanks for your answer. I changed my code the same way as yours:
__constant__ double data[4321]; // just allocate the memory
double cpudata[4321] = { …many entries… }; // initialize a second array with the constants in CPU memory
It still takes too much time to compile. I also noticed that the function that takes so long to compile uses many registers: 88 registers.
Does anyone else have the same problem?