I have a problem with one of my device functions. I have to call this function repeatedly, each time on the output of the previous call, so splitting the kernel and moving data back and forth is not really an option. But with each additional call within the kernel, NVCC takes longer and longer to compile.
With a single call, compilation takes a couple of minutes; with two calls it already takes almost an hour. The kernel itself runs in under 0.1 ms, produces the expected results, and cuda-memcheck doesn't report any errors. With my current settings it is far from exhausting any of the memory limits, and calling the function again should not increase the resource demands of the kernel.
I call nvcc like this:
nvcc -arch=sm_20 file
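A minimal sketch of the call pattern described above, for illustration only. The names `step`, `run`, and `apply_twice` are placeholders, and the one-line body of `step` merely stands in for the real (much larger) device function:

```cuda
#include <cuda_runtime.h>
#include <vector>

// Placeholder for the real device function (hypothetical body).
__device__ float step(float x)
{
    return 0.5f * x + 1.0f;
}

// Kernel that feeds the output of one call into the next call,
// all within a single launch.
__global__ void run(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        v = step(v);   // first call
        v = step(v);   // second call, on the output of the first
        data[i] = v;
    }
}

// Host helper (hypothetical): copies the input to the device,
// launches the kernel once, and copies the result back.
// Error checking is omitted to keep the sketch short.
std::vector<float> apply_twice(const std::vector<float> &in)
{
    std::vector<float> out(in.size());
    if (in.empty()) return out;

    float *d = 0;
    size_t bytes = in.size() * sizeof(float);
    int n = static_cast<int>(in.size());

    cudaMalloc(&d, bytes);
    cudaMemcpy(d, in.data(), bytes, cudaMemcpyHostToDevice);
    run<<<(n + 127) / 128, 128>>>(d, n);
    cudaMemcpy(out.data(), d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
    return out;
}
```

In this sketch, adding a further `v = step(v);` line to the kernel corresponds to the extra call that makes compilation so much slower in the real code.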
Does anyone have a clue what might be the cause of this exponential increase in compile time and how I might reduce it? Thanks in advance!