So I am working on a project and i wrote a cuda kernel and a bunch of small device functions that it calls. the cuda kernel is run with ~960 blocks of 16x16 threads. I found that when i compile the .cu file, nvcc first spits out a bunch of the warning message below, but then sits “idle” for a while (ranging from 5-20 minutes, depending on computer) before it finished compiling. I was wondering what is causing the compiler to take so long?
my other questions is how to properly allocate the local variables I need for my functions. I am adapting someone else’s code and some of the functions are pretty hideous and take in like 20 arguments and try to allocated ~40 local variables. I think I read that the integers (most of the local vars) would be put in registers and with them as they were, i got a error from nvcc saying it ran out of registers. I was wondering what the best way to handle these kinds of functions is? I ended up created a struct with all the needed fields and just allocatin that, as I think it goes into shared memory if i read that right. I could also break up the functions more, but then I would need to pass more and then the functions with ~20 parameters would just get even worse. Any ideas?