Very large kernels: How to compile a large CUDA kernel?

ovaround · December 13, 2008, 6:34am

We have a (machine-generated) kernel file that is quite large. When we try to compile it, nvcc bails out with an “out of heap” error.

As a workaround, we tried splitting the one large file into multiple kernel files. Whether multiple template/kernel files are permitted is unclear from the docs, and we had no luck getting that to work in the Visual Studio environment. I don’t know if that is something we did wrong with Visual Studio or a fundamental limitation of how nvcc works. Basically, Visual Studio only wanted to compile one of the template files (even though we did the same kind of “custom build setup” on all the template files).

Even weirder, when we tried putting multiple #include statements in a single template file, only the first kernel file was included. The rest seemed to be ignored.

The manual says there is an upper limit of 2 million PTX instructions per kernel. Does that limit manifest itself as the “out of heap” error we hit during nvcc compilation? And just out of curiosity, why such a low limit on instruction count, and are there workarounds?

jack · December 13, 2008, 4:03pm

Could you split your kernel into some smaller ones, compile them to cubin or PTX files, and then call them in order using the driver API? You wouldn’t need to copy back the resulting data until the last sub-kernel has completed.
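A minimal host-side sketch of that idea, under some loud assumptions: the file names `stage1.ptx` / `stage2.ptx`, the kernel names `stage1` / `stage2`, and the `(float *data, int n)` signature are all hypothetical, and each kernel is assumed to be declared `extern "C"` so its name is not mangled. It also uses `cuLaunchKernel` and the primary-context calls from newer toolkits rather than the launch calls available when this thread was written.

```cuda
// Host code only; build with something like: nvcc chain.cu -lcuda
#include <cuda.h>
#include <cstdio>
#include <cstdlib>

// Abort with a readable message if a driver API call fails.
static void check(CUresult r, const char *what) {
    if (r != CUDA_SUCCESS) {
        const char *msg = nullptr;
        cuGetErrorString(r, &msg);
        std::fprintf(stderr, "%s failed: %s\n", what, msg ? msg : "unknown error");
        std::exit(EXIT_FAILURE);
    }
}

int main() {
    // Hypothetical module files and kernel names (assumptions, not from the thread).
    const int kStages = 2;
    const char *files[kStages]   = {"stage1.ptx", "stage2.ptx"};
    const char *kernels[kStages] = {"stage1", "stage2"};
    int n = 1 << 20;

    check(cuInit(0), "cuInit");
    CUdevice dev;
    check(cuDeviceGet(&dev, 0), "cuDeviceGet");
    CUcontext ctx;
    check(cuDevicePrimaryCtxRetain(&ctx, dev), "cuDevicePrimaryCtxRetain");
    check(cuCtxSetCurrent(ctx), "cuCtxSetCurrent");

    // One device buffer shared by every sub-kernel; it stays on the GPU
    // between stages, so nothing is copied back until the end.
    CUdeviceptr d_data;
    check(cuMemAlloc(&d_data, (size_t)n * sizeof(float)), "cuMemAlloc");
    check(cuMemsetD32(d_data, 0, (size_t)n), "cuMemsetD32");

    CUmodule mods[kStages];
    for (int i = 0; i < kStages; ++i) {
        CUfunction fn;
        check(cuModuleLoad(&mods[i], files[i]), "cuModuleLoad");
        check(cuModuleGetFunction(&fn, mods[i], kernels[i]), "cuModuleGetFunction");

        void *args[] = {&d_data, &n};
        unsigned threads = 256;
        unsigned blocks = (n + threads - 1) / threads;
        // Launches on the default stream run in submission order.
        check(cuLaunchKernel(fn, blocks, 1, 1, threads, 1, 1,
                             0, nullptr, args, nullptr), "cuLaunchKernel");
    }
    check(cuCtxSynchronize(), "cuCtxSynchronize");

    // Single copy back after the last sub-kernel has finished.
    float *h_data = (float *)std::malloc((size_t)n * sizeof(float));
    check(cuMemcpyDtoH(h_data, d_data, (size_t)n * sizeof(float)), "cuMemcpyDtoH");
    std::printf("first element: %f\n", h_data[0]);

    std::free(h_data);
    for (int i = 0; i < kStages; ++i) check(cuModuleUnload(mods[i]), "cuModuleUnload");
    check(cuMemFree(d_data), "cuMemFree");
    check(cuDevicePrimaryCtxRelease(dev), "cuDevicePrimaryCtxRelease");
    return 0;
}
```

Each PTX file could be produced separately with something like `nvcc -ptx stage1.cu`, so the compiler only ever sees one piece of the generated code at a time.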

As for the 2M instruction limit… I don’t think there are any workarounds besides splitting your kernel into pieces. It has something to do with a hardware limitation, if I remember correctly.

alex_dubinsky · December 13, 2008, 8:48pm

Out of heap? I’ve had nvcc run out of stack, which I corrected by using ‘editbin’ to modify the executable. I’m not sure you can run out of “heap” unless you just run out of memory on your system.

I don’t know why Visual Studio is not letting you compile multiple files. Try doing a “Rebuild All.”

Once you get it compiling, separate object files probably won’t work right away. You may need to put a C wrapper around each kernel call to let you call into it. (I don’t think the <<< >>> syntax will work.)
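A rough sketch of what such a wrapper might look like with the runtime API. The file name `kernel_a.cu`, the kernel `scale_kernel`, and the wrapper `launch_scale` are made-up names for illustration; the generated kernels in question would take their place.

```cuda
// kernel_a.cu (hypothetical file): one kernel plus its C-callable wrapper.
#include <cuda_runtime.h>

// Stand-in for one of the generated kernels.
__global__ void scale_kernel(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Plain C entry point. The <<< >>> launch stays inside this .cu file,
// so callers in other translation units never need the launch syntax.
extern "C" cudaError_t launch_scale(float *d_data, int n, float factor) {
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    scale_kernel<<<blocks, threads>>>(d_data, n, factor);
    return cudaGetLastError();  // reports launch-configuration errors
}

// In any other source file, only the declaration is needed:
//   extern "C" cudaError_t launch_scale(float *d_data, int n, float factor);
//   ...
//   cudaError_t err = launch_scale(d_data, n, 2.0f);
```

With one wrapper per kernel file, each .cu file can be compiled to its own object file and linked into the host program like any other C function.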

P.S. To test your idea on a basic level, copy your kernel to its own file and run nvcc -cubin on it. If that doesn’t work, then neither will juggling .cu files.
