Reducing cubin size?


Maybe a silly question ;)

I’m working on a 64kb intro and currently I’m using CUDA to generate procedural textures (in realtime),
and the whole renderer is also in CUDA.

Everything is fine and the speed is OK, but the size of the cubin’s ‘bincode’ section is huge :/
(after compressing everything with Crinkler, the kernels alone are 90kb :/)

I can’t ship compressed source code in the final executable like you can with shaders; I need the compiled cubin :/
Is there any way to reduce the ‘bincode’ section size?

My current compilation options for nvcc are -cubin -use_fast_math -arch sm_13 -code sm_13,
so the bincode section should only contain machine code for the GF 2xx series - right?

I’m using the driver API, and the intro will be for the GF 2xx series only (it’s realtime distance-field rendering that needs a looooooot of processing power to be really realtime :)),
so I don’t care about other cards (right now) [and I could live with one exe per card :)]

Thanks for any suggestions.

Yes, plus all the constants.

And make sure all your floating-point literals (also stored with constants) end in an ‘f’ so that they’re encoded as floats and not doubles.

Try to prevent the compiler from unrolling loops. That can make a big difference in the code size if you have a lot of unrolled loops.
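With nvcc you can forbid unrolling per loop with the unroll pragma. A sketch (the kernel and its names are made up, just to show where the pragma goes):

```cuda
// Hypothetical texture-generation loop: "#pragma unroll 1" tells nvcc
// to keep the loop rolled instead of replicating the body 64 times.
__global__ void genTexture(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float acc = 0.0f;
#pragma unroll 1   // forbid unrolling: smaller code, possibly a bit slower
    for (int k = 0; k < 64; ++k)
        acc += __sinf(acc + (float)k * 0.1f);
    if (i < n)
        out[i] = acc;
}
```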

You can actually compress the .cubin and then decompress it at runtime before loading it into the CUDA context. This may be a good option if you can find a good compressor with a small decompression routine…
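A rough sketch of that with the driver API — here decompress(), packed_cubin and packed_size are placeholders for whatever depacker and linked-in data you end up using, not real symbols; only cuModuleLoadData() is the actual CUDA call, and it takes a pointer to the cubin image in memory:

```c
#include <cuda.h>
#include <stddef.h>

extern unsigned char packed_cubin[];   /* compressed cubin, linked into the exe (hypothetical) */
extern unsigned int  packed_size;      /* its size (hypothetical) */

/* placeholder for your decompression routine of choice */
void decompress(const void *src, size_t src_size, void *dst);

CUmodule loadPackedModule(void)
{
    static char cubin[256 * 1024];              /* big enough for the unpacked image */
    decompress(packed_cubin, packed_size, cubin);

    CUmodule mod;
    cuModuleLoadData(&mod, cubin);              /* load the in-memory cubin image */
    return mod;
}
```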