To use constant buffers or texture fetches, we have to declare variables at global scope, such as:
//kernel1.cu:
__constant__ int cbuf1[2000];
//below are functions
...
Now I have 10 alternative .cu files, but I use only one alternative and its corresponding kernel per run of main(). Which kernel is called depends on the command-line arguments:
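//main.cu:
#include <stdlib.h>
#include "alt1.cu"
...
#include "alt10.cu"
int main(int argc, char** argv)
{
    if (argc < 2) return 1;   // no alternative given
    switch (atoi(argv[1]))    // argv[1] is a string, so convert it first
    {
    case 1:
        runAlt1();
        break;
    ...
    case 10:
        runAlt10();
        break;
    }
    return 0;
}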
The problem is: I can’t #include alt1.cu through alt10.cu together, because the TOTAL constant buffer size exceeds 64KB. Although I use only one alternative, the compiler always tries to allocate the global variables for ALL of them.
Therefore, I have to manually comment the #include’s in and out every time I run a different alternative, and then rebuild everything. That means I can’t use a batch file to run my data, which wastes a lot of time.
You mean int or float? Then I think you can use __int_as_float
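A minimal sketch of what that suggestion could look like, assuming the goal is to keep a single int constant buffer and read some of its slots as floats. The buffer name `mixedBuf`, the helper `floatBitsAsInt`, and the kernel `readMixed` are made up for illustration; `__int_as_float` itself is the CUDA device intrinsic that reinterprets the bits without converting the value.

```cuda
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical shared buffer: declared as int, but some slots carry float bits.
__constant__ int mixedBuf[256];

// Host side: copy a float's bit pattern into an int without converting the value.
int floatBitsAsInt(float f)
{
    int bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Device side: slot 0 holds a genuine int, slot 1 holds the bits of a float.
__global__ void readMixed(int *outInt, float *outFloat)
{
    *outInt   = mixedBuf[0];
    *outFloat = __int_as_float(mixedBuf[1]);  // reinterpret, not convert
}

// Fill the two slots from the host (error checking omitted in this sketch).
void uploadMixed(int i, float f)
{
    int host[2] = { i, floatBitsAsInt(f) };
    cudaMemcpyToSymbol(mixedBuf, host, sizeof(host));
}
```

The reverse direction would use `__float_as_int` on the device.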
And the size of the constant buffer is not what matters: constant memory itself is as slow as global memory. What makes it fast is the constant cache, which only helps if all threads of a warp (a half-warp on the earliest hardware) access the same element.
If your threads access different values, you are better off using a texture, which might be even easier for you.
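To make the two access patterns concrete, here is a rough sketch. The table `coeffs`, both kernels, and the helpers `blocksFor` and `launchBoth` are hypothetical names, not anything from the original code. The first kernel is the case the constant cache is good at; the second is the case where a texture is probably the better fit.

```cuda
#include <cuda_runtime.h>

#define NUM_COEFFS 256

// Hypothetical lookup table in constant memory.
__constant__ float coeffs[NUM_COEFFS];

// Uniform access: every thread reads coeffs[k], so one cached read
// can be broadcast to all of them.
__global__ void scaleUniform(float *data, int n, int k)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= coeffs[k];
}

// Divergent access: neighbouring threads read different elements,
// so the constant cache has to serve the addresses one after another.
__global__ void scaleDivergent(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= coeffs[i % NUM_COEFFS];
}

// Number of blocks needed to cover n elements with blockSize threads each.
int blocksFor(int n, int blockSize)
{
    return (n + blockSize - 1) / blockSize;
}

// Launch both variants on the same device array (dData must hold n floats).
void launchBoth(float *dData, int n, int k)
{
    const int blockSize = 256;
    scaleUniform<<<blocksFor(n, blockSize), blockSize>>>(dData, n, k);
    scaleDivergent<<<blocksFor(n, blockSize), blockSize>>>(dData, n);
}
```

Timing the two kernels on your own card is the easiest way to see how much the access pattern matters for your data.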
Hmm, as for structures, I don’t really have an idea other than packing them into float4 and the like.
I found a thread asking how blockIdx etc. are implemented, and this is what Mark Harris from NVIDIA said:
threadIdx is initially in register R0 of each thread (the x and y components are in the 16 LSBs and MSBs, respectively), not in shared memory. It is placed there by the hardware on invocation of each block. If the kernel doesn’t use threadIdx the compiler may choose to use the register for something else.
blockIdx, blockDim, and gridDim are passed as parameters in shared memory, because all threads in a block will read the same location when these are read.
So filling up your constant memory and not using it should have no impact.
If you define the kernels in different compilation units (.cu files), you won’t run into the 64KB limit. A kernel only gets the constant buffers that are declared in its own unit.
Of course, this might cause other trouble, as you can’t share anything between compilation units.
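Here is a sketch of how that split could look, assuming each alternative exposes one host function and main.cu only declares those functions instead of #including the other files. It is shown with two alternatives; the names `runAlt1`, `runAlt2`, `parseAlt` and `dispatch`, and the kernel bodies, are placeholders. The comment markers show where each file would start, and error checking is left out.

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

// ---- alt1.cu: its own compilation unit with its own constant buffer ----
__constant__ int cbuf1[2000];

__global__ void kernel1(int *out) { *out = cbuf1[0]; }

// Host entry point; the only symbol main.cu needs to know about.
int runAlt1()
{
    int host[2000] = {0};
    host[0] = 1;
    int *dOut = nullptr;
    int result = -1;
    cudaMalloc(&dOut, sizeof(int));
    cudaMemcpyToSymbol(cbuf1, host, sizeof(host));
    kernel1<<<1, 1>>>(dOut);
    cudaMemcpy(&result, dOut, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    return result;
}

// ---- alt2.cu: same pattern, different buffer and kernel ----
__constant__ int cbuf2[2000];

__global__ void kernel2(int *out) { *out = cbuf2[0]; }

int runAlt2()
{
    int host[2000] = {0};
    host[0] = 2;
    int *dOut = nullptr;
    int result = -1;
    cudaMalloc(&dOut, sizeof(int));
    cudaMemcpyToSymbol(cbuf2, host, sizeof(host));
    kernel2<<<1, 1>>>(dOut);
    cudaMemcpy(&result, dOut, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    return result;
}

// ---- main.cu: declarations only, no #include of the other .cu files ----
int runAlt1();
int runAlt2();

// Parse the alternative number from a command-line argument; 0 means invalid.
int parseAlt(const char *arg)
{
    int alt = arg ? std::atoi(arg) : 0;
    return (alt >= 1 && alt <= 2) ? alt : 0;
}

// Run the selected alternative; returns -1 for an unknown number.
int dispatch(int alt)
{
    switch (alt) {
    case 1:  return runAlt1();
    case 2:  return runAlt2();
    default: return -1;
    }
}
```

main() would then just call dispatch(parseAlt(argv[1])) after checking argc, and the files would be built together with something like `nvcc main.cu alt1.cu alt2.cu -o app`. With that in place a batch file can pick the alternative through the argument, with no rebuild in between.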