I’m building a toolkit that offers different algorithms in CUDA. However, many of these algorithms use static constant global data that will be used by all threads, declared this way for example:
static device constant real buf[MAX_NB];
My problem is that if I include all the .cuh files in the library, when the library will be instantiated all this memory will be allocated on the device, even though the user might want to use only one of these algorithms. Is there any any way around this? Will I absolutely have to use the typical dynamically allocated memory?
I want the fastest constant memory possible that can be used by all threads at runtime. Any ideas?