Question about device buffer as global variable

Hi,

I have run into a problem that I don't know how to deal with.

I am writing some device-side functions that are supposed to be called by higher-level user kernel functions; in other words, I am writing a library that will be called from CUDA kernel functions.
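To make it concrete, here is roughly the structure I mean (all function and variable names below are just placeholders for illustration, not my real code):

// Library of __device__ functions that user code calls from its own kernels.
__device__ void lib_prepare(float x)
{
    (void)x;  // placeholder body: the real code writes intermediate results
              // into a per-thread scratch buffer
}

__device__ float lib_compute(float y)
{
    // placeholder body: the real code reads the scratch buffer that
    // lib_prepare filled for this thread
    return y;
}

// A user kernel (written by someone else) then calls the library functions:
__global__ void user_kernel(const float* in, float* out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {
        lib_prepare(in[tid]);
        out[tid] = lib_compute(in[tid]);
    }
}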

The problem is that each of these interface functions needs to share the same buffers (for both reading and writing, and each thread should have its own such buffer). Since CUDA does not support dynamic memory allocation, it seems that I have to define the buffer in device memory as a global variable…
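The only approach I can think of looks something like this, with a statically sized __device__ array and each thread indexing its own slice (the sizes and names here are placeholders too):

#define BUF_LEN 64                       // scratch elements needed per thread (placeholder)
#define MAX_TOTAL_THREADS (1024 * 1024)  // ??? this is exactly the number I don't know how to pick

// One scratch slice per thread, size fixed at compile time.
__device__ float g_scratch[MAX_TOTAL_THREADS][BUF_LEN];

// Each interface function would get its thread's slice like this (1D launch assumed):
__device__ float* thread_scratch()
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    return g_scratch[tid];
}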

However, the total number of threads specified for the user's kernel is not fixed, so I cannot use it to size the allocation, and if I size for the maximum grid/block dimensions of a given GPU, such as 65535 x 65535 x 512 x 512 x 64 x 512, the total size would be huge…

Since these interface functions might be called multiple times during a single run of the user kernel, I cannot wrap them all in one function selected by interface flags; otherwise, I could simply allocate the buffer inside that wrapper function…
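By a wrapper with interface flags I mean something like the sketch below (again, placeholder names); there the buffer could simply be a per-thread local array, but it would not survive between separate calls, which is why this does not work for me:

enum LibOp { LIB_PREPARE, LIB_COMPUTE };

__device__ float lib_dispatch(LibOp op, float x)
{
    float scratch[64];        // per-thread, but it only lives for this one call

    switch (op) {
    case LIB_PREPARE:
        scratch[0] = x;       // fill the buffer ...
        return 0.0f;
    case LIB_COMPUTE:
        // ... but here scratch is a brand-new, uninitialized array; the data
        // written by the earlier LIB_PREPARE call is already gone.
        return x;
    }
    return 0.0f;
}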

I would really appreciate your kind help. Any suggestions are greatly welcome!

Susan