CUDA non-global functions: shared memory declaration, pointer usage

My dear CUDA brethren,

I have some long-standing doubts about the usage of CUDA device functions.

Let's say I have a non-global (device) function declared like this:

__device__ void fetchDescriptor(descr_t *src, descr_t *dst);

Now, I invoke this function in many places in my global kernel. In a few cases both “src” and “dst” are shared-memory pointers; in other cases one of them points to global memory and the other to shared memory, and so on.

Will such a thing work? Assuming the CUDA compiler resolves the arguments (shared vs. global) at the place of invocation, will it inline the function according to the arguments? I doubt it.
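
To make the scenario concrete, here is a rough sketch of what I mean (the contents of descr_t and the kernel body are placeholders, just so the example compiles):

    typedef struct { float data[16]; } descr_t;

    __device__ void fetchDescriptor(descr_t *src, descr_t *dst)
    {
        // Plain element-wise copy; the question is whether the compiler
        // can resolve the memory space (shared vs. global) of src/dst
        // at each call site when it inlines this function.
        for (int i = 0; i < 16; ++i)
            dst->data[i] = src->data[i];
    }

    __global__ void myKernel(descr_t *gIn, descr_t *gOut)
    {
        // Toy example: every thread redundantly does the same copy.
        __shared__ descr_t sA;
        __shared__ descr_t sB;

        fetchDescriptor(gIn, &sA);    // src = global, dst = shared
        fetchDescriptor(&sA, &sB);    // both pointers are shared
        fetchDescriptor(&sB, gOut);   // src = shared, dst = global
    }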

Could a future release of the CUDA compiler add some support along these lines?

  1. Now, let us say I have two non-global (device) functions like this:

    __device__ void fetchDescriptor(descr_t *globalMemory);
    __device__ void processDescriptor();

    Both of these functions use a shared-memory structure in common. Now, how do I declare that shared-memory structure?
    a) Should I declare it in both functions? (see the sketch after this list)
    b) Should I declare it in the global function and pass the structure as a pointer? Pointers have a lot of loopholes anyway.
    c) Should I declare it in the global function and pass the entire structure as an argument?
    d) Should I declare it in the global function and have the function definitions below it, so that the compiler understands the reference?
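
To make option (a) concrete, here is a rough sketch (the structure contents and names are placeholders):

    typedef struct { float data[16]; } descr_t;
    typedef struct { descr_t d; float scratch[64]; } work_t;

    __device__ void fetchDescriptor(descr_t *globalMemory)
    {
        __shared__ work_t ws;   // declared here...
        ws.d = *globalMemory;
    }

    __device__ void processDescriptor()
    {
        __shared__ work_t ws;   // ...and declared again here. Is this the
                                // same storage as above, or a second copy?
        // ... work on ws.d ...
    }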

How about adding these details to the CUDA Programming Guide? Thanks.

Can someone enlighten me on these issues?

Thanks a bunch!

  1. Try it and let us know how it goes. My hunch is that the compiler should be able to resolve the pointers during inlining. If not, please post a small repro case.

  2. Declaring smem variables inside these functions should be OK, but most likely separate storage will be assigned for each invocation during inlining. So, if you want to conserve smem space, I’d recommend declaring the smem data in the caller and passing the pointer (see the sketch after this list).

  3. The compiler will warn if it cannot figure out whether a pointer refers to gmem or smem. These aren’t really CUDA features so much as compiler behaviors, which is why they don’t appear in the Programming Guide. We keep improving the documentation, so your feedback is welcome.
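
For example, along the lines of your code (just a sketch, with placeholder types and names taken from your post):

    typedef struct { float data[16]; } descr_t;
    typedef struct { descr_t d; float scratch[64]; } work_t;

    __device__ void fetchDescriptor(work_t *ws, descr_t *globalMemory)
    {
        ws->d = *globalMemory;          // fill the caller's smem structure
    }

    __device__ void processDescriptor(work_t *ws)
    {
        // ... operate on ws->d and ws->scratch ...
    }

    __global__ void myKernel(descr_t *gDescr)
    {
        __shared__ work_t ws;           // declared once, in the caller

        fetchDescriptor(&ws, gDescr);
        __syncthreads();
        processDescriptor(&ws);
    }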

Paulius

I tried it before, but the compiler did NOT seem to do that. I will repeat the exercise and file the case with more substantial evidence.

Aaah… that’s kind of tough. It would be nice if the compiler generated a union of the shared-memory space and made it accessible to all device functions pertaining to that particular kernel. This means there must be a way to tell which device function belongs to which kernel, because there could be multiple kernels written in a single file.

From a design point of view, if one thinks of a GPU kernel as an object, then one could write each kernel as a “class” and put all the shared-memory objects and local variables in as “private” data. This way, member device functions could access these data without having to pass pointers around, and it would also eliminate the unnecessary shared-memory problem with functions. This would make it easy to functionalize and maintain CUDA code. Coupled with a shared/global attribute for pointers, it would make CUDA code much more maintainable and readable.

Thinking of it again, it is NOT actually a class, because a class is more of a template; what we create here is the object itself. Maybe NVCC could extend itself to let us write objects directly instead of classes.

But yes, this means extensive redesigning… Still, somewhere down the line NVIDIA could consider doing it, mainly because I find CUDA code extremely unreadable, even to its own programmer, after it crosses some 300 lines… :-(

Sure, thank you. But I feel that a programmer’s guide is different from an architecture guide. A programmer’s guide should contain all the information a programmer needs, and these nitty-gritty details of the compiler should be known to the programmer, IMHO.

Okay, thanks for your time.

Best regards,

Sarnath