My Dear All-CUDA brethren,
I have some long-standing doubts about the usage of CUDA functions.
Let's say I have a non-global function declared like this:
__device__ void fetchDescriptor(descr_t *src, descr_t *dst);
Now, I invoke this function in many places in my __global__ kernel. In a few cases, both "src" and "dst" are shared-memory pointers. In other cases, one of them is global and the other is shared, and so on.
Will such a thing work? Assuming the CUDA compiler resolves the address spaces of the arguments (shared/global) at the place of invocation, will it inline the function according to those arguments? I doubt it.
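For what it's worth, my understanding is that this does work: __device__ functions are inlined by default (on compute 1.x the compiler tries to deduce the address space of each pointer at every call site, and on Fermi and later the hardware has a generic address space, so the same pointer can reference either region). A minimal sketch of the mixed usage, with a made-up descr_t layout:

```cuda
// Hypothetical descriptor type - the real layout is up to you.
struct descr_t { float data[16]; };

__device__ void fetchDescriptor(descr_t *src, descr_t *dst)
{
    // Plain element-wise copy; nothing here names an address
    // space, so the same code serves every call site below.
    for (int i = 0; i < 16; ++i)
        dst->data[i] = src->data[i];
}

__global__ void kernel(descr_t *gsrc, descr_t *gdst)
{
    __shared__ descr_t sdesc;        // one copy per thread block

    fetchDescriptor(gsrc, &sdesc);   // global -> shared
    fetchDescriptor(&sdesc, gdst);   // shared -> global
}
```

If the compiler cannot deduce an address space on older architectures, it falls back to generic addressing (possibly with a warning), so correctness should not be affected, only performance.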
Can a future release of the CUDA compiler add some support along these lines?
Now, let us say I have two non-global functions like this:
__device__ void fetchDescriptor(descr_t *globalMemory);
__device__ void processDescriptor();
Both these functions use a shared-memory structure in common. Now, how do I declare that shared-memory structure?
a) Should I declare it in both functions?
b) Should I declare it in the __global__ function and pass the structure as a POINTER? Anyway, pointers have a lot of loopholes.
c) Should I declare it in the __global__ function and pass the entire structure by value as an argument?
d) Should I declare it in the __global__ function and place the function definitions below it, so that the compiler understands the reference?
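In case it helps the discussion: option (b) is what I would try first, since a __shared__ variable declared in the kernel lives for the whole block and its address can simply be handed to the helper functions. A sketch under that assumption (descr_t and the processing step are made up):

```cuda
struct descr_t { float data[16]; };  // hypothetical layout

__device__ void fetchDescriptor(descr_t *globalMemory, descr_t *shared)
{
    // Every thread copies the whole descriptor here for simplicity;
    // a real kernel would split the copy across the block.
    for (int i = 0; i < 16; ++i)
        shared->data[i] = globalMemory->data[i];
}

__device__ void processDescriptor(descr_t *shared)
{
    for (int i = 0; i < 16; ++i)
        shared->data[i] *= 2.0f;     // placeholder processing
}

__global__ void kernel(descr_t *g)
{
    __shared__ descr_t desc;         // declared once, in the kernel

    fetchDescriptor(g, &desc);
    __syncthreads();                 // make the fetch visible to all threads
    processDescriptor(&desc);
}
```

Option (c) would copy the structure into local memory (per-thread), which defeats the point of sharing it, and option (a) would declare two separate shared variables rather than one common one.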
How about adding these details to the CUDA Programming Guide? Thanks.
Can someone enlighten me on these issues?
Thanks a bunch!