device functions newbie question - passing shared mem

Hi,

I have 2 questions:

  1. I want to access the shared memory of my kernel from a device function. Can I do this:

```cuda
__device__ void MyDeviceFunc() {
    extern __shared__ float s[];
    // …
}

__global__ void MyKernel() {
    extern __shared__ float s[];
    MyDeviceFunc();
}
```

Or should I do this:

```cuda
__device__ void MyDeviceFunc(float *s) {
    // …
}

__global__ void MyKernel() {
    extern __shared__ float s[];
    MyDeviceFunc(s);
}
```

  2. When I checked the PTX, I found differences when using device functions. But since device functions are inlined, shouldn't the code be the same? Is it better to avoid them to make kernels faster?

Thanks

Your first example will work, because all `extern __shared__` arrays refer to the same memory: the dynamic shared memory allocation whose size you pass as the third kernel launch parameter. Your second example might work, but the compiler sometimes cannot determine whether a pointer refers to shared or global memory, and if it guesses wrong, the program won't work. It will generally issue a warning in this case.
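Below is a minimal sketch that exercises both variants from the question in one kernel, so you can check that they see the same data. The kernel and helper names (`demo_kernel`, `sum_via_extern`, `sum_via_pointer`, `run_shared_demo`) are hypothetical. The sketch assumes a single block of at most 1024 threads. The compiler trouble described above affected the compute 1.x GPUs of the time. Devices of compute capability 2.0 and later use generic addressing, so the pointer version is expected to work there as well.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Variant 1: the device function redeclares the extern __shared__ array.
// Every extern __shared__ declaration aliases the same dynamic shared
// memory, so this reads the values the kernel stored.
__device__ float sum_via_extern(int n) {
    extern __shared__ float s[];
    float total = 0.0f;
    for (int i = 0; i < n; ++i) total += s[i];
    return total;
}

// Variant 2: the kernel passes a pointer to its shared array.
__device__ float sum_via_pointer(const float *s, int n) {
    float total = 0.0f;
    for (int i = 0; i < n; ++i) total += s[i];
    return total;
}

__global__ void demo_kernel(const float *in, float *out, int n) {
    extern __shared__ float s[];
    int tid = threadIdx.x;
    if (tid < n) s[tid] = in[tid];   // each thread stages one element
    __syncthreads();                 // make all stores visible to thread 0
    if (tid == 0) {
        out[0] = sum_via_extern(n);
        out[1] = sum_via_pointer(s, n);
    }
}

// Hypothetical host helper: fills the input with 1..n, launches one block of
// n threads, and returns {sum via extern, sum via pointer}.
std::vector<float> run_shared_demo(int n) {
    assert(n > 0 && n <= 1024);
    std::vector<float> h_in(n);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i + 1);

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, 2 * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // Third launch parameter: bytes of dynamic shared memory backing s[].
    demo_kernel<<<1, n, n * sizeof(float)>>>(d_in, d_out, n);

    std::vector<float> h_out(2);
    cudaMemcpy(h_out.data(), d_out, 2 * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

Calling `run_shared_demo(64)` from `main` should return two entries that are both 2080 (the sum of 1..64), which shows that both variants read the same shared array.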

As for your second question, a good deal of optimization happens in the translation from PTX to cubin. So some differences in PTX can produce identical cubin, or cubin that runs just as fast. I don't have any experience one way or the other, but if you're really concerned, use decuda to inspect the machine instructions, and run benchmarks.
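Here is a rough sketch of such a benchmark. It runs the same arithmetic through a device function marked `__forceinline__` and through one marked `__noinline__`, and times each launch with CUDA events. Both are real CUDA qualifiers, but the compiler's choices can still vary by architecture and toolkit version. The step function, iteration count, and all kernel and helper names are hypothetical choices for illustration. A single timed launch is noisy, so repeat it before drawing conclusions. With current toolkits, `cuobjdump -sass` on the compiled binary shows the machine instructions. decuda only supported early GPUs.

```cuda
#include <cassert>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Identical arithmetic, two inlining qualifiers.
__device__ __forceinline__ float step_inline(float x) {
    return (x * x * 0.5f + x * 0.25f + 1.0f) * 0.5f;
}
__device__ __noinline__ float step_noinline(float x) {
    return (x * x * 0.5f + x * 0.25f + 1.0f) * 0.5f;
}

__global__ void iterate_inline(float *data, int n, int iters) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float v = data[i];
    for (int k = 0; k < iters; ++k) v = step_inline(v);
    data[i] = v;
}

__global__ void iterate_noinline(float *data, int n, int iters) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float v = data[i];
    for (int k = 0; k < iters; ++k) v = step_noinline(v);
    data[i] = v;
}

struct InlineBenchResult {
    float ms_inline;
    float ms_noinline;
    float max_abs_diff;  // largest difference between the two outputs
};

// Zeroes the buffer, does one warm-up launch, zeroes again, then times one
// launch with CUDA events. Returns elapsed milliseconds.
static float time_kernel(void (*kernel)(float *, int, int), float *d, int n, int iters) {
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    cudaMemset(d, 0, n * sizeof(float));
    kernel<<<blocks, threads>>>(d, n, iters);  // warm-up
    cudaMemset(d, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    kernel<<<blocks, threads>>>(d, n, iters);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

// Hypothetical host helper: times both kernels and compares their outputs.
InlineBenchResult run_inline_benchmark(int n, int iters) {
    assert(n > 0 && iters > 0);
    float *d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    std::vector<float> a(n), b(n);

    InlineBenchResult r{};
    r.ms_inline = time_kernel(iterate_inline, d, n, iters);
    cudaMemcpy(a.data(), d, n * sizeof(float), cudaMemcpyDeviceToHost);
    r.ms_noinline = time_kernel(iterate_noinline, d, n, iters);
    cudaMemcpy(b.data(), d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);

    for (int i = 0; i < n; ++i)
        r.max_abs_diff = std::fmax(r.max_abs_diff, std::fabs(a[i] - b[i]));
    return r;
}
```

Call `run_inline_benchmark(1 << 16, 1000)` from `main` and compare `ms_inline` with `ms_noinline`. The `max_abs_diff` field is only a sanity check that both versions computed (nearly) the same values. Floating-point contraction may differ slightly between the inlined and non-inlined code, so exact equality is not assumed.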