Hi,
I have 2 questions:
- I want to access the shared memory of my kernel from a device function. Can I do this:
__device__ void MyDeviceFunc() {
    extern __shared__ float s[];
    //…
}
__global__ void MyKernel() {
    extern __shared__ float s[];
    MyDeviceFunc();
}
or should I do this:
__device__ void MyDeviceFunc(float *s) {
    //…
}
__global__ void MyKernel() {
    extern __shared__ float s[];
    MyDeviceFunc(s);
}
- Checking the PTX code, I found differences when using device functions. But since device functions are inlined, shouldn't the generated code be the same? Is it better to avoid them to improve kernel speed?
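To make both questions concrete, here is a minimal sketch of the pointer-passing variant that compiles and runs on its own. The names scaleShared and runDemo, the doubling operation, and the single-block launch are made up for illustration; only MyKernel and the extern shared array come from my code above.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: works on the kernel's shared buffer through a pointer.
__device__ __forceinline__ void scaleShared(float *s, int i, float k) {
    s[i] *= k;
}

__global__ void MyKernel(const float *in, float *out, int n) {
    extern __shared__ float s[];   // size comes from the third launch parameter
    int i = threadIdx.x;
    if (i < n) {
        s[i] = in[i];              // each thread touches only its own slot
        scaleShared(s, i, 2.0f);   // pass the shared array as a plain pointer
        out[i] = s[i];
    }
}

// Hypothetical host driver: runs one block of n threads (n <= 1024)
// and returns true if every element was doubled.
bool runDemo(int n) {
    std::vector<float> h_in(n), h_out(n, 0.0f);
    for (int i = 0; i < n; ++i) h_in[i] = (float)i;
    size_t bytes = n * sizeof(float);
    float *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);
    MyKernel<<<1, n, bytes>>>(d_in, d_out, n);   // bytes = dynamic shared memory size
    bool ok = cudaMemcpy(h_out.data(), d_out, bytes,
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_in);
    cudaFree(d_out);
    for (int i = 0; ok && i < n; ++i) ok = (h_out[i] == 2.0f * h_in[i]);
    return ok;
}

int main() {
    std::printf("%s\n", runDemo(64) ? "ok" : "mismatch");
    return 0;
}
```

Compiling this with nvcc -ptx, once as written and once with __forceinline__ swapped for __noinline__, should give two PTX files that can be diffed to see what the function call itself contributes.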
Thanks