functions in kernel module


I have a kernel that calls some functions, which in turn call other functions, and so on.
Each function has its own local variables, and I pass in some parameters as well. What happens when the threads actually execute these functions? Does each thread get its own copy of all the variables, or is there a risk that threads read and write the same variables?
What is the best way to make sure the threads do not interfere with each other?

__device__ functions are inlined by default (unless you use __noinline__ in CUDA 1.1). As such, parameters are “copied” in the sense that new registers are probably created for them in the PTX. But a lot of register optimization happens between the PTX and the cubin, so the copies are optimized away. The result should be no different from copying and pasting the function code into that part of the kernel.

Unless you declare a variable __shared__, there is absolutely no possibility that other threads can overwrite it. “int a” defines a variable a that is independent in every thread executed. Now, if you pass a device pointer into the function and write through that pointer, you can easily cause a race condition when multiple threads write to the same location, but that is no different whether the write happens in a __device__ function or a __global__ function.
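To make the point about per-thread locals concrete, here is a minimal sketch. The names square_plus_one, fill and run_fill are hypothetical and only serve the illustration. Each thread has its own x and tmp, and because every thread writes only to its own element of out, the device pointer causes no race.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper: 'x' and 'tmp' are private to each thread
// (they normally end up in registers once the call is inlined).
__device__ int square_plus_one(int x)
{
    int tmp = x * x;
    return tmp + 1;
}

// Each thread writes only to out[i] for its own index i, so no two
// threads ever touch the same global memory location.
__global__ void fill(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = square_plus_one(i);
}

// Host-side driver (illustrative only): returns the results,
// or an empty vector if a CUDA call fails.
std::vector<int> run_fill(int n)
{
    std::vector<int> host(n);
    int *dev = nullptr;
    if (cudaMalloc((void **)&dev, n * sizeof(int)) != cudaSuccess)
        return {};

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    fill<<<blocks, threads>>>(dev, n);

    // cudaMemcpy waits for the kernel to finish before copying.
    cudaError_t err = cudaMemcpy(host.data(), dev, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess)
        return {};
    return host;
}
```

If fill instead wrote to out[0] from every thread, the final value would depend on which thread happened to write last.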

I am not sure I understood your last sentence, could you rephrase it please?

And I am actually passing some device pointers. Are there any precautions I should take?

It’s simple: you have to imagine that every single thread within a single kernel invocation running on the device is running simultaneously. Thus there are things you should avoid doing to get consistent and correct results.
Avoid doing these in global memory:

  1. Reading the output of one thread from another thread
  2. Two threads writing to the same location (exception: atomic ops on sm_11 hardware)
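As an illustration of the atomic exception in item 2, here is a rough sketch in which many threads update a single counter in global memory. The names count_even and run_count_even are hypothetical, and the code assumes hardware with global-memory atomics on int (sm_11 or later). With a plain (*counter)++ in place of atomicAdd, concurrent updates could be lost.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Many threads may hit 'counter' at the same time; atomicAdd makes each
// read-modify-write indivisible, so no increment is lost.
__global__ void count_even(const int *in, int n, int *counter)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && in[i] % 2 == 0)
        atomicAdd(counter, 1);
}

// Host-side driver (illustrative only): counts the even values in 0..n-1.
// Returns -1 if a CUDA call fails.
int run_count_even(int n)
{
    std::vector<int> host(n);
    for (int i = 0; i < n; ++i)
        host[i] = i;

    int *d_in = nullptr;
    int *d_counter = nullptr;
    if (cudaMalloc((void **)&d_in, n * sizeof(int)) != cudaSuccess)
        return -1;
    if (cudaMalloc((void **)&d_counter, sizeof(int)) != cudaSuccess) {
        cudaFree(d_in);
        return -1;
    }

    cudaMemcpy(d_in, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_counter, 0, sizeof(int));   // counter starts at zero

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    count_even<<<blocks, threads>>>(d_in, n, d_counter);

    int result = -1;
    cudaError_t err = cudaMemcpy(&result, d_counter, sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_counter);
    return (err == cudaSuccess) ? result : -1;
}
```

Atomics serialize the colliding updates, so where the algorithm allows it, giving each thread its own output element is usually the simpler choice.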