I am wondering whether, when a device function is executed by several threads, all of those threads are guaranteed to have finished by the time the function returns.
For example, I have two device functions A and B which, within the same block, use M and N threads respectively. If a kernel calls A and then B, have all threads finished A when B begins?
What do you mean by “create threads”? Device functions cannot alter the blocksize, this is set at kernel launch. The only time the thread number changes at runtime is when threads exit.
Sorry, my question was not very explicit… I have edited it :)
In fact, I just want to know whether the threads of one block that are “used” in two different device functions are executed with some kind of synchronisation between the calls to A and B, like this:
__device__
void toto(int *tab)
{
    tab[threadIdx.x] = 1;
}
__device__
void toto2(int *tab)
{
    tab[threadIdx.x] += 1;
}
__global__
void kernel()
{
    // shared by the block (a plain local array would be private to each thread)
    __shared__ int tab[4];
    tab[threadIdx.x] = -1;
    toto(tab);   // <= example of A device function
    toto2(tab);  // <= example of B device function
    // here: does tab == [2,2,2,2]??? or something else ??
}
Your example works, but only because each array element is always accessed by the same thread. In general, if array elements were to be accessed by different threads, you would have to put __syncthreads() between the accesses, i.e. between the two function calls in your example.
Function calls do not provide any synchronization. Some threads may be inside the function, while others have already left it or are in a different function.
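To illustrate the case where the barrier is needed, here is a minimal sketch in which each thread reads an element written by a different thread. The names fillTab, sumWithNeighbour, neighbourKernel and runNeighbourSum are made up for this illustration, and a single block of 4 threads is assumed.

```cuda
#include <cuda_runtime.h>

constexpr int N = 4;  // threads per block (assumption for this sketch)

// Hypothetical "A" function: each thread writes its own element.
__device__ void fillTab(int *tab)
{
    tab[threadIdx.x] = threadIdx.x + 1;
}

// Hypothetical "B" function: each thread also reads its neighbour's element,
// i.e. an element that was written by a different thread.
__device__ void sumWithNeighbour(const int *tab, int *out)
{
    out[threadIdx.x] = tab[threadIdx.x] + tab[(threadIdx.x + 1) % blockDim.x];
}

__global__ void neighbourKernel(int *out)
{
    __shared__ int tab[N];
    fillTab(tab);
    // The function call above is not a barrier. Without this __syncthreads(),
    // a thread could read its neighbour's element before it has been written.
    __syncthreads();
    sumWithNeighbour(tab, out);
}

// Host helper: launches one block of N threads and copies the result back.
cudaError_t runNeighbourSum(int *hostOut)
{
    int *devOut = nullptr;
    cudaError_t err = cudaMalloc(&devOut, N * sizeof(int));
    if (err != cudaSuccess) return err;

    neighbourKernel<<<1, N>>>(devOut);
    err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(hostOut, devOut, N * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(devOut);
    return err;
}
```

With the barrier in place, tab holds 1, 2, 3, 4 before any thread reads it, so the host should get back 3, 5, 7, 5. Note that __syncthreads() has to be reached by every thread of the block, so it should not sit inside a branch that only some threads take.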