about device threads

Hello everybody :wave: ,

I am wondering whether, when a device function is executed by several threads, all of those threads are guaranteed to have finished by the time the function returns.

For example, I have two device functions A and B which respectively use (in the same block) M and N threads. If a kernel calls A and then B, have all the threads of A finished when B begins? :mellow:

What do you mean by “create threads”? Device functions cannot alter the block size; it is set at kernel launch. The only time the number of threads changes at runtime is when threads exit.
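To illustrate the point, here is a minimal sketch. The names `threads_seen_by_callee`, `report` and `run_report` are hypothetical and only serve the example. The device function sees the block size chosen at launch, and the only change at runtime comes from threads returning early.

```cuda
#include <cuda_runtime.h>

// Hypothetical device function: it runs in whichever threads call it and
// cannot add threads to the block.
__device__ int threads_seen_by_callee()
{
    return static_cast<int>(blockDim.x);  // block size fixed at kernel launch
}

__global__ void report(int *out)
{
    if (threadIdx.x >= 2) return;  // these threads exit; no new ones are created
    out[threadIdx.x] = threads_seen_by_callee();
}

// Hypothetical host helper: launches one block of block_size threads and
// copies the two values written by threads 0 and 1 back to the host.
bool run_report(int *host_out, int block_size)
{
    int *dev_out = nullptr;
    if (cudaMalloc(&dev_out, 2 * sizeof(int)) != cudaSuccess) return false;
    cudaMemset(dev_out, 0, 2 * sizeof(int));
    report<<<1, block_size>>>(dev_out);
    cudaError_t err = cudaMemcpy(host_out, dev_out, 2 * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev_out);
    return err == cudaSuccess;
}
```

Calling `run_report(out, 4)` from `main` should leave 4 in both `out[0]` and `out[1]`: the callee reports the launch-time block size even though two of the four threads have already exited.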

Sorry, that was not very clear… I have edited my post :)

In fact, I just want to know whether threads of the same block that are “used” in two different device functions are executed with some kind of synchronisation between the calls to A and B, like this:

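    __device__ void toto(int *tab)
    {
        tab[threadIdx.x] = 1;
    }

    __device__ void toto2(int *tab)
    {
        tab[threadIdx.x] += 1;
    }

    __global__ void kernel()
    {
        // Assumed to be shared by the 4 threads of the block; shared arrays
        // cannot take an initializer, so each thread sets its own element.
        __shared__ int tab[4];
        tab[threadIdx.x] = -1;

        toto(tab);      // <= example of A device function
        toto2(tab);     // <= example of B device function

        // here: does tab == [2,2,2,2]? or something else?
    }

    int main()
    {
        kernel<<<1, 4>>>();
        cudaDeviceSynchronize();  // wait for the kernel to finish
        return 0;
    }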

This example is very simple and I could just test it, but I want to be sure about the synchronisation in much more complex software… :)

Your example works, but only because each array element is always accessed by the same thread. In general, if array elements were to be accessed by different threads, you would have to put __syncthreads() between the accesses, i.e. between the two function calls in your example.

Function calls do not provide any synchronization. Some threads may be inside the function, while others have already left it or are in a different function.
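As a minimal sketch of the case where `__syncthreads()` is needed, consider a variant in which the second function reads an element written by a different thread. The names `write_own`, `read_neighbour`, `neighbour_kernel`, `run_example` and `kThreads` are hypothetical and not taken from the code posted above.

```cuda
#include <cuda_runtime.h>

constexpr int kThreads = 4;  // threads per block, as in the <<<1,4>>> launch

// A: each thread writes only its own element.
__device__ void write_own(int *tab)
{
    tab[threadIdx.x] = threadIdx.x + 1;
}

// B: each thread reads the element written by its right-hand neighbour.
__device__ int read_neighbour(const int *tab)
{
    return tab[(threadIdx.x + 1) % kThreads];
}

__global__ void neighbour_kernel(int *out)
{
    __shared__ int tab[kThreads];

    write_own(tab);
    // Barrier: every thread of the block has finished A, and its write is
    // visible, before any thread starts B. Returning from write_own() alone
    // gives no such guarantee.
    __syncthreads();
    out[threadIdx.x] = read_neighbour(tab);
}

// Hypothetical host helper: runs one block and copies the results back.
bool run_example(int *host_out)
{
    int *dev_out = nullptr;
    if (cudaMalloc(&dev_out, kThreads * sizeof(int)) != cudaSuccess) return false;
    neighbour_kernel<<<1, kThreads>>>(dev_out);
    cudaError_t err = cudaMemcpy(host_out, dev_out, kThreads * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev_out);
    return err == cudaSuccess;
}
```

With the barrier in place, `run_example` fills its output with 2, 3, 4, 1. Without it the result is not guaranteed, even if a block this small happens to produce the same values in practice.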

Thanks a lot for your answers :yes:

Do you know where I could find more information about this, e.g. in the official documentation? :)

I can’t offer anything other than the CUDA Programming Guide.

^^

ok, thx a lot :)
