How can I synchronize all the threads of the whole grid (or of a single thread block) that I launch? That is to say, I want to stop all the threads at one point until the last thread reaches that point, and then have all the threads resume their execution. Can I do this with “__syncthreads()”? I tried it, but it doesn't work properly.
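For reference, __syncthreads() is a barrier only for the threads of one block. It does not make a thread wait for threads in other blocks, and it must be reached by every thread of the block (calling it inside a branch that only some threads take is undefined behaviour). Either mistake can make it seem to "not work perfectly". In an ordinary launch, a common way to get a barrier across the whole grid is to end the kernel at that point and launch a second kernel into the same stream. The sketch below shows both cases. The kernel names, the array size (1024) and the block size (256) are illustrative assumptions, not anything from the original code. On CUDA 9 and later, cooperative groups also offer a grid-wide sync (cooperative_groups::this_grid().sync()), but only for kernels launched with cudaLaunchCooperativeKernel whose blocks can all be resident at once. That variant is not shown here.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Block-level barrier: each thread copies one element into shared memory,
// waits at __syncthreads() for the rest of ITS OWN block, then reads an
// element that was written by a different thread of the same block.
__global__ void reverseWithinBlock(const int* in, int* out, int n) {
    extern __shared__ int tile[];
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    // Every thread writes a slot (0 past n) so all threads of the block reach
    // the barrier; a __syncthreads() reached by only some threads of a block
    // is undefined behaviour.
    tile[threadIdx.x] = (gid < n) ? in[gid] : 0;
    __syncthreads();
    if (gid < n) out[gid] = tile[blockDim.x - 1 - threadIdx.x];
}

// Grid-level "barrier" by splitting the work into two kernels: kernels
// launched into the same stream run in order, so secondPhase starts only
// after every thread of every block of firstPhase has finished.
__global__ void firstPhase(const int* in, int* tmp, int n) {
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    if (gid < n) tmp[gid] = 2 * in[gid];
}

__global__ void secondPhase(const int* tmp, int* out, int n) {
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    // Usually reads a value written by a thread in ANOTHER block.
    if (gid < n) out[gid] = tmp[n - 1 - gid];
}

bool runBarrierDemo() {
    const int n = 1024, threads = 256;
    const int blocks = (n + threads - 1) / threads;
    std::vector<int> hIn(n), hOut(n);
    for (int i = 0; i < n; ++i) hIn[i] = i;

    int *dIn, *dTmp, *dOut;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dTmp, n * sizeof(int));
    cudaMalloc(&dOut, n * sizeof(int));
    cudaMemcpy(dIn, hIn.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    bool ok = true;

    // 1) Barrier inside each block (dynamic shared memory: one int per thread).
    reverseWithinBlock<<<blocks, threads, threads * sizeof(int)>>>(dIn, dOut, n);
    cudaMemcpy(hOut.data(), dOut, n * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) {
        int base = (i / threads) * threads;
        if (hOut[i] != base + threads - 1 - (i - base)) ok = false;
    }

    // 2) Barrier across the whole grid = the boundary between two launches.
    firstPhase<<<blocks, threads>>>(dIn, dTmp, n);
    secondPhase<<<blocks, threads>>>(dTmp, dOut, n);
    cudaMemcpy(hOut.data(), dOut, n * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) {
        if (hOut[i] != 2 * (n - 1 - i)) ok = false;
    }

    cudaFree(dIn);
    cudaFree(dTmp);
    cudaFree(dOut);
    return ok && cudaGetLastError() == cudaSuccess;
}

int main() {
    bool ok = runBarrierDemo();
    std::printf("%s\n", ok ? "both barriers behaved as expected" : "mismatch");
    return ok ? 0 : 1;
}
```

The blocking cudaMemcpy calls on the default stream also wait for the preceding kernels, so the host-side checks see finished results.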
I have a second question: how can I tell, at a given point in my kernel code, which thread was the last one to pass that point? I tried it with a device variable that counts the number of threads passing that point, but it doesn't work.
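A plain counter++ on a __device__ variable is a read-modify-write that many threads perform at the same time, so increments get lost. That alone can explain a counter that "doesn't work". A common fix is atomicAdd, which performs the increment indivisibly and returns the value from before the increment. Each thread therefore gets a unique ticket, and the one holding ticket total-1 is the last to pass. The sketch below assumes hypothetical names (arrivals, lastThread, checkpointKernel, runCheckpointDemo). It only identifies the last arriver and does not make the other threads wait. Which thread arrives last varies from run to run.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical globals: how many threads passed the checkpoint, and the
// global index of the thread that passed it last.
__device__ unsigned int arrivals = 0;
__device__ int lastThread = -1;

__global__ void checkpointKernel(unsigned int totalThreads) {
    int gid = blockIdx.x * blockDim.x + threadIdx.x;

    // ... work before the checkpoint would go here ...

    // Order this thread's earlier global-memory writes before its arrival
    // is announced, so the last thread can rely on them (same idea as the
    // "last block" pattern used in CUDA reduction samples).
    __threadfence();

    // atomicAdd returns the OLD value and cannot be interleaved with other
    // threads' increments, so tickets are 0, 1, ..., totalThreads - 1.
    unsigned int ticket = atomicAdd(&arrivals, 1u);

    if (ticket == totalThreads - 1) {
        // Exactly one thread receives the final ticket: the last to pass.
        lastThread = gid;
    }
}

bool runCheckpointDemo(int blocks, int threads, unsigned int* countOut, int* lastOut) {
    const unsigned int zero = 0;
    const int none = -1;
    // __device__ variables keep their values between launches, so reset
    // them before every launch.
    cudaMemcpyToSymbol(arrivals, &zero, sizeof(zero));
    cudaMemcpyToSymbol(lastThread, &none, sizeof(none));

    const unsigned int total = static_cast<unsigned int>(blocks * threads);
    checkpointKernel<<<blocks, threads>>>(total);
    if (cudaDeviceSynchronize() != cudaSuccess) return false;

    cudaMemcpyFromSymbol(countOut, arrivals, sizeof(*countOut));
    cudaMemcpyFromSymbol(lastOut, lastThread, sizeof(*lastOut));
    return *countOut == total && *lastOut >= 0 && *lastOut < static_cast<int>(total);
}

int main() {
    unsigned int count = 0;
    int last = -1;
    bool ok = runCheckpointDemo(8, 256, &count, &last);
    std::printf("%u threads passed; the last one was thread %d\n", count, last);
    return ok ? 0 : 1;
}
```

If the last thread must also wait for the others to finish some work first, the two-kernel split shown for the first question is usually simpler than building a spin-wait on this counter.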
In CUDA everything is a problem! I'm going crazy!