How to use the .uni suffix?

Hello,

I would like to know how to use the .uni suffix.

I need this. I read about it on page 35, which says:

7.5. Divergence of Threads in Control Constructs

“All control constructs are assumed to be divergent points unless the control-flow
instruction is marked as uniform, using the .uni suffix.”

.uni is a suffix for the PTX branch, call, and return instructions (bra.uni, call.uni, ret.uni). You have to manually edit the assembly code of your program to insert it. It merely says that the branch is always taken in the same direction by every thread in a warp, which the PTX assembler can use to optimize the code for a particular GPU. The optimizations are not that dramatic (I implemented it once for an architecture without hardware support for divergence, using two extra instructions per branch), and NVCC should already detect a significant number of these cases. Unless you are writing a compiler for CUDA, I wouldn't worry about this at all.
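To make the distinction concrete, here is a minimal sketch of a uniform branch next to a potentially divergent one. The kernel name `branch_demo`, the `mode` parameter, and the host helper `run_branch_demo` are hypothetical and only for illustration. Whether the compiler actually emits a `.uni` branch for the first `if` is up to the compiler; you can inspect the generated PTX with `nvcc -ptx` to check.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel, for illustration only.
__global__ void branch_demo(const int* in, int* out, int n, int mode)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id >= n) return;   // bounds check (may itself diverge in the last warp)

    int v = in[id];

    // Uniform branch: `mode` is a kernel argument, so every thread in a warp
    // evaluates this condition identically and takes the same direction.
    if (mode == 1) {
        v = v * 2;
    }

    // Potentially divergent branch: the condition depends on per-thread data,
    // so threads of the same warp may take different directions.
    if (in[id] > 0) {
        v = v + 1;
    }

    out[id] = v;
}

// Hypothetical host helper: copies data over, runs the kernel, copies back.
void run_branch_demo(const int* h_in, int* h_out, int n, int mode)
{
    int *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);
    branch_demo<<<(n + 63) / 64, 64>>>(d_in, d_out, n, mode);
    cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
}
```

Short branches like these are often compiled to predicated instructions rather than real jumps, so the difference may not be visible for such a small example.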

I understand. Maybe you could help; I have the following code:

block[3,3]…

aux = [0 1 0 0 0 0
       0 0 0 1 0 0
       0 1 0 0 0 0
       0 0 0 0 0 0
       0 1 0 0 0 0
       0 0 0 0 0 0 ]

mask=0;

if (aux>0) mask = 4;

Answer:

mask = [0 4 0 0 0 0
        0 0 0 4 0 0
        0 4 0 0 0 0
        0 0 0 0 0 0
        0 4 0 0 0 0
        0 0 0 0 0 0 ]

Now, I need the answer to be:

mask = [4 4 4 4 4 4
        4 4 4 4 4 4
        4 4 4 4 4 4
        4 4 4 4 4 4
        4 4 4 4 4 4
        4 4 4 4 4 4 ]

I already tried:

if (__any(aux) > 0) mask = 4;

Answer:

mask = [4 4 4 4 4 4
        4 4 4 4 4 4
        4 4 4 4 4 4
        4 4 4 0 0 0
        4 4 4 0 0 0
        4 4 4 0 0 0 ]

How do I do this?

I would assign one thread to each element and then do a conditional select based on aux.

Something like:

mask[id] = (aux[id] > 0) ? 4 : 0;
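Spelled out as a full kernel, that could look roughly like the sketch below. The names `fill_mask` and `run_fill_mask`, the flat 1-D indexing, and the block size of 64 are assumptions for illustration. Note that this is an element-wise select, so it produces 4 only where aux is positive; it does not by itself set every element to 4.

```cuda
#include <cuda_runtime.h>

// One thread per element: write 4 where aux is positive, 0 elsewhere.
__global__ void fill_mask(const int* aux, int* mask, int n)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id < n) {
        mask[id] = (aux[id] > 0) ? 4 : 0;
    }
}

// Hypothetical host helper: copies aux to the device, runs the kernel,
// and copies the resulting mask back.
void run_fill_mask(const int* h_aux, int* h_mask, int n)
{
    int *d_aux, *d_mask;
    cudaMalloc(&d_aux, n * sizeof(int));
    cudaMalloc(&d_mask, n * sizeof(int));
    cudaMemcpy(d_aux, h_aux, n * sizeof(int), cudaMemcpyHostToDevice);
    fill_mask<<<(n + 63) / 64, 64>>>(d_aux, d_mask, n);
    cudaMemcpy(h_mask, d_mask, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_aux);
    cudaFree(d_mask);
}
```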

Hello, I tried that but it failed. The real problem is this:

[codebox]

__global__ void cuMykernel( float* g_odata, int width, int height)
{
    unsigned int x = blockIdx.x*blockDim.x + threadIdx.x;
    unsigned int y = blockIdx.y*blockDim.y + threadIdx.y;

    __shared__ float p,q,q_out,out;
    __shared__ int pos,i,j,mask,flag,idx,idy;

    out = {4 4 4 4 4 5 6 7 3 0
           0 3 0 0 0 0 6 8 2 0
           1 2 0 0 0 4 4 4 5 6
           4 2 1 0 0 3 3 0 0 5
           3 3 2 1 1 2 2 0 0 0
           4 5 6 0 7 4 2 0 0 1
           3 0 7 1 8 3 2 0 0 7
           2 1 8 2 1 2 2 1 1 8};

    count = 0;

    do{
        flag = 0;
        g_odata[y * width + x] = out;
        fcount++;
        mask = out == 0;
        p = tex2D(tex_img,(float)x,(float)y);

        for(pos=1;pos<9;pos++){
            arrow(pos,&i,&j);
            q = tex2D(tex_img,(float)x+i,(float)y+j);
            idy = y+j; if(idy<0) idy=0; if(idy>=height) idy=height-1;
            idx = x+i; if(idx<0) idx=0; if(idx>=width) idx=width-1;
            q_out = g_odata[idy * width + idx];

            if(q_out > 0 && q <= p && mask==1){
                p = q;
                out = pos;
                flag = 1;
            }
        }
    }while(__any(flag));
}

[/codebox]

I need this loop to keep running until the flag stabilizes (all are 0). The problem is that it runs only once and then leaves the loop. How could I fix this?
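Two things in the kernel above are worth checking. First, `__any` only votes among the 32 threads of a single warp, so different warps of the same block can reach different decisions (newer CUDA versions replace it with `__any_sync`). Second, `flag`, `out`, `p`, and the other working variables are declared `__shared__`, so every thread in the block reads and overwrites the same copy. A minimal sketch of one possible fix, assuming compute capability 2.0 or higher, keeps the per-thread state in ordinary local variables and uses `__syncthreads_or` for a block-wide vote. The kernel `iterate_until_stable`, its countdown "work" step, and the helper `run_iterate` are hypothetical stand-ins for the real update logic.

```cuda
#include <cuda_runtime.h>

// Every thread loops until NO thread in the block reports a change.
// Assumes the block is fully populated (no early return), because
// __syncthreads_or must be reached by all threads of the block.
__global__ void iterate_until_stable(const int* work, int* iterations)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;

    int remaining = work[id];  // per-thread state: a local variable, not __shared__
    int count = 0;             // number of loop iterations this thread executed
    int flag;                  // per-thread "something changed" flag

    do {
        flag = 0;
        if (remaining > 0) {   // stand-in for the real update step
            remaining--;
            flag = 1;
        }
        count++;
        // Barrier plus block-wide OR: nonzero if any thread in the block set flag.
    } while (__syncthreads_or(flag));

    iterations[id] = count;
}

// Hypothetical host helper: launches a single block of n threads (n <= 1024).
void run_iterate(const int* h_work, int* h_iters, int n)
{
    int *d_work, *d_iters;
    cudaMalloc(&d_work, n * sizeof(int));
    cudaMalloc(&d_iters, n * sizeof(int));
    cudaMemcpy(d_work, h_work, n * sizeof(int), cudaMemcpyHostToDevice);
    iterate_until_stable<<<1, n>>>(d_work, d_iters);
    cudaMemcpy(h_iters, d_iters, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_work);
    cudaFree(d_iters);
}
```

This only synchronizes the threads of one block. If the image spans several blocks, a common alternative is to have any thread that changes something set a flag in global memory, and let the host relaunch the kernel until that flag stays 0.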