Instructions in a half warp

Cuda_Libre · September 3, 2010, 7:03am

Hello,

I have a quick question.

In a kernel, I have:

T[idx1] = T[idx2]
where T is an array in shared memory, and idx1 and idx2 fall into the same half warp (but not in the same warp).
(idx1 and idx2 may or may not be equal, and there may be bank conflicts.)
Is this safe, given that the source and destination arrays are the same?

avidday · September 3, 2010, 7:41am

That statement doesn’t make any sense. If the threads idx1 and idx2 are in the same half-warp, then they are in the same warp. If they are not in the same warp, they cannot be in the same half-warp.

Cuda_Libre · September 3, 2010, 8:06am

Sorry, I was totally confused when I wrote this. I just meant that T[idx1] and T[idx2] fall in the same half warp (and I was trying to emphasize that they don’t fall in different half warps of a given warp).

EDIT: and I realize that “fall in the same half warp” doesn’t make any sense either.

The thing to understand is that

T[idx1] = T[idx2]

is an instruction executed by all threads of the grid, where:

T[x] is written by exactly one thread in the grid for every possible x (i.e. the value of idx1 is unique across the whole grid)

T[y] is read by 0, 1, or more threads, all within a single half warp (i.e. the same value of idx2 may appear in several threads of a half warp)

avidday · September 3, 2010, 8:52am

If T is in shared memory, you only need to consider block-level mechanics, because shared memory is allocated per block and is only visible within that block. Even inside a single warp or half warp, there is no guarantee that the operation is safe from read-after-write hazards.
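
One common way to sidestep the hazard described above is to split the statement into a read and a write with a block-wide barrier in between. The following is only a minimal sketch under stated assumptions, not the original poster's kernel: the names shuffle_shared, count_mismatches, in, src, out and the block size N are hypothetical, a single block is launched, and idx1 is taken to be the thread index so that every destination is unique.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 64  // hypothetical block size: two warps of 32 threads

// Each thread performs T[idx1] = T[idx2] on a shared array, with the read
// and the write separated by a barrier.
__global__ void shuffle_shared(const int *in, const int *src, int *out)
{
    __shared__ int T[N];
    int tid = threadIdx.x;

    T[tid] = in[tid];
    __syncthreads();        // T is fully populated before anyone reads it

    int idx1 = tid;         // destination: unique per thread (assumption)
    int idx2 = src[tid];    // source: may repeat across threads

    int tmp = T[idx2];      // every thread reads its source into a register
    __syncthreads();        // no write starts until all reads have finished
    T[idx1] = tmp;          // each thread writes its own slot
    __syncthreads();        // writes are visible before T is read again

    out[tid] = T[tid];
}

// Host-side check: returns the number of elements that differ from the
// expected value in[src[i]], or -1 on a CUDA error.
int count_mismatches()
{
    int h_in[N], h_src[N], h_out[N];
    for (int i = 0; i < N; ++i) {
        h_in[i]  = 100 + i;
        h_src[i] = (i + 1) % N;   // each thread reads its neighbour's slot
    }

    int *d_in = nullptr, *d_src = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in,  sizeof h_in)  != cudaSuccess) return -1;
    if (cudaMalloc(&d_src, sizeof h_src) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, sizeof h_out) != cudaSuccess) return -1;

    cudaMemcpy(d_in,  h_in,  sizeof h_in,  cudaMemcpyHostToDevice);
    cudaMemcpy(d_src, h_src, sizeof h_src, cudaMemcpyHostToDevice);

    shuffle_shared<<<1, N>>>(d_in, d_src, d_out);

    // The blocking copy waits for the kernel and reports any launch error.
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof h_out,
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_src);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int bad = 0;
    for (int i = 0; i < N; ++i)
        if (h_out[i] != h_in[h_src[i]]) ++bad;
    return bad;
}

int main()
{
    int bad = count_mismatches();
    printf("mismatches: %d\n", bad);
    return bad != 0;
}
```

The temporary costs one register per thread. Without the barrier between the read and the write, a thread in one warp could overwrite a slot before a thread in another warp has read it, which is the read-after-write problem mentioned above.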

Cuda_Libre · September 3, 2010, 9:34am

Hmm, thank you… but now I’m confused. I think I forgot a crucial piece of information again:

The half warp that handles a T[idx2] read will also handle the only T[idx1] write where idx1 = idx2.

I’m really sorry for the confusion…

