Shared queue

I need to implement a shared queue on CUDA. It needs to be shared by all threads, so it will probably need to be implemented in global memory. Does anybody know of any good implementations already in use for a shared queue on CUDA?

This kind of data structure is very hard to construct in CUDA and is rather inefficient. Can you redefine your program to remove the need for a shared queue?

I’m new to CUDA, so a “shared queue free” solution is not immediately obvious to me. I have a set of shared data that each thread operates on independently given a single input, i.e. each thread operates on one element of an array given a single read-only input. Then each thread’s result (if it produces one) needs to be added to the queue and used as an input in a subsequent iteration of a while loop, which runs until the queue is empty. I could store the results in a local queue per thread, but coordinating those becomes a mess.

This sounds like an input array, an output array, and an “output generated” flag array. You might need to follow this output with a compaction step to remove the unused slots from the output array.

Hm… that seems like something I should be able to work with; I’ll look into it. Thanks.

I implemented the enqueue; I hope this helps you. If you find a solution for makequeue, please post it here in this forum.

#define MaxSize 10

typedef struct Queue {
    int *QItems;   // backing storage in global memory
    int *index;    // current number of items
} PriQueue;

__device__ PriQueue q;   // global queue, visible to all kernels

__global__ void makequeue() { /* not implemented yet */ }

__global__ void dequeue() { /* not implemented yet */ }

__global__ void enqueue(int newnode) {
    // per-block staging queue (note: __shared__ variables are only
    // legal inside a kernel, so the declarations were moved here)
    __shared__ int localQ[MaxSize];
    __shared__ int localQ_index;
    __shared__ int globalQ_index;

    if (threadIdx.x == 0)
        localQ_index = 0;
    __syncthreads();

    // insert the new node in the local queue
    int index = atomicAdd(&localQ_index, 1);
    localQ[index] = newnode;
    __syncthreads();

    // thread 0 reserves this block's range in the global queue
    if (threadIdx.x == 0)
        globalQ_index = atomicAdd(q.index, localQ_index);
    __syncthreads();

    // copy the local queue into the global queue (other kernels can see it)
    if (threadIdx.x < localQ_index)
        q.QItems[globalQ_index + threadIdx.x] = localQ[threadIdx.x];
}