Modulation schemes on CUDA: QPSK, QAM, DQPSK, etc.

Hi everyone,

I have a feeling this has probably been discussed already. If it has, then I probably didn't search hard enough (and apologies for the spam)…

OK, my question goes like so:

Is there any way to force CUDA execution to be sequential along one dimension? I mean, for a 2D matrix (X x Y), I would like the rows to execute in sequence and the columns to execute in parallel. Or vice versa; it doesn't matter, since rows/columns can be made to correspond to data or channels beforehand.

I want to use this for a DQPSK symbol mapper, and my problem is that DQPSK needs a reference to the previously computed value in order to compute the new symbol. This means I can't just launch a kernel and hope that it behaves exactly the same way every time; there needs to be some sort of flow control that forces execution to be carried out as described above.


for ( int i = 0; i < noutput_items; i++ ) {
    out[ i ] = ( in[ i ] + last_out ) % modulus;
    last_out = out[ i ];
}


This is the DQPSK code snippet for the CPU. It executes single-threaded, i.e. one channel at a time (I think), and what I want to do is make it execute multiple channels at the same time on the GPU.

Is that possible?


I would also like to note that I am by no means an expert in programming, and I only started working with CUDA a few weeks ago. So any help or pointers would be welcome (even things like "hey dude, you suck").

thanks a million

PS: don't mind my bad spelling and grammar.

Porting your idea to CUDA is not difficult. On the other hand, modulated messages are the lowest of many layers, and CUDA is in no way suited for that: imagine parallel CUDA SIMT code running through the state machine of a V.92 modem; all the decoders would take different branches and serialize. If you don't have that problem (you have infinite-length QAM streams), then it is reasonable. Maybe the many channels of a DSL line…? However, they are not "that many" to fill up a GPU.

On the other hand, inside one stream (one stream decoded per warp… sounds better) you can parallelize FIR evaluations, FFTs, correlations… So maybe there is more room for working on that aspect.

I am just sharing my thoughts: I am working with QAM stuff, but not on CUDA, and learning CUDA, but not for QAM ;) I would be happy if you could share any outcome from such an idea.

Thanks for your reply, sigismondo.

You mentioned that porting my idea to CUDA is not difficult… I was wondering how I could approach it from your point of view?

I am not worried about that part yet, as I am only conducting an investigation into CUDA to see what parts of SDR can be implemented in CUDA, and to compare whether it is worth changing the SDR implementation we have here to a CUDA implementation. I actually have just about two weeks to finish up my investigation. But nevertheless, you are correct in saying that FIRs and FFTs are great for CUDA, and someone has already implemented a whole WFM receiver in CUDA [post="0"]GNU Radio and Cuda[/post].