Design issues for RT

Hi everyone!

I am new to GPU programming, so please be patient…

I am doing a signal processing project with a partner and we plan to implement it on a CUDA GPU.

The device is a ‘GeForce GTX 480’, driver version 4.0.

The project involves the simultaneous operation of about 4000 filters.

Each filter has 3 types of variables:

  1. Constant variables

  2. Variables which are the output of a previous filter

  3. Input variables

The basic ‘plan’ is to design a block of threads (max 1024 for GTX 480) that will operate the filters.

Now here are the problems:

a. We don’t want to read all the parameters (constant and generated) from global memory each time we invoke the kernel. That would be a lot of data transfers from global memory into local memory. Is there a way to ‘save’ the data for future use? Is texture memory a good option here?

b. The maximum number of threads is 1024, so we need to use more than one block. Is there a way to synchronize threads between blocks?

c. This is a general question: would you recommend CUDA for real-time processing? Are there certain problems which can arise?

Thanks in advance!


Read the Programming Guide first, particularly chapter 2 about the programming model and the hierarchy of kernels, blocks, warps, and threads.

Most likely you don’t want to implement a filter per thread, because the threads of a warp execute in lockstep and are not independent. A filter (or chain of filters) per block is more likely what you want.
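To make that concrete, here is a rough sketch of the filter-per-block layout, where each block owns one filter and its threads cooperate on that filter's samples. All names, the FIR-style math, and the padding assumption are placeholders for illustration, not your actual design:

```cuda
// Sketch: one filter per block; the block's threads stride over
// that filter's samples. Assumes the input for each filter is
// padded with tapsPerFilter extra samples so s + t never overruns.
__global__ void runFilters(const float *in, float *out,
                           const float *coeffs,
                           int samplesPerFilter, int tapsPerFilter)
{
    int filter = blockIdx.x;                       // one filter per block
    const float *c = coeffs + filter * tapsPerFilter;
    const float *x = in     + filter * samplesPerFilter;

    // each thread handles a strided subset of this filter's samples
    for (int s = threadIdx.x; s < samplesPerFilter; s += blockDim.x) {
        float acc = 0.0f;
        for (int t = 0; t < tapsPerFilter; ++t)
            acc += c[t] * x[s + t];
        out[filter * samplesPerFilter + s] = acc;
    }
}
// launch: one block per filter, e.g.
// runFilters<<<4000, 256>>>(d_in, d_out, d_coeffs, N, T);
```

With ~4000 filters you get ~4000 blocks, which is plenty to keep a GTX 480 busy.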

If filters depend on the output of other filters, most likely it is more efficient to implement them together, so that results do not have to be written to global memory in between.
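A minimal sketch of that idea, with two trivial placeholder "filters" chained inside one kernel so the intermediate result lives in shared memory and never touches global memory (the arithmetic is made up for illustration):

```cuda
// Illustrative only: two dependent filter stages fused into one kernel.
// The stage-1 output stays in on-chip shared memory.
__global__ void filterChain(const float *in, float *out, int n)
{
    extern __shared__ float stage[];   // on-chip intermediate buffer
    int i = threadIdx.x;

    if (i < n)
        stage[i] = 2.0f * in[i];       // "filter" 1 (placeholder math)
    __syncthreads();                   // stage 1 visible to all threads

    if (i > 0 && i < n)                // "filter" 2 reads stage-1 output
        out[i] = 0.5f * (stage[i] + stage[i - 1]);
}
// launch (third parameter is the shared-memory size in bytes):
// filterChain<<<1, n, n * sizeof(float)>>>(d_in, d_out, n);
```

Note that `__syncthreads()` only synchronizes within a block, which is another reason to keep a dependent filter chain inside one block.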

There is no way to save data in on-chip memory (shared memory or registers) between kernel invocations. However, data in global and constant memory does persist across launches, so you only need to upload the parameters once, not on every invocation.
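For the constant per-filter parameters specifically, `__constant__` memory is a reasonable fit: upload once, and every later launch reads it through the on-chip constant cache. The sizes below are an assumption (4 coefficients per filter; constant memory is limited to 64 KB total):

```cuda
// Assumed layout: 4000 filters x 4 constant coefficients each
// (64000 bytes, just under the 64 KB constant-memory limit).
__constant__ float d_coeffs[4000 * 4];

void uploadCoefficients(const float *h_coeffs)
{
    // one-time copy from host; the data remains valid for all
    // subsequent kernel launches until overwritten
    cudaMemcpyToSymbol(d_coeffs, h_coeffs, 4000 * 4 * sizeof(float));
}
```

If the generated (filter-output) parameters don't fit in 64 KB, leave them in global memory between launches; they persist there too, and only the per-launch input data has to be transferred.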

You cannot synchronize between different blocks (without trickery), because the order of execution of blocks is undefined.
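The standard way to get a global synchronization point is simply to end the kernel and launch another one: kernels issued to the same stream run in order, so every block of the first kernel finishes before any block of the second starts. Kernel names here are placeholders:

```cuda
// Host code: the kernel boundary acts as a device-wide barrier.
// stepA and stepB are hypothetical kernels on the default stream.
stepA<<<numBlocks, threadsPerBlock>>>(d_data);  // all blocks finish...
stepB<<<numBlocks, threadsPerBlock>>>(d_data);  // ...before stepB begins
```

Kernel launch overhead is on the order of microseconds, which is usually acceptable even in soft real-time loops.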

CUDA makes no hard real-time guarantees. However, due to its origins in graphics processing it should be suitable for soft real-time tasks.