Hi everyone!
I am new to GPU programming, so please be patient…
I am doing a signal processing project with a partner and we plan to implement it on a CUDA GPU.
The device is a GeForce GTX 480, CUDA driver version 4.0.
The project involves the simultaneous operation of about 4000 filters.
Each filter has three types of variables:
- Constant variables
- Variables which are the output of a previous filter
- Input variables
The basic ‘plan’ is to design a block of threads (max 1024 threads per block on the GTX 480) that will run the filters.
Now here are the problems:
a. We don’t want to read all the parameters (constant and generated) from global memory each time we invoke the kernel; that would be a lot of data transfers from global memory into local memory. Is there a way to ‘save’ the data for future use? Is texture memory a good option here?
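For context, here is roughly what we had in mind for the constant parameters. This is just a sketch under assumptions: the filter structure (a first-order IIR step), the coefficient count per filter, and all the names (`d_coeffs`, `run_filters`, `state`) are made up for illustration; our real filters are more complex.

```cuda
#include <cuda_runtime.h>

#define NUM_FILTERS 4000
#define COEFFS_PER_FILTER 2   // assumed for this sketch; our real filter order differs

// Constant memory persists across kernel launches, so the coefficients
// only need to be uploaded once with cudaMemcpyToSymbol, not on every
// invocation. Note the 64 KB constant-memory limit: 4000 filters * 2
// floats * 4 bytes = 32 KB fits, but with more coefficients per filter
// we would exceed it and need textures or plain global memory instead.
__constant__ float d_coeffs[NUM_FILTERS * COEFFS_PER_FILTER];

__global__ void run_filters(const float *input, float *state, float *output)
{
    int f = blockIdx.x * blockDim.x + threadIdx.x;
    if (f >= NUM_FILTERS) return;

    // Hypothetical first-order IIR step: y = a*x + b*y_prev
    float a = d_coeffs[COEFFS_PER_FILTER * f];
    float b = d_coeffs[COEFFS_PER_FILTER * f + 1];
    float y = a * input[f] + b * state[f];

    state[f]  = y;   // the "generated" variables just stay resident in global memory
    output[f] = y;
}

// Host side, done once at startup (not per launch):
//   cudaMemcpyToSymbol(d_coeffs, h_coeffs,
//                      sizeof(float) * NUM_FILTERS * COEFFS_PER_FILTER);
```

Is this the right general direction, or is texture memory clearly better for this access pattern?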
b. The maximum number of threads is 1024, so we need to use more than one block. Is there a way to synchronize threads between blocks?
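From what we have read, there is no grid-wide barrier available from inside a kernel on this hardware (compute capability 2.0, CUDA 4.0), so the workaround we were considering is to split the pipeline at the synchronization point and use the kernel launch boundary as the barrier. A sketch, with hypothetical names (`stage1`, `stage2`, `d_buf`):

```cuda
// Splitting one kernel into two: a kernel launch boundary is an implicit
// grid-wide barrier when both kernels run in the same stream.
__global__ void stage1(float *d_buf);   // writes intermediate results
__global__ void stage2(float *d_buf);   // reads everything stage1 wrote

void process_sample(float *d_buf, int n)
{
    int threads = 256;
    int blocks  = (n + threads - 1) / threads;

    stage1<<<blocks, threads>>>(d_buf);
    // All blocks of stage1 complete before any block of stage2 starts
    // (default stream ordering), so stage2 can safely read stage1's output.
    stage2<<<blocks, threads>>>(d_buf);
}
```

Is the per-launch overhead of this approach acceptable, or is there a better pattern?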
c. This is a general question: would you recommend CUDA for real-time processing? Are there particular problems that can arise?
Thanks in advance!
Ariel