2 Kernels in a 9800GTX+


I use cuFFT to calculate a big FFT, but cuFFT doesn't apply a Hamming window. So I want to create a second kernel that calculates the Hamming window. When the Hamming window calculation is done, the results will be stored in the input data array. Then the cuFFT kernel uses this data to calculate the FFT. When the FFT is done, the results will be stored in the data array. Then I will copy the results from GPU memory to host memory.

  1. Is this possible?
  2. How do I find out how many resources the FFT kernel uses?

You can’t run two kernels at once on a single GPU of this generation (the 9800 GTX+ does not support concurrent kernel execution).


There are three solutions.

First, if you want to get low level with code, you can effectively do this manually with a test at the start of a single kernel.
Use the block index and just switch to one routine or the other based on its value.
The disadvantage is that both computations will have the same shared memory and thread configuration, and your code does get
more complicated. And if you’re using a library like cuFFT, you may not want to break into the library code itself and start modifying it.
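
A minimal sketch of that block-index switch, assuming two made-up routines (`routineA`, `routineB`) and an arbitrary 16/16 split of the grid; none of these names come from the original question, and the code uses current CUDA runtime calls rather than what shipped in the 9800 GTX+ era:

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical routine A: double every element of a.
// `block` and `numBlocks` are this routine's own block index and block count.
__device__ void routineA(float* a, int n, int block, int numBlocks) {
    for (int i = block * blockDim.x + threadIdx.x; i < n; i += numBlocks * blockDim.x)
        a[i] *= 2.0f;
}

// Hypothetical routine B: add one to every element of b.
__device__ void routineB(float* b, int n, int block, int numBlocks) {
    for (int i = block * blockDim.x + threadIdx.x; i < n; i += numBlocks * blockDim.x)
        b[i] += 1.0f;
}

// Blocks [0, blocksForA) run routine A, the remaining blocks run routine B.
// The branch is uniform within a block, so no warp diverges on it.
__global__ void fusedKernel(float* a, int na, float* b, int nb, int blocksForA) {
    int block = blockIdx.x;
    int totalBlocks = gridDim.x;
    if (block < blocksForA)
        routineA(a, na, block, blocksForA);
    else
        routineB(b, nb, block - blocksForA, totalBlocks - blocksForA);
}

int main() {
    const int n = 1 << 16;
    const size_t bytes = n * sizeof(float);
    std::vector<float> hostA(n, 1.0f), hostB(n, 1.0f);

    float *a = nullptr, *b = nullptr;
    cudaMalloc(&a, bytes);
    cudaMalloc(&b, bytes);
    cudaMemcpy(a, hostA.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(b, hostB.data(), bytes, cudaMemcpyHostToDevice);

    // One launch, 32 blocks: the first 16 work on a, the last 16 on b.
    // Both halves are stuck with the same 256-thread block size.
    fusedKernel<<<32, 256>>>(a, n, b, n, 16);

    cudaMemcpy(hostA.data(), a, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(hostB.data(), b, bytes, cudaMemcpyDeviceToHost);
    printf("a[0] = %f, b[0] = %f\n", hostA[0], hostB[0]);

    cudaFree(a);
    cudaFree(b);
    return 0;
}
```

Note how the launch configuration is shared by both routines, which is exactly the disadvantage described above.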

Second: Don’t be scared of kernel launches. They’re pretty cheap, about 15 µs of overhead each. So for many applications it’s fine to just queue up
different kernels to run sequentially in a stream. This is quite common and works especially well if at least one of the kernels has some heavier work to do, say 1 ms of compute. That kind of workload makes the extra kernel launch overhead pretty trivial.
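
For the Hamming-window case in the question, the sequential approach might look roughly like the sketch below. The kernel name `applyHamming`, the transform size, and the all-ones test signal are assumptions for illustration; the window uses the common coefficients 0.54 and 0.46. Both launches go to the default stream, so the FFT only starts after the window kernel has finished.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical window kernel: scale each complex sample in place by
// w[i] = 0.54 - 0.46 * cos(2*pi*i / (n - 1)).
__global__ void applyHamming(cufftComplex* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float w = 0.54f - 0.46f * cosf(2.0f * 3.14159265f * i / (n - 1));
        data[i].x *= w;
        data[i].y *= w;
    }
}

int main() {
    const int n = 1 << 20;                      // assumed FFT length
    const size_t bytes = n * sizeof(cufftComplex);

    // Assumed test input: a constant signal.
    std::vector<cufftComplex> host(n);
    for (int i = 0; i < n; ++i) { host[i].x = 1.0f; host[i].y = 0.0f; }

    cufftComplex* dev = nullptr;
    cudaMalloc(&dev, bytes);
    cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice);

    cufftHandle plan;
    cufftPlan1d(&plan, n, CUFFT_C2C, 1);

    // Kernel 1: window the input in place.
    const int threads = 256;
    applyHamming<<<(n + threads - 1) / threads, threads>>>(dev, n);

    // Kernel(s) 2: in-place forward FFT on the windowed data.
    // Queued behind the window kernel on the same (default) stream.
    cufftExecC2C(plan, dev, dev, CUFFT_FORWARD);

    // Blocking copy: returns once the FFT result is in host memory.
    cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost);
    printf("X[0] = (%f, %f)\n", host[0].x, host[0].y);

    cufftDestroy(plan);
    cudaFree(dev);
    return 0;
}
```

Build with `nvcc window_fft.cu -lcufft` (the file name is arbitrary). Error checking is left out to keep the sketch short; real code should check the return value of every CUDA and cuFFT call.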

Third, if your extra work is independent and not too coupled to the “big” compute you’re doing, you could try to do it on the CPU while the GPU crunches on the heavy stuff. Whether this pays off depends a lot on how much data the second task needs to transfer to or from the device, and on how intensive it is (maybe it’s too heavy for the CPU).
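
A rough sketch of that overlap, relying on the fact that kernel launches return to the host immediately. `heavyKernel` and the small CPU loop are placeholders invented for illustration, not anything from the original question:

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Placeholder for the "big" GPU computation.
__global__ void heavyKernel(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < 1000; ++k)
            v = v * 0.999f + 0.5f;
        data[i] = v;
    }
}

int main() {
    const int n = 1 << 20;
    float* dev = nullptr;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemset(dev, 0, n * sizeof(float));

    // The launch is asynchronous: control returns to the CPU right away.
    const int threads = 256;
    heavyKernel<<<(n + threads - 1) / threads, threads>>>(dev, n);

    // Independent side computation on the CPU while the GPU is busy
    // (placeholder work: fill a small table).
    std::vector<float> side(1024);
    for (int i = 0; i < 1024; ++i)
        side[i] = 0.5f * i;

    // Wait for the GPU before using its results.
    cudaDeviceSynchronize();

    float first = 0.0f;
    cudaMemcpy(&first, dev, sizeof(float), cudaMemcpyDeviceToHost);
    printf("gpu[0] = %f, cpu side[1023] = %f\n", first, side[1023]);

    cudaFree(dev);
    return 0;
}
```

This only helps if the CPU work finishes in about the time the kernel takes; otherwise the GPU sits idle waiting for the host.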

The two-separate-kernels idea is the easiest to implement and often gives perfectly fine performance, so start there.