Swapping cuFFT callbacks

Can I set the callback for an FFT plan with different callerInfo data before each call of the FFT? Is this a defined behavior?

BACKGROUND:

I am generating a Cross-Ambiguity Function (CAF) from two related input data files. (A CAF is just a correlation in time and frequency.) The steps I am taking are in essence:

  1. Take the forward FFT of each input data file
  2. Conjugate-multiply the FFTs after shifting one of them by the frequency offset
  3. Inverse FFT to generate a correlation in time at the specified frequency offset
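The three steps above can be sketched as a small host-side C++ reference. It is a minimal sketch and not the poster's GPU code: the names `dft` and `caf_row` are hypothetical, the DFT is a naive O(N^2) loop standing in for cuFFT, the frequency offset is assumed to be a whole number of bins, and the choice of which spectrum is shifted and conjugated is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Naive O(N^2) DFT. sign = -1 gives the forward transform,
// sign = +1 gives the inverse transform scaled by 1/N.
std::vector<cplx> dft(const std::vector<cplx>& in, int sign) {
    const std::size_t n = in.size();
    const double pi = std::acos(-1.0);
    std::vector<cplx> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cplx acc(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            const double ang = sign * 2.0 * pi *
                               static_cast<double>((k * t) % n) / static_cast<double>(n);
            acc += in[t] * std::polar(1.0, ang);
        }
        out[k] = (sign > 0) ? acc / static_cast<double>(n) : acc;
    }
    return out;
}

// One CAF row: correlation in time at a frequency offset of `shift` bins.
// X and Y are the forward DFTs from step 1.
// Step 2: prod[k] = X[k] * conj(Y[(k + shift) mod N])  (assumed convention).
// Step 3: inverse DFT of prod.
std::vector<cplx> caf_row(const std::vector<cplx>& X, const std::vector<cplx>& Y, long shift) {
    assert(X.size() == Y.size());
    const long n = static_cast<long>(X.size());
    std::vector<cplx> prod(X.size());
    for (long k = 0; k < n; ++k) {
        const long ks = ((k + shift) % n + n) % n;  // wrap the shifted bin into [0, N)
        prod[k] = X[k] * std::conj(Y[ks]);
    }
    return dft(prod, +1);
}
```

With this convention, two tones that are 2 bins apart correlate only in the row computed with `shift = 2`, and every other row is zero to within rounding.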

I perform step 1 only once (with a single call to a forward “many” FFT where “batch” is set to 2), but steps 2 and 3 must be performed multiple times, once for each frequency shift. I can condition the data in a separate kernel and then do an in-place IFFT using a “many” plan with “batch” equal to the number of frequency shifts I am calculating. In this manner, I can do the computations with just one “exec” call, given sufficient on-board memory. If I do not have enough memory (relative to the input data size), I must reduce the “batch” count of the IFFT “many” plan and the size of my output buffer, and make multiple “exec” calls to that plan.

To increase parallelism, I have allocated two inverse FFT “many” plans and two output arrays, so that copying one output array back to the host overlaps with subsequent IFFT calls (I call this ping-pong buffering). I have set the same store callback on each IFFT plan. However, I need to set a different load callback for each IFFT exec call to accommodate the changing frequency shift offset. (I could just perform the data conditioning in a separate kernel, which I have done successfully. I am now trying to use the callback feature to reduce my memory footprint.) Is setting a different callback for each pass defined behavior?
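The batching and ping-pong scheme described above can be sketched on the host as a small planning routine. This is only an illustration under stated assumptions: `Pass` and `plan_passes` are hypothetical names, each output buffer is assumed to hold whole transforms only, and the two buffers are assumed to alternate strictly from one pass to the next.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One IFFT exec call: which shifts it covers and which buffer/plan pair it uses.
struct Pass {
    std::size_t first_shift;  // index of the first frequency shift in this pass
    std::size_t batch;        // number of shifts (IFFTs) computed in this pass
    int buffer;               // 0 or 1: ping-pong output buffer and its plan
};

// Split num_shifts IFFTs of length nfft into passes that each fit in one
// output buffer of bytes_per_buffer bytes, alternating between two buffers.
std::vector<Pass> plan_passes(std::size_t num_shifts, std::size_t nfft,
                              std::size_t bytes_per_sample, std::size_t bytes_per_buffer) {
    assert(nfft > 0 && bytes_per_sample > 0);
    const std::size_t max_batch = bytes_per_buffer / (nfft * bytes_per_sample);
    assert(max_batch > 0 && "buffer too small for even one transform");

    std::vector<Pass> passes;
    int buffer = 0;
    for (std::size_t first = 0; first < num_shifts; first += max_batch) {
        passes.push_back({first, std::min(max_batch, num_shifts - first), buffer});
        buffer ^= 1;  // the next pass writes to the other buffer
    }
    return passes;
}
```

For 10 shifts and room for 4 transforms per buffer, this yields passes of 4, 4, and 2 shifts on buffers 0, 1, and 0. Because a cuFFT “many” plan has a fixed batch count, a short final pass like this would need either its own plan or a full-size pass whose extra outputs are discarded.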

I have coded this up. For the first eight passes (four through each IFFT plan), the large values are close, but not identical, to those from a previous run without the callback. The biggest relative variations occur in samples with small values. After about eight passes, something settles, and from then on the no-callback and callback runs produce similar data. The first half shows discrepancies below about -60 dB, but the discrepancies disappear in the second half. I’ve plotted each line in the CAF sequentially, rather than as a 2D array, to emphasize the difference.

I have discovered my problem. It was actually an error in how I was calculating my shifts. I have corrected it and now the algorithm works correctly, including swapping the callback.

I am interested in how you were able to swap the callback for each run in the batch. How does your callback know which FFT/IFFT it is running in the batch?
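One way a load callback could work out which IFFT of the batch it is serving is from the `offset` argument, which cuFFT documents as an element count (not a byte count) from the start of the input. The host-side model below is a sketch, not the original poster's code: `ShiftInfo` and `load_model` are hypothetical names, the batches are assumed to be contiguous (stride 1, distance `nfft`), and the shifts within a pass are assumed to be consecutive whole bins starting at `first_shift`.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<float>;

// Hypothetical per-pass data, playing the role of cuFFT's callerInfo pointer.
struct ShiftInfo {
    const cplx* X;     // forward FFT of input 1, nfft elements
    const cplx* Y;     // forward FFT of input 2, nfft elements
    std::size_t nfft;  // transform length
    long first_shift;  // frequency shift (in bins) of batch 0 in this pass
};

// Host-side model of a complex load callback. The cuFFT device callback type is
//   cufftComplex (*)(void* dataIn, size_t offset, void* callerInfo, void* sharedPointer)
// The model ignores dataIn and builds the IFFT input on the fly from X and Y.
cplx load_model(const void* /*dataIn*/, std::size_t offset, const void* callerInfo) {
    const ShiftInfo* info = static_cast<const ShiftInfo*>(callerInfo);
    assert(info->nfft > 0);
    const long n = static_cast<long>(info->nfft);
    const long batch = static_cast<long>(offset / info->nfft);  // which IFFT in the batch
    const long k = static_cast<long>(offset % info->nfft);      // bin within that IFFT
    const long shift = info->first_shift + batch;               // shift for this IFFT
    const long ks = ((k + shift) % n + n) % n;                  // wrap the shifted bin
    return info->X[k] * std::conj(info->Y[ks]);
}
```

In this model only `first_shift` changes between passes. On the device that could be handled either by calling `cufftXtSetCallback` again with a different `callerInfo` pointer before each exec call, as the question describes, or by keeping one pointer and updating the structure it points to between calls. Which of the two is preferable is a design choice this sketch does not settle.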