The current approach to this problem is to:
// Pseudocode
batch = total_num_ffts   // start by asking for everything
last_successful = 0      // largest batch known to fit
last_failed = 0          // smallest batch known not to fit (0 = none yet)
// Now here's where things get fabulous (sausage making in progress... turn the crank... mmm, sausage)
while( forever_and_a_few_nanoseconds )
    create the 2 plans, 1 forward and 1 reverse. This allocates memory for the plans.
    allocate memory for the forward data and the reverse transforms (2 total, out-of-place).
    allocate memory for the fft filter
    check if any of the above failed
    deallocate all requested gpu memory
    if failed    // batch is too big
        last_failed = batch
    else
        // we may have a good batch num, but because of the bidirectional divide-by-2 search
        // it could be too small
        last_successful = batch
    // See if we found a good batch for our plan
    if( last_failed == 0 or last_failed - last_successful <= 1 )
        batch = last_successful
        sausage_making = complete
        // the below needs to be re-checked, as the state of the gpu could change next time around
        serialize magic batch num to a file for later use with this gpu
        break // out dancing
    // go halfway between the last successful and the last failed batch
    batch = last_successful + (last_failed - last_successful) / 2
use batch and total_num_batches to chunk up the data using a planner for multiple gpu transfers
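For anyone who wants the search without the sausage, here is a minimal C++ sketch of the same bisection. The names find_max_batch and fits are made up for illustration and are not part of cuFFT. In real use, fits would presumably create the two plans (e.g. with cufftPlan1d), cudaMalloc the data and filter buffers, report whether everything succeeded, and then free it all again. The sketch also assumes that if a batch fits, every smaller batch fits too, which may not hold exactly on a busy gpu.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical helper (not a cuFFT call): returns the largest batch in
// [0, total_num_ffts] for which fits(batch) returns true.
// Assumption: fits is monotonic, i.e. if a batch fits, any smaller one fits.
std::size_t find_max_batch(std::size_t total_num_ffts,
                           const std::function<bool(std::size_t)>& fits)
{
    if (total_num_ffts == 0) return 0;
    if (fits(total_num_ffts)) return total_num_ffts;  // everything fits at once

    std::size_t last_successful = 0;               // largest batch known to fit
    std::size_t last_failed     = total_num_ffts;  // smallest batch known to fail

    // Narrow the gap until the two bounds are neighbours.
    while (last_failed - last_successful > 1) {
        assert(last_successful < last_failed);
        // Go halfway between the last success and the last failure.
        const std::size_t batch =
            last_successful + (last_failed - last_successful) / 2;
        if (fits(batch)) {
            last_successful = batch;  // may still be too small, keep going up
        } else {
            last_failed = batch;      // too big, come back down
        }
    }
    return last_successful;  // 0 means not even a single FFT fit
}
```

Each probe is a full plan-and-allocate round trip, so the number of probes matters: the bisection needs roughly log2(total_num_ffts) of them, which is why caching the result per gpu is still worth doing.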
This is the current … err ummm … approach
This could have been avoided (and maybe there is another way) if cufftPlan1d, cufftPlan2d, cufftPlan3d and cufftPlanMany were of the form:
// Why is batch an int and not a size_t??? The same could be asked of all the functions in this lib
int batch = num_batches_requested;
// this would provide guaranteed memory allocation at the time of the request
cufftHandle* plan, int nx, cufftType type,
void* data_buff, void* fft_buff
So the requested batch is sent in, but batch is updated with
what is actually possible, along with num_chunks (the number
of gpu data transfers to perform) and
num_overflow_mod_remainer, which is either zero or the
number of batches left over for a final, partial chunk.
The total number of chunks is therefore num_chunks + 1 if
there is a remainder.
or if a planner function could be provided
cufftHandle* plan, int nx, cufftType type,
size_t& batch, size_t& num_chunks,
calculate what is possible and return batch, num_chunks, and num_overflow_mod_remainer to the user
num_chunks = total_num_ffts / batch, floored to an integer
num_overflow_mod_remainer is the overflow: total_num_ffts % batch
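The chunk arithmetic itself is simple enough to sketch in C++. ChunkPlan, plan_chunks and total_transfers are hypothetical names used only for illustration; the field names follow the ones used above, and nothing here is a cuFFT call.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type; field names follow the post.
struct ChunkPlan {
    std::size_t batch;                      // FFTs per full chunk
    std::size_t num_chunks;                 // number of full chunks
    std::size_t num_overflow_mod_remainer;  // FFTs left for a final partial chunk
};

// Split total_num_ffts into full chunks of `batch` plus a remainder.
// Assumption: batch > 0, e.g. the value found by the search.
ChunkPlan plan_chunks(std::size_t total_num_ffts, std::size_t batch)
{
    assert(batch > 0);
    ChunkPlan p;
    p.batch = batch;
    p.num_chunks = total_num_ffts / batch;                 // integer division floors
    p.num_overflow_mod_remainer = total_num_ffts % batch;  // leftover FFTs
    return p;
}

// Total number of gpu transfers: one more if there is a partial chunk.
std::size_t total_transfers(const ChunkPlan& p)
{
    return p.num_chunks + (p.num_overflow_mod_remainer != 0 ? 1 : 0);
}
```

For example, with total_num_ffts = 1000 and batch = 137 this gives 7 full chunks and a final chunk of 41, so 8 transfers in all.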
Performance has its price… paid in the denomination of aspirin tablets.