I went through the callback feature available in cuFFT and would like to know how to use the shared memory pointer in the callback function. Since the user callback function is not aware of the block size used by the library, how will the caller know what data length is safe when synchronizing threads within a block?
I am not able to find any example of this usage in the CUDA SDK or the parallelforall Git repository. Can you please provide more details and an example?