Changes in cufftdx vs 0.3.1 patch

himes.benjamin · January 4, 2022, 7:01pm

I’m looking at the changes in the 0.3.1 patch and wanted to confirm I see the important bits (I hope I didn’t miss a changelog in the docs somewhere.)

1. I’m sad to see sm86 is still not supported. Is this correct?
2. It looks like most of the changes are in include/database: to the definitions and kernel PTX, plus the addition of several lookup tables.
   - Are these mainly performance related?
   - How significant are they?
3. I see that the examples now add a call to set MaxDynamicSharedMemorySize. (Glad to see this, as I’ve been doing the same for quite a while; see below.) Even though in your case the amount of memory is known at compile time (FFT::shared_memory_size), the call cudaFuncSetAttribute() is a runtime function, right? Do you know if the compiler makes any different decisions?

github.com/StochasticAnalytics/FastFFT/blob/69b56d124165b9c91de0f096bbf0d9df4d238073/src/FastFFT.cu#L2080-L2082

          int shared_mem = LP.mem_offsets.shared_output * sizeof(complex_type);
          CheckSharedMemory(shared_mem, device_properties);
          cudaErr(cudaFuncSetAttribute((void*)thread_fft_kernel_R2C_decomposed<FFT,complex_type>, cudaFuncAttributeMaxDynamicSharedMemorySize, shared_mem));

Thanks!

Ben

mnicely · January 4, 2022, 8:18pm

The sole intent of 0.3.1 was to fix a compiler issue. You can try to manually remove the HW checks and see if the kernel works.
I don’t recall many performance improvements, if any.
Yes, cudaFuncSetAttribute() is a runtime function. Nothing is different with the compiler.
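
For readers following along, here is a minimal sketch of the runtime opt-in pattern discussed above. It is not taken from cuFFTDx or FastFFT: the kernel scale_kernel, the CUDA_CHECK macro, the function run_dynamic_smem_demo and the 64 KiB size are hypothetical stand-ins (in the thread, the size would come from FFT::shared_memory_size). It assumes a device that allows at least 64 KiB of opt-in shared memory per block and skips the launch otherwise.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical error-check helper, similar in spirit to cudaErr() in the quoted snippet.
#define CUDA_CHECK(call)                                                   \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            std::fprintf(stderr, "CUDA error %s at %s:%d\n",               \
                         cudaGetErrorString(err_), __FILE__, __LINE__);    \
            std::exit(EXIT_FAILURE);                                       \
        }                                                                  \
    } while (0)

// Hypothetical stand-in for an FFT kernel: stages data through dynamic shared memory.
__global__ void scale_kernel(float* data, int n) {
    extern __shared__ float smem[];
    const int i = threadIdx.x;
    if (i < n) smem[i] = data[i];
    __syncthreads();
    if (i < n) data[i] = 2.0f * smem[i];
}

// Returns 0 on success (or if the device cannot opt in to this much shared memory),
// 1 if the kernel produced an unexpected value.
int run_dynamic_smem_demo() {
    // Known at compile time, yet still above the 48 KiB default per-block limit,
    // so the runtime opt-in below is required before launching.
    constexpr int shared_bytes = 64 * 1024;
    constexpr int n = 256;

    int device = 0;
    int max_optin = 0;
    CUDA_CHECK(cudaGetDevice(&device));
    CUDA_CHECK(cudaDeviceGetAttribute(&max_optin,
                                      cudaDevAttrMaxSharedMemoryPerBlockOptin, device));
    if (shared_bytes > max_optin) {
        std::printf("Device allows only %d bytes of opt-in shared memory; skipping.\n",
                    max_optin);
        return 0;
    }

    // Runtime call: raises the dynamic shared memory limit for this one kernel.
    CUDA_CHECK(cudaFuncSetAttribute(scale_kernel,
                                    cudaFuncAttributeMaxDynamicSharedMemorySize,
                                    shared_bytes));

    std::vector<float> host(n, 1.0f);
    float* d_data = nullptr;
    CUDA_CHECK(cudaMalloc(&d_data, n * sizeof(float)));
    CUDA_CHECK(cudaMemcpy(d_data, host.data(), n * sizeof(float), cudaMemcpyHostToDevice));

    // The third launch parameter is the dynamic shared memory size in bytes.
    scale_kernel<<<1, n, shared_bytes>>>(d_data, n);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());

    CUDA_CHECK(cudaMemcpy(host.data(), d_data, n * sizeof(float), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_data));

    // Every element started at 1.0f and was doubled by the kernel.
    for (int i = 0; i < n; ++i) {
        if (host[i] != 2.0f) return 1;
    }
    return 0;
}

int main() {
    return run_dynamic_smem_demo();
}
```

Without the cudaFuncSetAttribute() call, a launch requesting more than 48 KiB of dynamic shared memory would be expected to fail at launch time, which is why the limit is checked against the device before opting in.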

himes.benjamin · January 5, 2022, 9:28pm

Thanks for the info, Matt. I hacked out the HW checks and was able to get things to run on sm86. I won’t comment on performance since it isn’t actually supported yet, but it is good to know I can run on most of our GPUs now : )
